What is Convolution in Computer Vision

• Last Updated : 28 Feb, 2022

In this article, we will look at what convolution means in computer vision.

The Convolution Procedure

Let us start with a basic example to understand the procedure of convolution.

Snake1: Bro this is an apple (FUSS FUSS)

Snake2: Okay but can you give me any proof? (FUSS FUSS FUSS)

Snake1: What do you mean, mamma snake told us (FUSS FUSS)

These two poor snakes fought with each other, but have you ever considered that we humans often do the same? Think about it for a moment.

Let’s compare human vision with computer vision for a second and think about it calmly.

Compare human vision and computer vision

Do you remember how, in childhood, we were taught by someone pointing at fruits and objects while we said ‘A’ for apple, ‘B’ for ball, and so on? In those moments our eyes were passing over different objects and our brain was storing what each one looks like. In simple words, this process of scanning an object and storing how an apple looks in our brain is what we will call convolution.

In the same way, if you never look at an object, you are not convolving over it, so you cannot identify or classify what is in your surroundings.

Have you ever noticed that, as human beings, we often give less importance to the color of an object than to its shape, size and texture? That is why you still recognize an apple whether it is green, red or yellowish-red: you tell objects such as an apple and a mango apart by their shape, size and texture.

What if we focused only on color? We could learn wrong information, such as that apples are always red. This does not mean that color has no importance at all. It means we look at shape, size and texture first, because the same object can come in different colors, which may confuse our brain at some point.

Suppose you want to buy refined oil from the Fortune brand. How do you behave? Your brain already stores what a Fortune package looks like. You do not pick up every refined oil package, read its name, and put it back on finding that it is not Fortune (we humans do not behave like this). Instead, your eyes focus on extracting the one item you are looking for, and you jump quickly from one rack to another until you find it. In simple terms, you are convolving over the different items using information you have already stored, and extracting the right one almost instantly. This is what we call convolution, one of the important aspects of human life.
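The shelf-scanning idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `shelf` grid, the `package` template and the `match_scores` function are hypothetical names made up for this example, and the score is a plain sum of products. Like most CNN libraries, the sketch does not flip the template, so strictly speaking it computes cross-correlation.

```python
def match_scores(image, template):
    """Slide `template` over `image` (stride 1, no padding).

    Each score is the sum of element-wise products between the template
    and the image patch currently under it.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    scores = []
    for r in range(ih - th + 1):
        row = []
        for c in range(iw - tw + 1):
            total = 0
            for i in range(th):
                for j in range(tw):
                    total += image[r + i][c + j] * template[i][j]
            row.append(total)
        scores.append(row)
    return scores


# Hypothetical "shelf": 1 marks part of an item, 0 marks empty space.
shelf = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

# Hypothetical "stored memory" of the product we want: a filled 2x2 block.
package = [
    [1, 1],
    [1, 1],
]

scores = match_scores(shelf, package)

# Pick the position with the highest score as (score, row, column).
best = max(
    (value, r, c)
    for r, row in enumerate(scores)
    for c, value in enumerate(row)
)
print(best)  # (4, 1, 4)
```

The highest score appears where the patch on the shelf looks most like the stored template, which mirrors how we jump to the right package instead of reading every label. A real matcher would usually normalize the scores so that bright regions do not win by default.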

Now let us take this to the next level.

Consider the image above. You can see an elephant, a dog, a cat and a donkey without any trouble; think of this as normal human vision. But what if I tell you there are more than 15 animals in this image? Confused, right?

To figure it out, do not rush to solve the riddle. Instead, try to notice the process your brain is following right now. Let’s analyze it.

First, we trace the edges of these animals to check whether they form any other animal. For example, if you look closely at the elephant’s trunk, its edges form a fish. Did you notice that at first glance? No, right?

Look more closely and you will find all the animals shown in the image below.

So far we have been connecting the dots between human vision and computer vision. Put computer vision aside for a moment and think only about human vision: didn’t we first apply something like a convolution layer, extracting the edges of the animals in order to build up their shapes, just as a convolutional neural network does? Similarly, when you enter a room you have never seen before, do you jump straight to the place you want to reach, or do you first take in the small objects in the room and then go there?

Another example is crossing a busy road full of traffic: don’t you first notice each vehicle?

Since convolutional neural networks learn from data, they follow similar steps. They first analyze the smallest components, the edges, to understand the basic build of objects. The early convolution layers typically extract edges and gradients first; later layers pick up patterns, then parts of objects, and finally the full object comes into the picture. Now that we have a clear understanding of how vision proceeds, let us talk about something interesting.
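To make the edge-extraction step a little more concrete, here is a minimal sketch in plain Python. It assumes a tiny hypothetical grayscale image with a dark left half and a bright right half, and it uses a fixed Sobel kernel; in a real convolutional neural network the kernel values are learned from data rather than written by hand. The function name `convolve2d_valid` is made up for this example, and, like most CNN libraries, it does not flip the kernel.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 5x6 image: dark left half (0), bright right half (9).
image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]

# Sobel kernel that responds to left-to-right changes in brightness.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

edges = convolve2d_valid(image, sobel_x)
for row in edges:
    print(row)
# [0, 36, 36, 0]
# [0, 36, 36, 0]
# [0, 36, 36, 0]
```

The output is zero wherever the brightness is flat and large wherever the window straddles the boundary between dark and bright, which is the kind of edge response the early layers are described as producing.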

“The convolution operation is as important as water is to a human being”

When a child is just a few months old, they do not see objects the way we do. A small child can make out only a few edges and patterns in their surroundings, which may be one of the reasons babies cry a lot. They see blurry shapes of different objects, and their visual acuity is commonly estimated to be in the range of 20/200 to 20/400. So what is going on there? In a sense they are convolving over things, and you can think of that early age as the first convolution layers.
