
Deep Learning with Pytorch-CNN – Getting Started – 2.0

Posted on April 29, 2019 (updated June 10, 2019) by Aritra Sen

In Deep Learning, we use Convolutional Neural Networks (ConvNets or CNNs) for image recognition and classification. Computer vision using ConvNets is one of the most exciting fields in current Deep Learning research. For a more detailed understanding of ConvNets, I encourage you to go through the amazing video tutorials by Andrew Ng.

In this tutorial I will try to give you a brief intuition about the different components of ConvNets. From the next tutorial onward we will build a fully functioning ConvNet for image classification.


Motivation: Why do we need ConvNets instead of MLPs?

Large number of weights to maintain: One of the main drawbacks of MLPs is that we need one weight per input for every neuron. A 224×224×3 RGB image has 224 × 224 × 3 = 150,528 input values, so even a small fully connected first layer (with, say, 10 neurons) already needs more than 1.5 million weights.

Spatial correlation is hard to maintain: In the case of MLPs we first need to flatten the image before passing it in; in the process we lose the spatial correlation (for example, the relationship between two nearby pixels).

ConvNets help us solve these problems through parameter sharing, which means far fewer parameters to train. First things first – let's understand a few image-related terminologies below –

An image is a matrix of pixel values.


A channel is a conventional term used to refer to a certain component of an image. An image from a standard digital camera will have three channels – red, green and blue – which you can imagine as three 2d matrices stacked on top of each other (one for each color), each holding pixel values in the range 0 to 255.

A grayscale image, on the other hand, has just one channel: a single 2d matrix representing the image, where each pixel value again ranges from 0 to 255 – zero indicating black and 255 indicating white.

There are mainly five basic building blocks of ConvNets –

1. Convolution Operator:
ConvNets work on the principle that nearby pixels are more strongly related than distant ones. In the convolution operation we slide a filter (which can serve different purposes, like detecting edges, eyes, or wheels) across the image from top left to bottom right. The math that happens at each position is an element-wise multiplication followed by a summation.
This operation significantly reduces the number of weights the neural network must learn compared to an MLP, and it also means that a change in the location of a feature does not throw the network off.
Filters are also called 'kernels' or 'feature detectors'. The network learns the best values of these filters. Below is an example –


Fig 1: Convolution Operation
Fig 2: Running the Filter across the Image

Different values of the filter will detect different features of the image; below are a few examples –

Fig 3: Types of Filters
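To make the convolution math concrete, below is a minimal PyTorch sketch – the 5×5 image and the vertical-edge filter values are my own illustrative numbers, not taken from the figures. A hand-rolled slide / multiply / sum loop and torch.nn.functional.conv2d produce the same feature map.

import torch
import torch.nn.functional as F

# A toy 5x5 single-channel image and a 3x3 vertical-edge filter
# (the values are illustrative, not taken from the figures above).
image = torch.tensor([[3., 0., 1., 2., 7.],
                      [1., 5., 8., 9., 3.],
                      [2., 7., 2., 5., 1.],
                      [0., 1., 3., 1., 7.],
                      [4., 2., 1., 6., 2.]])
kernel = torch.tensor([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]])

# Manual convolution: slide the filter, multiply element-wise, then sum.
out = torch.zeros(3, 3)
for i in range(3):
    for j in range(3):
        out[i, j] = (image[i:i+3, j:j+3] * kernel).sum()

# The same operation with PyTorch's conv2d (which expects NCHW tensors).
out_torch = F.conv2d(image.view(1, 1, 5, 5), kernel.view(1, 1, 3, 3))
print(torch.allclose(out, out_torch.view(3, 3)))  # True

As a side note, what deep learning libraries call 'convolution' is technically cross-correlation (the filter is not flipped), which is why the plain loop above matches conv2d exactly.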

Strided Convolutions: Suppose we choose a stride of 2. Then, while convolving over the image, the filter moves two steps at a time – in both the horizontal and vertical directions.

2. Padding:
If you look closely at Fig 2, you will see that the edge pixels are visited only once by the filter, while the other pixels are visited multiple times; for some images this can mean losing information present at the edges. Introducing zero padding around the edges ensures that information from the edges is also collected during convolution. Below are the details of the two types of padding –

Fig 4: Types of Padding in Convolution operation
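To see both padding styles and strides in action, here is a small sketch (the 6×6 input and 3×3 filter sizes are arbitrary choices for the demo):

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 6, 6)   # a random 6x6 single-channel input (NCHW)
k = torch.randn(1, 1, 3, 3)   # a single 3x3 filter

# 'Valid' convolution (no padding): the output shrinks to 4x4.
print(F.conv2d(x, k).shape)                       # torch.Size([1, 1, 4, 4])

# 'Same' convolution (zero padding of 1 for a 3x3 filter): output stays 6x6.
print(F.conv2d(x, k, padding=1).shape)            # torch.Size([1, 1, 6, 6])

# Strided convolution: stride 2 roughly halves the spatial size.
print(F.conv2d(x, k, padding=1, stride=2).shape)  # torch.Size([1, 1, 3, 3])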


3. Non Linearity (ReLU):
The convolution operation is a linear operation; to introduce non-linearity into the network we apply ReLU, which replaces every negative value in its input with zero, as shown below. ReLU also helps tackle the vanishing gradient problem.

Fig 5: ReLU Operation Example
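In code this is a one-liner – torch.relu zeroes out the negative entries and leaves the positive ones untouched (the tensor values below are arbitrary):

import torch

feature_map = torch.tensor([[-2., 4.],
                            [ 3., -1.]])
print(torch.relu(feature_map))
# tensor([[0., 4.],
#         [3., 0.]])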

4. Pooling:
Pooling (also called downsampling) reduces the size of the feature map (the output of the convolution operation). By reducing the height and width of the feature map, pooling helps us reduce overfitting and keeps the dimension sizes manageable.
There are mainly two types of pooling –
max pooling and average pooling. Below is an example of max pooling, where we slide a 2×2 window over the feature map and take the maximum value in that window. In average pooling, instead of taking the max value we take the average value in that window.

Fig 6: Max Pooling
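Below is a quick sketch of both pooling variants on a made-up 4×4 feature map; a 2×2 window (with PyTorch's default stride equal to the window size) halves the height and width:

import torch
import torch.nn.functional as F

fmap = torch.tensor([[[[1., 3., 2., 4.],
                       [5., 6., 7., 8.],
                       [3., 2., 1., 0.],
                       [1., 2., 3., 4.]]]])   # shape (1, 1, 4, 4)

# Max pooling keeps the largest value in each 2x2 window -> (1, 1, 2, 2).
print(F.max_pool2d(fmap, kernel_size=2))
# Average pooling takes the mean of each 2x2 window instead.
print(F.avg_pool2d(fmap, kernel_size=2))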

5. Fully Connected Layer:
The output of the pooling layer is flattened and passed on to the fully connected layer, and then to a softmax layer for classification.

Fig 7: Putting it all together (Source: Andrew Ng Deep Learning Course)
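To tie the five building blocks together, here is a minimal, illustrative PyTorch model – the layer sizes assume a 28×28 grayscale input and 10 output classes, which are my assumptions for the sketch, not something from this post:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # (1, 28, 28) -> (8, 28, 28)
    nn.ReLU(),                                  # non-linearity
    nn.MaxPool2d(2),                            # (8, 28, 28) -> (8, 14, 14)
    nn.Flatten(),                               # -> vector of 8 * 14 * 14 = 1568
    nn.Linear(8 * 14 * 14, 10),                 # fully connected: class scores
)

x = torch.randn(1, 1, 28, 28)                   # a dummy batch of one image
print(model(x).shape)                           # torch.Size([1, 10])

A softmax over these 10 scores then gives the class probabilities; in PyTorch the softmax is usually folded into the loss function (nn.CrossEntropyLoss).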

Generalized dimensions can be given as:

  • Input: n × n × nc
  • Filter: f × f × nc
  • Padding: p
  • Stride: s
  • Output: [(n + 2p - f)/s + 1] × [(n + 2p - f)/s + 1] × nc'

nc is the number of channels in the input and filter, while nc’ is the number of filters.
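As a quick sanity check on the formula, here is a tiny helper (my own illustration – note that the division is floored when the stride does not divide evenly):

def conv_output_size(n, f, p=0, s=1):
    # Spatial output size for an n x n input, f x f filter, padding p, stride s.
    return (n + 2 * p - f) // s + 1

# Example: a 224x224 input with a 7x7 filter, padding 3 and stride 2.
print(conv_output_size(224, 7, p=3, s=2))  # 112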

Some of the popular CNN architectures:

  1. LeNet-5
  2. AlexNet
  3. VGG
  4. ResNet

Image Credits/Reference materials:
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
https://towardsdatascience.com/simple-introduction-to-convolutional-neural-networks-cdf8d3077bac

Do like, share, and comment if you have any questions.

Category: Machine Learning, Python
