Date: September 12, 2025
Topic: Neural Network Overview
Recall
Notes
Recap of Neural Networks
- Building and optimizing deep feedforward architectures (containing linear layers like fully connected layers & non-linear layers like ReLU) can be generalized to arbitrary computation graphs
- Back-propagation and automatic differentiation can be used to optimize all parameters with gradient descent
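As a minimal sketch of the two points above, the following trains a tiny feedforward network (linear, ReLU, linear) with hand-written back-propagation and plain gradient descent. The toy data, layer sizes, learning rate, and the function name `train_tiny_mlp` are illustrative assumptions, not from the lecture.

```python
import numpy as np

def train_tiny_mlp(steps=200, lr=0.05, seed=0):
    """Train a 2-layer ReLU network on toy data; return the loss at each step."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(64, 2))              # toy inputs (assumed)
    y = np.abs(X[:, :1])                      # toy target: |x1|, shape (64, 1)

    # Parameters of the two linear (fully connected) layers
    W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
    W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)

    losses = []
    n = X.shape[0]
    for _ in range(steps):
        # Forward pass through the computation graph
        z1 = X @ W1 + b1
        h = np.maximum(z1, 0.0)               # ReLU non-linearity
        pred = h @ W2 + b2
        err = pred - y
        losses.append(float(np.mean(err ** 2)))

        # Backward pass: chain rule applied node by node
        d_pred = 2.0 * err / n
        dW2 = h.T @ d_pred
        db2 = d_pred.sum(axis=0)
        d_h = d_pred @ W2.T
        d_z1 = d_h * (z1 > 0)                 # gradient of ReLU
        dW1 = X.T @ d_z1
        db1 = d_z1.sum(axis=0)

        # Gradient descent update on all parameters
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return losses

losses = train_tiny_mlp()
print(losses[0], losses[-1])
```

Automatic differentiation frameworks compute the backward pass for you; it is written out by hand here only to show what is being automated.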
In images, nodes should look at small patches of the input rather than the whole image, since image features tend to be localized. A node that looks at a small window of the image can pick up these features.
Nodes with Local Receptive Fields

- Layers do not need to be fully connected
- In images, it makes sense for output nodes to consider only small patches of the input
- Features in images tend to be localized (e.g., extract edges/color/texture)
- If a node looks at a small window of the image, it can pick out those features
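One way to see why local receptive fields help is to count the weights a single output node needs. This is a rough sketch; the 32×32 RGB image and the 5×5 window are assumed sizes chosen for illustration, and both helper functions are hypothetical.

```python
def weights_per_node_fully_connected(height, width, channels):
    """A fully connected node has one weight per input value."""
    return height * width * channels

def weights_per_node_local(window, channels):
    """A node with a local receptive field only sees a window x window patch."""
    return window * window * channels

fc_weights = weights_per_node_fully_connected(32, 32, 3)
local_weights = weights_per_node_local(5, 3)
print(fc_weights, local_weights)  # 3072 75
```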
The convolution operation uses kernels, which act as feature extractors when they are convolved with the image.
By using multiple kernels, we get multiple feature maps, where each kernel extracts a different feature (e.g., edge, color, texture).
Convolution Operation

Extracting edges using an edge extractor kernel
- Looking at small patches can be done through convolution, which is another layer in the network
- We have an image and a kernel, where the kernel $K$ is a feature extractor
- For the $K$ above, we have dark values on one side and light values on the other, acting as an edge extractor/detector
- When the kernel is convolved with different image patches across the image, patches containing strong edges produce high values and patches without edges produce low values
- Output of convolution is an output map (right), which is a spatially organized set of values
- High values represent image patches that were very strongly indicative of the feature
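The edge-extraction idea can be sketched with a small NumPy example. The kernel values, the toy image, and the helper `conv2d_valid` are assumptions for illustration (the kernel from the lecture figure is not reproduced here). As is common in deep learning, the kernel is applied without flipping, which is strictly a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over every full patch of a 2D image (no padding, stride 1)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # elementwise product, then sum
    return out

# Toy image: dark (0) on the left, bright (1) on the right, so one vertical edge
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Assumed edge-detector kernel: negative on one side, positive on the other
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d_valid(image, kernel)
print(response.shape)   # (3, 4)
print(response[0])      # [0. 3. 3. 0.]
```

The output map is high only at the positions whose patch straddles the edge and zero over the flat regions.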
Feature Extraction Across Multiple Features

- A convolution layer can take any 3D input tensor (e.g., an RGB image with width, height, and color channels) and produce another similarly shaped 3D tensor
- Feature extraction is done across multiple features (using multiple kernels)
- Depth of the output $=$ number of kernels; the width and height can differ slightly from those of the input image (e.g., slightly smaller when no padding is used)
- Some kernels may extract edges while others extract colors, texture, etc.; different kernels extract different features
- Hence the convolution layer performs a linear transformation of the input to produce an output of similar shape
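The shape bookkeeping above can be checked with a naive multi-kernel convolution. This is a sketch under assumed sizes (a 32×32×3 input and eight 5×5 kernels, no padding, stride 1); `conv_layer` is a hypothetical helper, not a library function.

```python
import numpy as np

def conv_layer(x, kernels):
    """x: (H, W, C) input; kernels: (N, kh, kw, C). Returns (H-kh+1, W-kw+1, N)."""
    H, W, C = x.shape
    N, kh, kw, kc = kernels.shape
    assert kc == C, "each kernel must span all input channels"
    out = np.zeros((H - kh + 1, W - kw + 1, N))
    for n in range(N):                       # one feature map per kernel
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j, n] = np.sum(x[i:i + kh, j:j + kw, :] * kernels[n])
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))              # assumed RGB-sized input
kernels = rng.normal(size=(8, 5, 5, 3))      # 8 assumed kernels of size 5x5

feature_maps = conv_layer(image, kernels)
print(feature_maps.shape)                    # (28, 28, 8)

# Linearity check: conv(a*x + b*y) equals a*conv(x) + b*conv(y)
other = rng.random((32, 32, 3))
lhs = conv_layer(2.0 * image + 3.0 * other, kernels)
rhs = 2.0 * feature_maps + 3.0 * conv_layer(other, kernels)
print(np.allclose(lhs, rhs))                 # True
```

The output depth equals the number of kernels, and the width and height shrink from 32 to 28 because a 5×5 window only fits at 28 positions along each axis.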
Other layers introduce non-linearities (e.g., ReLU), and pooling layers help reduce the dimensionality of the data.
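A minimal sketch of these two layer types, assuming ReLU as the non-linearity and non-overlapping 2×2 max pooling (the lecture does not fix a pooling type or window size here); `relu` and `max_pool2x2` are hypothetical helpers.

```python
import numpy as np

def relu(x):
    """Elementwise non-linearity; the shape is unchanged."""
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling on an (H, W, C) tensor with even H and W."""
    H, W, C = x.shape
    blocks = x.reshape(H // 2, 2, W // 2, 2, C)   # split rows and columns into pairs
    return blocks.max(axis=(1, 3))                # max over each 2x2 block

x = np.arange(16, dtype=float).reshape(4, 4, 1) - 8.0   # values from -8 to 7
activated = relu(x)
pooled = max_pool2x2(activated)

print(activated.shape)   # (4, 4, 1)
print(pooled.shape)      # (2, 2, 1)
print(pooled[:, :, 0])   # [[0. 0.]
                         #  [5. 7.]]
```

ReLU keeps the tensor shape, while pooling halves the width and height, so only pooling reduces the amount of data passed on.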
<aside>
📌 SUMMARY: By introducing kernels, we get convolutional layers that act as feature extractors. The extracted features are then fed into pooling layers for dimensionality reduction, and by repeatedly alternating the two, we get tensors small enough for fully connected layers to classify.
</aside>
Date: September 12, 2025
Topic: Convolution Layer