Recap of Neural Networks

Building and optimizing deep feedforward architectures (built from linear layers such as fully connected layers and non-linear layers such as ReLU) generalizes to arbitrary computation graphs.
Back-propagation and automatic differentiation compute the gradients needed to optimize all parameters with gradient descent.
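
As a rough illustration of the recap, here is a minimal NumPy sketch of a tiny feedforward network (linear, ReLU, linear) trained with hand-written back-propagation and gradient descent. The data, layer sizes, learning rate, and all names (`forward`, `mse`, `W1`, etc.) are assumptions made up for this example, not something given in the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical): 64 samples, 3 input features.
X = rng.normal(size=(64, 3))
y = np.sin(X.sum(axis=1, keepdims=True))

# Parameters of a 3 -> 8 -> 1 network.
W1 = rng.normal(scale=0.5, size=(3, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros((1, 1))

def forward(X):
    z1 = X @ W1 + b1          # linear (fully connected) layer
    h1 = np.maximum(z1, 0.0)  # ReLU non-linearity
    out = h1 @ W2 + b2        # linear output layer
    return z1, h1, out

def mse(out, y):
    return float(np.mean((out - y) ** 2))

losses = []
lr = 0.05
for step in range(200):
    z1, h1, out = forward(X)
    losses.append(mse(out, y))

    # Backward pass: apply the chain rule node by node, from output to input.
    d_out = 2.0 * (out - y) / len(X)   # dLoss/d_out for the mean squared error
    dW2 = h1.T @ d_out
    db2 = d_out.sum(axis=0, keepdims=True)
    d_h1 = d_out @ W2.T
    d_z1 = d_h1 * (z1 > 0)             # ReLU passes gradient only where z1 > 0
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0, keepdims=True)

    # Gradient descent update on every parameter.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print("first loss:", losses[0])
print("last loss:", losses[-1])
```

Automatic differentiation libraries produce the same gradients without the hand-written backward pass; it is written out here only to show the chain rule at work.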

In images, nodes should instead look at small patches of the input, since image features tend to be localized. By looking at a small window of the image, a node can pick up these features.

Nodes with Local Receptive Fields

Layers do not need to be fully connected.
In images, it makes sense for each output node to consider only a small patch of the input:
- Features in images tend to be localized (e.g., edges, color, texture)
- If a node looks at a small window of the image, it can pick out those features
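
To get a feel for what restricting each node to a small window means, the following sketch counts connections for a fully connected layer versus a layer with local receptive fields. The image size and window size are hypothetical values picked only for illustration.

```python
# Hypothetical sizes chosen only for illustration.
height, width = 32, 32        # grayscale input image
window = 3                    # each output node sees a 3x3 patch

num_inputs = height * width
num_outputs = height * width  # one output node per pixel location

# Fully connected: every output node has a weight for every input pixel.
fully_connected_weights = num_inputs * num_outputs

# Local receptive field: every output node has weights only for its window.
local_weights = num_outputs * window * window

print(fully_connected_weights)  # 1048576
print(local_weights)            # 9216
```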

The convolution operation uses kernels, which act as feature extractors when they are convolved with the image.

By using multiple kernels, we get multiple feature maps, where each kernel extracts a different feature (e.g., edge, color, texture).

Convolution Operation

Extracting edges using an edge extractor kernel

Looking at small patches is done through convolution, which is another type of layer in the network.
We have an image and a kernel, where the kernel $K$ is a feature extractor.
- For the $K$ above, dark values on one side and light values on the other make it act as an edge extractor/detector
When the kernel is convolved with patches across the image, patches containing a strong edge produce a high value and patches without one produce a low value.
The output of convolution is an output map (right), which is a spatially organized set of values.
- High values mark image patches that are strongly indicative of the feature
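
The edge-extraction idea can be sketched in a few lines of NumPy. The 5x6 image and the 3x3 kernel below are made-up stand-ins for the ones in the lecture figure, and `convolve2d_valid` is a hypothetical helper name. As is conventional in most deep learning libraries, the kernel is slid over the image without being flipped, with no padding and a stride of 1.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` and sum the element-wise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# 5x6 image: dark (0) on the left half, light (1) on the right half.
image = np.array([[0, 0, 0, 1, 1, 1]] * 5, dtype=float)

# Hypothetical edge kernel: negative weights on the left, positive on the right.
K = np.array([[-1, 0, 1],
              [-1, 0, 1],
              [-1, 0, 1]], dtype=float)

output_map = convolve2d_valid(image, K)
print(output_map.shape)  # (3, 4)
print(output_map)
```

Each row of the output map is `[0, 3, 3, 0]`: the value is high exactly where the 3x3 window straddles the dark-to-light boundary and zero where the patch is uniform.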

Feature Extraction Across Multiple Features

A convolution layer can take any 3D tensor as input (e.g., an RGB image with width, height, and color channels) and output another 3D tensor of similar shape.
Feature extraction is done across multiple features by using multiple kernels.
- Depth of the output $=$ number of kernels; the width and height are slightly different from those of the input image
- Some kernels may extract edges while others extract colors, texture, etc.: different kernels extract different features
Hence the convolution layer performs a linear transformation of the input to produce an output of similar shape.
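
The shape bookkeeping above can be checked with a small sketch. It assumes no padding and a stride of 1, so the output width and height shrink by (kernel size − 1); the function name `conv_layer`, the 32x32 RGB input, and the 8 random 3x3 kernels are all illustrative assumptions (in a real network the kernel values are learned).

```python
import numpy as np

def conv_layer(x, kernels):
    """x: (H, W, C) input tensor; kernels: (N, kh, kw, C).

    Returns a tensor of shape (H - kh + 1, W - kw + 1, N), one feature map per kernel.
    """
    n, kh, kw, c = kernels.shape
    assert x.shape[2] == c, "each kernel must span all input channels"
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, n))
    for k in range(n):
        for i in range(out_h):
            for j in range(out_w):
                out[i, j, k] = np.sum(x[i:i + kh, j:j + kw, :] * kernels[k])
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))          # hypothetical RGB image
kernels = rng.normal(size=(8, 3, 3, 3))  # 8 kernels; random here, learned in practice

feature_maps = conv_layer(image, kernels)
print(feature_maps.shape)  # (30, 30, 8)

# Linearity check: scaling the input scales the output by the same factor.
print(np.allclose(conv_layer(2 * image, kernels), 2 * feature_maps))  # True
```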

Other layers introduce non-linearities, and pooling layers help reduce data dimensionality.
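
As one possible illustration, the sketch below applies a ReLU non-linearity followed by 2x2 max pooling to a small feature map. Max pooling with non-overlapping 2x2 windows is an assumption here (the notes do not say which pooling variant is used), and the feature map values and function names are made up.

```python
import numpy as np

def relu(x):
    # Element-wise non-linearity: negative values become 0.
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    """x: (H, W, C) with even H and W. Keeps the max of each non-overlapping 2x2 window."""
    h, w, c = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[ 1., -2.,  3.,  0.],
                        [-1.,  5., -4.,  2.],
                        [ 0., -3., -1., -2.],
                        [ 7.,  1., -6., -5.]])[:, :, None]  # shape (4, 4, 1)

pooled = max_pool_2x2(relu(feature_map))
print(pooled.shape)  # (2, 2, 1)
print(pooled[:, :, 0])
```

The 4x4 map is reduced to 2x2 with values `[[5, 3], [7, 0]]`: width and height are halved while the depth is unchanged.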

<aside> 📌 SUMMARY: By introducing kernels, we get convolutional layers that act as feature extractors. The extracted features are then fed into pooling layers for dimensionality reduction, and by repeatedly alternating the two, we get tensors small enough for fully connected layers to classify.

</aside>
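
To see how alternating the two layer types shrinks the tensor, here is a shape-only sketch. The 32x32x3 input, the 3x3 kernels without padding, the 2x2 pooling, and the kernel counts (8, 16, 32) are all hypothetical choices; `conv_shape` and `pool_shape` are illustrative helper names.

```python
def conv_shape(shape, num_kernels, k=3):
    # No padding, stride 1: width and height shrink by k - 1; depth = number of kernels.
    h, w, _ = shape
    return (h - k + 1, w - k + 1, num_kernels)

def pool_shape(shape, p=2):
    # Non-overlapping p x p pooling: width and height are divided by p (rounded down).
    h, w, c = shape
    return (h // p, w // p, c)

shape = (32, 32, 3)
history = [shape]
for num_kernels in (8, 16, 32):
    shape = conv_shape(shape, num_kernels)
    history.append(shape)
    shape = pool_shape(shape)
    history.append(shape)

for s in history:
    print(s)

flattened = shape[0] * shape[1] * shape[2]
print(flattened)  # 128 values passed to the fully connected layers
```

Under these assumptions the shapes go (32, 32, 3) → (30, 30, 8) → (15, 15, 8) → (13, 13, 16) → (6, 6, 16) → (4, 4, 32) → (2, 2, 32).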

Date: September 12, 2025

Topic: Neural Network Overview


Topic: Convolution Layer