Date: September 19, 2025

Topic: Backwards Pass for Convolution Layer

Recall

For simplicity, assume the output and input have the same shape: the input has been padded by 2 pixels at the bottom and right.
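A minimal NumPy sketch of this setup, assuming a 3×3 kernel (which is what a 2-pixel pad at the bottom and right implies if the shapes are to match). The helper name `cross_correlate_same` and the toy input are hypothetical, chosen only for illustration.

```python
import numpy as np

def cross_correlate_same(x, k):
    """Cross-correlate x with k, zero-padding x at the bottom and right
    so that the output has the same shape as x (illustrative helper)."""
    kh, kw = k.shape
    # Pad kh-1 rows at the bottom and kw-1 columns at the right (2 each for 3x3).
    xp = np.pad(x, ((0, kh - 1), (0, kw - 1)))
    y = np.zeros(x.shape, dtype=float)
    for r in range(x.shape[0]):
        for c in range(x.shape[1]):
            # Each output pixel is the elementwise product of the kernel
            # with the window whose top-left corner sits at (r, c).
            y[r, c] = np.sum(xp[r:r + kh, c:c + kw] * k)
    return y

x = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 input
k = np.ones((3, 3))                           # toy 3x3 kernel
y = cross_correlate_same(x, k)
print(x.shape, y.shape)
```

With the pad on the bottom and right only, the kernel window is anchored at its top-left corner, so output pixel `(r, c)` reads input pixels `(r, c)` through `(r + 2, c + 2)`.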

Notes

Backwards Pass for Convolution Layer

Cross-Correlation

image.png
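The figure is not reproduced in this export. One common way to write the cross-correlation, assuming an input $x$, a kernel $k$ with offsets $(a, b)$ counted from the top-left corner, and an output $y$ (the symbol $x$ and the index convention are assumptions; $y$ and $k$ match the notation used later in these notes):

$$
y(r, c) = \sum_{a} \sum_{b} x(r + a,\, c + b)\, k(a, b)
$$

Here $x$ is taken to be the padded input, so every $(r, c)$ in the output has a full window to read from.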

Definitions

Back-prop Chain Rule

image.png

$\text{Downstream Gradient} = \text{Upstream Gradient} \times \text{Local Gradient}$, assuming the backward pass flows from the outputs (upstream) to the inputs (downstream).


Since the kernel slides across the input image to generate the output image, each kernel weight contributes to every output pixel, so we need to incorporate all upstream gradients by summing over the gradients of the entire output image.

Due to weight sharing, the same kernel element $k[a',b']$ is used at every spatial location, so its gradients must accumulate contributions from all output locations.

This lets us calculate the weight updates.
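The accumulation over all output locations can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the lecture's implementation: a 3×3 kernel, zero padding at the bottom and right, and a toy loss $L = \tfrac{1}{2}\sum y^2$ (so the upstream gradient is just $y$). The function names `forward` and `kernel_grad` are hypothetical. A finite-difference check is included to confirm the accumulated gradient.

```python
import numpy as np

def forward(x, k):
    """Cross-correlation with bottom/right zero padding (same-shape output)."""
    kh, kw = k.shape
    H, W = x.shape
    xp = np.pad(x, ((0, kh - 1), (0, kw - 1)))
    y = np.zeros((H, W))
    for r in range(H):
        for c in range(W):
            y[r, c] = np.sum(xp[r:r + kh, c:c + kw] * k)
    return y

def kernel_grad(x, dL_dy, kshape):
    """Gradient of the loss w.r.t. each kernel element, summed over all outputs."""
    kh, kw = kshape
    H, W = x.shape
    xp = np.pad(x, ((0, kh - 1), (0, kw - 1)))
    dL_dk = np.zeros(kshape)
    for a in range(kh):
        for b in range(kw):
            # k[a, b] touches input pixel (r + a, c + b) for every output (r, c),
            # so its gradient sums upstream gradient times that shifted input.
            dL_dk[a, b] = np.sum(dL_dy * xp[a:a + H, b:b + W])
    return dL_dk

def loss(x, k):
    # Toy loss chosen for illustration: L = 0.5 * sum(y^2), hence dL/dy = y.
    return 0.5 * np.sum(forward(x, k) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5))
k = rng.normal(size=(3, 3))

analytic = kernel_grad(x, forward(x, k), k.shape)

# Central finite differences as an independent check.
eps = 1e-6
numeric = np.zeros_like(k)
for a in range(k.shape[0]):
    for b in range(k.shape[1]):
        kp, km = k.copy(), k.copy()
        kp[a, b] += eps
        km[a, b] -= eps
        numeric[a, b] = (loss(x, kp) - loss(x, km)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))
```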

Gradients wrt. Weights

image.png

Chain Rule over All Output Pixels

image.png
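The figure is not reproduced in this export. Written out, and assuming $L$ denotes the loss and $y(r, c)$ the output pixels, the chain rule over all output pixels would read:

$$
\frac{\partial L}{\partial k(a', b')} = \sum_{r} \sum_{c} \frac{\partial L}{\partial y(r, c)} \cdot \frac{\partial y(r, c)}{\partial k(a', b')}
$$

The first factor is the upstream gradient at pixel $(r, c)$; the second is the local gradient of that pixel with respect to the kernel element.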

Calculating $\frac{\partial y(r,c)}{\partial k(a',b')}$ for a Specific Pixel

image.png
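The figure is not reproduced in this export. A sketch of the step, assuming the forward pass is $y(r, c) = \sum_{a} \sum_{b} x(r + a,\, c + b)\, k(a, b)$ with $x$ the padded input (the symbol $x$ and the top-left index convention are assumptions):

$$
\frac{\partial y(r, c)}{\partial k(a', b')} = \frac{\partial}{\partial k(a', b')} \sum_{a} \sum_{b} x(r + a,\, c + b)\, k(a, b) = x(r + a',\, c + b')
$$

Only the term with $a = a'$ and $b = b'$ depends on $k(a', b')$, so every other term in the sum drops out.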

Calculating $\frac{\partial L}{\partial k(a',b')}$

image.png
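The figure is not reproduced in this export. Assuming the local gradient for a single pixel is $\frac{\partial y(r, c)}{\partial k(a', b')} = x(r + a',\, c + b')$, with $x$ the padded input (an assumed symbol), substituting it into the chain rule over all output pixels gives:

$$
\frac{\partial L}{\partial k(a', b')} = \sum_{r} \sum_{c} \frac{\partial L}{\partial y(r, c)}\, x(r + a',\, c + b')
$$

This has the same form as the forward pass: it is a cross-correlation of the padded input with the map of upstream gradients, evaluated at offset $(a', b')$.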





<aside> 📌 SUMMARY: The backward pass is a convolution if the forward pass is a cross-correlation.

</aside>
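One reading of this summary concerns the gradient with respect to the input: under the assumptions used in these notes (3×3 kernel, zero padding at the bottom and right), it equals the upstream gradient convolved with the kernel, which is a cross-correlation with the kernel flipped in both axes. The sketch below checks that numerically. The names `forward` and `input_grad` and the toy loss $L = \tfrac{1}{2}\sum y^2$ are hypothetical choices for illustration.

```python
import numpy as np

def forward(x, k):
    """Cross-correlation with bottom/right zero padding (same-shape output)."""
    kh, kw = k.shape
    H, W = x.shape
    xp = np.pad(x, ((0, kh - 1), (0, kw - 1)))
    y = np.zeros((H, W))
    for r in range(H):
        for c in range(W):
            y[r, c] = np.sum(xp[r:r + kh, c:c + kw] * k)
    return y

def input_grad(dL_dy, k):
    """dL/dx(i, j) = sum_{a,b} dL/dy(i - a, j - b) * k(a, b): a convolution."""
    kh, kw = k.shape
    H, W = dL_dy.shape
    # Pad the upstream gradient at the top and left so negative indices read zero.
    gp = np.pad(dL_dy, ((kh - 1, 0), (kw - 1, 0)))
    k_flipped = k[::-1, ::-1]  # flipping turns cross-correlation into convolution
    dL_dx = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            dL_dx[i, j] = np.sum(gp[i:i + kh, j:j + kw] * k_flipped)
    return dL_dx

def loss(x, k):
    # Toy loss chosen for illustration: L = 0.5 * sum(y^2), hence dL/dy = y.
    return 0.5 * np.sum(forward(x, k) ** 2)

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 5))
k = rng.normal(size=(3, 3))

analytic_dx = input_grad(forward(x, k), k)

# Central finite differences on each input pixel as an independent check.
eps = 1e-6
numeric_dx = np.zeros_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        xp_, xm_ = x.copy(), x.copy()
        xp_[i, j] += eps
        xm_[i, j] -= eps
        numeric_dx[i, j] = (loss(xp_, k) - loss(xm_, k)) / (2 * eps)

print(np.allclose(analytic_dx, numeric_dx, atol=1e-5))
```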


Date: September 20, 2025

Topic: Simple CNNs