Date: September 19, 2025
Topic: Backwards Pass for Convolution Layer
Recall
For simplicity, assume the output and input have the same shape: the input has been padded by 2 pixels at the bottom and right.
Notes
Backwards Pass for Convolution Layer
Cross-Correlation

- Simplifications in calculation:
    - 1 channel input
    - 1 kernel (i.e., 1 channel output)
    - Padding (2 pixels on right and bottom) so that the output is the same size as the input
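The simplified forward pass described above can be sketched in NumPy. This is a minimal illustration, not the lecture's own code: the function name `cross_correlate_same`, the 3×3 kernel, stride 1, and zero-valued padding are all assumptions (the notes only state that 2 pixels are added on the right and bottom).

```python
import numpy as np

def cross_correlate_same(x, k):
    """Cross-correlate x (H, W) with k (Kh, Kw), padding bottom/right so the output is (H, W)."""
    H, W = x.shape
    Kh, Kw = k.shape
    # Assumption: zero padding, applied only to the bottom and right edges.
    x_pad = np.pad(x, ((0, Kh - 1), (0, Kw - 1)))
    y = np.zeros((H, W))
    for r in range(H):
        for c in range(W):
            # y(r, c) = sum over (a, b) of x_pad(r + a, c + b) * k(a, b)
            y[r, c] = np.sum(x_pad[r:r + Kh, c:c + Kw] * k)
    return y

x = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((3, 3))  # a 3x3 kernel needs 2 pixels of padding at stride 1
y = cross_correlate_same(x, k)
print(y.shape)  # (4, 4)
```

Note that the kernel is not flipped before it is slid over the input, which is what makes this a cross-correlation rather than a convolution.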
Definitions
- Output $y$: shape $H \times W$; due to padding, the output and input are the same size
- $\frac{\partial L}{\partial y}$: the upstream gradient, assumed to be $H \times W$ (padding is added and the convention changed for convenience)
- $\frac{\partial L}{\partial y(r,c)}$: the derivative with respect to the single output element at row $r$, column $c$
Back-prop Chain Rule

$\text{Downstream Gradient} = \text{Upstream Gradient} \times \text{Local Gradient}$, where, in the backward pass, the upstream gradient arrives from the outputs and the downstream gradient is passed on towards the inputs
Since the kernel is slid across the input image to generate the output image, we need to incorporate all upstream gradients by summing over the gradients of the entire output image.
Due to weight sharing, the same kernel element $k(a',b')$ is used at every spatial location, so its gradient must accumulate contributions from all output locations.
This lets us calculate the weight updates.
Gradients wrt. Weights

- This can be done one kernel element at a time → ${\partial L}/{\partial k(a,b)}$: e.g., $(a,b)=(0,0)$
- Due to weight sharing, each weight contributes to every output pixel, so it affects the loss through all of them
- Initially, $k(0,0)$ multiplies input $x$ at $(0,0)$. After the kernel slides (e.g., stride = 1), it multiplies input $x$ at $(0,1)$, and so on.
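To make the sliding concrete, the pairing between output pixels and the input pixel that one kernel element touches can be enumerated. This is a small sketch; the 3×3 output size and the variable names are illustrative assumptions.

```python
H, W, stride = 3, 3, 1   # assumed toy output size and stride
a, b = 0, 0              # kernel element under study (top-left)

# For each output pixel (r, c), k(a, b) multiplies the input pixel (r*stride + a, c*stride + b).
pairs = [((r, c), (r * stride + a, c * stride + b))
         for r in range(H) for c in range(W)]

for out_px, in_px in pairs[:3]:
    print(f"y{out_px} uses k({a},{b}) * x{in_px}")
```

Every output pixel appears once in `pairs`, which is why the gradient for a single kernel element collects one term per output pixel.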
Chain Rule over All Output Pixels

- As each kernel value (e.g., the top-left kernel value) affects every pixel of the output, we need to incorporate all upstream gradients
- This can be achieved via the chain rule
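Written out, the chain rule over all output pixels would take the following form. This is a reconstruction from the surrounding definitions, assuming zero-based indices $r \in \{0,\dots,H-1\}$ and $c \in \{0,\dots,W-1\}$:

$$
\frac{\partial L}{\partial k(a',b')} = \sum_{r=0}^{H-1} \sum_{c=0}^{W-1} \frac{\partial L}{\partial y(r,c)} \cdot \frac{\partial y(r,c)}{\partial k(a',b')}
$$

The first factor is the upstream gradient at pixel $(r,c)$ and the second is the local gradient of that pixel with respect to the kernel weight.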
Calculating $\frac{\partial y(r,c)}{\partial k(a',b')}$ for a Specific Pixel

- This is the local gradient we need to compute: the partial derivative of output pixel $y(r,c)$ with respect to the kernel weight $k(a',b')$
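A sketch of this step, assuming stride 1 and writing $x$ for the padded input so that every index below is valid. The forward cross-correlation is

$$
y(r,c) = \sum_{a} \sum_{b} x(r+a,\, c+b)\, k(a,b)
$$

Only the term with $(a,b) = (a',b')$ depends on $k(a',b')$, so

$$
\frac{\partial y(r,c)}{\partial k(a',b')} = x(r+a',\, c+b')
$$

In words, the local gradient is the input pixel that the kernel weight was multiplied by when producing $y(r,c)$.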
Calculating $\frac{\partial L}{\partial k(a',b')}$

- Hence, to get the total gradient for the kernel weight at $(a',b')$, we sum the per-pixel contributions over the entire output image
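The summation above can be checked numerically. The sketch below assumes stride 1, a 3×3 kernel, zero padding on the bottom and right, and a toy loss $L = \sum_{r,c} g(r,c)\, y(r,c)$ chosen so that the upstream gradient is exactly a known array `dL_dy`; the helper names `forward` and `kernel_grad` are illustrative, not from the lecture.

```python
import numpy as np

def forward(x_pad, k, H, W):
    """Cross-correlation: y(r, c) = sum_{a,b} x_pad(r + a, c + b) * k(a, b)."""
    Kh, Kw = k.shape
    return np.array([[np.sum(x_pad[r:r + Kh, c:c + Kw] * k) for c in range(W)]
                     for r in range(H)])

def kernel_grad(x_pad, dL_dy, k_shape):
    """dL/dk(a, b) = sum_{r,c} dL/dy(r, c) * x_pad(r + a, c + b)."""
    H, W = dL_dy.shape
    dL_dk = np.zeros(k_shape)
    for a in range(k_shape[0]):
        for b in range(k_shape[1]):
            # Shifted H x W window of the padded input, weighted by the upstream gradient.
            dL_dk[a, b] = np.sum(dL_dy * x_pad[a:a + H, b:b + W])
    return dL_dk

rng = np.random.default_rng(0)
H, W, K = 4, 5, 3
x_pad = np.pad(rng.normal(size=(H, W)), ((0, K - 1), (0, K - 1)))
k = rng.normal(size=(K, K))
dL_dy = rng.normal(size=(H, W))  # stand-in upstream gradient

analytic = kernel_grad(x_pad, dL_dy, k.shape)

# Central finite differences on the toy loss L = sum(dL_dy * y).
eps = 1e-6
numeric = np.zeros_like(k)
for a in range(K):
    for b in range(K):
        k_plus, k_minus = k.copy(), k.copy()
        k_plus[a, b] += eps
        k_minus[a, b] -= eps
        loss_plus = np.sum(dL_dy * forward(x_pad, k_plus, H, W))
        loss_minus = np.sum(dL_dy * forward(x_pad, k_minus, H, W))
        numeric[a, b] = (loss_plus - loss_minus) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))
```

The kernel gradient is itself a cross-correlation: the upstream gradient is slid over the padded input, one shift per kernel element.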
<aside>
📌 SUMMARY: The backward pass is a convolution if the forward pass is a cross-correlation.
</aside>
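One way to read the summary is in terms of the gradient with respect to the input, which these notes do not derive. Under the same assumptions as before (stride 1, 3×3 kernel, padding on the bottom and right), the sketch below compares the chain-rule result with a true convolution of the upstream gradient, that is, a sliding window with the kernel flipped in both axes. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
H, W, K = 4, 5, 3
k = rng.normal(size=(K, K))
dL_dy = rng.normal(size=(H, W))  # stand-in upstream gradient

# Chain rule directly: y(r, c) reads x_pad[r:r+K, c:c+K] with weights k,
# so each output pixel scatters dL_dy(r, c) * k back onto that window.
scatter = np.zeros((H + K - 1, W + K - 1))
for r in range(H):
    for c in range(W):
        scatter[r:r + K, c:c + K] += dL_dy[r, c] * k

# Same result as a "full" convolution: zero-pad the upstream gradient by K - 1
# on every side and slide the flipped kernel over it.
g_pad = np.pad(dL_dy, K - 1)
k_flip = k[::-1, ::-1]
conv = np.zeros_like(scatter)
for i in range(H + K - 1):
    for j in range(W + K - 1):
        conv[i, j] = np.sum(g_pad[i:i + K, j:j + K] * k_flip)

print(np.allclose(scatter, conv))

# Gradient for the original (unpadded) input is the top-left H x W block.
dL_dx = conv[:H, :W]
```

The flip of the kernel is the only difference between the two sliding-window operations, which is the sense in which a cross-correlation forward pass pairs with a convolution in the backward pass.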
Date: September 20, 2025
Topic: Simple CNNs