Date: September 4, 2025

Topic: Back-propagation

Recall

In the forward pass, we give a name to the intermediate result of each operation (like addition or multiplication) so that every step of the computation is easy to track.

When computing the derivative of the final function wrt. an earlier node (the downstream gradient), we need two things: the local gradient, which is the derivative of the next node wrt. the current node, and the upstream gradient, which is the derivative of the final function wrt. the next node.

Notes

Back-propagation Example

Forward Pass

When computing the forward pass for $f=(x+y)z$, we split the composite function into 2 simpler functions
- $q=x+y$
- $f=qz$
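
A minimal sketch of this forward pass in Python. The function name `forward` and the sample inputs ($x=-2$, $y=5$, $z=-4$) are assumptions chosen for illustration, not values from the lecture.

```python
def forward(x, y, z):
    """Forward pass for f = (x + y) * z, keeping the intermediate q."""
    q = x + y  # first node: addition
    f = q * z  # second node: multiplication
    return q, f


# Sample inputs chosen only for illustration
q, f = forward(x=-2.0, y=5.0, z=-4.0)
print(q, f)  # 3.0 -12.0
```

Keeping $q$ around matters because the backward pass reuses it (e.g., $\frac{\partial f}{\partial z} = q$).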

Backward Pass

Using the functions above, we can now define the partial derivatives for the nodes.
- On $f$: $\frac{\partial f}{\partial f} = 1$
- On $q$: $\frac{\partial f}{\partial q} = \frac{\partial (qz)}{\partial q} = z$
- On $z$: $\frac{\partial f}{\partial z} = \frac{\partial (qz)}{\partial z} = q$
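
One way to sanity-check these derivatives is to compare them against a finite-difference estimate. The sketch below assumes sample values $q=3$, $z=-4$; the helper `numerical_partial` is a hypothetical name introduced here, not part of any library.

```python
def f_of(q, z):
    """The last node of the graph: f = q * z."""
    return q * z


def numerical_partial(func, args, index, h=1e-6):
    """Central-difference estimate of d func / d args[index]."""
    plus = list(args)
    minus = list(args)
    plus[index] += h
    minus[index] -= h
    return (func(*plus) - func(*minus)) / (2 * h)


# Sample values chosen only for illustration
q, z = 3.0, -4.0

# Analytic partial derivatives from the notes
df_df = 1.0
df_dq = z
df_dz = q

# The numerical estimates should agree up to floating-point error
assert abs(numerical_partial(f_of, (q, z), 0) - df_dq) < 1e-6
assert abs(numerical_partial(f_of, (q, z), 1) - df_dz) < 1e-6
```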

Applying Chain Rule

For $x$ and $y$, we need to apply the chain rule, as $q$ is the intermediate node between them and $f$.
Each application of the chain rule consists of the following
- Downstream Gradient: Value of the derivative we’re computing at this step (e.g., $f$ on $y$, $\frac{\partial f}{\partial y}$)
- Local Gradient: Local effect of how much the value of the current node affects the next intermediate output (e.g., $y$ on $q$, $\frac{\partial q}{\partial y}$)
- Upstream Gradient: Tells us how much the output of the next node affects the final output at the very end (e.g., $f$ on $q$, $\frac{\partial f}{\partial q}$)

Where Downstream Gradient = Upstream Gradient $\times$ Local Gradient (e.g., $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial y}$)
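
The chain rule step for $x$ and $y$ can be sketched in Python as follows. The sample inputs ($x=-2$, $y=5$, $z=-4$) and the variable names are assumptions made for illustration.

```python
# Sample inputs chosen only for illustration
x, y, z = -2.0, 5.0, -4.0

# Forward pass
q = x + y
f = q * z

# Backward pass, starting from the output
df_dq = z  # upstream gradient for both x and y
df_dz = q

# Local gradients of q = x + y
dq_dx = 1.0
dq_dy = 1.0

# Downstream = upstream * local
df_dx = df_dq * dq_dx
df_dy = df_dq * dq_dy

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```

Since both local gradients of the addition node are 1, the upstream gradient passes through to $x$ and $y$ unchanged.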

In the single-input case, the node's function acts on only one variable (e.g., $x$), so there is a single local gradient, taken wrt. that variable.

In the multi-input case, there are as many local gradients as inputs, one wrt. each input variable (e.g., $x$ and $y$)
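
A rough sketch of the difference, written as per-node backward functions. All function names here are hypothetical, and the squaring node is an assumed single-input example that does not appear in the lecture graph.

```python
def square_backward(upstream, a):
    """Single-input node a**2: one local gradient, 2a."""
    return upstream * 2.0 * a


def add_backward(upstream):
    """Two-input node a + b: one local gradient per input, both equal to 1."""
    return upstream * 1.0, upstream * 1.0


def mul_backward(upstream, a, b):
    """Two-input node a * b: local gradient is b wrt. a, and a wrt. b."""
    return upstream * b, upstream * a


# Back-propagating through f = (x + y) * z with illustrative inputs
x, y, z = -2.0, 5.0, -4.0
q = x + y
df_dq, df_dz = mul_backward(1.0, q, z)  # upstream at f is df/df = 1
df_dx, df_dy = add_backward(df_dq)
```

Each backward function returns one downstream gradient per input, so a node with $n$ inputs returns $n$ values.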

<aside> 📌 SUMMARY: The Jacobians that appear in back-propagation are often sparse, which allows for tricks that compute the gradients more efficiently.

</aside>