Date: August 31, 2025
Topic: Computation Graph
Recall
Computation graphs allow us to make use of hidden vars. These vars represent the intermediate edges in the graph.
When applying the chain rule on the graph, we use the local gradients that exist at each graph node.
These are the derivatives of the output edge wrt. the input edge.
Notes
Computation Graph
Models a function as a sequence of intermediate steps

- The function can be broken down into the component steps that compose it
- As we move right in the graph, we build up more complexity until, at the end, we have the full equation
- We can use it to model gradients as they flow through the graph (either forwards or backwards)
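
As a rough illustration of the bullets above, a graph can be written as a list of primitive steps and evaluated left to right. This is only a sketch: the function $f = (x + y) \times z$, the list-of-tuples layout, and the `evaluate` helper are assumptions made for the example, not something fixed by these notes.

```python
import operator

# Each node: (output name, primitive op, input names), listed left to right.
# Assumed example function: f = (x + y) * z, with "a" as the intermediate edge.
graph = [
    ("a", operator.add, ("x", "y")),  # "+" node
    ("f", operator.mul, ("a", "z")),  # "x" node
]

def evaluate(graph, inputs):
    """Walk the nodes in order, storing every intermediate value by name."""
    values = dict(inputs)
    for out, op, args in graph:
        values[out] = op(*(values[name] for name in args))
    return values

values = evaluate(graph, {"x": 2.0, "y": 3.0, "z": 4.0})
print(values["a"], values["f"])  # 5.0 20.0
```

Keeping every intermediate value around is what later makes the local gradients cheap to compute.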
Making Use of Intermediate Variables

- Computation graphs allow us to make use of hidden vars.
- Intermediate variables represent the intermediate edges in the graph (we don't need labels for input and output edges)
- When we calculate gradients in the graph for the chain rule, we generally look at local gradients
- Local gradients exist at each node in the graph
- These gradients are usually the derivative of the output edge wrt. the input edge
- e.g., in the $\times$ node, the local gradient is $\frac{\partial f}{\partial a}$ and in the $+$ node, the local gradient is $\frac{\partial a}{\partial x}$
- These gradients can be combined to get the overall chain rule product
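
A small numeric check of the local-gradient idea, assuming (for illustration only) that the graph is $a = x + y$ followed by $f = a \times z$. The input values and the finite-difference step `eps` are arbitrary choices.

```python
# Assumed graph: a = x + y ("+" node), f = a * z ("x" node).
x, y, z = 2.0, 3.0, 4.0

# Forward pass, keeping the intermediate edge a.
a = x + y
f = a * z

# Local gradients, one per node.
df_da = z    # "x" node: d(a * z)/da = z
da_dx = 1.0  # "+" node: d(x + y)/dx = 1

# Chain rule: multiply the local gradients along the path x -> a -> f.
df_dx = df_da * da_dx

# Finite-difference estimate as a sanity check.
eps = 1e-6
numeric = (((x + eps) + y) * z - f) / eps

print(df_dx)  # 4.0
```

The finite-difference value `numeric` should agree with `df_dx` up to rounding error.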
In forward-mode, the gradients are passed forward through the network.
In reverse-mode, the gradients start at the output instead and flow back to the input.
Automatic Differentiation (Autodiff)
Forward-mode Autodiff

- Gradients are passed forward as we go through the network
- Usually used when there is a small number of inputs and a large number of outputs
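
One common way to sketch forward mode is with dual numbers, where every value carries its derivative wrt. one chosen input. The `Dual` class and the example function below are hypothetical and only support `+` and `*`; this is a minimal sketch, not a full autodiff implementation.

```python
class Dual:
    """A value paired with its derivative (tangent) wrt. one chosen input."""

    def __init__(self, value, tangent=0.0):
        self.value = value
        self.tangent = tangent

    def __add__(self, other):
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.tangent + other.tangent)

    def __mul__(self, other):
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.tangent * other.value + self.value * other.tangent)

def f(x, y, z):
    # Assumed example function, not taken from the notes.
    return (x + y) * z

# Seed the tangent of x with 1 to get df/dx; the other inputs get 0.
out = f(Dual(2.0, 1.0), Dual(3.0), Dual(4.0))
print(out.value, out.tangent)  # 20.0 4.0
```

Getting the derivative wrt. `y` or `z` would need another pass with a different seed, which is why this mode suits functions with few inputs.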
Reverse-mode Autodiff

- Gradients start from output and flow back to the input
- We use this mode for DNN tasks, as the number of inputs (the parameters) is much greater than the number of outputs (typically a single scalar loss).
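
Reverse mode can be sketched by recording each node's parents together with the local gradients during the forward pass, then walking the graph backwards from the output. The `Var` class and `backward` function are hypothetical names for this illustration and handle only `+` and `*`.

```python
class Var:
    """A graph node that remembers its parents and the local gradients to them."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def backward(output):
    """Accumulate d(output)/d(node) into .grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order gives a topological ordering of the DAG.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad

x, y, z = Var(2.0), Var(3.0), Var(4.0)
f = (x + y) * z  # assumed example function
backward(f)
print(x.grad, y.grad, z.grad)  # 4.0 4.0 5.0
```

A single backward pass yields the gradient wrt. every input at once, which is what makes this mode attractive when there is one output and many inputs.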
In forward mode, we repeatedly apply the chain rule, carrying the gradient forward to the next node at each step.
This is useful for few inputs → many outputs: each forward pass computes one column of the Jacobian (the derivatives of every output wrt. one input) simultaneously, so only a few passes are needed to build the whole Jacobian.
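
To make the Jacobian remark concrete, here is a hand-written forward pass for a hypothetical map from one input to three outputs. Each intermediate value is paired with its derivative wrt. `x`, so one pass produces the full Jacobian column. The function itself is made up for the example.

```python
import math

def outputs_and_tangents(x, dx=1.0):
    """Return three outputs and their derivatives wrt. x, in one pass."""
    a, da = x * x, 2.0 * x * dx               # a  = x^2
    y1, dy1 = a + x, da + dx                  # y1 = x^2 + x
    y2, dy2 = a * x, da * x + a * dx          # y2 = x^3
    y3, dy3 = math.sin(a), math.cos(a) * da   # y3 = sin(x^2)
    return (y1, y2, y3), (dy1, dy2, dy3)

values, jacobian_column = outputs_and_tangents(2.0)
print(values[:2], jacobian_column[:2])  # (6.0, 8.0) (5.0, 12.0)
```

With $n$ inputs, the same pass would be repeated $n$ times, once per seed, to fill in all $n$ columns.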
<aside>
SUMMARY: Automatic differentiation uses DAGs of primitive operations, where we multiply the local derivatives of those primitives along the graph (the chain rule) to calculate gradients.
</aside>