In the forward pass, we store the output of each function (like addition or multiplication) as an intermediate variable, which makes the values easier to track.
When computing the derivative of the final function with respect to an earlier node (the downstream gradient), we need two things: the local gradient, which is the derivative of the next node with respect to the current node, and the upstream gradient, which is the derivative of the final function with respect to the next node.


For $x$ and $y$, we need to apply the chain rule, since $q$ is an intermediate node between them and the final function.
Their gradients consist of two factors:
Downstream Gradient = Upstream Gradient $\times$ Local Gradient
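
To make the product rule above concrete, here is a minimal sketch of a forward and backward pass on a tiny graph. The notes only name $x$, $y$ and $q$; the graph `q = x + y`, `f = q * z`, the extra input `z`, and the input values are assumptions chosen for illustration.

```python
# Assumed graph (not from the notes): q = x + y, then f = q * z.
def forward(x, y, z):
    q = x + y  # intermediate node, stored for the backward pass
    f = q * z  # final function
    return q, f


def backward(z, q):
    df_df = 1.0  # gradient of the final function with respect to itself

    # Multiply node f = q * z: local gradients are z (for q) and q (for z).
    df_dq = df_df * z  # downstream = upstream * local
    df_dz = df_df * q

    # Add node q = x + y: both local gradients are 1.
    # df_dq now plays the role of the upstream gradient.
    df_dx = df_dq * 1.0
    df_dy = df_dq * 1.0
    return df_dx, df_dy, df_dz


x, y, z = -2.0, 5.0, -4.0
q, f = forward(x, y, z)
grads = backward(z, q)
print(q, f, grads)
```

Note how the downstream gradient of one node (`df_dq`) becomes the upstream gradient for the nodes that feed into it.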

In the single-input case, the function acts on only one variable (e.g., $x$), so there is a single local gradient, taken with respect to that variable.
In the multi-input case, there are as many local gradients as there are inputs, each taken with respect to a different variable (e.g., $x$ and $y$).
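
The two cases can be sketched as follows. The specific functions (a square for the single-input case, a product for the multi-input case) and the helper names are hypothetical choices, not taken from the notes.

```python
def square_backward(x, upstream):
    # Single input: s = x ** 2 has one local gradient, ds/dx = 2 * x.
    return upstream * 2.0 * x


def multiply_backward(x, y, upstream):
    # Two inputs: p = x * y has two local gradients,
    # dp/dx = y and dp/dy = x, so two downstream gradients are returned.
    return upstream * y, upstream * x


upstream = 0.5  # assumed gradient arriving from the rest of the graph
dx_single = square_backward(3.0, upstream)
dx_multi, dy_multi = multiply_backward(3.0, -2.0, upstream)
print(dx_single, dx_multi, dy_multi)
```

In both cases the same upstream gradient is reused; only the number of local gradients changes.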
<aside> 📌 SUMMARY: Jacobians in backpropagation are often sparse, which allows shortcuts that compute the gradients more efficiently than forming the full matrix.
</aside>
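
As one illustration of such a shortcut, the sketch below uses an elementwise ReLU, which is an assumed example and not named in the notes. Its Jacobian is diagonal, so multiplying the upstream gradient by the full matrix gives the same result as simply masking the upstream gradient.

```python
def relu_jacobian(x):
    # Full n x n Jacobian of elementwise ReLU: only diagonal entries
    # can be nonzero, and they are 1 where the input is positive.
    n = len(x)
    return [[1.0 if (i == j and x[i] > 0) else 0.0 for j in range(n)]
            for i in range(n)]


def backward_full(x, upstream):
    # Explicit product: downstream[j] = sum_i upstream[i] * J[i][j].
    jac = relu_jacobian(x)
    n = len(x)
    return [sum(upstream[i] * jac[i][j] for i in range(n)) for j in range(n)]


def backward_sparse(x, upstream):
    # Shortcut: use the diagonal structure, never build the matrix.
    return [u if xi > 0 else 0.0 for xi, u in zip(x, upstream)]


x = [1.0, -2.0, 3.0, -1.0]
upstream = [4.0, -1.0, 5.0, 9.0]
full = backward_full(x, upstream)
sparse = backward_sparse(x, upstream)
print(full, sparse)
```

The explicit version does work proportional to $n^2$, while the shortcut is proportional to $n$, which matters when $n$ is the size of a large activation tensor.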