Date: October 11, 2025

Topic: Soft-max and Preview of Attention

Recall

Soft-max lets us sample randomly from the input set, with probabilities determined by the input values: larger values are more likely to be chosen.

Notes

Soft-max

Review of Soft-max

$$
\operatorname{softmax}\!\left(\{x_1,\ldots,x_N\}\right) = \left\{\frac{e^{x_1}}{Z},\,\ldots,\,\frac{e^{x_N}}{Z}\right\}, \quad Z=\sum_{j=1}^{N} e^{x_j}.
$$
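The definition above can be sketched directly in code. This is a minimal illustration, not part of the lecture material: the function name `softmax` and the example scores are assumptions, and the max-subtraction step is a standard numerical-stability trick that does not change the result.

```python
import math

def softmax(xs):
    """Map a list of real scores to probabilities that sum to 1."""
    # Subtracting the maximum does not change the output (the common factor
    # cancels between each numerator and Z) but keeps exp() from overflowing.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)  # the normaliser Z from the formula
    return [e / z for e in exps]

# Illustrative scores (assumed for this example).
probs = softmax([1.0, 2.0, 3.0])
print(probs)
```

Larger inputs receive larger probabilities, and the outputs always sum to 1. Without the shift by the maximum, an input such as `1000.0` would make `math.exp` overflow.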

Selection using Soft-max

In the figure, applying soft-max to the input values gives the middle distribution. The green index receives a high probability, so it is likely to be chosen when we sample at random.
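Selection by sampling can be sketched as follows. This is a hedged illustration only: the scores, the helper name `sample_index`, and the fixed seed are assumptions made for the example, with index 1 standing in for the high-probability entry.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def sample_index(scores, rng):
    """Draw one index, with probability given by softmax(scores)."""
    probs = softmax(scores)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]

rng = random.Random(0)        # fixed seed so the run is repeatable
scores = [0.0, 5.0, 1.0]      # assumed scores; index 1 is the large one
counts = [0, 0, 0]
for _ in range(10_000):
    counts[sample_index(scores, rng)] += 1
print(counts)
```

Over many draws, the index with the largest score is picked far more often than the others, but the smaller entries are still chosen occasionally.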


Soft-max attention turns dot-product similarities between a query and candidate embeddings into probabilities, then forms a differentiable weighted sum of the candidates.
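A minimal sketch of this computation, assuming the candidates are stored as the rows of $U$ so that the similarities are the entries of $Uq$. The function name `soft_attention` and the toy vectors are illustrative assumptions, not taken from the lecture.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def soft_attention(U, q):
    """Return (weights, output) for candidate rows U and query q."""
    # Dot-product similarity between the query and each candidate: Uq.
    scores = [sum(u_k * q_k for u_k, q_k in zip(u, q)) for u in U]
    # Turn the similarities into a probability distribution.
    weights = softmax(scores)
    # Weighted sum of the candidates, computed one coordinate at a time.
    dim = len(U[0])
    output = [sum(w * u[k] for w, u in zip(weights, U)) for k in range(dim)]
    return weights, output

# Toy data (assumed): three 2-d candidates and a query aligned with the first.
U = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
q = [4.0, 0.0]
weights, output = soft_attention(U, q)
print(weights, output)
```

Because every step is a smooth function of `U` and `q`, gradients can flow through the selection, which a hard `argmax` would not allow. Here the output lies close to the first candidate, since it has the largest dot product with the query.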

Attention with Soft-max

Applying Soft-max to $Uq$

MLP Soft-max

Soft-max Attention


<aside> 📌 SUMMARY: Soft-max lets a layer select among the previous layer's outputs in a differentiable manner. Because soft-max produces a probability distribution over its inputs, the “best” (highest-scoring) inputs are the most likely to be selected.

</aside>


Date: October 11, 2025

Topic: Attention

Recall

Soft attention lets us softly select, from a set of vectors $U$, the vector most similar to the query $q$: the output is a weighted combination dominated by the best match.




<aside> 📌 SUMMARY: Through soft attention, we can query a context (apply $q$ to a set of vectors $U$) to find which vector in the set corresponds best to $q$. $q$ is continuously updated as the model keeps querying $U$, until we finally have the answer we are looking for.

</aside>


Date: October 30, 2025

Topic: Attention in RNN