Date: October 12, 2025
Topic: Neural Information Retrieval
Recall
With ML, we can cast IR tasks as classification or regression problems, where we take the features and use them to predict a label.
Notes
Neural Information Retrieval
- Use machine learning for information retrieval as we now have more information
- More features: Recency, domain, PageRank, etc
- More data: Click logs, large supervised datasets, etc
- Powerful algorithms: GBDTs, neural nets, etc
Information Retrieval Tasks
- For each $(q,d)$ pair, consider a function $f(q,d) \rightarrow s$, where $s$ is a relevance label
- $s \in \{0,1\}$: Binary relevance (binary classification)
- $s \in \{0,1,\dots,k\}$: Multi-level relevance (multi-class classification)
- $s \in \mathbb{R}$: Relevance score (regression)
Usually run fast heuristics (classical retrieval) first, then run slower ML algorithms on the top-$k$ candidates.
Re-ranking Pipeline

- The general ML formulation requires running inference for every $(q,d)$ pair, which is too slow over a full collection
- Shortlist candidates first
- For each $q$, first run fast classical retrieval, then run the ML model on the top-$k$ candidates (typically $10 < k < 10000$)
- Can also have multiple re-ranker stages with increasing complexity
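The two-stage idea above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `cheap_score`, `slow_score`, and `retrieve_then_rerank` are hypothetical names, the first stage is a plain term-count heuristic standing in for classical retrieval, and `slow_score` is only a placeholder for an expensive ML model.

```python
from collections import Counter
from typing import Callable, List

Scorer = Callable[[str, str], float]


def cheap_score(query: str, doc: str) -> float:
    """Fast lexical heuristic: total occurrences of query terms in the doc."""
    counts = Counter(doc.lower().split())
    return float(sum(counts[t] for t in query.lower().split()))


def slow_score(query: str, doc: str) -> float:
    """Placeholder for a slow ML model: fraction of distinct query terms present."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    k: int,
    first_stage: Scorer = cheap_score,
    reranker: Scorer = slow_score,
) -> List[str]:
    # Stage 1: score every document with the cheap function, keep the top-k.
    shortlist = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:k]
    # Stage 2: the expensive scorer only ever sees the k shortlisted documents.
    return sorted(shortlist, key=lambda d: reranker(query, d), reverse=True)


docs = [
    "neural ranking models for search",
    "cooking pasta at home",
    "search search search engine spam",
    "neural networks for image classification",
]
ranked = retrieve_then_rerank("neural search", docs, k=2)
```

In this toy collection the term-count heuristic puts the spam-like document first, and the second stage moves the document covering both query terms above it. Additional re-ranker stages would repeat stage 2 with a smaller $k$ and a costlier scorer.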
To convert to rankings, we need to model relevance between documents (is document A more relevant than document B?)
From Classification to Ranking
- Classification only defines an absolute notion of relevance
- Ranking: Consider not only whether a document is “relevant” but whether it is more or less relevant than competing documents
- A classification objective is good at discriminating “random” irrelevant docs, but not superficially relevant ones
- Have to use native ranking methods
- Use pairwise ranking objectives instead of pointwise objectives
- Use hard negative examples, where docs are superficially relevant but don’t really answer query
- By training on these difficult examples, the model can learn a finer-grained notion of ranking
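One simple way to obtain hard negatives, sketched below under assumptions, is to take the documents that a first-stage retriever scores highly but that are not labeled relevant. The function name `mine_hard_negatives`, the doc ids, and the scores are all made up for illustration; the notes do not prescribe a specific mining procedure.

```python
from typing import Dict, List, Set


def mine_hard_negatives(
    first_stage_scores: Dict[str, float],
    relevant_ids: Set[str],
    n: int,
) -> List[str]:
    """Return the n highest-scoring docs that are not labeled relevant."""
    negatives = [
        (doc_id, score)
        for doc_id, score in first_stage_scores.items()
        if doc_id not in relevant_ids
    ]
    # Highest first-stage score first: these look relevant on the surface.
    negatives.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in negatives[:n]]


# Hypothetical first-stage scores for one query.
scores = {"d1": 9.1, "d2": 8.7, "d3": 8.5, "d4": 0.3, "d5": 0.1}
hard = mine_hard_negatives(scores, relevant_ids={"d1"}, n=2)
```

Here `d2` and `d3` are selected, while `d4` and `d5` are the kind of "random" irrelevant docs a classifier already separates easily. One caveat: a high-scoring unlabeled doc may actually be relevant, so mined negatives can be noisy.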
Converting to Ranking

- To train a model on ranking, we compare two documents $d_i, d_j$ for the same query
- Replace the absolute score $s_i$ in the objective with the score difference $s_i-s_j$
- This is more effective than directly assigning a ranking to a document
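A standard way to train on the score difference is a logistic loss over the pair, which is the form RankNet uses. As a sketch, write $y_{ij} = 1$ if $d_i$ should rank above $d_j$ and $y_{ij} = 0$ otherwise (the symbol $y_{ij}$ is notation introduced here for the pair label):

$$
P_{ij} = \sigma(s_i - s_j) = \frac{1}{1 + e^{-(s_i - s_j)}}
$$

$$
L_{ij} = -y_{ij}\log P_{ij} - (1 - y_{ij})\log(1 - P_{ij})
$$

$$
\frac{\partial L_{ij}}{\partial s_i} = \sigma(s_i - s_j) - y_{ij} = -\frac{\partial L_{ij}}{\partial s_j}
$$

The last line follows from $\partial P_{ij}/\partial s_i = P_{ij}(1 - P_{ij})$. The gradient pushes the two scores in opposite directions by the same amount, and it is large only when the model's predicted ordering disagrees with the label.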
The gradient scaling factor only depends on the binary ordering of $d_i,d_j$
RankNet
- $\lambda_{ij} = \sigma(s_i-s_j)-y_{ij}$ is the gradient scaling factor, where $y_{ij}=1$ if $d_i$ is labeled more relevant than $d_j$ and $0$ otherwise
- For a given $(q,d_i)$ pair, summing $\lambda_{ij}$ over the other labeled docs $d_j$ for $q$ gives $\lambda_i$
- This determines how much the score should move for this query-document pair
- Apart from the current score difference, $\lambda_{ij}$ only depends on the binary ordering of $d_i,d_j$
- It does not depend on any rank difference or absolute rank
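The accumulation of $\lambda_{ij}$ into $\lambda_i$ can be sketched as below. This is a minimal illustration under assumptions: `ranknet_lambdas` is a hypothetical helper, only pairs with different labels are used (ties are skipped), and the pair label is taken to be 1 whenever $d_i$ is labeled more relevant than $d_j$.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ranknet_lambdas(scores: List[float], labels: List[int]) -> List[float]:
    """Per-document gradient of the summed pairwise logistic loss for one query."""
    n = len(scores)
    lambdas = [0.0] * n
    for i in range(n):
        for j in range(n):
            # Only ordered pairs where d_i is labeled more relevant than d_j.
            if labels[i] > labels[j]:
                # lambda_ij = sigma(s_i - s_j) - y_ij, with y_ij = 1 here.
                lam = sigmoid(scores[i] - scores[j]) - 1.0
                lambdas[i] += lam  # negative: gradient descent raises s_i
                lambdas[j] -= lam  # positive: gradient descent lowers s_j
    return lambdas


# Three docs for one query, all currently scored 0.0, labels 2 > 1 > 0.
lambdas = ranknet_lambdas([0.0, 0.0, 0.0], [2, 1, 0])
```

With equal scores every $\lambda_{ij}$ is $-0.5$, so the top-labeled doc accumulates $-1.0$, the middle doc's two pairs cancel to $0.0$, and the bottom doc accumulates $+1.0$. Changing the labels from `[2, 1, 0]` to `[9, 5, 0]` gives the same result, since only the ordering of the labels enters the computation.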
<aside>
📌 SUMMARY:
To adapt neural networks for ranking, recast ranking as a pairwise classification problem: given two documents, predict which one is more relevant.
Early networks like RankNet compare only 2 documents at a time to determine a binary ordering.
</aside>
Date: October 12, 2025
Topic: Embedding Based Retrieval