Date: October 12, 2025

Topic: Neural Information Retrieval

Recall

With ML, we can cast IR tasks as classification or regression problems, where we take the features and use them to predict a label.

Notes

Neural Information Retrieval

Use machine learning for information retrieval, since we now have more to learn from:
More features: Recency, domain, PageRank, etc.
More data: Click logs, large supervised datasets, etc.
Powerful algorithms: GBDTs, neural nets, etc.

Information Retrieval Tasks

For each $(q,d)$ pair, consider a function $f(q,d) \rightarrow s$, where $s$ is a relevance label
$s \in \{0,1\}$: Binary relevance (binary classification)
$s \in \{0,1,\dots,k\}$: Multi-level relevance (multi-class classification)
$s \in \mathbb{R}$: Relevance score (regression)
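
A minimal sketch of how each label type above could pair with a pointwise loss. The loss choices (cross-entropy for the two classification cases, squared error for regression) are conventional assumptions, not something these notes specify, and the function names are made up for illustration.

```python
import math
from typing import List


def binary_cross_entropy(logit: float, label: int) -> float:
    """Loss for s in {0, 1}: the model outputs one logit per (q, d) pair."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def multiclass_cross_entropy(logits: List[float], label: int) -> float:
    """Loss for s in {0, ..., k}: one logit per relevance level."""
    m = max(logits)  # subtract the max before exponentiating for stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]


def squared_error(prediction: float, target: float) -> float:
    """Loss for real-valued s: plain regression on the relevance score."""
    return (prediction - target) ** 2
```

All three losses look at one $(q,d)$ pair at a time, which is what makes them pointwise.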

Usually use fast heuristics (classical retrieval) first, then run slower ML algorithms only on the top-$k$ candidates.

Re-ranking Pipeline

General ML formulation requires running inference for every $(q,d)$ pair, which is too slow over a full collection
Shortlist candidates first
- For each $q$, first run fast classical retrieval, then run the ML model on the top-$k$ candidates (typically $10 < k < 10000$)
Can also have multiple re-ranker stages with increasing complexity
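
The shortlist-then-re-rank idea can be sketched as below. This is a toy under stated assumptions: term overlap stands in for classical retrieval, and `toy_scorer` stands in for an expensive ML model. All names and the tiny corpus are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def first_stage(query: str, corpus: Dict[str, str], k: int) -> List[str]:
    """Cheap shortlist: rank docs by how many distinct query terms they contain."""
    q_terms = set(query.lower().split())
    overlap = {
        doc_id: len(q_terms & set(text.lower().split()))
        for doc_id, text in corpus.items()
    }
    # Highest overlap first; ties broken by doc id so the order is deterministic.
    ranked = sorted(overlap, key=lambda d: (-overlap[d], d))
    return ranked[:k]


def rerank(
    query: str,
    candidates: List[str],
    corpus: Dict[str, str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Run the slow scorer on the shortlist only, then sort by its score."""
    scored = [(d, score_fn(query, corpus[d])) for d in candidates]
    return sorted(scored, key=lambda pair: -pair[1])


def toy_scorer(query: str, doc: str) -> float:
    """Stand-in for a learned model: fraction of doc tokens that are query terms."""
    q_terms = set(query.lower().split())
    tokens = doc.lower().split()
    return sum(t in q_terms for t in tokens) / len(tokens)


corpus = {
    "d1": "neural ranking models for web search",
    "d2": "cooking pasta at home",
    "d3": "ranking ranking ranking",
    "d4": "search engines use ranking models and neural networks for web search results",
}
shortlist = first_stage("neural ranking", corpus, k=3)
result = rerank("neural ranking", shortlist, corpus, toy_scorer)
```

Only the 3 shortlisted documents reach `toy_scorer`, so `d2` is never scored. A multi-stage pipeline would chain further `rerank` calls with smaller shortlists and costlier scorers.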

To convert to rankings, we need to model relative relevance between documents (is document A more relevant than document B?)

From Classification to Ranking

Classification only defines an absolute notion of relevance
Ranking: Consider not only whether a document is “relevant” but whether it is more or less relevant than competing documents
Classification objective is good for discriminating “random” irrelevant docs, but not superficially relevant ones
Have to use native ranking methods
- Use pairwise ranking objectives instead of pointwise objectives
- Use hard negative examples, where docs are superficially relevant but don’t really answer the query
- By using these difficult examples, the model can learn a finer-grained notion of ranking
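
One way to obtain hard negatives, assumed here rather than taken from the notes, is to reuse the first-stage ranking: documents it ranks highly but that are not labeled relevant are superficially similar to the query. The sketch below contrasts them with random negatives. `mine_negatives` and its parameters are hypothetical names.

```python
import random
from typing import List, Set, Tuple


def mine_negatives(
    ranked_candidates: List[str],
    relevant: Set[str],
    corpus_ids: List[str],
    n_hard: int,
    n_random: int,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Return (hard_negatives, random_negatives) for one query."""
    # Hard negatives: top-ranked first-stage candidates not labeled relevant.
    hard = [d for d in ranked_candidates if d not in relevant][:n_hard]
    # Random negatives: sampled from everything else in the corpus.
    excluded = relevant | set(hard)
    pool = sorted(d for d in corpus_ids if d not in excluded)
    rng = random.Random(seed)
    rand = rng.sample(pool, min(n_random, len(pool)))
    return hard, rand


hard, rand = mine_negatives(
    ranked_candidates=["d3", "d1", "d4"],
    relevant={"d1"},
    corpus_ids=["d1", "d2", "d3", "d4", "d5", "d6"],
    n_hard=2,
    n_random=2,
)
```

Unlabeled top-ranked documents may in fact be relevant, so negatives mined this way can include false negatives.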

Converting to Ranking

To train a model on ranking, we compare 2 documents $d_i, d_j$ for the same query
- Replace the single-document score in the objective with the score difference $s_i-s_j$
This is more effective than directly assigning a ranking to a document

The gradient scaling factor only depends on the binary order of $d_i$ and $d_j$
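
A minimal sketch of a pairwise objective on the score difference, assuming a logistic (cross-entropy) loss on $\sigma(s_i-s_j)$. The target name `y_ij` is notation introduced here: 1 if $d_i$ should rank above $d_j$, else 0.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pairwise_loss(s_i: float, s_j: float, y_ij: int) -> float:
    """Cross-entropy between the target order y_ij and P(d_i above d_j)."""
    p = sigmoid(s_i - s_j)  # depends on the scores only through s_i - s_j
    return -(y_ij * math.log(p) + (1 - y_ij) * math.log(1.0 - p))


def pairwise_grad(s_i: float, s_j: float, y_ij: int) -> float:
    """Derivative of pairwise_loss with respect to s_i (negate it for s_j)."""
    return sigmoid(s_i - s_j) - y_ij
```

Shifting both scores by the same constant leaves the loss unchanged, and the gradient uses the label only through the binary target `y_ij`.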

RankNet

$\lambda_{ij} = \sigma(s_i-s_j)-y_{ij}$ is the gradient scaling factor, where $y_{ij} \in \{0,1\}$ is 1 if $d_i$ should rank above $d_j$
For a given $(q,d_i)$ pair, summing $\lambda_{ij}$ over the other labeled docs $d_j$ for $q$, we get $\lambda_i$
- This determines how much the score should move for this query-document pair
$\lambda_{ij}$ only depends on binary ordering of $d_i,d_j$
- Does not depend on any rank difference or absolute rank
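
A sketch of the aggregation into $\lambda_i$ for one query, using the pairwise factor $\sigma(s_i-s_j)-y_{ij}$ with $y_{ij}=1$ when $d_i$ is labeled more relevant than $d_j$. Skipping equally labeled pairs is a simplifying assumption made here, and `ranknet_lambdas` is a hypothetical name.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ranknet_lambdas(scores: List[float], labels: List[int]) -> List[float]:
    """Sum the pairwise factors lambda_ij into one lambda_i per document."""
    n = len(scores)
    lambdas = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                continue  # same doc or tied labels: no preference to learn
            # Only the binary order of the two labels enters the target.
            y_ij = 1.0 if labels[i] > labels[j] else 0.0
            lambdas[i] += sigmoid(scores[i] - scores[j]) - y_ij
    return lambdas


# The most relevant doc (label 2) currently has the lowest score.
lam = ranknet_lambdas(scores=[0.0, 1.0, 2.0], labels=[2, 1, 0])
```

With a gradient-descent update $s_i \leftarrow s_i - \eta\lambda_i$, a negative $\lambda_i$ pushes the score up and a positive one pushes it down. Replacing the labels `[2, 1, 0]` with `[10, 5, 0]` gives the same result, since only the pairwise order is used.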

<aside> 📌 SUMMARY: To adapt neural networks for ranking, cast ranking as a classification problem over document pairs. Early networks like RankNet compare only 2 documents at a time to determine a binary ordering.

</aside>

Date: October 12, 2025

Topic: Embedding Based Retrieval