Date: October 11, 2025

Topic: Classical Information Retrieval

Recall

Information retrieval is the task of obtaining relevant documents for an information need from text sources.

We express the need as a query and present it to a retrieval engine, which returns the matching documents.

Notes

Classical Information Retrieval

Retrieval Paradigms

Task



A document is relevant if it satisfies the user’s original information need. However, relevance can be ambiguous, because we usually see only the query and not the need behind it.

Precision is the fraction of retrieved documents that are relevant. Recall is the fraction of all relevant documents in the entire dataset that appear in the retrieved set.
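A minimal sketch of the two definitions above. The function name `precision_recall` and the document IDs are made up for illustration; returning 0.0 when a set is empty is an assumed convention, not something the notes specify.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved doc IDs and relevant doc IDs."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    # Relevant documents that were actually retrieved.
    hits = len(retrieved_set & relevant_set)
    # Assumed convention: report 0.0 instead of dividing by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical data: 5 retrieved documents, 3 of them relevant,
# out of 4 relevant documents in the whole collection.
retrieved = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d5", "d9"}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.6 0.75
```

Here precision is 3/5 because three of the five retrieved documents are relevant, and recall is 3/4 because one relevant document (`d9`) was never retrieved.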

Boolean Retrieval

Precision and Recall

Result set contains 5 documents (read from left)



We retrieve texts using a term-document incidence matrix, where each column is a document and each row is a term. An entry is 1 if the term occurs in the document and 0 otherwise.

A Boolean query then reduces to bitwise operations (AND, OR, NOT) on the rows of the query terms.
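A rough sketch of the idea, assuming whitespace tokenization and using one Python integer per term as that term's row (bit i is set when the term occurs in document i). The helper names `build_incidence` and `matching_docs` and the three toy documents are invented for illustration.

```python
def build_incidence(docs):
    """Map each term to an int whose bit i is set if the term occurs in docs[i]."""
    rows = {}
    for i, text in enumerate(docs):
        for term in set(text.lower().split()):
            rows[term] = rows.get(term, 0) | (1 << i)
    return rows


def matching_docs(bits, n_docs):
    """Decode a bit vector into a sorted list of document indices."""
    return [i for i in range(n_docs) if (bits >> i) & 1]


# Toy collection (hypothetical).
docs = [
    "red hot chili peppers",
    "hot chili recipe",
    "red peppers and green peppers",
]
rows = build_incidence(docs)
all_docs = (1 << len(docs)) - 1  # mask so that NOT stays within the collection

# Query: hot AND chili AND NOT red
bits = rows.get("hot", 0) & rows.get("chili", 0) & (all_docs & ~rows.get("red", 0))
print(matching_docs(bits, len(docs)))  # [1]
```

The mask `all_docs` is needed because `~` on a Python integer flips infinitely many high bits. Without it, NOT would appear to match documents that do not exist.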

Implementing Text Retrieval

Term-Document Incidence Matrix


Problems





<aside> 📌 SUMMARY: Boolean retrieval checks whether each document contains the required terms, as modeled by the Term-Document Incidence Matrix. In practice this is implemented with an Inverted Index, where each document is treated as a bag of words and each term maps to a sorted list of the documents that contain it. For phrase queries like “Red Hot Chili Peppers”, we can use a Positional Index, which also stores every position at which a term appears in each document, enabling both phrase queries and proximity searches.

</aside>
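The index structures from the summary can be sketched as follows. This is a simplified illustration, assuming whitespace tokenization: `build_positional_index`, `and_query`, and `phrase_query` are hypothetical helper names, and the AND step uses Python set intersection rather than a merge over sorted postings lists.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> {doc_id: [positions]}, positions in increasing order."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def and_query(index, terms):
    """Doc IDs that contain every term (intersection of the postings)."""
    postings = [set(index.get(t, {})) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


def phrase_query(index, phrase):
    """Doc IDs where the terms of `phrase` occur at consecutive positions."""
    terms = phrase.lower().split()
    result = []
    for doc_id in and_query(index, terms):
        starts = index[terms[0]][doc_id]
        # The k-th term of the phrase must sit k positions after the first.
        if any(
            all(p + k in index[t][doc_id] for k, t in enumerate(terms))
            for p in starts
        ):
            result.append(doc_id)
    return result


# Toy collection (hypothetical).
docs = [
    "red hot chili peppers live",
    "chili peppers are hot and red",
    "hot red chili",
]
index = build_positional_index(docs)
print(and_query(index, ["red", "hot", "chili"]))     # [0, 1, 2]
print(phrase_query(index, "red hot chili peppers"))  # [0]
```

Document 1 contains all four words of the phrase, but not at consecutive positions, so only the positional information rules it out. A proximity search could reuse the same position lists with a looser distance check.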


Date: October 12, 2025

Topic: Ranked Retrieval