## Consider the following retrieval formula, where c(w, D) is the count of word w in document D, dl is the document length, avdl is the average document length of the collection, N is the total number of documents in the collection,

## Suppose we compute the term vector for a baseball sports news article in a collection of general news articles using TF-IDF weighting. Which of the following words do you expect to have the highest weight in this case?
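The sketch below shows why a term that is frequent in one article but rare across general news outranks a term that appears everywhere. It assumes raw term count times IDF = log((N + 1) / df) and uses a tiny collection invented for illustration. The function name `tfidf_vector` is hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection of "general news" documents (illustration only).
collection = [
    "the election results were announced on the evening news",
    "the stock market fell on the news of rising rates",
    "the baseball team won the baseball game in the ninth inning",
    "the storm caused flooding along the coast",
]

def tfidf_vector(doc: str, docs: list[str]) -> dict[str, float]:
    """Weight each term of `doc` by raw count * log((N + 1) / df) (assumed IDF form)."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d.split()))  # document frequency: count each doc once
    counts = Counter(doc.split())
    return {w: c * math.log((n + 1) / df[w]) for w, c in counts.items()}

weights = tfidf_vector(collection[2], collection)
for word, weight in sorted(weights.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{word:10s} {weight:.3f}")
```

In this example, "the" occurs three times in the article. Because it appears in every document, its IDF is small and its weight ends up below that of "baseball".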

## Which of the following integer compression methods uses equal-length coding?
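Variable-length schemes such as unary or gamma coding assign different code lengths to different values. For contrast, here is a sketch of fixed-width binary coding, where every integer gets the same number of bits. The sketch assumes the shared width is the bit length of the largest value. Both function names are hypothetical.

```python
def fixed_width_encode(values: list[int]) -> tuple[int, str]:
    """Encode non-negative integers using one shared bit width."""
    width = max(max(values).bit_length(), 1)  # bits needed for the largest value
    return width, "".join(format(v, f"0{width}b") for v in values)

def fixed_width_decode(width: int, bits: str) -> list[int]:
    """Split the bit string into equal-length chunks and parse each one."""
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]

width, bits = fixed_width_encode([3, 1, 10, 7])
print(width, bits)  # 4 0011000110100111
```

Decoding needs no delimiters because every code has the same length. The cost is that small values waste bits whenever the list contains a large value.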

## When using an inverted index for scoring documents for queries, a shorter query always uses fewer score accumulators than a longer query.
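The sketch below uses term-at-a-time scoring on a hypothetical in-memory index (term → {doc ID: count}) with a simple count-sum score. It shows that the number of accumulators equals the number of distinct documents across the query terms' postings lists, not the number of query terms.

```python
from collections import defaultdict

# Hypothetical inverted index: term -> {doc_id: term count}.
index = {
    "the": {1: 3, 2: 2, 3: 1, 4: 4, 5: 1},
    "zipf": {2: 1},
    "retrieval": {2: 2, 4: 1},
}

def score_term_at_a_time(query: list[str]) -> dict[int, int]:
    """Sum term counts per document, creating one accumulator per matched doc."""
    accumulators: dict[int, int] = defaultdict(int)
    for term in query:
        for doc_id, count in index.get(term, {}).items():
            accumulators[doc_id] += count
    return dict(accumulators)

short = score_term_at_a_time(["the"])
longer = score_term_at_a_time(["zipf", "retrieval"])
print(len(short), len(longer))  # 5 2
```

In this example, the one-word query touches five documents, while the two-word query touches only two.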

## The gamma code for the term frequency of a certain document is 1110010. What is the term frequency of the document?
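This sketch decodes gamma codes under the textbook convention:

- The prefix is the unary code of 1 + ⌊log2 x⌋, written as ⌊log2 x⌋ one-bits followed by a 0.
- The suffix is x − 2^⌊log2 x⌋, written in ⌊log2 x⌋ bits.

Some references write the unary prefix with zeros instead, so check which convention a given question assumes. The function name is hypothetical.

```python
def gamma_decode(bits: str) -> list[int]:
    """Decode a concatenation of gamma codes into positive integers.

    Each code is k one-bits, a terminating 0, then k offset bits;
    the decoded value is 2**k + offset.
    """
    values = []
    i = 0
    while i < len(bits):
        k = 0
        while bits[i] == "1":  # count the unary prefix
            k += 1
            i += 1
        i += 1  # skip the terminating 0
        offset = int(bits[i:i + k], 2) if k else 0
        i += k
        values.append((1 << k) + offset)
    return values

print(gamma_decode("1110010"))  # [10]
```

For `1110010`, the prefix `1110` gives k = 3. The next three bits `010` give an offset of 2, so the value is 2^3 + 2 = 10.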

## Assume we have the same scenario as in Question 1. If we enter the query Q = “w1 w2”, what is the minimum possible number of accumulators needed to score all the matching documents?

## Which of the following are weighting heuristics for the vector space model?
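The sketch below shows one vector space scoring function that combines three common heuristics: a sublinear TF transformation, IDF weighting, and document length normalization. It assumes the pivoted length normalization form, with collection statistics passed in as arguments. The function name and the default b = 0.2 are illustrative choices, not fixed values.

```python
import math
from collections import Counter

def pivoted_score(query: list[str], doc: list[str], df: dict[str, int],
                  num_docs: int, avdl: float, b: float = 0.2) -> float:
    """Pivoted length normalization VSM score (b = 0.2 is an assumed default)."""
    q_counts, d_counts = Counter(query), Counter(doc)
    norm = 1 - b + b * len(doc) / avdl  # document length normalization
    score = 0.0
    for w, qc in q_counts.items():
        c = d_counts[w]
        if c == 0 or w not in df:
            continue
        tf = math.log(1 + math.log(1 + c))      # sublinear TF transformation
        idf = math.log((num_docs + 1) / df[w])  # IDF weighting
        score += qc * tf / norm * idf
    return score

df = {"baseball": 2}
short_doc = ["baseball"] + ["x"] * 9   # length 10
long_doc = ["baseball"] + ["x"] * 99   # length 100
s = pivoted_score(["baseball"], short_doc, df, num_docs=10, avdl=50.0)
l = pivoted_score(["baseball"], long_doc, df, num_docs=10, avdl=50.0)
print(s > l)  # True
```

With the same term count, the longer document is penalized by the length normalizer.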

## In BM25, what is the upper bound of the TF after transformation?
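Here is a quick numeric check of the BM25 TF transformation. It assumes the common form (k + 1)·x / (x + k) with k ≥ 0 and omits document length normalization. As the raw count x grows, the transformed value rises quickly at first and then flattens out. The value k = 1.2 is an assumed, commonly used setting.

```python
def bm25_tf(x: float, k: float = 1.2) -> float:
    """BM25-style TF transformation (k + 1) * x / (x + k), without length normalization."""
    return (k + 1) * x / (x + k)

k = 1.2  # assumed parameter value
for x in [1, 2, 5, 10, 100, 10_000]:
    print(f"x={x:>6}  tf={bm25_tf(x, k):.4f}  bound={k + 1}")
```

Adding length normalization replaces k in the denominator with k·(1 − b + b·dl/avdl). This changes how fast the curve saturates but not the value it approaches.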

## If Zipf’s law does not hold, will an inverted index be much faster or slower?
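To see what Zipf's law implies for term statistics, here is a sketch that assumes an idealized Zipf distribution: the r-th most frequent term occurs in proportion to 1/r. The vocabulary of 10,000 terms is hypothetical. Under this assumption, a few top-ranked terms account for a large share of all occurrences, while most terms are rare. The function name is hypothetical.

```python
def zipf_shares(vocab_size: int, s: float = 1.0) -> list[float]:
    """Share of all term occurrences taken by each rank, assuming freq ∝ 1 / r**s."""
    weights = [1 / r ** s for r in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

shares = zipf_shares(10_000)
top_100 = sum(shares[:100])
print(f"top 1% of terms: {top_100:.1%} of occurrences")
print(f"median-rank term share: {shares[len(shares) // 2]:.6%}")
```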

## What can’t an inverted index alone do for fast search?
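For reference, here is a sketch of the basic structure. It assumes a non-positional index, where each term maps to a doc-ID-sorted list of (doc ID, term frequency) pairs, built from a hypothetical three-document collection. A Boolean AND is answered by intersecting postings, using Python sets for brevity rather than merging sorted lists. This variant stores no word positions. Both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy collection: doc ID -> text.
docs = {
    1: "news about the baseball game",
    2: "stock market news",
    3: "baseball news and market news",
}

def build_index(collection: dict[int, str]) -> dict[str, list[tuple[int, int]]]:
    """Map each term to a doc-ID-sorted postings list of (doc_id, tf) pairs."""
    counts: dict[str, dict[int, int]] = defaultdict(dict)
    for doc_id, text in sorted(collection.items()):
        for term in text.split():
            counts[term][doc_id] = counts[term].get(doc_id, 0) + 1
    return {t: sorted(p.items()) for t, p in counts.items()}

def boolean_and(index: dict[str, list[tuple[int, int]]], terms: list[str]) -> list[int]:
    """Return doc IDs containing every term (set intersection of postings)."""
    postings = [{d for d, _ in index.get(t, [])} for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

index = build_index(docs)
print(index["news"])                             # [(1, 1), (2, 1), (3, 2)]
print(boolean_and(index, ["baseball", "news"]))  # [1, 3]
```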

