## What is the advantage of tokenization (normalization and stemming) before indexing?

Question 5: What is the advantage of tokenization (normalization and stemming) before indexing? (1 point) Improves performance by mapping words with similar meanings into the same indexing term…
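The option quoted above refers to mapping word variants onto one indexing term. A minimal sketch of that idea follows; the suffix list and the `normalize` and `index_terms` helpers are hypothetical and deliberately crude, not a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately crude suffix list for illustration only.
SUFFIXES = ("ing", "es", "ed", "s")


def normalize(token: str) -> str:
    """Lowercase, drop punctuation, then strip one common suffix."""
    token = re.sub(r"[^a-z0-9]", "", token.lower())
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not mangled.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(text: str) -> list[str]:
    """Split on whitespace and normalize each token, skipping empty results."""
    return [t for t in (normalize(tok) for tok in text.split()) if t]


terms = index_terms("Indexing indexed INDEXES, index.")
print(terms)
```

All four surface forms collapse to the same term, so a query containing any one of them can match documents containing the others.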

## Let w1, w2, and w3 represent three words in the dictionary of an inverted index. Suppose we have the following document frequency distribution:

Question 1: Let w1, w2, and w3 represent three words in the dictionary of an inverted index. Suppose we have the following document frequency distribution: Word Document Frequency w1…

## Which of the following is false?

Question 2: Which of the following is false? (1 point) Search engines rely on the text push mode. Recommender systems are based on the text…

## Consider the following retrieval formula: Where c(w, D) is the count of word w in document D, dl is the document length, avdl is the average document length of the collection, N is the total number of documents in the collection,

Question 9: Consider the following retrieval formula: Where c(w, D) is the count of word w in document D, dl is the document length, avdl is the average document…

## Assume the same scenario as in Question 7, but with TF-IDF weighting. Which of the following words do you expect to have the highest weight in this case?

Question 8: Assume the same scenario as in Question 7, but with TF-IDF weighting. Which of the following words do you expect to have the highest weight in this…

## Consider the instantiation of the vector space model where documents and queries are represented as bit vectors. Assume we have the following query and two documents:

Question 3: Consider the instantiation of the vector space model where documents and queries are represented as bit vectors. Assume we have the following query and two documents: Q…

## In the VSM, which of the following is a better way to measure similarity/distance?

Question 10: In the VSM, which of the following is a better way to measure similarity/distance? (1 point) Cosine similarity: cos(v_1, v_2); L2 distance:…
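The two measures named in this question behave differently when a vector is rescaled. The sketch below uses a hypothetical toy term vector to show one commonly cited difference: cosine similarity ignores vector length, while L2 distance does not.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical term-count vector and the same vector with every count doubled.
doc = [1, 2, 0]
doubled = [2 * x for x in doc]

print(cosine(doc, doubled))       # direction is unchanged
print(l2_distance(doc, doubled))  # grows with the difference in length
```

Here the doubled vector points in the same direction as the original, so the cosine is 1 up to floating-point rounding, while the L2 distance equals the length of the original vector.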

## Suppose we compute the term vector for a baseball sports news article in a collection of general news articles using TF weighting only. Which of the following do you expect to have the highest weight?

Question 7: Suppose we compute the term vector for a baseball sports news article in a collection of general news articles using TF weighting only. Which of the following…
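Questions 7 and 8 contrast TF-only weighting with TF-IDF weighting. The sketch below illustrates that contrast on made-up numbers: the words, counts, document frequencies, and collection size are all hypothetical, and `log((N + 1) / df)` is only one common IDF variant, assumed here for illustration.

```python
import math

# Hypothetical collection size and statistics for one baseball article.
N = 1000                                          # documents in the collection
tf = {"the": 30, "baseball": 8, "game": 5}        # counts within the article
df = {"the": 1000, "baseball": 20, "game": 200}   # documents containing each word


def idf(word: str) -> float:
    """One common IDF variant (an assumption, not the quiz's formula)."""
    return math.log((N + 1) / df[word])


tf_idf = {word: count * idf(word) for word, count in tf.items()}

top_tf = max(tf, key=tf.get)
top_tf_idf = max(tf_idf, key=tf_idf.get)
print("highest TF weight:", top_tf)
print("highest TF-IDF weight:", top_tf_idf)
```

With these numbers the very frequent word wins under TF alone, while the IDF factor pushes its weight close to zero and lets the topical word win under TF-IDF.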

## In the “simplest” VSM instantiation, if we use word counts instead of 0-1 bit vectors and concatenate each document with itself, will the ranking list remain the same?

Question 5: In the “simplest” VSM instantiation, if we use word counts instead of 0-1 bit vectors and concatenate each document with itself, will…
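One way to explore this question is to run the experiment on a toy collection. The sketch below assumes dot-product similarity over raw word counts; the documents, query, and helper names are hypothetical.

```python
from collections import Counter


def count_vector(text: str) -> Counter:
    """Represent text as a word-count vector (missing words count as 0)."""
    return Counter(text.lower().split())


def dot(q: Counter, d: Counter) -> int:
    """Dot product over the query's words."""
    return sum(q[w] * d[w] for w in q)


def rank(query: str, docs: dict) -> tuple:
    """Return document names sorted by descending score, plus the scores."""
    q = count_vector(query)
    scores = {name: dot(q, count_vector(text)) for name, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical toy collection and query.
docs = {
    "d1": "news about baseball baseball game",
    "d2": "politics and economy",
    "d3": "baseball news",
}
query = "baseball news"

# Concatenate each document with itself.
doubled_docs = {name: text + " " + text for name, text in docs.items()}

ranking_before, scores_before = rank(query, docs)
ranking_after, scores_after = rank(query, doubled_docs)
print(ranking_before, scores_before)
print(ranking_after, scores_after)
```

In this setup every count doubles, so every dot-product score doubles as well and the relative order of the documents is unchanged.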

## Consider the same scenario as in Question 3, with dot product as the similarity measure. Which of the following is true?

Question 4: Consider the same scenario as in Question 3, with dot product as the similarity measure. Which of the following is true? (1 point) Sim(Q,D1) = 4 …