Online Course Support | Text Retrieval and Search Engines

Let w1, w2, and w3 represent three words in the dictionary of an inverted index. Suppose we have the following document frequency distribution:

1. Question 1 Let w1, w2, and w3 represent three words in the dictionary of an inverted index. Suppose we have the following document frequency distribution: Word Document Frequency w1…

Consider the following retrieval formula: Where c(w, D) is the count of word w in document D, dl is the document length, avdl is the average document length of the collection, N is the total number of documents in the collection,

9. Question 9 Consider the following retrieval formula: Where c(w, D) is the count of word w in document D, dl is the document length, avdl is the average document…

Assume the same scenario as in Question 7, but with TF-IDF weighting. Which of the following words do you expect to have the highest weight in this case?

8. Question 8 Assume the same scenario as in Question 7, but with TF-IDF weighting. Which of the following words do you expect to have the highest weight in this…

Consider the instantiation of the vector space model where documents and queries are represented as bit vectors. Assume we have the following query and two documents:

3. Question 3 Consider the instantiation of the vector space model where documents and queries are represented as bit vectors. Assume we have the following query and two documents: Q…
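As a rough illustration of the bit-vector instantiation described in this question, the sketch below builds 0-1 vectors over a small vocabulary and scores documents by dot product. The vocabulary, query, and documents are hypothetical stand-ins, not the ones from the quiz, and the helper names `bit_vector` and `dot` are made up for this example.

```python
def bit_vector(text, vocabulary):
    """Return a 0-1 vector marking which vocabulary words occur in text."""
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in vocabulary]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical vocabulary, query, and documents (NOT the quiz's data).
vocabulary = ["news", "about", "presidential", "campaign", "food"]
query = bit_vector("news about presidential campaign", vocabulary)
doc_a = bit_vector("news about food", vocabulary)
doc_b = bit_vector("news about presidential campaign campaign", vocabulary)

# With bit vectors, the dot product counts the distinct query words
# that appear in the document; repeated words add nothing.
score_a = dot(query, doc_a)
score_b = dot(query, doc_b)
print(score_a, score_b)
```

Under these assumptions the repeated word "campaign" in the second document does not raise its score, because a bit vector records only presence or absence.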

Suppose we compute the term vector for a baseball sports news article in a collection of general news articles using TF weighting only. Which of the following do you expect to have the highest weight?

7. Question 7 Suppose we compute the term vector for a baseball sports news article in a collection of general news articles using TF weighting only. Which of the following…
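To make the contrast between TF-only and TF-IDF weighting concrete, here is a minimal sketch on a tiny made-up collection. The documents are hypothetical, and the IDF form `log((N + 1) / df)` is only one common variant, assumed here for illustration; the quiz may intend a different formula.

```python
import math
from collections import Counter

# Hypothetical mini-collection: one baseball article among general news.
collection = [
    "the pitcher threw the baseball and the batter hit the baseball",
    "the senate passed the budget",
    "the market fell as the bank raised rates",
    "the storm closed the airport",
]


def tf_weights(doc):
    """Raw term counts for one document."""
    return Counter(doc.split())


def idf(term, docs):
    """Assumed IDF variant: log((N + 1) / df)."""
    df = sum(1 for d in docs if term in d.split())
    return math.log((len(docs) + 1) / df)


def tfidf_weights(doc, docs):
    """TF multiplied by IDF for every term in the document."""
    return {t: c * idf(t, docs) for t, c in tf_weights(doc).items()}


article = collection[0]
tf = tf_weights(article)
tfidf = tfidf_weights(article, collection)

top_tf = max(tf, key=tf.get)           # most frequent term in the article
top_tfidf = max(tfidf, key=tfidf.get)  # term with the highest TF-IDF weight
print(top_tf, top_tfidf)
```

In this toy data the most frequent word is a function word that occurs in every document, so its IDF is small; the topical word occurs in only one document and so receives the largest TF-IDF weight.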

In the “simplest” VSM instantiation, suppose we use word counts instead of 0-1 bit vectors. If we concatenate each document with itself, will the ranking list remain the same?

5. Question 5 In the “simplest” VSM instantiation, suppose we use word counts instead of 0-1 bit vectors. If we concatenate each document with itself, will…
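One way to explore this question is to run a small experiment. The sketch below uses raw word counts with dot-product similarity, which is an assumption about what the "simplest" instantiation means here; the vocabulary, query, and documents are hypothetical, and `count_vector`, `dot`, and `rank` are illustrative helper names.

```python
from collections import Counter


def count_vector(text, vocabulary):
    """Return raw word counts for each vocabulary term."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank(query, docs, vocabulary):
    """Score every document against the query and sort by descending score."""
    q = count_vector(query, vocabulary)
    scores = {name: dot(q, count_vector(text, vocabulary))
              for name, text in docs.items()}
    # Ties are broken by document name so the ordering is deterministic.
    ranking = sorted(scores, key=lambda name: (-scores[name], name))
    return ranking, scores


# Hypothetical data (NOT the quiz's).
vocabulary = ["news", "about", "presidential", "campaign", "food"]
query = "presidential campaign news"
docs = {
    "d1": "news about food",
    "d2": "news about presidential campaign campaign",
    "d3": "food campaign",
}
# Concatenate each document with itself.
doubled = {name: text + " " + text for name, text in docs.items()}

ranking, scores = rank(query, docs, vocabulary)
ranking_doubled, scores_doubled = rank(query, doubled, vocabulary)
print(ranking, scores)
print(ranking_doubled, scores_doubled)
```

With raw counts and a dot product, doubling a document doubles every component of its vector and therefore its score, so the relative order in this experiment is unchanged. A different similarity measure or a transformed term frequency might behave differently.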

Consider the same scenario as in Question 3, with dot product as the similarity measure. Which of the following is true?

4. Question 4 Consider the same scenario as in Question 3, with dot product as the similarity measure. Which of the following is true? Sim(Q,D1) = 4 …