Types of queries and indexing
Types of queries handled by a search engine:
a>Phrase queries
b>Wildcard queries
c>Boolean queries
a>Phrase queries: Queries of the form “IIM Ahmedabad”, where the word order needs to be maintained.
Here we can use two kinds of approaches:
1>Biword Index
2>Positional Index
1>Biword Index :
>Index every consecutive pair of terms in the text as a phrase.
>For example, “IIM Ahmedabad rocks” is indexed as the biwords:
>IIM Ahmedabad
>Ahmedabad rocks
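>Disadvantage: A query longer than two words is split into biwords that are ANDed together, so it can return false positives: documents that contain every biword but not the full phrase.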
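The biword approach and its false-positive problem can be sketched in a few lines of Python. This is a minimal illustration, not a production index: the helper names (`tokenize`, `build_biword_index`, `biword_phrase_query`), the lowercase whitespace tokenization, and the second sample document are assumptions made for the example.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercasing and splitting on whitespace is enough here.
    return text.lower().split()


def build_biword_index(docs):
    """Map every consecutive pair of terms to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for first, second in zip(tokens, tokens[1:]):
            index[(first, second)].add(doc_id)
    return index


def biword_phrase_query(index, phrase):
    """Return doc ids that contain every biword of the phrase.

    For phrases longer than two words this can include false positives.
    """
    tokens = tokenize(phrase)
    biwords = list(zip(tokens, tokens[1:]))
    if not biwords:
        return set()
    result = set(index.get(biwords[0], set()))
    for biword in biwords[1:]:
        result &= index.get(biword, set())
    return result


# Hypothetical sample documents.
docs = {
    1: "IIM Ahmedabad rocks",
    2: "Ahmedabad rocks and IIM Ahmedabad is old",
}
index = build_biword_index(docs)

print(sorted(biword_phrase_query(index, "IIM Ahmedabad")))        # [1, 2]
print(sorted(biword_phrase_query(index, "IIM Ahmedabad rocks")))  # [1, 2]
```

Document 2 is returned for the three-word query even though it never contains “IIM Ahmedabad rocks” as a phrase: it holds both biwords, but in different places. That is the false positive described above.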
2>Positional Index :
>In the postings, store for each term the position(s) at which its tokens appear in each document:
<term, number of docs containing term;
document1: position1, position2 … ;
document2: position1, position2 … ;
etc.>
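A positional index of this shape can be sketched as a nested dictionary. The code below is only an illustration under assumptions: the function names, the 0-based token positions, the whitespace tokenization, and the sample documents are all chosen for the example. A phrase matches a document when the query terms occur at consecutive positions.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Build term -> {doc_id: [positions]} with 0-based token positions."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index


def phrase_query(index, phrase):
    """Return doc ids where the phrase terms appear at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(term not in index for term in terms):
        return set()

    # Candidate documents must contain every term of the phrase.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])

    matches = set()
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            # Term i of the phrase must sit at position start + i.
            if all(start + i in index[term][doc_id]
                   for i, term in enumerate(terms)):
                matches.add(doc_id)
                break
    return matches


# Hypothetical sample documents.
docs = {
    1: "IIM Ahmedabad rocks",
    2: "Ahmedabad rocks and IIM Ahmedabad is old",
}
index = build_positional_index(docs)

print(index["ahmedabad"])                                   # {1: [1], 2: [0, 4]}
print(sorted(phrase_query(index, "IIM Ahmedabad rocks")))   # [1]
```

Because positions are checked, document 2 is no longer returned for the three-word phrase. The cost is a larger index, since every occurrence of every term is stored.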
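b>Wildcard queries:
>A wildcard query uses * to stand for any sequence of characters (including none) and matches every term that fits the pattern.
>The appropriate index depends on where the * appears.
>Trailing wildcard query: matches all terms that begin with a given prefix. A lexicon kept in sorted order (for example, a B-tree) can answer it with a range lookup.
Ex: a*
>Leading wildcard query: matches all terms that end with a given suffix. A sorted lexicon of the reversed terms (a reverse B-tree) can answer it.
Ex: *a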
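Both kinds of wildcard query can be answered with prefix lookups on sorted term lists. The sketch below uses Python's `bisect` module on plain sorted lists as a stand-in for a tree-based lexicon; the function names and the sample vocabulary are assumptions made for illustration. A leading wildcard is handled by reversing every term, so that a suffix becomes a prefix.

```python
import bisect


def build_lexicons(terms):
    """Return (sorted terms, sorted reversed terms)."""
    unique = set(terms)
    forward = sorted(unique)
    backward = sorted(term[::-1] for term in unique)
    return forward, backward


def prefix_range(sorted_terms, prefix):
    """Return all entries of a sorted list that start with prefix."""
    lo = bisect.bisect_left(sorted_terms, prefix)
    hi = lo
    # Entries sharing a prefix are contiguous in sorted order.
    while hi < len(sorted_terms) and sorted_terms[hi].startswith(prefix):
        hi += 1
    return sorted_terms[lo:hi]


def trailing_wildcard(forward, query):
    """Answer a query such as 'a*' from the forward lexicon."""
    return prefix_range(forward, query.rstrip("*"))


def leading_wildcard(backward, query):
    """Answer a query such as '*a' from the reversed lexicon."""
    suffix = query.lstrip("*")
    hits = prefix_range(backward, suffix[::-1])
    return sorted(term[::-1] for term in hits)


# Hypothetical vocabulary.
vocabulary = ["ahmedabad", "apple", "banana", "data", "iim", "rocks"]
forward, backward = build_lexicons(vocabulary)

print(trailing_wildcard(forward, "a*"))   # ['ahmedabad', 'apple']
print(leading_wildcard(backward, "*a"))   # ['banana', 'data']
```

The matching terms would then be looked up in the ordinary inverted index to retrieve the documents.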
c>Boolean queries:
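These queries are usually binary: terms are combined with operators such as AND, OR, and NOT, and each document either matches the expression or does not. They are easy to identify, and search techniques for them are readily available in NLP libraries.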
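Without relying on any particular library, Boolean retrieval can be sketched with an inverted index whose postings are Python sets, so that AND, OR, and NOT become set intersection, union, and difference. The function names and sample documents below are assumptions for illustration only.

```python
def build_inverted_index(docs):
    """Map each term to the set of doc ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def postings(index, term):
    """Return the posting set for a term (empty if the term is unknown)."""
    return index.get(term.lower(), set())


def boolean_and(index, a, b):
    return postings(index, a) & postings(index, b)


def boolean_or(index, a, b):
    return postings(index, a) | postings(index, b)


def boolean_not(index, term, all_doc_ids):
    # NOT needs the full set of documents to take the complement.
    return set(all_doc_ids) - postings(index, term)


# Hypothetical sample documents.
docs = {
    1: "IIM Ahmedabad rocks",
    2: "Ahmedabad is a city",
    3: "IIM Bangalore rocks",
}
index = build_inverted_index(docs)

print(sorted(boolean_and(index, "IIM", "Ahmedabad")))   # [1]
print(sorted(boolean_or(index, "IIM", "Ahmedabad")))    # [1, 2, 3]
# IIM AND NOT Ahmedabad
print(sorted(postings(index, "IIM") & boolean_not(index, "Ahmedabad", docs)))  # [3]
```

Each document is either in the result set or not, which is the binary behaviour described above; there is no ranking of the matches.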