Applications of information retrieval (IR) can be found in almost every technology that records and stores data. The reason is simple: wherever a large collection of data is stored, IR is what returns the results for a query, from a physical book’s table of contents and back-of-book index (an inverted index in miniature) to a web search engine’s ranking functions. In this blog post we cover a few well-known IR use cases and players, and how IR relates to data science.
Very generally, an index sits at the heart of an IR implementation. Other differences aside, what makes each implementation unique is the set of algorithms it uses to build and manage that index.
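To make the idea of an index concrete, here is a minimal sketch of an inverted index with boolean AND search. The function names (`tokenize`, `build_inverted_index`, `search`) and the three toy documents are our own assumptions for illustration; real engines add ranking, stemming, compression and much more.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Toy collection, invented for this example.
docs = {
    1: "Mochi Gate is an ancient gate in Lahore",
    2: "A park near the old city of Lahore",
    3: "The oldest sweet shop near Mochi Gate",
}
index = build_inverted_index(docs)
print(sorted(search(index, "mochi gate")))   # [1, 3]
print(sorted(search(index, "lahore park")))  # [2]
```

The lookup cost depends on the number of query terms and the size of their posting sets rather than on the size of the whole collection, which is why an index of this kind sits at the centre of most IR systems.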
Search Engines
The posterior probability that our readers have used a search engine, given that they are on our blog, is very high. To give our readers the highest information gain, we first invite you to watch this very touching story (Google, 2013) and then list the IR techniques behind each search query in it:
“park with ancient gate in lahore” – Natural language processing is used to understand the intent of the query and augment the IR search process. This rather vague request contains only one named entity, “lahore”, and the search engine has to return associated entities that satisfy a condition: a list of parks with an ancient gate. It is worth noting that since the release of this video, “Mochi Gate” has become the number 1 hit for this query.
“what is jharjariya” – Semantic search query processing is used to understand the intent of the query and provide the girl in the video with a definition of the local sweet. The contextual meaning of the query is a request for a definition of the term jharjariya, so Google returns a definition from Wikipedia.
“oldest sweet shop near mochi gate lahore” – Temporal information retrieval helps Google reason about the age of its stored entities. Semantic search query processing first decomposes the query into entities and a temporal requirement (oldest). The resulting structured query is then sent along to be processed. As with Mochi Gate, Fazal’s Sweets is now the number 1 hit for this query.
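The decomposition step in the last query can be sketched as follows. This is a toy illustration, not how Google does it: the lexicon `TEMPORAL_WORDS`, the gazetteer `KNOWN_ENTITIES`, the helper names and the candidate shops with their founding years are all invented for the example.

```python
# Hypothetical lexicon: temporal superlative -> sort order on founding year.
TEMPORAL_WORDS = {"oldest": "ascending", "newest": "descending"}
# Hypothetical gazetteer of entities the system already knows about.
KNOWN_ENTITIES = ["mochi gate", "lahore"]


def decompose(query):
    """Split a query into known entities, a temporal requirement and keywords."""
    text = query.lower()
    entities = [e for e in KNOWN_ENTITIES if e in text]
    for entity in entities:
        text = text.replace(entity, " ")
    temporal = None
    keywords = []
    for word in text.split():
        if word in TEMPORAL_WORDS:
            temporal = word
        else:
            keywords.append(word)
    return {"entities": entities, "temporal": temporal, "keywords": keywords}


def rank(candidates, parsed):
    """Order candidates by founding year according to the temporal requirement."""
    order = TEMPORAL_WORDS.get(parsed["temporal"])
    if order is None:
        return list(candidates)
    return sorted(candidates, key=lambda c: c["founded"],
                  reverse=(order == "descending"))


parsed = decompose("oldest sweet shop near mochi gate lahore")
print(parsed)

# Made-up candidates with made-up founding years.
candidates = [
    {"name": "Shop A", "founded": 1975},
    {"name": "Shop B", "founded": 1920},
]
print([c["name"] for c in rank(candidates, parsed)])  # ['Shop B', 'Shop A']
```

A production system would resolve entities against a knowledge base and filter candidates by location before ranking. The sketch only shows why separating the temporal word from the entities turns a free-text query into something a retrieval backend can sort on.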
<Update>
Here is an explanation of how Google Search worked before 2011. (Google, 2010)
</Update>
Music
Moving beyond text queries, an interesting implementation of auditory search is the music recognition service Shazam. The idea is simple: create a database of music and search through it using a snippet of a song. Shazam uses an index of its own, developed around spectrogram peaks. The idea is analogous to MACS: Music Audio Characteristic Sequence indexing for similarity retrieval (Yang, 2001).
“Good” vs. “bad” matching (Yang, 2001).
Read the original paper that Shazam published here. (Wang, 2003)
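The following is a rough sketch of peak-based fingerprint indexing, loosely following the scheme described in (Wang, 2003): pairs of spectrogram peaks are hashed, and a match is declared when many hashes agree on the same time offset. It assumes the peaks have already been extracted as (time, frequency bin) pairs. The function names, the `fan_out` parameter and the peak values are our own inventions for illustration, not Shazam's actual code or data.

```python
from collections import Counter, defaultdict


def hashes(peaks, fan_out=3):
    """Pair each peak with the next few peaks; yield ((f1, f2, dt), t1)."""
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1


def build_index(tracks):
    """Map each hash to the (track id, time) positions where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in hashes(peaks):
            index[h].append((track_id, t))
    return index


def match(index, sample_peaks):
    """Vote on (track, time offset); return the best (track, offset, votes)."""
    votes = Counter()
    for h, t_sample in hashes(sample_peaks):
        for track_id, t_track in index.get(h, []):
            votes[(track_id, t_track - t_sample)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count


# Made-up peak lists: (time frame, frequency bin).
tracks = {
    "song_a": [(0, 10), (1, 20), (2, 15), (3, 30), (4, 25), (5, 40)],
    "song_b": [(0, 12), (1, 22), (2, 18), (3, 33), (4, 27), (5, 41)],
}
index = build_index(tracks)

# A snippet of song_a starting at frame 2, with its own clock starting at 0.
sample = [(0, 15), (1, 30), (2, 25), (3, 40)]
print(match(index, sample))  # ('song_a', 2, 6)
```

Because each hash encodes only frequency pairs and a time difference, the snippet can start anywhere in the song. The consistent offset (2 frames here) is what separates a true match from coincidental hash collisions.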
Medical
Medicine is one of the most important fields for IR. It poses a bigger challenge than routine text-based retrieval, because it demands an infrastructure that can retrieve information based on a combination of images and text. One recent example is “NovaMedSearch: a multimodal search engine for medical case-based retrieval” (Mourão, NovaMedSearch: a multimodal search engine for medical case-based retrieval, 2013), which retrieves images or medical cases similar to the queried medical condition.
Visit the website here (NovaMedSearch) to see the medical search engine in action.
Conclusion
The evolution of IR has contributed greatly to the progress of modern times: information is delivered when it is required, in a timely manner, and in the format in which it is required. Augmented reality wearables like Google Glass (Google Glass, 2013) retrieve information on the fly for the wearer. Only time will tell what IR will conquer next.
Literature
(Google, 2013) – https://www.youtube.com/watch?v=gHGDN9-oFJE
(Yang, 2001) – http://infolab.stanford.edu/~yangc/pub/cy-waspaa01.pdf
(Wang, 2003) – http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
(Mourão, 2013) – http://imageclef.org/2013/medical/workshopprogram
(Google Glass, 2013) – http://www.google.com/glass/start/
(Mourão, NovaMedSearch: a multimodal search engine for medical case-based retrieval, 2013) – http://dl.acm.org/citation.cfm?id=2491798&preflayout=tabs
(Google, 2010) – http://www.youtube.com/watch?v=BNHR6IQJGZs
Posted by paarthos | Filed under 08_Information retrieval