Contents

Information Retrieval

Information retrieval (IR) is the task of finding materials (usually documents) of an unstructured nature (usually text) that satisfy an information need from within a large collection (usually stored on computers).

Introduction

Terminology

  • Information Need - the topic about which the user desires to know more
  • Query - what the user conveys to the computer in an attempt to communicate the information need
  • Document - the basic unit of data in information retrieval
  • Collection - A set of documents
  • Information Overload

Applications of IR

  • Web Search
  • Site Specific Search
  • Product Search
  • Grouping related documents
  • Mining the web for knowledge
  • Learning how to read
  • Answering everyday questions

Subfields

  • Text Mining

Classic Search model

Core Concepts

Query Representation

  • Lexical gap
  • Semantic gap
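The lexical gap can be made concrete with a small sketch. It assumes the usual reading of the term (the query and a relevant document use different words for the same thing) and a deliberately naive whitespace tokenizer; the function names `tokenize` and `term_overlap` are illustrative, not part of any library.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return set(text.lower().split())


def term_overlap(query, document):
    """Return the set of terms shared by the query and the document."""
    return tokenize(query) & tokenize(document)


# Hypothetical query and document about the same topic.
query = "cheap car"
document = "Affordable automobile listings"

# No term is shared, so exact term matching would miss this document
# even though it is plausibly relevant to the query.
shared = term_overlap(query, document)
```

Closing this gap is why systems add steps such as stemming, synonym expansion, or learned representations on top of plain term matching.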

Document Representation

  • Lexical and semantic gap
  • Data structures
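One data structure commonly used for document representation is the inverted index, which maps each term to the documents containing it. The sketch below is a minimal version under stated assumptions: whitespace tokenization, integer document IDs, and a toy collection invented for illustration. The names `build_inverted_index` and `boolean_and` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(collection):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, query):
    """Return the IDs of documents that contain every query term."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


# Toy collection (made up for this example).
collection = {
    1: "information retrieval finds relevant documents",
    2: "web search is an application of information retrieval",
    3: "product search ranks items",
}
index = build_inverted_index(collection)
matches = boolean_and(index, "information retrieval")
```

Looking up a term costs one dictionary access instead of a scan over every document, which is the main reason this structure is preferred over searching the raw text.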

Retrieval Model

  • Algorithms that find the most relevant documents
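As one illustration of a retrieval model, the sketch below ranks documents by a basic TF-IDF score. This is only one of many possible models and is not claimed to be the one these notes have in mind; the weighting used (raw term frequency times log(N / df)), the whitespace tokenizer, the function name `rank`, and the toy collection are all assumptions made for the example.

```python
import math
from collections import Counter


def rank(collection, query):
    """Return document IDs ordered by summed TF-IDF of the query terms."""
    docs = {doc_id: Counter(text.lower().split())
            for doc_id, text in collection.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in docs.items():
        score = 0.0
        for term in query.lower().split():
            # Document frequency: number of documents containing the term.
            df = sum(1 for c in docs.values() if term in c)
            if df == 0:
                continue  # term is absent from the whole collection
            idf = math.log(n_docs / df)
            score += counts[term] * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


# Toy collection (made up for this example).
collection = {
    1: "information retrieval finds relevant documents",
    2: "web search is an application of information retrieval",
    3: "product search ranks items",
}
ranking = rank(collection, "web search")
```

Here document 2 matches both query terms and document 3 only the more common term "search", so document 2 is ranked first; document 1 matches neither term and is left out.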

Demands from information retrieval systems

  • Demand of understanding
  • Demand of efficiency
  • Demand of accuracy
  • Demand of convenience
  • Demand of diversity

Challenges of IR

  • Text documents are generally free-form
  • Searching multimedia content
  • Running a query is hard
  • Running at web scale means massive distributed systems, sublinear algorithms and careful use of heuristics

Models