Text Classification

Text classification is the task of assigning an appropriate category to a sentence or document. The set of categories depends on the chosen dataset, for example spam versus not spam for email.

Formal Definition

Input:

  • a document d
  • a set of classes C = {c_1, c_2, c_3, …, c_j}

Output: a predicted class \[ c \in C\]

Tasks

Classification methods

Hand-coded rules

spam: black-list-address OR (“dollars” AND “have been selected”)

  • Accuracy can be high when the rules are carefully refined by an expert
  • Building, maintaining, and scaling these rules is expensive
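
The spam rule above can be sketched as a small function. This is only an illustration: the blacklist entries, the function name is_spam, and the choice to match case-insensitively are assumptions, not part of the original rule.

```python
# Hypothetical blacklist; the addresses are placeholders, not real data.
BLACKLIST = {"promo@example.com", "winner@example.net"}


def is_spam(sender: str, body: str) -> bool:
    """Apply the hand-coded rule:
    blacklisted sender OR ("dollars" AND "have been selected" in the body).
    """
    text = body.lower()  # assumption: matching ignores case
    on_blacklist = sender.lower() in BLACKLIST
    has_phrases = "dollars" in text and "have been selected" in text
    return on_blacklist or has_phrases


if __name__ == "__main__":
    print(is_spam("friend@example.org", "You have been selected to receive 1000 dollars"))
    print(is_spam("friend@example.org", "Lunch tomorrow?"))
```

Every new spam pattern needs another hand-written condition like these, which is where the maintenance cost comes from.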

Supervised Machine Learning

Input:

  • a document d
  • a set of classes C = {c_1, c_2, c_3, …, c_j}
  • a training set of m hand-labelled documents (d_1, c_1), …, (d_m, c_m)

Output: a predicted class \[ c \in C\]
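
To make the supervised setup concrete, here is a minimal sketch that learns from a hand-labelled training set and predicts a class c ∈ C for a new document d. The text does not name a learning algorithm, so multinomial Naive Bayes with add-one smoothing is used here purely as an example. The toy training documents, the "spam"/"ham" labels, and the function names train and predict are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(training_set):
    """Count classes and words from (document, class) pairs."""
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for doc, c in training_set:
        tokens = doc.lower().split()    # assumption: whitespace tokenization
        class_counts[c] += 1
        word_counts[c].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, doc):
    """Return the class with the highest log-posterior score for doc."""
    class_counts, word_counts, vocab = model
    m = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c in class_counts:
        score = math.log(class_counts[c] / m)  # log prior
        total = sum(word_counts[c].values())
        for w in doc.lower().split():
            if w in vocab:  # words never seen in training are skipped
                # add-one (Laplace) smoothed log likelihood
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical hand-labelled training set with m = 4 documents.
training_set = [
    ("you have been selected to win dollars", "spam"),
    ("claim your free dollars now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the attached report", "ham"),
]

model = train(training_set)
print(predict(model, "free dollars for you"))
```

Unlike the hand-coded rule, nothing here is specific to spam: swapping in a different labelled training set gives a classifier for a different set of classes.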

Unsupervised Machine Learning
