# Text Classification

Text classification is the task of assigning an appropriate category to a sentence or document. The set of categories depends on the chosen dataset.

## Formal Definition

Input:

• a document $d$
• a set of classes $C = \{c_1, c_2, c_3, \ldots, c_j\}$

Output: a predicted class $c \in C$

## Classification methods

### Hand-coded rules

Example rule for spam: black-listed address OR (“dollars” AND “have been selected”)

• Accuracy can be high when the rules are carefully refined by an expert
• Building, maintaining, and scaling these rules is expensive
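The spam rule above can be sketched as a small Python function. This is only an illustration: the `BLACKLIST` addresses and the function name `is_spam` are hypothetical, and matching is done with simple lower-cased substring checks, which is an assumption rather than something the rule itself specifies.

```python
# Hypothetical black list; these addresses are made up for illustration.
BLACKLIST = {"promo@spam.example", "winner@lottery.example"}


def is_spam(sender: str, body: str) -> bool:
    """Apply the rule: black-listed address OR ("dollars" AND "have been selected")."""
    text = body.lower()
    on_blacklist = sender.lower() in BLACKLIST
    has_phrases = "dollars" in text and "have been selected" in text
    return on_blacklist or has_phrases


if __name__ == "__main__":
    print(is_spam("friend@mail.example", "You have been selected to win 500 dollars"))
    print(is_spam("friend@mail.example", "Lunch on Monday?"))
```

Every new pattern of spam needs another hand-written condition like these, which is where the maintenance cost noted above comes from.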

### Supervised Machine Learning

Input:

• a document $d$
• a set of classes $C = \{c_1, c_2, c_3, \ldots, c_j\}$
• a training set of $m$ hand-labelled documents

Output: a predicted class $c \in C$
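To make the supervised setting concrete, here is a minimal sketch in plain Python. The text does not name a learning algorithm, so multinomial Naive Bayes with add-one smoothing is used here purely as an assumed example; the toy training set, the class labels `"spam"` and `"ham"`, and the function names `train` and `predict` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy hand-labelled training set (made up for illustration): (document, class) pairs.
TRAINING_SET = [
    ("you have been selected to win dollars", "spam"),
    ("win free dollars now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]


def train(training_set):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, c in training_set:
        class_counts[c] += 1
        for w in doc.lower().split():
            word_counts[c][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab


def predict(model, doc):
    """Return the class c in C with the highest log-probability for document d."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c in class_counts:
        score = math.log(class_counts[c] / total_docs)  # log prior
        total_words = sum(word_counts[c].values())
        for w in doc.lower().split():
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing so unseen (word, class) pairs get non-zero probability.
            score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


if __name__ == "__main__":
    model = train(TRAINING_SET)
    print(predict(model, "free dollars"))
    print(predict(model, "monday meeting"))
```

Unlike the hand-coded rule, nothing about spam is written into the code; the behaviour comes entirely from the labelled examples, so changing the training set changes the classifier.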