In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a
training set of data containing observations (or instances) whose category membership is known. Examples include assigning a given email to the "spam" or "non-spam" class, and assigning a diagnosis to a given patient based on observed characteristics of the patient (gender, blood pressure, presence or absence of certain symptoms, etc.).