In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for three reasons:
- simplification of models to make them easier to interpret by researchers/users,
- shorter training times,
- enhanced generalization by reducing overfitting (formally, reduction of variance).
The central premise when using a feature selection technique is that the data contains many features that are either redundant or irrelevant, and can thus be removed without incurring much loss of information.
Redundant and irrelevant are two distinct notions, since one relevant feature may be redundant in the presence of another relevant feature with which it is strongly correlated.
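To make the distinction concrete, here is a minimal sketch in Python (NumPy only) with three hypothetical features: `x1` is relevant, `x2` is redundant because it is a noisy copy of `x1`, and `x3` is irrelevant because it is independent of the target. The variable names and thresholds are illustrative assumptions, not part of any standard feature selection method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical toy data:
# x1 -> relevant to the target
# x2 -> redundant (a near-duplicate of x1)
# x3 -> irrelevant (independent noise)
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
y = 3.0 * x1 + rng.normal(size=n)

X = np.column_stack([x1, x2, x3])

# Relevance: absolute correlation of each feature with the target.
# x3 scores near 0, marking it as irrelevant.
relevance = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]

# Redundancy: pairwise correlation between features.
# x1 and x2 correlate near 1, so either could be dropped without
# losing much information.
redundancy = np.corrcoef(X, rowvar=False)

print("relevance to y:", np.round(relevance, 2))
print("feature-feature correlations:\n", np.round(redundancy, 2))
```

Note that `x2` is highly relevant to the target on its own; it is only expendable because `x1` already carries the same information, which is exactly the sense in which redundancy differs from irrelevance.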