Text mining, also referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. High-quality information is typically derived by identifying patterns and trends through means such as
statistical pattern learning. Text mining usually involves structuring the input text (usually by parsing, adding some derived linguistic features and removing others, and then inserting the result into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. 'High quality' in text mining usually refers to some combination of
relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
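
The three stages described above (structuring the text, deriving patterns, and interpreting the output) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny stopword list, the four-sentence corpus, and the helper names `structure`, `tfidf`, and `cosine` are hypothetical, and TF-IDF weighting with cosine similarity is only one of many possible pattern-derivation techniques, not a canonical text mining method.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = frozenset(
    {"the", "a", "an", "is", "are", "and", "of", "to", "in", "on", "for", "with"}
)

def structure(text):
    """Structuring step: lowercase, tokenize, and drop stopwords.

    Returns a bag of words (term -> count) for one document.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def tfidf(docs):
    """Pattern step: weight each term by term frequency * inverse document frequency."""
    n = len(docs)
    # Iterating over a Counter yields each distinct term once per document,
    # so this counts how many documents contain each term.
    df = Counter(term for doc in docs for term in doc)
    return [{t: c * math.log(n / df[t]) for t, c in doc.items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Illustrative corpus: two documents about pets, two about markets.
corpus = [
    "The cat sat on the mat with a kitten.",
    "A kitten and a cat are pets.",
    "Stock prices fell in the market.",
    "The market rallied and stock prices rose.",
]

vectors = tfidf([structure(d) for d in corpus])

# Interpretation step: documents on the same topic should score higher
# than documents on different topics.
same_topic = cosine(vectors[0], vectors[1])
cross_topic = cosine(vectors[0], vectors[2])
print(f"pets vs pets:    {same_topic:.3f}")
print(f"pets vs markets: {cross_topic:.3f}")
```

In this sketch the two pet-related sentences share weighted terms and so receive a positive similarity, while a pet sentence and a market sentence share no non-stopword terms and score zero. Similarity scores of this kind are one possible input to tasks such as text clustering or text categorization.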