TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION

Abstract

Author(s): Ms. Nikita P. Katariya; Prof. M. S. Chaudhari

Text mining is the analysis of data contained in natural language text. Text databases are growing rapidly due to the increasing amount of information available in various electronic forms, and users need to access relevant information across multiple documents. In many text mining applications, side-information is available along with the text documents. Side-information may be document origin information, links in the document, user-access behavior from web logs, or other non-textual attributes embedded in the text document. Such attributes may contain a tremendous amount of information for mining purposes. The initial process in a text mining system is preprocessing. This paper therefore presents the different steps involved in text preprocessing.