Winning some of the document preprocessing challenges in a text mining process.

Author(s): NOGUEIRA, B. M.; MOURA, M. F.; CONRADO, M. da S.; ROSSI, R. G.; MARCACINI, R. M.; REZENDE, S. O.

Summary: Given the rapid growth in the number of digital documents and the competitive advantage that can be gained from processing them, this paper describes some of the difficulties of working with text collections. More specifically, it addresses challenges in data preprocessing, considered one of the most important steps of the Text Mining process, focusing on two of its main tasks: attribute generation and attribute selection, for both single terms and composed terms. To overcome these challenges, the paper presents efficient unsupervised solutions. The solutions are applied to three real data sets, both to evaluate them and to show, step by step, one way to treat the data. Good results were obtained at the end of the whole process.

Publication year: 2008

Types of publication: Paper in annals and proceedings

Unit: Embrapa Digital Agriculture

Keywords: Semantic data, Text mining
