
Fast Breaking Comments

By Fabrizio Sebastiani

ESI Special Topics, February 2004
Citing URL - http://www.esi-topics.com/fbp/2004/february04- FabrizioSebastiani.html

Fabrizio Sebastiani answers a few questions about this month's fast breaking paper in the field of Computer Science.



Field: Computer Science
Article Title: Machine learning in automated text categorization
Authors: Sebastiani, F
Journal: ACM COMPUT SURV
Volume: 34
Page: 1-47
Year: MAR 2002
* CNR, Ist Elaboraz Informaz, Via G Moruzzi 1, I-56124 Pisa, Italy.

ST:  Why do you think your paper is highly cited?



I think there are several reasons:

  1. It is a review paper and review papers tend to be highly cited.
  2. It tackles a topic, namely the machine learning approach to automated text categorization, which has been steadily growing in importance since the mid-1990s, due to the increased availability of textual documents in digital form and the ensuing need to organize them and manage them with the smallest possible level of manual intervention. The machine learning approach to automated text categorization has by now definitely superseded the knowledge engineering approach.
  3. It is tutorial in nature. I put a lot of effort not only into reviewing the literature thoroughly, but also into explaining the subject matter as clearly as possible, with a special eye to newcomers to the field. The fact that there is as yet no textbook devoted to this topic increases, I think, the tutorial value of the paper.

ST:  Does it describe a new discovery or a new methodology that's useful to others?

It describes an entire class of techniques—the class of supervised learning techniques—that are useful in building text classifiers. In turn, text classifiers are useful for several types of applications in content-based management of textual information, such as indexing scientific articles for later use within textual information retrieval systems, spam filtering, selective dissemination of information, classification of documents by genre, automated authorship attribution, on-the-fly classification of Web search results, and more.

ST:  Could you summarize the significance of your paper in layman's terms?

The paper reviews the field of automated text classification. This field has to do with building software systems that can automatically build text classifiers. A text classifier is itself a software system that decides to which classes, among those in a predefined set, a given text document belongs. Examples include deciding whether a given newspaper article fits under HOME_NEWS, SPORTS, or LIFESTYLES; deciding under which "classified ads" section an ad fits; deciding whether a given e-mail message fits under SPAM or LEGITIMATE; and deciding whether a movie review fits under THUMBS_UP or THUMBS_DOWN.
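To make the idea of a system that builds a classifier from examples concrete, here is a minimal sketch in Python. It uses a multinomial naive Bayes learner with add-one smoothing, which is only one of many supervised learning techniques and is chosen here for brevity, not because the interview singles it out. The function names (`tokenize`, `train`, `classify`) and the four training messages are invented for illustration; only the SPAM and LEGITIMATE class names come from the text above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Split a document into lowercase word tokens (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labeled_docs):
    """Build a classifier model from (text, label) pairs.

    Returns the per-class document counts, the per-class word counts,
    and the vocabulary seen during training.
    """
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_doc_counts, word_counts, vocabulary


def classify(model, text):
    """Return the class with the highest naive Bayes log-score for the text."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_doc_counts):
        # Log prior: fraction of training documents carrying this label.
        score = math.log(class_doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen class/word pairs.
            numerator = word_counts[label][word] + 1
            denominator = total_words + len(vocabulary)
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set, invented for illustration only.
TRAINING_DOCS = [
    ("win a free prize now", "SPAM"),
    ("cheap loans click now", "SPAM"),
    ("meeting agenda for monday", "LEGITIMATE"),
    ("please review the project report", "LEGITIMATE"),
]

model = train(TRAINING_DOCS)
print(classify(model, "free prize click now"))
print(classify(model, "agenda for the project meeting"))
```

On this toy data the first message is assigned to SPAM and the second to LEGITIMATE. The point of the sketch is the division of labor: `train` is the system that builds the classifier from labeled examples, and `classify` is the resulting classifier applied to a new document.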

ST:  How did you become involved in this research?

I was already involved in information retrieval, a related area concerning the content-based retrieval of textual documents. My interest in using machine learning techniques for managing textual documents derived from the recognition that the key problem of managing text is that the semantics of a text is a subjective notion, and that the only way to manage this subjectivity is to learn it from user data.

Fabrizio Sebastiani
Head, Text Classification Group
Istituto di Scienza e Tecnologie dell'Informazione
Consiglio Nazionale delle Ricerche
Pisa, Italy
