By Fabrizio Sebastiani
ESI Special Topics,
February 2004
Citing URL - http://www.esi-topics.com/fbp/2004/february04-FabrizioSebastiani.html
Fabrizio Sebastiani answers a few questions about this month's fast-breaking paper in the field of Computer Science.
Field: Computer Science
Article Title: Machine learning in automated text categorization
Authors: Sebastiani, F
Journal: ACM COMPUT SURV
Volume: 34
Page: 1-47
Year: MAR 2002
* CNR, Ist Elaboraz Informaz, Via G Moruzzi 1, I-56124 Pisa, Italy.
* CNR, Ist Elaboraz Informaz, I-56124 Pisa, Italy.
Why do you think your paper is highly cited?
I think there are several reasons:
- It is a review paper and review papers tend to be highly
cited.
- It tackles a topic, namely the machine learning approach to
automated text categorization, which has been steadily growing
in importance since the mid-1990s, due to the increased
availability of textual documents in digital form and the
ensuing need to organize them and manage them with the
smallest possible level of manual intervention. The machine
learning approach to automated text categorization has by now
definitely superseded the knowledge engineering approach.
- It is tutorial in nature. I put a lot of effort not only
into reviewing the literature thoroughly, but also into
explaining the subject matter as clearly as possible, with a
special eye to newcomers to the field. The fact that there is
as yet no textbook devoted to this topic increases, I think,
the tutorial value of the paper.
Does it describe a new discovery or a new methodology that's useful to others?
It describes an entire class of techniques—the class of
supervised learning techniques—that are useful in building
text classifiers. In turn, text classifiers are useful for
several types of applications in content-based management of
textual information, such as indexing scientific articles for
later use within textual information retrieval systems, spam
filtering, selective dissemination of information,
classification of documents by genre, automated authorship
attribution, on-the-fly classification of Web search results,
and more.
Could you summarize the significance of your paper in layman's terms?
The paper reviews the field of automated text classification.
This field has to do with building software systems that can
automatically build text classifiers. A text classifier is
itself a software system that is able to decide to which
classes (among those in a predefined set) a given text document
belongs; for instance, deciding whether a given newspaper
article fits under HOME_NEWS, or SPORTS, or LIFESTYLES; deciding
under which "classified ads" section an ad fits;
deciding whether a given e-mail message fits under SPAM or
LEGITIMATE; or deciding whether a movie review fits under
THUMBS_UP or THUMBS_DOWN.
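To make the idea of a classifier that is learned from labeled examples concrete, here is a minimal sketch of one standard supervised learner, multinomial Naive Bayes with add-one smoothing, applied to the SPAM vs. LEGITIMATE example above. It is only an illustration under simple assumptions (lowercase whitespace tokenization, a tiny hypothetical training set); the function names `train`, `classify` and `tokenize` and the sample messages are hypothetical and do not come from the paper, which surveys many learning methods rather than prescribing this one.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: crude lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def train(labeled_docs):
    """Learn a multinomial Naive Bayes model from (text, label) pairs."""
    doc_counts = Counter()               # number of training documents per class
    word_counts = defaultdict(Counter)   # per-class term frequencies
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab


def classify(model, text):
    """Return the class with the highest log posterior score for `text`."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: log P(class)
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothed log P(word | class)
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training data (the "user data" the classifier learns from).
training = [
    ("win a free prize now", "SPAM"),
    ("cheap pills free offer", "SPAM"),
    ("meeting agenda for monday", "LEGITIMATE"),
    ("please review the attached paper draft", "LEGITIMATE"),
]
model = train(training)

print(classify(model, "free prize offer"))              # SPAM
print(classify(model, "draft agenda for the meeting"))  # LEGITIMATE
```

The point of the sketch is the division of labor described in the answer above: `train` is the learning step that builds a classifier automatically from labeled documents, and `classify` is the resulting classifier that assigns a new document to one of the predefined classes. Swapping in a different learner would change `train` and `classify` but not this overall structure.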
How did you become involved in this research?
I was already involved in information retrieval, a related
area concerning the content-based retrieval of textual
documents. My interest in using machine learning techniques for
managing textual documents derived from the recognition that the
key problem of managing text is that the semantics of a text is
a subjective notion, and that the only way to manage this
subjectivity is to learn it from user data.
Fabrizio Sebastiani
Head, Text Classification Group
Istituto di Scienza e Tecnologie dell'Informazione
Consiglio Nazionale delle Ricerche
Pisa, Italy