Fast Breaking Comments

By Professor Walter L. Ruzzo

ESI Special Topics, December 2002
Citing URL - http://www.esi-topics.com/fbp/comments/december02-WalterLRuzzo.html

Professor Walter L. Ruzzo answers a few questions about this month's fast breaking paper in the field of Computer Science.


From: December 2002

Field: Computer Science
Article Title: "Validating clustering for gene expression data"
Authors: Yeung, KY; Haynor, DR; Ruzzo, WL
Journal: BIOINFORMATICS
Volume: 17
Page: 309-318
Year: APR 2001
* Univ Washington, Box 352350, Seattle, WA 98195 USA.
* Univ Washington, Seattle, WA 98195 USA.

ST:  Why do you think your paper is highly cited?

"Clustering" is a commonly used exploratory data analysis technique for attempting to discover patterns in large, complex data sets such as gene expression microarray experiments.  The explosion of such data in recent years has spawned invention of new clustering algorithms and renewed interest in old ones.  Which of these many methods are best for particular tasks or particular kinds of data?  It's easy enough to compare clustering algorithms on data sets where the "ideal clustering" is known in advance, but these cases are usually toy examples.  Our paper provides one of the few methods available for comparative evaluation of clustering algorithms on real data where the "right answer" isn't known in advance.

ST:  Does it describe a new discovery or a new methodology that's useful to others?

It's a new methodology that's basically applicable for the comparison of any clustering algorithms on any particular data set.

ST:  What were some of the circumstances that led you to do this research?

In the early work on microarrays, every research group seemed to have their own favorite clustering method, and all clustering algorithms find "clusters"—that's their job.  But how could you tell whether those clusters were good ones?  We were looking for a more systematic, data-driven way to evaluate the methods, so that researchers could have more confidence in their results.

ST:  Could you summarize the significance of your paper in layman's terms?

"Clustering" is the process of dividing data points into groups or clusters so that, hopefully, the points in each group are more similar to each other than to points in other groups.  With luck, this will lead you to some useful hypotheses about the biological system you're studying, e.g., the genes in cluster A are related to such-and-such a function, while those in cluster B serve a different purpose.  Unfortunately, these divisions usually aren't clear-cut, and different clustering algorithms make different choices.  We proposed a way to test the "quality" of the algorithms without knowing the best clustering in advance.
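
As a rough illustration of the idea in the answer above (not the paper's actual procedure, which this interview does not spell out), one way to score a clustering algorithm without a known answer is to hide one experimental condition, cluster the genes on the remaining conditions, and check how tightly the hidden values group within each cluster. The Python sketch below assumes that leave-one-condition-out scheme; the function names, the tiny k-means, the naive baseline, and the toy data are all hypothetical.

```python
from math import sqrt


def kmeans(rows, k, iters=20):
    """Tiny k-means; seeds with the first k rows so results are repeatable."""
    centers = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(r, centers[c])))
            for r in rows
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def split_in_half(rows, k):
    """Naive baseline (hypothetical): group rows by position, ignoring values."""
    return [i * k // len(rows) for i in range(len(rows))]


def held_out_score(data, k, cluster=kmeans):
    """Assumed scoring scheme: for each condition, cluster on the other
    conditions, then add the RMS spread of the hidden values within clusters.
    Lower totals mean the clusters predict the hidden condition better."""
    n_genes, n_conds = len(data), len(data[0])
    total = 0.0
    for e in range(n_conds):
        visible = [row[:e] + row[e + 1:] for row in data]  # drop condition e
        hidden = [row[e] for row in data]                  # values held out
        labels = cluster(visible, k)
        squared = 0.0
        for c in set(labels):
            vals = [h for h, lab in zip(hidden, labels) if lab == c]
            mean = sum(vals) / len(vals)
            squared += sum((v - mean) ** 2 for v in vals)
        total += sqrt(squared / n_genes)
    return total


# Made-up toy data: six "genes" (rows) measured under four "conditions".
data = [
    [1.0, 1.1, 0.9, 1.0],
    [5.0, 5.1, 4.9, 5.0],
    [1.1, 0.9, 1.0, 1.1],
    [5.1, 4.9, 5.0, 5.1],
    [0.9, 1.0, 1.1, 0.9],
    [4.9, 5.0, 5.1, 4.9],
]

print("k-means    :", held_out_score(data, 2, kmeans))
print("naive split:", held_out_score(data, 2, split_in_half))
```

A lower score means the clusters found from the visible conditions predict the hidden one better, so on this toy data the k-means grouping should score lower than the naive split.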

Walter L. Ruzzo, Professor, Computer Science & Engineering
University of Washington,
Seattle, WA

Co-authors of the Fast Breaking Paper:
Ka Yee Yeung, Bioinformatics Scientist, Microbiology
University of Washington
Seattle, WA

David R. Haynor, Professor, Radiology
Health Science Center
University of Washington
Seattle, WA
