Research areas in science, particularly those at the cutting edge
of their fields, are characterized by patterns of intense
communication between scientists. This communication manifests
itself in various ways, both formally and informally, but prominent
among these are citations from one scientist’s work to another.
Patterns of citation reflect a fine-grained selection process in which
scientists build on each other’s work, and they show how these works
relate to one another. Such patterns can be used to create a
picture of the state of a specific research area in terms of the
papers that constitute its core of seminal work.
The procedure to accomplish this in Essential Science Indicators
is called Research Front analysis. It is based on identifying the
most-cited papers across multiple disciplines over a five-year
period, and then determining how often these papers have been
jointly cited—that is, how often, in the footnotes or references of
given papers, a citation to one item is accompanied by a citation to
another highly cited item. This defines the frequency of co-citation
of the two highly cited papers.
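
As a rough illustration of the counting step described above, the following Python sketch tallies co-citations from a handful of reference lists. The function name, the paper labels, and the sample data are all hypothetical and chosen only for illustration; this is not the code used by Essential Science Indicators.

```python
from collections import Counter
from itertools import combinations


def count_cocitations(reference_lists, highly_cited):
    """Count how often each pair of highly cited papers shares a reference list."""
    cocitations = Counter()
    for refs in reference_lists:
        # Keep only highly cited items; deduplicate and sort so that each
        # pair is always stored under the same (a, b) key.
        core = sorted(set(refs) & highly_cited)
        for a, b in combinations(core, 2):
            cocitations[(a, b)] += 1
    return cocitations


# Hypothetical data: each inner list is the reference list of one citing paper.
reference_lists = [
    ["A", "B", "X"],
    ["A", "B", "C"],
    ["B", "C"],
    ["A", "Y"],
]
highly_cited = {"A", "B", "C"}  # assumed to be identified beforehand

cocitations = count_cocitations(reference_lists, highly_cited)
for pair, count in sorted(cocitations.items()):
    print(pair, count)
```

In this toy data set, papers A and B appear together in two reference lists, so their integer co-citation frequency is 2.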
Identifying research fronts involves manipulating the co-cited
papers in order to group together those that are strongly related.
Before this process begins, a threshold is set on the integer
co-citation frequencies to eliminate very low values, and the
remaining frequencies are converted to a normalized form using the
following formula:
Normalized co-citation = (integer co-citation frequency of A and B) /
sqrt(citation frequency of A * citation frequency of B)
In other words, we divide the co-citation frequency by the square
root of the product of the citation frequencies of the two papers. A
second threshold is set on these normalized values. In the most
recent data run for Essential Science Indicators, the integer
threshold was set to accept co-citation frequencies of 2 or greater,
and the normalized threshold was set at 0.3.
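
The normalization and the two thresholds can be sketched in a few lines of Python. The function names and the example counts below are hypothetical. The defaults of 2 and 0.3 are the values quoted above, and treating the normalized threshold as inclusive ("0.3 or greater") is an assumption.

```python
import math


def normalized_cocitation(cocitation_count, citations_a, citations_b):
    """Co-citation count divided by the square root of the product of citation counts."""
    return cocitation_count / math.sqrt(citations_a * citations_b)


def passes_thresholds(cocitation_count, citations_a, citations_b,
                      min_count=2, min_normalized=0.3):
    """Apply the integer threshold first, then the normalized threshold."""
    if cocitation_count < min_count:
        return False
    # Assumption: a normalized value exactly at the threshold is accepted.
    value = normalized_cocitation(cocitation_count, citations_a, citations_b)
    return value >= min_normalized


# Hypothetical pair: paper A cited 20 times, paper B cited 45 times.
# sqrt(20 * 45) = sqrt(900) = 30
weak = normalized_cocitation(6, 20, 45)     # 6 / 30 = 0.2
strong = normalized_cocitation(12, 20, 45)  # 12 / 30 = 0.4
print(weak, passes_thresholds(6, 20, 45))
print(strong, passes_thresholds(12, 20, 45))
```

With these made-up counts, six co-citations give a normalized value of 0.2, which falls below the normalized threshold, while twelve give 0.4, which passes.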
Starting with a co-cited pair that meets the thresholds, this
grouping procedure then finds other pairs that share common papers.
The gathering process continues until no other pairs of papers can
be added to the set. This process is commonly known as single-link
clustering. The resulting clusters vary in size from a minimum of
two papers to some maximum size.
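
Single-link clustering at a fixed threshold amounts to finding the connected components of a graph whose edges are the accepted co-cited pairs. The Python sketch below does this with a small union-find structure. That data structure is one convenient choice made here for illustration, not a description of how Essential Science Indicators implements the step, and the function name and sample pairs are hypothetical.

```python
def single_link_fronts(pairs):
    """Group papers into clusters by chaining together pairs that share a paper."""
    parent = {}

    def find(paper):
        # Follow parent links to the cluster representative, shortening the
        # path along the way (path halving).
        parent.setdefault(paper, paper)
        while parent[paper] != paper:
            parent[paper] = parent[parent[paper]]
            paper = parent[paper]
        return paper

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    clusters = {}
    for paper in list(parent):
        clusters.setdefault(find(paper), set()).add(paper)
    # Return clusters as sorted lists, ordered by their first member.
    return sorted((sorted(members) for members in clusters.values()),
                  key=lambda members: members[0])


# Hypothetical pairs that already meet both thresholds.
accepted_pairs = [("A", "B"), ("B", "C"), ("D", "E")]
fronts = single_link_fronts(accepted_pairs)
print(fronts)  # [['A', 'B', 'C'], ['D', 'E']]
```

Papers A and C end up in the same cluster even though they were never paired directly, because both are linked to B. This chaining is what allows single-link clusters to grow well beyond two papers.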
The numeric attributes of fronts can help determine the
significance of the areas and their stage of development. The number
of core papers in the front and the total citations received give
indications of the size of the area. The numbers of citations per
core paper give an indication of the focus or concentration of
effort. The average publication year and distribution of core papers
by year give an indication of currency or "hotness"—that
is, how quickly research is changing and whether there are new
developments. An analysis of frequently occurring keywords or
phrases in the titles of the papers, as given by the front name, can
give an indication of the subject content and thematic focus of the
area.
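
The numeric attributes listed above are simple to compute once the core papers of a front are known. The following Python sketch assumes each core paper is a dictionary with hypothetical "citations" and "year" fields; the function name and the sample values are made up for illustration.

```python
from collections import Counter


def front_summary(core_papers):
    """Summarize the size, concentration, and currency of one research front."""
    paper_count = len(core_papers)
    total_citations = sum(paper["citations"] for paper in core_papers)
    years = [paper["year"] for paper in core_papers]
    return {
        "core_papers": paper_count,                              # size
        "total_citations": total_citations,                      # size
        "citations_per_paper": total_citations / paper_count,    # concentration
        "mean_year": sum(years) / paper_count,                   # currency
        "papers_by_year": dict(sorted(Counter(years).items())),  # currency
    }


# Hypothetical front with four core papers.
core_papers = [
    {"citations": 120, "year": 2019},
    {"citations": 80, "year": 2020},
    {"citations": 40, "year": 2020},
    {"citations": 60, "year": 2021},
]
summary = front_summary(core_papers)
print(summary)
```

For this invented front, the summary reports 4 core papers, 300 total citations, 75 citations per core paper, and a mean publication year of 2020.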
Research front analysis will not identify all research areas or
all the papers in an area. However, it can assist in identifying
areas where important work is being done and where the scientific
community is focusing its attention.