Background: The impact of scientific publications has traditionally been expressed in terms of citation counts. However, scientific activity has moved online over the past decade. To better capture scientific impact in the digital era, a variety of new impact measures has been proposed on the basis of social network analysis and usage log data. Here we investigate how these new measures relate to each other, and how accurately and completely they express scientific impact. Methodology: We performed a principal component analysis of the rankings produced by 39 existing and proposed measures of scholarly impact that were calculated on the basis of both citation and usage log data. Conclusions: Our results indicate that the notion of scientific impact is a multi-dimensional construct that cannot be adequately measured by any single indicator, although some measures are more suitable than others. The commonly used citation Impact Factor is not positioned at the core of this construct, but at its periphery, and should thus be used with caution.

Funding: This work was funded by the Andrew W. Mellon Foundation in 2006-2008 under grant #30600708. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

These authors contributed equally to this work.

Science is a gift economy; value is defined as the degree to which one's ideas have freely contributed to knowledge and impacted the thinking of others. Since authors use citations to indicate which publications influenced their work, scientific impact can be measured as a function of the citations that a publication receives. Looking for quantitative measures of scientific impact, administrators and policy makers have thus often turned to citation data.

A variety of impact measures can be derived from raw citation
data. It is however highly common to assess scientific impact in
terms of average journal citation rates. In particular, the Thomson
Scientific Journal Impact Factor (JIF) [

The JIF has achieved a dominant position among measures of
scientific impact for two reasons. First, it is published as part of a
well-known, commonly available citation database (Thomson
Scientific's JCR). Second, it has a simple and intuitive definition.
The JIF is now commonly used to measure the impact of journals
and by extension the impact of the articles they have published,
and by even further extension the authors of these articles,
their departments, their universities and even entire countries.
However, the JIF has a number of undesirable properties which
have been extensively discussed in the literature [

The shortcomings of the JIF as a simple citation statistic have
led to the introduction of other measures of scientific impact.
Modifications of the JIF have been proposed to cover longer
periods of time [

In addition, the success of Google's method of ranking web
pages has inspired numerous measures of journal impact that
apply social network analysis [

Since scientific literature is now mostly published and accessed
online, a number of initiatives have attempted to measure scientific
impact from usage log data. The web portals of scientific publishers,
aggregator services and institutional library services now
consistently record usage at a scale that exceeds the total number of
citations in existence. In fact, Elsevier announced 1 billion fulltext
downloads in 2006, compared to approximately 600 million
citations in the entire Web of Science database. The resulting
usage data allows scientific activity to be observed immediately
upon publication, rather than to wait for citations to emerge in the
published literature and to be included in citation databases such
as the JCR; a process that with average publication delays can
easily take several years. Shepherd (2007) [

These developments have led to a plethora of new measures of scientific impact that can be derived from citation or usage log data, and/or rely on distribution statistics or more sophisticated social network analysis. However, which of these measures is most suitable for the measurement of scientific impact? This question is difficult to answer for two reasons. First, impact measures can be calculated for various citation and usage data sets, and it is thus difficult to distinguish the true characteristics of a measure from the peculiarities of the data set from which it was calculated. Second, we do not have a universally accepted gold standard of impact against which to calibrate any new measures. In fact, we do not even have a workable definition of the notion of "scientific impact" itself, unless we revert to the tautology of defining it as the number of citations received by a publication. Like most abstract concepts, "scientific impact" may be understood and measured in many different ways. The issue thus becomes which impact measures best express its various aspects and interpretations.

Here we report on a Principal Component Analysis (PCA) [

The 39 scientific impact measures were derived from various sources. Our analysis included several existing measures that are published on a yearly basis by Thomson-Reuters and the Scimago project. Other measures were calculated on the basis of existing citation and usage data. The following sections discuss the methodology by which each of these impact measures was either extracted or derived from various usage and citation sources.

As shown in Fig. 1, the following databases were used in this analysis:

Citation. The CDROM version of the 2007 Journal Citation Reports (JCR Science and Social Science Editions) published by Thomson-Reuters Scientific (formerly ISI).

Usage. The MESUR project's reference collection of usage log data (http://www.mesur.org/): a collection of 346,312,045 user interactions recorded by the web portals operated by Thomson Scientific (Web of Science), Elsevier (Scopus), JSTOR, Ingenta, University of Texas (9 campuses, 6 health institutions), and California State University (23 campuses) between March 1st 2006 and February 1st 2007.

Additional citation measures. A set of journal rankings published by the Scimago project that are based on Elsevier Scopus citation data: http://www.scimagojr.com/

In the following sections we detail the methodology that was used to retrieve and calculate 39 scientific impact measures from these data sets, and the subsequent analysis of the correlations between the rankings they produced. Throughout the article, measures are identified by a unique identifier number that is listed in Table 1. We hope these identifiers will allow readers to more conveniently identify measures in subsequently provided diagrams and tables such as Figs. 1, 2 and 3.

In [

The 2007 JCR contains a table that lists the number of citations that point from one journal to another. The number of citations is separated according to the publication year of both the origin and target of the citation. For example, from this table we could infer that 20 citations point from articles published in "Physica Review A" in 2006 to articles published in "Physica Review B" in 2004 and 2005. Each such data point can thus be described as the n-tuple

$a \in A = V^2 \times Y_s \times Y_e \times \mathbb{N}^+$, where $V = \{v_1, \ldots, v_n\}$ is the set of n journals for which we have recorded citation data, $Y_s = \{y_0, \ldots, y_m\}$ is the set of m years for which outgoing citations were recorded, $Y_e = \{y_0, \ldots, y_k\}$ is the set of k years for which incoming citations were recorded, and $\mathbb{N}^+$ denotes the set of positive integers including zero that represent the number of counted citations. For example, the journal citation tuple $a = (1, 2, \{2006\}, \{2004, 2005\}, 50)$ represents the observation that 50 citations point from articles published in journal 1 in the year 2006 to those published in journal 2 in 2004 and 2005.

A, the set of citation n-tuples, describes a citation network whose connections indicate the number of times that articles published in one journal cited the articles published in another journal for a particular time period. Such a network can be represented by the citation matrix $C_{Y_s,Y_e}$ of which each entry $c_{i,j}$ represents the number of observed citations that point from articles published in journal $v_i$ in the date range given by $Y_s$ to articles published in journal $v_j$ in the date range $Y_e$.
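The step from citation tuples to a citation matrix can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `citation_matrix`, the sparse dictionary representation, and the sample tuples (other than the one quoted in the text) are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical citation tuples: (source journal, target journal,
# citing years, cited years, citation count). Only the first tuple
# appears in the text; the others are invented for illustration.
tuples = [
    (1, 2, frozenset({2006}), frozenset({2004, 2005}), 50),
    (2, 1, frozenset({2006}), frozenset({2004, 2005}), 7),
    (1, 3, frozenset({2006}), frozenset({2003}), 4),  # outside the cited window
]

def citation_matrix(tuples, citing_years, cited_years):
    """Sum citation counts per (source, target) pair for the given year ranges.

    The matrix is stored sparsely as a dict keyed by (source, target).
    """
    entries = defaultdict(int)
    for source, target, ys, ye, count in tuples:
        # Keep a tuple only if its year sets fall inside the requested ranges.
        if ys <= citing_years and ye <= cited_years:
            entries[(source, target)] += count
    return dict(entries)

C = citation_matrix(tuples, frozenset({2006}), frozenset({2004, 2005}))
```

Here `C[(1, 2)]` holds the number of 2006 citations from journal 1 to the 2004 and 2005 articles of journal 2, and the tuple that points to 2003 articles is excluded.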

We attempted to ensure that our citation network conformed to the definition of the Journal Impact Factor rankings published in the 2007 JCR. We therefore extracted citations from the JCR that originated in 2006 publications and pointed to 2004 and 2005 publications. The resulting citation network contained 897,608 connections between 7,388 journals, resulting in a network density of 1.6% (ratio of non-zero connections over all possible non-reflexive connections). This citation network was represented as a 7,388 × 7,388 matrix labeled C whose entries $c_{i,j}$ were the number of 2006 citations pointing from journal i to the 2004 and 2005 articles of journal j.
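The reported density can be checked with a few lines of arithmetic. The sketch below assumes that "all possible non-reflexive connections" means the n(n - 1) ordered journal pairs excluding self-citations; the function name is illustrative.

```python
def network_density(nonzero_connections, n_journals):
    """Ratio of non-zero connections to all ordered pairs i != j."""
    possible = n_journals * (n_journals - 1)
    return nonzero_connections / possible

# Figures quoted in the text for the citation network.
citation_density = network_density(897_608, 7_388)
print(f"{citation_density:.1%}")  # prints 1.6%
```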

In [

In short, the MESUR project's reference collection of usage log data consists of log files recorded by a variety of scholarly web portals (including some of the world's most significant publishers and aggregators) that donated their usage log data to the MESUR project in the course of 2006–2007. All MESUR usage log data consisted of a list of temporally sorted "requests". For each individual request the following data fields were recorded: (1) date/time of the request, (2) session identifier, (3) article identifier, and (4) request type. The session identifier grouped requests issued by the same (anonymous) user, from the same client, within the same session. This allowed the reconstruction of user "clickstreams", i.e. the sequences of requests by individual users within a session. Since each article for this investigation is assumed to be published in a journal, we can derive journal clickstreams from article clickstreams.

[Table 1: for each measure, its ID, type (citation or usage), source (JCR 2007, Scimago, or MESUR 2007), measure name, and network parameters (directed or undirected, weighted or unweighted), together with its correlations (rank order) and PCA loadings.]

Over all clickstreams we can thus determine the transition probability

$$P(i,j) = \frac{N(v_i, v_j)}{\sum_{j} N(v_i, v_j)}$$

where $N(v_i, v_j)$ denotes the number of times that we observe journal $v_i$ being followed by $v_j$ in the journal clickstreams in MESUR's usage log data. The transition probability $P(i,j)$ thus expresses the probability by which we expect to observe $v_j$ after $v_i$ over all user clickstreams.
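As an illustration of this definition, the following sketch counts consecutive journal pairs in a handful of made-up clickstreams and normalizes each count by the total number of transitions leaving the first journal. The function name and the sample streams are hypothetical, and the text does not say whether repeated visits to the same journal were filtered, so none are filtered here.

```python
from collections import Counter, defaultdict

def transition_probabilities(clickstreams):
    """Estimate P(i, j) from journal clickstreams (lists of journal ids)."""
    counts = Counter()
    for stream in clickstreams:
        # Each adjacent pair (a, b) is one observation of a followed by b.
        for a, b in zip(stream, stream[1:]):
            counts[(a, b)] += 1
    row_totals = defaultdict(int)
    for (a, _), n in counts.items():
        row_totals[a] += n
    return {(a, b): n / row_totals[a] for (a, b), n in counts.items()}

# Three invented sessions over journals A, B and C.
streams = [["A", "B", "C"], ["A", "B"], ["A", "C"]]
P = transition_probabilities(streams)
```

In this toy data journal A is followed by B twice and by C once, so `P[("A", "B")]` is 2/3 and `P[("A", "C")]` is 1/3.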

This analysis was applied to the MESUR reference data set, i.e.
346,312,045 user interactions recorded by the web portals
operated by Thomson Scientific (Web of Science), Elsevier
(Scopus), JSTOR, Ingenta, University of Texas (9 campuses, 6
health institutions), and California State University (23 campuses)
between March 1st 2006 and February 1st 2007. To ensure that
all subsequent metrics were calculated over the same set of
journals, the resulting set of journal transition probabilities was
trimmed to the 7,575 journals for which a JIF could be retrieved from
the 2007 JCR. All usage transition probabilities combined thus
resulted in the 7,575 × 7,575 matrix labeled U. Each entry $u_{i,j}$ of
matrix U was the transition probability $P(i,j)$ between two
journals i and j. Matrix U contained 3,617,368 non-zero
connections, resulting in a network density of 6.3%. This
procedure and the resulting usage network are explained in detail in [

Four classes of social network measures were applied to both the citation and usage networks, represented respectively by matrix C and matrix U, namely:

Degree centrality. (Table 1, IDs 7–10, 14, 15, 26, 29, 30, 35–37) Number of connections pointing to or emerging from a journal in the network.

Closeness centrality. (Table 1, IDs 3, 6, 24, 25) The average length of the geodesic connecting a specific journal to all other journals in the network.

Betweenness centrality. (Table 1, IDs 21, 22, 33, 34) The number of geodesics between all pairs of journals in the network that pass through the specific journal.

PageRank. (Table 1, IDs 16–19, 27, 28, 31, 32) As defined by
Brin and Page (1998) [

The definitions of each of the measures in these classes were varied according to the following network factors: (1) Weighted vs. unweighted connections, i.e. measures can be calculated by assuming that each non-zero connection is valued 1 vs. taking into account the actual weight of the connection, (2) Directed vs. undirected connections, i.e. measures can be calculated to take into account the directionality of journal relations or not, and finally (3) Citation vs. usage network data, i.e. any of these measure variations can be calculated for either the citation or the usage network.

These factors result in 2^3 = 8 variations for each of the above listed 4 classes of social network measures, i.e. 32 variants. However, not all permutations make equal sense. For example, in the case of Betweenness Centrality we calculated only two of these variants, both of which ignored connection directionality (irrelevant for betweenness): one took into account connection weights (weighted geodesics) and the other ignored connection weights (all connections weighted > 0). Each of these variants was however calculated for both the citation and the usage network. The final list of social network measures thus to some degree reflects our judgment on which of these permutations were meaningful.
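The count of possible variants follows directly from the three binary factors. The short sketch below enumerates them; the labels are taken from the text, while the variable names are illustrative only.

```python
from itertools import product

# The three binary network factors described in the text.
factors = {
    "weights": ("weighted", "unweighted"),
    "direction": ("directed", "undirected"),
    "network": ("citation", "usage"),
}
measure_classes = ("degree", "closeness", "betweenness", "pagerank")

# Every combination of factor values: 2 * 2 * 2 = 8 per measure class.
variants = list(product(*factors.values()))

# Crossing the 8 variants with the 4 classes gives the 32 candidates,
# of which only the meaningful ones were actually calculated.
candidates = [(cls, *variant) for cls in measure_classes for variant in variants]
```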

In addition to the existing measures and the social network measures, we calculated a number of measures that did not fit any of the classes outlined above, namely:

Y-Factor. (Table 1, ID 20) A measure that results from
multiplying a journal's Impact Factor with its PageRank, described
in Bollen (2006) [

Journal Cite Probability. (Table 1, ID 13) We calculated the Journal Cite Probability from the citation numbers listed in the 2007 JCR.

Journal Use Probability. (Table 1, ID 38) The normalized frequency by which a journal will be used according to the MESUR usage log data.

Usage Impact Factor. (Table 1, ID 39) Same definition as the JIF, but expressing the 2-year "usage" average for articles published in a journal.
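Two of these measures are simple enough to sketch directly from their descriptions: the Journal Use Probability as a journal's share of all recorded usage, and the Y-Factor as the product of Impact Factor and PageRank. The function names and sample numbers below are hypothetical, and the exact normalization used by the authors is given in their Appendix S1 rather than here.

```python
def use_probability(usage_counts):
    """Normalized usage frequency: each journal's share of total usage."""
    total = sum(usage_counts.values())
    return {journal: n / total for journal, n in usage_counts.items()}

def y_factor(impact_factor, pagerank):
    """Y-Factor as described in the text: Impact Factor times PageRank."""
    return impact_factor * pagerank

# Invented usage counts for three journals.
usage = {"A": 60, "B": 30, "C": 10}
probabilities = use_probability(usage)

# Invented Impact Factor and PageRank values for one journal.
y = y_factor(2.5, 0.004)
```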

In total, we calculated 32 citation- and usage-based impact measures; 16 social network measures on the basis of matrix C (citation network) and 16 social network measures on the basis of matrix U (usage network). 4 journal impact measures published by the Scimago group (http://www.scimagojr.com/) and 3 precalculated impact measures from the 2007 JCR were added, bringing the total to 39 measures. A list of measures is provided in Table 1 along with information on the data they have been derived from and the various network factors that were applied in their calculation. A list of mathematical definitions is provided in Appendix S1.

The set of selected measures was intended to capture the major classes of statistics and social network measures presently proposed as alternatives to the JIF. In summary, the set of all measures can be categorized in 4 major classes. First, citation and usage statistics such as Citation Probability (number of one journal's citations over total citations), Usage Probability (amount of one journal's usage over total usage), the JIF, the Scimago Cites per Doc, and a Usage Impact Factor (UIF) whose definition follows that of the JIF but is based on usage counts. Second, citation and usage social network measures such as Closeness Centrality (the mean length of geodesics between a journal and all other journals), Betweenness Centrality (number of times that a journal sits on the geodesics between all pairs of journals) and PageRank (cf. Eigenvector Centrality). Third, a set of citation and usage degree centrality measures such as Out-Degree Centrality, In-Degree Centrality and Undirected Degree Centrality. Finally, we included a set of recently introduced measures such as the Scimago Journal Rank (SJR), the Y-factor

Spearman rank-order correlations were then calculated for each pair of journal rankings. Because C, U and the Scimago rankings pertained to slightly different sets of journals, correlation values were only calculated for the intersections of those sets, i.e. N = 7,388, N = 7,575 or N = 6,913 journals. For 39 measures, this resulted in a 39 × 39 correlation matrix R of which each entry $r_{i,j} \in [-1, 1]$ is the Spearman rank-order correlation between the journal rankings produced by measure i and measure j.
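The pairwise computation can be sketched in plain Python as follows. This is an illustrative version under stated assumptions: each measure is a dictionary from journal to score, ties receive average ranks, and the correlation is the Pearson correlation of the ranks. The helper names and the toy scores are invented.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(scores_a, scores_b):
    """Spearman correlation over the journals present in both measures."""
    shared = sorted(set(scores_a) & set(scores_b))
    ranks_a = average_ranks([scores_a[j] for j in shared])
    ranks_b = average_ranks([scores_b[j] for j in shared])
    return pearson(ranks_a, ranks_b)

# Toy measures that cover slightly different journal sets.
measure_a = {"J1": 3.0, "J2": 1.0, "J3": 2.0, "J4": 9.0}
measure_b = {"J1": 30.0, "J2": 10.0, "J3": 20.0, "J5": 5.0}
measure_c = {"J1": 1.0, "J2": 3.0, "J3": 2.0}
r_ab = spearman(measure_a, measure_b)
r_ac = spearman(measure_a, measure_c)
```

Only J1, J2 and J3 are shared by the toy measures, so the correlations are computed over those three journals, mirroring the use of intersections described above.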

A sample of matrix R for 10 selected measures is shown below. For example, the Spearman rank-order correlation between the Citation H-index and Usage PageRank is 0.66. The IDs listed in Table 1 precede each measure name.

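$$
R_{10 \times 10} =
\begin{pmatrix}
1.00 & 0.71 & 0.77 & 0.52 & 0.79 & 0.55 & 0.69 & 0.63 & 0.60 & 0.18 \\
0.71 & 1.00 & 0.52 & 0.69 & 0.79 & 0.85 & 0.49 & 0.44 & 0.49 & 0.22 \\
0.77 & 0.52 & 1.00 & 0.62 & 0.63 & 0.39 & 0.70 & 0.73 & 0.68 & 0.20 \\
0.52 & 0.69 & 0.62 & 1.00 & 0.68 & 0.78 & 0.49 & 0.56 & 0.65 & 0.06 \\
0.79 & 0.79 & 0.63 & 0.68 & 1.00 & 0.82 & 0.66 & 0.62 & 0.66 & 0.15 \\
0.55 & 0.85 & 0.39 & 0.78 & 0.82 & 1.00 & 0.40 & 0.40 & 0.50 & 0.13 \\
0.69 & 0.49 & 0.70 & 0.49 & 0.66 & 0.40 & 1.00 & 0.89 & 0.85 & 0.53 \\
0.63 & 0.44 & 0.73 & 0.56 & 0.62 & 0.40 & 0.89 & 1.00 & 0.97 & 0.45 \\
0.60 & 0.49 & 0.68 & 0.65 & 0.66 & 0.50 & 0.85 & 0.97 & 1.00 & 0.42 \\
0.18 & 0.22 & 0.20 & 0.06 & 0.15 & 0.13 & 0.53 & 0.45 & 0.42 & 1.00
\end{pmatrix}
\quad
\begin{matrix}
\text{19: Citation PageRank} \\
\text{5: Journal Impact Factor} \\
\text{22: Citation Betweenness} \\
\text{6: Citation Closeness} \\
\text{11: Citation H-index} \\
\text{1: Citation Scimago Journal Rank} \\
\text{31: Usage PageRank} \\
\text{34: Usage Betweenness} \\
\text{24: Usage Closeness} \\
\text{39: Usage Impact Factor}
\end{matrix}
$$
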
Not all pair-wise correlations were statistically significant.
Two measures in particular lacked significant correlations
(N = 39, p > 0.05) with any of the other measures, namely Citation
Half-Life and the UIF. They were for that reason removed from
the list of measures under consideration. All other Spearman
rank-order correlations were statistically significant (N = 39,
p < 0.05). The reduced 37 × 37 correlation matrix R was subjected
to a Principal Component Analysis [

The resulting PCA components were ranked according to the degree by which they explain the variances in R's values (eigenvalues transformed to component loadings). The component loadings are listed in Table 2. The first component, PC1, represents 66.1% of the variance in measure correlations, with each successive component representing less variance, i.e. PC2 17.3%, PC3 9.2% and PC4 4.8%. Retention of the first 2 components will thus yield a model that covers 83.4% of variance in measure correlations. The addition of the third component will yield a model that covers 92.6% of variation in measure correlations.

Table 2. Variance explained by each component and cumulative variance.

Component   Variance   Cumulative
PC1         66.1%      66.1%
PC2         17.3%      83.4%
PC3         9.2%       92.6%
PC4         4.8%       97.4%
PC5         0.9%       98.3%
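The cumulative figures can be reproduced from the per-component percentages, which also makes it easy to ask how many components are needed to reach a given coverage. The sketch below uses only the five values reported above; the function name is illustrative.

```python
from itertools import accumulate

# Percentage of variance explained by each component, as reported.
explained = {"PC1": 66.1, "PC2": 17.3, "PC3": 9.2, "PC4": 4.8, "PC5": 0.9}

# Running totals, rounded to one decimal to avoid floating-point noise.
cumulative = [round(total, 1) for total in accumulate(explained.values())]

def components_needed(threshold):
    """Smallest number of components whose cumulative variance reaches threshold."""
    for k, total in enumerate(cumulative, start=1):
        if total >= threshold:
            return k
    return None  # not reachable with the five listed components

two_component_coverage = cumulative[1]
needed_for_95 = components_needed(95.0)
```

With these numbers two components cover 83.4%, three cover 92.6%, and four are needed to pass 95%.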

We projected all measures onto the first two components, PC1 and PC2, to create a 2-dimensional map of measures. A varimax rotation was applied to the measure loadings to arrive at a structure that was more amenable to interpretation. The measure loadings for each component are listed in Table 1 ("PC1" and "PC2"). The resulting 2-dimensional map of measure similarities is shown in Fig. 2. Measures are identified in the map by their "ID" in Table 1. Black circles indicate citation-based measures. White circles indicate usage-based measures. The JIF is marked by a blue circle (ID 5). The hue of any map location indicates how strongly measures are concentrated in that particular area, i.e. red means highly clustered.
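For readers who want to see the mechanics, the sketch below extracts the leading components of a small symmetric correlation matrix by power iteration with deflation and converts them to loadings. It is an illustration under assumptions, not the authors' procedure: it eigendecomposes the correlation matrix directly, omits the varimax rotation, and uses an invented 2 × 2 matrix and hypothetical function names.

```python
def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def top_components(matrix, k, iterations=1000):
    """Leading k (eigenvalue, eigenvector) pairs of a symmetric PSD matrix."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    components = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-zero start vector
        for _ in range(iterations):
            w = mat_vec(work, v)
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        components.append((eigenvalue, v))
        # Deflation: remove the component just found before searching again.
        for i in range(n):
            for j in range(n):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return components

# Invented correlation matrix for two measures.
R = [[1.0, 0.5], [0.5, 1.0]]
components = top_components(R, 2)

# Share of variance per component: eigenvalue over the trace of R.
trace = sum(R[i][i] for i in range(len(R)))
explained = [value / trace for value, _ in components]

# Unrotated loadings: eigenvector entries scaled by sqrt(eigenvalue).
loadings = [
    [max(value, 0.0) ** 0.5 * vector[i] for value, vector in components]
    for i in range(len(R))
]
```

Each row of `loadings` gives one measure's coordinates on the extracted components, which is the kind of 2-dimensional position plotted in a map such as Fig. 2 before any rotation is applied.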

To cross-validate the PCA results, a hierarchical cluster analysis (single linkage, Euclidean distances over R's row vectors) and a k-means cluster analysis were applied to the measure correlations in R to identify clusters of measures that produce similar journal rankings.
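A naive version of single-linkage clustering over the row vectors of a correlation matrix can be sketched as follows. The matrix is a made-up 4 × 4 example and the function is a plain agglomerative loop written for clarity rather than speed; it is not the authors' code.

```python
from math import dist

def single_linkage(rows, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Cluster distance is the minimum Euclidean distance between any
    pair of member row vectors (single linkage).
    """
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    dist(rows[i], rows[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]

# Invented correlation matrix: measures 0 and 1 agree, as do 2 and 3.
R = [
    [1.00, 0.90, 0.20, 0.10],
    [0.90, 1.00, 0.30, 0.20],
    [0.20, 0.30, 1.00, 0.95],
    [0.10, 0.20, 0.95, 1.00],
]
groups = single_linkage(R, 2)
```

Measures whose rows of R look alike, meaning they correlate similarly with every other measure, end up in the same group.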

The map in Fig. 2 reveals a number of clusters. First, we observe
a cluster in the top right quadrant that contains all usage-based
measures (IDs 24–37), with the exception of Usage Probability (ID
38). In the upper-left and bottom-left quadrants of the map we find
most citation-based measures. The bottom-left quadrant contains
the JIF, which is surrounded by, among others, the Scimago Cites per
Doc, the Scimago Journal Rank, the JCR immediacy index (IDs
1–8) and in the upper section the various permutations of citation
degree centrality measures (IDs 9–10, 14–15), a group of Total
Cite measures (IDs 12–13) and most prominently the H-index (ID
11). The arrangement of the H-index and Citation Total Cites is
quite similar to that found by Leydesdorff (2007) [

A complete linkage hierarchical cluster analysis based on the Euclidean distances of R's row vectors confirms these general distinctions. When we cut the dendrogram in Fig. 3 at the 1.1 distance level, we find 4 main clusters. First, at the top of Fig. 3 we find the first cluster, which contains the JIF, SJR and other related measures that express citations normalized per document. Next, a second cluster contains the Citation Betweenness Centrality and PageRank measures that rely on the graph properties of the citation network. The third cluster contains Total Citation rates, various degree centralities and the H-index that express various distribution parameters of total citation counts. At the bottom of Fig. 3, we find the fourth cluster that contains all usage measures.

Table 3 lists the results of a 5 cluster k-means analysis of matrix R that further corroborates the observed clustering in the PCA and hierarchical cluster analysis.

The pattern of clusters indicates that some measures express a more distinct aspect of scientific impact and will thus be farther removed from all other measures. Table 1 lists the r values of each measure, defined as the mean Spearman rank-order correlation of a measure to all other 38 measures in R. The r of Citation Half-Life (ID 23) and the Usage Impact Factor (ID 39) fell below the significance threshold of p < 0.05 for N = 39, further justifying their removal as outliers. Most r values range from 0.6 to 0.7, indicating a moderate but significant congruence in the rankings produced by a majority of measures. However, a cluster of five particular measures has low r values in the range 0.5–0.6. They form a separate, but poorly defined cluster in the bottom-left quadrant of Fig. 2 (IDs 1–5: SJR, Immediacy Index, Citation Undirected Weighted Closeness Centrality, Scimago Cites per Doc, and the 2007 JIF), indicating they produce rankings removed from the "mainstream" in Fig. 2.

Interpretation

To interpret the meaning of PC1 and PC2 we need to investigate the distribution of measures along either axis of the map in Fig. 2. Fig. 4 shows a simplified schema of the distribution of impact measures along the PC1 and PC2 axes. Each of the observed clusters of measures has been given an intuitive "group" name to simplify the general pattern.

PC1 clearly separates usage measures from citation measures. On the positive end of PC1, we find a sharply demarcated cluster of all usage measures, with the exception of the Journal Use Probability (ID 38), which sits isolated on the extreme positive end of PC1. On the negative end of PC1, we find most citation measures. Surprisingly, some citation measures are positioned close to the cluster of usage measures in terms of their PC1 coordinates. Citation Closeness (ID 3) and in particular Citation Immediacy Index (ID 2) are located on the positive end of PC1, i.e. closest to the usage measures. The Citation Betweenness Centrality measures (IDs 21 and 22) are also positioned close to the cluster of usage measures according to PC1.

This particular distribution of citation measures along PC1
points to an interesting alternative to the interpretation of PC1 as simply
separating the usage from the citation measures. In the center, we
find Citation Immediacy Index (ID 2) positioned close to the
cluster of usage measures in terms of its PC1 coordinates. The
Citation Immediacy Index is intended to be a "rapid" indicator of
scientific impact since it is based on same-year citations. Its
proximity to the usage measures according to PC1 may thus
indicate that the usage measures are equally rapid indicators, if not
more so. The assumption that usage measures are "Rapid"
indicators of scientific impact is furthermore warranted for the
following reasons. First, usage log data is generally considered a
more "rapid" indicator of scientific impact than citation data,
since usage log data is nearly immediately affected by changes in
scientific habits and interests whereas citation data is subject to
extensive publication delays. It has in fact been shown that present
usage rates predict future citation rates [

PC2 separates citation statistics such as Scimago Total Cites
(ID 12), JIF (Table 1, ID 5) and Cites per Doc (ID 4) on its negative
end from the social network measures such as Citation
Betweenness centrality (IDs 21 and 22) and Citation PageRank
(IDs 16–19) including the Y-factor (ID 20) on its positive end.
Measures such as the JIF (ID 5), Scimago Total Cites (ID 12),
Journal Cite Probability (ID 13), and Journal Use Probability (ID
38) express the rate at which journals indiscriminately receive
citations or usage from a variety of sources, i.e. their Popularity,
whereas the mentioned social network measures rely on network
structure to express various facets of journal Prestige [

Consequently, the PCA results could be interpreted in terms of a separation of measures along two dimensions: "Rapid" vs. "Delayed" (PC1) and "Popularity" vs. "Prestige" (PC2). Surprisingly, most usage-based measures would then fall in the "Rapid", "Prestige" quadrant, approximated in this aspect only by two Citation Betweenness Centrality measures. The majority of citation-based measures can then be classified as "Delayed", but with the social network measures being indicative of aspects of "Prestige" and the normalized citation measures such as the JIF, Scimago Journal Rank (ID 1) and Cites per Doc indicative of journal "Popularity". We also note that the Scimago Journal Rank is positioned among measures such as the JIF and Cites per Doc. This indicates it too expresses "Delayed" "Popularity", in spite of the fact that SJR rankings originate from 2007 citation data and that the SJR has been explicitly defined to "transfer(s) (of) prestige from a journal to another one" (http://www.scimagojr.com/SCImagoJournalRank.pdf).

Another interesting aspect of the distribution of measures along PC1 and PC2 relates to the determination of a "consensus" view of scientific impact. The r values indicate the average Spearman rank-order correlation of a particular measure to all other measures, i.e. the degree to which it approximates the results of all other measures. The measure which best succeeds in approximating the most general sense of scholarly impact will therefore have the highest r and will be the best candidate for a "consensus" measure. As shown in Table 1, that measure would be Usage Closeness Centrality (ID 25), whose r = 0.731. Conversely, the Citation Scimago Journal Rank (ID 1), Citation Immediacy Index (ID 2), Citation Closeness Centrality (ID 3), Citation Cites per Doc (ID 4) and Citation Journal Impact Factor (ID 5) have the lowest r values, indicating that they represent the most particular view of scientific impact.
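The selection of a consensus measure amounts to averaging each row of R without its diagonal entry and taking the maximum. The sketch below shows this on an invented 3 × 3 matrix; the measure names and function name are placeholders, not values from Table 1.

```python
def mean_correlations(R, names):
    """Mean correlation of each measure with all other measures."""
    n = len(R)
    return {
        names[i]: sum(R[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    }

# Invented correlations between three placeholder measures.
names = ["measure_a", "measure_b", "measure_c"]
R = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.7],
    [0.6, 0.7, 1.0],
]
r_bar = mean_correlations(R, names)

# Highest mean correlation: closest to the consensus of all measures.
consensus = max(r_bar, key=r_bar.get)
# Lowest mean correlation: the most particular view of impact.
most_particular = min(r_bar, key=r_bar.get)
```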

The presented results pertain to what we believe to be the largest and most thorough survey of usage- and citation-based measures of scientific impact. Nevertheless, a number of issues need to be addressed in future research efforts.

First, although an attempt was made to establish a representative sample of existing and plausible scientific impact measures, several other conceivable impact measures could have been included in this analysis. For example, the HITS algorithm has been successfully applied to web page rankings. Like Google's PageRank it could be calculated for our citation and usage journal networks. Other possible measures that should be considered for inclusion include the Eigenfactor.org measures, and various information-theoretical indexes. The addition of more measures may furthermore enable statistical significance to be achieved on the correlations with now-removed measures such as Citation Half-Life and the Usage Impact Factor, so that they could be included on the generated PCA map of measures.

Second, we projected measure correlations onto a space spanned by the 2 highest-ranked components, the first of which seems to make a rather superficial distinction between usage- and citation-derived impact measures and the second of which seems to make a meaningful distinction between "degree" and "quality" of endorsement. Future analysis should focus on including additional components, different combinations of lower-valued components and even the smallest-valued components to determine whether they reveal additional useful distinctions. In addition, non-linear dimensionality reduction methods could be leveraged to reveal non-linear patterns of measure correlations.

Third, a significant number of the measures surveyed in this article have been standard tools for decades in social network analysis, but they are not in common use in the domain of scientific impact assessment. To increase the "face-validity" of these rankings, all have been made available to the public on the MESUR web site and can be freely explored and interacted with by users at the following URL: http://www.mesur.org/services.

Fourth, the implemented MESUR services can be enhanced to support the development of novel measures by allowing users to submit their own rankings, which can then automatically be placed in the context of existing measures. Such a service could foster the free and open exchange of scientific impact measures by allowing the public to evaluate where any newly proposed measure can be positioned among existing measures. If the measure is deemed too similar to existing measures, it need not be developed. If, however, it covers a part of the measure space that was previously unsampled, the new measure may make a significant contribution and could therefore be considered for wider adoption by those involved in scientific assessment.

Our results indicate that scientific impact is a multi-dimensional construct. The component loadings of a PCA indicate that 92% of the variances between the correlations of journal rankings produced by 37 impact measures can be explained by the first 3 components. To surpass the 95% limit, a 4-component model would have to be adopted.

A projection of measure correlations onto the first 2 components (83.4%) nevertheless reveals a number of useful distinctions. We found that the most salient distinction is made by PC1, which separates usage from citation measures with the exception of Citation Betweenness centrality and Citation Immediacy. The position of the latter and the time periods for which usage was recorded suggest an interpretation of PC1 as making a distinction between measures that provide a "rapid" vs "delayed" view of scientific impact.

PC2 seems to separate measures that express Popularity from those that express Prestige. Four general clusters of impact measures can be superimposed on this projection: (1) usage measures, (2) a group of distinctive yet dispersed measures expressing per document citation popularity, (3) measures based on total citation rates and distributions, and (4) finally a set of citation social network measures. These 4 clusters, along with the PCA components, allow us to quantitatively interpret the landscape of presently available impact measures and determine which aspects of scientific impact they represent. Future research will focus on determining whether these distinctions are stable across a greater variety of measures as well as other usage and citation data sets.

Four more general conclusions can be drawn from these results; each has significant implications for the developing science of scientific assessment.

First, the set of usage measures is more strongly correlated (average Spearman rank-order correlation = 0.93, incl. Usage Probability) than the set of citation measures (average Spearman rank-order correlation = 0.65). This indicates greater reliability among usage measures calculated from the same usage log data than among citation measures calculated from the same citation data. This effect is possibly caused by the significantly greater density of the usage matrix U in comparison to the citation matrix C. As mentioned in the introduction, the amount of usage data that can be collected is much higher than the total amount of citation data in existence because papers can contain only a limited set of citations and once they are published that set is fixed in perpetuity. This limitation may place an upper bound on the reliability that can be achieved with citation measures, but it does not apply to usage measures.
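The comparison above rests on averaging Spearman rank-order correlations over all pairs of measures within a set. The following sketch shows one way such an average could be computed, assuming each measure is given as a list of journal scores in the same journal order and that no measure is constant. The function names and the two toy sets of measures are invented for illustration and are not the data of this study.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_spearman(measures):
    """Average Spearman correlation over all unordered pairs of measures."""
    pairs = list(combinations(measures.values(), 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


# Toy scores for five journals under hypothetical measures.
tight_set = {"A": [1, 2, 3, 4, 5], "B": [1, 2, 3, 5, 4], "C": [2, 1, 3, 4, 5]}
loose_set = {"A": [1, 2, 3, 4, 5], "D": [3, 1, 2, 5, 4], "E": [5, 4, 3, 2, 1]}

print(mean_pairwise_spearman(tight_set))
print(mean_pairwise_spearman(loose_set))
```

A set whose measures order the journals similarly yields a mean close to 1, whereas a set containing measures that disagree yields a much lower mean.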

Second, if our interpretation of PC2 is correct, usage-based measures are actually stronger indicators of scientific Prestige than many presently available citation measures. Contrary to expectations, the JIF as well as the SJR most strongly express scientific Popularity.

Third, some citation measures are more closely related to their usage counterparts than they are to other citation measures such as the JIF. For example, the Spearman rank-order correlation between Citation Betweenness Centrality and Usage Betweenness Centrality is 0.747. In comparison, the Spearman rank-order correlation between the JIF and Citation Betweenness Centrality is only 0.52. This indicates that contrary to what would be expected, usage impact measures can be closer to a "consensus ranking" of journals than some common citation measures.

Fourth, and related, when we rank measures according to their average correlation to all other measures r, i.e. how close they are to all other measures, we find that the JIF and SJR rank 34th and 38th respectively among 39 measures, indicating their isolated position among the studied set of measures. The JCR Citation Immediacy Index and the Scimago Cites per Doc are in a similar position. On the other hand, Usage Closeness centrality (measure 25) is positioned closest to all other measures (max. r = 0.731). These results should give pause to those who consider the JIF the "golden standard" of scientific impact. Our results indicate that the JIF and SJR express a rather particular aspect of scientific impact that may not be at the core of the notion of scientific "impact". Usage-based measures such as Usage Closeness centrality may in fact be better "consensus" measures.
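Ranking measures by their average correlation to all other measures can be sketched as follows. The example assumes a symmetric table of pairwise correlations and excludes each measure's correlation with itself from its average; that exclusion is an assumption of this sketch. The measure names and values are toy data, not the correlations reported here.

```python
def rank_by_mean_correlation(corr):
    """Given a symmetric table {measure: {other: r}}, return a list of
    (measure, mean r to all other measures), highest mean first."""
    means = {}
    for name, row in corr.items():
        others = [r for other, r in row.items() if other != name]
        means[name] = sum(others) / len(others)
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Toy correlations among four hypothetical measures (not the study's values).
toy_corr = {
    "M1": {"M1": 1.0, "M2": 0.8, "M3": 0.7, "M4": 0.3},
    "M2": {"M1": 0.8, "M2": 1.0, "M3": 0.6, "M4": 0.2},
    "M3": {"M1": 0.7, "M2": 0.6, "M3": 1.0, "M4": 0.4},
    "M4": {"M1": 0.3, "M2": 0.2, "M3": 0.4, "M4": 1.0},
}

for position, (name, mean_r) in enumerate(rank_by_mean_correlation(toy_corr), 1):
    print(position, name, round(mean_r, 3))
```

A measure at the top of such a list agrees most with the set as a whole, while a measure at the bottom occupies an isolated position relative to the others.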

The ranking data produced to support the discussed Principal Component Analysis are available upon request from the corresponding author, with the exception of those that have been obtained under proprietary licenses.

Appendix S1. Found at: doi:10.1371/journal.pone.0006022.s001 (0.06 MB PDF)

Conceived and designed the experiments: JB. Performed the experiments: JB RC. Analyzed the data: JB AH RC. Wrote the paper: JB HVdS AH. Methodological consultancy: HVdS.