2,312 research outputs found
The Symbiotic Relationship Between Information Retrieval and Informetrics
Informetrics and information retrieval (IR) represent fundamental areas of study within information science. Historically, researchers have not fully capitalized on the potential research synergies that exist between these two areas. Data sources used in traditional informetrics studies have their analogues in IR, with similar types of empirical regularities found in IR system content and use. Methods for data collection and analysis used in informetrics can help to inform IR system development and evaluation. Areas of application have included automatic indexing, index term weighting and understanding user query and session patterns through the quantitative analysis of user transaction logs. Similarly, developments in database technology have made the study of informetric phenomena less cumbersome, and recent innovations used in IR research, such as language models and ranking algorithms, provide new tools that may be applied to research problems of interest to informetricians. Building on the author’s previous work (Wolfram 2003), this paper reviews a sample of relevant literature published primarily since 2000 to highlight how each area of study may help to inform and benefit the other.
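The index-term-weighting connection mentioned above can be illustrated with a minimal tf-idf sketch, a standard IR weighting scheme rooted in the term-frequency regularities that informetrics studies. The toy corpus and term lists below are invented for illustration:

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Compute tf-idf weights for a tiny corpus given as lists of tokens."""
    n = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(t for doc in docs for t in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        # weight = raw term frequency * log inverse document frequency
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = [["retrieval", "ranking", "weighting"],
        ["retrieval", "citation"],
        ["citation", "impact"]]
w = tfidf_weights(docs)
```

Terms that occur in fewer documents (here "ranking") receive higher weights than widespread terms (here "retrieval"), which is the discriminating behavior term weighting relies on.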
An evaluation of Bradfordizing effects
The purpose of this paper is to apply and evaluate the bibliometric method Bradfordizing for information retrieval (IR) experiments. Bradfordizing is used for generating core document sets for subject-specific questions and to reorder result sets from distributed searches. The method is applied and tested in a controlled scenario of scientific literature databases from social and political sciences, economics, psychology and medical science (SOLIS, SoLit, USB Köln Opac, CSA Sociological Abstracts, World Affairs Online, Psyndex and Medline) and 164 standardized topics. An evaluation of the method and its effects is carried out in two laboratory-based information retrieval experiments (CLEF and KoMoHe) using a controlled document corpus and human relevance assessments. The results show that Bradfordizing is a very robust method for re-ranking the main document types (journal articles and monographs) in today’s digital libraries (DL). The IR tests show that relevance distributions after re-ranking improve at a significant level if articles in the core are compared with articles in the succeeding zones. The items in the core are significantly more often assessed as relevant than items in zone 2 (z2) or zone 3 (z3). The improvements between the zones are statistically significant based on the Wilcoxon signed-rank test and the paired t-test.
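The core/zone re-ranking idea behind Bradfordizing can be sketched as follows: count hits per journal, rank journals by productivity, split the ranked list into three Bradford zones of roughly equal article counts, and reorder results core-first. This is a minimal sketch assuming each result carries a journal name; the result tuples below are invented:

```python
from collections import Counter

def bradfordize(results):
    """Re-rank (doc_id, journal) results so core-journal articles come first.

    Journals are ranked by number of hits; three Bradford zones with roughly
    equal article counts are formed, and results are reordered zone by zone.
    """
    counts = Counter(j for _, j in results)
    ranked = [j for j, _ in counts.most_common()]
    total = len(results)
    zones, cum, zone_idx = {}, 0, 0
    for j in ranked:
        zones[j] = zone_idx
        cum += counts[j]
        # advance to the next zone once a third of all articles is covered
        while zone_idx < 2 and cum >= (zone_idx + 1) * total / 3:
            zone_idx += 1
    # stable sort keeps the original order within each zone
    return sorted(results, key=lambda r: zones[r[1]])

results = [("c1", "C"), ("a1", "A"), ("b1", "B"), ("a2", "A"), ("d1", "D"),
           ("a3", "A"), ("a4", "A"), ("b2", "B"), ("a5", "A"), ("a6", "A")]
out = bradfordize(results)
```

Journal A contributes six of ten hits, so it forms the core zone and its articles are moved to the top of the list.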
Science Models as Value-Added Services for Scholarly Information Systems
The paper introduces scholarly Information Retrieval (IR) as a further
dimension that should be considered in the science modeling debate. The IR use
case is seen as a validation model of the adequacy of science models in
representing and predicting structure and dynamics in science. Particular
conceptualizations of scholarly activity and structures in science are used as
value-added search services to improve retrieval quality: a co-word model
depicting the cognitive structure of a field (used for query expansion), the
Bradford law of information concentration, and a model of co-authorship
networks (both used for re-ranking search results). An evaluation of the
retrieval quality when science-model-driven services are used showed that the
proposed models indeed improve retrieval quality. From an IR perspective, the
models studied are therefore verified as expressive conceptualizations of
central phenomena in science. Thus, the IR perspective can contribute
significantly to a better understanding of scholarly structures and activities.
Comment: 26 pages, to appear in Scientometrics
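The co-word service used for query expansion can be sketched as follows: build term co-occurrence counts over a document set, then add to the query the terms most strongly co-occurring with its terms. This is a toy approximation assuming documents are simple term lists, not the actual service evaluated in the paper:

```python
from collections import defaultdict
from itertools import combinations

def build_coword(docs):
    """Count co-occurrences between terms appearing in the same document."""
    co = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def expand_query(query_terms, co, k=2):
    """Add the k terms most strongly co-occurring with the query terms."""
    scores = defaultdict(int)
    for t in query_terms:
        for other, c in co[t].items():
            if other not in query_terms:
                scores[other] += c
    extra = sorted(scores, key=scores.get, reverse=True)[:k]
    return list(query_terms) + extra

docs = [["retrieval", "ranking", "evaluation"],
        ["retrieval", "ranking"],
        ["ranking", "citation"]]
expanded = expand_query(["retrieval"], build_coword(docs), k=1)
```

Here "ranking" co-occurs with "retrieval" in two documents, so it is the first term added, mirroring the idea of expanding a query along the cognitive structure of a field.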
Scopus's Source Normalized Impact per Paper (SNIP) versus a Journal Impact Factor based on Fractional Counting of Citations
Impact factors (and similar measures such as the Scimago Journal Rankings)
suffer from two problems: (i) citation behavior varies among fields of science
and therefore leads to systematic differences, and (ii) there are no statistics
to inform us whether differences are significant. The recently introduced SNIP
indicator of Scopus tries to remedy the first of these two problems, but a
number of normalization decisions are involved which makes it impossible to
test for significance. Using fractional counting of citations (based on the
assumption that impact is proportionate to the number of references in the
citing documents), citations can be contextualized at the paper level and
aggregated impacts of sets can be tested for significance. It can be shown
that the weighted impact of Annals of Mathematics (0.247) is not much lower
than that of Molecular Cell (0.386), despite a five-fold difference between
their impact factors (2.793 and 13.156, respectively).
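Fractional counting of citations can be sketched directly: each citation is weighted by the inverse of the citing document's reference count, so a citation from a reference-heavy paper counts for less. The data structures below are invented for illustration:

```python
def fractional_impact(citations, ref_counts):
    """Fractionally counted impact per cited paper.

    citations:  dict mapping a cited paper -> list of citing documents
    ref_counts: dict mapping a citing document -> its number of references
    Each citation contributes 1 / (references in the citing document).
    """
    return {p: sum(1.0 / ref_counts[c] for c in citing)
            for p, citing in citations.items()}

citations = {"p1": ["c1", "c2"], "p2": ["c1"]}
ref_counts = {"c1": 10, "c2": 5}
impact = fractional_impact(citations, ref_counts)
```

Because every citation carries an explicit weight, impacts of sets of papers become distributions that can be compared with standard significance tests, which is the property the abstract highlights.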
Editorial for the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics
The workshop "Mining Scientific Papers: Computational Linguistics and
Bibliometrics" (CLBib 2015), co-located with the 15th International Society of
Scientometrics and Informetrics Conference (ISSI 2015), brought together
researchers in Bibliometrics and Computational Linguistics in order to study
the ways Bibliometrics can benefit from large-scale text analytics and sense
mining of scientific papers, thus exploring the interdisciplinarity of
Bibliometrics and Natural Language Processing (NLP). The goals of the workshop
were to answer questions like: How can we enhance author network analysis and
Bibliometrics using data obtained by text analytics? What insights can NLP
provide on the structure of scientific writing, on citation networks, and on
in-text citation analysis? This workshop is a first step toward fostering
reflection on this interdisciplinarity and the benefits that the two
disciplines, Bibliometrics and Natural Language Processing, can derive from it.
Comment: 4 pages, Workshop on Mining Scientific Papers: Computational
Linguistics and Bibliometrics at ISSI 2015
DataCite as a novel bibliometric source: Coverage, strengths and limitations
This paper explores the characteristics of DataCite to determine its
possibilities and potential as a new bibliometric data source to analyze the
scholarly production of open data. Open science and the increasing data sharing
requirements from governments, funding bodies, institutions and scientific
journals have led to a pressing demand for the development of data metrics. As a
very first step towards reliable data metrics, we need to better comprehend the
limitations and caveats of the information provided by sources of open data. In
this paper, we critically examine records downloaded from DataCite's OAI
API and elaborate a series of recommendations regarding the use of this source
for bibliometric analyses of open data. We highlight issues related to metadata
incompleteness, lack of standardization, and ambiguous definitions of several
fields. Despite these limitations, we emphasize DataCite's value and potential
to become one of the main sources for data metrics development.
Comment: Paper accepted for publication in Journal of Informetrics
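A metadata completeness check of the kind such recommendations rest on can be sketched as below. The records and field names are invented for illustration; real DataCite records follow the DataCite metadata schema:

```python
def completeness_report(records, fields):
    """Fraction of records with a non-empty value for each metadata field."""
    n = len(records)
    return {f: sum(1 for r in records if r.get(f)) / n for f in fields}

records = [{"title": "Dataset A", "creator": ""},
           {"title": "Dataset B", "creator": "Doe, J."}]
report = completeness_report(records, ["title", "creator"])
```

Low completeness fractions flag exactly the kind of fields whose sparse or ambiguous use the paper warns bibliometric analyses about.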
Usage History of Scientific Literature: Nature Metrics and Metrics of Nature Publications
In this study, we analyze the dynamic usage history of Nature publications
over time using Nature metrics data. We conduct analysis from two perspectives.
On the one hand, we examine how long it takes before the articles' downloads
reach 50%/80% of the total; on the other hand, we compare the percentage of
total downloads in 7 days, 30 days, and 100 days after publication. In general,
papers are downloaded most frequently within a short time period right after
their publication. We find that, compared with non-Open Access papers,
readers' attention to Open Access publications is more enduring. Based on the
usage data of a newly published paper, regression analysis can predict the
expected total usage counts.
Comment: 11 pages, 5 figures and 4 tables
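The two perspectives described above can be computed from a daily download series as follows. This is a sketch over invented data; Nature's actual usage metrics are reported in a different form:

```python
def usage_metrics(daily_downloads):
    """Days until downloads reach 50% / 80% of the total, plus the share of
    total downloads accumulated within the first 7, 30, and 100 days."""
    total = sum(daily_downloads)
    cum, day50, day80 = 0, None, None
    for day, d in enumerate(daily_downloads, start=1):
        cum += d
        if day50 is None and cum >= 0.5 * total:
            day50 = day
        if day80 is None and cum >= 0.8 * total:
            day80 = day
    share = {w: sum(daily_downloads[:w]) / total for w in (7, 30, 100)}
    return day50, day80, share

# a paper downloaded mostly right after publication, as the abstract describes
daily = [50, 20, 10, 5, 5, 5, 5]
day50, day80, share = usage_metrics(daily)
```

A front-loaded series like this reaches 50% of its downloads on day one, matching the observation that papers are downloaded most heavily right after publication.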
A comparison of two techniques for bibliometric mapping: Multidimensional scaling and VOS
VOS is a new mapping technique that can serve as an alternative to the
well-known technique of multidimensional scaling. We present an extensive
comparison between the use of multidimensional scaling and the use of VOS for
constructing bibliometric maps. In our theoretical analysis, we show the
mathematical relation between the two techniques. In our experimental analysis,
we use the techniques for constructing maps of authors, journals, and keywords.
Two commonly used approaches to bibliometric mapping, both based on
multidimensional scaling, turn out to produce maps that suffer from artifacts.
Maps constructed using VOS turn out not to have this problem. We conclude that,
in general, maps constructed using VOS provide a more satisfactory
representation of a data set than maps constructed using well-known
multidimensional scaling approaches.
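While the paper derives the exact mathematical relation between the two techniques, the flavor of a VOS-style layout can be sketched as gradient descent on similarity-weighted squared distances, with a rescaling step standing in for VOS's constraint on the mean pairwise distance. This is a toy approximation under those assumptions, not the actual VOS algorithm:

```python
import math
import random

def vos_layout(sim, dim=2, steps=500, lr=0.05, seed=0):
    """Toy VOS-style layout: pull similar items together by gradient descent,
    rescaling each step so the mean pairwise distance stays at 1."""
    rnd = random.Random(seed)
    n = len(sim)
    x = [[rnd.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        # gradient of sum_ij sim[i][j] * ||x_i - x_j||^2 (constant factor in lr)
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    for k in range(dim):
                        grad[i][k] += sim[i][j] * (x[i][k] - x[j][k])
        for i in range(n):
            for k in range(dim):
                x[i][k] -= lr * grad[i][k]
        # rescale so the mean pairwise distance is 1 (prevents total collapse)
        dists = [math.dist(x[i], x[j]) for i in range(n) for j in range(i + 1, n)]
        scale = sum(dists) / len(dists)
        x = [[c / scale for c in p] for p in x]
    return x

# two strongly linked pairs, (0,1) and (2,3), with weak cross-links
sim = [[0, 1, .1, .1], [1, 0, .1, .1], [.1, .1, 0, 1], [.1, .1, 1, 0]]
pos = vos_layout(sim)
```

After optimization the strongly linked pairs sit close together while the weakly linked pairs stay apart, which is the qualitative behavior a similarity-based map should show.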