Editorial for the Bibliometric-enhanced Information Retrieval Workshop at ECIR 2014
This first "Bibliometric-enhanced Information Retrieval" (BIR 2014) workshop
aims to engage with the IR community about possible links to bibliometrics and
scholarly communication. Bibliometric techniques are not yet widely used to
enhance retrieval processes in digital libraries, although they offer
value-added effects for users. In this workshop we will explore how statistical
modelling of scholarship, such as Bradfordizing or network analysis of
co-authorship networks, can improve retrieval services for specific communities,
as well as for large, cross-domain collections. This workshop aims to raise
awareness of the missing link between information retrieval (IR) and
bibliometrics / scientometrics and to create a common ground for the
incorporation of bibliometric-enhanced services into retrieval at the digital
library interface. Our interests include information retrieval, information
seeking, science modelling, network analysis, and digital libraries. The goal
is to apply insights from bibliometrics, scientometrics, and informetrics to
concrete practical problems of information retrieval and browsing.
Comment: 4 pages, Bibliometric-enhanced Information Retrieval Workshop at ECIR
2014, Amsterdam, N
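Bradfordizing, mentioned in the abstract above, can be illustrated with a minimal sketch. It assumes the common reading of the technique: a result set is re-ranked so that documents from the journals contributing the most documents to that set (the "core" journals) come first. The function name `bradfordize`, the `(doc_id, journal)` input shape, and the sample data are hypothetical and not taken from the workshop paper.

```python
from collections import Counter

def bradfordize(results):
    """Re-rank results so documents from the most productive journals come first.

    `results` is a list of (doc_id, journal) pairs in their original ranking
    order (an assumed input format for this sketch).
    """
    # Count how many documents each journal contributes to this result set.
    journal_counts = Counter(journal for _, journal in results)
    # Sort by descending journal count. Python's sort is stable, so documents
    # from equally productive journals keep their original relative order.
    return sorted(results, key=lambda r: -journal_counts[r[1]])

# Hypothetical result set: three journals of differing productivity.
hits = [("d1", "J. Rare"), ("d2", "Core J."), ("d3", "Mid J."),
        ("d4", "Core J."), ("d5", "Mid J."), ("d6", "Core J.")]
reranked = bradfordize(hits)
```

Keeping the original order as a tie-breaker means the underlying relevance ranking still decides the order within each journal group.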
Bibliometric-enhanced Information Retrieval: 2nd International BIR Workshop
This workshop brings together experts from communities that have often been
perceived as different ones: bibliometrics / scientometrics / informetrics on
the one side and information retrieval on the other. Our motivation as
organizers of the workshop started from the observation that main discourses in
both fields are different, that communities are only partly overlapping and
from the belief that a knowledge transfer would be profitable for both sides.
Bibliometric techniques are not yet widely used to enhance retrieval processes
in digital libraries, although they offer value-added effects for users. On the
other side, more and more information professionals working in libraries and
archives are confronted with applying bibliometric techniques in their
services. This makes knowledge exchange more urgent. The first workshop
set the research agenda by introducing each other's methods, reporting on
current research problems, and brainstorming about common interests. This
follow-up workshop continues the overall communication but also puts one
problem into focus. In particular, we will explore how statistical
modelling of scholarship can improve retrieval services for specific
communities, as well as for large, cross-domain collections like Mendeley or
ResearchGate. This second BIR workshop continues to raise awareness of the
missing link between Information Retrieval (IR) and bibliometrics and
contributes to creating a common ground for the incorporation of
bibliometric-enhanced services into retrieval at the scholarly search engine
interface.
Comment: 4 pages, 37th European Conference on Information Retrieval, BIR
worksho
In Praise of Interdisciplinary Research through Scientometrics
The BIR workshop series fosters the revitalisation of dormant links between two fields in information science: information retrieval and bibliometrics/scientometrics. Hopefully, tightening these links will cross-fertilise both fields. I believe compelling research questions lie at the crossroads of scientometrics and other fields: not only information retrieval but also, for instance, psychology and sociology. This overview paper traces my endeavours to explore these field boundaries. I wish to communicate my enthusiasm about interdisciplinary research mediated by scientometrics and stress the opportunities offered to researchers in information science
Report on the Information Retrieval Festival (IRFest2017)
The Information Retrieval Festival took place in April 2017 in Glasgow. The focus of the workshop was to bring together IR researchers from the various Scottish universities and beyond in order to facilitate greater awareness, increased interaction, and reflection on the status of the field and its future. The program included an industry session, research talks, demos, and posters, as well as two keynotes. The first keynote was delivered by Prof. Jaana Kekäläinen, who provided a historical, critical reflection on realism in Interactive Information Retrieval experimentation, while the second keynote was delivered by Prof. Maarten de Rijke, who argued for more use of Artificial Intelligence in IR solutions and deployments. The workshop was followed by a "Tour de Scotland" in which delegates were taken from Glasgow to Aberdeen for the European Conference on Information Retrieval (ECIR 2017
The linguistic patterns and rhetorical structure of citation context: an approach using n-grams
Using the full-text corpus of more than 75,000 research articles published by seven PLOS journals, this paper
proposes a natural language processing approach for identifying the function of citations. Citation contexts are
assigned based on the frequency of n-gram co-occurrences located near the citations. Results show that the most
frequent linguistic patterns found in the citation contexts of papers vary according to their location in the IMRaD
structure of scientific articles. The presence of negative citations is also dependent on this structure. This
methodology offers new perspectives to locate these discursive forms according to the rhetorical structure of
scientific articles, and will lead to a better understanding of the use of citations in scientific articles
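The idea of counting n-grams near citations, grouped by article section, can be sketched as follows. This is only an illustration under stated assumptions, not the paper's pipeline: citations are assumed to appear as bracketed numbers such as `[12]`, the context is assumed to be a fixed window of words before the marker, and the names `citation_ngrams` and `ngrams_by_section` are hypothetical.

```python
import re
from collections import Counter

# Assumed citation marker style, e.g. "[12]"; real corpora use other styles too.
CITATION = re.compile(r"\[\d+\]")

def citation_ngrams(text, n=3, window=5):
    """Count word n-grams within `window` words before each citation marker."""
    tokens = text.split()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if CITATION.search(tok):
            # Lower-cased words immediately preceding the citation marker.
            context = [t.lower() for t in tokens[max(0, i - window):i]]
            for j in range(len(context) - n + 1):
                counts[" ".join(context[j:j + n])] += 1
    return counts

def ngrams_by_section(sections, n=3, window=5):
    """Map each section name (e.g. an IMRaD label) to its n-gram counts."""
    return {name: citation_ngrams(text, n, window)
            for name, text in sections.items()}

# Hypothetical two-sentence "Discussion" section.
article = {
    "Discussion": "These results are consistent with previous findings [1]. "
                  "Our data are consistent with earlier reports [2].",
}
profile = ngrams_by_section(article)
```

Comparing the most frequent n-grams per section is then a matter of calling `most_common()` on each counter.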
Citation metrics for legal information retrieval: scholars and practitioners intertwined?
This paper examines citations in legal documents in the context of bibliometric-enhanced legal information retrieval. It is suggested that users of legal information retrieval systems wish to see both scholarly and non-scholarly information, and legal information retrieval systems are developed to be used by both scholarly and non-scholarly users. Since the use of citations in building arguments plays an important role in the legal domain, bibliometric information (such as citations) is an instrument to enhance legal information retrieval systems. This paper examines, through literature and data analysis, whether a bibliometric-enhanced ranking for legal information retrieval should consider both scholarly and non-scholarly publications, and whether this ranking could serve both user groups, or whether a distinction needs to be made. Our literature analysis suggests that for legal documents, there is no strict separation between scholarly and non-scholarly documents. There is no clear mark by which the two groups can be separated, and insofar as a distinction can be made, the literature shows that both scholars and practitioners (non-scholars) use both types. We perform a data analysis to test this finding for legal information retrieval in practice, using citation and usage data from a legal search engine in the Netherlands. We first create a method to classify legal documents as either scholarly or non-scholarly based on criteria found in the literature. We then semi-automatically analyze a set of seed documents and register by what (type of) documents they are cited. This resulted in a set of 52 cited (seed) documents and 3086 citing documents. Based on the affiliation of users of the search engine, we analyzed the relation between user group and document type. Our data analysis confirms the literature analysis and shows many cross-citations between scholarly and non-scholarly documents.
In addition, we find that scholarly users often open non-scholarly documents and vice versa. Our results suggest that, for use in legal information retrieval systems, citations in legal documents measure part of a broad scope of impact, or relevance, on the entire legal field. This means that for bibliometric-enhanced ranking in legal information retrieval, both scholarly and non-scholarly documents should be considered. The disregard by both scholarly and non-scholarly users of the distinction between scholarly and non-scholarly publications also suggests that the affiliation of the user is not likely to be a suitable factor on which to differentiate rankings. The data in combination with the literature suggests that a differentiation on user intent might be more suitable.
Algorithms and the Foundations of Software technolog
Geographic information extraction from texts
A large volume of unstructured texts containing valuable geographic information is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss recent advances, new ideas, and concepts, and also to identify research gaps in geographic information extraction
Embedding models for supervised automatic extraction and classification of named entities in scientific acknowledgements
Acknowledgments in scientific papers may give an insight into aspects of the scientific community, such as reward systems, collaboration patterns, and hidden research trends. The aim of the paper is to evaluate the performance of different embedding models for the task of automatic extraction and classification of acknowledged entities from the acknowledgment text in scientific papers. We trained and implemented a named entity recognition (NER) task using the Flair NLP framework. The training was conducted using three default Flair NER models with four differently sized corpora and different versions of the Flair NLP framework. The Flair Embeddings model trained on the medium corpus with the latest Flair version showed the best accuracy of 0.79. Expanding the training corpus from very small to medium size massively increased the accuracy of all training algorithms, but further expansion of the training corpus did not bring further improvement; the performance of the model even deteriorated slightly. Our model is able to recognize six entity types: funding agency, grant number, individuals, university, corporation, and miscellaneous. The model works more precisely for some entity types than for others; individuals and grant numbers, for example, showed a very good F1-score of over 0.9. Most previous works on acknowledgment analysis were limited by the manual evaluation of data and therefore by the amount of processed data. This model can be applied for the comprehensive analysis of acknowledgment texts and may potentially make a great contribution to the field of automated acknowledgment analysis.
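Per-entity-type F1 scores of the kind reported above are commonly computed from exact span matches. The sketch below shows that calculation under assumptions: it is not the paper's evaluation code, the function name `f1_per_type` is hypothetical, and the type labels `IND`, `FUND`, and `GRNB` as well as the sample spans are invented for illustration.

```python
from collections import defaultdict

def f1_per_type(gold, predicted):
    """Exact-match F1 per entity type.

    `gold` and `predicted` are sets of (start, end, entity_type) spans
    (an assumed representation for this sketch).
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for span in predicted:
        if span in gold:
            tp[span[2]] += 1   # predicted span matches a gold span exactly
        else:
            fp[span[2]] += 1   # predicted span has no exact gold match
    for span in gold - predicted:
        fn[span[2]] += 1       # gold span the model missed
    scores = {}
    for t in set(tp) | set(fp) | set(fn):
        p = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        r = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        scores[t] = 2 * p * r / (p + r) if p + r else 0.0
    return scores

# Hypothetical spans: the FUND prediction has the wrong end offset.
gold = {(0, 2, "IND"), (5, 7, "FUND"), (9, 10, "GRNB")}
pred = {(0, 2, "IND"), (5, 6, "FUND"), (9, 10, "GRNB")}
scores = f1_per_type(gold, pred)
```

Exact matching is strict: a span with a wrong boundary counts as both a false positive and a false negative, which is one reason scores can differ noticeably between entity types.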