ScholarSight: Visualizing Temporal Trends of Scientific Concepts
2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), June 2 to June 6, 2019, Champaign, IL, USA.
In this paper, we present a system for exploring the temporal trends of scientific concepts. Scientific concepts were captured by extracting noun phrases and entities from all computer science papers of arXiv.org. Our system allows users to review the time series of numerous concepts and to identify positively and negatively trending concepts. By applying clustering techniques and cluster analysis visualizations, it can also present concepts which share the same usage patterns over time. Our system can be beneficial both for ordinary researchers of any field and for researchers working in bibliometrics and scientometrics who want to investigate the evolution of scientific concepts.
Citation Recommendation on Scholarly Legal Articles
Citation recommendation is the task of finding appropriate citations based on
a given piece of text. The proposed datasets for this task consist mainly of
several scientific fields, lacking some core ones, such as law. Furthermore,
citation recommendation is used within the legal domain to identify supporting
arguments, utilizing non-scholarly legal articles. In order to alleviate the
limitations of existing studies, we gather the first scholarly legal dataset
for the task of citation recommendation. Also, we conduct experiments with
state-of-the-art models and compare their performance on this dataset. The
study suggests that, while BM25 is a strong benchmark for the legal citation
recommendation task, the most effective method involves implementing a two-step
process that entails pre-fetching with BM25+, followed by re-ranking with
SciNCL, which enhances the performance of the baseline from 0.26 to 0.30
MAP@10. Moreover, fine-tuning leads to considerable performance increases in
pre-trained models, which shows the importance of including legal articles in
the training data of these models.
Comment: Seventeenth International Workshop on Juris-informatics (JURISIN 2023)
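The two-step process described in this abstract (pre-fetching candidates with BM25+, re-ranking them, and evaluating with MAP@10) can be illustrated with a minimal sketch. This is not the authors' implementation. The whitespace tokenizer, the parameter defaults, the function names, and the toy documents are all assumptions made for illustration. In particular, `overlap_rerank_score` is a hypothetical stand-in for a SciNCL-based re-ranker, which would compare document embeddings rather than word overlap, and the MAP@10 normalization shown is only one common variant.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


def bm25_plus_scores(query, docs, k1=1.2, b=0.75, delta=1.0):
    """Score every document against the query with BM25+."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # only terms shared by query and document contribute
            idf = math.log((n_docs + 1) / doc_freq[term])
            score += idf * (tf[term] * (k1 + 1) / (tf[term] + norm) + delta)
        scores.append(score)
    return scores


def overlap_rerank_score(query, doc):
    # Hypothetical stand-in for an embedding-based re-ranker such as SciNCL:
    # Jaccard overlap between the query and document vocabularies.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def recommend(query, docs, prefetch_k=100, rerank_score=overlap_rerank_score):
    """Pre-fetch the top candidates with BM25+, then re-rank only those."""
    first_stage = bm25_plus_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: -first_stage[i])[:prefetch_k]
    return sorted(candidates, key=lambda i: -rerank_score(query, docs[i]))


def average_precision_at_k(ranked, relevant, k=10):
    """AP@k, normalized by min(number of relevant items, k)."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at_k(rankings, relevant_sets, k=10):
    """MAP@k: the mean of AP@k over all queries."""
    aps = [average_precision_at_k(r, rel, k) for r, rel in zip(rankings, relevant_sets)]
    return sum(aps) / len(aps)


# Toy usage with made-up documents (not taken from the dataset in the paper).
toy_docs = [
    "contract law liability damages",
    "tort law negligence liability",
    "neural networks image classification",
    "criminal procedure evidence rules",
]
ranking = recommend("liability in contract law", toy_docs, prefetch_k=2)
print(ranking, average_precision_at_k(ranking, {0}))
```

Restricting the second stage to the pre-fetched candidates keeps the more expensive re-ranker off the full collection, which is the usual motivation for a two-stage design of this kind.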
unarXive: a large scholarly data set with publications’ full-text, annotated in-text citations, and links to metadata
In recent years, scholarly data sets have been used for various purposes, such as paper recommendation, citation recommendation, citation context analysis, and citation context-based document summarization. The evaluation of approaches to such tasks and their applicability in real-world scenarios heavily depend on the used data set. However, existing scholarly data sets are limited in several regards.
Here, we propose a new data set based on all publications from all scientific disciplines available on arXiv.org. Apart from providing the papers' plain text, in-text citations were annotated via global identifiers. Furthermore, citing and cited publications were linked to the Microsoft Academic Graph, providing access to rich metadata. Our data set consists of over one million documents and 29.2 million citation contexts. The data set, which is made freely available for research purposes, can not only enhance the future evaluation of research paper-based and citation context-based approaches but also serve as a basis for new ways to analyze in-text citations.
See https://github.com/IllDepence/unarXive for the source code which has been used for creating the data set.
To cite our data set and for further information, please refer to our journal article:
Tarek Saier, Michael Färber: "unarXive: A Large Scholarly Data Set with Publications’ Full-Text, Annotated In-Text Citations, and Links to Metadata", Scientometrics, 2020, http://dx.doi.org/10.1007/s11192-020-03382-z
Citation Recommendation: Approaches and Datasets
Citation recommendation describes the task of recommending citations for a
given text. Due to the overload of published scientific works in recent years
on the one hand, and the need to cite the most appropriate publications when
writing scientific texts on the other hand, citation recommendation has emerged
as an important research topic. In recent years, several approaches and
evaluation data sets have been presented. However, to the best of our
knowledge, no literature survey has been conducted explicitly on citation
recommendation. In this article, we give a thorough introduction into automatic
citation recommendation research. We then present an overview of the approaches
and data sets for citation recommendation and identify differences and
commonalities using various dimensions. Last but not least, we shed light on
the evaluation methods, and outline general challenges in the evaluation and
how to meet them. We restrict ourselves to citation recommendation for
scientific publications, as this document type has been studied the most in
this area. However, many of the observations and discussions included in this
survey are also applicable to other types of text, such as news articles and
encyclopedic articles.
Comment: to be published in the International Journal on Digital Libraries