Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity
We present a new scientific document similarity model based on matching
fine-grained aspects of texts. To train our model, we exploit a
naturally-occurring source of supervision: sentences in the full-text of papers
that cite multiple papers together (co-citations). Such co-citations not only
reflect close paper relatedness, but also provide textual descriptions of how
the co-cited papers are related. This novel form of textual supervision is used
for learning to match aspects across papers. We develop multi-vector
representations where vectors correspond to sentence-level aspects of
documents, and present two methods for aspect matching: (1) A fast method that
only matches single aspects, and (2) a method that makes sparse multiple
matches with an Optimal Transport mechanism that computes an Earth Mover's
Distance between aspects. Our approach improves performance on document
similarity tasks in four datasets. Further, our fast single-match method
achieves competitive results, paving the way for applying fine-grained
similarity to large scientific corpora. Code, data, and models available at:
https://github.com/allenai/aspire
Comment: NAACL 2022 camera-ready
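The two matching strategies described above can be sketched with off-the-shelf tools. This is an illustrative stand-in, not the released implementation: sentence-level aspects are assumed to already be embedded as rows of a matrix, and an optimal one-to-one assignment approximates the paper's Optimal Transport mechanism.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def _normalize(X):
    # unit-normalize each aspect vector so dot products are cosine similarities
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def single_match(A, B):
    # fast single-match: similarity of the best-matching aspect pair only
    sim = _normalize(A) @ _normalize(B).T
    return float(sim.max())

def multi_match(A, B):
    # sparse multiple matches: optimal one-to-one assignment of aspects,
    # a simple stand-in for the Earth Mover's Distance computation
    sim = _normalize(A) @ _normalize(B).T
    rows, cols = linear_sum_assignment(-sim)  # maximize total similarity
    return float(sim[rows, cols].mean())
```

With identical aspect sets both scores approach 1; the multi-match score drops as more aspects lack a good counterpart, while single-match only reflects the best pair.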
Science Models as Value-Added Services for Scholarly Information Systems
The paper introduces scholarly Information Retrieval (IR) as a further
dimension that should be considered in the science modeling debate. The IR use
case is seen as a validation model of the adequacy of science models in
representing and predicting structure and dynamics in science. Particular
conceptualizations of scholarly activity and structures in science are used as
value-added search services to improve retrieval quality: a co-word model
depicting the cognitive structure of a field (used for query expansion), the
Bradford law of information concentration, and a model of co-authorship
networks (both used for re-ranking search results). An evaluation of
retrieval quality showed that the proposed science-model-driven services
do indeed improve retrieval quality.
From an IR perspective, the models studied are therefore verified as expressive
conceptualizations of central phenomena in science. Thus, it could be shown
that the IR perspective can significantly contribute to a better understanding
of scholarly structures and activities.
Comment: 26 pages, to appear in Scientometrics
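As a rough illustration of the co-word idea used for query expansion (not the paper's actual model; the toy corpus, function names, and scoring are invented), a query can be expanded with the terms that co-occur most often with its terms:

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(docs):
    # count how often two distinct terms appear in the same document
    co = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc.split())), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    return co

def expand_query(query_terms, co, k=2):
    # add the k terms that co-occur most strongly with the query terms
    scores = Counter()
    for t in query_terms:
        for (a, b), n in co.items():
            if a == t and b not in query_terms:
                scores[b] += n
    return list(query_terms) + [t for t, _ in scores.most_common(k)]
```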
Semantic Modelling of Citation Contexts for Context-Aware Citation Recommendation
Contents
The four CSV files are the data used for the evaluation in:
Saier T., Färber M. (2020) Semantic Modelling of Citation Contexts for Context-Aware Citation Recommendation. In: Advances in Information Retrieval. ECIR 2020. Lecture Notes in Computer Science, vol 12035.
DOI: 10.1007/978-3-030-45439-5_15
Code: github.com/IllDepence/ecir2020
The evaluation was conducted in a citation re-prediction setting.
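In a re-prediction setting, the originally cited document is hidden and a system must rank it highly among candidates given only the citation context. A common metric for this, sketched here with hypothetical inputs (the files above do not prescribe a specific metric), is recall@k:

```python
def recall_at_k(ranked_lists, gold_ids, k):
    # fraction of contexts whose originally cited document
    # appears among the top k re-predicted candidates
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_ids)
               if gold in ranked[:k])
    return hits / len(gold_ids)
```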
CSV Format
7 columns, separated by \u241E:
1. cited document ID
   - for *_nomarker.csv: citation marker position ambiguous
   - for *_withmarker.csv: citation marker position at 'MAINCIT' in citation context
2. adjacent cited document IDs
   - only given in citrec_unarxive_*.csv
   - divided by \u241F
   - order matches 'CIT' markers in citation context
3. citing document ID
4. citation context
5. MAG field of study IDs
   - divided by \u241F
6. predicate:argument tuples generated based on PredPatt
   - JSON
7. noun phrases
   - for *_nomarker.csv: divided by \u241F
   - for *_withmarker.csv: divided by \u241D into
     - noun phrases
     - noun phrase directly preceding citation marker
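The column layout above can be read with a short parser. This is a hypothetical reader sketched from the listing; the column order follows the list above, and splitting the withmarker noun-phrase part on \u241F is an assumption, not something the description states:

```python
import json

COL = "\u241E"   # column separator
ITEM = "\u241F"  # separates items within a column
PART = "\u241D"  # separates noun phrases from the phrase before the marker

def parse_row(line):
    # one line of a citrec_*.csv file -> dict (assumed column order)
    cited, adjacent, citing, context, fos, tuples, nps = \
        line.rstrip("\n").split(COL)
    if PART in nps:  # *_withmarker.csv layout
        np_list, np_before_marker = nps.split(PART)
    else:            # *_nomarker.csv layout
        np_list, np_before_marker = nps, None
    return {
        "cited_doc_id": cited,
        "adjacent_cited_doc_ids": adjacent.split(ITEM) if adjacent else [],
        "citing_doc_id": citing,
        "citation_context": context,
        "mag_fos_ids": fos.split(ITEM) if fos else [],
        "predpatt_tuples": json.loads(tuples) if tuples else None,
        "noun_phrases": np_list.split(ITEM) if np_list else [],
        "np_before_marker": np_before_marker,
    }
```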
Data Sources
citrec_unarxive_cs_withmarker.csv
  data set: unarXive
    Paper DOI: 10.1007/s11192-020-03382-z
    Data DOI: 10.5281/zenodo.2553522
  filter:
    - citing doc from computer science
    - cited doc is cited at least 5 times
citrec_mag_cs_en.csv
  data set: Microsoft Academic Graph (MAG)
    Paper DOI: 10.1145/2740908.2742839
  filter:
    - citing doc from computer science and in English
    - citing doc abstract given in MAG
    - cited doc is cited at least 50 times
citrec_refseer.csv
  data set: RefSeer
    Paper URL: ojs.aaai.org/index.php/AAAI/article/view/9528
    Data URL: psu.app.box.com/v/refseer
  filter:
    - title, venue, venuetype, abstract, and year not NULL for citing and cited docs
citrec_acl-arc_withmarker.csv
  data set: ACL ARC
    Paper URL: aclanthology.org/L08-1005
    Data URL: acl-arc.comp.nus.edu.sg/
  filter:
    - cited doc has a DBLP ID
Paper Citation
@inproceedings{Saier2020ECIR,
  author    = {Tarek Saier and
               Michael F{\"{a}}rber},
  title     = {{Semantic Modelling of Citation Contexts for Context-aware Citation Recommendation}},
  booktitle = {Proceedings of the 42nd European Conference on Information Retrieval},
  pages     = {220--233},
  year      = {2020},
  month     = apr,
  doi       = {10.1007/978-3-030-45439-5_15},
}
Citation recommendation: approaches and datasets
Citation recommendation describes the task of recommending citations for a given text. Due to the overload of published scientific works in recent years on the one hand, and the need to cite the most appropriate publications when writing scientific texts on the other hand, citation recommendation has emerged as an important research topic. In recent years, several approaches and evaluation data sets have been presented. However, to the best of our knowledge, no literature survey has been conducted explicitly on citation recommendation. In this article, we give a thorough introduction to automatic citation recommendation research. We then present an overview of the approaches and data sets for citation recommendation and identify differences and commonalities using various dimensions. Last but not least, we shed light on the evaluation methods, outline general challenges in evaluation, and discuss how to meet them. We restrict ourselves to citation recommendation for scientific publications, as this document type has been studied the most in this area. However, many of the observations and discussions included in this survey are also applicable to other types of text, such as news articles and encyclopedic articles.