
    Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity

    We present a new scientific document similarity model based on matching fine-grained aspects of texts. To train our model, we exploit a naturally-occurring source of supervision: sentences in the full-text of papers that cite multiple papers together (co-citations). Such co-citations not only reflect close paper relatedness, but also provide textual descriptions of how the co-cited papers are related. This novel form of textual supervision is used for learning to match aspects across papers. We develop multi-vector representations where vectors correspond to sentence-level aspects of documents, and present two methods for aspect matching: (1) A fast method that only matches single aspects, and (2) a method that makes sparse multiple matches with an Optimal Transport mechanism that computes an Earth Mover's Distance between aspects. Our approach improves performance on document similarity tasks in four datasets. Further, our fast single-match method achieves competitive results, paving the way for applying fine-grained similarity to large scientific corpora. Code, data, and models available at: https://github.com/allenai/aspire
    Comment: NAACL 2022 camera-read

    Science Models as Value-Added Services for Scholarly Information Systems

    The paper introduces scholarly Information Retrieval (IR) as a further dimension that should be considered in the science modeling debate. The IR use case is seen as a validation model of the adequacy of science models in representing and predicting structure and dynamics in science. Particular conceptualizations of scholarly activity and structures in science are used as value-added search services to improve retrieval quality: a co-word model depicting the cognitive structure of a field (used for query expansion), the Bradford law of information concentration, and a model of co-authorship networks (both used for re-ranking search results). An evaluation of retrieval quality with science-model-driven services showed that the proposed models do improve retrieval quality. From an IR perspective, the models studied are therefore verified as expressive conceptualizations of central phenomena in science. Thus, it could be shown that the IR perspective can significantly contribute to a better understanding of scholarly structures and activities.
    Comment: 26 pages, to appear in Scientometric

    Semantic Modelling of Citation Contexts for Context-Aware Citation Recommendation

    Contents
    The four CSV files are the data used for the evaluation in: Saier T., Färber M. (2020) Semantic Modelling of Citation Contexts for Context-Aware Citation Recommendation. In: Advances in Information Retrieval. ECIR 2020. Lecture Notes in Computer Science, vol 12035. DOI: 10.1007/978-3-030-45439-5_15
    Code: github.com/IllDepence/ecir2020
    The evaluation was conducted in a citation re-prediction setting.
    CSV Format
    7 columns divided by \u241E
        1. cited document ID
            for *_nomarker.csv: citation marker position ambiguous
            for *_withmarker.csv: citation marker position at 'MAINCIT' in citation context
        2. adjacent cited document IDs
            only given in citrec_unarxive_*.csv
            divided by \u241F
            order matches 'CIT' markers in citation context
        3. citing document ID
        4. citation context
        5. MAG field of study IDs
            divided by \u241F
        6. predicate:argument tuples
            generated based on PredPatt
            JSON
        7. noun phrases
            for *_nomarker.csv: divided by \u241F
            for *_withmarker.csv: divided by \u241D into
                noun phrases
                noun phrase directly preceding citation marker
    Data Sources
        citrec_unarxive_cs_withmarker.csv
            data set: unarXive
                Paper DOI: 10.1007/s11192-020-03382-z
                Data DOI: 10.5281/zenodo.2553522
            filter:
                citing doc from computer science
                cited doc is cited at least 5 times
        citrec_mag_cs_en.csv
            data set: Microsoft Academic Graph (MAG)
                Paper DOI: 10.1145/2740908.2742839
            filter:
                citing doc from computer science and in English
                citing doc abstract in MAG given
                cited doc is cited at least 50 times
        citrec_refseer.csv
            data set: RefSeer
                Paper URL: ojs.aaai.org/index.php/AAAI/article/view/9528
                Data URL: psu.app.box.com/v/refseer
            filter:
                for citing and cited docs: title, venue, venuetype, abstract, and year not NULL
        citrec_acl-arc_withmarker.csv
            data set: ACL ARC
                Paper URL: aclanthology.org/L08-1005
                Data URL: acl-arc.comp.nus.edu.sg/
            filter:
                cited doc has a DBLP ID
    Paper Citation
        @inproceedings{Saier2020ECIR,
          author = {Tarek Saier and Michael F{\"{a}}rber},
          title = {{Semantic Modelling of Citation Contexts for Context-aware Citation Recommendation}},
          booktitle = {Proceedings of the 42nd European Conference on Information Retrieval},
          pages = {220--233},
          year = {2020},
          month = apr,
          doi = {10.1007/978-3-030-45439-5_15},
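    The column layout described above can be read with plain string splitting. The following is a minimal sketch, not the authors' loader: it assumes one record per line with no header row and no quoting, and the names `CitationRow`, `split_units`, and `parse_row` are hypothetical. The predicate:argument column is left as a raw JSON string and the noun-phrase column is only split on \u241D, because the description does not spell out their inner structure further.

```python
from dataclasses import dataclass
from typing import List

RECORD_SEP = "\u241E"  # divides the 7 columns
UNIT_SEP = "\u241F"    # divides items inside a column
GROUP_SEP = "\u241D"   # *_withmarker.csv only: divides the noun-phrase column


@dataclass
class CitationRow:
    """One record; field names are illustrative, not taken from the data set."""
    cited_id: str
    adjacent_cited_ids: List[str]
    citing_id: str
    context: str
    fos_ids: List[str]
    pred_arg_tuples_json: str      # raw JSON string, left unparsed here
    noun_phrase_groups: List[str]  # 1 raw group (nomarker) or 2 (withmarker)


def split_units(field: str) -> List[str]:
    """Split a column on the unit separator, dropping empty items."""
    return [item for item in field.split(UNIT_SEP) if item]


def parse_row(line: str) -> CitationRow:
    """Parse one line, assuming exactly 7 columns divided by RECORD_SEP."""
    columns = line.rstrip("\n").split(RECORD_SEP)
    if len(columns) != 7:
        raise ValueError(f"expected 7 columns, got {len(columns)}")
    cited, adjacent, citing, context, fos, tuples_json, phrases = columns
    return CitationRow(
        cited_id=cited,
        adjacent_cited_ids=split_units(adjacent),
        citing_id=citing,
        context=context,
        fos_ids=split_units(fos),
        pred_arg_tuples_json=tuples_json,
        noun_phrase_groups=phrases.split(GROUP_SEP),
    )


if __name__ == "__main__":
    # Synthetic record for illustration only; not taken from the data set.
    sample = RECORD_SEP.join([
        "cited-1",
        UNIT_SEP.join(["cited-2", "cited-3"]),
        "citing-1",
        "as shown in MAINCIT and CIT CIT",
        UNIT_SEP.join(["111", "222"]),
        "[]",
        UNIT_SEP.join(["first phrase", "second phrase"]),
    ])
    print(parse_row(sample))
```

    For the *_nomarker.csv files, applying `split_units` to the single entry of `noun_phrase_groups` yields the individual noun phrases.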

    Citation recommendation: approaches and datasets

    Citation recommendation describes the task of recommending citations for a given text. Due to the overload of published scientific works in recent years on the one hand, and the need to cite the most appropriate publications when writing scientific texts on the other hand, citation recommendation has emerged as an important research topic. In recent years, several approaches and evaluation data sets have been presented. However, to the best of our knowledge, no literature survey has been conducted explicitly on citation recommendation. In this article, we give a thorough introduction to automatic citation recommendation research. We then present an overview of the approaches and data sets for citation recommendation and identify differences and commonalities using various dimensions. Last but not least, we shed light on the evaluation methods and outline general challenges in the evaluation and how to meet them. We restrict ourselves to citation recommendation for scientific publications, as this document type has been studied the most in this area. However, many of the observations and discussions included in this survey are also applicable to other types of text, such as news articles and encyclopedic articles.