
79 research outputs found

Construction of the Literature Graph in Semantic Scholar

Authors: Ammar Waleed, Beltagy Iz, Bhagavatula Chandra, Crawford Miles, Downey Doug, Dunkelberger Jason, Elgohary Ahmed, Etzioni Oren, Feldman Sergey, Groeneveld Dirk, Ha Vu, Kinney Rodney, Kohlmeier Sebastian, Lo Kyle, Murray Tyler, Ooi Hsu-Han, Peters Matthew, Power Joanna, Skjonsberg Sam, van Zuylen Madeleine, Wang Lucy Lu, Wilhelm Chris, Yuan Zheng
Publication date: 01/01/2018

We describe a deployed, scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers, authors, entities and various interactions between them (e.g., authorships, citations, entity mentions). We reduce literature graph construction to familiar NLP tasks (e.g., entity extraction and linking), point out research challenges that arise from differences with the standard formulations of these tasks, and report empirical results for each task. The methods described in this paper are used to enable semantic features in www.semanticscholar.org.

Comment: To appear in NAACL 2018 industry track

Sources: arXiv.org e-Print Archive, Crossref
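To make the idea of a heterogeneous literature graph concrete, here is a minimal in-memory sketch, assuming one node type per kind of record (paper, author, entity) and named, directed edges for the interactions the abstract lists (authorship, citation, entity mention). The class names, relation strings and toy identifiers are hypothetical illustrations; this is not the deployed Semantic Scholar system, which operates at the scale of 280M+ nodes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str  # hypothetical node types: "paper", "author", or "entity"
    key: str   # hypothetical identifier (title, name, or entity string)


@dataclass
class LiteratureGraph:
    # Adjacency map: source node -> relation name -> set of target nodes.
    edges: dict = field(default_factory=lambda: defaultdict(lambda: defaultdict(set)))
    nodes: set = field(default_factory=set)

    def add_edge(self, src: Node, relation: str, dst: Node) -> None:
        """Record a typed, directed edge such as authorship or citation."""
        self.nodes.update((src, dst))
        self.edges[src][relation].add(dst)

    def neighbors(self, src: Node, relation: str) -> set:
        """Targets reachable from src over one edge of the given relation."""
        # .get() avoids creating empty entries in the defaultdicts.
        return set(self.edges.get(src, {}).get(relation, set()))

    def num_edges(self) -> int:
        return sum(len(targets) for rels in self.edges.values() for targets in rels.values())


# Hypothetical toy data: one author, two papers, one extracted entity.
author = Node("author", "author-1")
paper_a = Node("paper", "paper-A")
paper_b = Node("paper", "paper-B")
entity = Node("entity", "entity linking")

graph = LiteratureGraph()
graph.add_edge(author, "authorship", paper_a)
graph.add_edge(paper_b, "citation", paper_a)
graph.add_edge(paper_a, "entity_mention", entity)

print(len(graph.nodes), graph.num_edges())  # 4 3
```

Keeping relation names as edge labels, rather than separate tables per relation, lets one traversal routine serve every edge type, which is the main convenience a heterogeneous graph offers for algorithmic manipulation.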

High-Precision Extraction of Emerging Concepts from Scientific Literature

Authors: Devlin Jacob, Goodfellow Ian, He Xiangnan, Jo Yookyung, Mesbah Sepideh, Mihalcea Rada, Peters Matthew E
Publication venue: Association for Computing Machinery (ACM)
Publication date: 11/06/2020

Identification of new concepts in scientific literature can help power faceted search, scientific trend analysis, knowledge-base construction, and more, but current methods are lacking. Manual identification cannot keep up with the torrent of new publications, while the precision of existing automatic techniques is too low for many applications. We present an unsupervised concept extraction method for scientific literature that achieves much higher precision than previous work. Our approach relies on a simple but novel intuition: each scientific concept is likely to be introduced or popularized by a single paper that is disproportionately cited by subsequent papers mentioning the concept. From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%, compared to 86% for prior work, and a substantially better precision-yield trade-off across the top 15,000 extractions. To stimulate research in this area, we release our code and data (https://github.com/allenai/ForeCite).

Comment: Accepted to SIGIR 202

Sources: arXiv.org e-Print Archive, Crossref
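The core intuition (a concept is introduced by one paper that is disproportionately cited by later papers mentioning it) can be illustrated with a deliberately simplified score: among the papers that mention a candidate concept, take the share that cite the single most-cited paper. This is an assumption-laden stand-in, not the scoring function published with ForeCite; the function names, data layout and toy corpus below are hypothetical. A small Precision@k helper shows how the abstract's Precision@1000 metric is computed on a ranked list.

```python
from collections import Counter


def concept_score(concept, papers):
    """Return (top_paper, share) for a candidate concept.

    papers maps a paper id to (concepts it mentions, paper ids it cites).
    share is the fraction of papers mentioning the concept that cite the
    single most-cited paper among them (a simplified, hypothetical score).
    """
    mentioning = [pid for pid, (concepts, _) in papers.items() if concept in concepts]
    cited = Counter()
    for pid in mentioning:
        cited.update(papers[pid][1])
    if not cited:  # no mentions, or mentions without citations
        return None, 0.0
    top_paper, top_count = cited.most_common(1)[0]
    return top_paper, top_count / len(mentioning)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked concepts that appear in the relevant set."""
    return sum(c in relevant for c in ranked[:k]) / k


# Hypothetical toy corpus: "p0" introduces "attention" and is cited by later users.
papers = {
    "p0": ({"attention"}, set()),
    "p1": ({"attention"}, {"p0"}),
    "p2": ({"attention", "dataset"}, {"p0"}),
    "p3": ({"dataset"}, {"p1"}),
    "p4": ({"dataset"}, {"p2"}),
}
candidates = ["attention", "dataset"]
ranked = sorted(candidates, key=lambda c: concept_score(c, papers)[1], reverse=True)
print(ranked)  # ['attention', 'dataset']
print(precision_at_k(ranked, {"attention"}, 1))  # 1.0
```

Here "attention" scores 2/3 because two of its three mentioning papers cite "p0", while the generic term "dataset" scores only 1/3 because its citations are spread across unrelated papers, which is the separation the intuition predicts.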

Growing Attributed Networks through Local Processes

Author: Kumar Suhansanu
Shah Harshay
Sundaram Hari
Publication date: 01/01/2019

This paper proposes an attributed network growth model. Although individuals are known to use limited resources to form connections to similar others, we lack an understanding of how local, resource-constrained mechanisms explain the emergence of the rich structural properties found in real-world networks. We make three contributions. First, we propose a parsimonious and accurate model of attributed network growth that jointly explains the emergence of in-degree distributions, local clustering, the clustering-degree relationship and attribute mixing patterns. Second, our model is based on biased random walks and uses local processes to form edges without recourse to global network information. Third, we account for multiple sociological phenomena: bounded rationality, structural constraints, triadic closure, attribute homophily, and preferential attachment. Our experiments indicate that the proposed Attributed Random Walk (ARW) model accurately preserves the network structure and attribute mixing patterns of six real-world networks; it improves upon the performance of eight state-of-the-art models by a statistically significant margin of 2.5-10x.

Comment: 11 pages, 13 figures

Sources: arXiv.org e-Print Archive, Crossref
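The idea of growing a network with purely local, walk-based edge formation can be sketched as follows. This is a simplified toy inspired by the abstract, not the published ARW model: it assumes each arriving node starts a short random walk at a uniformly chosen existing node and links to each visited node with a probability that is higher when attributes match (homophily). Because the walk moves along existing edges, links to neighbors of neighbors form triangles (triadic closure), and well-connected nodes are visited more often (a form of preferential attachment). The parameters `p_same`, `p_diff` and `walk_len` are hypothetical.

```python
import random


def grow_attributed_network(attrs, p_same=0.9, p_diff=0.2, walk_len=5, seed=0):
    """Grow an undirected graph node by node using only local information.

    attrs[i] is the attribute of node i; returns an adjacency dict of sets.
    """
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, len(attrs)):
        start = rng.randrange(new)  # assumption: uniform choice of seed node
        adj[new] = set()
        current = start
        for _ in range(walk_len):
            # Homophily bias: matching attributes make a link more likely.
            p = p_same if attrs[current] == attrs[new] else p_diff
            if rng.random() < p:
                adj[new].add(current)
                adj[current].add(new)
            # Step to a random neighbor (excluding the newcomer); stop at a dead end.
            options = sorted(adj[current] - {new})
            if not options:
                break
            current = rng.choice(options)
        if not adj[new]:  # guarantee every newcomer joins the network
            adj[new].add(start)
            adj[start].add(new)
    return adj


# Hypothetical usage: 20 nodes with two alternating attribute values.
attrs = ["a", "b"] * 10
graph = grow_attributed_network(attrs)
edges = {(u, v) for u in graph for v in graph[u] if u < v}
same = sum(attrs[u] == attrs[v] for u, v in edges)
print(f"{same}/{len(edges)} edges join nodes with the same attribute")
```

No step reads global state such as the full degree sequence, which mirrors the abstract's point that edges can be formed without recourse to global network information.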