Construction of the Literature Graph in Semantic Scholar
We describe a deployed scalable system for organizing published scientific
literature into a heterogeneous graph to facilitate algorithmic manipulation
and discovery. The resulting literature graph consists of more than 280M nodes,
representing papers, authors, entities and various interactions between them
(e.g., authorships, citations, entity mentions). We reduce literature graph
construction to familiar NLP tasks (e.g., entity extraction and linking),
point out research challenges due to differences from standard formulations of
these tasks, and report empirical results for each task. The methods described
in this paper are used to enable semantic features in www.semanticscholar.org.
Comment: To appear in NAACL 2018 industry track
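To make the idea of a heterogeneous graph concrete, the following is a minimal sketch of typed nodes and typed edges. The class `LiteratureGraph`, its methods, and the identifiers such as `paper:1` are hypothetical names chosen for illustration; the abstract does not describe the deployed system's storage model, so this should not be read as its implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LiteratureGraph:
    """Toy heterogeneous graph: every node and every edge carries a type."""
    # node_id -> node type ("paper", "author", "entity")
    nodes: dict = field(default_factory=dict)
    # (source_id, edge_type) -> set of destination ids
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, src, edge_type, dst):
        # Reject edges whose endpoints were never registered as nodes.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(src, edge_type)].add(dst)

    def neighbors(self, src, edge_type):
        # Return a copy so callers cannot mutate the graph by accident.
        return set(self.edges.get((src, edge_type), set()))


# Hypothetical example data, not taken from Semantic Scholar.
graph = LiteratureGraph()
graph.add_node("paper:1", "paper")
graph.add_node("paper:2", "paper")
graph.add_node("author:a", "author")
graph.add_node("entity:lstm", "entity")
graph.add_edge("author:a", "authored", "paper:1")
graph.add_edge("paper:2", "cites", "paper:1")
graph.add_edge("paper:2", "mentions", "entity:lstm")
```

Keying edges by `(source, edge_type)` keeps authorships, citations, and entity mentions separate while still allowing a single lookup per relation.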
High-Precision Extraction of Emerging Concepts from Scientific Literature
Identification of new concepts in scientific literature can help power
faceted search, scientific trend analysis, knowledge-base construction, and
more, but current methods are lacking. Manual identification cannot keep up
with the torrent of new publications, while the precision of existing automatic
techniques is too low for many applications. We present an unsupervised concept
extraction method for scientific literature that achieves much higher precision
than previous work. Our approach relies on a simple but novel intuition: each
scientific concept is likely to be introduced or popularized by a single paper
that is disproportionately cited by subsequent papers mentioning the concept.
From a corpus of computer science papers on arXiv, we find that our method
achieves a Precision@1000 of 99%, compared to 86% for prior work, and a
substantially better precision-yield trade-off across the top 15,000
extractions. To stimulate research in this area, we release our code and data
(https://github.com/allenai/ForeCite).
Comment: Accepted to SIGIR 202
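The intuition in this abstract can be illustrated with a small sketch: for a candidate term, find the papers that mention it and check whether one cited paper receives a disproportionate share of their citations. The scoring rule below (a plain fraction), the function names, and the toy corpus are assumptions made for illustration; the abstract does not give the actual ForeCite scoring function, which is available in the released code.

```python
from collections import Counter


def concept_scores(papers, term):
    """For each cited paper, return the fraction of term-mentioning papers citing it.

    `papers` maps paper_id -> {"text": str, "cites": set of paper ids}.
    """
    mentioning = [pid for pid, p in papers.items() if term in p["text"].lower()]
    if not mentioning:
        return {}
    counts = Counter()
    for pid in mentioning:
        for cited in papers[pid]["cites"]:
            counts[cited] += 1
    return {cited: n / len(mentioning) for cited, n in counts.items()}


def best_source(papers, term):
    """Return the most-cited paper among mentioners of `term`, with its score."""
    scores = concept_scores(papers, term)
    if not scores:
        return None, 0.0
    top = max(scores, key=scores.get)
    return top, scores[top]


# Hypothetical toy corpus, not drawn from arXiv.
toy_papers = {
    "A": {"text": "We introduce the transformer architecture.", "cites": set()},
    "B": {"text": "A transformer for translation.", "cites": {"A"}},
    "C": {"text": "Transformer language models.", "cites": {"A", "X"}},
    "D": {"text": "Convolutional networks for vision.", "cites": {"X"}},
    "X": {"text": "A survey of neural networks.", "cites": set()},
}

source, score = best_source(toy_papers, "transformer")
```

In this toy corpus three papers mention "transformer" and two of them cite paper A, so A is the top candidate; a term with no dominant cited paper would score low and could be filtered out by a threshold.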
Growing Attributed Networks through Local Processes
This paper proposes an attributed network growth model. Despite the knowledge
that individuals use limited resources to form connections to similar others,
we lack an understanding of how local and resource-constrained mechanisms
explain the emergence of rich structural properties found in real-world
networks. We make three contributions. First, we propose a parsimonious and
accurate model of attributed network growth that jointly explains the emergence
of in-degree distributions, local clustering, clustering-degree relationship
and attribute mixing patterns. Second, our model is based on biased random
walks and uses local processes to form edges without recourse to global network
information. Third, we account for multiple sociological phenomena: bounded
rationality, structural constraints, triadic closure, attribute homophily, and
preferential attachment. Our experiments indicate that the proposed Attributed
Random Walk (ARW) model accurately preserves network structure and attribute
mixing patterns of six real-world networks; it improves upon the performance of
eight state-of-the-art models by a statistically significant margin of 2.5-10x.
Comment: 11 pages, 13 figures
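As a rough illustration of edge formation through a biased random walk that uses only local information, the sketch below grows a directed network one node at a time. It is a simplified toy under stated assumptions, not the ARW model itself: the function `grow_network`, the binary attribute, and the parameters `p_same`, `p_link`, `p_jump`, and `max_steps` are hypothetical choices that the abstract does not specify.

```python
import random


def grow_network(n_nodes, p_same=0.8, p_link=0.5, p_jump=0.2, max_steps=20, seed=0):
    """Grow a directed attributed network; returns (attrs, out_edges)."""
    rng = random.Random(seed)
    attrs = {0: 0, 1: 1}            # node -> binary attribute
    out_edges = {0: set(), 1: {0}}  # node -> set of nodes it links to
    for new in range(2, n_nodes):
        attr = rng.randint(0, 1)
        existing = list(out_edges)
        same = [v for v in existing if attrs[v] == attr]
        # Homophily: with probability p_same, start from a same-attribute node.
        pool = same if same and rng.random() < p_same else existing
        start = rng.choice(pool)
        targets = {start}  # always link to the start node so `new` is connected
        current = start
        for _ in range(max_steps):  # bounded walk length: limited resources
            if rng.random() < p_jump or not out_edges[current]:
                current = start  # jump back to the start node
            else:
                # Step to a neighbor of the current node (local information only).
                current = rng.choice(sorted(out_edges[current]))
            if rng.random() < p_link:
                targets.add(current)  # linking near the start closes triangles
        attrs[new] = attr
        out_edges[new] = targets
    return attrs, out_edges


attrs, out_edges = grow_network(50)
```

Because each step only inspects the neighbors of the current node, no global degree information is needed; nodes that already have many incoming links simply tend to be reached by more walks.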