WikiM: Metapaths based Wikification of Scientific Abstracts
In order to disseminate the exponentially growing volume of knowledge being produced in
the form of scientific publications, it would be best to design mechanisms that
connect it with an already existing rich repository of concepts -- Wikipedia.
Not only does this make scientific reading simple and easy (by connecting the
concepts used in scientific articles to their Wikipedia
explanations) but it also improves the overall quality of the article. In this
paper, we present a novel metapath based method, WikiM, to efficiently wikify
scientific abstracts -- a topic that has been rarely investigated in the
literature. One of the prime motivations for this work comes from the
observation that wikified abstracts of scientific documents help a reader
decide better, in comparison to plain abstracts, whether (s)he would be
interested in reading the full article. We perform mention extraction mostly
through traditional tf-idf measures coupled with a set of smart filters. The
entity linking relies heavily on the rich citation and author publication
networks. Our observation is that various metapaths defined over these networks
can significantly enhance the overall performance of the system. For mention
extraction and entity linking, we outperform most of the competing
state-of-the-art techniques by a large margin, arriving at precision values of
72.42% and 73.8% respectively on a dataset from the ACL Anthology Network. To
establish the robustness of our scheme, we wikify three other datasets
and obtain precision values of 63.41%-94.03% and 67.67%-73.29% respectively for
the mention extraction and entity linking phases.
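The tf-idf step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the WikiM implementation: the function names, the plain `tf * log(N / df)` weighting, and the stopword set standing in for the "smart filters" are all assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores

def top_mentions(docs, k=3, stopwords=frozenset()):
    """Keep the k highest-scoring non-stopword terms per document.

    The stopword filter is a hypothetical stand-in for the paper's filters.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    ranked = []
    for doc_scores in tfidf_scores(docs):
        kept = {t: s for t, s in doc_scores.items() if t not in stopwords}
        ranked.append(sorted(kept, key=lambda t: (-kept[t], t))[:k])
    return ranked
```

Terms that occur in every document get a score of zero, which is why common words drop out of the candidate list even without an explicit filter.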
The Closer the Better: Similarity of Publication Pairs at Different Co-Citation Levels
We investigate the similarities of pairs of articles which are co-cited at
different co-citation levels: journal, article, section, paragraph,
sentence and bracket. Our results indicate that textual similarity,
intellectual overlap (shared references), author overlap (shared authors), and
proximity in publication time all rise monotonically as the co-citation level
gets lower (from journal to bracket). While the main gain in similarity happens
when moving from journal to article co-citation, all level changes entail an
increase in similarity, especially section to paragraph and paragraph to
sentence/bracket levels. We compare results from four journals over the years
2010-2015: Cell, the European Journal of Operational Research, Physics Letters
B and Research Policy, with consistent general outcomes and some interesting
differences. Our findings motivate the use of granular co-citation information
as defined by meaningful units of text, with implications for, among others,
the elaboration of maps of science and the retrieval of scholarly literature.
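As a rough illustration of the co-citation levels in the abstract above, the sketch below assigns each pair of cited works the finest level at which they appear together inside one citing article. The position encoding (nested section, paragraph, sentence and bracket indices) and all function names are assumptions, not the authors' data model; the journal level, which spans several citing articles, is left out.

```python
from itertools import combinations

# Levels inside one citing article, ordered from coarse to fine.
LEVELS = ("article", "section", "paragraph", "sentence", "bracket")

def finest_level(pos_a, pos_b):
    """Finest level shared by two citation positions in the same article.

    A position is a tuple (section, paragraph, sentence, bracket) of nested
    indices, so two positions share a level when their prefixes agree.
    """
    depth = 0
    for a, b in zip(pos_a, pos_b):
        if a != b:
            break
        depth += 1
    return LEVELS[depth]

def cocitation_levels(citations):
    """Map each unordered pair of cited works to its finest co-citation level.

    `citations` is a list of (cited_id, position) tuples from one article.
    A work cited several times contributes every one of its positions.
    """
    best = {}
    for (id_a, pos_a), (id_b, pos_b) in combinations(citations, 2):
        if id_a == id_b:
            continue
        pair = tuple(sorted((id_a, id_b)))
        level = finest_level(pos_a, pos_b)
        if pair not in best or LEVELS.index(level) > LEVELS.index(best[pair]):
            best[pair] = level
    return best
```

Pairs could then be grouped by level and compared on textual similarity or shared references, in the spirit of the analysis the abstract describes.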
Identifying Overlapping and Hierarchical Thematic Structures in Networks of Scholarly Papers: A Comparison of Three Approaches
We implemented three recently proposed approaches to the identification of
overlapping and hierarchical substructures in graphs and applied the
corresponding algorithms to a network of 492 information-science papers coupled
via their cited sources. The thematic substructures obtained and overlaps
produced by the three hierarchical cluster algorithms were compared to a
content-based categorisation, which we based on the interpretation of titles
and keywords. We defined sets of papers dealing with three topics located on
different levels of aggregation: h-index, webometrics, and bibliometrics. We
identified these topics with branches in the dendrograms produced by the three
cluster algorithms and compared the overlapping topics they detected with one
another and with the three pre-defined paper sets. We discuss the advantages
and drawbacks of applying the three approaches to paper networks in research
fields.
Comment: 18 pages, 9 figures
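A network of papers "coupled via their cited sources", as in the abstract above, can be sketched as a bibliographic-coupling graph. The code below is only an illustration under stated assumptions: the input mapping, the function name, and the Jaccard normalisation are hypothetical choices, and the abstract does not say how the authors weighted their edges.

```python
from itertools import combinations

def coupling_edges(references, min_shared=1):
    """Build weighted bibliographic-coupling edges.

    `references` maps a paper id to the set of source ids it cites.
    Returns {(paper_a, paper_b): (shared_count, jaccard)} for every pair
    that shares at least `min_shared` cited sources (and at least one).
    """
    edges = {}
    for a, b in combinations(sorted(references), 2):
        shared = references[a] & references[b]
        if shared and len(shared) >= min_shared:
            union = references[a] | references[b]
            edges[(a, b)] = (len(shared), len(shared) / len(union))
    return edges
```

The resulting edge list could be passed to any clustering routine; which hierarchical or overlapping method to use is exactly the comparison the paper makes.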