Search CORE

2,473 research outputs found

WikiM: Metapaths based Wikification of Scientific Abstracts

Author: Goyal Pawan
Jana Abhik
Mooriyath Sruthi
Mukherjee Animesh
Publication venue
Publication date: 09/05/2017
Field of study

In order to disseminate the exponential extent of knowledge being produced in the form of scientific publications, it would be best to design mechanisms that connect it with already existing rich repository of concepts -- the Wikipedia. Not only does it make scientific reading simple and easy (by connecting the involved concepts used in the scientific articles to their Wikipedia explanations) but also improves the overall quality of the article. In this paper, we present a novel metapath based method, WikiM, to efficiently wikify scientific abstracts -- a topic that has been rarely investigated in the literature. One of the prime motivations for this work comes from the observation that, wikified abstracts of scientific documents help a reader to decide better, in comparison to the plain abstracts, whether (s)he would be interested to read the full article. We perform mention extraction mostly through traditional tf-idf measures coupled with a set of smart filters. The entity linking heavily leverages on the rich citation and author publication networks. Our observation is that various metapaths defined over these networks can significantly enhance the overall performance of the system. For mention extraction and entity linking, we outperform most of the competing state-of-the-art techniques by a large margin arriving at precision values of 72.42% and 73.8% respectively over a dataset from the ACL Anthology Network. In order to establish the robustness of our scheme, we wikify three other datasets and get precision values of 63.41%-94.03% and 67.67%-73.29% respectively for the mention extraction and the entity linking phase

arXiv.org e-Print Archive

Crossref

Pair-Linking for Collective Entity Disambiguation: Two Could Be Better Than All

Author: Han Jialong
Li Chenliang
Phan Minh C.
Sun Aixin
Tay Yi
Publication venue
Publication date: 01/01/2018
Field of study

Collective entity disambiguation aims to jointly resolve multiple mentions by linking them to their associated entities in a knowledge base. Previous works are primarily based on the underlying assumption that entities within the same document are highly related. However, the extend to which these mentioned entities are actually connected in reality is rarely studied and therefore raises interesting research questions. For the first time, we show that the semantic relationships between the mentioned entities are in fact less dense than expected. This could be attributed to several reasons such as noise, data sparsity and knowledge base incompleteness. As a remedy, we introduce MINTREE, a new tree-based objective for the entity disambiguation problem. The key intuition behind MINTREE is the concept of coherence relaxation which utilizes the weight of a minimum spanning tree to measure the coherence between entities. Based on this new objective, we design a novel entity disambiguation algorithms which we call Pair-Linking. Instead of considering all the given mentions, Pair-Linking iteratively selects a pair with the highest confidence at each step for decision making. Via extensive experiments, we show that our approach is not only more accurate but also surprisingly faster than many state-of-the-art collective linking algorithms

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Diffusion of scientific credits and the ranking of scientists

Author: Alessandro Vespignani
Benjamin Markines
C. Castillo
E. Garfield
E. Garfield
Filippo Radicchi
L. Egghe
Santo Fortunato
Publication venue: 'American Physical Society (APS)'
Publication date: 23/09/2009
Field of study

Recently, the abundance of digital data enabled the implementation of graph based ranking algorithms that provide system level analysis for ranking publications and authors. Here we take advantage of the entire Physical Review publication archive (1893-2006) to construct authors' networks where weighted edges, as measured from opportunely normalized citation counts, define a proxy for the mechanism of scientific credit transfer. On this network we define a ranking method based on a diffusion algorithm that mimics the spreading of scientific credits on the network. We compare the results obtained with our algorithm with those obtained by local measures such as the citation count and provide a statistical analysis of the assignment of major career awards in the area of Physics. A web site where the algorithm is made available to perform customized rank analysis can be found at the address http://www.physauthorsrank.orgComment: Revised version. 11 pages, 10 figures, 1 table. The portal to compute the rankings of scientists is at http://www.physauthorsrank.or

arXiv.org e-Print Archive

Crossref

Adapting a relation extraction pipeline for the BioCreAtIvE II task

Author: Grover Claire
Haddow Barry
Klein Ewan
Matthews Michael
Nielsen Leif Arda
Tobin Richard
Wang Xinglong
Publication venue
Publication date: 01/01/2007
Field of study

Edinburgh Research Explorer

Large-scale event extraction from literature with multi-level gene normalization

Author: Ananiadou Sophia
Bjorne Jari
Ginter Filip
Hakala Kai
Kao Hung-Yu
Lu Zhiyong
Pyysalo Sampo
Salakoski Tapio
Van de Peer Yves
Van Landeghem Sofie
Wei Chih-Hsuan
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013
Field of study

Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique gene and proteins and broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/). Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons -Attribution - Share Alike (CC BY-SA) license

Ghent University Academic Bibliography

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

FigShare

Author identification in bibliographic data using deep neural networks

Author: Darmawahyuni Annisa
Firdaus Firdaus
Juliano Andre Herviant
Malik Reza Firsandaya
Nugraha Tio Artha
Nurmaini Siti
Putra Varindo Ockta Keneddi
Rachmatullah Muhammad Naufal
Publication venue: 'Universitas Ahmad Dahlan'
Publication date: 01/06/2021
Field of study

Author name disambiguation (AND) is a challenging task for scholars who mine bibliographic information for scientific knowledge. A constructive approach for resolving name ambiguity is to use computer algorithms to identify author names. Some algorithm-based disambiguation methods have been developed by computer and data scientists. Among them, supervised machine learning has been stated to produce decent to very accurate disambiguation results. This paper presents a combination of principal component analysis (PCA) as a feature reduction and deep neural networks (DNNs), as a supervised algorithm for classifying AND problems. The raw data is grouped into four classes, i.e., synonyms, homonyms, homonyms-synonyms, and non-homonyms-synonyms classification. We have taken into account several hyperparameters tuning, such as learning rate, batch size, number of the neuron and hidden units, and analyzed their impact on the accuracy of results. To the best of our knowledge, there are no previous studies with such a scheme. The proposed DNNs are validated with other ML techniques such as Naïve Bayes, random forest (RF), and support vector machine (SVM) to produce a good classifier. By exploring the result in all data, our proposed DNNs classifier has an outperformed other ML technique, with accuracy, precision, recall, and F1-score, which is 99.98%, 97.98%, 97.86%, and 99.99%, respectively. In the future, this approach can be easily extended to any dataset and any bibliographic records provider

Journal of Education and Learning (EduLearn)

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System