Adapting a relation extraction pipeline for the BioCreAtIvE II task

Authors: Grover Claire; Haddow Barry; Klein Ewan; Matthews Michael; Nielsen Leif Arda; Tobin Richard; Wang Xinglong
Publication date: 01/01/2007

Mapping Subsets of Scholarly Information

Authors: Ginsparg Paul; Houle Paul; Joachims Thorsten; Sul Jae-Hoon
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 01/01/2003

We illustrate the use of machine learning techniques to analyze, structure, maintain, and evolve a large online corpus of academic literature. An emerging field of research can be identified as part of an existing corpus, permitting the implementation of a more coherent community structure for its practitioners.

Comment: 10 pages, 4 figures, presented at Arthur M. Sackler Colloquium on "Mapping Knowledge Domains", 9-11 May 2003, Beckman Center, Irvine, CA, proceedings to appear in PNA

arXiv.org e-Print Archive

CiteSeerX

Crossref

PubMed Central

Large-Scale Online Semantic Indexing of Biomedical Articles via an Ensemble of Multi-Label Classification Models

Authors: Laliotis Manos; Markantonatos Nikos; Papanikolaou Yannis; Tsoumakas Grigorios; Vlahavas Ioannis
Publication date: 18/04/2017

Background: In this paper we present the approaches and methods employed to deal with a large-scale multi-label semantic indexing task of biomedical papers. This work was mainly implemented within the context of the BioASQ challenge of 2014. Methods: The main contribution of this work is a multi-label ensemble method that incorporates a McNemar statistical significance test in order to validate the combination of the constituent machine learning algorithms. Secondary contributions include a study of the temporal aspects of the BioASQ corpus (the observations also apply to BioASQ's superset, the PubMed articles collection) and the proper adaptation of the algorithms used to deal with this challenging classification task. Results: The ensemble method we developed is compared to other approaches in experimental scenarios with subsets of the BioASQ corpus, giving positive results. During the BioASQ 2014 challenge we obtained first place in the first batch and third place in the two following batches. Our success in the BioASQ challenge showed that a fully automated machine-learning approach, which does not implement any heuristics or rule-based approaches, can be highly competitive and outperform other approaches in similarly challenging contexts.
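The abstract names a McNemar significance test but does not say how it is wired into the ensemble. As a rough illustration only, the following is a generic McNemar test (with continuity correction) comparing two classifiers on the same examples. The function name `mcnemar_test` and its inputs are assumptions made for this sketch, not the authors' implementation.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Generic McNemar test with continuity correction (illustrative sketch).

    Compares two classifiers evaluated on the same examples and returns
    (statistic, p_value). Not taken from the paper; names are hypothetical.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")

    # b: A correct and B wrong; c: A wrong and B correct (discordant pairs).
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)

    if b + c == 0:
        # The classifiers never disagree on correctness: no evidence of a difference.
        return 0.0, 1.0

    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

A small p-value suggests that the two classifiers differ in accuracy on the evaluated examples; an ensemble could use such a result to decide whether one model should be preferred for a given label or whether the two should be combined, although the abstract does not specify the exact rule.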

arXiv.org e-Print Archive

Directory of Open Access Journals

Enriching Knowledge Bases with Counting Quantifiers

Authors: F Darari; HT Dang; L Galárraga; Marc Denecker; S Auer; S Neumaier; S Riedel; XL Dong
Publication date: 01/01/2018

Information extraction traditionally focuses on extracting relations between identifiable entities, such as . Yet, texts often also contain counting information, stating that a subject is in a specific relation with a number of objects, without mentioning the objects themselves, for example, "California is divided into 58 counties". Such counting quantifiers can help in a variety of tasks such as query answering or knowledge base curation, but are neglected by prior work. This paper develops the first full-fledged system for extracting counting information from text, called CINEX. We employ distant supervision using fact counts from a knowledge base as training seeds, and develop novel techniques for dealing with several challenges: (i) non-maximal training seeds due to the incompleteness of knowledge bases, (ii) sparse and skewed observations in text sources, and (iii) high diversity of linguistic patterns. Experiments with five human-evaluated relations show that CINEX can achieve 60% average precision for extracting counting information. In a large-scale experiment, we demonstrate the potential for knowledge base enrichment by applying CINEX to 2,474 frequent relations in Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct relations, which is 28% more than the existing Wikidata facts for these relations.

Comment: 16 pages, The 17th International Semantic Web Conference (ISWC 2018)

arXiv.org e-Print Archive

Crossref

MPG.PuRe

The Evolution of Student Engagement: Writing Improves Teaching in Introductory Biology

Authors: Camfield Eileen; Land Kirkwood
Publication venue: eScholarship, University of California
Publication date: 28/06/2021

In response to calls for pedagogical reforms in undergraduate biology courses to decrease student attrition rates and increase active learning, this article describes one faculty member’s conversion from traditional teaching methods to more engaging forms of practice. Partially told as a narrative, this article illustrates (a) the way many faculty initially learn to teach by modeling the pedagogy from their own undergraduate programs; (b) the kind of support biology faculty may need to break out of traditional molds; (c) how writing can promote active learning; and (d) the impact of reformed pedagogy on student levels of engagement. The latter is demonstrated through assessment results gathered from student surveys, reflective writing, and focus group interviews. Ultimately, the study challenges misunderstandings some faculty might have regarding the value of writing in science classes and offers inspiration, urging critical reflection and persistence.

eScholarship - University of California

Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks

Authors: Abi-Haidar Alaa; Kaur Jasleen; Maguitman Ana G.; Radivojac Predrag; Retchsteiner Andreas; Rocha Luis M.; Verspoor Karin; Wang Zhiping
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (IAS), discovery of protein pairs (IPS) and text passages characterizing protein interaction (ISS) in full text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam-detection techniques, as well as an uncertainty-based integration scheme. We also used a Support Vector Machine and the Singular Value Decomposition on the same features for comparison purposes. Our approach to the full text subtasks (protein pair and passage identification) includes a feature expansion method based on word-proximity networks. Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of the measures of performance used in the challenge evaluation (accuracy, F-score and AUC). We also report on a web tool we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full text tasks resulted in one of the highest recall rates as well as mean reciprocal rank of correct passages. Our approach to abstract classification shows that a simple linear model, using relatively few features, is capable of generalizing and uncovering the conceptual nature of protein-protein interaction from the bibliome. Since the novel approach is based on a very lightweight linear model, it can be easily ported and applied to similar problems. In full text problems, the expansion of word features with word-proximity networks is shown to be useful, though the need for some improvements is discussed.

arXiv.org e-Print Archive

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Springer - Publisher Connector

PubMed Central

University of Melbourne Institutional Repository