Semi-automated screening of biomedical citations for systematic reviews

Author: A Aronson
A Blum
A Cohen
A Wilcox
B Settles
B Wallace
Byron C Wallace
C Blake
C Cole
C Counsell
Carla Brodley
Chih-Chung
Christopher H Schmid
CJL Chih-Wei Hsu
D Chen
DD Lewis
E Perrin
F Camous
G Druck
G Schohn
H Kilicoglu
Joseph Lau
K Brinker
KS Goh
KS Jones
L Breiman
L Hunter
M Barza
M Chung
M Yetisgen-Yildiz
N Japkowicz
P Wheeler
P Zweigenbaum
S Dasgupta
S Ertekin
S Kotsiantis
S Tong
T Joachims
T Terasawa
Thomas A Trikalinos
VN Vapnik
W Yu
Y Aphinyanaphongs
YAC Aphinyanaphongs
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Automatic document classification of biological literature

Author: Chen David
Muller Hans-Michael
Sternberg Paul W.
Publication date: 01/08/2006

Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature. Results: We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our classification method first uses a support vector machine-trained classifier, followed by a novel, phrase-based clustering algorithm. This clustering step autonomously creates cluster labels that are descriptive and understandable by humans. This clustering engine performed better on a standard test set (Reuters 21578) than previously published results (F-value of 0.55 vs. 0.49), while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. Conclusions: We have demonstrated a simple method to classify biological documents that embodies an improvement over current methods. While the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to different types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Caltech Authors

Deep Probabilistic Logic: A Unifying Framework for Indirect Supervision

Author: Poon Hoifung
Wang Hai
Publication date: 01/01/2018

Deep learning has emerged as a versatile tool for a wide range of NLP tasks, due to its superior capacity in representation learning. But its applicability is limited by the reliance on annotated examples, which are difficult to produce at scale. Indirect supervision has emerged as a promising direction to address this bottleneck, either by introducing labeling functions to automatically generate noisy examples from unlabeled text, or by imposing constraints over interdependent label decisions. A plethora of methods have been proposed, each with respective strengths and limitations. Probabilistic logic offers a unifying language to represent indirect supervision, but end-to-end modeling with probabilistic logic is often infeasible due to intractable inference and learning. In this paper, we propose deep probabilistic logic (DPL) as a general framework for indirect supervision, by composing probabilistic logic with deep learning. DPL models label decisions as latent variables, represents prior knowledge on their relations using weighted first-order logical formulas, and alternates between learning a deep neural network for the end task and refining uncertain formula weights for indirect supervision, using variational EM. This framework subsumes prior indirect supervision methods as special cases, and enables novel combination via infusion of rich domain and linguistic knowledge. Experiments on biomedical machine reading demonstrate the promise of this approach. Comment: EMNLP 2018 final version

arXiv.org e-Print Archive

Crossref

Citation Function and Polarity Classification in Biomedical Papers

Author: Jia Meng
Publication venue: Scholarship@Western
Publication date: 18/04/2018

The traditional reference evaluation method treats all citations equally. However, a citation can serve various functions. It may reflect the citing paper author’s motivation as well as his/her true attitude towards the cited paper. Such information can be investigated through citation content analysis. This thesis develops an 8-category classification scheme for citation function and polarity to help understand what role a citation plays in scientific papers. A biomedical citation corpus is annotated with this scheme and used in experiments with supervised machine learning methods. Several types of features that capture the characteristics of citation sentences are extracted with natural language processing techniques and serve as the inputs to automatic classifiers. The importance of cue phrases in citation classification is also addressed and discussed.

Scholarship@Western

Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

Author: Habib Mena B.
Keulen Maurice van
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2011

Information extraction, data integration, and uncertain data management are distinct research areas that have received considerable attention over the last two decades. Many studies have tackled these areas individually. However, information extraction systems should be integrated with data integration methods in order to make use of the extracted information. Handling uncertainty in the extraction and integration process is important for improving the quality of the data in such integrated systems. This article presents the state of the art in these research areas, identifies their common ground, and shows how information extraction and data integration can be combined under the umbrella of uncertainty management.

Maastricht University Research Portal

University of Twente Research Information