Linguistic feature analysis for protein interaction extraction

Author: A Airola
A Moschitti
A Yakushiji
B Schölkopf
C Cortes
C Giuliano
C Nedellec
CC Chang
Chris Cornelis
D Haussler
H Lodhi
J Ding
J Xiao
JH Eom
K Fundel
M Collins
Martine De Cock
MF Porter
R Bunescu
R Saetre
RC Bunescu
S Katrenko
S Kim
S Pyysalo
S Van Landeghem
T Fayruzov
Timur Fayruzov
Veronique Hoste
Y Saeys
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The rapid growth of the amount of publicly available reports on biomedical experimental results has recently caused a boost of text mining approaches for protein interaction extraction. Most approaches rely implicitly or explicitly on linguistic, i.e., lexical and syntactic, data extracted from text. However, only a few attempts have been made to evaluate the contribution of the different feature types. In this work, we contribute to this evaluation by studying the relative importance of deep syntactic features, i.e., grammatical relations, shallow syntactic features (part-of-speech information) and lexical features. For this purpose, we use a recently proposed approach that uses support vector machines with structured kernels. Results: Our results reveal that the contribution of the different feature types varies for the different data sets on which the experiments were conducted. The smaller the training corpus compared to the test data, the more important the role of grammatical relations becomes. Moreover, deep syntactic information based classifiers prove to be more robust on heterogeneous texts where no or only limited common vocabulary is shared. Conclusion: Our findings suggest that grammatical relations play an important role in the interaction extraction task. Moreover, the net advantage of adding lexical and shallow syntactic features is small relative to the number of added features. This implies that efficient classifiers can be built by using only a small fraction of the features that are typically being used in recent approaches.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Ghent University Academic Bibliography

PubMed Central

Electricity Demand Forecasting: the Uruguayan Case

Author: A Antoniadis
GEP Box
J Cancelo
J Durbin
JR Lloyd
JW Taylor
L Breiman
Lacir J. Soares
M Devaine
Mathilde Mougeot
Pierre Gaillard
PJ Brockwell
R Nedellec
R Weron
R. E. Kalman
Rafał Weron
S Ben Taieb
T Hong
V Dordonnat
W Guang-Te
Publication venue: Springer Science and Business Media LLC
Publication date: 14/05/2018

Crossref

Large Scale Application of Neural Network Based Semantic Role Labeling for Automated Relation Extraction from Biomedical Texts

Author: AB Clegg
C Nedellec
D Klein
D Rebholz-Schuhmann
E Charniak
H Jose
Hans-Werner Mewes
I Donaldson
J Tsujii
J-H Eom
Jason Weston
K Fundel
L Hirschman
M Lease
M Palmer
Mark Isalan
R Collobert
R Hoffmann
Ronan Collobert
RT-H Tsai
S Bethard
S Pradhan
TH Tsai
Thorsten Barnickel
Volker Stümpflen
Y Kogan
Y Miyao
Publication venue: Public Library of Science
Publication date: 01/01/2009

To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence based approaches can deal with large text corpora like MEDLINE in an acceptable time but are not able to extract any specific type of semantic relation. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive and the interpretation of the generated trees is difficult. Several natural language processing (NLP) approaches for the biomedical domain exist focusing specifically on the detection of a limited set of relation types. For systems biology, generic approaches for the detection of a multitude of relation types which in addition are able to process large text corpora are needed but the number of systems meeting both requirements is very limited. We introduce the use of SENNA (“Semantic Extraction using a Neural Network Architecture”), a fast and accurate neural network based Semantic Role Labeling (SRL) program, for the large scale extraction of semantic relations from the biomedical literature. A comparison of processing times of SENNA and other SRL systems or syntactical parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank) conforming SRL program currently available. 89 million biomedical sentences were tagged with SENNA on a 100 node cluster within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences resulting in precision/recall values of 0.71/0.43. We show that the accuracy as well as processing speed of the proposed semantic relation extraction approach is sufficient for its large scale application on biomedical text. The proposed approach is highly generalizable regarding the supported relation types and appears to be especially suited for general-purpose, broad-scale text mining systems. 
The presented approach bridges the gap between fast, co-occurrence-based approaches lacking semantic relations and highly specialized and computationally demanding NLP approaches
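
As a reading aid for the precision/recall values of 0.71/0.43 quoted in this abstract, the two figures can be combined into a single F1 score (their harmonic mean). The abstract itself does not report an F-score; the function name `f1_score` and the calculation below are an illustrative sketch, not part of the cited work.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        # Avoid division by zero when both values are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract: precision 0.71, recall 0.43.
# The resulting F1 is derived here, not reported by the authors.
f1 = f1_score(0.71, 0.43)
print(f"F1 = {f1:.3f}")
```

Under these assumptions the combined score comes out at roughly 0.54, which reflects how the lower recall pulls the harmonic mean well below the precision.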

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

PuSH

Using Unsupervised Patterns to Extract Gene Regulation Relationships for Network Construction

Author: A Ozgur
BJ Stapley
C Blaschke
C Nedellec
C Rodriguez-Penagos
CC van der Eijk
CF Schaefer
D Klein
Dongxiao Zhu
E Buyko
Hei-Chia Wang
HM Muller
Hung-Yu Kao
J Saric
JH Chiang
K Fundel
L Tanabe
M Huang
R Chowdhary
R Hoffmann
R Jelier
S Kim
S Pyysalo
Shaw-Jenq Tsai
Shuo-Jang Li
T Ono
TK Jenssen
U Hahn
Yi-Tsung Tang
Publication venue: Public Library of Science
Publication date: 01/01/2011

BACKGROUND: Gene expression is usually described in the literature as a transcription factor X that regulates the target gene Y. Previously, some studies discovered gene regulations by using information from the biomedical literature, and most of them require the effort of human annotators to build the training dataset. Moreover, the large amount of textual knowledge recorded in the biomedical literature grows very rapidly, and the manual creation of patterns from the literature becomes more difficult. There is an increasing need to automate the process of establishing patterns. METHODOLOGY/PRINCIPAL FINDINGS: In this article, we describe an unsupervised pattern generation method called AutoPat. It is a gene expression mining system that can generate unsupervised patterns automatically from a given set of seed patterns. The high scalability and low maintenance cost of the unsupervised patterns could help our system to extract gene expression from PubMed abstracts more precisely and effectively. CONCLUSIONS/SIGNIFICANCE: Experiments on several regulators show reasonable precision and recall rates which validate AutoPat's practical applicability. The conducted regulation networks could also be built precisely and effectively. The system in this study is available at http://ikmbio.csie.ncku.edu.tw/AutoPat/

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature

The most important way of conveying new findings in biomedical research is scientific publication. Extraction of protein–protein interactions (PPIs) reported in scientific publications is one of the core topics of text mining in the life sciences. Recently, a new class of such methods has been proposed - convolution kernels that identify PPIs using deep parses of sentences. However, comparing published results of different PPI extraction methods is impossible due to the use of different evaluation corpora, different evaluation metrics, different tuning procedures, etc. In this paper, we study whether the reported performance metrics are robust across different corpora and learning settings and whether the use of deep parsing actually leads to an increase in extraction quality. Our ultimate goal is to identify the one method that performs best in real-life scenarios, where information extraction is performed on unseen text and not on specifically prepared evaluation data. We performed a comprehensive benchmarking of nine different methods for PPI extraction that use convolution kernels on rich linguistic information. Methods were evaluated on five different public corpora using cross-validation, cross-learning, and cross-corpus evaluation. Our study confirms that kernels using dependency trees generally outperform kernels based on syntax trees. However, our study also shows that only the best kernel methods can compete with a simple rule-based approach when the evaluation prevents information leakage between training and test corpora. Our results further reveal that the F-score of many approaches drops significantly if no corpus-specific parameter optimization is applied and that methods reaching a good AUC score often perform much worse in terms of F-score. We conclude that for most kernels no sensible estimation of PPI extraction performance on new text is possible, given the current heterogeneity in evaluation data. 
Nevertheless, our study shows that three kernels are clearly superior to the other methods

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Toll-Like Receptor 3 (TLR3) Plays a Major Role in the Formation of Rabies Virus Negri Bodies

Author: AC Jackson
AH Sharpe
Anne Danckaert
Anne-Marie Le Sourd
B Dietzschold
B Salaun
C Farina
C Prehaud
Christophe Préhaud
CS Jack
DJ Groskreutz
E Meylan
E Vercammen
E Yang
F Opazo
F Paquet-Durand
F Weber
FL Rock
Françoise Mégret
G Matsumoto
GC Sen
H Hacker
IB Johnsen
J Hennetin
J Schonborn
J Skare
JA Johnston
JB Dictenberg
Jean-Pierre Bourgeois
JS Cameron
K Funami
K Kariko
K Kristensson
K Takeuchi
KA Fitzgerald
L Alexopoulou
M Kumar
M Lafon
M Matsumoto
M Sasai
M Sato
M Tanaka
M Yamamoto
MI Thoulouze
Mireille Lafage
MK Chelbi-Alix
Monique Lafon
N Janabi
N Lukacs
N Nozawa
P Lewis
P Nedellec
Pascal Roux
Pauline Ménager
R Garcia-Mata
Ralph S. Baric
RR Kopito
RR Novoa
S Camelo
S Marshall-Clarke
S Sato
SJ Pleasure
ST Suhr
T Shin
T Wileman
WM Cheung
Y Ni
YM Kim
Z Jiang
Publication venue: Public Library of Science
Publication date: 01/01/2009

Human neurons express the innate immune response receptor, Toll-like receptor 3 (TLR3). TLR3 levels are increased in pathological conditions such as brain virus infection. Here, we further investigated the production, cellular localisation, and function of neuronal TLR3 during neuronotropic rabies virus (RABV) infection in human neuronal cells. Following RABV infection, TLR3 is not only present in endosomes, as observed in the absence of infection, but also in detergent-resistant perinuclear inclusion bodies. As well as TLR3, these inclusion bodies contain the viral genome and viral proteins (N and P, but not G). The size and composition of inclusion bodies and the absence of a surrounding membrane, as shown by electron microscopy, suggest they correspond to the previously described Negri Bodies (NBs). NBs are not formed in the absence of TLR3, and TLR3−/− mice—in which brain tissue was less severely infected—had a better survival rate than WT mice. These observations demonstrate that TLR3 is a major molecule involved in the spatial arrangement of RABV–induced NBs and viral replication. This study shows how viruses can exploit cellular proteins and compartmentalisation for their own benefit

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

HAL-Pasteur

SHIV-162P3 Infection of Rhesus Macaques Given Maraviroc Gel Vaginally Does Not Involve Resistant Viruses

Maraviroc (MVC) gels are effective at protecting rhesus macaques from vaginal SHIV transmission, but breakthrough infections can occur. To determine the effects of a vaginal MVC gel on infecting SHIV populations in a macaque model, we analyzed plasma samples from three rhesus macaques that received a MVC vaginal gel (day 0) but became infected after high-dose SHIV-162P3 vaginal challenge. Two infected macaques that received a placebo gel served as controls. The infecting SHIV-162P3 stock had an overall mean genetic distance of 0.294±0.027%; limited entropy changes were noted across the envelope (gp160). No envelope mutations were observed consistently in viruses isolated from infected macaques at days 14–21, the time of first detectable viremia, nor selected at later time points, days 42–70. No statistically significant differences in MVC susceptibilities were observed between the SHIV inoculum (50% inhibitory concentration [IC50] 1.87 nM) and virus isolated from the three MVC-treated macaques (MVC IC50 1.18 nM, 1.69 nM, and 1.53 nM, respectively). Highlighter plot analyses suggested that infection was established in each MVC-treated animal by one founder virus genotype. The expected Poisson distribution of pairwise Hamming Distance frequency counts was observed and a phylogenetic analysis did not identify infections with distinct lineages from the challenge stock. These data suggest that breakthrough infections most likely result from incomplete viral inhibition and not the selection of MVC-resistant variants

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Linking genes to literature: text mining, information extraction, and retrieval applications for biology

Author: A Divoli
A Doms
A Mitchell
A Sood
Alfonso Valencia
B Alako
B Carpenter
B Settles
BR Haynes
C Batchelor
C Blaschke
C Nedellec
C Rodriguez-Penagos
C Sneiderman
D Chen
D Hanisch
D Koning
D Oliver
D Rebholz-Schuhmann
D Searls
D Wheeler
E Camon
F Couto
G Divita
G Gomez-Lopez
G Grimes
G Poulter
H Che
H Liu
H Mangalam
H Shatkay
H Yu
I Iliopoulos
I Sarkar
J Baumgartner
J Caporaso
J Chang
J Hakenberg
J Lewis
J Tamames
J Wilbur
J Wren
K Frantzi
K Mane
K Tomanek
L Chen
L Hunter
L Smith
L Tanabe
Lynette Hirschman
M Ashburner
M Craven
M Errami
M Falagas
M Fattore
M Galperin
M Huang
M Krallinger
M Krauthammer
M Muin
M Ongenaert
M Porter
M Shultz
M Synnestvedt
M Weeber
MA Andrade
Martin Krallinger
MJ Schuemie
N Okazaki
N Smalheiser
P Fontelo
P Leary
P Roberts
Q Tu
R Grishman
R Hoffmann
R Kittredge
R Netzel
R Steinbrook
S Altschul
S Brady
S Buckingham
S Douglas
S Nelson
S Staab
T Jenssen
T Shtatland
T Vanhecke
W Baumgartner
W Xuan
W Zhou
Y Fang
Y Yamamoto
Z Harris
Publication venue: BioMed Central

Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward an increasing diversification in terms of application types and techniques, together with integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available on the internet

Crossref

PubMed Central

Human Immunodeficiency Virus Type 1 Coreceptor Switching: V1/V2 Gain-of-Fitness Mutations Compensate for V3 Loss-of-Fitness Mutations

Author: Mosier D. E.
Nedellec R.
Pastore C.
Pontow S.
Ramos A.
Ratner L.
Publication venue: American Society for Microbiology
Publication date: 01/01/2006

Human immunodeficiency virus type 1 (HIV-1) entry into target cells is mediated by the virus envelope binding to CD4 and the conformationally altered envelope subsequently binding to one of two chemokine receptors. HIV-1 envelope glycoprotein (gp120) has five variable loops, of which three (V1/V2 and V3) influence the binding of either CCR5 or CXCR4, the two primary coreceptors for virus entry. Minimal sequence changes in V3 are sufficient for changing coreceptor use from CCR5 to CXCR4 in some HIV-1 isolates, but more commonly additional mutations in V1/V2 are observed during coreceptor switching. We have modeled coreceptor switching by introducing most possible combinations of mutations in the variable loops that distinguish a previously identified group of CCR5- and CXCR4-using viruses. We found that V3 mutations entail high risk, ranging from major loss of entry fitness to lethality. Mutations in or near V1/V2 were able to compensate for the deleterious V3 mutations and may need to precede V3 mutations to permit virus survival. V1/V2 mutations in the absence of V3 mutations often increased the capacity of virus to utilize CCR5 but were unable to confer CXCR4 use. V3 mutations were thus necessary but not sufficient for coreceptor switching, and V1/V2 mutations were necessary for virus survival. HIV-1 envelope sequence evolution from CCR5 to CXCR4 use is constrained by relatively frequent lethal mutations, deep fitness valleys, and requirements to make the right amino acid substitution in the right place at the right time

Crossref

PubMed Central

B cell response to surface IgM cross-linking identifies different prognostic groups of B-chronic lymphocytic leukemia patients

Author: Berthou C.
Bordron A.
Lydyard P.M.
Nedellec S.
Pers J.O.
Porakishvili N.
Renaudineau Y.
Youinou P.Y.

WestminsterResearch