
Leiden University Scholarly Publications

EUR Research Repository

Erasmus University Digital Repository

Automated annotation of chemical names in the literature with tunable accuracy

Author: A Copestake
AR Aronson
C Kolarik
CE Lipscomb
DL Banville
DM Jassop
E Bolton
Evan E Bolton
GG Chowdhury
JD Wren
Jun D Zhang
KM Hettne
Lewis Y Geer
P Corbett
PT Corbett
R Klinger
Stephen H Bryant
WJ Wilbur
YY Zhou
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: A significant portion of the biomedical and chemical literature refers to small molecules. The accurate identification and annotation of compound names that are relevant to the topic of a given paper can establish links between scientific publications and various chemical and life science databases. Manual annotation is the preferred method for this work because well-trained indexers can understand the topic of a paper as well as recognize key terms. However, given the hundreds of thousands of new papers published annually, an automatic annotation system with high precision and relevance can be a useful complement to manual annotation. Results: An automated chemical name annotation system, MeSH Automated Annotations (MAA), was developed to annotate small molecule names in scientific abstracts with tunable accuracy. This system aims to reproduce the MeSH term annotations that indexers would create for biomedical and chemical literature. When automated free-text matching was compared with manual indexing of 26 thousand MEDLINE abstracts, more than 40% of the automated annotations were false positives (FP). To reduce the FP rate, MAA incorporates several filters that remove "incorrect" annotations caused by nonspecific, partial, and low-relevance chemical names. In part, relevance was measured by the position of the chemical name in the text. Tunable accuracy was obtained by expanding or restricting the sections of the text scanned for chemical names. The best precision obtained was 96%, with a recall of 28%. The best performance of MAA, as measured by the F statistic, was 66%, which compares favorably with other chemical name annotation systems. Conclusions: Accurate chemical name annotation can help researchers not only identify important chemical names in abstracts, but also match unindexed and unstructured abstracts to chemical records. The current work was tested against MEDLINE, but the algorithm is not specific to this corpus, and it may be applicable to papers from chemical physics, materials, polymer and environmental science, as well as to patents, biological assay descriptions and other textual data.
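The precision, recall and F figures above are linked by the balanced F-measure, F = 2PR / (P + R). The following minimal sketch assumes that the abstract's "F statistic" is this balanced F1 score and that automated annotations are scored as hypothetical (abstract ID, MeSH term) pairs against the manually indexed terms; the helper names are illustrative and are not part of MAA. Evaluating the high-precision setting shows that it yields an F of only about 43%, so the reported best F of 66% must come from a different, higher-recall setting.

```python
def f1(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Score automated annotations against manual ones (hypothetical helper).

    Each element is assumed to be an (abstract_id, mesh_term) pair.
    """
    true_pos = len(predicted & gold)  # annotations both systems agree on
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)


# High-precision setting reported in the abstract: P = 0.96, R = 0.28.
print(round(f1(0.96, 0.28), 3))  # prints 0.434
```

Because accuracy is tuned by widening or narrowing the scanned text sections, each setting gives its own (P, R) pair, and the best-F setting need not coincide with the best-precision one.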

Mining metabolites: extracting the yeast metabolome from the literature

Author: Chikashi Nobata
CR Batchelor
D Banville
D Broadhurst
D Jiao
DB Kell
Douglas B. Kell
GA Eller
J Brecher
J Finkel
J Townsend
J Wisniewski
J Wren
JD Kim
Jun’ichi Tsujii
K Degtyarenko
KM Hettne
L Goebels
M Hucka
M Kanehisa
M Krallinger
N Okazaki
P Corbett
P Mendes
Paul D. Dobson
PD Dobson
Pedro Mendes
R Hoffmann
R Klinger
S Ananiadou
Sophia Ananiadou
Syed A. Iqbal
X Wang
Y Kano
Y Miyao
Y Sasaki
Y Tsuruoka
Publication venue: Springer US
Publication date: 01/01/2011

Text mining methods have added considerably to our capacity to extract biological knowledge from the literature. Recently the field of systems biology has begun to model and simulate metabolic networks, which requires knowledge of the set of molecules involved. While genomics and proteomics technologies are able to supply the macromolecular parts list, the metabolites are less easily assembled. Most metabolites are known and reported through the scientific literature rather than through large-scale experimental surveys, so it is important to recover them from the literature. Here we present a novel tool that automatically identifies metabolite names in the literature, and associates structures with them where possible, to define the reported yeast metabolome. With ten-fold cross-validation on a manually annotated corpus, our recognition tool achieves an F-score of 78.49 (precision 83.02) and proves better suited to identifying metabolite names than other existing recognition tools for general chemical molecules. The metabolite recognition tool has been applied to the literature covering an important model organism, the yeast Saccharomyces cerevisiae, to define its reported metabolome. By coupling to ChemSpider, a major chemical database, we have identified structures for much of the reported metabolome and, where structure identification fails, have been able to suggest extensions to ChemSpider. Our manually annotated gold-standard data on 296 abstracts, together with the metabolite names and, where appropriate, their structures, are available as supplementary materials.
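Since only precision and F-score are reported, the implied recall can be recovered by rearranging the F-measure. A minimal sketch, assuming the reported "f-score" is the balanced F1 score; if the scores were averaged across the ten folds, the result is only approximate, because the mean of per-fold F scores is not exactly the harmonic mean of mean precision and mean recall. The function name is hypothetical.

```python
def recall_from_f1(f_score: float, precision: float) -> float:
    """Solve F = 2PR / (P + R) for R, given F and P on the same scale."""
    # Rearranging: F(P + R) = 2PR  ->  R(2P - F) = F*P  ->  R = F*P / (2P - F)
    denominator = 2 * precision - f_score
    if denominator <= 0:
        raise ValueError("a balanced F-measure must be smaller than 2P")
    return f_score * precision / denominator


# Reported ten-fold cross-validation figures (percent).
print(round(recall_from_f1(78.49, 83.02), 2))  # prints 74.43
```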

The University of Manchester - Institutional Repository

Identification of a Shared Genetic Susceptibility Locus for Coronary Heart Disease and Periodontitis

Recent studies indicate a mutual epidemiological relationship between coronary heart disease (CHD) and periodontitis. Both diseases are associated with similar risk factors and are characterized by a chronic inflammatory process. In a candidate-gene association study, we identify a genetic susceptibility locus shared by both diseases. We confirm the known association of two neighboring linkage disequilibrium regions on human chromosome 9p21.3 with CHD and show an additional strong association of these loci with the risk of aggressive periodontitis. For the lead SNP of the main associated linkage disequilibrium region, rs1333048, the odds ratio under an autosomal-recessive mode of inheritance is 1.99 (95% confidence interval 1.33–2.94; P = 6.9×10⁻⁴) for generalized aggressive periodontitis, and 1.72 (1.06–2.76; P = 2.6×10⁻²) for localized aggressive periodontitis. The two associated linkage disequilibrium regions map to the sequence of the large antisense noncoding RNA ANRIL, which partly overlaps regulatory and coding sequences of CDKN2A/CDKN2B. A closely located diabetes-associated variant was independent of the CHD and periodontitis risk haplotypes. Our study demonstrates that CHD and periodontitis are genetically related by at least one susceptibility locus, which is possibly involved in ANRIL activity and is independent of diabetes-associated risk variants within this region. Elucidating the interplay of ANRIL transcript variants and their involvement in increased susceptibility to the interactive diseases CHD and periodontitis promises new insight into the underlying shared pathogenic mechanisms of these complex common diseases.
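As a rough consistency check, each P value can be approximated from its odds ratio and 95% confidence interval, assuming the interval is a Wald-type interval that is symmetric on the log-odds scale. The abstract does not state which test produced the P values, so this sketch (with a hypothetical function name) is an approximation, not a reproduction of the study's analysis.

```python
import math


def wald_p_from_ci(odds_ratio: float, lower: float, upper: float,
                   z_crit: float = 1.959964) -> float:
    """Approximate two-sided P value from an odds ratio and its 95% CI.

    Assumes a Wald-type CI, symmetric on the log scale, so that
    SE(log OR) = (log(upper) - log(lower)) / (2 * z_crit).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: P(|Z| >= z) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


# rs1333048, autosomal-recessive model (values from the abstract)
print(f"{wald_p_from_ci(1.99, 1.33, 2.94):.1e}")  # generalized: prints 6.7e-04
print(f"{wald_p_from_ci(1.72, 1.06, 2.76):.1e}")  # localized: prints 2.6e-02
```

Under this assumption both estimates land close to the reported P values, which suggests the odds ratios, intervals and P values are mutually consistent.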

VU Research Portal

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Gut microbiota and diabetes: from pathogenesis to therapeutic perspective

Author: A Bjorneklett
A Cerutti
A Green
A Velayudham
AC Vreugdenhil
AL Goodman
BE Gustafsson
C King
C Knauf
C Levy-Marchal
C Manichanh
C Maziere
C Mering von
C Palmer
C Postic
CC Patterson
CF Favier
Chantal Chabo
CJ Wiedermann
CM Edwards
CP Day
D Knights
D Thabut
DD Black
DW Han
E Esposito
E Firatli
ED Sonnenburg
EF Murphy
EG Zoetendal
F Backhed
F D’Aiuto
F Guarner
G Dalmasso
G Danaei
G Eisenbarth
G Musso
GC Baker
GR Gibson
GS Hotamisligil
H Buchwald
H Löe
H Zhang
HW Harris
HY Zhang
I Pappo
J Amar
J McCarthy
J Medina
J Qin
J Slots
J Tap
Jacques Amar
JC Pickup
JG Caporaso
JH Fritz
JM Neefs
JM Wells
JP Furet
K Hettne
KM Pantalone
L Bry
L Darré
L Dethlefsen
L Miller
L Wen
LG Ooi
LV Hooper
M Boirivant
M Bukoff
M Feuerer
M Kalliomaki
M Kanehisa
M Membrez
M Oresic
M Poggi
M Roberfroid
M Schenk
M Serino
M Soell
M Spencer
M Vijay-Kumar
M Westerterp
Matteo Serino
MB Roberfroid
MC Collado
ME Dumas
ME Johansson
MF McInerney
MJ Claesson
MJ Song
MT Asquith
MW Sadelain
N Doan
N Larsen
N Osman
O Koren
P Gerard
P Pozzilli
P Turnbaugh
PC Rensen
PD Cani
PG Holt
PJ Turnbaugh
Q Wang
R Andraws
R Assan
R Caesar
R Calvani
R Ley
R Menghini
R Valladares
RD Heijtz
RE Ley
RG Burcelin
RI Amann
RI Mackie
Rémy Burcelin
S Akira
S Brugman
S Caspar-Bauguil
S Ding
S Epstein
S Fukuda
S Ghoshal
S Grossi
S Hapfelmeier
S Kondo
S Menard
S Ott
S Rakoff-Nahoum
S Shoelson
S Winer
SA Kocaman
SK Mazmanian
SM Ruchat
SP Weisberg
T Pischon
T Saito
TA Kufer
V Gaboriau-Routhiau
Vincent Blasco-Baque
VO Rotimi
VR Velagapudi
WHO
Y Fukushima
Y Nishizawa
YH Paik
ZL Kumwenda
Publication venue: Springer Milan
Publication date: 01/01/2011

Several hundred million people will become diabetic and obese over the coming decades, yet current therapeutic approaches aim at treating the consequences rather than the causes of impaired metabolism. This strategy is not efficient, and new paradigms are needed. Genome-wide analyses cannot predict or explain more than 10–20% of the disease, whereas changes in feeding and social behavior certainly have a major impact. However, the molecular mechanisms linking environmental factors and genetic susceptibility had not been envisioned until the recent discovery of a hidden source of genomic diversity, the metagenome. More than 3 million genes from several hundred species constitute our intestinal microbiome. Initial key experiments have demonstrated that this biome can by itself transfer metabolic disease. The mechanisms are unknown but could involve the modulation of the host's energy-harvesting capacity, as well as low-grade inflammation and the corresponding immune response acting on adipose tissue plasticity, hepatic steatosis, insulin resistance and even secondary cardiovascular events. Secreted bacterial factors reach the circulating blood, and even whole bacteria from the intestinal microbiota can reach tissues where inflammation is triggered. The last 5 years have demonstrated that the intestinal microbiota, at its molecular level, is a causal factor early in the development of these diseases. Nonetheless, much more remains to be uncovered in order to identify, first, new predictive biomarkers that would enable preventive strategies based on pre- and probiotics and, second, new therapeutic strategies against the cause rather than the consequence of hyperglycemia and body weight gain.

HAL-Inserm

Knowledge-based extraction of adverse drug events from biomedical text

Author: A Airola
A Ozg
AM Cohen
AR Aronson
Bharat Singh
C Bizer
CF Thorn
Chinh Bui
D Demner-Fushman
D Ferrucci
D Hanisch
D Revere
DS Wishart
E Buyko
Erik M van Mulligen
F Leitner
F Rinaldi
GB Melton
H Gurulingappa
H Jang
HJ Dai
HW Chun
J Saric
J-D Kim
J-H Kim
Jan A Kors
JD Kim
K Fundel
KM Hettne
LJ Jensen
M Bundschus
M Huang
M Krallinger
MA Schwartz Hearst
MJ Schuemie
MS Simpson
N Kang
Ning Kang
O Bodenreider
O Uzuner
P Zweigenbaum
PL Elkin
QC Bui
R Islamaj Doğan
S Buchholz
S Kandula
S Katrenko
S Pyysalo
T Rindflesch
TC Rindflesch
U Hahn
Y Huang
Y Kano
Y Tateisi
Zubair Afzal
Publication venue: Springer Science and Business Media LLC

Brain MRI data sharing guide

Author: Achterberg Michelle
Hettne KM
Huijser D (Dorien)
Klapwijk Eduard
van 't Veer AE
van Erkel R
Wierenga LM
Publication date: 06/05/2021

We present a guide on sharing Magnetic Resonance Imaging (MRI) data, with a focus on the Netherlands. The guide is intended to help researchers determine what they can share and where, and where they can find information or support.

EUR Research Repository

Why workflows break - Understanding and combating decay in Taverna workflows.

Author: Belhajjame K
García-Cuesta E
Garrido A
Goble CA
Gómez-Pérez JM
Hettne KM
Klyne G
Roos M
Roure DD
Zhao J
Publication venue: IEEE Computer Society
Publication date: 01/01/2012

Workflows provide a popular means of preserving scientific methods by explicitly encoding their process. However, some workflows are subject to decay in their ability to be re-executed or to reproduce the same results over time, largely due to the volatility of the resources required for their execution. This paper provides an analysis of the root causes of workflow decay based on an empirical study of a collection of Taverna workflows from the myExperiment repository. Although our analysis was based on a specific type of workflow, the outcomes and methodology should be applicable to workflows from other systems, at least those whose executions also rely largely on accessing third-party resources. Based on our understanding of decay, we recommend a minimal set of auxiliary resources to be preserved together with the workflows as an aggregation object, and we provide a software tool for end users to create such aggregations and assess their completeness. ©2012 IEEE
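Assessing the completeness of such an aggregation amounts to comparing the resources actually bundled with a workflow against a required checklist. A minimal sketch, assuming a hypothetical checklist: the abstract does not list the recommended resources, so the category names below are placeholders rather than the paper's minimal set, and the `Aggregation` class is illustrative, not the authors' software tool.

```python
from dataclasses import dataclass, field

# Hypothetical checklist: placeholder resource kinds, not the paper's minimal set.
REQUIRED_RESOURCES = frozenset({
    "workflow_definition",
    "example_inputs",
    "example_outputs",
    "service_descriptions",
    "provenance_trace",
})


@dataclass
class Aggregation:
    """A workflow bundled with the auxiliary resources preserved alongside it."""
    workflow_id: str
    resources: set[str] = field(default_factory=set)

    def missing(self, required: frozenset = REQUIRED_RESOURCES) -> set[str]:
        """Return required resource kinds that are absent from this aggregation."""
        return set(required - self.resources)

    def completeness(self, required: frozenset = REQUIRED_RESOURCES) -> float:
        """Fraction of required resource kinds present (1.0 means complete)."""
        if not required:
            return 1.0
        return len(required & self.resources) / len(required)


agg = Aggregation("wf-42", {"workflow_definition", "example_inputs"})
print(sorted(agg.missing()))
print(agg.completeness())  # prints 0.4
```

Reporting the missing kinds, not just a score, tells the end user which resources to add before the aggregation can be considered complete.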

CiteSeerX

Oxford University Research Archive

The University of Manchester - Institutional Repository

Lancaster E-Prints

Drug prioritization using the semantic properties of a knowledge graph

Author: 't Hoen PAC
Charrout M
Hettne KM
Kors Jan
Kudrin R
Malas TB
Peters DJM
Roos M
Starikov S
van Mulligen Erik
Vlietstra Wytze
Vos Reinder
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

Author: A Boorsma
Aldert H Piersma
André Boorsma
AP Davis
CJ Patel
CY Lin
d V-v den Bosch HM
DA van Dartel
Dorien A M van Dartel
E de Jong
Esther de Jong
H Vanden Bossche
HJ Bussemaker
J Lamb
Jan A Kors
Jelle J Goeman
JJ Goeman
JL Pennings
JL Smalley
JM Naciff
Jos C Kleinjans
KM Hettne
Kristina M Hettne
M Ashburner
M Reich
MJ Schuemie
N Tijet
P Minguez
R Jelier
R Nogales-Cadenas
Rob H Stierum
TB Knudsen
V Selvaraj
W da Huang
Y Benjamini
Y Bosse
Z Li
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

BACKGROUND: Availability of chemical response-specific lists of genes (gene sets) for predicting the pharmacological and/or toxic effects of compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. METHODS: We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE data set and compared the results with those from the Connectivity Map (cMap). We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. RESULTS: Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE data set. None of the fibrate signatures in cMap scored significantly against the PPARA GE signature. Thirty-three environmental toxicant gene sets were significantly altered in the triazole GE data sets, and 21 of these toxicants had a toxicity pattern similar to that of the triazoles. We confirmed embryotoxic effects and discriminated triazoles from other chemicals. CONCLUSIONS: Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.
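The significance criterion for SDE (false discovery rate-corrected p-values < 0.05) can be illustrated with the Benjamini-Hochberg procedure. A minimal sketch, assuming Benjamini-Hochberg is the FDR correction used (the abstract does not name the method); the raw p-values for the four gene sets are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Visit p-values from largest to smallest so the running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four gene sets
raw = [0.02, 0.3, 0.002, 0.009]
adj = benjamini_hochberg(raw)
print([q < 0.05 for q in adj])  # prints [True, False, True, True]
```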

Maastricht University Research Portal