23 research outputs found

Entity recognition in the biomedical domain using a hybrid approach

Author: A Tharatipyakul
C Funk
CD Paice
CS Funk
D Campos
D Koning
D Maglott
D Szklarczyk
DM Jessop
E Pafilis
E Tseytlin
F Rinaldi
G Sheikhshab
K Degtyarenko
K Eilbeck
K Verspoor
M Ashburner
M Bada
M Basaldella
MF Porter
N Pudota
P Lopez
PD Turney
R Core Team
R Leaman
S Aubin
S Eltyeb
S Tulkens
SA Akhondi
T Groza
T Munkhdalai
U Leser
Y Sasaki
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Virus-host interactomics: new insights and opportunities for antiviral drug discovery

Author: A Calderone
A Pichlmair
A Segura-Cabrera
A-C Gingras
AP Bento
AW Reinke
AW Tai
B De Chassey
Benoît de Chassey
C Qin
C Ramière
C Southan
CE Engeland
CN Root
D Jackson
D Mairiang
D Mellacheruvu
E Emmott
E Van der Vries
EA Golemis
F Ciampor
F Momose
G Neveu
G Rigaut
GL Law
H Shelton
H Yu
HTT Ngo
I García-Dorival
I Mazur
I Uzoma
IM Cristea
IP Grégoire
J Pellet
J-I Henter
JA Garcia-Rivera
Jacky Vonderscher
JR Parrish
K Inoue
K Tawaratsumida
L Meyniel-Schicklin
L Salwinski
Laurène Meyniel-Schicklin
LN Carpp
M Barrios-Rodiles
M Griffith
M Muller
M-A Germain
MA Calderwood
MD De Jong
MD Dyer
MJ Rindler
O Planz
O Rozenblatt-Rosen
OV Denisova
P Braun
P Cassonnet
P Dorr
P Lamesch
PA Gallay
Patrice André
PJ Lim
PS Pennings
PT Dolan
Q Li
R Dierkes
S Duffy
S Eyckerman
S Fields
S Kerrien
S Ludwig
S Orchard
S Pfefferle
S Pleschka
S Wang
S-E Ong
SA Akhondi
SD Shapira
SS Jourdan
T Hagai
T Klingström
TL Kieffer
V Law
V Navratil
Vincent Lotteau
W Wu
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Consistency of systematic chemical identifiers within and between small-molecule databases

Author: Akhondi SA
Kors Jan
Muresan S
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Background: Correctness of structures and associated metadata within public and commercial chemical databases greatly impacts drug discovery research activities such as quantitative structure-property relationships modelling and compound novelty checking. MOL files, SMILES notations, IUPAC names, and InChI strings are ubiquitous file formats and systematic identifiers for chemical structures. While interchangeable for many cheminformatics purposes, there have been no studies on the inconsistency of these structure identifiers due to various approaches for data integration, including the use of different software and different rules for structure standardisation. We have investigated the consistency of systematic identifiers of small molecules within and between some of the commonly used chemical resources, with and without structure standardisation. Results: The consistency between systematic chemical identifiers and their corresponding MOL representation varies greatly between data sources (37.2%-98.5%). We observed the lowest overall consistency for MOL-IUPAC names. Disregarding stereochemistry increases the consistency (84.8% to 99.9%). A wide variation in consistency also exists between MOL representations of compounds linked via cross-references (25.8% to 93.7%). Removing stereochemistry improved the consistency (47.6% to 95.6%). Conclusions: We have shown that considerable inconsistency exists in structural representation and systematic chemical identifiers within and between databases. This can have a great influence especially when merging data and if systematic identifiers are used as a key index for structure integration or cross-querying several databases. Regenerating systematic identifiers starting from their MOL representation and applying well-defined and documented chemistry standardisation rules to all compo

EUR Research Repository
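
The consistency figures quoted in the abstract above are percentages of records whose listed identifier agrees with the one regenerated from the MOL representation, reported with and without stereochemistry. A minimal sketch of that kind of calculation follows. It is not the authors' pipeline: it assumes the InChI for each MOL file has already been generated by some cheminformatics toolkit, and the function names `strip_stereo` and `consistency` are hypothetical. Stereochemistry is disregarded here by dropping the InChI stereo layers (`/b`, `/t`, `/m`, `/s`), which is one possible approach and may differ from the standardisation used in the paper.

```python
from typing import Iterable, Tuple

# InChI layers that carry stereochemistry: double-bond (/b),
# tetrahedral (/t, /m) and stereo type (/s).
STEREO_LAYERS = ("b", "t", "m", "s")


def strip_stereo(inchi: str) -> str:
    """Return the InChI with its stereo layers removed.

    The first two segments (version prefix and formula) are always kept,
    so a formula is never mistaken for a layer.
    """
    parts = inchi.split("/")
    kept = parts[:2] + [p for p in parts[2:] if not p.startswith(STEREO_LAYERS)]
    return "/".join(kept)


def consistency(pairs: Iterable[Tuple[str, str]], ignore_stereo: bool = False) -> float:
    """Percentage of (listed InChI, InChI regenerated from MOL) pairs that agree."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    if ignore_stereo:
        pairs = [(strip_stereo(a), strip_stereo(b)) for a, b in pairs]
    matches = sum(1 for listed, from_mol in pairs if listed == from_mol)
    return 100.0 * matches / len(pairs)


# Toy records (assumption: illustrative data, not taken from any database).
L_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
METHANOL = "InChI=1S/CH4O/c1-2/h2H,1H3"

records = [(L_ALA, L_ALA), (L_ALA, D_ALA), (ETHANOL, METHANOL)]
print(consistency(records))
print(consistency(records, ignore_stereo=True))
```

In this toy set, the record that pairs the two alanine enantiomers only counts as consistent once stereo layers are ignored, which mirrors the direction of the effect reported in the abstract.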

Detecting Chemical Reactions in Patents

Author: Akhondi SA
Baldwin T
Druckenbrodt C
Nguyen DQ
Thorne C
Verspoor K
Yoshikawa H
Zhai Z
Publication venue: Australasian Language Technology Association
Publication date: 01/01/2019

Extracting chemical reactions from patents is a crucial task for chemists working on chemical exploration. In this paper we introduce the novel task of detecting the textual spans that describe or refer to chemical reactions within patents. We formulate this task as a paragraph-level sequence tagging problem, where the system is required to return a sequence of paragraphs that contain a description of a reaction. To address this new task, we construct an annotated dataset from an existing proprietary database of chemical reactions manually extracted from patents. We introduce several baseline methods for the task and evaluate them over our dataset. Through error analysis, we discuss what makes the task complex and challenging, and suggest possible directions for future research.

University of Melbourne Institutional Repository
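
The paragraph-level formulation in the abstract above can be illustrated with a small sketch that turns per-paragraph predictions into reaction spans. It assumes a simple boolean label per paragraph (the abstract does not specify the tagging scheme), and the function name `reaction_spans` is hypothetical. It shows only the span-grouping step, not any of the baseline taggers from the paper.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def reaction_spans(labels: Sequence[bool]) -> List[Tuple[int, int]]:
    """Group consecutive reaction paragraphs into (start, end) index spans.

    `labels[i]` is True when paragraph i is tagged as describing a reaction.
    Both ends of each returned span are inclusive.
    """
    spans = []
    index = 0
    for is_reaction, run in groupby(labels):
        length = len(list(run))
        if is_reaction:
            spans.append((index, index + length - 1))
        index += length
    return spans


# Toy example: paragraphs 1-2 and paragraph 4 are tagged as reaction text.
print(reaction_spans([False, True, True, False, True]))
```

Grouping runs of positive paragraphs keeps a multi-paragraph reaction description together as one span instead of reporting each paragraph separately.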