4 research outputs found

A parallel and incremental algorithm for efficient unique signature discovery on DNA databases

Author: Lee Hsiao Ping
Sheu Tzu-Fang
Tang Chuan Yi
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

A method for automatically extracting infectious disease-related primers and probes from the literature

Author: Crespo José
Cuevas Alejandro
de la Calle Guillermo
de la Iglesia Diana
García-Remesal Miguel
Lopez-Alonso Victoria
López-Campos Guillermo
Maojo Víctor
Martin-Sanchez Fernando
Pérez-Rey David
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

BACKGROUND: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to navigate this important information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. RESULTS: We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. CONCLUSIONS: We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch.
The present work has been funded, in part, by the European Commission through the ACGT integrated project (FP6-2005-IST-026996) and the ACTION-Grid support action (FP7-ICT-2007-2-224176), the Spanish Ministry of Science and Innovation through the OntoMineBase project (ref. TSI2006-13021-C02-01), the ImGraSec project (ref. TIN2007-61768), FIS/AES PS09/00069 and COMBIOMED-RETICS, and the Comunidad de Madrid, Spain.
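Phase (2) of the abstract, candidate detection with finite state machine-based recognizers, can be illustrated with a regular expression, since a regex compiles to a finite state machine. The sketch below is not the authors' recognizer: the IUPAC alphabet, the minimum length of 15, and the names `CANDIDATE` and `find_candidates` are assumptions made purely for illustration.

```python
import re

# Assumption: candidates are runs of uppercase IUPAC nucleotide codes.
IUPAC = "ACGTURYSWKMBDHVN"
# Assumption: runs shorter than this are ignored to skip words like "DNA".
MIN_LEN = 15

# The lookarounds keep the match from starting or ending inside a longer word.
CANDIDATE = re.compile(rf"(?<![A-Za-z])[{IUPAC}]{{{MIN_LEN},}}(?![A-Za-z])")


def find_candidates(text):
    """Return every candidate nucleotide run found in the text, in order."""
    return [m.group(0) for m in CANDIDATE.finditer(text)]


sentence = "The forward primer 5'-AGAGTTTGATCCTGGCTCAG-3' was used with DNA polymerase."
print(find_candidates(sentence))
```

A recognizer this simple would over-match in real papers (for example on long uppercase acronyms made of the same letters), which is presumably why the abstract follows detection with a rule-based refinement phase.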

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Springer - Publisher Connector

PubMed Central

REPISALUD

University of Melbourne Institutional Repository

Archivo Digital UPM

A method for automatically extracting infectious disease-related primers and probes from the literature

Author: A Loy
Alejandro Cuevas
BS Rice
D Betel
DA Benson
David Pérez-Rey
Diana de la Iglesia
EA Mothershed
F Li
F Pattyn
Fernando Martín-Sánchez
G De la Calle
Guillermo de la Calle
Guillermo López-Campos
H González-Díaz
H Hyyrö
HD VanGuilder
HP Lee
J Stajich
J Tamames
J Tarhio
JJ Rocchio
José Crespo
K Pabbaraju
L Hirschman
LL Cheng
LT Bravo
M Minsky
MB Miller
MC Enright
MG Campi
Miguel García-Remesal
National Center for Biotechnology Information
P Harmon
PC Woo
R McDonald
RM Ratcliff
SF Altschul
Victoria López-Alonso
Víctor Maojo
YC Huang
Publication venue: Springer Science and Business Media LLC

Crossref

A parallel and incremental algorithm for efficient unique signature discovery on DNA databases

Author: Lee Hsiao Ping
Sheu Tzu-Fang
Tang Chuan Yi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/03/2010

BACKGROUND: DNA signatures are distinct short nucleotide sequences that provide valuable information for various purposes, such as the design of Polymerase Chain Reaction primers and microarray experiments. Biologists usually use a discovery algorithm to find unique signatures in DNA databases, and then apply the signatures to microarray experiments. Such discovery algorithms require input factors, such as signature length l and mismatch tolerance d, which affect the discovery results. However, guidance on how to select proper factor values is rare, especially when an unfamiliar DNA database is used. In most cases, biologists select factor values based on experience, or even by guessing. If the discovered result is unsatisfactory, they change the input factors of the algorithm to obtain a new result. This process is repeated until a proper result is obtained. Implicit signatures under the discovery condition (l, d) are defined as the signatures of length ≤ l with mismatch tolerance ≥ d. A discovery algorithm that could discover all implicit signatures, including those that meet the requirements on the results, would be more helpful than one that depends on trial and error. However, existing discovery algorithms do not address the need to discover all implicit signatures. RESULTS: This work proposes two discovery algorithms: the consecutive multiple discovery (CMD) algorithm and the parallel and incremental signature discovery (PISD) algorithm. The PISD algorithm is designed for efficiently discovering signatures under a certain discovery condition. The algorithm finds new results by using previously discovered results as candidates, rather than by using the whole database. The PISD algorithm further increases discovery efficiency by applying parallel computing. The CMD algorithm is designed to discover implicit signatures efficiently. It uses the PISD algorithm as a kernel routine to discover implicit signatures under every feasible discovery condition. CONCLUSIONS: The proposed algorithms discover implicit signatures efficiently. In experiments, the presented CMD algorithm takes up to 97% less execution time than typical sequential discovery algorithms when discovering implicit signatures with eight processing cores.
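To make the discovery condition (l, d) concrete, here is a brute-force sketch. The abstract does not spell out the uniqueness test, so the code assumes one: a length-l substring is a unique signature if every other length-l substring in the database differs from it in more than d positions. This is neither the PISD nor the CMD algorithm, and the names `hamming` and `unique_signatures` are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def unique_signatures(db, l, d):
    """Brute-force discovery under the assumed definition.

    db is a list of DNA strings; l is the signature length and d the
    mismatch tolerance. A window qualifies if all other windows (at any
    other position in any sequence) have more than d mismatches with it.
    """
    windows = [
        (i, j, seq[j:j + l])
        for i, seq in enumerate(db)
        for j in range(len(seq) - l + 1)
    ]
    found = []
    for i, j, w in windows:
        others = (v for i2, j2, v in windows if (i2, j2) != (i, j))
        if all(hamming(w, v) > d for v in others):
            found.append(w)
    return found


print(unique_signatures(["ACGTACGT"], 4, 0))
print(unique_signatures(["AAAATTTT"], 4, 1))
```

The quadratic comparison over all windows is what makes repeated trial-and-error runs expensive; the abstract's incremental idea of reusing previously discovered results as candidates is aimed at avoiding that rescan of the whole database.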

Directory of Open Access Journals