5 research outputs found

GenomeBlast: a web tool for small genome comparison

Author: AL Delcher
DD Womble
DL Swofford
Etsuko N Moriyama
Guoqing Lu
J Felsenstein
JO Korbel
KA Frazer
KP O'Brien
L Florea
Liying Jiang
Luwen Zhang
M Berriman
M Remm
MD Hendy
MG Montague
MM Alba
RD Page
Resa MK Helikar
RL Tatusov
S Kurtz
S Schwartz
S Yang
SF Altschul
T Treangen
T Xie
Thaine W Rowley
TJ Carver
Xianfeng Chen
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Comparative genomics has become an essential approach for identifying homologous gene candidates and their functions, and for studying genome evolution. Many tools are available for genome comparison. Unfortunately, most of them are not suited to identifying unique genes or inferring phylogenetic relationships in a given set of genomes. RESULTS: GenomeBlast is a Web tool developed for comparative analysis of multiple small genomes. A new parameter called "coverage" was introduced and used along with sequence identity to evaluate global similarity between genes. With GenomeBlast, the following results can be obtained: (1) unique genes in each genome; (2) homologous gene candidates among the compared genomes; (3) 2D plots of homologous gene candidates for all pairwise genome comparisons; and (4) a table of gene presence/absence information and a genome phylogeny. We demonstrated the functions of GenomeBlast with an example analysis of multiple herpesviral genomes and illustrated how GenomeBlast is useful for small genome comparison. CONCLUSION: We developed a Web tool for comparative analysis of small genomes, which allows the user not only to identify unique genes and homologous gene candidates among multiple genomes, but also to view their graphical distributions on genomes and to reconstruct a genome phylogeny. GenomeBlast runs on a Linux server with 4 CPUs and 4 GB of memory. The online version of GenomeBlast is available to the public by using a Web browser with the URL
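The abstract above does not define "coverage" or give cutoff values, so the following is only a minimal sketch of how identity and coverage might be combined to flag homologous gene candidates and unique genes. It assumes coverage means the percentage of the query gene's length spanned by the alignment; the `Hit` record, the function names, and the 30%/50% thresholds are all hypothetical and are not taken from GenomeBlast.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise alignment hit (hypothetical record layout)."""
    query_gene: str
    subject_genome: str
    identity: float   # percent identity over the aligned region
    aligned_len: int  # number of query residues covered by the alignment
    query_len: int    # full length of the query gene


def coverage(hit):
    # Assumed definition: percent of the query gene covered by the alignment.
    return 100.0 * hit.aligned_len / hit.query_len


def is_homolog(hit, min_identity=30.0, min_coverage=50.0):
    # Both thresholds are illustrative placeholders, not published values.
    return hit.identity >= min_identity and coverage(hit) >= min_coverage


def presence_table(genes, genomes, hits, **thresholds):
    # table[gene][genome] is True when the gene has a homolog candidate there.
    table = {g: {genome: False for genome in genomes} for g in genes}
    for hit in hits:
        if is_homolog(hit, **thresholds):
            table[hit.query_gene][hit.subject_genome] = True
    return table


def unique_genes(table):
    # A gene is "unique" here if no compared genome holds a homolog candidate.
    return [gene for gene, row in table.items() if not any(row.values())]
```

Requiring both measures means a short, highly identical local match (high identity, low coverage) is not counted as a homolog candidate, which is one plausible reading of "global similarity" in the abstract.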

Crossref

DigitalCommons@University of Nebraska

Springer - Publisher Connector

PubMed Central

The University of Nebraska, Omaha

IgTM: An algorithm to predict transmembrane domains and topology in proteins

Author: B Mathews
C Pasquier
D Angluin
D Lopez
Damián López
DB Searls
DT Jones
E Wallin
EE Pashou
ELL Sonnhammer
EM Gold
GE Tusnády
H Viklund
J Berstel
JE Hopcroft
JM Sempere
L Käll
LR Murphy
M Burset
M Ikeda
M Punta
Marcelino Campos
MM Gromiha
NS Sadovskaya
P Fariselli
P García
P Peris
PG Bagos
Piedachu Peris
R B
S Jayasinghe
S Mitaku
S Möller
T Knuutila
T Li
T Yokomori
Publication venue: BioMed Central
Publication date: 01/09/2008

Background: Due to their roles as receptors or transporters, membrane proteins play a key part in many important biological functions. In our work we used Grammatical Inference (GI) to localize transmembrane segments. Our GI process is based specifically on the inference of Even Linear Languages. Results: We obtained values close to 80% in both specificity and sensitivity. Six datasets were used for the experiments, considering different encodings for the input sequences. An encoding that includes the topology changes in the sequence (from inside or outside the membrane into it, and vice versa) gave the best results. This software is publicly available at: http://www.dsic.upv.es/users/tlcc/bio/bio.html Conclusion: We compared our results with other well-known methods, which obtain slightly better precision. However, this work shows that Grammatical Inference techniques can be applied effectively to bioinformatics problems.
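The abstract reports sensitivity and specificity without stating how they were computed. As a rough illustration only, the sketch below scores a predicted topology string against a reference one residue at a time, using the textbook definitions sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP). The label alphabet ("M" for membrane, "i" for inside, "o" for outside) and the function name are assumptions, and the paper may use a different definition (for example per-segment scoring).

```python
def residue_metrics(true_labels, predicted_labels, membrane_symbol="M"):
    """Per-residue sensitivity and specificity for membrane prediction.

    Both arguments are equal-length strings of topology labels; any symbol
    other than membrane_symbol is treated as non-membrane.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label strings must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth_tm = truth == membrane_symbol
        pred_tm = pred == membrane_symbol
        if truth_tm and pred_tm:
            tp += 1
        elif truth_tm:
            fn += 1
        elif pred_tm:
            fp += 1
        else:
            tn += 1

    # Guard against empty classes so the function never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

For instance, comparing the reference `"iiMMMMoo"` with the prediction `"iMMMMooo"` gives three true positives, one false negative, one false positive, and three true negatives, so both measures come out at 0.75.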

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research

Author: A Aho
Altera
AT Alex
AT Castelo
B Kuster
C Lin
D Farre
DE Kalume
DE Knuth
FM McCarthy
H Hyyro
HJ Jung
I Bogdan
I Xilinx
IT Li
J Buhler
JD Jaffe
L Tan
M Brudno
M Michael
Mark Lawrence
PG Lokhov
R Sidhu
RS Boyer
S Fide
Shane C Burgess
Susan M Bridges
T Oliver
TST Mak
V Boeva
Xilinx
Yoginder S Dandass
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: This paper describes techniques for accelerating the string set matching problem, with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences). Results: This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM, as opposed to FPGA logic resources, for FSM implementation also enables rapid reconfiguration of the FPGA without the place-and-route delays associated with complex digital designs. Conclusion: Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation.
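For readers unfamiliar with the underlying algorithm, here is a minimal software sketch of classic Aho-Corasick string set matching, with the peptide set partitioned into groups of at most 20 as the abstract describes. It is not the paper's FPGA design: the groups run sequentially rather than in parallel, the five-way bit-split is not reproduced, and all function names are illustrative.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]   # goto[state][symbol] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognized on entering each state

    # Phase 1: insert every pattern into a trie.
    for pattern in patterns:
        state = 0
        for symbol in pattern:
            if symbol not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][symbol] = len(goto) - 1
            state = goto[state][symbol]
        out[state].append(pattern)

    # Phase 2: breadth-first pass to set failure links and merge outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for symbol, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and symbol not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(symbol, 0)
            out[nxt].extend(out[fail[nxt]])
    return goto, fail, out


def find_matches(text, automaton):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, symbol in enumerate(text):
        while state and symbol not in goto[state]:
            state = fail[state]
        state = goto[state].get(symbol, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


def match_in_groups(text, peptides, group_size=20):
    # One small automaton per group of peptides; in hardware these would
    # scan the input concurrently, here they simply run one after another.
    hits = []
    for start in range(0, len(peptides), group_size):
        automaton = build_automaton(peptides[start:start + group_size])
        hits.extend(find_matches(text, automaton))
    return sorted(hits)
```

Splitting the peptide set changes only how the state tables are sized and stored, not which matches are reported, which is why the grouped search returns the same hits as a single large automaton.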

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Hardware Software Co-Design for Protein Identification

Author: Thallada Sandeep
Publication date: 01/01/2016

Recently, new technologies and research in computational bioinformatics have sharply increased the rate of biological data generation. Researchers contribute a vast amount of proteomics and genomics data to the life science community, especially through high-throughput next-generation sequencing methods, and this volume is doubling every 18 months. Protein identification is a fundamental step in protein sequence analysis, and it needs efficient solutions to keep pace with the data growth. The quest for faster protein sequence analysis focuses on rapid methods that scan databases and identify a protein accurately. This benefits the discipline of disease biomarker identification and aids disease diagnosis and prognosis

Research Archive of Indian Institute of Technology Hyderabad

Conference Proceedings Editors

Author: Bioinformatikaren Jardunaldiak
Gabriel Valiente
Jornadas De Bioinformática
Jornades De Bioinformàtica
Xavier Messeguer
Xavier Messeguer, Gabriel Valiente (eds.)
Xornadas De Bioinformatica

Preface The 5th Annual Spanish Bioinformatics Conference was held in Barcelona (Spain)

CiteSeerX