104 research outputs found

OMPdb: a database of β-barrel outer membrane proteins from Gram-negative bacteria

Author: Bagos Pantelis G.
Hamodrakas Stavros J.
Tsirigos Konstantinos D.
Publication venue: Oxford University Press
Publication date: 01/01/2011

We describe here OMPdb, which is currently the most complete and comprehensive collection of integral β-barrel outer membrane proteins from Gram-negative bacteria. The database currently contains 69 354 proteins, which are classified into 85 families, based mainly on structural and functional criteria. Although OMPdb follows the annotation scheme of Pfam, many of the families included in the database were not previously described or annotated in other publicly available databases. There are also cross-references to other databases, references to the literature and annotation for sequence features, like transmembrane segments and signal peptides. Furthermore, via the web interface, the user can not only browse the available data, but also submit advanced text searches and run BLAST queries against the database protein sequences or domain searches against the collection of profile Hidden Markov Models that represent each family’s domain organization. The database is freely accessible for academic users at http://bioinformatics.biol.uoa.gr/OMPdb and we expect it to be useful for genome-wide analyses, comparative genomics as well as for providing training and test sets for predictive algorithms regarding transmembrane β-barrels.

PubMed Central

Copenhagen University Research Information System

Toward On-demand Profile Hidden Markov Models for Genetic Barcode Identification

Author: Sheu Jessica
Publication venue: SJSU ScholarWorks
Publication date: 16/05/2019

Genetic identification aims to solve the shortcomings of morphological identification. By using the cytochrome c oxidase subunit 1 (COI) gene as the eukaryotic “barcode,” scientists hope to research species that may be morphologically ambiguous, elusive, or similarly difficult to visually identify. Current COI databases allow users to search only for existing database records. However, as the number of sequenced, potential COI genes increases, COI identification tools should ideally also be informative of novel, previously unreported sequences that may represent new species. If an unknown COI sequence does not represent a reported organism, an ideal identification tool would report taxonomic ranks to which the sequence is likely to belong. A potential solution is to dynamically create profile hidden Markov models (PHMMs): first at the genus level, then at the family level, traversing to higher taxonomic ranks until a significant score is found. This study experiments with creating PHMMs at the genus level, determining thresholds for classification, and assessing the general performance of this method and the requirements for future expansion to higher taxonomic groups. It ultimately determines that this model shows potential, but may require additional data pre-processing and may fall victim to current machine limitations.
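
The rank-by-rank traversal described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the rank list, the per-rank thresholds, and the helper names `make_toy_scorer` and `classify_by_rank` are hypothetical, and the scorer is a simple consensus-match stand-in for a real profile HMM score.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical rank order: the abstract names only genus and family explicitly.
RANKS = ("genus", "family", "order")

Scorer = Callable[[str, str], Tuple[Optional[str], float]]


def make_toy_scorer(models: Dict[str, Dict[str, str]]) -> Scorer:
    """Build a stand-in scorer: fraction of positions matching a taxon's consensus.

    A real pipeline would score the query against one profile HMM per taxon.
    """
    def score_at_rank(query: str, rank: str) -> Tuple[Optional[str], float]:
        best_taxon, best_score = None, float("-inf")
        for taxon, consensus in models.get(rank, {}).items():
            matches = sum(a == b for a, b in zip(query, consensus))
            score = matches / max(len(consensus), 1)
            if score > best_score:
                best_taxon, best_score = taxon, score
        return best_taxon, best_score

    return score_at_rank


def classify_by_rank(
    query: str,
    score_at_rank: Scorer,
    thresholds: Dict[str, float],
    ranks: Sequence[str] = RANKS,
) -> Optional[Tuple[str, str, float]]:
    """Walk from the lowest rank upward; stop at the first score that passes."""
    for rank in ranks:
        if rank not in thresholds:
            continue  # no threshold determined for this rank yet
        taxon, score = score_at_rank(query, rank)
        if taxon is not None and score >= thresholds[rank]:
            return rank, taxon, score
    return None  # no rank produced a significant score


# Toy data (assumed, for illustration only).
models = {"genus": {"GenusA": "ACGTACGT"}, "family": {"FamilyX": "ACGTTTTT"}}
thresholds = {"genus": 0.9, "family": 0.6}
scorer = make_toy_scorer(models)
print(classify_by_rank("ACGTTTGT", scorer, thresholds))
```

In the toy data, a query that misses the genus threshold can still be reported at the family level, which is the behaviour the abstract asks of an identification tool for unreported sequences.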

SJSU ScholarWorks

The p53HMM algorithm: using profile hidden markov models to detect p53-responsive genes

Author: A Contente
A Krogh
AJ Levine
Arnold Levine
B Ma
B Schuster-Böckler
C Barrett
Eduardo Sontag
G Stormo
J Hoh
J Lee
JC Bourdon
M Djordjevic
Q Zhou
R Durbin
R Hughey
RCG Holland
SE Kern
SR Eddy
T Riley
T Tan
TD Schneider
Todd Riley
VD Marinescu
W Hu
WD Funk
WS el Deiry
X Yu
Xin Yu
Y Barash
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: A computational method (called p53HMM) is presented that utilizes Profile Hidden Markov Models (PHMMs) to estimate the relative binding affinities of putative p53 response elements (REs), both p53 single-sites and cluster-sites. These models incorporate a novel "Corresponded Baum-Welch" training algorithm that provides increased predictive power by exploiting the redundancy of information found in the repeated, palindromic p53-binding motif. The predictive accuracy of these new models is compared against that of other predictive models, including position specific score matrices (PSSMs, or weight matrices). We also present a new dynamic acceptance threshold, dependent upon a putative binding site's distance from the Transcription Start Site (TSS) and its estimated binding affinity. This new criterion for classifying putative p53-binding sites increases predictive accuracy by reducing the false positive rate. Results: Training a Profile Hidden Markov Model with corresponding positions matching a combined-palindromic p53-binding motif creates the best p53-RE predictive model. The p53HMM algorithm is available on-line at http://tools.csb.ias.edu. Conclusion: Using Profile Hidden Markov Models with training methods that exploit the redundant information of the homotetramer p53 binding site provides better predictive models than weight matrices (PSSMs). These methods may also boost performance when applied to other transcription factor binding sites.
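
For readers unfamiliar with the weight-matrix baseline that this abstract compares against, a minimal log-odds PSSM might look like the sketch below. It is a generic textbook construction, not the p53HMM code; the function names, the pseudocount of 1.0, the uniform background, and the example sites are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence


def build_pssm(
    sites: Sequence[str],
    alphabet: str = "ACGT",
    pseudocount: float = 1.0,
    background: Optional[Dict[str, float]] = None,
) -> List[Dict[str, float]]:
    """Build a log-odds weight matrix from equal-length aligned binding sites."""
    if background is None:
        # Assumed uniform background; real genomes are usually not uniform.
        background = {b: 1.0 / len(alphabet) for b in alphabet}
    length = len(sites[0])
    pssm = []
    for i in range(length):
        # Pseudocounts keep unseen bases from getting a score of minus infinity.
        counts = {b: pseudocount for b in alphabet}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pssm.append(
            {b: math.log2((counts[b] / total) / background[b]) for b in alphabet}
        )
    return pssm


def score_site(pssm: List[Dict[str, float]], site: str) -> float:
    """Sum the per-position log-odds; each position is scored independently."""
    return sum(column[base] for column, base in zip(pssm, site))


# Made-up aligned sites, for illustration only.
pssm = build_pssm(["ACGT", "ACGT", "ACGA"])
print(score_site(pssm, "ACGT") > score_site(pssm, "TGCA"))
```

Because each column is scored independently, a matrix of this kind cannot model insertions, deletions, or dependencies between positions, which is the kind of limitation profile HMMs are meant to address.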

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Improving model construction of profile HMMs for remote homology detection through structural alignment

Author: A Andreeva
A Bateman
A Krogh
AC Camproux
Alberto MR Dávila
B Brejova
B Knudsen
B Qian
C Bystroff
C Do
C Notredame
D Feng
D Haft
F Altschul
F Goyon
Gerson Zaverucha
H Mamitsuka
I Letunic
J Espadaler
J Gough
J Park
J Shi
J Söding
J Thompson
JD Thompson
JR Beck
Juliana S Bernardes
K Bae
K Karplus
K Katoh
K Lin
K Mizuguchi
K Sjolander
L Holm
L Rabiner
M Gribskov
M Helen
M Madera
M Mendel
M Wistrand
O Sullivan
P Bourne
P Nuin
R Edgar
R Hughey
R Karchin
S Altschul
S Eddy
S Jones
T Attwood
T Mitchell
V Alexandrov
Vítor S Costa
W Majoros
W Taylor
WR Pearson
Y Hou
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance. Results: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and a paired two-tailed t-test. Conclusion: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignment in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MGEScan-non-LTR: computational identification and classification of autonomous non-LTR retrotransposons in eukaryotic genomes

Author: Adams
Altschul
Blesa
Burke
Dehal
Eddy
Edgar
H. Tang
Hessa
Hizer
Krogh
Kyte
Lander
Lovsin
Luan
M. Rho
Malik
McCarthy
Novikova
Permanyer
Sea Urchin Genome Sequencing Consortium
Tu
Unge
Volff
Wimley
Publication venue: Oxford University Press

Computational methods for genome-wide identification of mobile genetic elements (MGEs) have become increasingly necessary for both genome annotation and evolutionary studies. Non-long terminal repeat (non-LTR) retrotransposons are a class of MGEs that have been found in most eukaryotic genomes, sometimes in extremely high numbers. In this article, we present a computational tool, MGEScan-non-LTR, for the identification of non-LTR retrotransposons in genomic sequences, following a computational approach inspired by a generalized hidden Markov model (GHMM). Three different states represent two different protein domains and inter-domain linker regions encoded in the non-LTR retrotransposons, and their scores are evaluated by using profile hidden Markov models (for protein domains) and Gaussian Bayes classifiers (for linker regions), respectively. In order to classify the non-LTR retrotransposons into one of the 12 previously characterized clades using the same model, we defined separate states for different clades. MGEScan-non-LTR was tested on the genome sequences of four eukaryotic organisms, Drosophila melanogaster, Daphnia pulex, Ciona intestinalis and Strongylocentrotus purpuratus. For the D. melanogaster genome, MGEScan-non-LTR found all known ‘full-length’ elements and simultaneously classified them into the clades CR1, I, Jockey, LOA and R1. Notably, for the D. pulex genome, in which no non-LTR retrotransposon has been annotated, MGEScan-non-LTR found a significantly larger number of elements than did RepeatMasker, using the current version of the RepBase Update library. We also identified novel elements in the other two genomes, which have only been partially studied for non-LTR retrotransposons.

Crossref

PubMed Central

A Study on Masquerade Detection

Author: Huang Lin
Publication venue: SJSU ScholarWorks
Publication date: 01/12/2010

In modern computer systems, usernames and passwords have been by far the most common forms of authentication. A security system relying only on password protection is defenseless when the passwords of legitimate users are compromised. A masquerader can impersonate a legitimate user by using a compromised password. An intrusion detection system (IDS) can provide an additional level of protection for a security system by inspecting user behavior. In terms of detection techniques, there are two types of IDSs: signature-based detection and anomaly-based detection. An anomaly-based intrusion detection technique consists of two steps: 1) creating a normal behavior model for legitimate users during the training process, 2) analyzing user behavior against the model during the detection process. In this project, we concentrate on masquerade detection, a specific type of anomaly-based IDS. We have first explored suitable techniques to build a normal behavior model for masquerade detection. After studying two existing modeling techniques, N-gram frequency and hidden Markov models (HMMs), we have developed a novel approach based on profile hidden Markov models (PHMMs). Then we have analyzed these three approaches using the classical Schonlau data set. To find the best detection results, we have also conducted sensitivity analysis on the modeling parameters. However, we have found that our proposed PHMMs do not outperform the corresponding HMMs. We conjectured that the Schonlau data set lacked the position information required by the PHMMs. To verify this conjecture, we have also generated several data sets with position information. Our experimental results show that when there is insufficient training data, the PHMMs yield considerably better detection results than the corresponding HMMs since the generated position information is significantly helpful for the PHMMs.
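
The two-step anomaly-detection scheme in this abstract (train a normal-behavior model, then check new behavior against it) can be illustrated with a very small N-gram sketch. This is a simplified stand-in, not the scoring used in the project: the function names, the choice of bigrams, the "fraction of unseen N-grams" score, and the example commands are assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(commands: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous command n-grams in order of appearance."""
    return [tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)]


def train_profile(commands: Sequence[str], n: int = 2) -> Counter:
    """Step 1: build the normal-behavior model from a legitimate user's history."""
    return Counter(ngrams(commands, n))


def anomaly_score(profile: Counter, block: Sequence[str], n: int = 2) -> float:
    """Step 2: fraction of n-grams in a new block never seen during training."""
    grams = ngrams(block, n)
    if not grams:
        return 0.0  # block too short to judge
    unseen = sum(1 for gram in grams if gram not in profile)
    return unseen / len(grams)


# Made-up command history, for illustration only.
history = ["ls", "cd", "ls", "cat", "ls", "cd"]
profile = train_profile(history)
print(anomaly_score(profile, ["ls", "cd", "rm"]))
```

A block would be flagged as a possible masquerade when its score exceeds some threshold; choosing that threshold is the kind of parameter the abstract's sensitivity analysis would address.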

SJSU ScholarWorks

Classification of HIV-1 Sequences Using Profile Hidden Markov Models

Author: A Krogh
A Pandit
AF Santos
AK Schultz
Art F. Y. Poon
CV Gale
D Paraskevis
D Robertson
DL Robertson
F Gao
G Myers
I Bulla
J Goudsmit
J Louwagie
J Truszkowski
JV Parry
K Karplus
K Triques
KS Lole
LG Kostrikis
M Rozanov
ML Guimarães
MM Thomson
O Westesson
P Hraber
P Singh
Palella FJ Jr
R Durbin
R Myers
R Spang
RC Edgar
RD Finn
Sanjiv K. Dwivedi
SR Eddy
SS Sanabani
Supratim Sengupta
T Leitner
W Janssens
Y Takebe
Publication venue: Public Library of Science
Publication date: 01/01/2012

Accurate classification of HIV-1 subtypes is essential for studying the dynamic spatial distribution pattern of HIV-1 subtypes and also for developing effective methods of treatment that can be targeted to attack specific subtypes. We propose a classification method based on a profile Hidden Markov Model that can accurately identify an unknown strain. We show that a standard method that relies on the construction of a positive training set only, to capture unique features associated with a particular subtype, can accurately classify sequences belonging to all subtypes except B and D. We point out the drawbacks of the standard method, namely an arbitrary choice of threshold to distinguish between true positives and true negatives, and the inability to discriminate between closely related subtypes. We then propose an improved classification method based on construction of a positive as well as a negative training set to improve discriminating ability between closely related subtypes like B and D. Finally, we show how the improved method can be used to accurately determine the subtype composition of Common Recombinant Forms of the virus that are made up of two or more subtypes. Our method provides a simple and highly accurate alternative to other classification methods and will be useful in accurately annotating newly sequenced HIV-1 strains.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis

Author: Alser Mohammed
Cali Damla Senol
Cavlak Meryem Banu
Firtina Can
Kalsi Gurpreet S.
Kim Jeremie
Lindegger Joel
Luna Juan Gómez
Mutlu Onur
Pillai Kamlesh
Shahroodi Taha
Subramoney Sreenivas
Suresh Bharathwaj
Publication date: 21/10/2023

Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures whose states and edges carry probabilities. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, utilizes these probabilities to optimize and compute similarity scores. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. We identify an urgent need for a flexible, high-performance, and energy-efficient HW/SW co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs. We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM tackles the major inefficiencies in the Baum-Welch algorithm by 1) designing flexible hardware to accommodate various pHMM designs, 2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, 3) rapidly filtering out negligible computations using a hardware-based filter, and 4) minimizing redundant computations. ApHMM achieves substantial speedups of 15.55x - 260.03x, 1.83x - 5.34x, and 27.97x when compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications: 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29x - 59.94x, 1.03x - 1.75x, and 1.03x - 1.95x, respectively, while improving their energy efficiency by 64.24x - 115.46x, 1.75x, and 1.96x. Comment: Accepted to ACM TAC
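
To make the "similarity score between a sequence and a pHMM graph" concrete, the sketch below implements the standard forward pass that Baum-Welch training builds on. It is a plain-probability, generic HMM illustration and not ApHMM's design: it omits the silent delete states of a real profile HMM, and the two-state model, its probabilities, and the function name are assumptions.

```python
from typing import Dict, Sequence


def forward_likelihood(
    obs: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> float:
    """Return P(obs | model) by summing over all state paths (forward algorithm)."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    # alpha[s] = probability of emitting the prefix so far and ending in state s.
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Toy model with a match-like state "M" and an insert-like state "I" (assumed values).
states = ("M", "I")
start_p = {"M": 1.0, "I": 0.0}
trans_p = {"M": {"M": 0.9, "I": 0.1}, "I": {"M": 0.5, "I": 0.5}}
emit_p = {"M": {"A": 0.8, "C": 0.2}, "I": {"A": 0.5, "C": 0.5}}
print(forward_likelihood("AC", states, start_p, trans_p, emit_p))
```

Each step touches every pair of states, so the cost grows with sequence length times the square of the number of states. Practical implementations usually work in log space or with scaling to avoid underflow on long sequences.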

arXiv.org e-Print Archive