
    Improving protein secondary structure prediction using a simple k-mer model

    Motivation: Some first-order methods for protein sequence analysis inherently treat each position as independent. We develop a general framework for introducing longer-range interactions. We then demonstrate the power of our approach by applying it to secondary structure prediction. Under the independence assumption, existing methods can predict features that are not protein-like, an extreme example being a helix of length 1. Our goal was to make the predictions of state-of-the-art methods more realistic without loss of performance by other measures.
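    One way to picture the longer-range interactions described above is to rescore the per-residue probabilities of an independent predictor with a prior over consecutive labels. The sketch below is a hypothetical illustration, not the paper's model: it assumes three states (H, E, C), a k = 3 (trimer) prior whose only rule is a penalty on isolated H or E residues, and Viterbi decoding over pairs of labels. The names `kmer_viterbi` and `trimer_score` and the `penalty` value are illustrative.

```python
import math
from itertools import product

STATES = ("H", "E", "C")  # helix, strand, coil


def trimer_score(a, b, c, penalty=-20.0):
    """Hypothetical k = 3 prior: penalise a helix or strand run of length 1."""
    if b in ("H", "E") and a != b and c != b:
        return penalty
    return 0.0


def kmer_viterbi(probs, penalty=-20.0):
    """Decode labels from independent per-residue probabilities under a trimer prior.

    probs: list of dicts mapping each state in STATES to its probability.
    Returns the highest-scoring label string.
    """
    n = len(probs)
    if n == 0:
        return ""
    # Floor probabilities so log() never sees zero.
    logp = [{s: math.log(max(p[s], 1e-12)) for s in STATES} for p in probs]
    if n == 1:
        return max(STATES, key=lambda s: logp[0][s])
    # score[(a, b)]: best log-score of a labelling whose last two labels are a, b.
    score = {(a, b): logp[0][a] + logp[1][b] for a, b in product(STATES, repeat=2)}
    back = []  # back[i - 2][(b, c)] = best label at position i - 2
    for i in range(2, n):
        new_score, ptr = {}, {}
        for b, c in product(STATES, repeat=2):
            best_a = max(STATES, key=lambda a: score[(a, b)] + trimer_score(a, b, c, penalty))
            new_score[(b, c)] = (
                score[(best_a, b)] + trimer_score(best_a, b, c, penalty) + logp[i][c]
            )
            ptr[(b, c)] = best_a
        score = new_score
        back.append(ptr)
    # Trace back from the best final pair of labels.
    b, c = max(score, key=score.get)
    labels = [c, b]
    for ptr in reversed(back):
        a = ptr[(b, c)]
        labels.append(a)
        b, c = a, b
    return "".join(reversed(labels))


if __name__ == "__main__":
    coil = {"H": 0.1, "E": 0.1, "C": 0.8}
    spike = {"H": 0.6, "E": 0.1, "C": 0.3}
    print(kmer_viterbi([coil, coil, spike, coil, coil]))  # CCCCC
    print(kmer_viterbi([coil, coil, spike, spike, coil]))  # CCHHC
```

    Here the independent per-residue argmax of the first input would be `CCHCC`, a helix of length 1; under the trimer prior it decodes to `CCCCC`, while a two-residue helix is left alone. Runs at the chain ends are not penalised in this sketch, because the trimer rule only applies where both neighbours exist.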

    VIPR: A probabilistic algorithm for analysis of microbial detection microarrays

    Background: All clinical diagnostic assays for infectious disease in use today focus on detecting the presence of a single, well-defined target agent or a set of agents. In recent years, microarray-based diagnostics have been developed that greatly facilitate the highly parallel detection of multiple microbes that may be present in a given clinical specimen. While several algorithms have been described for interpreting diagnostic microarrays, none of the existing approaches can incorporate training data generated from positive control samples to improve performance.
    Results: To address this issue, we have developed a novel interpretive algorithm, VIPR (Viral Identification using a PRobabilistic algorithm), which uses Bayesian inference to capitalize on empirical training data to optimize detection sensitivity. To illustrate this approach, we focused on the detection of viruses that cause hemorrhagic fever (HF) using a custom HF-virus microarray. VIPR was used to analyze 110 empirical microarray hybridizations generated from 33 distinct virus species, achieving an accuracy of 94% as measured by leave-one-out cross-validation.
    Conclusions: VIPR outperformed previously described algorithms for this dataset. The VIPR algorithm has potential to be broadly applicable to clinical diagnostic settings, where positive controls are typically readily available for generating training data.
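    The abstract does not spell out VIPR's likelihood model. As a minimal sketch of how Bayesian inference can exploit positive-control training data, the code below assumes each hybridization has been reduced to binary probe calls, models each virus with independent per-probe Bernoulli probabilities (a naive Bayes classifier with Laplace smoothing and a uniform prior), and evaluates it by leave-one-out cross-validation. The functions, the `pseudocount` parameter and the toy data are hypothetical and are not taken from VIPR.

```python
import math


def train(training, pseudocount=1.0):
    """Estimate P(probe j called positive | virus) from positive-control arrays.

    training: dict mapping virus name -> non-empty list of binary probe-call vectors.
    Laplace smoothing (pseudocount) keeps every probability strictly inside (0, 1).
    """
    model = {}
    for virus, arrays in training.items():
        n = len(arrays)
        model[virus] = [
            (sum(a[j] for a in arrays) + pseudocount) / (n + 2 * pseudocount)
            for j in range(len(arrays[0]))
        ]
    return model


def classify(model, calls):
    """Rank viruses by log posterior under a uniform prior (shared constants dropped)."""
    scores = {}
    for virus, theta in model.items():
        scores[virus] = sum(math.log(p if x else 1.0 - p) for x, p in zip(calls, theta))
    return sorted(scores, key=scores.get, reverse=True)


def leave_one_out_accuracy(training):
    """Hold out each array in turn, retrain on the rest, and score the top call."""
    correct = total = 0
    for virus, arrays in training.items():
        for k, held_out in enumerate(arrays):
            reduced = {}
            for v, a in training.items():
                rest = a[:k] + a[k + 1:] if v == virus else a
                if rest:  # a virus with no remaining arrays cannot be modelled
                    reduced[v] = rest
            ranking = classify(train(reduced), held_out)
            correct += bool(ranking) and ranking[0] == virus
            total += 1
    return correct / total


if __name__ == "__main__":
    toy = {
        "virusA": [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0]],
        "virusB": [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]],
    }
    print(classify(train(toy), [1, 1, 0, 0])[0])  # virusA
    print(leave_one_out_accuracy(toy))  # 1.0
```

    Smoothing matters in this design: without a pseudocount, a probe never seen positive in a virus's training arrays would give that virus zero likelihood for any sample in which the probe lights up.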

    Bioinformatics for High-throughput Virus Detection and Discovery

    Pathogen detection is a challenging problem because a given specimen may contain one or more of many different microbes. Additionally, a specimen may contain microbes that have yet to be discovered. Traditional diagnostics are ill-equipped to address these challenges because they focus on detecting a single agent or a panel of agents. I have developed three innovative computational approaches for analyzing high-throughput genomic assays capable of detecting many microbes in a parallel and unbiased fashion. The first is a metagenomic sequence analysis pipeline that was initially applied to 12 pediatric diarrhea specimens to give the first look at the diarrhea virome. Metagenomic sequencing and subsequent analysis revealed a spectrum of viruses in these specimens, including known and highly divergent viruses. This metagenomic survey serves as a basis for future investigations of the possible role of these viruses in disease. The second tool I developed is a novel algorithm for diagnostic microarray analysis called VIPR (Viral Identification with a PRobabilistic algorithm). The main advantage of VIPR relative to other published methods for diagnostic microarray analysis is that it relies on a training set of empirical hybridizations of known viruses to guide future predictions; VIPR uses a Bayesian statistical framework to accomplish this. A set of hemorrhagic fever viruses and their relatives was hybridized to a total of 110 microarrays to test the performance of VIPR. VIPR achieved an accuracy of 94% and outperformed existing approaches for this dataset. The third tool I developed for pathogen detection is called VIPR HMM. VIPR HMM extends VIPR's original implementation by incorporating a hidden Markov model (HMM) to detect recombinant viruses. VIPR HMM correctly identified 95% of inter-species breakpoints for a set of recombinant alphaviruses and flaviviruses. Mass sequencing and diagnostic microarrays require robust computational tools to make predictions regarding the presence of microbes in specimens of interest. High-throughput diagnostic assays coupled with powerful analysis tools have the potential to increase the efficacy with which we detect pathogens and treat disease as these technologies play more prominent roles in clinical laboratories.
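    A hidden Markov model suits breakpoint detection because the parental origin of a recombinant genome changes only occasionally along its length. The sketch below is a hypothetical illustration, not the VIPR HMM implementation: hidden states are candidate parental viruses, probes are assumed to be ordered by genome position, each state emits binary probe calls with assumed per-probe positive-call probabilities (for example, estimated from training hybridizations and strictly between 0 and 1), a fixed `switch_prob` sets the transition probability, and Viterbi decoding gives the most likely parent at each probe. Breakpoints are the positions where the decoded parent changes. The function names are illustrative.

```python
import math


def viterbi_parents(calls, theta, switch_prob=0.01):
    """Most likely parent at each probe for a possibly recombinant genome.

    calls: binary probe calls ordered by genome position.
    theta: dict mapping parent name -> per-probe P(probe positive | parent).
    """
    parents = list(theta)
    k = len(parents)
    stay = math.log(1.0 - switch_prob)
    move = math.log(switch_prob / (k - 1)) if k > 1 else float("-inf")

    def emit(p, j):
        q = theta[p][j]
        return math.log(q if calls[j] else 1.0 - q)

    # Uniform initial distribution over parents.
    score = {p: math.log(1.0 / k) + emit(p, 0) for p in parents}
    back = []  # back[j - 1][p] = best parent at probe j - 1 given parent p at probe j
    for j in range(1, len(calls)):
        new, ptr = {}, {}
        for p in parents:
            best = max(parents, key=lambda q: score[q] + (stay if q == p else move))
            new[p] = score[best] + (stay if best == p else move) + emit(p, j)
            ptr[p] = best
        score = new
        back.append(ptr)
    # Trace back from the best final state.
    state = max(score, key=score.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def breakpoints(path):
    """Probe indices where the decoded parent differs from the previous probe."""
    return [j for j in range(1, len(path)) if path[j] != path[j - 1]]


if __name__ == "__main__":
    # Toy design: even probes derive from parent A, odd probes from parent B.
    theta = {
        "A": [0.9 if j % 2 == 0 else 0.1 for j in range(8)],
        "B": [0.1 if j % 2 == 0 else 0.9 for j in range(8)],
    }
    calls = [1, 0, 1, 0, 0, 1, 0, 1]  # first half looks like A, second half like B
    path = viterbi_parents(calls, theta)
    print(path)               # ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
    print(breakpoints(path))  # [4]
```

    Lowering `switch_prob` makes the decoder more reluctant to call a breakpoint, trading sensitivity for fewer spurious switches caused by individual noisy probes.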