60 research outputs found

    Exact Asymptotic Results for a Model of Sequence Alignment

    Finding analytically the statistics of the longest common subsequence (LCS) of a pair of random sequences drawn from c alphabets is a challenging problem in computational evolutionary biology. We present exact asymptotic results for the distribution of the LCS in a simpler, yet nontrivial, variant of the original model called the Bernoulli matching (BM) model which reduces to the original model in the large c limit. We show that in the BM model, for all c, the distribution of the asymptotic length of the LCS, suitably scaled, is identical to the Tracy-Widom distribution of the largest eigenvalue of a random matrix whose entries are drawn from a Gaussian unitary ensemble. In particular, in the large c limit, this provides an exact expression for the asymptotic length distribution in the original LCS problem. Comment: 4 pages Revtex, 2 .eps figures included

    A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry

    Background: High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino acid polymorphisms. Results: In this study, a new de novo sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of inferred consensus sequence tags was 84%. According to our comparison, the performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, in terms of the number of de novo sequenced spectra and the sequencing accuracy. Conclusions: Here, we improved de novo sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at http://compbio.ornl.gov/Vonode.

    Computational Methods for Protein Identification from Mass Spectrometry Data

    Protein identification using mass spectrometry is an indispensable computational tool in the life sciences. A dramatic increase in the use of proteomic strategies to understand the biology of living systems generates an ongoing need for more effective, efficient, and accurate computational methods for protein identification. A wide range of computational methods, each with various implementations, are available to complement different proteomic approaches. A solid knowledge of the range of algorithms available and, more critically, the accuracy and effectiveness of these techniques is essential to ensure as many of the proteins as possible, within any particular experiment, are correctly identified. Here, we undertake a systematic review of the currently available methods and algorithms for interpreting, managing, and analyzing biological data associated with protein identification. We summarize the advances in computational solutions as they have responded to corresponding advances in mass spectrometry hardware. The evolution of scoring algorithms and metrics for automated protein identification is also discussed, with a focus on the relative performance of different techniques. We also consider the relative advantages and limitations of different techniques in particular biological contexts. Finally, we present our perspective on future developments in the area of computational protein identification by considering the most recent literature on new and promising approaches to the problem, as well as identifying areas yet to be explored and the potential application of methods from other areas of computational biology.

    A novel physiologically based pharmacokinetic model of rectal absorption, evaluated and verified using clinical data on 10 rectally administered drugs_Simcyp V21 workspaces

    No full text
    This dataset contains the Simcyp Simulator V21 workspaces and associated datafiles used to simulate the intravenous and rectal administration clinical studies described in the paper. THIS DATASET IS ARCHIVED AT DANS/EASY, BUT NOT ACCESSIBLE HERE. TO VIEW A LIST OF FILES AND ACCESS THE FILES IN THIS DATASET CLICK ON THE DOI-LINK ABOVE.
