
    LIPIcs, Volume 244, ESA 2022, Complete Volume


    Local protein structures to bridge sequence-structure knowledge

    Protein sequences can be classified into structural classes based on their structural similarity and/or common evolutionary origin. Information on structural class is readily available and eases the probing of protein structure and function. SCOP and CATH are two prominent classification schemes used to assign the structural class of proteins. Both schemes determine the structural class manually, based on known protein tertiary structures. However, the number of known protein sequences is growing exponentially relative to the number of known protein tertiary structures. Although SCOP and CATH are well-established databases that contain reliable structural class information, the lack of known structural classes for many proteins, caused by laborious wet-lab experimental routines, limits high-throughput structural class assignment. Because this manual determination is tedious and time-consuming, it further limits structural class assignment. As a consequence, computational assignment of structural class suffers from arbitrary statistical inference. Thus, this study aims to provide a structural class prediction method that acquires knowledge of local protein structures, derived from the abundant known primary sequences, in order to produce high-throughput sequence-structure class assignment in place of the laborious experimental method. This structural class prediction method is termed SVM-LpsSCPred.

    Bayesian Methods in Brain Connectivity Change Point Detection with EEG Data and Genetic Algorithm

    The human brain processes a great amount of information every day, and its regions are organized optimally for this processing. An increasing number of studies in the last decade have focused on functional or effective connectivity between human brain regions. In this dissertation, Bayesian methods for brain connectivity change point detection are discussed. First, three state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data are reviewed and compared. Second, the Bayesian connectivity change point model is extended to change point analysis of electroencephalogram (EEG) data, and the ability of EEG measures of frontal and temporo-parietal activity during mindfulness therapy to track the treatment response of patients with dysfunctional anxiety is successfully tested. Then an optimized method for the Bayesian connectivity change point model using a genetic algorithm (GA) is proposed and shown to be more efficient in change point detection. Because GA parallelizes well, running the change point detection method on GPUs or multiprocessor computers is left as future work. Furthermore, a more advanced Bayesian bi-cluster connectivity change point model is developed to simultaneously detect the change points of each subject within a group and cluster subjects into groups according to their change point distributions and connectivity dynamics. This method is also validated on experimental datasets. After the discussion of brain change point detection, a review of Bayesian analysis of complex mutations in HBV, HCV, and HIV studies is included as part of my Ph.D. work. Finally, conclusions are drawn and future work is discussed.

    Recent advances in inferring viral diversity from high-throughput sequencing data

    Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have made it possible to assess the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging because reads are short and error-prone. To account for the uncertainties arising from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. The challenges posed by aligning reads and the impact of reference biases on diversity estimates are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue to be complemented by technological developments.
    ISSN: 0168-170

    Large scale rigidity-based flexibility analysis of biomolecules

    KINematics And RIgidity (KINARI) is an ongoing project for in silico flexibility analysis of proteins. The new version of the software, Kinari-2, extends the functionality of our free web server KinariWeb, incorporates advanced web technologies, emphasizes the reproducibility of its experiments, and makes substantially improved tools available to the user. It is designed specifically for large-scale experiments, in particular for (a) very large molecules, including bioassemblies with a high degree of symmetry such as viruses and crystals, (b) large collections of related biomolecules, such as those obtained through simulated dilutions, mutations, or conformational changes from various types of dynamics simulations, and (c) the large, idiosyncratic, publicly available repository of biomolecules, the Protein Data Bank, on which it is intended to work as seamlessly as possible. We describe the system design, along with the main data processing, computational, mathematical, and validation challenges underlying this phase of the KINARI project.

    Computational Methods for Sequencing and Analysis of Heterogeneous RNA Populations

    Next-generation sequencing (NGS) and mass spectrometry technologies bring unprecedented throughput, scalability, and speed, facilitating the study of biological systems. These technologies make it possible to sequence and analyze heterogeneous RNA populations rather than single sequences. In particular, they provide the opportunity to implement massive viral surveillance and transcriptome quantification. However, to fully exploit the capabilities of NGS technology, we need computational methods able to analyze billions of reads for the assembly and characterization of sampled RNA populations. In this work we present novel computational methods for cost- and time-effective analysis of sequencing data from viral and RNA samples. In particular, we describe: i) computational methods for transcriptome reconstruction and quantification; ii) a method for mass spectrometry data analysis; iii) a combinatorial pooling method; and iv) computational methods for the analysis of intra-host viral populations.

    Rechecking the Centrality-Lethality Rule in the Scope of Protein Subcellular Localization Interaction Networks

    Essential proteins are indispensable for living organisms to maintain life activities and play important roles in the study of pathology, synthetic biology, and drug design. Therefore, besides experimental methods, many computational methods have been proposed to identify essential proteins. Based on the centrality-lethality rule, various centrality methods are employed to predict essential proteins in a protein-protein interaction network (PIN). However, because they neglect the temporal and spatial features of protein-protein interactions, the centrality scores calculated by these methods are not effective enough for measuring the essentiality of proteins in a PIN. Moreover, many methods that overfit the features of essential proteins in one species may perform poorly for other species. In this paper, we demonstrate that the centrality-lethality rule also holds in Protein Subcellular Localization Interaction Networks (PSLINs). To this end, we propose a method based on Localization Specificity for Essential protein Detection (LSED), which can be combined with any centrality method to calculate improved centrality scores by taking into account the PSLINs in which proteins play their roles. In this study, LSED was combined with eight centrality methods separately to calculate Localization-specific Centrality Scores (LCSs) for proteins based on the PSLINs of four species (Saccharomyces cerevisiae, Homo sapiens, Mus musculus, and Drosophila melanogaster). Compared with the proteins with high centrality scores measured from the global PINs, more of the proteins with high LCSs measured from PSLINs are essential. This indicates that proteins with high LCSs measured from PSLINs are more likely to be essential and that LSED can improve the performance of centrality methods. Furthermore, LSED provides a widely applicable prediction model for identifying essential proteins in different species.
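
    The abstract does not give the LSED formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that an interaction belongs to a compartment's PSLIN only when both partners are annotated to that compartment, uses degree as the centrality method, and defines a hypothetical localization-specific score as a protein's largest degree over its compartments. The function names and the toy network are illustrative.

```python
from collections import defaultdict


def global_degree(edges):
    """Degree centrality in the global PIN (edge list of protein pairs)."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def pslin_degrees(edges, localizations):
    """Degree of each protein inside each localization-specific subnetwork.

    Assumption of this sketch: an interaction is kept in a compartment's
    subnetwork only if both partners are annotated to that compartment.
    """
    per_comp = defaultdict(lambda: defaultdict(int))  # compartment -> protein -> degree
    for a, b in edges:
        shared = localizations.get(a, set()) & localizations.get(b, set())
        for comp in shared:
            per_comp[comp][a] += 1
            per_comp[comp][b] += 1
    return per_comp


def localization_specific_score(edges, localizations):
    """Hypothetical score: a protein's largest degree over its compartments."""
    per_comp = pslin_degrees(edges, localizations)
    return {
        protein: max((per_comp[c][protein] for c in comps), default=0)
        for protein, comps in localizations.items()
    }


# Toy PIN and compartment annotations (illustrative only).
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
LOCS = {"A": {"nucleus", "cytosol"}, "B": {"nucleus"},
        "C": {"nucleus"}, "D": {"mitochondrion"}}

print(global_degree(EDGES))                      # {'A': 3, 'B': 2, 'C': 2, 'D': 1}
print(localization_specific_score(EDGES, LOCS))  # {'A': 2, 'B': 2, 'C': 2, 'D': 0}
```

    In the toy network, protein D has a global degree of 1 but a localization-specific score of 0, because its only partner shares no compartment with it; this is the kind of re-ranking that a localization-aware score is meant to produce.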

    Optimization Techniques For Next-Generation Sequencing Data Analysis

    High-throughput RNA sequencing (RNA-Seq) is a popular, cost-efficient technology with many medical and biological applications. This technology, however, presents a number of computational challenges in reconstructing full-length transcripts and accurately estimating their abundances across all cell types. Our contributions include (1) transcript and gene expression level estimation methods, (2) methods for genome-guided and annotation-guided transcriptome reconstruction, and (3) de novo assembly and annotation of real data sets. Transcript expression level estimation, also referred to as transcriptome quantification, tackles the problem of estimating the expression level of each transcript. Transcriptome quantification analysis is crucial for determining similar transcripts and for unraveling gene functions and transcription regulation mechanisms. We propose a novel simulated-regression-based method for transcriptome frequency estimation from RNA-Seq reads. Transcriptome reconstruction refers to the problem of reconstructing transcript sequences from RNA-Seq data. We present genome-guided and annotation-guided transcriptome reconstruction methods. Empirical results on both synthetic and real RNA-Seq datasets show that the proposed methods improve transcriptome quantification and reconstruction accuracy compared with current state-of-the-art methods. We further present the assembly and annotation of the Bugula neritina transcriptome (a marine colonial animal) and the Tallapoosa darter genome (a freshwater fish from a species-rich radiation).

    Viral Quasispecies Reconstruction Using Next Generation Sequencing Reads

    The genomic diversity of viral quasispecies is a subject of great interest, especially for chronic infections. Viral diversity can be characterized using high-throughput sequencing technologies (454 Life Sciences, Illumina, SOLiD, Ion Torrent, etc.). Standard assembly software was originally designed for single-genome assembly and cannot be used to assemble closely related quasispecies sequences or to estimate their frequencies. This work focuses on parsimonious and maximum likelihood models for assembling viral quasispecies and estimating their frequencies from 454 sequencing data. Our methods have been applied to several RNA viruses (HCV, IBV) as well as a DNA virus (HBV), genotyped using 454 Life Sciences amplicon and shotgun methods.
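
    The abstract names maximum likelihood models for estimating quasispecies frequencies but does not spell them out. As a generic illustration only, the sketch below uses expectation-maximization (EM), a standard way to fit mixture frequencies by maximum likelihood. It assumes that candidate haplotypes are already assembled, that each read comes with its start position on the haplotypes, and that sequencing errors are independent substitutions at a single hypothetical rate `error_rate`. It is not the authors' model.

```python
def read_likelihood(read, start, haplotype, error_rate):
    """P(read | haplotype) under independent, uniform substitution errors.

    Assumes the read lies entirely within the haplotype, starting at `start`.
    """
    segment = haplotype[start:start + len(read)]
    mismatches = sum(1 for r, h in zip(read, segment) if r != h)
    matches = len(read) - mismatches
    return (error_rate ** mismatches) * ((1.0 - error_rate) ** matches)


def estimate_frequencies(reads, haplotypes, error_rate=0.01, iterations=200):
    """EM estimate of haplotype frequencies from (start, sequence) reads."""
    k = len(haplotypes)
    freqs = [1.0 / k] * k  # start from uniform frequencies
    # The read-by-haplotype likelihood table is fixed across iterations.
    table = [[read_likelihood(seq, start, h, error_rate) for h in haplotypes]
             for start, seq in reads]
    for _ in range(iterations):
        expected = [0.0] * k
        for row in table:
            # E-step: posterior probability that this read came from each haplotype.
            weights = [f * p for f, p in zip(freqs, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # read has zero likelihood under every candidate; skip it
            for j, w in enumerate(weights):
                expected[j] += w / total
        n = sum(expected)
        if n == 0.0:
            break
        # M-step: new frequency = expected share of reads assigned to each haplotype.
        freqs = [e / n for e in expected]
    return freqs


# Toy data (illustrative): two variants that differ only at position 3.
HAPLOTYPES = ["ACGTACGT", "ACGAACGT"]
READS = [(2, "GTA")] * 3 + [(2, "GAA")]  # three reads match the first, one the second
print(estimate_frequencies(READS, HAPLOTYPES))
```

    With these toy reads, the estimate for the first haplotype settles close to, but not exactly at, 3/4, because the error model allows every read to have come from either variant with some probability.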

    Major v. Security Equipment Corp. Clerk's Record v. 1 Dckt. 39414
