11 research outputs found

    Stable stem enabled Shannon entropies distinguish non-coding RNAs from random backgrounds

    Background: The computational identification of RNAs in genomic sequences requires the identification of signals of RNA sequences. Shannon base pairing entropy is an indicator for RNA secondary structure fold certainty in detection of structural, non-coding RNAs (ncRNAs). Under the Boltzmann ensemble of secondary structures, the probability of a base pair is estimated from its frequency across all the alternative equilibrium structures. However, such an entropy has yet to deliver the desired performance for distinguishing ncRNAs from random sequences. Developing novel methods to improve the entropy measure performance may result in more effective ncRNA gene finding based on structure detection. Results: This paper shows that the measuring performance of base pairing entropy can be significantly improved with a constrained secondary structure ensemble in which only canonical base pairs are assumed to occur in energetically stable stems in a fold. This constraint actually reduces the space of the secondary structure and may lower the probabilities of base pairs unfavorable to the native fold. Indeed, base pairing entropies computed with this constrained model demonstrate substantially narrowed gaps of Z-scores between ncRNAs, as well as drastic increases in the Z-score for all 13 tested ncRNA sets, compared to shuffled sequences. Conclusions: These results suggest the viability of developing effective structure-based ncRNA gene finding methods by investigating secondary structure ensembles of ncRNAs.
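As a rough illustration of the two quantities this abstract refers to, the sketch below computes a length-normalized Shannon entropy from a table of base pair probabilities and a Z-score against a set of shuffled-sequence entropies. The normalization by sequence length, the function names, and the toy probabilities are assumptions made for illustration; the abstract does not give the paper's exact formula, and a real pipeline would obtain the probabilities from a folding program rather than a hand-written dictionary.

```python
import math
from statistics import mean, pstdev


def base_pair_entropy(pair_probs, length):
    """Length-normalized Shannon entropy of base pair probabilities.

    pair_probs maps (i, j) index pairs to the probability that bases
    i and j are paired in the structure ensemble (hypothetical input).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    total = 0.0
    for p in pair_probs.values():
        if p > 0.0:  # the limit of p * log(p) as p -> 0 is 0
            total -= p * math.log2(p)
    return total / length


def z_score(value, background):
    """Z-score of one entropy value against a background sample,
    for example entropies of shuffled versions of the same sequence."""
    sigma = pstdev(background)
    if sigma == 0.0:
        raise ValueError("background has zero variance")
    return (value - mean(background)) / sigma


if __name__ == "__main__":
    # Toy ensemble for a 6-nt sequence (made-up numbers).
    probs = {(0, 5): 0.5, (1, 4): 0.25}
    h = base_pair_entropy(probs, 6)
    print(h)
    print(z_score(h, [0.30, 0.40, 0.50]))
```

A lower entropy corresponds to a more certain fold, so a structured ncRNA would be expected to score below its shuffled background, giving a negative Z-score under this sign convention.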

    Emerging Topics in Genome Sequencing and Analysis

    This dissertation studies emerging topics in genome sequencing and analysis with DNA and RNA. It discusses optimal hybrid sequencing and assembly for accurate genome reconstruction, and efficient detection approaches for novel ncRNAs in genomes. Next-generation sequencing is a significant topic because it provides complete genetic information for further biological research. Recent advances in high-throughput genome sequencing technologies have enabled the systematic study of various genomes by making whole genome sequencing affordable. To date, many hybrid genome assembly algorithms have been developed that can take reads from multiple read sources to reconstruct the original genome. An important aspect of hybrid sequencing and assembly is that the feasibility conditions for genome reconstruction can be satisfied by different combinations of the available read sources, opening up the possibility of optimally combining the sources to minimize the sequencing cost while ensuring accurate genome reconstruction. In this study, we derive the conditions for whole genome reconstruction from multiple read sources at a given confidence level and also introduce the optimal strategy for combining reads from different sources to minimize the overall sequencing cost. We show that the optimal read set, which simultaneously satisfies the feasibility conditions for genome reconstruction and minimizes the sequencing cost, can be effectively predicted through constrained discrete optimization. The availability of genome-wide sequences for a variety of species provides a large database for further RNA analysis with computational methods. Recent studies have shown that noncoding RNAs (ncRNAs) play crucial roles in various biological processes, and that some ncRNAs are related to genome stability and a variety of inherited diseases. 
The discovery of novel ncRNAs is hence an important topic, and there is a pressing need for accurate computational detection approaches that can be used to efficiently detect novel ncRNAs in genomes. One important issue is RNA structure alignment for comparative genome analysis, as RNA secondary structures are better conserved than the RNA sequences. Simultaneous RNA alignment and folding algorithms aim to accurately align RNAs by predicting the consensus structure and alignment at the same time, but the computational complexity of the optimal dynamic programming algorithm for simultaneous alignment and folding is extremely high. In this work, we proposed an innovative method, TOPAS, for RNA structural alignment that can efficiently align RNAs through topological networks. Although many ncRNAs are known to have a well conserved secondary structure, which provides useful clues for computational prediction, the prediction of ncRNAs is still challenging, since it has been shown that a structure-based approach alone may not be sufficient for detecting ncRNAs in a single sequence. In this study, we first develop a new approach by utilizing the n-gram model to classify the sequences and extract effective features to capture sequence homology. Based on this approach, we propose an advanced method, piRNAdetect, for reliable computational prediction of piRNAs in genome sequences. Utilizing the n-gram model can enhance the detection of ncRNAs that have sparse folding structures with many unpaired bases. By incorporating the n-gram model with the generalized ensemble defect, which assesses structure conservation and conformation to the consensus structure, we further propose RNAdetect, a novel computational method for accurate detection of ncRNAs through comparative genome analysis. 
Extensive performance evaluation based on the Rfam database and bacterial genomes demonstrates that our approaches can accurately and reliably detect novel ncRNAs, outperforming the current advanced methods.
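The n-gram model mentioned in this abstract can be pictured as counting overlapping substrings of length n in a sequence and using their relative frequencies as features. The sketch below is only a generic illustration of that idea; the function name is hypothetical, and it is not the feature set or classifier used by piRNAdetect or RNAdetect, which the abstract does not specify.

```python
from collections import Counter


def ngram_frequencies(sequence, n):
    """Relative frequencies of overlapping n-grams in a sequence.

    Returns a dict mapping each observed n-gram to its share of all
    n-gram positions; an empty dict if the sequence is shorter than n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    total = len(sequence) - n + 1
    if total <= 0:
        return {}
    counts = Counter(sequence[i:i + n] for i in range(total))
    return {gram: count / total for gram, count in counts.items()}


if __name__ == "__main__":
    # Toy RNA fragment (made up for illustration).
    for gram, freq in sorted(ngram_frequencies("ACGACG", 3).items()):
        print(gram, freq)
```

Feature vectors of this kind depend only on sequence composition, which is consistent with the abstract's point that they can help for ncRNAs with sparse folding structures and many unpaired bases.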

    RNA inverse folding and synthetic design

    Thesis advisor: Welkin E. Johnson. Thesis advisor: Peter G. Clote. Synthetic biology is currently a rapidly emerging discipline, where innovative and interdisciplinary work has led to promising results. Synthetic design of RNA requires novel methods to study and analyze known functional molecules, as well as to generate design candidates that have a high likelihood of being functional. This thesis is primarily focused on the development of novel algorithms for the design of synthetic RNAs. Previous strategies, such as RNAinverse and NUPACK-DESIGN, use heuristic methods, such as adaptive walk, ensemble defect optimization (a form of simulated annealing), and genetic algorithms, to generate sequences that optimize specific measures (probability of the target structure, ensemble defect). In contrast, our approach is to generate a large number of sequences whose minimum free energy structure is identical to the target design structure, and subsequently filter with respect to different criteria in order to select the most promising candidates for biochemical validation. In addition, our software must be made accessible and user-friendly, thus allowing researchers from different backgrounds to use it in their work. Therefore, the work presented in this thesis concerns three areas: creating a potent, versatile and user-friendly RNA inverse folding algorithm suitable for the specific requirements of each project; implementing tools to analyze the properties that differentiate known functional RNA structures; and using these methods for synthetic design of de novo functional RNA molecules. Thesis (PhD), Boston College, 2016. Submitted to: Boston College, Graduate School of Arts and Sciences. Discipline: Biology.

    Analysis of the protein-Ligand and protein-peptide interactions using a combined sequence- and structure-based approach

    Proteins participate in most of the important processes in cells, and their ability to perform their function ultimately depends on their three-dimensional structure. They usually act in these processes through interactions with other molecules. Because of the importance of their role, proteins are also the common target for small molecule drugs that inhibit their activity, which may include targeting protein interactions. Understanding protein interactions and how they are affected by mutations is thus crucial for combating drug resistance and aiding drug design. This dissertation combines bioinformatics studies of protein interactions at both the primary sequence and the structural level. We analyse protein-protein interactions through linear motifs, as well as protein-small molecule interactions, and study how mutations affect them. This is done in the context of two systems. In the first study, of drug resistance mutations in the protease of the human immunodeficiency virus type 1, we successfully apply molecular dynamics simulations to estimate the effects of known resistance-associated mutations on the binding free energy, also revealing molecular mechanisms of resistance. In the second study, we analyse consensus profiles of linear motifs that mediate the recognition by the mitogen-activated protein kinases of their target proteins. We thus gain insights into the cellular processes these proteins are involved in.

    A complex systems approach to education in Switzerland

    The insights gained from the study of complex systems in biological, social, and engineered systems enable us not only to observe and understand, but also to actively design systems that are capable of successfully coping with complex and dynamically changing situations. The methods and mindset required for this approach have been applied to educational systems with their diverse levels of scale and complexity. Based on the general case made by Yaneer Bar-Yam, this paper applies the complex systems approach to the educational system in Switzerland. It confirms that the complex systems approach is valid. Indeed, many recommendations made for the general case have already been implemented in the Swiss education system. To address existing problems and difficulties, further steps are recommended. This paper contributes to the further establishment of the complex systems approach by shedding light on an area which concerns us all, which is a frequent topic of discussion and dispute among politicians and the public, where billions of dollars have been spent without achieving the desired results, and where it is difficult to directly derive consequences from actions taken. The analysis of the education system's different levels, their complexity and scale will clarify how such a dynamic system should be approached, and how it can be guided towards the desired performance.

    Task Allocation in Foraging Robot Swarms: The Role of Information Sharing

    Autonomous task allocation is a desirable feature of robot swarms that collect and deliver items in scenarios where congestion, caused by accumulated items or robots, can temporarily interfere with swarm behaviour. In such settings, self-regulation of the workforce can prevent unnecessary energy consumption. We explore two types of self-regulation: non-social, where robots become idle upon experiencing congestion, and social, where robots broadcast information about congestion to their teammates in order to socially inhibit foraging. We show that while both types of self-regulation can lead to improved energy efficiency and increase the amount of resource collected, the speed with which information about congestion flows through a swarm affects the scalability of these algorithms.