194 research outputs found

    CAD Tools for DNA Micro-Array Design, Manufacture and Application

    Motivation: As the human genome project progresses and more microbial and eukaryotic genomes are characterized, numerous biotechnological processes have recently attracted an increasing number of biologists, bioengineers and computer scientists. Biotechnological processes rely heavily on the production and analysis of high-throughput experimental data. Numerous sequence libraries of DNA and protein structures for a large number of micro-organisms, along with a variety of other databases related to biology and chemistry, are available. For example, microarray technology, a novel biotechnology, promises to monitor the whole genome at once, so that researchers can study the genome at the global level and obtain a better picture of the expression of millions of genes simultaneously. Today, it is widely used in many fields: disease diagnosis, gene classification, gene regulatory networks, and drug discovery. Designing an organism-specific microarray and analysing the experimental data require combining heterogeneous computational tools that usually differ in data format, for example GeneMark for ORF extraction, Promide for DNA probe selection, Chip for probe placement on the microarray chip, BLAST for sequence comparison, MEGA for phylogenetic analysis, and ClustalX for multiple alignments. Solution: Surprisingly, despite the huge research effort invested in DNA array applications, very few works are devoted to computer-aided optimization of DNA array design and manufacturing. Current design practices are dominated by ad-hoc heuristics incorporated in proprietary tools with unknown suboptimality. This will soon become a bottleneck for the new generation of high-density arrays, such as the ones currently being designed at Perlegen [109]. The goal of the research already accomplished was to develop highly scalable tools, with predictable runtime and quality, for cost-effective, computer-aided design and manufacturing of DNA probe arrays.
We illustrate the utility of our approach with a concrete example that combines the design tools of microarray technology for Herpes B virus DNA data.

    SIMARD: A simulated annealing based RNA design algorithm with quality pre-selection strategies

    Most biological processes, including the regulation of gene expression levels and the translation of genetic information into proteins within cells, depend on RNA sequences, and the structure of an RNA plays a vital role in its function. The RNA design problem refers to the design of an RNA sequence that folds into a given secondary structure. However, the vast number of possible nucleotide combinations makes this an NP-hard problem. To solve the RNA design problem, a number of researchers have implemented algorithms using local stochastic search, context-free grammars, global sampling or evolutionary programming approaches. In this paper, we examine SIMARD, an RNA design algorithm that implements simulated annealing techniques. We also propose QPS, a mutation operator for SIMARD that pre-selects high-quality sequences. Furthermore, we present experimental results for SIMARD compared to eight other RNA design algorithms using the Rfam dataset. The results indicate that SIMARD performs promisingly in terms of Hamming distance between the designed sequence and the target structure, and outperforms ERD in terms of free energy. © 2016 IEEE
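The simulated annealing idea named in the abstract above can be illustrated with a minimal sketch. This is not SIMARD and does not reproduce the QPS operator: the cost function below is a toy assumption that only counts target base pairs whose two nucleotides cannot pair (Watson-Crick or G-U wobble), whereas a real designer would score candidates with a folding-based energy model. The names `pair_table`, `cost` and `anneal`, the geometric cooling schedule, and all parameter values are hypothetical.

```python
import math
import random

# Nucleotide pairs treated as compatible (Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure):
    """Return the (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def cost(seq, pairs):
    """Toy cost (assumption): target pairs whose nucleotides cannot pair."""
    return sum((seq[i], seq[j]) not in PAIRS for i, j in pairs)


def anneal(structure, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over sequences with single-point mutations."""
    rng = random.Random(seed)
    pairs = pair_table(structure)
    seq = [rng.choice("ACGU") for _ in structure]
    current = cost(seq, pairs)
    best, best_cost = seq[:], current
    for step in range(steps):
        if best_cost == 0:
            break
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice("ACGU")
        new = cost(seq, pairs)
        delta = new - current
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = new
            if current < best_cost:
                best, best_cost = seq[:], current
        else:
            seq[pos] = old  # reject the move and restore the nucleotide
    return "".join(best), best_cost


if __name__ == "__main__":
    designed, remaining = anneal("((((...))))")
    print(designed, remaining)
```

Accepting some worsening moves at high temperature is what lets the search leave local minima; as the temperature falls the procedure behaves more and more like a greedy local search.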

    The identification of biologically important secondary structures in disease-causing RNA viruses

    Masters of Science. Viral genomes consist of either deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). Viral RNA molecules are responsible for two functions: first, their sequences contain the genetic code, which encodes the viral proteins, and second, they may form structural elements important in the regulation of the viral life-cycle. Using a host of computational and bioinformatics techniques, we investigated how predicted secondary structure may influence the evolutionary dynamics of a group of single-stranded RNA viruses from the Picornaviridae family. We detected significant and marginally significant correlations between regions predicted to be structured and synonymous substitution constraints in these regions, suggesting that selection may be acting on those sites to maintain the integrity of certain structures. Additionally, coevolution analysis showed that nucleotides predicted to be base paired tended to co-evolve with one another in a complementary fashion in four out of the eleven species examined. Our analyses then focused on individual structural elements within the genome-wide predicted structures. We ranked the predicted secondary structural elements according to their degree of evolutionary conservation, their associated synonymous substitution rates and the degree to which nucleotides predicted to be base paired coevolved with one another. Top-ranking structures coincided with well-characterized secondary structures that have been previously described in the literature. We also assessed the impact that genomic secondary structures had on the recombinational dynamics of picornavirus genomes, observing a strong tendency for recombination breakpoints to occur in non-coding regions. However, convincing evidence for an association between the distribution of predicted RNA structural elements and breakpoint clustering was not detected.

    Computational Design and Experimental Validation of Functional Ribonucleic Acid Nanostructures

    In living cells, two major classes of ribonucleic acid (RNA) molecules can be found. The first class, called messenger RNA (mRNA), contains the genetic information that the ribosome reads and translates into proteins. The second class, called non-coding RNA (ncRNA), does not code for proteins and is involved in key cellular processes, such as gene expression regulation, splicing, differentiation, and development. NcRNAs fold into an ensemble of thermodynamically stable secondary structures, which eventually lead the molecule to fold into a specific 3D structure. It is widely known that ncRNAs carry out their functions via their 3D structures as well as their molecular composition. The secondary structure of ncRNAs is composed of different types of structural elements (motifs) such as stacking base pairs, internal loops, hairpin loops and pseudoknots. Pseudoknots are especially difficult to model, are abundant in nature and are known to stabilize the functional form of the molecule. Due to the diverse range of functions of ncRNAs, their computational design and analysis have numerous applications in nano-technology, therapeutics, synthetic biology, and materials engineering. The RNA design problem is to find novel RNA sequences that are predicted to fold into target structure(s) while satisfying specific qualitative characteristics and constraints. RNA design can be modeled as a combinatorial optimization problem (COP) and is known to be computationally challenging or, more precisely, NP-hard. Numerous algorithms to solve the RNA design problem have been developed over the past two decades; however, most ignore pseudoknots and are therefore applicable to only a slice of real-world modeling and design problems. Moreover, the few existing pseudoknot design methods, which were developed only recently, do not provide any evidence about the applicability of their proposed design methodology in biological contexts.
The two objectives of this thesis are set to address these two shortcomings. First, we are interested in developing an efficient computational method for the design of RNA secondary structures including pseudoknots that shows significantly improved in-silico quality characteristics over the state of the art. Second, we are interested in showing the real-world worthiness of the proposed method by validating it experimentally. More precisely, our aim is to design instances of certain types of RNA enzymes (i.e. ribozymes) and demonstrate that they are functionally active. This would likely only happen if their predicted folding matched their actual folding in the in-vitro experiments. In this thesis, we present four contributions. First, we propose a novel adaptive defect weighted sampling algorithm to efficiently solve the RNA secondary structure design problem where pseudoknots are included. We compare the performance of our design algorithm with the state of the art and show that our method generates molecules that are thermodynamically more stable and less defective than those generated by state-of-the-art methods. Moreover, we show that when the effect of fitness evaluation is decoupled from the search and optimization process, our optimization method converges faster than the non-dominated sorting genetic algorithm (NSGA II) and the ant colony optimization (ACO) algorithm do. Second, we use our algorithmic development to implement an RNA design pipeline called Enzymer and make it available as an open source package useful for wet lab practitioners and RNA bioinformaticians. Enzymer uses multiple sequence alignment (MSA) data to generate initial design templates for further optimization. Our design pipeline can then be used to re-engineer naturally occurring RNA enzymes such as ribozymes and riboswitches. Our first and second contributions are published in the RNA section of the Journal of Frontiers in Genetics.
Third, we use Enzymer to re-engineer three different species of pseudoknotted ribozymes: a hammerhead ribozyme from the mouse gut metagenome, a hammerhead ribozyme from Yarrowia lipolytica and a glmS ribozyme from Thermoanaerobacter tengcongensis. We designed a total of 18 ribozyme sequences and showed that 16 of them were active in-vitro. Our experimental results have been submitted to the RNA journal and strongly suggest that Enzymer is a reliable tool for designing pseudoknotted ncRNAs with a desired secondary structure. Finally, we propose a novel architecture for a new ribozyme-based gene regulatory network in which a hammerhead ribozyme modulates expression of a reporter gene when an external stimulus, IPTG, is present. Our in-vivo experiments give the expected results in 7 out of 12 cases.

    Computational protein design: assessment and applications

    Indiana University-Purdue University Indianapolis (IUPUI). Computational protein design aims at designing amino acid sequences that can fold into a target structure and perform a desired function. Many computational design methods have been developed and successfully applied over the past two decades. However, the success rate of protein design remains too low for it to be a useful tool for biochemists who are not experts in computational biology. In this dissertation, we first developed novel computational assessment techniques to assess several state-of-the-art computational techniques. We found that significant progress was made in several important measures by two new scoring functions, from RosettaDesign and from OSCAR-design, respectively. We also developed the first machine-learning technique, called SPIN, that predicts a sequence profile compatible with a given structure using a novel nonlocal energy-based feature. The accuracy of the predicted sequences is comparable to that of RosettaDesign in terms of sequence identity to wild-type sequences. In the last two application chapters, we designed self-inhibitory peptides of Escherichia coli methionine aminopeptidase (EcMetAP) and de novo designed barstar. Several peptides were confirmed to inhibit EcMetAP with 50% inhibitory concentrations in the micromolar range. Meanwhile, the assessment of designed barstar sequences indicates an improvement of OSCAR-design over RosettaDesign.

    Analysis, Design, and Construction of Nucleic Acid Devices

    Nucleic acids present great promise as building blocks for nanoscale devices. To achieve this potential, methods for the analysis and design of DNA and RNA need to be improved. In this thesis, traditional algorithms for analyzing nucleic acids at equilibrium are extended to handle a class of pseudoknots, with examples provided relevant to biologists and bioengineers. With these analytical tools in hand, nucleic acid sequences are designed to maximize the equilibrium probability of a desired fold. Upon analysis, it is concluded that both affinity and specificity are important when choosing a sequence; this conclusion holds for a wide range of target structures and is robust to random perturbations to the energy model. Applying the intuition gained from these studies, a process called hybridization chain reaction (HCR) is invented, and sequences are chosen that experimentally verify this phenomenon. In HCR, a small number of DNA or RNA molecules trigger a system wide configurational change, allowing the amplification and detection of specific, nucleic acid sequences. As an extension, HCR is combined with a pre-existing aptamer domain to successfully construct an ATP sensor, and the groundwork is laid for the future development of sensors for other small molecules. In addition, recent studies on multi-stranded algorithms and improvements to HCR are included in the appendices. Not only will these advancements increase our understanding of biological RNAs, but they will also provide valuable tools for the future development of nucleic acid nanotechnologies
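As a reading aid for the phrase "equilibrium probability of a desired fold" in the abstract above, the standard Boltzmann form can be written out. The symbols are assumptions introduced here for illustration and are not taken from the thesis: \phi is a sequence, s^* the target structure, \Gamma the set of structures considered, \Delta G the free energy of a sequence in a structure, k_B the Boltzmann constant and T the temperature.

```latex
\[
  p(s^{*} \mid \phi) \;=\; \frac{\exp\!\bigl(-\Delta G(\phi, s^{*}) / k_{B} T\bigr)}{Q(\phi)},
  \qquad
  Q(\phi) \;=\; \sum_{s \in \Gamma} \exp\!\bigl(-\Delta G(\phi, s) / k_{B} T\bigr),
\]
\[
  \phi^{\mathrm{design}} \;=\; \arg\max_{\phi}\; p(s^{*} \mid \phi).
\]
```

Read this way, a low \Delta G(\phi, s^{*}) corresponds to affinity for the target, while a numerator that dominates the partition function Q(\phi) corresponds to specificity against competing structures, which is one way to interpret the abstract's conclusion that both matter.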

    Simulated Annealing

    The book contains 15 chapters presenting recent contributions of top researchers working with Simulated Annealing (SA). Although it represents a small sample of the research activity on SA, the book will certainly serve as a valuable tool for researchers interested in getting involved in this multidisciplinary field. In fact, one of the salient features is that the book is highly multidisciplinary in terms of application areas since it assembles experts from the fields of Biology, Telecommunications, Geology, Electronics and Medicine

    Using SetPSO to determine RNA secondary structure

    RNA secondary structure prediction is an important field in Bioinformatics. A number of different approaches have been developed to simplify the determination of RNA molecule structures. RNA is a nucleic acid found in living organisms which fulfils a number of important roles in living cells. Knowledge of its structure is crucial to understanding its function. Determining RNA secondary structure computationally, rather than by physical means, has the advantage of being quicker and cheaper. This dissertation introduces a new Set-based Particle Swarm Optimisation algorithm, known as SetPSO for short, to optimise the structure of an RNA molecule using an advanced thermodynamic model. Structure prediction is modelled as an energy minimisation problem. Particle swarm optimisation is a simple but effective stochastic optimisation technique developed by Kennedy and Eberhart. This simple technique was adapted to work with variable-length particles which consist of a set of elements rather than a vector of real numbers. The effectiveness of this structure prediction approach was compared to that of a dynamic programming algorithm called mfold. It was found that SetPSO can be used as a combinatorial optimisation technique which can be applied to the problem of RNA secondary structure prediction. This research also included an investigation into the behaviour of the new SetPSO optimisation algorithm. Further study needs to be conducted to evaluate the performance of SetPSO on different combinatorial and set-based optimisation problems. Dissertation (MS)--University of Pretoria, 2009. Computer Science.

    Development of genetic algorithm for optimisation of predicted membrane protein structures

    Due to the inherent problems with their structural elucidation in the laboratory, the computational prediction of membrane protein structure is an essential step toward understanding the function of these leading targets for drug discovery. In this work, the development of a genetic algorithm (GA) technique is described that is able to generate predictive 3D structures of membrane proteins in an ab initio fashion that possess high stability and similarity to the native structure. This is accomplished through optimisation of the distances between TM regions and the end-on rotation of each TM helix. The starting point for the genetic algorithm is the model of general TM region arrangement predicted using the TMRelate program. From these approximate starting coordinates, the TMBuilder program is used to generate the helical backbone 3D coordinates. The amino acid side chains are constructed using the MaxSprout algorithm. The genetic algorithm represents a TM protein structure by encoding the starting position of the alpha carbon atom (the starting atom) of the initial residue of each helix, and operates by manipulating these starting positions. To evaluate each predicted structure, the SwissPDBViewer software (incorporating the GROMOS force field software) is employed to calculate the free potential energy. For the first time, a GA has been successfully applied to the problem of predicting membrane protein structure. Comparison between newly predicted structures (tests) and the native structure (control) indicates that the developed GA approach represents an efficient and fast method for refinement of predicted TM protein structures. Further enhancement of the performance of the GA allows the TMGA system to generate predictive structures with comparable energetic stability and reasonable structural similarity to the native structure.