
    Finite state machines implementation using DNA Techniques

    A finite-state machine (FSM) is an abstract mathematical model of computation used to design both computer programs and sequential logic circuits. As a model of computation, the finite-state machine is weak: it has less computational power than some other models of computation, such as the Turing machine. This paper overviews finite-state automata based on deoxyribonucleic acid (DNA). Such automata use the massively parallel processing offered by the molecular approach to computation and exhibit a number of advantages over traditional electronic implementations.
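As an illustration of the abstract model the paper builds on (not of the DNA implementation itself, which encodes states and transitions in oligonucleotides), a deterministic FSM can be sketched as a transition table. The alphabet, states, and acceptance condition below are invented for illustration.

```python
# Minimal sketch of the abstract FSM model (illustrative only; a DNA-based
# automaton realizes the same transitions with molecular reactions).

def run_fsm(transitions, start, accepting, inputs):
    """Run a deterministic FSM and report whether it ends in an accepting state."""
    state = start
    for symbol in inputs:
        state = transitions[(state, symbol)]  # deterministic transition lookup
    return state in accepting

# Example: accept DNA strings containing an even number of 'A' bases.
transitions = {}
for base in "ACGT":
    transitions[("even", base)] = "odd" if base == "A" else "even"
    transitions[("odd", base)] = "even" if base == "A" else "odd"

print(run_fsm(transitions, "even", {"even"}, "ACGTA"))  # True: two 'A's
```

The two states suffice because parity is the only property tracked; richer languages require more states, and anything needing unbounded memory is beyond the FSM model, which is the weakness the abstract notes.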

    Computational and Experimental Approaches to Reveal the Effects of Single Nucleotide Polymorphisms with Respect to Disease Diagnostics

    DNA mutations are the cause of many human diseases, and they are the reason for natural differences among individuals, affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three different perspectives: in silico, in vitro, and in vivo. It is emphasized that the problem is complicated and that successful detection of a pathogenic mutation frequently requires a combination of several methods and knowledge of the biological phenomena associated with the corresponding macromolecules.

    Using Expressed Sequence Tags to Improve Gene Structure Annotation

    Finding all gene structures is a crucial step in obtaining valuable information from genomic sequences. It is still a challenging problem, especially for vertebrate genomes, such as the human genome. Expressed Sequence Tags (ESTs) provide a tremendous resource for determining intron-exon structures. However, they are short and error prone, which prevents existing methods from exploiting EST information efficiently. This dissertation addresses three aspects of using ESTs for gene structure annotation. The first aspect is using ESTs to improve de novo gene prediction. Probability models are introduced for EST alignments to genomic sequence in exons, introns, intergenic regions, splice sites and UTRs, representing the EST alignment patterns in these regions. New gene prediction systems were developed by combining the EST alignments with comparative genomics gene prediction systems, such as TWINSCAN and N-SCAN, so that they can predict gene structures more accurately where EST alignments exist without compromising their ability to predict gene structures where no ESTs exist. The accuracy of TWINSCAN_EST and NSCAN_EST is shown to be substantially better than any existing methods without using full-length cDNA or protein similarity information. The second aspect is using ESTs and de novo gene prediction to guide biology experiments, such as finding full-ORF-containing cDNA clones, which provide the most direct experimental evidence for gene structures. A probability model was introduced to guide experiments by summing over gene structure models consistent with EST alignments. The last aspect is a novel EST-to-genome alignment program called QPAIRAGON to improve the alignment accuracy by using EST sequencing quality values. Gene prediction accuracy can be improved by using this new EST-to-genome alignment program. It can also be used for many other bioinformatics applications, such as SNP finding and alternative splicing site prediction.
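The core idea of scoring EST alignment patterns against candidate gene structures can be sketched with toy emission probabilities. The numbers and the two-state EST representation below are invented for illustration; the dissertation estimates such models from real EST-to-genome alignments and integrates them into the full HMMs.

```python
import math

# Hedged sketch: EST evidence scored against a candidate annotation via
# per-region emission probabilities (values invented for illustration).
# P(EST state at a position | annotation label at that position):
EMISSION = {
    "exon":       {"aligned": 0.80, "gap": 0.20},  # ESTs mostly cover exons
    "intron":     {"aligned": 0.05, "gap": 0.95},  # spliced out of mature mRNA
    "intergenic": {"aligned": 0.02, "gap": 0.98},
}

def est_log_likelihood(labels, est_states):
    """Log-likelihood of observed EST evidence under a candidate structure."""
    return sum(math.log(EMISSION[lab][st]) for lab, st in zip(labels, est_states))

# A candidate labeling the region as exon explains aligned ESTs better
# than one labeling it as intron.
est = ["aligned", "aligned", "gap", "aligned"]
cand_a = ["exon"] * 4
cand_b = ["intron"] * 4
print(est_log_likelihood(cand_a, est) > est_log_likelihood(cand_b, est))  # True
```

In the combined systems this evidence term is added to the comparative-genomics score rather than used alone, which is why accuracy does not degrade where no ESTs align.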

    MICROBIAL CONTRIBUTIONS TO DISEASE PHENOTYPES

    The unseen world of microbes has a profound effect on everyday life. Complex microbial communities play a role in everything from climate regulation to human health and disease pathogenesis. Advancements in the field of metagenomics are providing a window into the world of microbial communities with an unprecedented resolution. Next-generation sequencing technology is allowing researchers to describe the relationships between these complex microbial communities and their host environments. The research in this dissertation investigates these complex microbial-host relationships and the various tools and techniques needed to conduct metagenomic research. Chapter 1 presents a current overview of techniques at the disposal of researchers conducting metagenomics experiments. Topics discussed include qualitative DNA fingerprinting techniques, comparison between next-generation sequencing platforms, and how to handle statistical analysis of large metagenomic datasets. Chapter 2 deals with the development of Peak Studio, a platform-independent graphical user interface intended to be a pre-processing tool for researchers conducting DNA fingerprinting experiments. Chapter 3 explores how time and microenvironment influence the structure of gut microbial communities in a mouse model. Two experimental cohorts of mice are analyzed through the use of Illumina HiSeq sequencing of the V6 hypervariable region of the 16S rRNA gene. Also considered are the effects over time of inoculating mice with a founder microbial community. In total, this dissertation emphasizes the importance of experimental design and the development and use of technology in the exploration of complex microbial communities.

    Investigation of type-I interferon dysregulation by arenaviruses: a multidisciplinary approach


    Text Mining and Gene Expression Analysis Towards Combined Interpretation of High Throughput Data

    Microarrays can capture gene expression activity for thousands of genes simultaneously and thus make it possible to analyze cell physiology and disease processes on the molecular level. The interpretation of microarray gene expression experiments profits from knowledge of the analyzed genes and proteins and the biochemical networks in which they play a role. The trend is towards the development of data analysis methods that integrate diverse data types. Currently, the most comprehensive biomedical knowledge source is a large repository of free-text articles. Text mining makes it possible to automatically extract and use information from texts. This thesis addresses two key aspects, biomedical text mining and gene expression data analysis, with the focus on providing high-quality methods and data that contribute to the development of integrated analysis approaches. The work is structured in three parts. Each part begins by providing the relevant background, and each chapter describes the developed methods as well as applications and results.

    Part I deals with biomedical text mining. Chapter 2 summarizes the relevant background of text mining; it describes text mining fundamentals, important text mining tasks, applications and particularities of text mining in the biomedical domain, and evaluation issues. In Chapter 3, a method for generating high-quality gene and protein name dictionaries is described. The analysis of the generated dictionaries revealed important properties of individual nomenclatures and the used databases (Fundel and Zimmer, 2006). The dictionaries are publicly available via a Wiki, a web service, and several client applications (Szugat et al., 2005). In Chapter 4, methods for the dictionary-based recognition of gene and protein names in texts and their mapping onto unique database identifiers are described. These methods make it possible to extract information from texts and to integrate text-derived information with data from other sources. Three named entity identification systems have been set up, two of them building upon the previously existing tool ProMiner (Hanisch et al., 2003). All of them have shown very good performance in the BioCreAtIvE challenges (Fundel et al., 2005a; Hanisch et al., 2005; Fundel and Zimmer, 2007). In Chapter 5, a new method for relation extraction (Fundel et al., 2007) is presented. It was applied to the largest collection of biomedical literature abstracts, and thus a comprehensive network of human gene and protein relations has been generated. A classification approach (Küffner et al., 2006) can be used to further specify relation types, e.g., as activating, direct physical, or gene-regulatory relations.

    Part II deals with gene expression data analysis. Gene expression data needs to be processed so that differentially expressed genes can be identified. Gene expression data processing consists of several sequential steps. Two important steps are normalization, which aims at removing systematic variances between measurements, and quantification of differential expression by p-value and fold-change determination. Numerous methods exist for these tasks. Chapter 6 describes the relevant background of gene expression data analysis; it presents the biological and technical principles of microarrays and gives an overview of the most relevant data processing steps. Finally, it provides a short introduction to osteoarthritis, which is the focus of the analyzed gene expression data sets. In Chapter 7, quality criteria for the selection of normalization methods are described, and a method for the identification of differentially expressed genes is proposed which is appropriate for data with large intensity variances between spots representing the same gene (Fundel et al., 2005b). Furthermore, a system is described that selects an appropriate combination of feature selection method and classifier, and thus identifies genes which lead to good classification results and show consistent behavior in different sample subgroups (Davis et al., 2006). The analysis of several gene expression data sets dealing with osteoarthritis is described in Chapter 8. This chapter contains the biomedical analysis of relevant disease processes and distinct disease stages (Aigner et al., 2006a), and a comparison of various microarray platforms and osteoarthritis models.

    Part III deals with integrated approaches and thus provides the connection between Parts I and II. Chapter 9 gives an overview of different types of integrated data analysis approaches, with a focus on approaches that integrate gene expression data with manually compiled data, large-scale networks, or text mining. In Chapter 10, a method for the identification of genes which are consistently regulated and have a coherent literature background (Küffner et al., 2005) is described. This method indicates how gene and protein name identification and gene expression data can be integrated to return clusters which contain genes that are relevant for the respective experiment, together with literature information that supports interpretation. Finally, Chapter 11 presents ideas on how the described methods can contribute to current research and possible future directions.
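The quantification step described for Part II (fold change plus a test statistic per gene) can be sketched minimally. The expression values, group names, and the use of a Welch t-statistic here are illustrative assumptions; a real pipeline would first normalize the arrays and convert the statistic to a p-value via a t-distribution.

```python
import math
from statistics import mean, stdev

# Hedged sketch: log2 fold change and a Welch t-statistic for one gene
# measured in two sample groups (values invented for illustration).

def log2_fold_change(group_a, group_b):
    """log2 ratio of group means; positive means higher in group_a."""
    return math.log2(mean(group_a) / mean(group_b))

def welch_t(group_a, group_b):
    """Welch t-statistic (unequal variances); sign follows group_a - group_b."""
    va, vb = stdev(group_a) ** 2, stdev(group_b) ** 2
    na, nb = len(group_a), len(group_b)
    return (mean(group_a) - mean(group_b)) / math.sqrt(va / na + vb / nb)

healthy = [100.0, 110.0, 95.0, 105.0]
disease = [210.0, 190.0, 205.0, 220.0]
print(round(log2_fold_change(disease, healthy), 2))  # 1.01: about two-fold up
print(welch_t(disease, healthy) > 0)                 # True: higher in disease
```

Reporting both quantities matters: a large fold change with high within-group variance can be less trustworthy than a modest but consistent one, which is what the statistic captures.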

    Rote-LCS learning classifier system for classification and prediction

    Machine Learning (ML) involves the use of computer algorithms to find approximate solutions to problems with large, complex search spaces. Such problems have no known solution method, and their search spaces are too large for brute-force search to be feasible. Evolutionary algorithms (EA) are a subset of machine learning algorithms which simulate fundamental concepts of evolution. EAs do not guarantee a perfect solution, but rather facilitate convergence to a solution whose accuracy depends on a given EA's learning architecture and the dynamics of the problem. Learning classifier systems (LCS) are algorithms comprising a subset of EAs. The Rote-LCS is a novel Pittsburgh-style LCS for supervised learning problems. The Rote-LCS models a solution space as a hyper-rectangle, where each independent variable represents a dimension. Rote rules are formed by binary trees of logical operators (decision trees) whose terminal nodes are relational hypotheses. In this representation, sub-rules (minor-hypotheses) are partitions on hyper-planes, and rules (major-hypotheses) are multidimensional partitions. The Rote-LCS has thus far exhibited very high accuracy on classification problems, particularly Boolean problems. The Rote-LCS offers an additional attribute uncommon among machine learning algorithms: human-readable solutions. Despite representing a multidimensional search space, Rote solutions may be graphed as two-dimensional trees. This makes the Rote-LCS a good candidate for supervised classification problems where insight is needed into the dynamics of a problem. Solutions generated by the Rote-LCS could prospectively be used by scientists to form hypotheses regarding interactions between independent variables of a given problem. --Abstract, page iv
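The rule representation described above can be sketched directly: a binary tree of logical operators whose leaves are relational hypotheses, each leaf a hyper-plane partition on one dimension. The variable indices and thresholds below are invented for illustration; they are not from the Rote-LCS itself.

```python
# Hedged sketch of a Rote-style rule tree (names and thresholds invented).

def hypothesis(var, op, threshold):
    """A minor-hypothesis: a hyper-plane partition on a single dimension."""
    ops = {"<": lambda x, t: x < t, ">": lambda x, t: x > t}
    return lambda point: ops[op](point[var], threshold)

def both(left, right):    # interior AND node of the decision tree
    return lambda point: left(point) and right(point)

def either(left, right):  # interior OR node of the decision tree
    return lambda point: left(point) or right(point)

# Major-hypothesis: (x0 < 3 AND x1 > 5) OR x2 < 1 -- a multidimensional
# partition of the hyper-rectangle solution space.
rule = either(both(hypothesis(0, "<", 3), hypothesis(1, ">", 5)),
              hypothesis(2, "<", 1))

print(rule((2, 7, 9)))  # True: the first conjunction holds
print(rule((4, 2, 5)))  # False: neither branch holds
```

Because every leaf tests one variable against one threshold, the whole tree stays readable as a two-dimensional diagram even when the data has many dimensions, which is the human-readability property the abstract highlights.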

    Leveraging EST Evidence to Automatically Predict Alternatively Spliced Genes, Master's Thesis, December 2006

    Current methods for high-throughput automatic annotation of newly sequenced genomes are largely limited to tools which predict only one transcript per gene locus. Evidence suggests that 20-50% of genes in higher eukaryotic organisms are alternatively spliced. This leaves the remainder of the transcripts to be annotated by hand, an expensive, time-consuming process. Genomes are being sequenced at a much higher rate than they can be annotated. We present three methods for using the alignments of inexpensive Expressed Sequence Tags in combination with HMM-based gene prediction with N-SCAN EST to recreate the vast majority of hand annotations in the D. melanogaster genome. In our first method, we “piece together” N-SCAN EST predictions with clustered EST alignments to increase the number of transcripts predicted per locus. This is shown to be a sensitive and accurate method, predicting the vast majority of known transcripts in the D. melanogaster genome. We then present an approach that uses these clusters of EST alignments to construct a multi-pass gene prediction phase, again piecing its output together with clusters of EST alignments. While time consuming, multi-pass gene prediction is very accurate and more sensitive than single-pass prediction. Finally, we present a new Hidden Markov Model instance, which augments the current N-SCAN EST HMM, that predicts multiple splice forms in a single pass of prediction. This method is less time consuming and performs nearly as well as the multi-pass approach.
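The clustering step all three methods rely on (grouping overlapping EST-to-genome alignments into candidate loci) can be sketched as simple interval merging. The coordinates below are invented for illustration; real EST clusters would also track strand and splice structure.

```python
# Illustrative sketch of EST alignment clustering: overlapping (start, end)
# alignment intervals are merged into clusters that can then be pieced
# together with gene predictions. Coordinates are invented.

def cluster_alignments(intervals):
    """Merge overlapping alignment intervals into sorted clusters."""
    clusters = []
    for start, end in sorted(intervals):
        if clusters and start <= clusters[-1][1]:   # overlaps previous cluster
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
        else:
            clusters.append((start, end))           # start a new cluster/locus
    return clusters

est_alignments = [(100, 250), (200, 400), (900, 1100), (950, 1200)]
print(cluster_alignments(est_alignments))  # [(100, 400), (900, 1200)]
```

Each merged cluster marks a region with transcript evidence, so additional splice forms can be proposed per cluster rather than per individual EST, which keeps the per-locus transcript count tractable.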