223 research outputs found

    Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome

    Tiling arrays make possible a large-scale exploration of the genome thanks to probes that cover the whole genome at very high density (up to 2,000,000 probes). The biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of precise modeling and of region classification.

    Parsimonious Higher-Order Hidden Markov Models for Improved Array-CGH Analysis with Applications to Arabidopsis thaliana

    Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are only based on first-order HMMs having constrained abilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling spatial dependencies. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of identified polymorphisms revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations are done on widely considered Array-CGH data of human cell lines indicating that parsimonious HMMs are also well-suited for the analysis of non-plant specific data. All these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM)

    Utilizing gene pair orientations for HMM-based analysis of promoter array ChIP-chip data

    Motivation: Array-based analysis of chromatin immunoprecipitation (ChIP-chip) data is a powerful technique for identifying DNA target regions of individual transcription factors. The identification of these target regions from comprehensive promoter array ChIP-chip data is challenging. Here, three approaches for the identification of transcription factor target genes from promoter array ChIP-chip data are presented. We compare (i) a standard log-fold-change analysis (LFC); (ii) a basic method based on a Hidden Markov Model (HMM); and (iii) a new extension of the HMM approach to an HMM with scaled transition matrices (SHMM) that incorporates information about the relative orientation of adjacent gene pairs on DNA

    Spliced alignment and its application in Arabidopsis thaliana

    This thesis describes the development and biological applications of GeneSeqer, a homology-based gene prediction program that uses spliced alignment. Additionally, a program named MyGV was written in Java as a browser to visualize the output of GeneSeqer. To test and demonstrate its performance, GeneSeqer was used to map 176,915 Arabidopsis EST sequences onto the whole genome of Arabidopsis thaliana, which consists of five chromosomes with about 117 million base pairs in total. All results were parsed and imported into a MySQL database. Information inferred from the Arabidopsis spliced alignment results may serve as a valuable resource for a number of projects of special scientific interest, such as alternative splicing, non-canonical splice sites, and mini-exons. We also built AtGDB (Arabidopsis thaliana Genome DataBase, http://www.plantgdb.org/AtGDB/) to interactively browse EST spliced alignments and GenBank annotations for the Arabidopsis genome. Moreover, as one application of the Arabidopsis EST mapping data, U12-type introns were identified among the transcript-confirmed introns in the Arabidopsis genome, and the characteristics of these minor-class introns were further explored.

    DNA methylation inheritance in Arabidopsis: The next generation

    Our DNA, the molecule that stores heritable information, consists of a four-letter code: A, C, G and T. Textbook genetics tells us that DNA can be mutated (e.g. letter A turns into letter G), and that such mutations can change the functions of genes. In plants, it is becoming increasingly clear that heritable alterations in gene function can also be acquired through so-called epigenetic changes. A well-known example of an epigenetic change is the gain or loss of DNA methylation, the chemical modification of a cytosine (the letter C in the DNA code) into 5-methylcytosine. A fundamental goal in plant biology is to assess how stable epigenetic changes are across generations and to what extent they affect observable traits (e.g. plant height). In this thesis we performed extensive bioinformatic and statistical analyses of an experimental population of the model plant Arabidopsis thaliana. The unique feature of this population is that all plants are nearly identical at the DNA level but show strong differences in DNA methylation patterns. We show that these patterns can be inherited for many generations and affect important plant traits such as flowering time and root length. We also show that DNA methylation changes occur stochastically in natural populations, at a rate far exceeding the known DNA mutation rate in this species. The fact that DNA methylation changes affect observable plant traits that are transmitted across generations shows that heritable information extends beyond the four letters encoded in our DNA and opens up new questions regarding its role in plant evolution and its use in agricultural breeding programs.

    Extensive Natural Epigenetic Variation At A De Novo Originated Gene.

    Epigenetic variation, such as heritable changes of DNA methylation, can affect gene expression and thus phenotypes, but examples of natural epimutations are few and little is known about their stability and frequency in nature. Here, we report that the gene Qua-Quine Starch (QQS) of Arabidopsis thaliana, which is involved in starch metabolism and originated de novo recently, is subject to frequent epigenetic variation in nature. Specifically, we show that expression of this gene varies considerably among natural accessions as well as within populations directly sampled from the wild, and we demonstrate that this variation correlates negatively with the DNA methylation level of repeated sequences located within the 5' end of the gene. Furthermore, we provide extensive evidence that DNA methylation and expression variants can be inherited for several generations and are not linked to DNA sequence changes. Taken together, these observations provide a first indication that de novo originated genes might be particularly prone to epigenetic variation in their initial stages of formation.