149 research outputs found

    Transcriptional landscape estimation from tiling array data using a model of signal shift and drift

    Motivation: High-density oligonucleotide tiling array technology holds the promise of a better description of the complexity and the dynamics of transcriptional landscapes. In organisms such as bacteria and yeasts, transcription can be measured on a genome-wide scale with a resolution >25 bp. The statistical models currently used to handle these data remain, however, very simple, the most popular being the piecewise constant Gaussian model with a fixed number of breakpoints.
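    The piecewise constant Gaussian model named in this abstract can be illustrated with a minimal sketch. It is not the authors' shift-and-drift method, which the abstract does not describe. It assumes a common noise variance across segments, so that maximum-likelihood fitting reduces to least squares, and it uses a standard dynamic program over segment boundaries. The function name `segment_piecewise_constant` is hypothetical.

```python
import numpy as np


def segment_piecewise_constant(y, n_breakpoints):
    """Least-squares fit of a piecewise constant mean with a fixed number of breakpoints.

    Assumption (not from the source): Gaussian noise with one shared variance,
    so the maximum-likelihood segmentation minimises the total squared error.
    Returns the sorted start indices of every segment after the first.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    n_seg = n_breakpoints + 1
    if not 1 <= n_seg <= n:
        raise ValueError("need between 0 and len(y) - 1 breakpoints")

    # Prefix sums give the squared error of any span y[i:j] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y * y)))

    def sse(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # cost[k, j]: best squared error for the first j points split into k segments.
    cost = np.full((n_seg + 1, n + 1), np.inf)
    back = np.zeros((n_seg + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, n_seg + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1, i] + sse(i, j)
                if c < cost[k, j]:
                    cost[k, j] = c
                    back[k, j] = i

    # Walk back from the full signal to recover the segment starts.
    breaks = []
    j = n
    for k in range(n_seg, 1, -1):
        j = back[k, j]
        breaks.append(int(j))
    return sorted(breaks)
```

    The triple loop costs O(K n²) time, which is acceptable for short probe series but would need pruning or a faster algorithm for a whole genome. The number of breakpoints has to be supplied by the user, which is the "fixed number of breakpoints" restriction the abstract mentions.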

    Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome

    Tiling arrays make possible large-scale exploration of the genome thanks to probes that cover the whole genome at very high density, up to 2,000,000 probes. The biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of precise modeling and of region classification.

    Analysis of tiling array expression studies with flexible designs in Bioconductor (waveTiling)

    Background: Existing statistical methods for tiling array transcriptome data either focus on transcript discovery in one biological or experimental condition or on the detection of differential expression between two conditions. Increasingly often, however, biologists are interested in time-course studies, studies with more than two conditions or even multiple-factor studies. As these studies are currently analyzed with the traditional microarray analysis techniques, they do not exploit the genome-wide nature of tiling array data to its full potential. Results: We present an R Bioconductor package, waveTiling, which implements a wavelet-based model for analyzing transcriptome data and extends it towards more complex experimental designs. With waveTiling the user is able to discover (1) group-wise expressed regions, (2) differentially expressed regions between any two groups in single-factor studies and in (3) multifactorial designs. Moreover, for time-course experiments it is also possible to detect (4) linear time effects and (5) a circadian rhythm of transcripts. By considering the expression values of the individual tiling probes as a function of genomic position, effect regions can be detected regardless of existing annotation. Three case studies with different experimental set-ups illustrate the use and the flexibility of the model-based transcriptome analysis. Conclusions: The waveTiling package provides the user with a convenient tool for the analysis of tiling array transcriptome data for a multitude of experimental set-ups. Regardless of the study design, the probe-wise analysis allows for the detection of transcriptional effects in exonic, intronic and intergenic regions, without prior consultation of existing annotation.

    Staphylococcus aureus Transcriptome Architecture: From Laboratory to Infection-Mimicking Conditions

    Staphylococcus aureus is a major pathogen that colonizes about 20% of the human population. Intriguingly, this Gram-positive bacterium can survive and thrive under a wide range of different conditions, both inside and outside the human body. Here, we investigated the transcriptional adaptation of S. aureus HG001, a derivative of strain NCTC 8325, across experimental conditions ranging from optimal growth in vitro to intracellular growth in host cells. These data establish an extensive repertoire of transcription units and non-coding RNAs, a classification of 1412 promoters according to their dependence on the RNA polymerase sigma factors SigA or SigB, and allow identification of new potential targets for several known transcription factors. In particular, this study revealed a relatively low abundance of antisense RNAs in S. aureus, where they overlap only 6% of the coding genes, and only 19 antisense RNAs not co-transcribed with other genes were found. Promoter analysis and comparison with Bacillus subtilis link the small number of antisense RNAs to a less profound impact of alternative sigma factors in S. aureus. Furthermore, we revealed that Rho-dependent transcription termination suppresses pervasive antisense transcription, presumably originating from abundant spurious transcription initiation in this A+T-rich genome, which would otherwise affect expression of the overlapped genes. In summary, our study provides genome-wide information on transcriptional regulation and non-coding RNAs in S. aureus as well as new insights into the biological function of Rho and the implications of spurious transcription in bacteria.

    Three Essential Ribonucleases—RNase Y, J1, and III—Control the Abundance of a Majority of Bacillus subtilis mRNAs

    Bacillus subtilis possesses three essential enzymes thought to be involved in mRNA decay to varying degrees, namely RNase Y, RNase J1, and RNase III. Using recently developed high-resolution tiling arrays, we examined the effect of depletion of each of these enzymes on RNA abundance over the whole genome. The data are consistent with a model in which the degradation of a significant number of transcripts is dependent on endonucleolytic cleavage by RNase Y, followed by degradation of the downstream fragment by the 5′–3′ exoribonuclease RNase J1. However, many full-size transcripts also accumulate under conditions of RNase J1 insufficiency, compatible with a model whereby RNase J1 degrades transcripts either directly from the 5′ end or very close to it. Although the abundance of a large number of transcripts was altered by depletion of RNase III, this appears to result primarily from indirect transcriptional effects. Lastly, RNase depletion led to the stabilization of many low-abundance potential regulatory RNAs, both in intergenic regions and in the antisense orientation to known transcripts

    Wavelet-based detection of transcriptional activity on a novel Staphylococcus aureus tiling microarray

    UPNa. Instituto de Agrobiotecnología. Laboratorio de Biofilms Microbianos. Includes 7 data files. Background: High-density oligonucleotide microarray is an appropriate technology for genomic analysis, and is particularly useful in the generation of transcriptional maps, ChIP-on-chip studies and re-sequencing of the genome. Transcriptome analysis of tiling microarray data facilitates the discovery of novel transcripts and the assessment of differential expression in diverse experimental conditions. Although new technologies such as next-generation sequencing have appeared, microarrays might still be useful for the study of small genomes or for the analysis of genomic regions with custom microarrays due to their lower price and good accuracy in expression quantification. Results: Here, we propose a novel wavelet-based method, named ZCL (zero-crossing lines), for the combined denoising and segmentation of tiling signals. The denoising is performed with the classical SUREshrink method and the detection of transcriptionally active regions is based on the computation of the Continuous Wavelet Transform (CWT). In particular, the detection of the transitions is implemented as the thresholding of the zero-crossing lines. The algorithm described has been applied to the public Saccharomyces cerevisiae dataset and it has been compared with two well-known algorithms: pseudo-median sliding window (PMSW) and the structural change model (SCM). As a proof-of-principle, we applied the ZCL algorithm to the analysis of the custom tiling microarray hybridization results of a S. aureus mutant deficient in the sigma B transcription factor. The challenge was to identify those transcripts whose expression decreases in the absence of sigma B. Conclusions: The proposed method achieves the best performance in terms of positive predictive value (PPV) while its sensitivity is similar to that of the other algorithms used for the comparison. The computation time needed to process the transcriptional signals is low compared with model-based methods and in the same range as that of methods based on filters. Automatic parameter selection has been incorporated and, moreover, the method can be easily adapted to a parallel implementation. We can conclude that the proposed method is well suited for the analysis of tiling signals, in which transcriptional activity is often hidden in the noise. Finally, the quantification and differential expression analysis of the S. aureus dataset have demonstrated the valuable utility of this novel device for the biological analysis of the S. aureus transcriptome. This work was supported by the Spanish Torres-Quevedo fellowship [PTQ-08-03-07769] to VS. ATA and AMB were supported by Spanish Ministry of Science and Innovation 'Ramón y Cajal' contracts. This work was supported by the Spanish Ministry of Science and Innovation Grants BIO2008-05284-C02-01, BFU2011-23222, ERA-NET Pathogenomics PIM2010EPA-00606 and the agreement between 'Fundación para la Investigación médica aplicada' (FIMA) and the 'UTE project CIMA'.

    Statistical methods for high-throughput genomic data


    Systems Biology in Industrial Biotechnology and Disease


    Evolutionary analyses of orphan genes in mouse lineages in the context of de novo gene birth

    Gene birth is the process through which new genes appear. For a long time it was argued that the natural way of generating new genes was from copies of existing genes, and the possibility of de novo gene emergence was neglected. However, recent evidence has forced a reconsideration of old models, and de novo gene birth has gained recognition as a widespread phenomenon. De novo gene birth is the process by which a non-genic sequence gains gene-like features through a few mutations. The following work is a compilation of analyses that seek to highlight the importance and prevalence of de novo gene birth in genomes, suggesting that this process is present at all times and becomes very relevant upon ecological shifts. In the first chapter, I showed through phylostratigraphic analyses that new genes are substantially simpler than older ones, a trend which was consistent for several features and organisms, and suggestive of a frequent emergence of new genes through non-duplicative processes. In addition to this, I detected a strong association between gene birth and high transcriptional activity and chromosomal proximity. As part of this work, I was also able to use phylostratigraphy to evaluate a different model of gene birth, overprinting of alternative reading frames. In the following chapters of this dissertation, I made use of high-throughput sequencing of transcriptomes and genomes to ask questions about the origin and change of genes at closer time divergences than ever before, ranging from nearly 3000 years to 10 million years of divergence. I was able to detect the theoretically predicted effects of short time scale comparisons on the rate of protein evolution. Also, I contribute evidence that genes of different ages show different selective constraints even after only a few thousand years of divergence. Finally, in the last part of this thesis I evaluated the role of transcription in gene birth dynamics.
Transcription seems to be a predominant feature of genomes, as most of the genome showed some level of transcription. In terms of de novo gene birth, I was able to identify 663 candidate loci from presence and absence of transcription. Analyses of these candidate loci indicated that gains are rather stable, meaning that subsequent losses were rarely found. In agreement with previous studies, I confirmed the role of testis as a driver of new genes. These results indicate that transcription is not a limiting factor in the emergence of new genes, and that our knowledge about the key regulatory elements of transcription and their turnover is still too limited to explain why new genes seem to arise at a higher rate than they decay.
    Contents:
    Summary of the thesis
    Zusammenfassung der Dissertation (German summary)
    Acknowledgements
    General introduction: A brief historic perspective on the concepts of gene birth; Gene duplication is the main source of new genes; Orphan genes and the genomics era; Phylostratigraphy and the continuous emergence of new genes; Not all genes come from other genes; Considering gene birth from molecular and evolutionary perspectives; Overprinting: true innovation from existing genes; The life cycle of genes; Overview
    Chapter 1: Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution. Sections: Introduction; Results; Phylostratigraphy of mouse genes; Genomic features across ages; Chromosomal distribution; Association with transcriptionally active sites; Testis expressed genes; Alternative reading frames; Discussion; De novo evolution versus duplication-divergence; Regulatory evolution; Overprinting; Conclusion; Methods; Phylostratigraphy; Gene structure analyses; Transcription associated regions; Expression data for testis; Secondary reading frames; Acknowledgements
    Chapter 2: Sequencing of genomes and transcriptomes of closely related mouse species. Sections: Introduction; Using wild mice to understand gene birth at the transcriptome level; Phylogeographic distribution of the samples; Methods; Biological material; Transcriptome sequencing; Genome sequencing; Raw data processing; Transcriptome read mapping, annotation and quantification; Genome read mapping; Available resources
    Chapter 3: Differential selective constrains across phylogenetic ages and their impact on the turnover of protein-coding genes. Sections: Introduction; Methods; Transcriptome assembly; Generation of ortholog pairs and rate analyses; Overlapping genes; Reading frame polymorphism detection and annotation; Statistical analyses; Results; Rate differences between genes of different ages; Overlapping genes are an unlikely source of bias; Impact of reading frame polymorphisms across phylogenetic time; Discussion; Acknowledgements
    Chapter 4: A transcriptomics approach to the gain and loss of de novo genes in mouse lineages. Sections: Introduction; How is a gene made?; The early phase of new gene emergence; Pervasive transcription and junk-DNA as raw material for new genes; Methods; Transcriptome presence/absence matrix and mapping of gains and losses; Results; How much of the mouse genome has evidence of transcription?; Genome-wide transcription: gain and loss dynamics; Phylogenetic patterns in genome-wide transcription; How much of the genome is transcribed in a lineage specific way?; Identification of cases of de novo transcripts; Quantification of gain rates for curated genes; What are the dynamics of transcription loss in known genes?; Where are new genes expressed?; Discussion; Pervasive transcription can provide material for new genes; Asymmetry in gains and losses of transcription; From transcribed protogenes to de novo genes; Differences in expression levels; Testis as a niche for new genes; Conclusion
    Concluding remarks
    Perspectives
    References
    Chapter contributions
    Appendices: Appendix A. Phylostratigraphic maps; Appendix B. Curation data from orphan genes; Appendix C. Functional annotation clusters based on known genes with loss of expression; Appendix D. Transcriptome information and statistics
    Curriculum Vitae
    Affidavit