Preferentially Quantized Linker DNA Lengths in Saccharomyces cerevisiae
The exact lengths of linker DNAs connecting adjacent nucleosomes specify the intrinsic three-dimensional structures of eukaryotic chromatin fibers. Some studies suggest that linker DNA lengths preferentially occur at certain quantized values, differing one from another by integral multiples of the DNA helical repeat, ~10 bp; however, studies in the literature are inconsistent. Here, we investigate linker DNA length distributions in the yeast Saccharomyces cerevisiae genome, using two novel methods: a Fourier analysis of genomic dinucleotide periodicities adjacent to experimentally mapped nucleosomes and a duration hidden Markov model applied to experimentally defined dinucleosomes. Both methods reveal that linker DNA lengths in yeast are preferentially periodic at the DNA helical repeat (~10 bp), obeying the forms 10n+5 bp (integer n). This 10 bp periodicity implies an ordered superhelical intrinsic structure for the average chromatin fiber in yeast
Unsupervised and semi-supervised training methods for eukaryotic gene prediction
This thesis describes new gene finding methods for eukaryotic gene prediction. The current methods for deriving model parameters for gene prediction algorithms are based on curated or experimentally validated sets of genes or gene elements. Building these training sets often requires time and additional expert effort, especially for species that are in the initial stages of genome sequencing. Unsupervised training allows determination of model parameters from anonymous genomic sequence. The practical applicability of unsupervised training is critical given the ever-growing rate of eukaryotic genome sequencing.
Three distinct training procedures are developed for diverse groups of eukaryotic species. GeneMark-ES is developed for species with strong donor and acceptor site signals, such as Arabidopsis thaliana, Caenorhabditis elegans and Drosophila melanogaster. The second version of the algorithm, GeneMark-ES-2, introduces an enhanced intron model to better describe the gene structure of fungal species that possess relatively weak donor and acceptor splice sites and a well-conserved branch point signal. GeneMark-LE, a semi-supervised training approach, is designed for eukaryotic species with a small number of introns.
The results indicate that the developed unsupervised training methods perform well as compared to other training methods and as estimated from the set of genes supported by EST-to-genome alignments.
Analysis of novel genomes reveals interesting biological findings and shows that several candidates for under-annotated and over-annotated fungal species are present in the current set of annotated fungal genomes. Ph.D. Committee Chair: Mark Borodovky; Committee Member: Jung H. Choi; Committee Member: King Jordan; Committee Member: Leonid Bunimovich; Committee Member: Yury Chernof
Investigating the spatial regulation of meiotic recombination in S. cerevisiae
In order for a species to engage in and reap the evolutionary benefits of sexual reproduction, a subset of cells in each individual must undergo a complex ordeal known as meiosis, a specialised cell division. By halving the genome content and "shuffling the deck", meiosis generates genetically diverse haploid gametes (eggs, sperm) or spores from diploid cells. Such a monumental task is by no means easy or risk free: during the meiotic programme, cells intentionally damage their own genomes through widespread induction of DNA double-strand breaks (DSBs) in order to initiate homologous recombination (a DNA-repair process) and subsequent crossover (CO) formation. The success of meiosis is, however, not left up to chance. Rather, a complicated web of regulation acts at multiple stages to ensure this dangerous tradeoff pays dividends. Notably, the spatial pattern of meiotic recombination across the genome is complex and non-random. Whilst ultimately stochastic in nature, recombination events within any given meiotic cell display relatively even distributions along each chromosome, a phenomenon mediated by processes of "interference" acting at two key stages in meiosis: DSB and CO formation. Despite wide-ranging historical observation, relatively little is known about how either form of interference is accomplished. Genome-wide mapping of recombination within S. cerevisiae has, however, provided a unique opportunity to investigate the underlying mechanisms. By computationally and mathematically analysing genome-wide data, work presented throughout this thesis seeks to: (i) investigate CO distribution and CO interference within various DNA damage response and DNA repair mutants (Tel1ATM, Mec1ATR, Rad24, Msh2) (Chapter 2); (ii) develop novel approaches to DSB mapping (Chapter 3); (iii) characterise the hyperlocal regulation of DSB formation (Chapter 3); and (iv) examine the mechanics of DSB interference (Chapter 4).
Moreover, widely applicable simulation platforms for investigating DSB and CO formation have been developed (Chapter 2, 4). Collectively, this thesis further elucidates the mechanisms that underpin the spatial regulation of meiotic recombination in S. cerevisiae
Nucleosome rotational setting is associated with transcriptional regulation in promoters of tissue-specific human genes
Human genes contain a 10 bp repeat of RR dinucleotides focused around the first nucleosome position, suggesting a role in transcriptional control
Spatiotemporal control of gene expression in Caenorhabditis elegans
Cell-type specific regulation of transcription drives the production of the myriad of different cells generated during development. Profiling genome-wide gene expression landscapes in different tissues has improved our understanding of the physiological and pathological processes taking place during development. Yet, the mechanisms underlying cell-type specific transcription are not well understood. Promoters and enhancers are the key loci that orchestrate spatiotemporal patterns of gene expression. Their activities can range from ubiquitous to highly cell-type specific, and their composition and arrangement define the regulatory grammar directing gene transcription across development. More comprehensive in vivo studies of these regulatory grammars would improve our understanding of how different patterns of gene expression are obtained across tissues.
Caenorhabditis elegans is an important model organism for studying developmental processes. At the beginning of my PhD, I helped characterize the dynamics of gene expression and chromatin activity across development and aging. Following this, I aimed to identify and characterize the regulatory elements involved in tissue-specific control of transcription in C. elegans. I jointly profiled chromatin accessibility and gene expression landscapes across the five main tissues of the adult nematode. To achieve this, I developed a method to sort fluorescently labelled nuclei from individual C. elegans tissues. Analyzing the datasets I generated, I first showed that around 80% of the regulatory elements in C. elegans are specifically active in subsets of tissues. I then revealed fundamental differences in the genetic structure and regulatory architecture of genes expressed ubiquitously or in individual tissues, and I defined two distinctive regulatory grammars associated with specific sets of genes. I also uncovered striking and unsuspected differences in nucleosome arrangement and sequence features of ubiquitous and germline-specific promoters compared to somatic promoters. Finally, I optimized a single nucleus method to analyze chromatin accessibility and gene expression during embryogenesis and did a pilot study of early embryo development.
My work provides a comprehensive resource of chromatin accessibility and transcription patterns in the different tissues of C. elegans. It sheds light on fundamental differences between the mechanisms of transcription regulation of germline-active genes or somatic tissue-specific genes. The outcome of this work will greatly enable and push forward C. elegans transcription regulation research. The first datasets jointly profiling chromatin accessibility and nuclear transcription across the majority of tissues in a multicellular organism will also be of benefit for the broader community studying gene regulation in eukaryotes
Genome annotation and evolution of chemosensory receptors in spider mites
Understanding the evolution of species and speciation, the mechanism producing the diversity of life on Earth, has always fascinated scientists. In recent years, advances in next generation sequencing techniques, together with the development of data analyzing software tools, allow us to sequence and analyze genomes of many species and reconstruct their evolutionary history. We can detect the evolutionary changes of a group of species or of different populations of a single species. In this thesis, we perform studies on three spider mite genomes, Tetranychus urticae, Tetranychus evansi and Tetranychus lintearius. The spider mites belong to the Chelicerata, the second largest group of arthropods after insects. While many insect genomes were sequenced and analyzed already, Tetranychus urticae represents the first complete chelicerate genome.
This thesis has been organized into five chapters. The introductory Chapter 1 provides an overview of the explosion of genome sequences in times of the fast development of next generation sequencing techniques, describes genome annotation information, methods and pipelines to give biological meaning to these genomes, and explains the importance of genome based research for the evolution of arthropod-plant interactions. In addition, a short overview of the chemosensory receptors is provided since in the thesis we have particularly studied the annotation and evolution of this gene family in three different spider mites. Chapter 2 provides the results of annotation and analysis of the Tetranychus urticae genome (London strain). T. urticae represents one of the most polyphagous arthropod herbivores, feeding on more than 1,100 plant species including species known to produce toxic compounds. We have annotated the T. urticae genome with support of RNA-seq data and made it publicly available to the research community. The T. urticae genome sequence reveals herbivorous pest adaptations with strong signatures of polyphagy and detoxification in gene families associated with feeding on different hosts and in new gene families acquired by lateral gene transfer. Moreover, how this pest responds to a changing host environment is shown by deep transcriptome analysis of T. urticae feeding on different plants. Thus, the T. urticae genome sequence opens up new avenues for understanding the evolution of arthropods as well as the fundamentals of plant-herbivore interactions.
The next two chapters (Chapter 3 and Chapter 4) present studies on the annotation and evolution of chemosensory receptors (CRs) in three different spider mites. Chemosensory receptors help animals to detect certain chemical components in their environment to find food, to locate shelter, mates and offspring, and to avoid danger. In Chapter 3, starting from Daphnia and insect chemosensory receptors, we describe mining the T. urticae genome for putative chemosensory receptors, including the ones related to insect gustatory receptors (GRs), the ionotropic receptors (IRs) and the epithelial Na+ channels (ENaCs). T. urticae has a huge repertoire of GRs, many more than the total number of GRs and odorant receptors (ORs) found to date in any other arthropod. As in Daphnia pulex, we observed the complete lack of ORs in T. urticae. This is consistent with the hypothesis that ORs are an insect-specific class of GR-related chemosensory receptors. Furthermore, we compare chemosensory receptor genes among three strains (London, Montpellier, and EtoxR). We find that GR genes that are intact in some T. urticae populations appear to be inactivated in other populations. Next, in Chapter 4, we describe the annotation of GR genes in T. evansi and T. lintearius, and the evolutionary analysis of this gene family in the three spider mites. We identify many GR gene expansions in the polyphagous T. urticae, a few gene expansions and many gene losses in the oligophagous T. evansi, and no gene expansion but also many gene losses in the monophagous T. lintearius. Finally, general remarks are discussed in Chapter 5