717 research outputs found

    High-coverage genomes to elucidate the evolution of penguins

    Background: Penguins (Sphenisciformes) are a remarkable order of flightless, wing-propelled diving seabirds distributed widely across the southern hemisphere. They share a volant common ancestor with Procellariiformes close to the Cretaceous-Paleogene boundary (66 million years ago) and subsequently lost the ability to fly but enhanced their diving capabilities. With ∼20 species among 6 genera, penguins range from the tropical Galápagos Islands to the oceanic temperate forests of New Zealand, the rocky coastlines of the sub-Antarctic islands, and the sea ice around Antarctica. To inhabit such diverse and extreme environments, penguins evolved many physiological and morphological adaptations. However, they are also highly sensitive to climate change. Penguins therefore provide an exciting target system for understanding the evolutionary processes of speciation, adaptation, and demography. Genomic data are an emerging resource for addressing questions about such processes.
    Results: Here we present a novel dataset of 19 high-coverage genomes that, together with 2 previously published genomes, encompass all extant penguin species. We also present a well-supported phylogeny that clarifies the relationships among penguins. In contrast to recent studies, our results demonstrate that the genus Aptenodytes is basal and sister to all other extant penguin genera, providing intriguing new insights into the adaptation of penguins to Antarctica. As such, our dataset provides a novel resource for understanding the evolutionary history of penguins as a clade, as well as the fine-scale relationships of individual penguin lineages. Against this background, we introduce a major consortium of international scientists dedicated to studying these genomes. Moreover, we highlight emerging issues in ensuring legal and respectful Indigenous consultation, particularly for genomic data originating from New Zealand Taonga species.
    Conclusions: We believe that our dataset and project will be important for understanding evolution, increasing cultural heritage, and guiding the conservation of this iconic southern hemisphere species assemblage.
    Affiliations:
    Pan, Hailin. BGI-Shenzhen; China
    Cole, Theresa L. University of Otago; Canada
    Bi, Xupeng. BGI-Shenzhen; China
    Fang, Miaoquan. BGI-Shenzhen; China
    Zhou, Chengran. BGI-Shenzhen; China
    Yang, Zhengtao. BGI-Shenzhen; China
    Ksepka, Daniel T. Bruce Museum; United States
    Hart, Tom. University of Oxford; United Kingdom
    Bouzat, Juan L. Bowling Green State University; United States
    Boersma, P. Dee. University of Washington; United States
    Bost, Charles-André. Centre d'Études Biologiques de Chizé; France
    Cherel, Yves. Centre d'Études Biologiques de Chizé; France
    Dann, Peter. Phillip Island Nature Parks; Australia
    Mattern, Thomas. University of Otago; New Zealand
    Ellenberg, Ursula. Global Penguin Society; United States. La Trobe University; Australia
    Garcia Borboroglu, Jorge Pablo. University of Washington; United States. Global Penguin Society; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Centro Nacional Patagónico. Centro para el Estudio de Sistemas Marinos; Argentina
    Argilla, Lisa S. Otago Polytechnic; New Zealand
    Bertelsen, Mads F. Copenhagen Zoo; Denmark. University of Copenhagen; Denmark
    Fiddaman, Steven R. University of Oxford; United Kingdom
    Howard, Pauline. Hornby Veterinary Centre; New Zealand. South Island Wildlife Hospital; New Zealand
    Labuschagne, Kim. National Zoological Garden; South Africa
    Miller, Gary. University of Western Australia; Australia. University of Tasmania; Australia
    Parker, Patricia. University of Missouri St. Louis; United States
    Phillips, Richard A. Natural Environment Research Council; United Kingdom
    Quillfeldt, Petra. Justus-Liebig-Universität Giessen; Germany
    Ryan, Peter G. University of Cape Town; South Africa
    Taylor, Helen. Vet Services Hawkes Bay Ltd; New Zealand. Wairoa Farm Vets; New Zealand
    Zhang, De-Xing. Chinese Academy of Sciences; China
    Zhang, Guojie. BGI-Shenzhen; China. Chinese Academy of Sciences; China. University of Copenhagen; Denmark
    McKinlay, Bruce. Department of Conservation; New Zealand

    Outlier Identification in Spatio-Temporal Processes

    This dissertation addresses several statistical challenges arising in spatio-temporal data from Internet traffic, electricity grids, and climate models. It begins with methodological contributions to the problem of anomaly detection in communication networks. Using electricity consumption patterns for the University of Michigan campus, the well-known spatial prediction method kriging has been adapted to identify false data injections into the system. Events such as Distributed Denial of Service (DDoS) attacks, botnet/malware attacks, and port scanning call for methods that can identify unusual activity in Internet traffic patterns. Storing information on the entire network, though feasible in principle, cannot be done at the time scale at which the data arrive. In this work, hashing techniques that produce summary statistics for the network have been used. The hashed data so obtained preserve the heavy-tailed nature of traffic payloads, thereby providing a platform for applying extreme value theory (EVT) to identify heavy hitters in volumetric attacks. These EVT-based methods require estimating the tail index of a heavy-tailed distribution. Traditional tail-index estimators such as the Hill estimator (Hill (1975)) tend to be biased in the presence of outliers. To circumvent this issue, a trimmed version of the classic Hill estimator has been proposed and studied from a theoretical perspective. For the Pareto domain of attraction, the optimality and asymptotic normality of the estimator have been established. Additionally, a data-driven strategy to detect the number of extreme outliers in heavy-tailed data is presented. The dissertation concludes with the statistical formulation of m-year return levels of extreme climatic events (heat/cold waves). The Generalized Pareto distribution (GPD) serves as a good fit for modeling peaks over a threshold. By allowing the parameters of the GPD to vary as functions of covariates such as time of year, El Niño, and location in the US, the dissertation models and infers the extremes of the areal impact of heat waves well.
    PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/145789/1/shrijita_1.pd
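    A minimal Python sketch of the classic and trimmed Hill estimators discussed above. It assumes one common form of the trimmed estimator, in which the k0 largest of the top k order statistics are discarded and the largest retained log-excess is reweighted by (k0 + 1); the exact definition and the data-driven choice of k0 in the dissertation may differ. The function name `trimmed_hill` and the simulated data are illustrative, not taken from the source.

```python
import math
import random
from typing import Sequence


def trimmed_hill(data: Sequence[float], k: int, k0: int = 0) -> float:
    """Estimate the tail index gamma = 1/alpha from the k largest values.

    The k0 largest observations are treated as potential outliers and
    discarded; k0 = 0 gives the classic Hill estimator.
    """
    n = len(data)
    if not 0 <= k0 < k < n:
        raise ValueError("need 0 <= k0 < k < n")
    x = sorted(data, reverse=True)  # x[i] is the (i+1)-th largest value
    if x[k] <= 0:
        raise ValueError("the threshold order statistic must be positive")
    log_threshold = math.log(x[k])  # threshold X_(n-k)
    # Log-excesses of the retained values x[k0], ..., x[k-1].
    excesses = [math.log(x[i]) - log_threshold for i in range(k0, k)]
    # The largest retained excess gets weight k0 + 1 (compensating for the
    # k0 trimmed values); every other excess gets weight 1.
    return (k0 * excesses[0] + sum(excesses)) / (k - k0)


# Illustrative data: exact Pareto with alpha = 2, so gamma = 0.5.
random.seed(0)
clean = [random.paretovariate(2.0) for _ in range(20_000)]
# Hypothetical contamination: the 20 largest observations are replaced by
# huge outliers, the setting the trimmed estimator is designed for.
dirty = sorted(clean)[:-20] + [1e12 * (i + 1) for i in range(20)]

hill_clean = trimmed_hill(clean, k=2_000)
hill_dirty = trimmed_hill(dirty, k=2_000)
trimmed_dirty = trimmed_hill(dirty, k=2_000, k0=20)
print(f"{hill_clean:.3f} {hill_dirty:.3f} {trimmed_dirty:.3f}")
```

    With k0 = 0 the function reduces to the classic Hill estimator. For exact Pareto data the log-spacings of the order statistics are independent exponentials (the Rényi representation), and the (k0 + 1) weight keeps the trimmed estimate unbiased for gamma; the outliers inflate the plain Hill estimate, while the trimmed one stays near 0.5. Choosing k and k0 is left to the user here.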

    Sustaining Low Inflation in Ukraine

    This publication presents a collection of papers written in 2004 within a project that aimed to broaden knowledge about the sources of inflation in Ukraine and to identify policies that can support low inflation in the future. In analysing monetary policy and inflation, the authors drew on the experience of other transition countries, particularly Poland. The project team hopes that the research gathered in this volume will contribute to understanding the sources of inflation in Ukraine and the influence of monetary policy instruments on other variables, and that the results presented here will be of practical use to the National Bank of Ukraine.
    Keywords: Ukraine, transition, monetary policy, inflation, core inflation, financial market, monetary transmission

    Does intraspecies variation in Aspergillus fumigatus affect infection outcomes? : a phenotype/genotype study using an insect model

    Aspergillus fumigatus is a saprophytic soil fungus and an opportunistic human pathogen. This haploid mould reproduces asexually using spores that can readily become airborne. In immunocompromised individuals, inhalation of A. fumigatus spores can lead to a pulmonary infection termed ‘invasive aspergillosis’ (IA). Despite extensive research on human immunity and treatment, the relative contribution of fungal genetic and phenotypic variation to the outcomes of infection has yet to be elucidated. In the present study, I sought to determine the pathogenic relevance of intraspecies variation in A. fumigatus. Clinical isolates were characterised using phenotypic assays (UV resistance, amphotericin B resistance, radial growth rate) and whole-genome sequenced to determine genetic relatedness. These data were integrated with virulence data generated in an insect infection model, Tenebrio molitor larvae, to determine the relevance of fungal variation to clinical outcomes, identify potential virulence factors, and further our understanding of A. fumigatus pathogenesis in invasive aspergillosis. I observed a high level of intraspecies heterogeneity in all pathogenesis-associated phenotypic properties. The spectrum of core-genome single nucleotide polymorphisms (SNPs) and the virulence in T. molitor larvae also varied between isolates. Patterns of intraspecies variation aligned with clinical origin for two properties: growth rate on nutrient-rich media and virulence in T. molitor. The correlation between clinical origin and both growth rate and virulence suggests that fungal biology contributes to clinical outcomes. The low virulence displayed by IA isolates relative to colonisers suggests that the biology of IA isolates may be optimised for overcoming clinical challenges not modelled in T. molitor larvae. Finally, the absence of strong clustering of isolates by clinical origin suggests that more focused or non-SNP-based assays of variation may be necessary to reveal any genomic markers of a strain's ability to cause invasive disease.
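    Genetic relatedness between whole-genome-sequenced isolates is often summarised as pairwise SNP distances over a core-genome alignment. The Python sketch below assumes such an alignment is already available as equal-length strings keyed by isolate name (the names are hypothetical) and skips sites where either isolate has a gap or an ambiguous base; it is an illustration, not the pipeline used in the study.

```python
from itertools import combinations
from typing import Dict, Tuple

UNAMBIGUOUS = set("ACGT")


def pairwise_snp_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Count SNP differences between every pair of aligned isolates.

    Sites where either isolate has a gap, N or other ambiguity code are
    skipped (pairwise deletion).
    """
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    distances: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(sorted(alignment), 2):
        distances[(a, b)] = sum(
            1
            for x, y in zip(alignment[a].upper(), alignment[b].upper())
            if x in UNAMBIGUOUS and y in UNAMBIGUOUS and x != y
        )
    return distances


# Hypothetical isolate names and a toy five-site core-genome alignment.
print(pairwise_snp_distances({"iso1": "ACGTN", "iso2": "ACTTA", "iso3": "GCGTA"}))
```

    The resulting distances could then be passed to a clustering method (for example, hierarchical clustering) to check whether isolates group by clinical origin.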

    Fish genomes : a powerful tool to uncover new functional elements in vertebrates

    This thesis spans several years of work dedicated to understanding fish genomes. The first chapter describes the genome of Fugu rubripes, the pufferfish, the first fish whose entire genome was sequenced through a large-scale international project. In particular, it highlights that this fish has a genome containing as many genes as the human genome, although it is ten times smaller, and that the majority of genes found in the human genome are also found in this fish genome. In the second chapter we compared fish genomes to the human genome to find regions that have been preserved during evolution and are therefore likely to have a function, even though they are not genes. We showed that these regions are indeed functional and help to regulate other genes. Knowing all the genes in the genome, we could then interrogate how they are expressed, i.e. whether they are switched "on" or "off"; in particular, in chapter 4 we looked at how a specific gene is responsible for gradually switching off genes inherited from the mother in a newborn fish embryo. Finally, since genome sequencing is becoming much cheaper and simpler, in the last chapter we set out to map the genome of the common carp, and we discuss the best approaches and strategies for obtaining a good genome sequence for this species. The common carp is a candidate model system for high-throughput screening.
    Universiteit Leiden
    Funding: European Commission Framework VI grant TRANSCODE (LSHG-CT-2004-511990), A-STAR Singapore and Temasek Life Sciences Laboratory Singapore
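    Comparative searches for conserved non-coding regions commonly slide a fixed window along a pairwise genome alignment and keep windows above an identity threshold. The Python sketch below assumes a 100 bp window and a 70% identity cut-off, which are illustrative defaults rather than the thesis's parameters, and omits the step that would discard windows overlapping annotated exons. The function name `conserved_windows` is hypothetical.

```python
from typing import List, Tuple


def conserved_windows(
    seq_a: str,
    seq_b: str,
    window: int = 100,
    step: int = 1,
    min_identity: float = 0.7,
) -> List[Tuple[int, int, float]]:
    """Return (start, end, identity) for alignment windows at or above min_identity.

    seq_a and seq_b are the two rows of a pairwise alignment (same length,
    '-' for gaps). Identity is the fraction of columns in the window where
    both rows carry the same unambiguous base.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be rows of the same pairwise alignment")
    a_row, b_row = seq_a.upper(), seq_b.upper()
    hits = []
    for start in range(0, len(a_row) - window + 1, step):
        matches = sum(
            1
            for x, y in zip(a_row[start : start + window], b_row[start : start + window])
            if x == y and x in "ACGT"
        )
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, start + window, identity))
    return hits


# Toy alignment: the first 10 columns are identical, the last 10 all differ.
print(conserved_windows("A" * 10 + "C" * 10, "A" * 10 + "G" * 10, window=10, step=10))
```

    Overlapping hits from a step of 1 would normally be merged into larger elements before any functional follow-up; that merging is not shown.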

    Molecular insights to crustacean phylogeny

    This thesis aims to resolve the internal relationships of the major crustacean groups by inferring phylogenies from molecular data. New molecular and neuroanatomical data support the scenario that the Hexapoda might have evolved from Crustacea. Most molecular studies of crustaceans have relied on single-gene or multigene analyses, in most cases using partially sequenced rRNA genes. However, most studies do not conduct intensive assessments of data quality and alignment prior to phylogenetic reconstruction. One methodological aim of this thesis was to implement new tools to assess data quality, to improve alignment quality, and to test the impact of complex modeling of the data. Two of the three phylogenetic analyses in this thesis are also based on rRNA genes. In analysis (A), 16S rRNA, 18S rRNA and COI sequences were analyzed. To improve data quality, the COI fragment was RY-coded, an alignment procedure that considers the secondary structure of RNA molecules was applied, and alignment positions of ambiguous positional homology were excluded. Nevertheless, extensive network reconstructions showed that the signal quality in these chosen and commonly used markers is not suitable for inferring crustacean phylogeny, despite the extensive data processing and optimization. This result casts new light on previous studies relying on these markers. In analysis (B), completely sequenced 18S and 28S rRNA genes were used to reconstruct the phylogeny. Based on the findings of analysis (A), base compositional heterogeneity was taken into account, in addition to secondary-structure alignment optimization and alignment assessment. Comparing time-heterogeneous and time-homogeneous processes, in combination with mixed models that implement secondary structures, was only possible with the Bayesian software package PHASE.
    The results clearly demonstrated that complex modeling matters and that ignoring time-heterogeneous processes can mislead phylogenetic reconstruction. Some results shed light on crustacean phylogeny: for the first time, the Cephalocarida (Hutchinsoniella macracantha) were placed in a clade with the Branchiopoda, a grouping that is morphologically plausible. Compared with the time-homogeneous tree, the time-heterogeneous tree gives lower support values for some nodes. This suggests that incorporating base compositional heterogeneity into phylogenetic analysis improves the reliability of the topology. The Pancrustacea are maximally supported in both approaches, but their internal relationships are not reliably reconstructed. One result of this analysis is that the phylogenetic signal in rRNA data might be eroded for crustaceans. Recent publications have presented analyses based on phylogenomic data, mainly to reconstruct metazoan phylogeny. The supermatrix method seems to outperform the supertree approach, and the supermatrix approach was applied in this analysis. For phylogenomic analysis (C), crustaceans were collected for EST sequencing projects, and the resulting sequences were combined with public sequence data. Novel reduction heuristics were applied to condense the dataset. The results showed that the reduced matrix yields a more reliable topology in which most nodes are highly supported. In analysis (C), the Branchiopoda were positioned as the sister group of Hexapoda, a result that differs from analyses (A) and (B) but is in line with other phylogenomic studies.
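    RY coding, as applied to the COI fragment in analysis (A), collapses purines (A, G) to R and pyrimidines (C, T) to Y, so that only transversions remain informative; this is commonly used to reduce the impact of saturation and base compositional bias. A minimal Python sketch: the option to recode only third codon positions is an assumption added for illustration (the abstract does not say which positions were recoded), and that option assumes the sequence starts at a first codon position.

```python
# Purines -> R, pyrimidines -> Y (U treated like T for RNA input).
RY_MAP = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def ry_code(seq: str, third_only: bool = False) -> str:
    """Return an RY-coded, upper-cased copy of an aligned nucleotide sequence.

    Gaps, Ns and other ambiguity symbols are passed through unchanged.
    If third_only is True, only every third site (assumed to be a third
    codon position) is recoded.
    """
    recoded = []
    for i, base in enumerate(seq.upper()):
        if third_only and i % 3 != 2:
            recoded.append(base)
        else:
            recoded.append(RY_MAP.get(base, base))
    return "".join(recoded)


print(ry_code("ACGTTAGCA"))
print(ry_code("ACGTTAGCA", third_only=True))
```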

    Advancing the analysis of bisulfite sequencing data in its application to ecological plant epigenetics

    The aim of this thesis is to bridge the gap between state-of-the-art bioinformatic tools and resources, currently at the forefront of epigenetic analysis, and their emerging applications to non-model species in the context of plant ecology. New, high-resolution research tools are presented: first in a specific sense, by providing new genomic resources for a selected non-model plant species, and second in a broader sense, by developing new software pipelines that streamline the analysis of bisulfite sequencing data and are applicable to a wide range of non-model plant species. The selected species is the annual field pennycress, Thlaspi arvense, which belongs to the same lineage of the Brassicaceae as the closely related model species Arabidopsis thaliana, yet does not benefit from comparably extensive genomic resources. It is one of three key species in a Europe-wide initiative to understand how epigenetic mechanisms contribute to natural variation, stress responses and long-term adaptation of plants. To this end, this thesis provides a high-quality, chromosome-level assembly for T. arvense, alongside a rich complement of feature annotations of particular relevance to the study of epigenetics. The genome assembly follows a hybrid approach, combining PacBio continuous long reads and circular consensus sequences with Hi-C sequencing, PCR-free Illumina sequencing and genetic maps. The result is a significant improvement in contiguity over the earlier draft assembly. Much of the basis for understanding epigenetic mechanisms in non-model species centres on the study of DNA methylation, and in particular on the analysis of bisulfite sequencing data, which resolves methylation patterns at nucleotide level. To maintain a broad level of comparison between T. arvense and the other species selected under the same initiative, a suite of software pipelines has also been developed, covering mapping, quantification of methylation values, differential methylation between groups, and epigenome-wide association studies. Furthermore, this thesis presents a novel algorithm that enables accurate variant calling from bisulfite sequencing data using conventional tools such as FreeBayes or the Genome Analysis Toolkit (GATK), which until now was feasible only with specifically adapted software. This enables researchers to obtain high-quality genetic variants, often essential for contextualising the results of epigenetic experiments, without the need for additional sequencing libraries. Each of these components is thoroughly benchmarked, integrated into a robust workflow management system, and adheres to the FAIR principles (Findability, Accessibility, Interoperability and Reusability). Finally, further consideration is given to the unique difficulties presented by population-scale data, and a number of concepts and ideas are explored to improve the feasibility of such analyses. In summary, this thesis introduces new high-resolution tools that facilitate the analysis of epigenetic mechanisms, specifically DNA methylation, in non-model plant data. In addition, thorough benchmarking standards are applied, showcasing the range of technical considerations that are of principal importance when developing new pipelines and tools for the analysis of bisulfite sequencing data.
    The complete “Epidiverse Toolkit” is available at https://github.com/EpiDiverse and will continue to be updated and improved in the future.
    Contents:
    ABSTRACT
    ACKNOWLEDGEMENTS
    1 INTRODUCTION
      1.1 ABOUT THIS WORK
      1.2 BIOLOGICAL BACKGROUND
        1.2.1 Epigenetics in plant ecology
        1.2.2 DNA methylation
        1.2.3 Maintenance of 5mC patterns in plants
        1.2.4 Distribution of 5mC patterns in plants
      1.3 TECHNICAL BACKGROUND
        1.3.1 DNA sequencing
        1.3.2 The case for a high-quality genome assembly
        1.3.3 Sequence alignment for NGS
        1.3.4 Variant calling approaches
    2 BUILDING A SUITABLE REFERENCE GENOME
      2.1 INTRODUCTION
      2.2 MATERIALS AND METHODS
        2.2.1 Seeds for the reference genome development
        2.2.2 Sample collection, library preparation, and DNA sequencing
        2.2.3 Contig assembly and initial scaffolding
        2.2.4 Re-scaffolding
        2.2.5 Comparative genomics
      2.3 RESULTS
        2.3.1 An improved reference genome sequence
        2.3.2 Comparative genomics
      2.4 DISCUSSION
    3 FEATURE ANNOTATION FOR EPIGENOMICS
      3.1 INTRODUCTION
      3.2 MATERIALS AND METHODS
        3.2.1 Tissue preparation for RNA sequencing
        3.2.2 RNA extraction and sequencing
        3.2.3 Transcriptome assembly
        3.2.4 Genome annotation
        3.2.5 Transposable element annotations
        3.2.6 Small RNA annotations
        3.2.7 Expression atlas
        3.2.8 DNA methylation
      3.3 RESULTS
        3.3.1 Transcriptome assembly
        3.3.2 Protein-coding genes
        3.3.3 Non-coding loci
        3.3.4 Transposable elements
        3.3.5 Small RNA
        3.3.6 Pseudogenes
        3.3.7 Gene expression atlas
        3.3.8 DNA Methylation
      3.4 DISCUSSION
    4 BISULFITE SEQUENCING METHODS
      4.1 INTRODUCTION
      4.2 PRINCIPLES OF BISULFITE SEQUENCING
      4.3 EXPERIMENTAL DESIGN
      4.4 LIBRARY PREPARATION
        4.4.1 Whole Genome Bisulfite Sequencing (WGBS)
        4.4.2 Reduced Representation Bisulfite Sequencing (RRBS)
        4.4.3 Target capture bisulfite sequencing
      4.5 BIOINFORMATIC ANALYSIS OF BISULFITE DATA
        4.5.1 Quality Control
        4.5.2 Read Alignment
        4.5.3 Methylation Calling
      4.6 ALTERNATIVE METHODS
    5 FROM READ ALIGNMENT TO DNA METHYLATION ANALYSIS
      5.1 INTRODUCTION
      5.2 MATERIALS AND METHODS
        5.2.1 Reference species
        5.2.2 Natural accessions
        5.2.3 Read simulation
        5.2.4 Read alignment
        5.2.5 Mapping rates
        5.2.6 Precision-recall
        5.2.7 Coverage deviation
        5.2.8 DNA methylation analysis
      5.3 RESULTS
      5.4 DISCUSSION
      5.5 A PIPELINE FOR WGBS ANALYSIS
    6 THERE AND BACK AGAIN: INFERRING GENOMIC INFORMATION
      6.1 INTRODUCTION
        6.1.1 Implementing a new approach
      6.2 MATERIALS AND METHODS
        6.2.1 Validation datasets
        6.2.2 Read processing and alignment
        6.2.3 Variant calling
        6.2.4 Benchmarking
      6.3 RESULTS
      6.4 DISCUSSION
      6.5 A PIPELINE FOR SNP VARIANT ANALYSIS
    7 POPULATION-LEVEL EPIGENOMICS
      7.1 INTRODUCTION
      7.2 CHALLENGES IN POPULATION-LEVEL EPIGENOMICS
      7.3 DIFFERENTIAL METHYLATION
        7.3.1 A pipeline for case/control DMRs
        7.3.2 A pipeline for population-level DMRs
      7.4 EPIGENOME-WIDE ASSOCIATION STUDIES (EWAS)
        7.4.1 A pipeline for EWAS analysis
      7.5 GENOTYPING-BY-SEQUENCING (EPIGBS)
        7.5.1 Extending the epiGBS pipeline
      7.6 POPULATION-LEVEL HAPLOTYPES
        7.6.1 Extending the EpiDiverse/SNP pipeline
    8 CONCLUSION
    APPENDICES
      A. SUPPLEMENT: BUILDING A SUITABLE REFERENCE GENOME
      B. SUPPLEMENT: FEATURE ANNOTATION FOR EPIGENOMICS
      C. SUPPLEMENT: FROM READ ALIGNMENT TO DNA METHYLATION ANALYSIS
      D. SUPPLEMENT: INFERRING GENOMIC INFORMATION
    BIBLIOGRAPHY
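    At its simplest, methylation calling from bisulfite data counts, at each reference cytosine, the reads that still show C (protected, methylated) versus T (converted, unmethylated), and reports C / (C + T) together with the sequence context (CG, CHG or CHH, where H is A, C or T) that matters in plants. The Python sketch below is an assumption-laden illustration, not the EpiDiverse implementation: it handles forward-strand reads only, takes a pre-built pileup of read bases per reference position, and treats every T at a reference C as conversion, so a genuine C-to-T SNP would be misread as an unmethylated cytosine. That confound is one reason genetic variants matter for contextualising methylation results. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def cytosine_context(ref: str, i: int) -> Optional[str]:
    """Classify the forward-strand C at ref[i] as CG, CHG or CHH (H = A, C or T)."""
    following = ref[i + 1 : i + 3]
    if following[:1] == "G":
        return "CG"
    if len(following) < 2 or any(b not in "ACGT" for b in following):
        return None  # too close to the end, or ambiguous bases
    return "CHG" if following[1] == "G" else "CHH"


def call_methylation(
    ref: str, pileup: List[str], min_depth: int = 1
) -> Dict[int, Tuple[str, float]]:
    """Return {position: (context, methylation level)} for covered reference Cs.

    pileup[i] holds the forward-strand read bases aligned to ref[i]. After
    bisulfite treatment an unmethylated C reads as T, while a methylated C
    stays C, so the level is C / (C + T).
    """
    ref = ref.upper()
    calls: Dict[int, Tuple[str, float]] = {}
    for i, base in enumerate(ref):
        if base != "C":
            continue
        context = cytosine_context(ref, i)
        if context is None:
            continue
        bases = pileup[i].upper()
        methylated = bases.count("C")
        unmethylated = bases.count("T")
        depth = methylated + unmethylated
        if depth == 0 or depth < min_depth:
            continue  # not enough informative reads at this cytosine
        calls[i] = (context, methylated / depth)
    return calls


# Hypothetical toy example: a 9 bp reference and its per-position read bases.
reference = "CGACAGCTT"
reads = ["CCCT", "GGGG", "AAAA", "TTTT", "AAAA", "GGGG", "CCTT", "TTTT", "TTTT"]
print(call_methylation(reference, reads))
```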