17 research outputs found

    Inference of Biogeographical Ancestry Under Resource Constraints

    We study the problem of predicting human biogeographical ancestry using genomic data. While continental-level ancestry prediction is relatively simple using genomic information, distinguishing between individuals from closely associated sub-populations (e.g., from the same continent) is still a difficult challenge. In particular, we focus on the case where the analysis is constrained to using single nucleotide polymorphisms (SNPs) from just one chromosome. We thus propose methods to construct ancestry informative SNP panels by analyzing variants from a single chromosome, and evaluate the performance of such panels for both continental-level and sub-continental-level ancestry prediction. Efficient selection of ancestry informative SNPs is the key to successful ancestry prediction. The removal of redundant and noisy SNP features is essential prior to applying a learning algorithm. Here we propose two distinct methods of SNP selection: one is correlation-based SNP selection, which uses a correlation metric to evaluate the usefulness of SNP features, while the other is random subspace projection based SNP selection, which uses the learning algorithm itself to evaluate the worth of the SNP features. The correlation-based SNP selection approach can construct a small panel of useful SNPs for both continental-level classification and binary classification of sub-populations. Unlike the correlation-based selection, random subspace projection based selection can construct an efficient panel of SNP markers to address the difficult task of multinomial classification with multiple closely related sub-populations. We include results that demonstrate the performance of both methods, including a comparison with other recently published related methods.
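As a rough illustration of the correlation-based selection idea described in this abstract, the sketch below ranks SNPs by their correlation with a binary population label and skips SNPs that are nearly redundant with ones already chosen. It is not the authors' method: the abstract does not say which correlation metric is used, so the Pearson correlation, the greedy strategy, the `max_redundancy` threshold, the function names and the toy genotypes are all assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_snps(genotypes, labels, panel_size, max_redundancy=0.9):
    """Greedy correlation-based filter (hypothetical helper, not from the paper).

    genotypes: one row per individual, each a list of 0/1/2 allele counts.
    labels:    one 0/1 population label per individual.
    Returns the column indices of the selected SNP panel.
    """
    n_snps = len(genotypes[0])
    columns = [[row[j] for row in genotypes] for j in range(n_snps)]
    # Relevance: absolute correlation between each SNP and the label.
    relevance = [abs(pearson(col, labels)) for col in columns]
    ranked = sorted(range(n_snps), key=lambda j: relevance[j], reverse=True)
    panel = []
    for j in ranked:
        if len(panel) == panel_size:
            break
        # Redundancy check: skip SNPs highly correlated with a chosen SNP.
        if all(abs(pearson(columns[j], columns[k])) < max_redundancy for k in panel):
            panel.append(j)
    return panel


# Toy data (made up): SNP 1 is an exact copy of SNP 0.
genotypes = [
    [0, 0, 1, 2],
    [0, 0, 2, 0],
    [1, 1, 0, 1],
    [2, 2, 1, 2],
    [2, 2, 2, 0],
    [1, 1, 0, 1],
]
labels = [0, 0, 0, 1, 1, 1]
panel = select_snps(genotypes, labels, panel_size=2)
print(panel)
```

On this toy input the duplicated SNP 1 is skipped because it is perfectly correlated with SNP 0, which is the kind of redundancy removal the abstract refers to. A real panel would be built from far more individuals and variants.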

    Genomic architecture of selection for adaptation to challenging environments in aquaculture

    Aquaculture, including freshwater and marine farming, has been important for global fish production during the past few decades. However, climate change presents a major risk, threatening both the quality and quantity of aquaculture production. The environmental stressors in aquaculture resulting from climate change are temperature rise, salinity changes, sea level rise, acidification, and changes in oxygen levels and other chemical properties. Although a reasonable genetic gain can be achieved by selective breeding, this genetic response may not be enough to adapt fish species to the effects of climate change. Marker-assisted selection focusing on specific genes or alleles that allow fish to cope with these changes would allow more rapid adaptation of fish to these new environments. In this thesis, I focused on three essential environmental stressors (dissolved oxygen, salinity and temperature) as primarily determined in aquaculture production. The main objective is to provide insight into the genomic architecture underlying the mechanism of adaptation to challenging environments of aquaculture species under farming conditions. First, I determined candidate QTL associated with phenotypic variation during adaptation to hypoxia or normoxia. I identified overrepresented pathways that could explain the genetic regulation of hypoxia on growth. To identify fish with better hypoxia tolerance and growth under a hypoxic environment, I quantified the genetic correlations between an indicator trait for hypoxia tolerance (critical swimming performance) and growth. Moreover, the genomic architecture associated with swimming performance was demonstrated, while the effect of significant QTLs on growth was estimated. Beyond applying genome-wide association studies, I used selection signatures to identify QTLs and genes contributing to salinity tolerance. In addition, I compared the genome of the saline-tolerant and highly productive tilapia “Sukamandi”, which was developed by the aquaculture research institute in Indonesia, to that of blue tilapia and Nile tilapia, to identify the QTLs contributing to salinity tolerance. Finally, I investigated QTLs associated with growth-related traits and organ weights at two distinct commercial Mediterranean production sites differing in temperature (farms in Spain and Greece). Overall, this thesis considerably adds to insight into how fish adapt to challenging environments, which will aid marker-assisted selection for improved resilience of aquaculture species under climate change.

    Designing synthetic spike-in controls for next-generation sequencing and beyond

    Next-generation sequencing (NGS) is a revolutionary tool that can be used for a myriad of applications, ranging from clinical genome sequencing, to gene expression profiling with RNA sequencing (RNA-seq), to the detection of microbes within environmental samples or isolates. However, significant analytical challenges remain with NGS data due to the complexity of genome architecture, as well as a range of biases introduced during library preparation, sequencing and analysis. These biases and challenges can be understood and mitigated through the use of spike-in controls: DNA or RNA oligonucleotides with known sequence and length that are added to samples prior to library preparation. While spike-in controls have previously been developed for transcriptomics, they were designed for technologies that predated the advent of NGS and consequently suffer from several limitations. In this thesis, I present a novel design framework for synthetic spike-in standards (‘sequins’) that can be applied to a range of NGS applications, and demonstrate how sequins can be used as internal controls to assist in the analysis of accompanying samples. In Chapter 1, I develop a set of spliced synthetic RNA standards that are encoded by artificial gene loci on an accompanying in silico chromosome. RNA sequins enable the assessment of important but previously intractable RNA-seq properties including split-read alignment, alternative splicing, isoform-level quantification and fusion gene detection. In Chapter 2, I present the design of a set of DNA sequins comprising a synthetic community of artificial microbial genomes, which can be used in metagenome sequencing and analysis. Importantly, DNA sequins facilitate the accurate resolution of microbial abundance shifts between samples, which are otherwise imperceptible with NGS. Finally, in Chapter 3, I show how RNA sequins can be used in the analysis of complex brain transcriptomes generated using targeted RNA-seq. This includes an assessment of capture efficiency, quantitative accuracy, and the setting of empirical thresholds to distinguish signal from noise. These transcriptomes are presented as an atlas that can be used to link gene expression with neurological phenotypes. The technologies, associated datasets and analytical methods developed herein provide a qualitative and quantitative reference with which to navigate the complexity of genome biology.

    Advances in Forensic Genetics

    The book has 25 articles about the status and new directions in forensic genetics. Approximately half of the articles are invited reviews, and the remaining articles deal with new forensic genetic methods. The articles cover aspects such as sampling DNA evidence at the scene of a crime; DNA transfer when handling evidence material and how to avoid DNA contamination of items, laboratory, etc.; identification of body fluids and tissues with RNA; forensic microbiome analysis with molecular biology methods as a supplement to the examination of human DNA; forensic DNA phenotyping for predicting visible traits such as eye, hair, and skin colour; new ancestry informative DNA markers for estimating ethnic origin; new genetic genealogy methods for identifying distant relatives that cannot be identified with conventional forensic DNA typing; sensitive DNA methods, including single-cell DNA analysis and other highly specialised and sensitive methods to examine ancient DNA from unidentified victims of war; forensic animal genetics; genetics of visible traits in dogs; statistical tools for interpreting forensic DNA analyses, including the most used IT tools for forensic STR-typing and DNA sequencing; haploid markers (Y-chromosome and mitochondrial DNA); inference of ethnic origin; a comprehensive logical framework for the interpretation of forensic genetic DNA data; and an overview of the ethical aspects of modern forensic genetics.

    Evolutionary genomics : statistical and computational methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

    Statistical Population Genomics

    This open access volume presents state-of-the-art inference methods in population genomics, focusing on data analysis based on rigorous statistical techniques. After introducing general concepts related to the biology of genomes and their evolution, the book covers state-of-the-art methods for the analysis of genomes in populations, including demography inference, population structure analysis and detection of selection, using both model-based inference and simulation procedures. Last but not least, it offers an overview of the current knowledge acquired by applying such methods to a large variety of eukaryotic organisms. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, pointers to the relevant literature, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Authoritative and cutting-edge, Statistical Population Genomics aims to promote and ensure successful applications of population genomic methods to an increasing number of model systems and biological questions

    PSA 2016

    These preprints were automatically compiled into a PDF from the collection of papers deposited in PhilSci-Archive in conjunction with PSA 2016.

    Going viral : an integrated view on virological data analysis from basic research to clinical applications

    Viruses are of considerable interest for several fields of life science research. The genomic richness of these entities, their environmental abundance, as well as their high adaptability and, potentially, pathogenicity make treatment of viral diseases challenging. This thesis proposes three novel contributions to antiviral research that each concern analysis procedures of high-throughput experimental genomics data. First, a sensitive approach for detecting viral genomes and transcripts in sequencing data of human cancers is presented that improves upon prior approaches by allowing detection of viral nucleotide sequences that consist of human-viral homologs or are diverged from known reference sequences. Second, a computational method for inferring physical protein contacts from experimental protein complex purification assays is put forward that allows statistically meaningful integration of multiple data sets and is able to infer protein contacts of transiently binding protein classes such as kinases and molecular chaperones. Third, an investigation of minute changes in viral genomic populations upon treatment of patients with the mutagen ribavirin is presented that, for the first time, characterizes the mutagenic effect of this drug on the hepatitis C virus based on deep sequencing data.
