
238 research outputs found

Doctor of Philosophy

Author: Flygare Steven
Publication venue: University of Utah
Publication date: 01/01/2015

Advances in technology have produced efficient and powerful scientific instruments for measuring biological phenomena. In particular, modern microscopes and next-generation sequencing machines produce data at such a rate that manual analysis is no longer practical or feasible for meaningful scientific inquiries. Thus, there is a great need for computational strategies to organize and analyze huge amounts of data produced by biological experiments. My work presents computational strategies and software solutions for application in image analysis, human variant prioritization, and metagenomics. The information content of images can be leveraged to answer an extremely broad spectrum of questions ranging from inquiries about basic biological processes to highly specific, application-driven inquiries like the efficacy of a pharmaceutical drug. Modern microscopes can produce images at a rate at which rigorous manual analysis is impossible. I have created software pipelines that automate image analysis in two specific application domains. In addition, I discuss general image analysis strategies that can be applied to a wide variety of problems. There are tens of millions of known human genetic variants. Prioritizing human variants based on how likely they are to cause disease is of huge importance because of the potential impact on human health. Current variant prioritization methods are limited by their scope, efficiency, and accuracy. I present a variant prioritization method, the VAAST variant prioritizer, which is superior in its scope, efficiency, and accuracy to existing variant prioritization methods. The rise of next-generation sequencing enables huge quantities of sequence to be generated in a short period of time. No field of study has been affected by rapid sequencing more than metagenomics.
Metagenomics, the genomic analysis of a population of microorganisms, has important implications for pathogen detection because metagenomics enables the culture-free detection of microorganisms. I have created Taxonomer, a comprehensive metagenomics pipeline that enables the real-time analysis of read datasets derived from environmental samples.

The University of Utah: J. Willard Marriott Digital Library

Patterns of adaptive and purifying selection in the genomes of phocid seals

Author: Gaughran Stephen John
Publication venue: EliScholar – A Digital Platform for Scholarly Publishing at Yale
Publication date: 01/04/2021

Modern genomic sequencing technologies provide the opportunity to address long-standing questions in molecular evolution with empirical data. In this dissertation, I combine this new technology with advances in statistical population genetics to describe how deleterious mutations and adaptive evolution have shaped the genomic evolution of phocid seals. In Chapter 1, I model historical demographic processes using whole genome sequences of eight seal taxa: the Hawaiian monk seal, the Mediterranean monk seal, the northern elephant seal, the southern elephant seal, the Weddell seal, the grey seal, the Baltic ringed seal, and the Saimaa ringed seal. Through this, I establish that the endangered monk seal species have long-term small population sizes, as do grey seals. On the other hand, the elephant seals, Weddell seal, and ringed seals had much larger populations in the distant past. Notably, the most recent glaciation (c. 12,000-120,000 years ago) appeared to have a dramatic effect on phocid populations throughout the world. With this knowledge of historical population sizes, I test a fundamental premise of molecular evolution: that the rate of mutation accumulation will be higher in smaller populations due to less efficient purifying selection. I show that there is not a higher substitution rate or overall rate of mutation accumulation in the long-term small populations of monk seals compared to other seal species. On the contrary, overall rates of mutation accumulation appear to be lower in monk seals and grey seals, both of which show smaller long-term population sizes compared to the other species. This suggests that the distribution of fitness effects may differ across seal species in a way that depends on population size and history.
In Chapter 2, I use population genomic data and a newly developed statistical model to detect positive selection in the protein-coding genes of phocid seals (monk seals, elephant seals, Weddell seals, grey seals, and ringed seals). In addition, I use a phylogenetic framework to detect parallel evolution across multiple lineages of seals, relating to traits such as polar adaptations, hypoxia tolerance during long dives, and mating behavior. I develop a new bioinformatic tool to process raw BAM files and transform them into usable input for MASS-PRF, a tool to detect selection from polymorphism and divergence data. Through these analyses, I identify thousands of genes that show positive selection across multiple seal lineages. Genes associated with immune function, sperm competition, and blubber composition show positive selection in all lineages, highlighting how complex and important these traits are in seals. In the deep-diving elephant seals, the list of positively selected genes was enriched for genes relating to cardiac muscle development and function, providing important insight into how adaptive protein evolution has helped allow these seals to survive sustained bradycardia during dives that last over an hour. Weddell seals, on the other hand, showed enrichment for genes relating to neuronal development, which may relate to molecular adaptations that allow their neurons to survive hypoxic conditions during long dives. Because MASS-PRF allows for site-specific tests of selection, I am able to show that parallel evolution in the same genes across lineages may or may not involve positive selection at the same genic site. In Chapter 3, I use the population genomic data from Chapter 2 to model the distribution of fitness effects (DFE) of segregating alleles in each population. Due to sample size issues, only parameters for the Hawaiian monk seal were confidently estimated.
Using the site frequency spectrum of synonymous sites, I show that the Hawaiian monk seal has had a long-term effective population size below 5000, in agreement with the results from Chapter 1. In addition, I show that after the arrival of humans in Hawaii, the monk seal experienced a 95% decline in effective population size, in line with the current census size of fewer than 1500 individuals. Conditioning the model on the Hawaiian monk seal demographic parameters, I am able to estimate the shape of the DFE in Hawaiian monk seals using the site frequency spectrum of nonsynonymous sites. I estimate a DFE for the Hawaiian monk seal that is nearly identical to the one estimated in humans. This DFE, however, is different from the one estimated for mouse, with the seal and human DFEs having a higher proportion of more strongly deleterious alleles. This pattern cannot be explained by phylogenetic relatedness or differences in phenotypic complexity, but instead is likely related to differences in effective population size. I discuss how the geometric model of evolution predicts such a shift in DFE in response to the epistatic effect of fixed deleterious mutations in smaller populations.
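For readers unfamiliar with the site frequency spectrum (SFS) mentioned in this abstract, the following is a minimal sketch of how an unfolded SFS can be tallied from per-site derived-allele counts. It is an illustration only and not the dissertation's pipeline; the function name `site_frequency_spectrum` and the input layout (one derived-allele count per site, for a sample of `n_chromosomes` chromosomes) are assumptions made for this example.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Tally an unfolded SFS (hypothetical helper, for illustration only).

    derived_counts: one integer per site, the number of sampled
        chromosomes carrying the derived allele at that site.
    n_chromosomes: the number of chromosomes sampled.

    Returns a list [xi_1, ..., xi_{n-1}] where xi_i is the number of
    sites whose derived allele appears in exactly i chromosomes.
    Sites with a count of 0 or n are monomorphic in the sample and
    are left out of the spectrum.
    """
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    if any(c < 0 or c > n_chromosomes for c in derived_counts):
        raise ValueError("counts must lie between 0 and n_chromosomes")
    tally = Counter(derived_counts)
    return [tally.get(i, 0) for i in range(1, n_chromosomes)]


# Six sites sampled in four chromosomes; the sites with counts 0 and 4
# are monomorphic in the sample and do not enter the spectrum.
print(site_frequency_spectrum([1, 1, 2, 3, 0, 4], 4))
```

Demographic and DFE inference methods of the kind the abstract describes fit models to such spectra; that fitting step is outside the scope of this sketch.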

Yale University

Analyses of non-coding somatic drivers in 2,658 cancer whole genomes.

Author: Abascal Federico
Akdemir Kadir C.
Alvarez Eva G.
Amin Samirkumar B.
Bader Gary D.
Baez-Ortega Adrian
Bandopadhayay Pratiti
Barenboim Jonathan
Beroukhim Rameen
Bertl Johanna
Boroevich Keith A.
Boutros Paul C.
Bowtell David D. L.
Brors Benedikt
Brunak Soren
Burns Kathleen H.
Busanovich John
Campbell Peter J.
Carlevaro-Fita Joana
Chakravarty Dimple
Chan Calvin Wing Yiu
Chan Kin
Chen Ken
Choi Jung Kyoon
Cortes-Ciriano Isidro
Craft David
Deu-Pons Jordi
Dhingra Priyanka
Diamanti Klev
Dueso-Barroso Ana
Dunford Andrew J.
Edwards Paul A.
Estivill Xavier
Etemadmoghadam Dariush
Feuerbach Lars
Fink J. Lynn
Fonseca Nuno A.
Frenkel-Morgenstern Milana
Frigola Joan
Gambacorti-Passerini Carlo
Garsed Dale W.
Gerstein Mark
Getz Gad
Gonzalez-Perez Abel
Gordenin Dmitry A.
Guo Qianyun
Gut Ivo G.
Haan David
Haber James E.
Hamilton Mark P.
Haradhvala Nicholas J.
Harmanci Arif O.
Helmy Mohamed
Herrmann Carl
Hess Julian M.
Hobolth Asger
Hodzic Ermin
Hong Chen
Hornshoj Henrik
Hutter Barbara
Imielinski Marcin
Isaev Keren
Izarzugaza Jose M. G.
Johnson Rory
Johnson Todd A.
Jones David T. W.
Ju Young Seok
Juul Malene
Juul Randi Istrup
Kahles Andre
Kahraman Abdullah
Kazanov Marat D.
Kellis Manolis
Khurana Ekta
Kim Jaegil
Kim Jong K.
Kim Youngwook
Klimczak Leszek J.
Koh Youngil
Komorowski Jan
Korbel Jan O.
Kumar Kiran
Kumar Sushant
Lanzos Andres
Larsson Erik
Lawrence Michael S.
Lee Donghoon
Lee Eunjung Alice
Lee Jake June-Koo
Lehmann Kjong-Van
Li Shantao
Li Xiaotong
Li Yilong
Lin Ziao
Liu Eric Minwei
Lochovsky Lucas
Lopez-Bigas Nuria
Lou Shaoke
Lynch Andy G.
Macintyre Geoff
Madsen Tobias
Marchal Kathleen
Markowetz Florian
Martincorena Inigo
Martinez-Fundichely Alexander
Maruvka Yosef E.
McGillivray Patrick D.
Meyerson Matthew
Meyerson William
Miyano Satoru
Muinos Ferran
Mularoni Loris
Nakagawa Hidewaki
Navarro Fabio C. P.
Nielsen Morten Muhlig
Ossowski Stephan
Paczkowska Marta
Park Keunchil
Park Kiejung
Park Peter J.
Pearson John V.
Pedersen Jakob Skou
Pich Oriol
Pons Tirso
Puiggros Montserrat
Pulido-Tamayo Sergio
Raphael Benjamin J.
Reimand Juri
Reyes-Salazar Iker
Reyna Matthew A.
Rheinbay Esther
Rippe Karsten
Roberts Nicola D.
Roberts Steven A.
Rodriguez-Martin Bernardo
Rubin Mark A.
Rubio-Perez Carlota
Sabarinathan Radhakrishnan
Sahinalp S. Cenk
Saksena Gordon
Salichos Leonidas
Sander Chris
Schumacher Steven E.
Scully Ralph
Shackleton Mark
Shapira Ofer
Shen Ciyue
Shrestha Raunak
Shuai Shimin
Sidiropoulos Nikos
Sieverling Lina
Sinnott-Armstrong Nasa
Stein Lincoln D.
Stewart Chip
Stuart Joshua M.
Tamborero David
Tiao Grace
Torrents David
Tsunoda Tatsuhiko
Tubio Jose M. C.
Umer Husen Muhammad
Uuskula-Reimand Liis
Valencia Alfonso
Vazquez Miguel
Verbeke Lieven P. C.
Villasante Izar
von Mering Christian
Waddell Nicola
Wadelius Claes
Wadi Lina
Wala Jeremiah A.
Wang Jiayin
Warrell Jonathan
Waszak Sebastian M.
Weischenfeldt Joachim
Wheeler David A.
Wu Guanming
Yang Lixing
Yao Xiaotong
Yoon Sung-Soo
Yu Jun
Zamora Jorge
Zhang Cheng-Zhong
Zhang Jing
Zhang Xuanping
Zhang Yan
Zhao Zhongming
Zou Lihua
Publication venue: Nature
Publication date: 01/01/2020

The discovery of drivers of cancer has traditionally focused on protein-coding genes [1-4]. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [5] of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers [6,7], raise doubts about others and identify novel candidates, including point mutations in the 5' region of TP53, in the 3' untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available.
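The abstract does not say which rule the authors used to combine significance levels, so the sketch below shows only the textbook starting point, Fisher's method, as a point of reference. It assumes the input p-values are independent, which outputs of several driver-discovery methods run on the same mutations generally would not be; a rigorous strategy would need to account for that dependence. The function name `fisher_combined_pvalue` is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (illustrative).

    Under the null hypothesis, X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom for k independent p-values.
    For even degrees of freedom the upper tail has a closed form:
        P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term = 1.0   # running value of (x/2)^i / i!
    total = 1.0  # partial sum, starting with the i = 0 term
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Two moderately small p-values combine into a smaller one.
print(fisher_combined_pvalue([0.05, 0.05]))
```

With a single input the function returns that p-value unchanged, which is a convenient sanity check on the closed-form tail.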

Publikationsserver der Universität Tübingen

Digitala Vetenskapliga Arkivet - Academic Archive On-line

UPF Digital Repository

Repository for Publications and Research Data

DSpace@MIT

Lund University Publications

Ghent University Academic Bibliography

Publikationer från Uppsala Universitet

UCL Discovery

Copenhagen University Research Information System

eScholarship - University of California

Apollo (Cambridge)

Bern Open Repository and Information System (BORIS)

University of St. Andrews - Pure

St Andrews Research Repository

Possible A2E Mutagenic Effects on RPE Mitochondrial DNA from Innovative RNA-Seq Bioinformatics Pipeline

Author: Alibrandi Simona
D'Angelo Rosalia
Donato Luigi
Pitruzzella Alessandro
Scalia Federica
Scimone Concetta
Sidoti Antonina
Publication venue: MDPI
Publication date: 20/11/2020

Mitochondria are subject to continuous oxidative stress stimuli that, over time, can impair their genome and lead to several pathologies, such as retinal degenerations. Our main purpose was the identification of mtDNA variants that might be induced by intense oxidative stress determined by N-retinylidene-N-retinylethanolamine (A2E), together with molecular pathways involving the genes carrying them, possibly linked to retinal degeneration. We performed a variant analysis comparison between transcriptome profiles of human retinal pigment epithelial (RPE) cells exposed to A2E and untreated ones, hypothesizing that A2E might act as a mutagenic compound towards mtDNA. To optimize the analysis, we proposed an integrated approach based on the complementary use of the most recent algorithms applied to mtDNA data, characterized by a mixed output coming from several tools and databases. An increased number of variants emerged following treatment. Variants mainly occurred within mtDNA coding sequences, corresponding with either the polypeptide-encoding genes or the RNA. Time-dependent impairments involved all oxidative phosphorylation complexes, suggesting serious damage to adenosine triphosphate (ATP) biosynthesis that can result in cell death. The obtained results could be incorporated into clinical diagnostic settings, as they are hypothesized to modulate the phenotypic expression of mtDNA pathogenic variants, drastically improving the field of precision molecular medicine.

Archivio istituzionale della ricerca - Università di Palermo

Population Genomics of Polistes Wasps

Author: Dogantzis Kathleen Andrea
Publication venue
Publication date: 27/07/2017

The molecular mechanisms influencing the evolution of social behaviour in insects are of great interest and have been the focus of many recent studies. Chapter one of this thesis reviews several major hypotheses regarding the evolution of sociality. Chapter two outlines the methodological steps taken to generate a high-quality population genomic data set for primitively eusocial paper wasps in the genus Polistes. The third chapter of the thesis uses the data set generated in chapter two to estimate patterns of natural selection on the Polistes genome, and to evaluate the importance of novel and caste-biased genes on the fitness of this primitively eusocial species.

YorkSpace

New Insights Into Mitochondrial DNA Reconstruction and Variant Detection in Ancient Samples.

Author: Alessandra Modi
David Caramelli
Luca Sineo
Maria Angela Diroma
Martina Lari
Stefania Vai
Publication venue: Frontiers Media SA
Publication date: 01/01/2021

Ancient DNA (aDNA) studies are frequently focused on the analysis of the mitochondrial DNA (mtDNA), which is much more abundant than the nuclear genome and hence can be better retrieved from ancient remains. However, postmortem DNA damage and contamination make the data analysis difficult because of DNA fragmentation and nucleotide alterations. In this regard, the assessment of the heteroplasmic fraction in ancient mtDNA has always been considered an unachievable goal due to the complexity in distinguishing true endogenous variants from artifacts. We implemented and applied a computational pipeline for mtDNA analysis to a dataset of 30 ancient human samples from an Iron Age necropolis in Polizzello (Sicily, Italy). The pipeline includes several modules from well-established tools for aDNA analysis and a recently released variant caller, which was specifically conceived for mtDNA, applied for the first time to aDNA data. Through fine-tuned filtering on variant allele sequencing features, we were able to accurately reconstruct nearly complete (>88%) mtDNA genomes for almost all the analyzed samples (27 out of 30), depending on the degree of preservation and the sequencing throughput, and to get a reliable set of variants allowing haplogroup prediction. Additionally, we provide guidelines to deal with possible artifact sources, including nuclear mitochondrial sequence (NumtS) contamination, an often-neglected issue in ancient mtDNA surveys. Potential heteroplasmy levels were also estimated, although most variants were likely homoplasmic, and validated by data simulations, proving that new sequencing technologies and software are sensitive enough to detect partially mutated sites in ancient genomes and discriminate true variants from artifacts. A thorough functional annotation of detected and filtered mtDNA variants was also performed for a comprehensive evaluation of these ancient samples.

Florence Research

Archivio istituzionale della ricerca - Università di Palermo

On the Origin of Phenotypic Variation: Novel Technologies to Dissect Molecular Determinants of Phenotype

Author: Vallania Francesco
Publication venue: Washington University Open Scholarship
Publication date: 18/12/2013

This thesis describes the conception, design, and development of novel computational tools, theoretical models, and experimental techniques applied to the dissection of molecular factors underlying phenotypic variation. The first part of my work is focused on finding rare genetic variants in pooled DNA samples, leading to the development of a novel set of algorithms, SNPseeker and SPLINTER, applied to next-generation sequencing data. The second part of my work describes the creation of a reporter system for DNA methylation for the purpose of dissecting the genetic contribution of tissue-specific patterns of DNA methylation across the genome. Finally, the last part of my work is focused on understanding the basis of stochastic variation in gene expression, with a focus on modeling and dissecting the relationship between single-cell protein variance and mean at a genome-wide scale.

Washington University St. Louis: Open Scholarship

Integrating Human Population Genetics And Genomics To Elucidate The Etiology Of Brain Disorders

Author: Sulovari Arvis
Publication venue: UVM ScholarWorks
Publication date: 01/01/2017

Brain disorders present a significant burden on affected individuals, their families and society at large. Existing diagnostic tests suffer from a lack of genetic biomarkers, particularly for substance use disorders, such as alcohol dependence (AD). Numerous studies have demonstrated that AD has a genetic heritability of 40-60%. The existing genetics literature of AD has primarily focused on linkage analyses in small family cohorts and more recently on genome-wide association analyses (GWAS) in large case-control cohorts, fueled by rapid advances in next generation sequencing (NGS). Numerous AD-associated genomic variations are present at a common frequency in the general population, making these variants of public health significance. However, known AD-associated variants explain only a fraction of the expected heritability. In this dissertation, we demonstrate that systems biology applications that integrate evolutionary genomics, rare variants and structural variation can dissect the genetic architecture of AD and elucidate its heritability. We identified several complex human diseases, including AD and other brain disorders, as potential targets of natural selection forces in diverse world populations. Further evidence of natural selection forces affecting AD was revealed when we identified an association between eye color, a trait under strong selection, and AD. These findings provide strong support for conducting GWAS on brain disorder phenotypes. However, with the ever-increasing abundance of rare genomic variants and large cohorts of multi-ethnic samples, population stratification becomes a serious confounding factor for GWAS. To address this problem, we designed a novel approach to identify ancestry informative single nucleotide polymorphisms (SNPs) for population stratification adjustment in association analyses. 
Furthermore, to leverage untyped variants from genotyping arrays, particularly rare variants, for GWAS and meta-analysis through rapid imputation, we designed a tool that converts genotype definitions across various array platforms. To further elucidate the genetic heritability of brain disorders, we designed approaches aimed at identifying Copy Number Variations (CNVs) and viral insertions into the human genome. We conducted the first CNV-based whole genome meta-analysis for AD. We also designed an integrated approach to estimate the sensitivity of NGS-based methods of viral insertion detection. For the first time in the literature, we identified herpesvirus in NGS data from an Alzheimer’s disease brain sample. The work in this dissertation represents a three-faceted advance in our understanding of brain disease etiology: 1) evolutionary genomic insights, 2) novel resources and tools to leverage rare variants, and 3) the discovery of disease-associated structural genomic aberrations. Our findings have broad implications for the genetics of complex human disease and hold promise for delivering clinically useful knowledge and resources.

ScholarWorks @ UVM

Development of Integrated Machine Learning and Data Science Approaches for the Prediction of Cancer Mutation and Autonomous Drug Discovery of Anti-Cancer Therapeutic Agents

Author: Agajanian Steven
Publication venue: Chapman University Digital Commons
Publication date: 01/01/2020

Few technological ideas have captivated the minds of biochemical researchers to the degree that machine learning (ML) and artificial intelligence (AI) have. Over the last few years, advances in the ML field have driven the design of new computational systems that improve with experience and are able to model increasingly complex chemical and biological phenomena. In this dissertation, we capitalize on these achievements and use machine learning to study drug receptor sites and design drugs to target these sites. First, we analyze the significance of various single nucleotide variations and assess their rate of contribution to cancer. Following that, we use a portfolio of machine learning and data science approaches to design new drugs to target protein kinase inhibitors. We show that these techniques exhibit strong promise in aiding cancer research and drug discovery.

Chapman University Digital Commons

Interpretation of Mutations, Expression, Copy Number in Somatic Breast Cancer: Implications for Metastasis and Chemotherapy

Author: Dorman Stephanie
Publication venue: Scholarship@Western
Publication date: 15/09/2015

Breast cancer (BC) patient management has been transformed over the last two decades due to the development and application of genome-wide technologies. The vast amounts of data generated by these assays, however, create new challenges for accurate and comprehensive analysis and interpretation. This thesis describes novel methods for fluorescence in-situ hybridization (FISH), array comparative genomic hybridization (aCGH), and next-generation DNA- and RNA-sequencing, to improve upon current approaches used for these technologies. An ab initio algorithm was implemented to identify genomic intervals of single copy and highly divergent repetitive sequences that were applied to FISH and aCGH probe design. FISH probes with higher resolution than commercially available reagents were developed and validated on metaphase chromosomes. An aCGH microarray was developed that had improved reproducibility compared to the standard Agilent 44K array, which was achieved by placing oligonucleotide probes distant from conserved repetitive sequences. Splicing mutations are currently underrepresented in genome-wide sequencing analyses, and there are limited methods to validate genome-wide mutation predictions. This thesis describes Veridical, a program developed to statistically validate aberrant splicing caused by a predicted mutation. Splicing mutation analysis was performed on a large subset of BC patients previously analyzed by the Cancer Genome Atlas. This analysis revealed an elevated number of splicing mutations in genes involved in NCAM pathways in basal-like and HER2-enriched lymph node positive tumours. Genome-wide technologies were leveraged further to develop chemosensitivity models that predict BC response to paclitaxel and gemcitabine. A type of machine learning, called support vector machines (SVM), was used to create predictive models from small sets of genes biologically relevant to drug disposition or resistance.
The SVM models generated were able to predict sensitivity in two groups of independent patient data. High variability between individuals requires more accurate and higher resolution genomic data. However, the data themselves are insufficient; also needed are more insightful analytical methods to fully exploit these data. This dissertation presents both improvements in data quality and accuracy as well as analytical procedures, with the aim of detecting and interpreting critical genomic abnormalities that are hallmarks of BC subtypes, metastasis and therapy response.

Scholarship@Western