27 research outputs found

    GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations

    Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command line and graphical tools for common analyses. We demonstrate GEMINI's utility for exploring variation in personal genomes and family-based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility, and our goal is to provide researchers with a standard framework for medical genomics.

    Whole-genome sequencing reveals new Alzheimer's disease-associated rare variants in loci related to synaptic function and neuronal development

    Introduction: Genome-wide association studies have led to numerous genetic loci associated with Alzheimer's disease (AD). Whole-genome sequencing (WGS) now permits genome-wide analyses to identify rare variants contributing to AD risk. Methods: We performed single-variant and spatial clustering–based testing on rare variants (minor allele frequency [MAF] ≤1%) in a family-based WGS-based association study of 2247 subjects from 605 multiplex AD families, followed by replication in 1669 unrelated individuals. Results: We identified 13 new AD candidate loci that yielded consistent rare-variant signals in discovery and replication cohorts (4 from single-variant, 9 from spatial-clustering), implicating these genes: FNBP1L, SEL1L, LINC00298, PRKCH, C15ORF41, C2CD3, KIF2A, APC, LHX9, NALCN, CTNNA2, SYTL3, and CLSTN2. Discussion: Downstream analyses of these novel loci highlight synaptic function, in contrast to common AD-associated variants, which implicate innate immunity and amyloid processing. These loci have not been associated previously with AD, emphasizing the ability of WGS to identify AD-associated rare variants, particularly outside of the exome.
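The rare-variant threshold in the Methods (MAF ≤1%) can be illustrated with a minimal Python sketch. It assumes biallelic sites and simple genotype counts; the site names and counts are hypothetical and not taken from the study, and the sketch ignores the relatedness between family members that a family-based analysis would need to account for.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Minor allele frequency at a biallelic site, from genotype counts."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no called genotypes at this site")
    alt_count = n_het + 2 * n_hom_alt
    # The minor allele is whichever of ref/alt is less common.
    minor_count = min(alt_count, total_alleles - alt_count)
    return minor_count / total_alleles


def is_rare(maf, threshold=0.01):
    """True when the site passes a MAF <= threshold filter (1% by default)."""
    return maf <= threshold


# Hypothetical genotype counts: (hom-ref, het, hom-alt) per site.
toy_sites = {
    "site_a": (990, 10, 0),    # alt allele is rare
    "site_b": (800, 180, 20),  # common variant
    "site_c": (0, 16, 984),    # ref allele is the rare one
}

rare_sites = [
    name
    for name, counts in toy_sites.items()
    if is_rare(minor_allele_frequency(*counts))
]
print(rare_sites)
```

Counting the minor allele in integers before dividing avoids floating-point surprises right at the threshold.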

    Automated, highly scalable Ribonucleic acid-sequencing analysis

    Thesis: Ph. D., Harvard-MIT Program in Health Sciences and Technology, 2015. Cataloged from PDF version of thesis. Includes bibliographical references (pages [119]-139).
    RNA-sequencing is a sensitive method for inferring gene expression and provides additional information regarding splice variants, polymorphisms and novel genes and isoforms. Using this extra information greatly increases the complexity of an analysis and prevents novice investigators from analyzing their own data. The first chapter of this work introduces a solution to this issue. It describes a community-curated, scalable RNA-seq analysis framework for performing differential transcriptome expression, transcriptome assembly, and variant and RNA-editing calling. It handles the entire stack of an analysis, from downloading and installing hundreds of tools, libraries and genomes to running an analysis that can be scaled to handle thousands of samples simultaneously. It can be run on a local machine, any high performance cluster or on the cloud, and new tools can be plugged in at will. The second chapter of this work uses this software to examine transcriptome changes in the cortex of a mouse model of tuberous sclerosis with a neuron-specific knockout of Tsc1. We show that upregulation of the serotonin receptor Htr2c causes aberrant calcium spiking in the Tsc1 knockout mouse, and implicate it as a novel therapeutic target for tuberous sclerosis. The third chapter of this work investigates transcriptome regulation in the superior colliculus with prolonged eye closure. We show that while the colliculus undergoes long term anatomical changes with light deprivation, the gene expression in the colliculus is unchanged, barring a module of genes involved in energy production. We use the gene expression data to resolve a long-standing debate regarding the expression of dopamine receptors in the superior colliculus and find a striking segregation of the Drd1 and Drd2 dopamine receptors into distinct functional zones.
    by Rory Kirchner. Ph. D.

    Community Development of Validated Variant Calling Pipelines

    Presentation at Genome Informatics 2013:
    Translational research relies on accurate identification of genomic variants. However, rapidly changing best practice approaches in alignment and variant calling, coupled with large data sizes, make it a challenge to create reliable and reproducible variant calls. Coordinated community development can help overcome these challenges by sharing testing and updates across multiple groups. We describe bcbio-nextgen, a distributed multi-architecture pipeline that automates variant calling, validation and organization of results for query and visualization. It creates an easily installable, reliable infrastructure from best-practice open source tools with the following goals:
    Quantifiable: Validates variant calls against known reference materials developed by the Genome in a Bottle consortium. The bcbio.variation toolkit automates scoring and assessment of calls to identify regressions in variant identification as calling pipelines evolve. Incorporation of multiple variant calling approaches from Broad’s GATK best practices and the Marth lab’s gkno software enables informed comparisons between current and future algorithms.
    Scalable: bcbio-nextgen handles large population studies with hundreds of whole genome samples by parallelizing on a wide variety of schedulers and multicore machines, setting up different ad hoc cluster configurations for each workflow step. Work in progress includes integration with virtual environments, including Amazon Web Services and OpenStack.
    Accessible: Results automatically feed into tools for query and investigation of variants. The GEMINI framework provides a queryable database associating variants with a wide variety of genome annotations. The o8 web-based tool visualizes the work of variant prioritization and assessment.
    Community developed: bcbio-nextgen is widely used in multiple sequencing centers and research laboratories. We actively encourage contributors to the code base and make it easy to get started with a fully automated installer and updater that prepares all third party software and reference genomes.
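The idea of validating calls against a reference set can be sketched in a few lines of Python. This is a toy under stated assumptions, not the scoring performed by bcbio-nextgen or its variation toolkit: variants are reduced to hypothetical (chrom, pos, ref, alt) tuples and compared by exact match, whereas a real comparison would also need to normalize variant representation, compare genotypes, and restrict scoring to confidently characterized regions.

```python
def score_calls(called, truth):
    """Compare a call set with a reference set by exact variant match."""
    true_pos = len(called & truth)    # called and present in the reference
    false_pos = len(called - truth)   # called but absent from the reference
    false_neg = len(truth - called)   # in the reference but missed
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    return {
        "tp": true_pos,
        "fp": false_pos,
        "fn": false_neg,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical reference material and pipeline output.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr2", 75, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 10, "A", "T"),  # not in the reference: counted as a false positive
}

scores = score_calls(called, truth)
print(scores)
```

Recording these numbers for each pipeline version is one simple way to notice a regression: a drop in recall or precision between versions flags a change worth investigating.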

    Sequencing of Captive Target Transcripts Identifies the Network of Regulated Genes and Functions of Primate-Specific miR-522

    Identifying microRNA (miRNA)-regulated genes is key to understanding miRNA function. However, many miRNA recognition elements (MREs) do not follow canonical “seed” base-pairing rules, making identification of bona fide targets challenging. Here, we apply an unbiased sequencing-based systems approach to characterize miR-522, a member of the oncogenic primate-specific chromosome 19 miRNA cluster, highly expressed in poorly differentiated cancers. To identify miRNA targets, we sequenced full-length transcripts captured by a biotinylated miRNA mimic. Within these targets, mostly noncanonical MREs were identified by sequencing RNase-resistant fragments. miR-522 overexpression reduced mRNA, protein levels, and luciferase activity of >70% of a random list of candidate target genes and MREs. Bioinformatic analysis suggested that miR-522 regulates cell proliferation, detachment, migration, and epithelial-mesenchymal transition. miR-522 induces G1 cell-cycle arrest and causes cells to detach without anoikis, become invasive, and express mesenchymal genes. Thus, our method provides a simple but effective technique for identifying miRNA-regulated genes and biological function.

    Variant mining and tool development with the GEMINI database framework.

    (A) Storing variants and annotations in the same database framework enables ad hoc SQL data exploration through both the query module and a Python programming interface. Analysis queries can filter variants based on pre-installed annotations (e.g., in_dbsnp = 0) and custom annotations (e.g., my_disease_regions = 1). Users may also select and filter variants based upon the genotypes of specific individuals (e.g., gt_types.mom == HET), thus allowing one to identify variants meeting specific inheritance patterns, as shown here. (B) The GEMINI database framework also enables the development of tools that facilitate automated analyses for routine analysis tasks. (C) Moreover, it serves as a standard interface for developers to develop new tools and algorithms and to implement improved statistical tests for population and medical genetics.
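The combination of an annotation filter and a genotype filter described in panel A can be illustrated with a small self-contained Python sketch built on the standard sqlite3 module. It is a toy, not GEMINI's schema or API: the two-table layout, the `query_variants` helper, and all of the data are assumptions made for illustration, and only the column names `in_dbsnp` and `my_disease_regions` and the `mom == HET` condition are taken from the caption.

```python
import sqlite3

HOM_REF, HET, HOM_ALT = "HOM_REF", "HET", "HOM_ALT"


def build_toy_db():
    """Create an in-memory database with hypothetical variants and genotypes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE variants (
            variant_id INTEGER PRIMARY KEY,
            chrom TEXT, start INTEGER, ref TEXT, alt TEXT,
            in_dbsnp INTEGER,            -- stands in for a pre-installed annotation
            my_disease_regions INTEGER   -- stands in for a custom annotation
        );
        CREATE TABLE genotypes (variant_id INTEGER, sample TEXT, gt_type TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "chr1", 1000, "A", "G", 0, 1),
            (2, "chr1", 2000, "C", "T", 1, 1),
            (3, "chr2", 3000, "G", "A", 0, 0),
            (4, "chr2", 4000, "T", "C", 0, 1),
        ],
    )
    conn.executemany(
        "INSERT INTO genotypes VALUES (?, ?, ?)",
        [
            (1, "mom", HET), (1, "dad", HOM_REF), (1, "kid", HET),
            (2, "mom", HET), (2, "dad", HOM_REF), (2, "kid", HET),
            (3, "mom", HET), (3, "dad", HET), (3, "kid", HOM_ALT),
            (4, "mom", HOM_REF), (4, "dad", HOM_REF), (4, "kid", HET),
        ],
    )
    return conn


def query_variants(conn, where_sql, gt_filter):
    """Apply an SQL annotation filter, then a per-sample genotype predicate."""
    rows = conn.execute(
        f"SELECT variant_id, chrom, start FROM variants "
        f"WHERE {where_sql} ORDER BY variant_id"
    ).fetchall()
    kept = []
    for variant_id, chrom, start in rows:
        # Map sample name -> genotype type for this variant.
        gts = dict(
            conn.execute(
                "SELECT sample, gt_type FROM genotypes WHERE variant_id = ?",
                (variant_id,),
            )
        )
        if gt_filter(gts):
            kept.append((chrom, start))
    return kept


conn = build_toy_db()
hits = query_variants(
    conn,
    "in_dbsnp = 0 AND my_disease_regions = 1",
    lambda gt: gt["mom"] == HET and gt["dad"] == HOM_REF and gt["kid"] == HET,
)
print(hits)
```

Keeping the annotation filter in SQL and the genotype condition as a separate predicate mirrors the split the caption describes between annotation-based queries and genotype-based filters.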

    The GEMINI browser interface.

    In an effort to enable collaborative research and to support users who are less comfortable working on a UNIX command line, we also provide a web browser interface to GEMINI databases. This figure depicts the browser interface to the GEMINI query module; as illustrated in the navigation bar, interfaces also exist to other built-in analysis tools (e.g., for finding de novo mutations) and to the GEMINI documentation. (A) The browser interface to the query module allows users to run custom analysis queries in order to identify variants of interest. (B) Users may also enforce “genotype filters” that restrict the returned variants to those that meet specific genotype conditions or inheritance patterns. (C) Additional options allow the user 1) to add column headers describing the name of each column selected, 2) to create automatic links to the Integrative Genomics Viewer (IGV) from the reported variants, thus facilitating data exploration and validation, and 3) to report results to either the web browser or a text file for downstream analysis.

    High-throughput functional comparison of promoter and enhancer activities

    Promoters initiate RNA synthesis, and enhancers stimulate promoter activity. Whether promoter and enhancer activities are encoded distinctly in DNA sequences is unknown. We measured the enhancer and promoter activities of thousands of DNA fragments transduced into mouse neurons. We focused on genomic loci bound by the neuronal activity-regulated coactivator CREBBP, and we measured enhancer and promoter activities both before and after neuronal activation. We find that the same sequences typically encode both enhancer and promoter activities. However, gene promoters generate more promoter activity than distal enhancers, despite generating similar enhancer activity. Surprisingly, the greater promoter activity of gene promoters is not due to conventional core promoter elements or splicing signals. Instead, we find that particular transcription factor binding motifs are intrinsically biased toward the generation of promoter activity, whereas others are not. Although the specific biases we observe may be dependent on experimental or cellular context, our results suggest that gene promoters are distinguished from distal enhancers by specific complements of transcriptional activators.
    National Institute of Mental Health (U.S.) (Grant R01 MH101528-01)