    Common genetic variation drives molecular heterogeneity in human iPSCs.

    Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5–46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. We also assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. Finally, we present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.
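The statement that a share of phenotypic variance "arises from differences between individuals" is a variance decomposition. As a simplified illustration, and not the study's own method (the abstract does not say which model was fitted), the sketch below estimates the donor fraction with a one-way random-effects ANOVA, assuming each donor contributed several iPS cell lines measured for a single phenotype. The function name `donor_variance_fraction` and its input layout are hypothetical.

```python
from statistics import fmean


def donor_variance_fraction(lines_by_donor: dict[str, list[float]]) -> float:
    """Fraction of phenotype variance attributable to donor (one-way random-effects ANOVA).

    lines_by_donor maps a (hypothetical) donor ID to phenotype values of its cell lines.
    """
    groups = [vals for vals in lines_by_donor.values() if vals]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two donors and replicate lines for some donors")
    grand_mean = fmean(v for g in groups for v in g)
    # Between-donor and within-donor sums of squares.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective number of lines per donor, which handles unbalanced designs.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    var_donor = max(0.0, (ms_between - ms_within) / n0)
    total = var_donor + ms_within
    return var_donor / total if total > 0 else 0.0
```

When lines from the same donor are identical but donors differ, the function returns 1.0; when every donor shows the same spread of values, it returns 0.0. For designs with several technical factors, a linear mixed model with additional covariates would generally be preferred.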

    Designing a Robust and Portable Workflow for Detecting Genetic Variants Associated with Molecular Phenotypes Across Multiple Studies

    Genetic variants that are statistically associated with a molecular phenotype are called quantitative trait loci (QTLs). QTL analysis links variation in molecular phenotype levels to genotype variation and has become a standard approach for understanding the molecular mechanisms underlying complex traits and diseases. A typical QTL analysis consists of many steps. Although a diverse set of tools is available for each of these steps, the tools have so far not been integrated into a reproducible and scalable workflow that is easy to use across a wide range of computational environments. Our workflow consists of three modules: (i) quantification of the phenotype of interest, (ii) normalisation and quality control, and (iii) QTL mapping. For the quantification and QTL mapping modules we developed Nextflow pipelines that follow all best practices of the nf-core framework. The pipelines are open source and containerized, so users can easily extend them and run them in parallel in a variety of computational environments. For the quality control and normalisation module we developed a script that automatically computes quality metrics and reports them to the user. As a proof of concept, we uniformly processed more than 40 context-specific groups from more than 15 studies and discovered at least one significant eQTL for more than 9000 genes. We believe that wider adoption of our pipelines will make QTL analysis more reproducible, portable, robust and easier to use than existing approaches.
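At its core, the QTL mapping module tests each variant-phenotype pair for association. A minimal sketch of a single cis-eQTL test follows, assuming genotypes coded as alternative-allele dosages (0, 1, 2) and an already normalised expression vector. It is not the pipelines' implementation, which the abstract does not describe, and it omits the covariates (for example genotype principal components) and multiple-testing correction a real analysis would include. `eqtl_test` is a hypothetical name.

```python
import math


def eqtl_test(dosage: list[float], expression: list[float]) -> tuple[float, float]:
    """Regress expression on genotype dosage; return (effect size beta, t statistic)."""
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need matching vectors with at least 3 samples")
    mean_g = sum(dosage) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosage)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosage, expression))
    beta = sxy / sxx  # change in expression per additional alternative allele
    intercept = mean_e - beta * mean_g
    rss = sum((e - intercept - beta * g) ** 2 for g, e in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of beta
    if se == 0:
        # Perfect fit: the statistic is unbounded (undefined if beta is also 0).
        return beta, (math.copysign(math.inf, beta) if beta != 0 else math.nan)
    return beta, beta / se
```

A p-value can then be obtained from the t statistic with n - 2 degrees of freedom, for example via `scipy.stats.t.sf`, or empirically by permuting expression values across samples.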

    A survey of best practices for RNA-seq data analysis.

    RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics. This is the final published version. It first appeared at http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8
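One of the steps the review covers is quantification of gene and transcript levels. A widely used within-sample unit is transcripts per million (TPM); the sketch below shows the calculation, assuming raw read counts and feature lengths in base pairs (real quantifiers typically use effective lengths that account for fragment size). The function is illustrative and not code from the review.

```python
def tpm(counts: dict[str, float], lengths_bp: dict[str, float]) -> dict[str, float]:
    """Convert raw read counts per feature to transcripts per million (TPM)."""
    if any(lengths_bp[f] <= 0 for f in counts):
        raise ValueError("feature lengths must be positive")
    # Step 1: reads per kilobase, removing the bias towards longer features.
    rate = {f: counts[f] / (lengths_bp[f] / 1000) for f in counts}
    total = sum(rate.values())
    if total == 0:
        raise ValueError("no reads assigned to any feature")
    # Step 2: rescale so that values sum to one million within the sample.
    return {f: r / total * 1_000_000 for f, r in rate.items()}
```

Because TPM values always sum to one million, they behave as proportions within a sample; they do not by themselves correct for composition differences between samples.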

    RNA‐seq: Applications and Best Practices

    RNA‐sequencing (RNA‐seq) is the state‐of‐the‐art technique for transcriptome analysis that takes advantage of high‐throughput next‐generation sequencing. Although it is a powerful approach, RNA‐seq poses major challenges, with numerous caveats at every step. Many experimental options are currently available, and a thorough understanding of each step is critical for making the right decisions and avoiding inconclusive results. A complete workflow consists of: (1) experimental design; (2) sample and library preparation; (3) sequencing; and (4) data analysis. RNA‐seq enables a wide range of applications, such as the discovery of novel genes, gene/transcript quantification, and differential expression and functional analysis. This chapter covers the main aspects from sample preparation to downstream data analysis. We discuss how to obtain high‐quality samples, how many replicates to use, library preparation, sequencing platforms and coverage, focusing on the best practices recommended in the specialized literature. Basic techniques and well‐known algorithms are presented and discussed, guiding both beginners and experienced users in implementing reliable experiments.
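Differential expression analysis compares counts across samples sequenced to different depths, so counts are usually normalised between samples first. As one example of a well-known algorithm (the abstract does not say which normalisation the chapter recommends), the sketch computes median-of-ratios size factors in the style of DESeq2, assuming a count matrix with genes as rows and samples as columns. Genes with a zero count in any sample are skipped; `size_factors` is a hypothetical name.

```python
import math
from statistics import median


def size_factors(counts: list[list[float]]) -> list[float]:
    """Median-of-ratios size factors; counts[g][s] is the count of gene g in sample s."""
    n_samples = len(counts[0])
    # Only genes observed in every sample have a finite geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    log_ratios: list[list[float]] = [[] for _ in range(n_samples)]
    for row in usable:
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for s, c in enumerate(row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    # Each sample's factor is its median ratio to the per-gene geometric means.
    return [math.exp(median(r)) for r in log_ratios]
```

Dividing each sample's counts by its size factor puts samples on a common scale; in a matrix where the second sample has exactly twice the reads of the first for every gene, the second factor is twice the first.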

    Allelome.PRO, a pipeline to define allele-specific genomic features from high-throughput sequencing data

    Detecting allelic biases from high-throughput sequencing data requires an approach that maximizes sensitivity while minimizing false positives. Here, we present Allelome.PRO, an automated, user-friendly bioinformatics pipeline that uses high-throughput sequencing data from reciprocal crosses of two genetically distinct mouse strains to detect allele-specific expression and chromatin modifications. Allelome.PRO extends approaches used in previous studies, which exclusively analyzed imprinted expression, to give a complete picture of the ‘allelome’ by automatically categorizing the allelic expression of all genes in a given cell type as imprinted, strain-biased, biallelic or non-informative. Allelome.PRO offers increased sensitivity for analyzing lowly expressed transcripts, together with a robust false discovery rate calculated empirically from variation in the sequencing data. We used RNA-seq data from mouse embryonic fibroblasts from F1 reciprocal crosses to determine a biologically relevant allelic ratio cutoff and, for the first time, define an entire allelome. Furthermore, we show that Allelome.PRO detects differential enrichment of H3K4me3 over promoters in ChIP-seq data, validating the RNA-seq results. This approach can easily be extended to analyze histone marks of active enhancers or transcription factor binding sites, and therefore provides a powerful tool for identifying candidate cis-regulatory elements genome-wide.
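The four categories follow from comparing allelic ratios between the two reciprocal crosses: a parent-of-origin (imprinted) bias switches strain between crosses, whereas a strain bias does not. The sketch below illustrates only that logic. It assumes per-gene read counts for each strain's allele in both crosses and uses a hypothetical ratio cutoff of 0.7 and a hypothetical minimum of 20 reads; Allelome.PRO derives its cutoff from the data and adds statistical testing with an empirical false discovery rate, neither of which is reproduced here.

```python
def classify_allelic(
    cross1: tuple[int, int],
    cross2: tuple[int, int],
    ratio_cutoff: float = 0.7,  # hypothetical cutoff, not Allelome.PRO's value
    min_reads: int = 20,  # hypothetical minimum coverage per cross
) -> str:
    """Classify a gene from (strain_A_reads, strain_B_reads) in two reciprocal crosses.

    cross1: strain A mother x strain B father; cross2: strain B mother x strain A father.
    """
    (a1, b1), (a2, b2) = cross1, cross2
    if a1 + b1 < min_reads or a2 + b2 < min_reads:
        return "non-informative"
    strain_a1 = a1 / (a1 + b1)  # fraction of reads from strain A in cross 1
    strain_a2 = a2 / (a2 + b2)  # fraction of reads from strain A in cross 2
    # Strain A is the maternal allele in cross 1 and the paternal allele in cross 2.
    maternal1, maternal2 = strain_a1, 1 - strain_a2
    low = 1 - ratio_cutoff
    if maternal1 >= ratio_cutoff and maternal2 >= ratio_cutoff:
        return "imprinted (maternal)"
    if maternal1 <= low and maternal2 <= low:
        return "imprinted (paternal)"
    if strain_a1 >= ratio_cutoff and strain_a2 >= ratio_cutoff:
        return "strain-biased (A)"
    if strain_a1 <= low and strain_a2 <= low:
        return "strain-biased (B)"
    return "biallelic"
```

In this simplification, a gene with a consistent but sub-threshold bias falls through to "biallelic"; a real classifier would test whether the bias is significant rather than rely on the ratio alone.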