23 research outputs found

    SQANTI : extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification

    High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that an important number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact on the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.

    Genetic perturbation of PU.1 binding and chromatin looping at neutrophil enhancers associates with autoimmune disease.

    Neutrophils play fundamental roles in innate immune response, shape adaptive immunity, and are a potentially causal cell type underpinning genetic associations with immune system traits and diseases. Here, we profile the binding of myeloid master regulator PU.1 in primary neutrophils across nearly a hundred volunteers. We show that variants associated with differential PU.1 binding underlie genetically-driven differences in cell count and susceptibility to autoimmune and inflammatory diseases. We integrate these results with other multi-individual genomic readouts, revealing coordinated effects of PU.1 binding variants on the local chromatin state, enhancer-promoter contacts and downstream gene expression, and providing a functional interpretation for 27 genes underlying immune traits. Collectively, these results demonstrate the functional role of PU.1 and its target enhancers in neutrophil transcriptional control and immune disease susceptibility.

    GWAS of genetic factors affecting white blood cell morphological parameters in Sardinians uncovers influence of chromosome 11 innate immunity gene cluster on eosinophil morphology

    Few genome-wide association studies (GWAS) analyzing genetic regulation of morphological traits of white blood cells have been reported. We carried out a GWAS of 12 morphological traits in 869 individuals from the general population of Sardinia, Italy. These traits included measures of cell volume, conductivity and light scatter in four white-cell populations (eosinophils, lymphocytes, monocytes, neutrophils). This analysis yielded seven statistically significant signals, four of which were novel (PRG2, P2RX3, and two in CDK6). Five signals were replicated in the independent INTERVAL cohort of 11,822 individuals. The most interesting signal, with a large effect size on eosinophil scatter (P-value = 8.33 × 10^-32, beta = -1.651, se = 0.1351), falls within the innate immunity cluster on chromosome 11 and is located in the PRG2 gene. Computational analyses revealed that a rare, Sardinian-specific PRG2:p.Ser148Pro mutation modifies PRG2 amino acid contacts and protein dynamics in a manner that could possibly explain the changes observed in eosinophil morphology. Our discoveries shed light on the genetics of morphological traits. For the first time, we describe such a large effect size on eosinophil morphology for a variant that is relatively frequent in the Sardinian population. Intramural Research Program of the National Institute on Aging (N01-AG-1-2109 and HHSN271201100005C); National Institutes of Health (NIH); research grants from the Ministry of Science and Innovation (PGC2018-096049-B-I00); European Regional Development Fund (FEDER); Andalusian Government (BIO-198, US-1254317, US-1257019, P18-FR-3487 and P18HO-4091, US/JUNTA/FEDER, UE), University of Seville (VI PPIT) and the Ramón Areces Foundation. G.P.-M. was awarded a PhD fellowship from the Spanish Ministry of Education, Culture and Sport (FPU17/04604). Peer reviewed.

    The Polygenic and Monogenic Basis of Blood Traits and Diseases

    Blood cells play essential roles in human health, underpinning physiological processes such as immunity, oxygen transport, and clotting, which when perturbed cause a significant global health burden. Here we integrate data from UK Biobank and a large-scale international collaborative effort, including data for 563,085 European ancestry participants, and discover 5,106 new genetic variants independently associated with 29 blood cell phenotypes covering a range of variation impacting hematopoiesis. We holistically characterize the genetic architecture of hematopoiesis, assess the relevance of the omnigenic model to blood cell phenotypes, delineate relevant hematopoietic cell states influenced by regulatory genetic variants and gene networks, identify novel splice-altering variants mediating the associations, and assess the polygenic prediction potential for blood traits and clinical disorders at the interface of complex and Mendelian genetics. These results show the power of large-scale blood cell trait GWAS to interrogate clinically meaningful variants across a wide allelic spectrum of human variation. Analysis of blood cell traits in the UK Biobank and other cohorts illuminates the full genetic architecture of hematopoietic phenotypes, with evidence supporting the omnigenic model for complex traits and linking polygenic burden with monogenic blood diseases.


    Use of SOD-3 for adjuvant immunotherapy against cancer

    The present invention relates to the use of superoxide dismutase 3 (SOD-3) as an adjuvant in cancer immunotherapy. Peer reviewed. Consejo Superior de Investigaciones Científicas (Spain). A1: patent application with report on the state of the art.

    Event Analysis: Using Transcript Events To Improve Estimates of Abundance in RNA-seq Data

    Alternative splicing leverages genomic content by allowing the synthesis of multiple transcripts and, by implication, protein isoforms, from a single gene. However, estimating the abundance of transcripts produced in a given tissue from short sequencing reads is difficult and can result in both the construction of transcripts that do not exist and the failure to identify true transcripts. An alternative approach is to catalog the events that make up isoforms (splice junctions and exons). We present here the Event Analysis (EA) approach, where we project transcripts onto the genome and identify overlapping/unique regions and junctions. In addition, all possible logical junctions are assembled into a catalog. Transcripts are filtered before quantitation based on simple measures: the proportion of the events detected, and the coverage. We find that mapping to a junction catalog is more efficient at detecting novel junctions than mapping in a splice-aware manner. We identify 99.8% of true transcripts, while iReckon identifies 82% of the true transcripts and creates more transcripts not included in the simulation than were initially used in the simulation. Using PacBio Iso-seq data from a mouse neural progenitor cell model, EA detects 60% of the novel junctions that are combinations of existing exons, while only 43% are detected by STAR. EA further detects ∼5,000 annotated junctions missed by STAR. Filtering transcripts based on the proportion of the transcript detected and the number of reads on average supporting that transcript captures 95% of the PacBio transcriptome. Filtering the reference transcriptome before quantitation results in a more stable estimate of isoform abundance, with improved correlation between replicates. This was particularly evident when EA was applied to an RNA-seq study of type 1 diabetes (T1D), where the coefficient of variation among subjects (n = 81) in the transcript abundance estimates was substantially reduced compared to the estimation using the full reference. EA focuses on individual transcriptional events. These events can be quantitated and analyzed directly or used to identify the probable set of expressed transcripts. Simple rules based on detected events and coverage used in filtering result in a dramatic improvement in isoform estimation without the use of ancillary data (e.g., ChIP, long reads) that may not be available for many studies.
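The filtering rule described in this abstract (keep a transcript only if enough of its events are detected and its average read support is adequate) can be illustrated with a minimal sketch. This is not the authors' implementation: the `TranscriptEvents` structure, the function names, and both thresholds (`min_prop_detected`, `min_mean_coverage`) are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TranscriptEvents:
    """Per-transcript event summary (hypothetical structure, not EA's own format)."""
    transcript_id: str
    event_depths: list[float]  # read depth for each exonic region or junction


def passes_filter(t, min_prop_detected=0.75, min_mean_coverage=5.0):
    """Return True if the transcript meets both illustrative thresholds.

    The threshold values are assumptions; the abstract does not state them.
    """
    if not t.event_depths:
        return False  # no events means nothing supports the transcript
    # Proportion of the transcript's events with at least one supporting read.
    detected = sum(1 for d in t.event_depths if d > 0)
    prop_detected = detected / len(t.event_depths)
    # Average read support across all events of the transcript.
    mean_coverage = sum(t.event_depths) / len(t.event_depths)
    return prop_detected >= min_prop_detected and mean_coverage >= min_mean_coverage


def filter_transcripts(transcripts, **thresholds):
    """Keep only the transcripts that pass the event-based filter."""
    return [t for t in transcripts if passes_filter(t, **thresholds)]


candidates = [
    TranscriptEvents("t1", [10, 12, 8, 9]),  # all events detected, good support
    TranscriptEvents("t2", [0, 0, 30, 0]),   # only one of four events detected
    TranscriptEvents("t3", [1, 1, 1, 1]),    # all detected, but weak support
]
kept = filter_transcripts(candidates)
```

With these assumed thresholds only `t1` is retained; the reduced transcript set would then be passed to a quantification tool in place of the full reference.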

    Supplemental Material for Newman et al., 2018

    Supplementary figures, tables and files for Newman et al., 2018, "Event Analysis: using transcript events to improve estimates of abundance in RNA-seq data", G3, manuscript# G3/2018/200373R1. Additional File 1 contains Supplementary Figures 1-6 and Supplementary Table 1. Supplementary File 1 contains the results of simulations to compare the Event Analysis approach to STAR (for junction detection) and iReckon (for transcript identification). Supplementary File 2 contains the results of applying Event Analysis to eXpress, and the results of using iReckon to identify possible transcripts in the mouse neural data used in the study. Supplementary File 3 contains the results of comparing Event Analysis (using Bowtie or SOAP2 as the aligner) against STAR for benchmarking junction detection using the mouse neural data. Results are benchmarked against the set of junctions observed in PacBio-sequenced transcripts in the mouse neural data. Supplementary File 4 contains the comparison between Event Analysis and iReckon for the mouse neural data, benchmarked against PacBio-sequenced transcripts.

    Supplemental Material for Newman et al., 2018

    Comparison of Event Analysis with RSEM to Event Analysis with eXpress, and of these approaches to transcript estimation with iReckon.