
    Computational Methods for Sequencing and Analysis of Heterogeneous RNA Populations

    Next-generation sequencing (NGS) and mass spectrometry technologies bring unprecedented throughput, scalability and speed, facilitating the study of biological systems. These technologies make it possible to sequence and analyze heterogeneous RNA populations rather than single sequences. In particular, they provide the opportunity to implement large-scale viral surveillance and transcriptome quantification. However, to fully exploit the capabilities of NGS technology, we need computational methods able to analyze billions of reads for the assembly and characterization of sampled RNA populations. In this work we present novel computational methods for cost- and time-effective analysis of sequencing data from viral and RNA samples. In particular, we describe: i) computational methods for transcriptome reconstruction and quantification; ii) a method for mass spectrometry data analysis; iii) a combinatorial pooling method; and iv) computational methods for the analysis of intra-host viral populations.
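    The abstract does not specify the thesis's quantification algorithm. As an illustration only, a widely used way to quantify transcripts from reads that map ambiguously is to treat each read as compatible with a set of transcripts and estimate relative abundances with the expectation-maximization (EM) algorithm. The sketch below assumes equal transcript lengths and a hypothetical input format (one set of compatible transcripts per read, each set non-empty); it is not the method described in the thesis.

```python
from collections import defaultdict

def em_abundance(read_compat, transcripts, n_iter=200):
    """Estimate relative transcript abundances with EM.

    read_compat: list of non-empty sets; each set holds the transcripts a
        read is compatible with (hypothetical format; equal lengths assumed).
    transcripts: list of transcript identifiers.
    """
    # Start from a uniform abundance estimate.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    n_reads = len(read_compat)
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read among its compatible transcripts in
        # proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: new abundances are the expected fractions of reads.
        theta = {t: counts[t] / n_reads for t in transcripts}
    return theta

# Toy example: 3 reads unique to A, 1 unique to B, 4 shared by A and B.
reads = [{"A"}] * 3 + [{"B"}] + [{"A", "B"}] * 4
theta = em_abundance(reads, ["A", "B"])
```

    With this toy input the estimate converges to 0.75 for A and 0.25 for B: at the fixed point, the shared reads are split in the same 3:1 ratio as the unique evidence.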

    SARS-CoV-2 Wastewater Genomic Surveillance: Approaches, Challenges, and Opportunities

    During the SARS-CoV-2 pandemic, wastewater-based genomic surveillance (WWGS) emerged as an efficient viral surveillance tool: it accounts for asymptomatic cases, can identify known and novel mutations, and makes it possible to assign known virus lineages based on the detected mutation profiles. WWGS can also hint at novel or cryptic lineages, but it is difficult to clearly identify and define novel lineages from wastewater (WW) alone. While WWGS has significant advantages for monitoring the spread of SARS-CoV-2, technical challenges remain, including poor sequencing coverage and quality due to viral RNA degradation. As a result of this degradation, viral RNA in wastewater is present at low concentrations and is often fragmented, which makes sequencing difficult. WWGS analysis requires advanced computational tools that have yet to be developed and benchmarked. The existing bioinformatics tools used to analyze wastewater sequencing data are often based on methods previously developed for quantifying transcript expression or viral diversity. Those methods were not developed specifically for wastewater sequencing data and are not optimized to address the unique challenges associated with wastewater. While specialized tools for the analysis of wastewater sequencing data have also been developed recently, it remains to be seen how they will perform given the ongoing evolution of SARS-CoV-2 and the decline in testing and patient-based genomic surveillance. Here, we discuss opportunities and challenges associated with WWGS, including sample preparation, sequencing technology, and bioinformatics methods.
    Comment: V Munteanu and M Saldana contributed equally to this work. A Smith and S Mangul jointly supervised this work.
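    To illustrate how lineages can be assigned from detected mutation profiles, one common formulation models the observed mutation frequencies in a wastewater sample as a non-negative mixture of lineage-defining mutation profiles ("barcodes") and solves for the mixture weights. The barcode matrix and frequencies below are hypothetical, and the solver uses simple multiplicative updates for non-negative least squares, chosen for brevity rather than because any particular WWGS tool uses it.

```python
def estimate_lineage_abundance(barcodes, freqs, n_iter=2000):
    """Non-negative least-squares estimate of lineage proportions.

    barcodes: rows = mutations, columns = lineages; entry 1 if the mutation
        defines that lineage, else 0 (hypothetical toy matrix).
    freqs: observed frequency of each mutation in the sample.
    Multiplicative updates keep every estimate non-negative as long as all
    inputs are non-negative.
    """
    n_mut, n_lin = len(barcodes), len(barcodes[0])
    # A^T b: support for each lineage from the observed frequencies.
    atb = [sum(barcodes[i][j] * freqs[i] for i in range(n_mut)) for j in range(n_lin)]
    # A^T A: overlap between lineage barcodes.
    ata = [[sum(barcodes[i][j] * barcodes[i][k] for i in range(n_mut))
            for k in range(n_lin)] for j in range(n_lin)]
    x = [1.0 / n_lin] * n_lin
    for _ in range(n_iter):
        atax = [sum(ata[j][k] * x[k] for k in range(n_lin)) for j in range(n_lin)]
        # Update rule: x_j <- x_j * (A^T b)_j / (A^T A x)_j
        x = [x[j] * atb[j] / atax[j] if atax[j] > 0 else 0.0 for j in range(n_lin)]
    total = sum(x)
    # Report relative abundances that sum to 1.
    return [v / total for v in x] if total > 0 else x

# Toy data: 4 mutations x 2 lineages; frequencies generated from a 70/30 mix.
barcodes = [[1, 0], [1, 1], [0, 1], [1, 0]]
freqs = [0.7, 1.0, 0.3, 0.7]
abundance = estimate_lineage_abundance(barcodes, freqs)
```

    Because these toy frequencies were generated from an exact 70/30 mixture, the estimate recovers proportions of 0.7 and 0.3. Real wastewater data add noise, incomplete coverage and unknown lineages, which is exactly where the challenges described above arise.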

    Exploring viral infection using single-cell sequencing.

    Single-cell sequencing (SCS) has emerged as a valuable tool for studying cellular heterogeneity in diverse fields, including virology. By studying the viral and cellular genomes and/or transcriptomes, the dynamics of viral infection can be investigated at the single-cell level. Most studies have explored the impact of cell-to-cell variation on the viral life cycle from the point of view of the virus, by analyzing viral sequences, and from the point of view of the cell, mainly by analyzing the host transcriptome. In this review, we focus on recent studies that use single-cell sequencing to explore viral diversity and cell variability in response to viral replication.
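    A common first step when analyzing viral and host transcripts in the same cells is to compute, for each cell, the fraction of transcripts that map to the viral genome, which helps separate infected cells from bystanders. The sketch below assumes a hypothetical per-cell table of host and viral UMI counts and an arbitrary illustrative threshold; in practice the cutoff has to be chosen from the data.

```python
def classify_infection(cells, threshold=0.01):
    """Label cells as infected or bystander by their viral transcript fraction.

    cells: dict mapping cell barcode -> (host_umis, viral_umis)
        (hypothetical input format).
    threshold: illustrative cutoff on the viral fraction (an assumption).
    Returns dict barcode -> (label, viral_fraction).
    """
    labels = {}
    for barcode, (host, viral) in cells.items():
        total = host + viral
        frac = viral / total if total else 0.0  # empty cells get fraction 0
        label = "infected" if frac >= threshold else "bystander"
        labels[barcode] = (label, frac)
    return labels

# Hypothetical counts for two cells.
labels = classify_infection({"AAAC": (9000, 1000), "TTTG": (10000, 0)})
```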

    CRISPR Screens in Synthetic Lethality and Combinatorial Therapies for Cancer

    Cancer is a complex disease resulting from the accumulation of genetic dysfunctions. Tumor heterogeneity produces molecular diversity that leads to divergent responses to chemotherapy, contributing to the recurrent problem of cancer relapse. For many decades, efforts have focused on identifying essential tumor genes and cancer driver mutations. More recently, prompted by the clinical success of synthetic lethality (SL)-based therapy with PARP inhibitors in homologous recombination-deficient tumors, researchers have focused on SL interactions (SLI). Large-scale forward genetic CRISPR screens are currently the state of the art for finding new genetic interactions. CRISPR technology has rapidly become a common tool in most laboratories, as the tools needed to implement CRISPR screen protocols are available to all researchers. Taking advantage of SLI, combinatorial therapies have become a leading strategy to treat cancer with lower toxicity and therefore better efficacy. This review explores the CRISPR screen methodology, integrates up-to-date published findings on CRISPR screens in the cancer field, and proposes future directions for uncovering cancer regulation and individual responses to chemotherapy.
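    In combinatorial (dual-knockout) CRISPR screens, a synthetic lethal interaction is often scored by comparing the observed fitness effect of knocking out two genes together with the effect expected if the single knockouts acted independently. The sketch below assumes normalized guide counts before and after selection and an additive model on the log2 scale, which is one common convention among several; the counts are hypothetical.

```python
import math

def log2_fold_change(before, after, pseudocount=0.5):
    """Log2 fold change of normalized guide abundance after selection."""
    return math.log2((after + pseudocount) / (before + pseudocount))

def interaction_score(lfc_a, lfc_b, lfc_ab):
    """Observed double-knockout effect minus the additive expectation.

    Strongly negative scores mean that losing both genes is more
    deleterious than expected, the signature of synthetic lethality.
    """
    return lfc_ab - (lfc_a + lfc_b)

# Hypothetical normalized counts (before, after) for single and double knockouts.
lfc_a = log2_fold_change(1000, 900)
lfc_b = log2_fold_change(1000, 800)
lfc_ab = log2_fold_change(1000, 50)
score = interaction_score(lfc_a, lfc_b, lfc_ab)
```

    Here each single knockout is only mildly depleted, while the double knockout is depleted far more than the sum of the single effects, so the score is strongly negative.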

    Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens

    Genetic screens help infer gene function in mammalian cells, but it has remained difficult to assay complex phenotypes, such as transcriptional profiles, at scale. Here, we develop Perturb-seq, combining single-cell RNA sequencing (RNA-seq) and clustered regularly interspaced short palindromic repeats (CRISPR)-based perturbations to perform many such assays in a pool. We demonstrate Perturb-seq by analyzing 200,000 cells, including immune cells and cell lines, focusing on transcription factors regulating the response of dendritic cells to lipopolysaccharide (LPS). Perturb-seq accurately identifies individual gene targets, gene signatures, and cell states affected by individual perturbations and their genetic interactions. We posit new functions for regulators of differentiation, the anti-viral response, and mitochondrial function during immune activation. By decomposing many high-content measurements into the effects of perturbations, their interactions, and diverse cell metadata, Perturb-seq dramatically increases the scope of pooled genomic assays.
    Keywords: single-cell RNA-seq; pooled screen; CRISPR; epistasis; genetic interaction
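    The decomposition of measurements into perturbation effects and their interactions can be illustrated with a linear model. As a simplified sketch, and not the authors' exact model, the expression of one gene is regressed on binary perturbation indicators plus an interaction term with a small ridge penalty. The design and data are hypothetical, and the normal equations are solved with plain Gauss-Jordan elimination to keep the example dependency-free.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_perturbation_effects(design, expr, lam=1e-3):
    """Ridge regression of one gene's expression on perturbation indicators.

    design: per-cell rows such as [1, guide_A, guide_B, guide_A * guide_B]
        (intercept, two hypothetical perturbations, their interaction).
    expr: that gene's normalized expression in each cell.
    Solves (X^T X + lam * I) beta = X^T y; for simplicity the penalty is
    applied to every coefficient, including the intercept.
    """
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, expr)) for i in range(p)]
    return solve(xtx, xty)
```

    In this toy formulation, the coefficient on the product term plays the role of a genetic interaction: it captures how much the combined perturbation departs from the sum of the two individual effects.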

    Elucidating the cellular dynamics of the brain with single-cell RNA sequencing

    Single-cell RNA sequencing (scRNA-seq) has emerged in recent years as a breakthrough technology for understanding RNA metabolism at cellular resolution. In addition to identifying new cell types and states, scRNA-seq can uncover cell type-specific differential gene expression, pre-mRNA processing events, gene regulatory networks and single-cell developmental trajectories. More recently, a new wave of multi-omic adaptations and complementary spatial transcriptomics workflows has been developed that collect even more holistic information from individual cells. These developments have unprecedented potential to provide new insights into basic neural cell dynamics and the molecular mechanisms relevant to the nervous system in both health and disease. In this review we discuss the maturation of single-cell RNA sequencing over the past decade and review the different adaptations of the technology that can now be applied at different scales and for different purposes. We conclude by highlighting how these methods have already led to many exciting discoveries across neuroscience that have furthered our cellular understanding of neurological disease.
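    Before cell types, differential expression or trajectories can be inferred, most scRNA-seq workflows correct each cell's counts for sequencing depth and log-transform them. The sketch below shows the common counts-per-10,000 plus log1p convention; the scale factor and toy counts are illustrative assumptions, and many alternative normalizations exist.

```python
import math

def normalize_log1p(counts, scale=10_000):
    """Depth-normalize a cells x genes count matrix and apply log1p.

    counts: list of per-cell lists of raw UMI counts (toy data).
    Each cell is rescaled to `scale` total counts, then transformed with
    log(1 + x) to compress the range of highly expressed genes.
    """
    out = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            out.append([0.0] * len(cell))  # empty cell: nothing to normalize
            continue
        out.append([math.log1p(c / total * scale) for c in cell])
    return out
```

    Two cells with the same relative composition but different sequencing depth, such as `[1, 1]` and `[2, 2]`, end up with identical normalized values.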

    Computational methods to analyze molecular determinants behind phenotypes

    A phenotype is the collection of an organism's observable features, which can be characterized both at the individual level and at the single-cell level. Phenotypes are largely determined by underlying molecular processes, which also explain their inheritance and plasticity. Part of the molecular background of phenotypes can be characterized through inherited genetic variation and alterations in gene expression. High-throughput technologies enable the measurement of molecular determinants in cells. However, these technologies produce remarkably large data sets, and research questions have become increasingly complex. Thus, computational methods are needed to discover the molecular mechanisms behind phenotypes. In many cases, the analysis of molecular determinants that contribute to a phenotype proceeds by first identifying putative candidates using a priori information and high-throughput measurements. Further analysis can then focus on the most promising molecules. Often, the aim is to identify relevant markers or targets from a set of candidate molecules. Biomedical studies frequently produce a long list of candidate genes, and interpreting these candidates requires information on their context in cellular functions. This context information can give insight into the synergistic effects of the molecular machinery in cells when the functions of individual molecules do not explain the observed phenotype. In addition, the context information can be used to generate candidates. One of the methods in this thesis is a computational data integration method that links candidate genes from molecular pathways to genetic variants. It uses publicly available biological knowledge bases to systematically build the functional context of candidate genes. This approach is especially important when studying cancer, which depends on complex molecular signaling.
Genotypes associated with inherited disease predispositions have been studied successfully in the past; however, traditional methods are applicable only in a limited range of analysis settings. Thus, this thesis introduces a method that uses haplotype sharing to identify genetic loci inherited by multiple distantly related individuals. It is flexible, can be used in various settings, and produces reliable results even with a very limited number of samples. Increasing the number of biological replicates in gene expression analysis increases the reliability of the results. In many cases, however, the number of samples is limited. Therefore, pooling gene expression data from multiple published studies can improve understanding of the molecular background of cell types. This thesis demonstrates this with an analysis that identifies gene expression differences between two cell types using publicly available gene expression samples from previous studies. Finally, once candidate molecules that characterize a phenotype are available, they can be compiled into biomarkers. In many cases, a combination of multiple molecules serves as a better biomarker than any single molecule. This thesis also includes a machine learning approach used to build a classifier that predicts the phenotype from gene expression measurements.
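    The thesis's haplotype-sharing method is not detailed in the abstract. As a simplified illustration of the underlying idea only, the sketch below takes one phased haplotype per individual (hypothetical 0/1 allele strings) and, starting from a focal marker, finds the longest run of consecutive markers on which all individuals carry identical alleles, a crude proxy for a segment inherited from a shared ancestor. A practical method would also need to consider both haplotypes of each individual and assess whether a shared segment is longer than expected by chance.

```python
def shared_segment(haplotypes, focal):
    """Return (start, end) of the longest segment around `focal` on which
    every haplotype carries the same allele; `end` is exclusive.

    haplotypes: equal-length allele strings such as "0110..." (toy data).
    Returns None if the haplotypes already differ at the focal marker.
    """
    def identical(i):
        # True if all individuals share the allele at marker i.
        return len({h[i] for h in haplotypes}) == 1

    if not identical(focal):
        return None
    start, end = focal, focal + 1
    # Extend to the left while all haplotypes still agree.
    while start > 0 and identical(start - 1):
        start -= 1
    # Extend to the right while all haplotypes still agree.
    while end < len(haplotypes[0]) and identical(end):
        end += 1
    return start, end

# Hypothetical haplotypes from three distantly related individuals.
haps = ["0011010", "1011011", "0011110"]
segment = shared_segment(haps, focal=2)
```

    In this toy input the three haplotypes agree on markers 1 to 3, so the function returns `(1, 4)`.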

    Modeling the Transcriptional Landscape of in vitro Neuronal Differentiation and ALS Disease

    The spinal cord is a complex structure responsible for processing sensory inputs and motor outputs. Accordingly, its cells follow a highly ordered developmental and spatial organization. Diseases affecting the spinal cord, such as amyotrophic lateral sclerosis (ALS), disrupt normal cellular function and intercellular interactions, culminating in neurodegeneration. Deciphering disease mechanisms requires a fundamental understanding of both the normal development of cells within the spinal cord and the homeostatic environment that allows for proper function. Biological processes such as cellular differentiation, maturation, and disease progression proceed in an asynchronous and cell type-specific manner. Until recently, bulk measurements of mixed cell populations have been key to understanding these events. However, bulk measurements can obscure the molecular mechanisms involved in branched or coinciding processes, such as differential transcriptional responses between subpopulations of cells. Measurements in individual cells have largely been restricted to four-color immunofluorescence assays, which provide a solid but limited view of molecular-level changes. Recent developments in single-cell RNA sequencing (scRNA-Seq) have made it possible to accurately profile the RNA expression levels of thousands of genes simultaneously in an individual cell. This increased experimental precision makes it possible to explore pathways that are differentially activated in subpopulations of cells and to determine the transcriptional programs that underlie complex biological processes. In this dissertation, I first review the key features of scRNA-Seq and downstream analysis. I then discuss two applications of scRNA-Seq: 1) the in vitro differentiation of mouse embryonic stem cells into motor neurons, and 2) the effect of expressing the ALS-associated mutant gene SOD1G93A on cultured motor neurons in a cellular model of ALS.
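    A basic building block for finding genes that are differentially expressed between cell subpopulations in scRNA-seq data is a per-gene rank-based test such as the Wilcoxon rank-sum (Mann-Whitney U) test. The dissertation's own pipeline is not specified in the abstract; the sketch below computes U with average ranks for ties and a two-sided normal-approximation p-value without tie correction, and the expression values are hypothetical.

```python
import math

def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    x, y: expression values of one gene in two cell subpopulations.
    Returns (U statistic for x, approximate two-sided p-value).
    """
    # Pool values, remembering which group (0 = x, 1 = y) each came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Give every member of a block of tied values the average rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p

# Hypothetical expression of one gene in two subpopulations.
u, p = mann_whitney([5, 6, 7, 8], [1, 2, 3, 4])
```

    In a real analysis this test would be run for every gene, and the resulting p-values corrected for multiple testing.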

    Integrating -Omics For Studying Functional Role Of Ulcerative Colitis Risk Associated Loci.

    Background: Ulcerative colitis (UC) is a chronic inflammatory condition of unknown etiology. Genome-wide association studies (GWAS) have successfully identified a large number of UC risk-associated loci, the majority of which are located in non-protein-coding DNA regions and have been shown to be enriched within regulatory elements, such as enhancers. However, the function of these UC risk-associated variants is still unknown. Aim: To delineate the functional role of GWAS risk-associated loci in UC-relevant cell types. Method: We assessed chromatin activity (ATAC-seq) and transcriptional behavior (RNA-seq) of primary cell types extracted from intestinal biopsies and blood from diseased and healthy participants. Next, to pinpoint the mechanisms by which UC-associated loci contribute to disease risk, we intersected our disease- and cell type-specific differential expression and differential chromatin accessibility data with a GWAS dataset. Results: Unfortunately, for technical and financial reasons we failed to reach the target sequencing depth for both the ATAC-seq and RNA-seq experiments. In addition, combined with very low participant numbers, this meant that our datasets were not strong enough to reliably identify the functional role of GWAS variants. However, as an exercise, we proceeded with simplified proximity-based modeling and showed that intersecting the three -omics datasets allowed us to identify 10 regions where the SNP with the lowest association p-value was in proximity to both a differentially expressed gene and a differentially accessible chromatin region. Conclusion: We were able, for the first time, to compare expression levels and chromatin accessibility in purified immune cell populations from intestinal tissue and peripheral blood. 
Unfortunately, because of poor experimental design, this study was markedly underpowered, and any findings from the RNA-seq and ATAC-seq experiments should be further validated before biological conclusions are drawn or they are used to reliably predict the functional role of UC-associated risk variants.
MedImmune and the Cambridge NIHR BRC PhD studentship
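    The proximity-based intersection described in the Results can be sketched as follows: a GWAS lead SNP is kept only if both a differentially expressed gene and a differentially accessible chromatin region lie within a fixed distance of it. The abstract does not state the distance criterion, so the window size, coordinates and identifiers below are hypothetical.

```python
def intersect_by_proximity(lead_snps, de_genes, da_peaks, window=100_000):
    """Find lead SNPs near both a DE gene and a differentially accessible peak.

    lead_snps: list of (snp_id, chrom, pos)
    de_genes:  list of (gene_id, chrom, start, end)
    da_peaks:  list of (peak_id, chrom, start, end)
    window: maximum SNP-to-interval distance in bp (illustrative value).
    """
    def distance(pos, start, end):
        # Zero if the SNP falls inside the interval, else gap to the nearest edge.
        if start <= pos <= end:
            return 0
        return min(abs(pos - start), abs(pos - end))

    hits = []
    for snp_id, chrom, pos in lead_snps:
        genes = [g for g, c, s, e in de_genes
                 if c == chrom and distance(pos, s, e) <= window]
        peaks = [p for p, c, s, e in da_peaks
                 if c == chrom and distance(pos, s, e) <= window]
        if genes and peaks:
            hits.append((snp_id, genes, peaks))
    return hits

# Hypothetical toy inputs.
snps = [("snp1", "chr1", 1_000_000), ("snp2", "chr2", 500_000)]
genes = [("GENE1", "chr1", 1_050_000, 1_060_000)]
peaks = [("peak1", "chr1", 990_000, 991_000), ("peak2", "chr2", 100_000, 101_000)]
hits = intersect_by_proximity(snps, genes, peaks)
```

    Only `snp1` qualifies here, because `snp2` has no differentially expressed gene on its chromosome. As the abstract cautions, such proximity matches are a simplistic heuristic and do not by themselves establish a functional link.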