5 research outputs found

    Computational framework for systematic and scalable analysis of deep sequencing transcriptomics data

    High-throughput technologies have had a profound impact on transcriptomics. Prior to microarrays, measuring gene expression was not possible in a massively parallel way. More recently, deep RNA sequencing (RNA-Seq) has been steadily gaining ground on microarrays in transcriptomics analysis. RNA-Seq promises several advantages over microarray technologies, but it also comes with its own set of challenges. Different approaches exist to tackle each of the processing steps that RNA-Seq data require. The proposed solutions need to be carefully evaluated to find the best methods for the particularities of each dataset and the specific research questions being addressed. In this thesis I propose a computational framework that allows the efficient analysis of RNA-Seq datasets. The parallelization of tasks and the organization of the data files were handled by the Anduril framework, on which the workflow was implemented. Particular emphasis was placed on the quality control of the RNA-Seq files. Several measures were taken to prune the data of low-quality bases and reads that hamper the alignment step. Furthermore, various existing processing algorithms for transcript assembly and abundance estimation were tested. The best methods have been coupled together into an automated pipeline that takes the raw reads and delivers expression matrices at isoform and gene level. Additionally, a module is included for obtaining sets of differentially expressed genes under different conditions or across a time course.

    Transcriptomics analysis and its applications in cancer

    Cancer is a collection of diseases that combined are one of the leading causes of death worldwide. Although great strides have been made in finding cures for certain cancers, the heterogeneity caused by both the tissue in which cancer originates and the mutations acquired in the cell’s DNA results in unsuccessful treatments for some patients. The genetic alterations caused by carcinogens or by random mutations acquired during normal cell division promote changes in the cell’s metabolism. These changes are usually reflected in abnormal gene expression, which can be studied to understand the underlying mechanisms giving rise to cancer and to suggest treatments that exploit each tumor’s specific vulnerabilities. RNA-Seq is a technology that allows the identification and quantification of the genes being expressed inside the cell at a given moment. RNA-Seq has several characteristics that enable a range of applications. For example, apart from quantifying gene expression, it can be used to detect different variants of the same gene, has base-pair resolution that is informative of the gene sequence, and can also be used to quantify other RNA molecules besides messenger RNA (mRNA), such as microRNAs. The two main aims of this work are to provide computational methods for the analysis of RNA-Seq data and to show specific applications of RNA-Seq that can shed light on cancer mechanisms. In Publications I and IV we developed the Sequence Processing Integration and Analysis (SePIA) and the Fusion Gene Integration (FUNGI) toolsets, which facilitate the creation of reproducible pipelines for investigating different aspects of the cancer transcriptome. SePIA’s utility is showcased with the analysis of datasets from two public data repositories. The first analysis is a standard RNA-Seq analysis, while the second produced a pipeline for mRNA-microRNA integration.
The second toolset, FUNGI, is aimed specifically at finding reliable gene fusions with oncogenic potential. To demonstrate FUNGI’s features, we analyzed 107 in-house samples and processed over 400 samples from a public data repository. FUNGI allowed us to detect fusions in ovarian cancer with a higher prevalence than previously recognized. Additionally, we identified a fusion gene that has not been reported before in ovarian cancer but that can be targeted with a drug currently in clinical trials. In Publication II we investigated the role of alternative splicing in diffuse large B-cell lymphoma and were able to show that isoform-level expression is better than gene-level expression at discriminating between subtypes. Additionally, specific isoforms, such as APH1A, KCNH6, and ABCB1, were correlated with survival. In Publication III, we used RNA-Seq to complement the phasing of genetic variants with somatic mutations in tumor suppressor genes. In this study we found enrichment of haplotype combinations, which suggests that haploinsufficiency of tumor suppressor genes is enriched in cancer patients. SePIA and FUNGI are tools that can be used by the community to explore their datasets and contribute to the acquisition of knowledge in the field of cancer genetics with next generation sequencing. The applications of RNA-Seq included in this dissertation showed that RNA-Seq can be effectively used to aid in the classification of cancer subtypes, and that RNA-Seq can be used in combination with DNA sequencing to explore gene expression mediated by genetic variation in cancer.
Cancer is a collection of diseases that together are one of the leading causes of death worldwide. Although great strides have been made in treating many cancers, treatments fail for some patients because the tissue in which the tumor originates and the changes that accumulate in the cell's DNA cause large differences both between tumors and within them. Changes in the cell's genome, whether caused by carcinogens or arising by chance during normal cell division, alter the cell's metabolism. These changes are usually reflected in abnormal gene expression, the study of which can reveal the mechanisms that contribute to the development of cancer and suggest treatments that target the specific vulnerabilities of an individual tumor. The main aims of this dissertation were to develop computational methods for analyzing gene expression and to demonstrate in practice how they yield additional insight into the mechanisms by which cancer arises. To this end, we developed two toolsets that facilitate the creation of reliably reproducible workflows for studying the cancer transcriptome, that is, all of its RNA molecules. With these tools we were able to standardize the analysis of transcriptomic data and to identify anomalies in gene sequences or changes in gene expression that are relevant to cancer progression or associated with different cancer subtypes.

    SePIA : RNA and small RNA sequence processing, integration, and analysis

    Background: Large-scale sequencing experiments are complex and require a wide spectrum of computational tools to extract and interpret relevant biological information. This is especially true in projects where individual processing and integrated analysis of both small RNA and complementary RNA data is needed. Such studies would benefit from a computational workflow that is easy to implement and standardizes the processing and analysis of both sequenced data types. Results: We developed SePIA (Sequence Processing, Integration, and Analysis), a comprehensive small RNA and RNA workflow. It provides ready execution for over 20 commonly known RNA-seq tools on top of an established workflow engine and provides dynamic pipeline architecture to manage, individually analyze, and integrate both small RNA and RNA data. Implementation with Docker makes SePIA portable and easy to run. We demonstrate the workflow's extensive utility with two case studies involving three breast cancer datasets. SePIA is straightforward to configure and organizes results into a perusable HTML report. Furthermore, the underlying pipeline engine supports computational resource management for optimal performance. Conclusion: SePIA is an open-source workflow introducing standardized processing and analysis of RNA and small RNA data. SePIA's modular design enables robust customization to a given experiment while maintaining overall workflow structure. Peer reviewed

    Validation of the RIM Score-COVID in the Spanish SEMI-COVID-19 Registry


    Does admission acetylsalicylic acid uptake in hospitalized COVID-19 patients have a protective role? Data from the Spanish SEMI-COVID-19 Registry
