164 research outputs found

    Transcriptional dissection of pancreatic tumors engrafted in mice.

    BACKGROUND: Engraftment of primary pancreatic ductal adenocarcinomas (PDAC) in mice to generate patient-derived xenograft (PDX) models is a promising platform for biological and therapeutic studies in this disease. However, these models are still incompletely characterized. Here, we measured the impact of the murine tumor environment on the gene expression of the engrafted human tumor cells. METHODS: We analyzed gene expression profiles from 35 new PDX models and compared them with previously published microarray data from 18 PDX models, 53 primary tumors and 41 cell lines from PDAC. The results obtained in the PDAC system were further compared with publicly available microarray data from 42 PDX models, 108 primary tumors and 32 cell lines from hepatocellular carcinoma (HCC). We developed a robust analysis protocol to explore the gene expression space. In addition, we completed the analysis with a functional characterization of PDX models, including whether changes were caused by the murine environment or by serial passaging. RESULTS: Our results showed that PDX models derived from PDAC or HCC were clearly different from the cell lines derived from the same cancer tissues. Indeed, PDAC- and HCC-derived cell lines are indistinguishable from each other based on their gene expression profiles. In contrast, the transcriptomes of PDAC and HCC PDX models can be separated into two different groups that share some partial similarity with their corresponding original primary tumors. Our results point to the lack of human stromal involvement in PDXs as a major factor contributing to their differences from the original primary tumors. The main functional differences between pancreatic PDX models and human PDAC are the lower expression of genes involved in pathways related to extracellular matrix and hemostasis and the up-regulation of cell cycle genes. Importantly, most of these differences are detected in the first passages after tumor engraftment. CONCLUSIONS: Our results suggest that PDX models of PDAC and HCC retain, to some extent, a gene expression memory of the original primary tumors, while this pattern is not detected in conventional cancer cell lines. Expression changes in PDXs are mainly related to pathways reflecting the lack of human infiltrating cells and the adaptation to a new environment. We also provide evidence of the stability of gene expression patterns over subsequent passages, indicating that the adaptation process takes place in the early phases.

    Statistical Analysis of Multivariate Data in Bioinformatics (Mitmemõõtmeliste andmete statistiline analüüs bioinformaatikas)

    The electronic version of the dissertation does not contain the publications. Proteins are among the most important building blocks of an organism. By investigating the abundance of different proteins and the relations between them, it is possible to get information about the current state of the organism. Modern technologies make it possible to collect a large amount of protein-related data in a short period of time. Analyzing these data is quite complicated and has given rise to a new field of science called bioinformatics. The aim of the dissertation is to describe problems and solutions related to the statistical analysis of multivariate data. It is shown how this type of data can be presented as a matrix. An overview of data sources and analysis methods is given, and it is shown how they can be used in practice. The pan-European project PREDECT is described, in which many organizations contribute to the development of better cancer models. An overview is given of the collection of metadata from multiple partners and of the web tools created for initial data analysis. An analysis concerning a novel breast cancer model is described, and tissue slices are compared under different cultivation conditions. A freely available web tool for exploratory data analysis is introduced. The subsequent chapters describe data analysis in various projects. Multiple novel genes with allele-specific expression were found in the human placenta. The molecular mechanisms of atopic dermatitis were examined, more specifically the influence of the protein interferon-gamma on this disease. MicroRNAs were found that can be used as markers for endometriosis, and a classifier was built to differentiate people with endometriosis from healthy people.

    Analyzing Acute Myeloid Leukemia by RNA-sequencing

    Bulk and single-cell RNA sequencing have revolutionized biomedical research and empower researchers to quantify the global gene expression of populations and single cells to further understand the development, manifestation and treatment of diseases like cancer. Acute myeloid leukemia (AML), a cancer of the myeloid line of blood cells, could benefit from these technologies, as relapse and mortality rates remain high despite the extensive research conducted over several decades. This is partly because AML is a heterogeneous disease, differing substantially between patients and hence requiring more fine-grained classifications and specialised treatment strategies, for example by incorporating expression profiles. In addition, single-cell RNA sequencing (scRNA-seq) can resolve genetic and epigenetic subclonal structures within a patient to improve the understanding and treatment of AML. However, improving and adapting RNA-seq technologies is still often necessary to efficiently and reliably obtain expression profiles, especially from small or suboptimally processed samples. To this end, we developed a bulk RNA-seq protocol that copes with the major challenges of limited sample quantities, different sample types, throughput and costs, and subsequently applied this method to further understand the subclonal structures in AML. We were able to characterize a plastic cell state of AML cells that is defined by increased stemness and dormancy and could influence treatment outcome and relapse. For this, we isolated non-dividing AML cells, based on a proliferation-sensitive dye, from patient-derived xenograft (PDX) models of two AML patients. We found that these cells have low levels of cell cycle genes, confirming dormancy, and additionally had expression patterns similar to those of previously described dormant minimal residual disease (MRD) cells in acute lymphoblastic leukemia (ALL). This included high expression levels of cell adhesion molecules, potentially reflecting the persistence of dormant AML and ALL cells in the hematopoietic niche. Lastly, we could show that resting and cycling AML cells can transition between these two states, indicating that dormancy might be a general property of AML cells and might not depend on particular genetic subclones. In a second project, we optimized a single-cell RNA-seq technology. We used a systematic approach to evaluate experimental conditions of SCRB-seq, a powerful and efficient scRNA-seq method. Focussing on reverse transcription, arguably the most important and inefficient reaction, we used a standardized human RNA (UHRR) and systematically tested nine different RT enzymes, several reaction enhancers and primer compositions to increase sensitivity. We found that Maxima H- showed the highest sensitivity and that molecular crowding using polyethylene glycol (PEG) could increase the efficiency of the reaction significantly. Together with several smaller changes in the workflow, primer design and PCR conditions, we developed mcSCRB-seq (molecular crowding SCRB-seq). We verified the 2.5x increase in sensitivity using mES cells in a side-by-side test between SCRB-seq and mcSCRB-seq, and further found mcSCRB-seq to be amongst the most sensitive methods using artificial RNA spike-in molecules (ERCCs). Lastly, since method comparisons between studies suffer from limited accuracy due to batch effects and external factors, we participated in a complex scRNA-seq benchmark study aiming to provide a fair comparison between methods concerning sensitivity, accuracy and applicability for building expression atlases. In contrast to our earlier results, we found that in this particular setting mcSCRB-seq did not perform well, and we identified areas for further improvement. In conclusion, the work described in this thesis contributes not only towards a deeper understanding of the emergence and progression of AML but also towards the development of experimental bulk and single-cell RNA sequencing methods, improving their widespread application to biomedical problems such as leukemia.

    Exploring the Intersection of Multi-Omics and Machine Learning in Cancer Research

    Cancer biology and machine learning represent two seemingly disparate yet intrinsically linked fields of study. Cancer biology, with its complexities at the cellular and molecular levels, brings up a myriad of challenges. Of particular concern are the deviations in cell behaviour and rearrangements of genetic material that fuel the transformation, growth, and spread of cancerous cells. Contemporary studies of cancer biology often utilise wide arrays of genomic data to pinpoint and exploit these abnormalities, with the end goal of translating them into functional therapies. Machine learning allows machines to make predictions based on learnt data without explicit programming. It leverages patterns and inferences from large datasets, making it an invaluable tool in the modern era of large-scale genomics. To this end, this doctoral thesis is underpinned by three themes: the application of machine learning, multi-omics, and cancer biology. It focuses on the application of machine learning algorithms to the tasks of cell annotation in single-cell RNA-seq datasets and drug response prediction in pre-clinical cancer models. In the first study, the author and colleagues developed a pipeline named Ikarus to differentiate between neoplastic and healthy cells within single-cell datasets, a task crucial for understanding the cellular landscape of tumours. Ikarus is designed to construct cancer cell-specific gene signatures from expert-annotated scRNA-seq datasets, score these genes, and distribute the scores to neighbouring cells via network propagation. This method successfully circumvents two common challenges in single-cell annotation: batch effects and unstable clustering. Furthermore, Ikarus utilises a multi-omic approach by incorporating CNVs inferred from scRNA-seq to enhance classification accuracy. The second study investigated how multi-omic analysis could enhance drug response prediction in pre-clinical cancer models. The research suggests that the typical practice of panel sequencing (a deep profiling of select, validated genomic features) is limited in its predictive power. However, incorporating transcriptomic features into the model significantly improves predictive ability across a variety of cancer models and is especially effective for drugs with collateral effects. This implies that the combined use of genomic and transcriptomic data has potential advantages in the pharmacogenomic arena. This dissertation recapitulates the findings of the two aforementioned studies, which were published in the journals Genome Biology and Cancers, respectively. The two studies illustrate the application of machine learning techniques and multi-omic approaches to address conceptually distinct problems within the realm of cancer biology.
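    The score-propagation step described for Ikarus can be illustrated with a minimal sketch. This is not the Ikarus implementation: the function name `propagate_scores`, the damping weight `alpha`, the neighbour-averaging scheme and the toy graph are all assumptions chosen only to show the general idea of smoothing per-cell scores over a neighbour graph.

```python
def propagate_scores(scores, neighbours, alpha=0.5, n_iter=20):
    """Smooth per-cell scores over a neighbour graph (hypothetical sketch).

    scores:     dict mapping cell -> initial signature score
    neighbours: dict mapping cell -> list of neighbouring cells
    alpha:      weight kept on each cell's own initial score (assumed value)
    """
    current = dict(scores)
    for _ in range(n_iter):
        updated = {}
        for cell, initial_score in scores.items():
            nbrs = neighbours.get(cell, [])
            if nbrs:
                nbr_mean = sum(current[n] for n in nbrs) / len(nbrs)
            else:
                # An isolated cell has nothing to borrow from.
                nbr_mean = current[cell]
            updated[cell] = alpha * initial_score + (1 - alpha) * nbr_mean
        current = updated
    return current


# Toy example: cell "c" starts with a low score but sits among high-scoring cells.
initial = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.0}
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": []}
smoothed = propagate_scores(initial, graph)
```

    In this toy graph the scores of "a" and "b" settle near 0.8 and that of "c" near 0.4, while the isolated cell "d" stays at 0. A cell's final score therefore reflects its neighbourhood as well as its own signature score.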

    Analysis and Interpretation of the Variance of Gene Expression Data (Analyse und Interpretation der Varianz von Genexpressionsdaten)

    This dissertation brings together four papers under the heading "Analysis and Interpretation of the Variance of Gene Expression Data". First, the concept of "technical variance" is distinguished from that of "biological variance". In gene expression analysis with microarrays, technical variance refers to the traditionally high measurement error. The reasons for this error, however, appear to be manifold. The effect of cross-hybridization, that is, the nonspecific binding of RNA fragments to the probes of the array, is highly controversial. Some researchers consider this effect to be the principal source of error, while others judge it to be negligible. The first two papers show that cross-hybridization is indeed responsible to a considerable degree for the measurement error in microarray experiments. At the same time, tools for dealing with nonspecific binding are provided in the form of a set of new Chip Definition Files and a guide to the design of new microarrays. Variance that is based on biological differences that are actually present is called biological variance. In the evaluation of a gene expression experiment, an analysis of the dispersion parameters identifies possible marker transcripts that are not found by a conventional mean-based evaluation. Mapping the transcripts to KEGG pathways makes it possible to rule out that these are false positives. In the fourth paper, a similarity analysis is carried out using correlation coefficients. Evaluation with the Kendall correlation yields hypotheses about the functional pathway in the induced defense of plants.
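    For readers unfamiliar with the Kendall correlation named in this abstract, a minimal sketch of the tau-a coefficient follows. It is not taken from the dissertation: the function name `kendall_tau_a` and the toy expression profiles are assumptions, and tied pairs are simply counted as neither concordant nor discordant.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # A pair is concordant when both variables move in the same direction.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical expression profiles of two transcripts across five samples.
transcript_a = [1.2, 2.4, 3.1, 4.8, 5.0]
transcript_b = [0.9, 2.0, 2.9, 3.5, 3.3]
tau = kendall_tau_a(transcript_a, transcript_b)
```

    Of the ten sample pairs in this toy example, nine are concordant and one is discordant, which gives a tau of 0.8. Because the coefficient depends only on rank order, it is less sensitive to outlying expression values than a mean-based comparison.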

    Integrative and Comparative Analysis of Retinoblastoma and Osteosarcoma

    In the last one and a half decades, the widespread adoption of high-throughput methods in molecular biology has led to the generation of vast numbers of datasets that have unraveled the unfathomed complexity of the cell's regulatory mechanisms. The recently published results of the ENCODE project (ENCODE Project Consortium et al., 2012) demonstrated the extent of these in the human genome, and more regulatory mechanisms will certainly be discovered in the future. Already, this complexity within a single cell, without taking into account cell-cell interactions or micro-environmental influences, cannot be grasped by the human mind. However, understanding it is the key to devising treatments adapted to genetic diseases or disorders, among which is cancer. In mathematics, such complex problems are addressed using methods that reduce their complexity, so that they can be modeled in a solvable manner. In biology, this led researchers to develop the concept of systems biology as a means to abstract the complexity of the cell's regulatory network. To date, most of the published studies using high-throughput technologies focus on only one kind of regulatory mechanism and hence cannot be used as such to investigate the interactions between these mechanisms. Moreover, distinguishing causative from confounding factors within such studies is difficult. These were my original motivations to develop analytical and statistical methods that control for the effects of confounding factors and allow the integrative and comparative analysis of different kinds of datasets. In fine, three different tools were developed to achieve this goal. First, "customCDF": a tool to redefine the Chip Definition File (CDF) of Affymetrix GeneChips. It increases the sensitivity of downstream analyses, as these benefit from the constantly evolving human genome reference and annotations. Second, "aSim": a tool to simulate microarray data, which was required to benchmark the developed algorithms. Third, a set of combined statistical methods for the integrative analysis and, for the comparative analysis, a modification of the integrative analysis approach; these were bundled in the "crossChip" R package. The "customCDF" and "aSim" tools were first validated on independent datasets. The developed analytical methods ("crossChip") were first validated on "aSim" simulated data and publicly available datasets and then used to answer two biological questions. First, using two retinoblastoma datasets, the effect of genomic copy number variations on gene expression was investigated. Then, motivated by the fact that retinoblastoma patients have a higher chance of developing osteosarcoma later in life than the average population, datasets of both these tumors were comparatively analyzed to assess the tumors' similarities and differences. Despite a rather limited number of samples within the selected datasets, the developed approaches with their higher sensitivity and sensibility were successful and set the ground for larger-scale analyses. Indeed, the integrative analysis applied to retinoblastoma revealed the high importance of the chromosome 6 gain at a later stage of the disease, indicating that many genes on that chromosome are beneficial to cancerogenesis. Moreover, in comparison to standard microarray analyses, it demonstrated its efficacy at detecting the interplay of regulatory mechanisms: examples of positive and negative compensation of gene expression in lost and gained regions, respectively, as well as examples of antisense transcription and of pseudogene and snRNA regulation, were identified in this dataset. The comparative analysis, on the other hand, revealed the high similarity of the retinoblastoma and osteosarcoma tumors, while at the same time showing that each of them takes advantage of its distinct micro-environment and consequently appears to make use of different signaling pathways: PKC/calmodulin in retinoblastoma and GPCR/RAS in osteosarcoma. The developed tools and statistical methods have demonstrated their validity and utility by giving sensible answers to the two biological questions addressed. Moreover, they generated a large number of interesting hypotheses that need further investigation. And as they are not limited to microarray analysis but can be applied to any data generated by high-throughput methods, they demonstrated the usefulness of "systems biology" approaches to study cancerogenesis.

    Investigating Synovial Sarcoma, X breakpoint proteins in ovarian cancer

    Ovarian cancer (OC) affects around 7500 women in the UK every year, but despite this, there is no effective screening strategy or standard treatment. There is a strong correlation between OC prognosis and the stage at diagnosis. If diagnosed during stage I, OC has a 5-year survival rate of over 90%; however, vague symptoms often lead to late-stage diagnosis and a correspondingly poor survival rate. CA-125 is the "gold standard" clinically used serum biomarker for confirming and monitoring OC in patients but is not always present in early-stage disease. Previous research has identified expression of the cancer testis antigen SSX2 in stage I and II OC at significantly higher levels than CA-125. The primary objective of this study was to evaluate patient tumour samples for SSX2 expression alongside other SSX variants, SSX3 and SSX4, to expand investigation into SSX family members as biomarkers for OC. Levels of SSX variants were evaluated for correlations with OC subtype as well as disease stage to account for the diverse histotypes of OC. SSX2 and SSX3 expression was shown to be increased in early-stage OC when compared to normal adjacent tissue. Metastatic OC was also found to significantly express SSX2, suggesting that SSX3 may be a superior early-stage biomarker candidate. Collectively, these data revealed variations in the level of SSX expression across OC stages and histotypes. Following successful development of in vitro overexpression models of SSX family members, global transcriptomic profiling was used to identify novel transcripts and pathways altered by SSX2A, SSX2B and SSX3. Transcriptional profiling of each overexpression model highlighted that SSX variants modulate unique and common downstream targets and resulting functions. This work suggested that SSX2A, SSX2B and SSX3, alone or in combination, appear to contribute to cancer progression and clinical outcomes. Consequently, in vitro and in vivo studies were performed to assess the impact of each SSX family member on cell proliferation and motility in OC. All SSX variants in this study (SSX2A, SSX2B, SSX3 and SSX4) significantly promoted cell proliferation in an OC cell line. Changes were also seen in key epithelial-to-mesenchymal transition markers, including an increase in the expression of the transcription factor SLUG. Further, novel tumour xenograft models developed in this study highlighted that SSX3 overexpression increased OC proliferation in vivo. Overall, these results highlight the importance of investigating SSX family members in OC, as biomarkers and promoters of cancer development.

    The Role of Long Noncoding RNA SChLAP1 in Prostate Cancer

    Prostate cancer is the most common malignancy in U.S. men, accounting for nearly 30,000 deaths annually. While the majority of prostate cancers are indolent, a subset of patients has aggressive disease. However, the molecular basis for this clinical heterogeneity remains incompletely understood. Long noncoding RNAs (lncRNAs) are an emerging class of regulatory molecules implicated in a diverse range of human malignancies. Here, SChLAP1 is identified as a novel, highly prognostic lncRNA that is expressed in 15-30% of prostate cancers. Functionally, SChLAP1 coordinates cancer cell invasion in vitro and metastatic spread in vivo. Mechanistically, SChLAP1 interacts with and antagonizes the tumor-suppressive SWI/SNF nucleosome-remodeling complex. While deleterious SWI/SNF mutations occur in 20% of all cancers, they are relatively rare in prostate cancer. Within prostate cancer, SWI/SNF mutations are associated with low SChLAP1 expression, suggesting that high SChLAP1 expression may represent a mutation-independent modality of SWI/SNF inhibition. Employing a previously described antagonistic model between SWI/SNF and Polycomb Repressive Complex 2 (PRC2), SChLAP1 is found to enhance PRC2 function in prostate cancer. Additionally, SChLAP1-expressing cells are more sensitive to pharmacologic EZH2 inhibition. Further characterization of SChLAP1 reveals a 250bp region near the 3’-end that mediates its invasive phenotype and coordinates its interaction with SWI/SNF. Additionally, SChLAP1 interacts with BRG1-containing but not BRM-containing SWI/SNF complexes, and knockdown of BRM in SChLAP1-expressing cells exposes a synthetic lethal vulnerability in prostate cancer. Finally, the largest biomarker discovery project to date in prostate cancer identifies SChLAP1 as one of the best genes for predicting metastatic progression. Characterization of SChLAP1 expression by in situ hybridization shows that SChLAP1 expression is enriched in metastatic samples. 
Additionally, SChLAP1 can be detected in patient urine samples and may be useful as a non-invasive biomarker. Lastly, targeting SChLAP1 with antisense oligonucleotides (ASO) suggests that directly targeting SChLAP1 may be an effective therapeutic strategy in prostate cancer. Taken together, this work defines an essential role for SChLAP1 in aggressive prostate cancer, uncovers novel aspects of lncRNA biology, and has broad implications for cancer biology.
    PHD; Molecular & Cellular Path PhD; University of Michigan, Horace H. Rackham School of Graduate Studies; https://deepblue.lib.umich.edu/bitstream/2027.42/137075/1/asahu_1.pd