
    Integrative Bioinformatics of Functional and Genomic Profiles for Cancer Systems Medicine

    Cancer is a leading cause of death worldwide and a major public health burden. Rapid advances in high-throughput techniques have made it possible to molecularly characterize large numbers of patient tumors, and large-scale genomic and functional profiles are now routinely generated. Such datasets hold immense potential to reveal novel cancer driver genes and prognostic biomarkers, and to identify promising targets for drug treatment. However, the ‘big data’ nature of these highly complex datasets requires the concurrent development of computational models and data analysis strategies to mine useful knowledge and unlock the information latent in them. This thesis presents computational and analytical approaches to extract potentially useful information by integrating genomic and functional profiles of cancer cells.
Cancer is a leading cause of death worldwide and a major public health burden. Thanks to advanced technology, we can now study cancer cells at the molecular level and produce vast amounts of data. Such data hold great potential for discovering new cancer-causing genes and identifying promising targets for cancer treatment. However, the ‘big data’ nature of these highly complex datasets also requires the development of computational models and data analysis strategies in order to find usable information that could benefit health care. This dissertation presents computational and analytical ways to find potentially useful information by combining different molecular models of cancer cells, such as their genomic and functional profiles.

    Biases and Blind-Spots in Genome-Wide CRISPR-Cas9 Knockout Screens

    Adaptation of the bacterial CRISPR-Cas9 system to mammalian cells revolutionized the field of functional genomics, enabling genome-scale genetic perturbations to study essential genes, whose loss of function results in a severe fitness defect. There are two types of essential genes in a cell. Core essential genes are absolutely required for growth and proliferation in every cell type. Context-dependent essential genes, on the other hand, become essential only in a particular environmental or genetic context. The concept of context-dependent gene essentiality is particularly important in cancer, since killing cancer cells selectively without harming surrounding healthy tissue remains a major challenge. The toxicity of traditional cancer treatment protocols to normal cells underscores the need for new strategies that can identify and address weaknesses specific to cancer cells. Studies have shown that CRISPR monogenic knockout screens can identify specific processes that cells rely on for growth and proliferation, a crucial step in identifying candidate cancer-specific therapeutic targets. While it is widely accepted that CRISPR screening is both more specific and more sensitive than previously established methods, the limitations of this technology have not been systematically investigated. In this dissertation, through several lines of integrated analysis of CRISPR screen data in cancer cell lines from the Cancer Dependency Map initiative, I will describe several computational approaches demonstrating that CRISPR screens are not saturating. In fact, a typical screen has a ~20% false-negative rate, saturating coverage requires multiple repeats, and false negatives are more prevalent among moderately expressed genes. I will then introduce a solution to the false-negative problem and describe another method that provides a cleaner analysis of the data, rescuing the false negatives observed in these screens.
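The ~20% false-negative rate described above can be read as one minus a screen's recall on genes expected to be essential. The minimal Python sketch below assumes a hypothetical reference set of core-essential genes and a per-gene essentiality score with a hit threshold (higher meaning more essential); it is an illustration, not the dissertation's pipeline. The second helper shows why pooling replicate screens raises recall, the intuition behind needing multiple repeats for saturating coverage.

```python
def false_negative_rate(gene_scores, reference_essentials, threshold):
    """Fraction of reference essentials that the screen fails to call as hits.

    gene_scores: dict gene -> essentiality score (higher = more essential).
    reference_essentials: genes assumed essential in every cell line (hypothetical set).
    threshold: score at or above which a gene counts as a hit.
    """
    scored = [g for g in reference_essentials if g in gene_scores]
    if not scored:
        raise ValueError("no reference genes were scored in this screen")
    missed = [g for g in scored if gene_scores[g] < threshold]
    return len(missed) / len(scored)


def cumulative_recall(replicate_hits, reference_essentials):
    """Recall on the reference set after pooling 1, 2, ..., n replicate screens.

    replicate_hits: list of sets, each holding the hits of one replicate screen.
    Hits are pooled by union, so recall can only stay flat or increase.
    """
    reference = set(reference_essentials)
    seen, curve = set(), []
    for hits in replicate_hits:
        seen |= set(hits) & reference
        curve.append(len(seen) / len(reference))
    return curve
```

For example, with five reference genes of which one scores below the threshold, `false_negative_rate` returns 0.2.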
Moreover, I will show that half of all constitutively expressed genes are never observed as essential in any CRISPR screen. Notably, these never-essentials are highly enriched for paralogs, suggesting that functional redundancy masks the detection of a substantial number of genes. Finally, I will describe our efforts to investigate functional buffering among approximately 400 candidate paralog pairs using CRISPR/enCas12a dual-gene knockout screening technology and discuss the paralog synthetic lethal interactions we identified, which escaped detection in monogenic CRISPR-Cas9 knockout screens. Collectively, these observations reveal significant biases and blind spots in the analysis of CRISPR-based functional genomics approaches and offer new opportunities for the discovery of novel candidate drug targets.
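An enrichment claim such as "never-essentials are highly enriched for paralogs" is commonly backed by a one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test). The stdlib-only Python sketch below assumes the gene counts are already tabulated; the exact test and gene universe used in the dissertation are not stated in the abstract.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric tail P(X >= k).

    N: all expressed genes considered (the gene universe)
    K: paralogs among those N genes
    n: never-essential genes
    k: paralogs among the never-essential genes
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when asked for more items than exist, so
    # impossible configurations contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def fold_enrichment(k, n, K, N):
    """Paralog fraction among never-essentials divided by the background fraction."""
    return (k / n) / (K / N)
```

A fold enrichment above 1 with a small tail probability would support the enrichment; with real gene counts, the choice of universe (here, expressed genes) matters as much as the test itself.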

    Molecular Portraits of Cancer Evolution and Ecology

    Research on the molecular lesions that drive cancers holds the translational promise of unmasking distinct disease subtypes in otherwise pathologically identical patients. Yet clinical adoption is hindered by the reproducibility crisis for cancer biomarkers. In this thesis, a novel metric uncovered transcriptional diversity within individual non-small cell lung cancers, driven by chromosomal instability. Existing prognostic biomarkers were confounded by tumour sampling bias, arising from this diversity, in ~50% of patients assessed. An atlas of consistently expressed genes was derived to address this diagnostic challenge, yielding a clonal biomarker robust to sampling bias. This diagnostic, based on cancer evolutionary principles, maintained prognostic value in a meta-analysis of >900 patients and added prognostic value over known risk factors in stage I disease, motivating further development as a clinical assay. Next, in situ RNA profiles of immune, fibroblast and endothelial cell subsets were generated from cancerous and adjacent non-malignant lung tissue. The phenotypic adaptation of stromal cells in the tumour microenvironment undermined the performance of existing molecular signatures for cell-type enumeration. Transcriptome-wide analysis delineated ~10% of genes displaying cell-type-specific expression, paving the way for high-fidelity signatures for the accurate digital dissection of tumour ecology. Lastly, the impact of branching, Darwinian evolution on the detection of epistatic interactions was evaluated in a pan-cancer analysis. The clonal status of driver genes was associated with the proportion of significant epistatic findings in 44-78% of the cancer types assessed. Integrating the clonal architecture of tumours into future analyses could help decipher evolutionary dependencies. This work provides pragmatic solutions for refining molecular portraits of cancer in light of their evolutionary and ecological features, moving the needle for precision cancer diagnostics.
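One way to operationalize "consistently expressed" genes in multi-region tumour data is to favour genes whose expression varies more between patients than between regions of the same tumour. The Python sketch below scores each gene by the ratio of between-patient variance (of per-patient means) to mean within-patient variance. This score is an illustrative assumption, not necessarily the metric or selection rule used in the thesis.

```python
from statistics import mean, pvariance


def consistency_score(regions_by_patient):
    """Between-patient / within-patient variance ratio for one gene.

    regions_by_patient: dict patient -> list of expression values,
    one per sampled tumour region (at least two regions per patient).
    High values mean the gene separates patients well regardless of
    which region was biopsied, i.e. it is robust to sampling bias.
    """
    within = mean(pvariance(values) for values in regions_by_patient.values())
    between = pvariance([mean(values) for values in regions_by_patient.values()])
    return between / within if within > 0 else float("inf")


def rank_genes(expression):
    """expression: dict gene -> regions_by_patient. Most consistent genes first."""
    return sorted(expression, key=lambda g: consistency_score(expression[g]), reverse=True)
```

In this toy setting, a gene whose regions agree within each patient but differ across patients ranks above one whose regions disagree within patients.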

    Design of new algorithms for gene network reconstruction applied to in silico modeling of biomedical data

    Doctoral Programme in Biotechnology, Engineering and Chemical Technology. Research line: Engineering, Data Science and Bioinformatics. Programme code: DBI. Line code: 111.
The root causes of disease are still poorly understood. The success of current therapies is limited because persistent diseases are frequently treated based on their symptoms rather than their underlying cause. Therefore, biomedical research is experiencing a technology-driven shift towards data-driven, holistic approaches to better characterize the molecular mechanisms causing disease. Using omics data as input, emerging disciplines such as network biology attempt to model the relationships between biomolecules. To this effect, gene co-expression networks have emerged as a promising tool for deciphering the relationships between genes in large transcriptomic datasets. However, because of their low specificity and high false-positive rate, they have a limited capacity to retrieve the disrupted mechanisms that lead to disease onset, progression, and maintenance. Within the context of statistical modeling, we delved deeper into the reconstruction of gene co-expression networks with the specific goal of discovering disease-specific features directly from expression data. Using ensemble techniques, which combine the results of various metrics, we were able to capture biologically significant relationships between genes more precisely. With the help of prior biological knowledge and newly developed network inference techniques, we were able to find potential disease-specific features de novo. Through our different approaches, we analyzed large gene sets across multiple samples and used gene expression as a surrogate marker for the underlying biological processes, reconstructing robust gene co-expression networks that are simple to explore.
By mining disease-specific gene co-expression networks, we propose a useful framework for identifying new omics-phenotype associations from conditional expression datasets. In this sense, understanding diseases from the perspective of biological network perturbations will improve personalized medicine, impacting rational biomarker discovery, patient stratification and drug design, and ultimately leading to more targeted therapies.
Universidad Pablo de Olavide de Sevilla, Departamento de Deporte e Informática (Department of Sports and Computer Science).
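The ensemble idea, combining several association measures so that an edge is kept only when they agree, can be sketched in stdlib Python. The choice of Pearson, Spearman and Kendall measures, the thresholds, and the majority vote below are illustrative assumptions, not the algorithm published in the thesis.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """(concordant - discordant) / number of sample pairs."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        d = (x[i] - x[j]) * (y[i] - y[j])
        s += (d > 0) - (d < 0)
    return s / (n * (n - 1) / 2)


def ensemble_network(expr, thresholds=(0.8, 0.8, 0.6), min_votes=2):
    """expr: dict gene -> expression values over the same samples.

    Each measure votes for an edge when |association| >= its threshold;
    an edge is kept when at least `min_votes` measures agree.
    """
    measures = (pearson, spearman, kendall_tau_a)
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        votes = sum(abs(m(expr[g1], expr[g2])) >= t for m, t in zip(measures, thresholds))
        if votes >= min_votes:
            edges.add((g1, g2))
    return edges
```

Requiring agreement between measures is one simple way to trade some sensitivity for the higher specificity the abstract identifies as lacking in single-metric co-expression networks.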

    Improving biomarker list stability by integration of biological knowledge in the learning process

    BACKGROUND: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for biomarker discovery using microarray data often provide results with limited overlap. It has been suggested that one reason for these inconsistencies may be that in complex diseases, such as cancer, multiple genes belonging to one or more physiological pathways are associated with the outcomes. Thus, a possible approach to improve list stability is to integrate biological information from genomic databases into the learning process; however, a comprehensive assessment based on different types of biological information is still lacking in the literature. In this work we compared the effect of using different types of biological information in the learning process, such as functional annotations, protein-protein interactions and expression correlation among genes. RESULTS: Biological knowledge has been encoded by means of gene similarity matrices, and the expression data have been linearly transformed so that the more similar two features are, the more closely they are mapped. Two semantic similarity matrices, based on Biological Process and Molecular Function Gene Ontology annotations, and geodesic distance applied to protein-protein interaction networks are the best performers in improving list stability while maintaining almost equal prediction accuracy. CONCLUSIONS: The analysis supports the idea that when some features are strongly correlated with each other, for example because they are close in the protein-protein interaction network, they might have similar importance and be equally relevant for the task at hand. The results can be a starting point for additional experiments on combining similarity matrices to obtain even more stable lists of biomarkers. The implementation of the classification algorithm is available at: http://www.math.unipd.it/~dasan/biomarkers.html
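To make "similar features are mapped closely" concrete, the Python sketch below builds a gene similarity matrix from geodesic (shortest-path) distances in a protein-protein interaction graph and applies it as a linear map to one expression profile. The exponential-decay kernel, the row normalization and the function names are assumptions for illustration; the authors' actual transformation is in the implementation linked above.

```python
from collections import deque
from math import exp


def geodesic_distances(edges, genes):
    """Breadth-first shortest-path lengths in an unweighted interaction graph."""
    adj = {g: set() for g in genes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    dist = {}
    for src in genes:
        d = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[src] = d  # unreachable genes are simply absent
    return dist


def similarity_matrix(edges, genes, decay=1.0):
    """S[i][j] = exp(-decay * distance); 0.0 when genes are disconnected."""
    dist = geodesic_distances(edges, genes)
    return [[exp(-decay * dist[a][b]) if b in dist[a] else 0.0 for b in genes]
            for a in genes]


def transform(sample, S):
    """Map an expression profile x to row-normalized S x.

    Each transformed value is a weighted average over network neighbours,
    so genes that are close in the network end up with similar values.
    """
    out = []
    for row in S:
        weight = sum(row)  # >= 1 because the diagonal is exp(0) = 1
        out.append(sum(s * x for s, x in zip(row, sample)) / weight)
    return out
```

In the test below, interacting genes G1 and G2 are pulled toward each other while the isolated gene G3 keeps its original value.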

    Gene Expression Analysis Methods on Microarray Data: A Review

    In recent years, a new type of experiment has been changing the way biologists and other specialists approach many problems. These are called high-throughput experiments, and they differ from those performed some years ago mainly in the quantity of data they produce. Thanks to the technology generically known as microarrays, it is now possible to study the behavior of all the genes of an organism under different conditions in a single experiment. The data generated by these experiments may consist of thousands to millions of variables, and they pose many challenges to the scientists who have to analyze them. Many of these challenges are statistical in nature and are the focus of this review. There are many types of microarrays, developed to answer different biological questions, and some of them will be explained later. For the sake of simplicity, we start with the best-known ones: expression microarrays.
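Testing thousands of genes at once creates a multiple-testing problem, one of the statistical challenges alluded to here. As one standard illustration, not necessarily the method the review emphasises, the Benjamini–Hochberg step-up procedure for controlling the false discovery rate can be sketched in Python. The per-gene p-values are assumed to come from some earlier differential-expression test.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected at false discovery rate `alpha`.

    Sort p-values ascending; find the largest rank r with
    p_(r) <= r / m * alpha; reject the r smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank  # step-up: keep the largest qualifying rank
    return sorted(order[:cutoff])
```

Because the procedure is step-up, a p-value that fails its own threshold can still be rejected if a larger p-value further down the ranking passes; for example, `[0.04, 0.045]` yields both indices.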

    Identification of Potential Drug Targets in Cancer Signaling Pathways Using Stochastic Logical Models

    The investigation of vulnerable components in a signaling pathway can contribute to the development of drug therapies addressing aberrations in that pathway. Here, an original signaling pathway is derived from the published literature on breast cancer models. New stochastic logical models are then developed to analyze the vulnerability of the components in multiple signaling sub-pathways involved in this signaling cascade. The computational results are consistent with experimental results in which the selected proteins were silenced using specific siRNAs and cell viability was analyzed 72 hours after silencing. The genes eIF4E and NFkB are found to have nearly no effect on relative cell viability, whereas JAK2, Stat3, S6K, JUN, FOS, Myc, and Mcl1 are effective candidates for influencing relative cell growth. The vulnerabilities of some targets, such as Myc and S6K, vary significantly depending on the weights of the sub-pathways, indicating that therapy with these targets may require customization. When such targets are used, the response of breast cancers from different patients will be highly variable because of known heterogeneities in signaling pathways among patients. Targets whose vulnerabilities are invariably high might be more universally applicable.
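To make "vulnerability under a stochastic logical model" concrete, here is a Python sketch on a made-up three-node network. Node names are borrowed from the abstract, but the wiring, the bit-flip noise model, the sub-pathway weights and the viability readout are all assumptions for illustration, not the paper's breast cancer model.

```python
import random

# Hypothetical toy wiring; JAK2 and S6K are constant upstream inputs.
RULES = {
    "STAT3": lambda s: s["JAK2"],
    "Myc": lambda s: s["STAT3"] or s["S6K"],
    "Mcl1": lambda s: s["STAT3"],
}


def viability(knockout=None, w_myc=0.6, w_mcl1=0.4, noise=0.05,
              steps=10, runs=4000, seed=0):
    """Fraction of stochastic runs that end in growth.

    Within each step, nodes are updated in a fixed order: the Boolean rule
    is applied, then the result is flipped with probability `noise`.
    A silenced node is pinned to 0. Growth then occurs with probability
    w_myc * Myc + w_mcl1 * Mcl1 (the two sub-pathway weights, summing to <= 1).
    """
    rng = random.Random(seed)
    grown = 0
    for _ in range(runs):
        state = {"JAK2": 1, "S6K": 1, "STAT3": 0, "Myc": 0, "Mcl1": 0}
        if knockout is not None:
            state[knockout] = 0
        for _ in range(steps):
            for node, rule in RULES.items():
                if node == knockout:
                    continue
                value = int(rule(state))
                if rng.random() < noise:
                    value = 1 - value
                state[node] = value
        p_growth = w_myc * state["Myc"] + w_mcl1 * state["Mcl1"]
        grown += rng.random() < p_growth
    return grown / runs


def relative_viability(knockout, **params):
    """Viability after silencing `knockout`, relative to the unperturbed model."""
    return viability(knockout, **params) / viability(None, **params)
```

Re-running `relative_viability("Mcl1", ...)` with different `w_myc` / `w_mcl1` values shows the effect the abstract describes: the same knockout looks weak or strong depending on how much growth flows through each sub-pathway.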

    A Knowledge-based Integrative Modeling Approach for In-Silico Identification of Mechanistic Targets in Neurodegeneration with Focus on Alzheimer’s Disease

    Dementia is the progressive decline in cognitive function due to damage or disease in the body beyond what might be expected from normal aging. Based on neuropathological and clinical criteria, dementia includes a spectrum of diseases, namely Alzheimer's dementia, Parkinson's dementia, Lewy body disease, Alzheimer's dementia with Parkinson's, Pick's disease, semantic dementia, and large and small vessel disease. These disorders are thought to result from a combination of genetic and environmental risk factors. Despite the knowledge accumulated about the pathophysiological and clinical characteristics of the disease, no coherent, integrative picture of the molecular mechanisms underlying neurodegeneration in Alzheimer’s disease is available. Existing drugs offer patients only symptomatic relief and lack efficient disease-modifying effects. The present research proposes a knowledge-based rationale for integrative modeling of disease mechanisms to identify potential candidate targets and biomarkers in Alzheimer’s disease. Integrative disease modeling is an emerging knowledge-based paradigm in translational research that exploits the power of computational methods to collect, store, integrate, model and interpret accumulated disease information across biological scales, from molecules to phenotypes. It prepares the ground for a transition from ‘descriptive’ to ‘mechanistic’ representations of disease processes. The proposed approach was used to introduce an integrative framework that combines, on the one hand, knowledge extracted from the literature using semantically supported text-mining technologies and, on the other hand, primary experimental data such as gene/protein expression or imaging readouts.
The aim of this hybrid integrative modeling approach was not only to provide a consolidated systems view of the disease mechanism as a whole but also to increase the specificity and sensitivity of the mechanistic model by providing disease-specific context. The approach was successfully used to correlate clinical manifestations of the disease with their corresponding molecular events, and it led to the identification and modeling of three important mechanistic components underlying Alzheimer’s dementia: the CNS, the immune system and the endocrine components. These models were validated using a novel in-silico validation method, biomarker-guided pathway analysis. In addition, a pathway-based target identification approach was introduced, which identified the MAPK signaling pathway as a potential candidate target at the crossroads of the three components underlying the disease mechanism in Alzheimer’s dementia.
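The idea of a pathway "at the crossroads" of the CNS, immune and endocrine components can be illustrated as a simple overlap count. In this Python sketch the component and pathway gene sets are placeholders, and the function name is hypothetical; the thesis's pathway-based target identification approach is not specified in the abstract and likely involves more than set overlap.

```python
def crossroad_pathways(components, pathways, min_overlap=1):
    """Rank pathways by how many mechanistic components they touch.

    components: dict component name -> set of genes in that sub-model
    pathways: dict pathway name -> set of member genes
    A pathway touches a component if they share >= min_overlap genes.
    """
    ranked = []
    for pathway, members in pathways.items():
        touched = [c for c, genes in components.items()
                   if len(members & genes) >= min_overlap]
        ranked.append((pathway, touched))
    # Pathways linking the most components come first; ties broken by name.
    return sorted(ranked, key=lambda item: (-len(item[1]), item[0]))
```

A pathway touching all three components would rank first, mirroring the role the abstract assigns to MAPK signaling.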

    Quantitative proteome landscape of the NCI-60 cancer cell lines

    Here we describe a proteomic data resource for the NCI-60 cell lines generated by pressure cycling technology and SWATH mass spectrometry. We developed the DIA-expert software to curate and visualize the SWATH data, leading to reproducible detection of over 3,100 SwissProt proteotypic proteins and systematic quantification of pathway activities. Stoichiometric relationships of interacting proteins for DNA replication, repair, the chromatin remodeling NuRD complex, β-catenin, RNA metabolism, and prefoldins are more evident than at the mRNA level. The data are available in CellMiner (discover.nci.nih.gov/cellminercdb and discover.nci.nih.gov/cellminer), allowing casual users to test hypotheses and perform integrative, cross-database analyses of multi-omic drug response correlations for over 20,000 drugs. We demonstrate the value of proteome data in predicting drug response for over 240 clinically relevant chemotherapeutic and targeted therapies. In summary, we present a novel proteome resource for the NCI-60, together with relevant software tools, and demonstrate the benefit of proteome analyses.
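A basic form of the drug response correlation analysis mentioned above correlates each protein's abundance with a drug's activity across the cell lines measured for both. The Python sketch below does this with Pearson correlation; it does not call CellMiner, and the input layout, the `min_shared` cutoff and the function names are assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def drug_correlates(protein_levels, drug_activity, min_shared=3):
    """Rank proteins by |Pearson r| between abundance and drug activity.

    protein_levels: dict protein -> {cell_line: abundance}
    drug_activity: dict cell_line -> activity value for one drug
    Only cell lines measured for both are used; proteins with fewer than
    `min_shared` shared cell lines are skipped.
    Returns (protein, r, number_of_shared_cell_lines) tuples.
    """
    results = []
    for protein, levels in protein_levels.items():
        shared = sorted(set(levels) & set(drug_activity))
        if len(shared) < min_shared:
            continue
        r = pearson([levels[c] for c in shared], [drug_activity[c] for c in shared])
        results.append((protein, r, len(shared)))
    return sorted(results, key=lambda t: abs(t[1]), reverse=True)
```

Restricting each correlation to shared cell lines avoids silently imputing missing proteomic measurements, at the cost of different sample sizes per protein, which is why the count is returned alongside `r`.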