6 research outputs found

    Data-driven information retrieval in heterogeneous collections of transcriptomics data links SIM2s to malignant pleural mesothelioma

    Motivation: Genome-wide measurement of transcript levels is a ubiquitous tool in biomedical research. As experimental data continues to be deposited in public databases, it is becoming important to develop search engines that enable the retrieval of relevant studies given a query study. While retrieval systems based on meta-data already exist, data-driven approaches that retrieve studies based on similarities in the expression data itself have a greater potential for uncovering novel biological insights.

    Graphical models for biclustering and information retrieval in gene expression data

    The cell coordinates its biological response to the environment partly via the selective synthesis of thousands of unique RNA and protein molecules. Understanding the molecular biology of the cell is thus essential to the advancement of areas such as health care, agriculture, and energy production, but requires the ability to simultaneously acquire information about thousands of molecules in a sample. Recent high-throughput measurement technologies address this concern. While useful, they generate a high volume of data and bring methodological challenges, effectively shifting the bottleneck in molecular biology research from data acquisition to data analysis. In particular, an important challenge is the genome-wide analysis of how RNA is transcribed under different conditions, organisms, and tissues, a process known as gene expression. When developing computational methods for biological data analysis tasks, probabilistic frameworks constitute promising approaches due to their flexibility, soundness, and ability to handle noisy data. In this thesis, the contributions are in the development of probabilistic methods for two relevant tasks in genome-wide gene expression analysis, namely biclustering and information retrieval. Biclustering concerns the simultaneous grouping of objects, e.g., genes, and conditions. The first contribution is the development of a Bayesian extension to an existing biclustering model. The second contribution is a novel probabilistic method that allows deriving a hierarchical organization of microarrays in a gene expression data set and at the same time indicates the genes that characterize the hierarchy. Finally, the third contribution is a general probabilistic biclustering framework that easily lends itself to different data types and model assumptions. Information retrieval in gene expression data is needed because of the increasing amount of available data stored in public databases. Two probabilistic methods for information retrieval are proposed. The models are used in a series of biological case studies that show how the proposed approaches have the potential to accelerate biological research by jointly analyzing data from different studies. In particular, several connections between biological conditions found by the models either correspond to existing biological knowledge or were used in a confirmatory follow-up study to obtain novel biological findings.

    Retrieval of Gene Expression Measurements with Probabilistic Models

    A crucial problem in current biological and medical research is how to utilize the diverse set of existing biological knowledge and heterogeneous measurement data in order to gain insights into new data. As datasets continue to be deposited in public repositories, it is becoming important to develop search engines that can efficiently integrate existing data and search for relevant earlier studies given a new study. The search task is encountered in several biological applications including cancer genomics, pharmacokinetics, personalized medicine and meta-analysis of functional genomics. Most existing search engines rely on classical keyword- or annotation-based retrieval, which is limited to discovering known information and requires careful downstream annotation of the data. Data-driven model-based methods that retrieve studies based on similarities in the actual measurement data have a greater potential for uncovering novel biological insights. In particular, probabilistic modeling provides promising model-based tools due to its ability to encode prior knowledge, represent uncertainty in model parameters and handle noise associated with the data. By introducing latent variables it is further possible to capture relationships in data features in the form of meaningful biological components underlying the data. This thesis adapts existing and develops new probabilistic models for retrieval of relevant measurement data in three different cases of background repositories. The first case is a background collection of data samples where each sample is represented by a single data type. The second case is a collection of multimodal data samples where each sample is represented by more than one data type. The third case is a background collection of datasets where each dataset, in turn, is a collection of multiple samples. In all three setups the proposed models are evaluated quantitatively, and case studies demonstrate that the models facilitate interpretable retrieval of relevant data, rigorous integration of diverse information sources and learning of latent components from partly related dataset collections.

    Regulation of Mammary cell Differentiation and Metabolism by Singleminded-2s

    Ductal carcinoma in situ (DCIS) has been shown to be a precursor to invasive ductal cancer (IDC). Though the progression of DCIS to IDC is believed to be an important aspect of tumor aggressiveness, prognosis and molecular markers that predict progression are poorly understood. Therefore, determining the mechanisms by which some DCIS progress is critical for future breast cancer diagnostics and treatment. Singleminded-2s (SIM2s) is a member of the bHLH/PAS family of transcription factors and a key regulator of differentiation. SIM2s is highly expressed in mammary epithelial cells and lost in breast cancer. Loss of Sim2s causes aberrant mouse mammary development with features suggestive of malignant transformation, whereas over-expression of Sim2s promotes precocious alveolar differentiation, suggesting that Sim2s is required for establishing and enhancing mammary gland differentiation. We hypothesize that SIM2s expression must be lost in premalignant lesions for breast cancer to develop. We first analyzed Sim2s in the involuting mammary gland, which is a highly tumor-promoting environment. Sim2s is down-regulated during involution, and forced expression delays involution. We then analyzed SIM2s expression in human breast cancer samples and found that SIM2s is lost with progression from DCIS to IDC, and this loss correlates with metastasis. SIM2s expression in DCIS promoted a differentiated phenotype and suppressed genes associated with de-differentiation. Furthermore, loss of SIM2s expression in DCIS xenografts increased metastasis, likely due to an increase in hedgehog signaling and matrix metalloproteinase expression. Interestingly, we found metabolic shifts with gain and loss of SIM2s not only in DCIS cells, but also in MCF7 and SUM159 cells. SIM2s expression decreased aerobic glycolysis and promoted oxidative phosphorylation through direct upregulation of CDKN1a and senescence. Loss of SIM2s, conversely, promotes mitochondrial dysfunction and induction of the Warburg effect. This is the first time CDKN1a and cellular senescence have been implicated as causes of metabolic shifts within cancer cells. These studies show a new role for SIM2s in metabolic homeostasis, and this regulation is lost during tumorigenesis. These data indicate SIM2s is at the apex where aging, metabolism, and disease meet, regulating the delicate relationship between the three.

    Antioxidant and DPPH-Scavenging Activities of Compounds and Ethanolic Extract of the Leaf and Twigs of Caesalpinia bonduc L. Roxb.

    Antioxidant effects of ethanolic extract of Caesalpinia bonduc and its isolated bioactive compounds were evaluated in vitro. The compounds included two new cassane diterpenes, 1α,7α-diacetoxy-5α,6β-dihydroxyl-cass-14(15)-epoxy-16,12-olide (1) and 12α-ethoxyl-1α,14β-diacetoxy-2α,5α-dihydroxyl cass-13(15)-en-16,12-olide (2); and others, bonducellin (3), 7,4’-dihydroxy-3,11-dehydrohomoisoflavanone (4), daucosterol (5), luteolin (6), quercetin-3-methyl ether (7) and kaempferol-3-O-α-L-rhamnopyranosyl-(1→2)-β-D-xylopyranoside (8). The antioxidant properties of the extract and compounds were assessed by the measurement of the total phenolic content, ascorbic acid content, total antioxidant capacity and 1,1-diphenyl-2-picrylhydrazyl (DPPH) and hydrogen peroxide radical scavenging activities. Compounds 3, 6, 7 and the ethanolic extract had DPPH scavenging activities with IC50 values of 186, 75, 17 and 102 μg/ml respectively, compared with 15 μg/ml for vitamin C. On the other hand, no significant results were obtained for the hydrogen peroxide radical. In addition, compound 7 had the highest phenolic content of 0.81±0.01 mg/ml of gallic acid equivalent, while compound 8 showed the highest total antioxidant capacity with 254.31±3.54 and 199.82±2.78 μg/ml gallic and ascorbic acid equivalent respectively. Compound 4 and the ethanolic extract showed a high ascorbic acid content of 2.26±0.01 and 6.78±0.03 mg/ml respectively. The results demonstrate the antioxidant activity of the ethanolic extract of C. bonduc and suggest that this activity is mediated by its isolated bioactive compounds.