
    Interpretable Categorization of Heterogeneous Time Series Data

    Understanding heterogeneous multivariate time series data is important in many applications ranging from smart homes to aviation. Learning models of heterogeneous multivariate time series that are also human-interpretable is challenging and not adequately addressed by the existing literature. We propose grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs extend decision trees with a grammar framework. Logical expressions derived from a context-free grammar are used for branching in place of simple thresholds on attributes. The added expressivity enables support for a wide range of data types while retaining the interpretability of decision trees. In particular, when a grammar based on temporal logic is used, we show that GBDTs can be used for the interpretable classification of high-dimensional and heterogeneous time series data. Furthermore, we show how GBDTs can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply GBDTs to analyze the classic Australian Sign Language dataset as well as data on near mid-air collisions (NMACs). The NMAC data comes from aircraft simulations used in the development of the next-generation Airborne Collision Avoidance System (ACAS X). Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data Mining (SDM) 201
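The branching idea in this abstract, a logical expression over a whole time series in place of a threshold on one attribute, can be illustrated with a minimal sketch. The operators `always`, `eventually`, and `less_than`, the `split` helper, and the toy records are all hypothetical names and data chosen for illustration; they are not the grammar or the learning algorithm from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Series = Sequence[float]
Record = Tuple[Series, str]          # (time series, class label)
Expr = Callable[[Series], bool]      # a logical expression over a whole series


def less_than(c: float) -> Callable[[float], bool]:
    """Atomic predicate on a single sample (hypothetical grammar terminal)."""
    return lambda v: v < c


def always(pred: Callable[[float], bool]) -> Expr:
    """Temporal operator: the predicate holds at every time step."""
    return lambda s: all(pred(v) for v in s)


def eventually(pred: Callable[[float], bool]) -> Expr:
    """Temporal operator: the predicate holds at some time step."""
    return lambda s: any(pred(v) for v in s)


def split(records: List[Record], expr: Expr) -> Tuple[List[Record], List[Record]]:
    """Partition records by whether the expression holds, as a tree node would."""
    true_branch = [r for r in records if expr(r[0])]
    false_branch = [r for r in records if not expr(r[0])]
    return true_branch, false_branch


# Made-up toy data: two short series with labels.
records = [([1.0, 2.0, 1.5], "safe"), ([1.0, 9.0, 2.0], "alert")]
rule = always(less_than(5.0))        # reads as "the value is always below 5"
yes, no = split(records, rule)
print([label for _, label in yes], [label for _, label in no])
```

Because the rule is a composition of named operators, the split can be read back as a sentence, which is the interpretability property the abstract emphasizes.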

    Mutually exclusive expression of DLX2 and DLX5/6 is associated with the metastatic potential of the human breast cancer cell line MDA-MB-231

    Background. The DLX gene family encodes homeobox transcription factors involved in the control of morphogenesis and tissue homeostasis. Their expression can be regulated by Endothelin1 (ET1), a peptide associated with the invasive phenotype of breast cancer. Deregulation of DLX gene expression was found in human solid tumors and hematologic malignancies. In particular, DLX4 overexpression represents a possible prognostic marker in ovarian cancer. We have investigated the role of DLX genes in human breast cancer progression. Methods. MDA-MB-231 human breast carcinoma cells were grown in vitro or injected in nude mice, either subcutaneously, to mimic primary tumor growth, or intravenously, to mimic metastatic spreading. Expression of DLX2, DLX5 and DLX6 was assessed by RT-PCR in cultured cells (treated or not with ET1), tumors and metastases. In situ hybridization was used to confirm DLX gene expression in primary tumors and in lung and bone metastases. The expression of DLX2 and DLX5 was evaluated in 408 primary human breast cancers by examining the GSE1456 and GSE3494 microarray datasets. Kaplan-Meier estimates for disease-free survival were calculated for the patients grouped on the basis of DLX2/DLX5 expression. Results. Before injection, or after subcutaneous growth, MDA-MB-231 cells expressed DLX2 but neither DLX5 nor DLX6. In contrast, in bone and lung metastases resulting from intravenous injection we detected expression of DLX5/6 but not of DLX2, suggesting that DLX5/6 are activated during metastasis formation and that their expression is alternative to that of DLX2. In vitro treatment of MDA-MB-231 cells with ET1 resulted in a switch from DLX2 to DLX5 expression. By data mining in microarray datasets we found that expression of DLX2 occurred in 21.6% of patients and was significantly correlated with prolonged disease-free survival and reduced incidence of relapse. In contrast, DLX5 was expressed in a small subset of cases (2.2% of the total), which displayed reduced disease-free survival and a high incidence of relapse; the difference from the other groups was, however, not significant, owing to the small size of the DLX+ cohort. In all cases, we found mutually exclusive expression of DLX2 and DLX5. Conclusions. Our studies indicate that DLX genes are involved in human breast cancer progression, and that DLX2 and DLX5 genes might serve as prognostic markers.
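For readers unfamiliar with the Kaplan-Meier estimate mentioned in this abstract, the following is a minimal sketch of the standard product-limit formula, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number still at risk. The function name `kaplan_meier` and the follow-up data are hypothetical; none of the values come from the study.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  -- follow-up duration per patient
    events -- True if the event (e.g. relapse) was observed, False if censored
    """
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every patient whose follow-up ends exactly at time t.
        while i < len(order) and order[i][0] == t:
            n_events += int(order[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


# Made-up follow-up data: five patients, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
print(curve)  # approximately [(1, 0.8), (2, 0.6), (3, 0.3)]
```

Comparing two such curves, for example one per expression group, would additionally need a significance test such as the log-rank test, which this sketch does not implement.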

    NCR-PCOPGene: An Exploratory Tool for Analysis of Sample-Classes Effect on Gene-Expression Relationships

    Background. Microarray technology is so expensive and powerful that it is essential to extract maximum value from microarray data. Our tools allow researchers to formulate and test anything from a single hypothesis to entire models. Results. The objective of NCR-PCOPGene is to study the relationships among gene expressions under different conditions, to classify these conditions, and to study their effect on the different relationships. The web application makes it easier to define the sample classes by grouping the microarray experiments using either (a) biological, statistical, or any other previous knowledge or (b) their effect on the expression relationship maintained among specific genes of interest. With a type (a) class definition, the researcher can add biological information to the gene-expression relationships. A type (b) class definition allows for linking genes correlated neither linearly nor nonlinearly. Conclusions. The PCOPGene tools are especially suitable for microarrays with large sample series. This application helps to identify cellular states and the genes involved in them in a flexible way. The application takes advantage of the ability of our system to relate gene expressions, even when these relationships are noncontinuous and cannot be found using linear or nonlinear analytical methods.

    Identifying Driver Genomic Alterations in Cancers by Searching Minimum-Weight, Mutually Exclusive Sets

    An important goal of cancer genomic research is to identify the driving pathways underlying disease mechanisms and the heterogeneity of cancers. It is well known that somatic genome alterations (SGAs) affecting the genes that encode the proteins within a common signaling pathway exhibit mutual exclusivity, in which these SGAs usually do not co-occur in a tumor. With some success, this characteristic has been utilized as an objective function to guide the search for driver mutations within a pathway. However, mutual exclusivity alone is not sufficient to indicate that genes affected by such SGAs are in common pathways. Here, we propose a novel, signal-oriented framework for identifying driver SGAs. First, we identify the perturbed cellular signals by mining the gene expression data. Next, we search for a set of SGA events that carries strong information with respect to such perturbed signals while exhibiting mutual exclusivity. Finally, we design and implement an efficient exact algorithm to solve an NP-hard problem encountered in our approach. We apply this framework to the ovarian and glioblastoma tumor data available in the TCGA database, and perform systematic evaluations. Our results indicate that the signal-oriented approach enhances the ability to find informative sets of driver SGAs that likely constitute signaling pathways.
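The mutual-exclusivity property described in this abstract can be made concrete with a small sketch that counts, for a candidate gene set, how many tumors are covered and how many alterations co-occur. This is a generic coverage/overlap count, not the weight function or the exact search algorithm from the paper; the function name `exclusivity_stats` and the toy data are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def exclusivity_stats(altered: Dict[str, Set[str]], gene_set: Iterable[str]) -> Tuple[int, int]:
    """Return (coverage, overlap) for a candidate gene set.

    altered  -- maps a gene to the set of tumor IDs in which it is altered
    coverage -- number of tumors with at least one altered gene in the set
    overlap  -- alterations beyond the first in each tumor;
                0 means the set is perfectly mutually exclusive
    """
    covered: Set[str] = set()
    total_alterations = 0
    for gene in gene_set:
        tumors = altered.get(gene, set())
        covered |= tumors
        total_alterations += len(tumors)
    return len(covered), total_alterations - len(covered)


# Made-up toy data: genes A and C are both altered in tumor t2.
toy = {"A": {"t1", "t2"}, "B": {"t3"}, "C": {"t2", "t4"}}
print(exclusivity_stats(toy, ["A", "B"]))  # (3, 0): exclusive
print(exclusivity_stats(toy, ["A", "C"]))  # (3, 1): one co-occurrence
```

A search procedure would favor sets with high coverage and low overlap; as the abstract notes, such a score alone does not show that the genes share a pathway.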

    An interdisciplinary approach to brand association research

    Purpose. This paper discusses the current role of qualitative research in the analysis of the relations between brands and consumers in new market spaces, with particular reference to how it can be enhanced with quantitative techniques to study interactions in online communities. Design/methodology/approach. The paper reviews key scientific contributions in the area of qualitative marketing research. Drawing from this theoretical background, the authors then propose the integration of digital ethnography (a qualitative approach) with quantitative text mining as an innovative approach to gain insights into perceptions of brand associations among online consumers. Findings. The paper contributes to a greater awareness of both limitations and new perspectives in relation to qualitative market research, while suggesting innovative paths for future research. Practical implications. The new methodological approach described can be used to better understand brand knowledge based on consumer brand associations. These insights can then be applied towards developing and implementing effective branding strategies. Originality/Value. The authors propose an interdisciplinary methodology to study consumer behaviour in online communities which incorporates digital ethnography and computer-assisted textual analysis. The latter technique in particular (borrowed from the field of linguistics) has not yet been exploited extensively in marketing research, but is capable of offering new types of knowledge with important implications for strategic brand management.

    Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study

    BACKGROUND: A method to evaluate and analyze the massive data generated by series of microarray experiments is of utmost importance to reveal the hidden patterns of gene expression. Because of the complexity and the high dimensionality of microarray gene expression profiles, the dimensional reduction of raw expression data and the feature selection needed for tasks such as classification of disease samples remain a challenge. To solve the problem we propose a two-level analysis. First, a self-organizing map (SOM) is used. SOM is a vector quantization method that simplifies and reduces the dimensionality of the original measurements and visualizes each individual tumor sample in a SOM component plane. Next, hierarchical clustering and K-means clustering are used to identify patterns of gene expression useful for classification of samples. RESULTS: We tested the two-level analysis on public data from diffuse large B-cell lymphomas. The analysis easily distinguished major gene expression patterns without the need for supervision: a germinal center-related, a proliferation, an inflammatory and a plasma cell differentiation-related gene expression pattern. The first three patterns matched the patterns described in the original publication using supervised clustering analysis, whereas the fourth one was novel. CONCLUSIONS: Our study shows that by using SOM as an intermediate step to analyze genome-wide gene expression data, the gene expression patterns can more easily be revealed. The "expression display" by the SOM component plane summarises the complicated data in a way that allows the clinician to evaluate the classification options rather than giving a fixed diagnosis.
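The two-level analysis in this abstract, a SOM first and clustering of the SOM output second, can be sketched in plain Python. Everything here is an illustrative assumption: the one-dimensional map, the linear decay schedule, the parameter defaults, and the toy samples are not taken from the study, and only the K-means half of the second level is shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, n_units, epochs=50, lr0=0.5, seed=0):
    """Level 1: train a 1-D SOM and return its prototype vectors."""
    rng = random.Random(seed)
    units = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs                 # decays from 1 towards 0
        radius = max(1, round(n_units / 2 * frac))  # shrinking neighbourhood
        for x in rng.sample(data, len(data)):       # shuffled pass over the data
            bmu = min(range(n_units), key=lambda i: sq_dist(units[i], x))
            lo, hi = max(0, bmu - radius), min(n_units, bmu + radius + 1)
            for i in range(lo, hi):
                # Units nearer the best-matching unit move further towards x.
                step = lr0 * frac / (1 + abs(i - bmu))
                units[i] = [w + step * (v - w) for w, v in zip(units[i], x)]
    return units


def kmeans(points, k, iters=20, seed=0):
    """Level 2: cluster the SOM prototypes with plain K-means."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(centers[c], p)) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Made-up toy "expression profiles" forming two loose groups.
samples = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
units = train_som(samples, n_units=4)
labels, centers = kmeans(units, k=2)
print(labels)
```

Clustering the prototypes rather than the raw samples is what makes the analysis two-level: the SOM compresses the data first, so the second step works on a small set of representative vectors.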