
    Categorical Dimensions of Human Odor Descriptor Space Revealed by Non-Negative Matrix Factorization

    In contrast to most other sensory modalities, the basic perceptual dimensions of olfaction remain unclear. Here, we use non-negative matrix factorization (NMF), a dimensionality reduction technique, to uncover structure in a panel of odor profiles, with each odor defined as a point in multi-dimensional descriptor space. The properties of NMF are favorable for the analysis of such lexical and perceptual data, and lead to a high-dimensional account of odor space. We further provide evidence that odor dimensions apply categorically. That is, odor space is not occupied homogeneously, but rather in a discrete and intrinsically clustered manner. We discuss the potential implications of these results for the neural coding of odors, as well as for developing classifiers on larger datasets that may be useful for predicting perceptual qualities from chemical structures.
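
As a rough illustration of the factorization named in this abstract, the sketch below factors a small non-negative matrix V (rows standing in for odors, columns for descriptors) into non-negative factors W and H. It assumes the standard multiplicative-update rules for squared error, which may differ from the variant the authors used; the function names, the toy matrix, and the chosen rank are all hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frob_error(v, w, h):
    """Squared Frobenius reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, seed=0, eps=1e-12):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m).

    Uses multiplicative updates, which keep every entry non-negative
    as long as the starting values are positive.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h

# Toy data (hypothetical): four "odors" scored on three "descriptors".
V = [[1.0, 0.0, 1.0],
     [2.0, 0.0, 2.0],
     [0.0, 1.0, 1.0],
     [0.0, 3.0, 3.0]]
W, H = nmf(V, rank=2)
```

Each row of W gives an item's loading on the two learned dimensions, and each row of H gives a dimension's weights over the descriptors; because nothing can be negative, dimensions combine additively.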

    Inferring Unusual Crowd Events From Mobile Phone Call Detail Records

    The pervasiveness and availability of mobile phone data offer the opportunity of discovering usable knowledge about crowd behaviors in urban environments. Cities can leverage such knowledge to provide better services (e.g., public transport planning, optimized resource allocation) and improve safety. Call Detail Record (CDR) data represents a practical data source to detect and monitor unusual events, considering the high level of mobile phone penetration compared with GPS-equipped and open devices. In this paper, we provide a methodology that is able to detect unusual events from CDR data, which typically has low accuracy in terms of space and time resolution. Moreover, we introduce a concept of unusual event that involves a large number of people who exhibit unusual mobility behavior. Our careful consideration of the issues that come from coarse-grained CDR data ultimately leads to a completely general framework that can detect unusual crowd events from CDR data effectively and efficiently. Through extensive experiments on real-world CDR data for a large city in Africa, we demonstrate that our method can detect unusual events with 16% higher recall and over 10 times higher precision, compared to state-of-the-art methods. We implement a visual analytics prototype system to help end users analyze detected unusual crowd events to best suit different application scenarios. To the best of our knowledge, this is the first work on the detection of unusual events from CDR data with considerations of its temporal and spatial sparseness and distinction between user unusual activities and daily routines. Comment: 18 pages, 6 figures

    Ultrametric embedding: application to data fingerprinting and to fast data clustering

    We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How the extent or degree of ultrametricity can be quantified leads us to a discussion of varied practical cases in which ultrametricity can be partially or locally present in data. We show how ultrametricity can be assessed in text or document collections, and in time series signals. An important aspect here is that, to draw benefit from this perspective, the data may need to be recoded. Such data recoding can also be powerful in proximity searching, as we will show, where the data is embedded globally and not locally in an ultrametric space. Comment: 14 pages, 1 figure. New content and modified title compared to the 19 May 2006 version
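
One simple way to put a number on "degree of ultrametricity" is sketched below. It is an illustrative proxy, not necessarily the coefficient used in the paper: it counts the fraction of point triples that satisfy the ultrametric (strong triangle) inequality d(x, z) <= max(d(x, y), d(y, z)), which holds for a triple exactly when its two largest pairwise distances are equal. The function name, the tolerance parameter, and both toy distance matrices are hypothetical.

```python
from itertools import combinations

def ultrametric_fraction(dist, tol=1e-9):
    """Fraction of point triples whose two largest pairwise distances
    agree within tol, i.e. triples obeying the ultrametric inequality.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    """
    n = len(dist)
    triples = list(combinations(range(n), 3))
    if not triples:
        raise ValueError("need at least three points")
    ok = 0
    for i, j, k in triples:
        # Sort the three sides; ultrametric triangles are isosceles
        # with the two longest sides equal.
        _, mid, longest = sorted((dist[i][j], dist[i][k], dist[j][k]))
        if longest - mid <= tol:
            ok += 1
    return ok / len(triples)

# Distances read off a hierarchy (merge heights 1, 2, 3): fully ultrametric.
tree = [[0, 1, 2, 3],
        [1, 0, 2, 3],
        [2, 2, 0, 3],
        [3, 3, 3, 0]]

# Points 0, 1, 3, 7 on a line with |a - b| distances: no triple qualifies.
pts = [0, 1, 3, 7]
line = [[abs(a - b) for b in pts] for a in pts]

print(ultrametric_fraction(tree))  # 1.0
print(ultrametric_fraction(line))  # 0.0
```

Values between 0 and 1 indicate partial ultrametricity; computing the same fraction on a subset of points gives a local rather than global reading.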

    Biclustering of gene expression data by non-smooth non-negative matrix factorization

    BACKGROUND: The extended use of microarray technologies has enabled the generation and accumulation of gene expression datasets that contain expression levels of thousands of genes across tens or hundreds of different experimental conditions. One of the major challenges in the analysis of such datasets is to discover local structures composed of sets of genes that show coherent expression patterns across subsets of experimental conditions. These patterns may provide clues about the main biological processes associated with different physiological states. RESULTS: In this work we present a methodology able to cluster genes and conditions that are highly related in sub-portions of the data. Our approach is based on a new data mining technique, Non-smooth Non-Negative Matrix Factorization (nsNMF), able to identify localized patterns in large datasets. We assessed the potential of this methodology by analyzing several synthetic datasets as well as two large and heterogeneous sets of gene expression profiles. In all cases the method was able to identify localized features related to sets of genes that show consistent expression patterns across subsets of experimental conditions. The uncovered structures showed a clear biological meaning in terms of relationships among functional annotations of genes and the phenotypes or physiological states of the associated conditions. CONCLUSION: The proposed approach can be a useful tool to analyze large and heterogeneous gene expression datasets. The method is able to identify complex relationships among genes and conditions that are difficult to identify by standard clustering algorithms.

    Nonlinear mixture-wise expansion approach to underdetermined blind separation of nonnegative dependent sources

    Underdetermined blind separation of nonnegative dependent sources consists in decomposing a set of observed mixed signals into a greater number of original nonnegative and dependent component (source) signals. This is an important problem for which very few algorithms exist. It is also practically relevant for contemporary metabolic profiling of biological samples, such as biomarker identification studies, where the sources (a.k.a. pure components or analytes) are to be extracted from mass spectra of complex multicomponent mixtures. This paper presents a method for underdetermined blind separation of nonnegative dependent sources. The method performs a nonlinear mixture-wise mapping of the observed data into a high-dimensional reproducing kernel Hilbert space (RKHS) of functions and applies sparseness-constrained nonnegative matrix factorization (NMF) therein. Thus, the original problem is converted into a new one with an increased number of mixtures, an increased number of dependent sources, and higher-order (error) terms generated by the nonlinear mapping. Provided that the amplitudes of the original components are sparsely distributed, which is the case for mass spectra of analytes, sparseness-constrained NMF in RKHS yields, with significant probability, improved accuracy relative to the case when the same NMF algorithm is performed on the original problem. The method is exemplified on numerical and experimental examples related, respectively, to the extraction of ten dependent components from five mixtures and to the extraction of ten dependent analytes from mass spectra of two to five mixtures. The analytes thereby mimic the complexity of components expected to be found in biological samples.