10,899 research outputs found

Categorical Dimensions of Human Odor Descriptor Space Revealed by Non-Negative Matrix Factorization

Author: A Arzi
A Dravnieks
A Mamlouk
AA Koulakov
AG Khan
Andreas Schaefer
Arvind Ramanathan
Chakra S. Chennubhotla
CI Bargmann
DD Lee
G Hinton
G Laurent
H Lapid
J Niessing
JA Gottfried
Jason B. Castro
JE Amoore
JP Brunet
L van der Maaten
M Berry
M Zarzo
P Lennie
P Paatero
PM Kim
PM Wise
R Haddad
RB Lotto
RM Khan
SS Schiffman
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

In contrast to most other sensory modalities, the basic perceptual dimensions of olfaction remain unclear. Here, we use non-negative matrix factorization (NMF), a dimensionality reduction technique, to uncover structure in a panel of odor profiles, with each odor defined as a point in multi-dimensional descriptor space. The properties of NMF are favorable for the analysis of such lexical and perceptual data, and lead to a high-dimensional account of odor space. We further provide evidence that odor dimensions apply categorically. That is, odor space is not occupied homogeneously, but rather in a discrete and intrinsically clustered manner. We discuss the potential implications of these results for the neural coding of odors, as well as for developing classifiers on larger datasets that may be useful for predicting perceptual qualities from chemical structures.
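As a rough illustration of the technique named in this abstract, the sketch below factors a small non-negative matrix with the standard multiplicative update rules for NMF. It is not the authors' pipeline: the `ratings` matrix (rows as odors, columns as descriptors), the rank, the iteration count, and all function names are hypothetical choices made for this example.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate V (m x n) by W (m x rank) times H (rank x n), all non-negative."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(m)]
    return w, h

# Hypothetical toy data: 4 odors rated on 4 descriptors (not from the paper).
ratings = [
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
]
w, h = nmf(ratings, rank=2)
```

Each row of `h` can be read as a non-negative basis over descriptors and each row of `w` as the weights of one odor on those bases. Because no entry can be negative, the factors are additive, which is one reason NMF is often considered a good fit for rating-style data.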

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

D-Scholarship@Pitt

FigShare

Taming Wild High Dimensional Text Data with a Fuzzy Lash

Author: Karami Amir
Publication date: 01/11/2017

The bag of words (BOW) represents a corpus as a matrix whose elements are word frequencies. However, each row in the matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular way to address sparsity and high dimensionality. Among the different strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular strategy that maps all words onto a new basis to represent the BOW. The recent growth of text data and its challenges imply that the DR area still needs new perspectives. Although a wide range of methods based on the UFT strategy have been developed, the fuzzy approach has not been considered for DR based on this strategy. This research investigates the application of fuzzy clustering as a DR method based on the UFT strategy to collapse the BOW matrix and provide a lower-dimensional representation of documents instead of the words in a corpus. The quantitative evaluation shows that fuzzy clustering produces superior performance and features compared to Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), two popular DR methods based on the UFT strategy.
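One plausible reading of the idea in this abstract is sketched below: run fuzzy c-means on the rows of a BOW matrix and use each document's cluster memberships as its low-dimensional representation. The abstract does not specify the algorithm, so the fuzzy c-means variant, the deterministic initialization, the toy `bow` matrix, and all names are assumptions made for illustration.

```python
def fuzzy_c_means(rows, n_clusters, fuzzifier=2.0, iterations=50, eps=1e-12):
    """Return (memberships, centers) for the given rows.

    memberships[i][k] is the degree to which row i belongs to cluster k;
    each row of memberships sums to 1.
    """
    n = len(rows)
    n_features = len(rows[0])
    # Deterministic start (an assumption): evenly spaced rows as initial centers.
    step = (n - 1) / max(n_clusters - 1, 1)
    centers = [list(rows[round(k * step)]) for k in range(n_clusters)]
    power = 1.0 / (fuzzifier - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership update: inversely related to squared distance to each center.
        memberships = []
        for row in rows:
            inverse = [
                (1.0 / (sum((x - c) ** 2 for x, c in zip(row, center)) + eps)) ** power
                for center in centers
            ]
            total = sum(inverse)
            memberships.append([value / total for value in inverse])
        # Center update: mean of the rows weighted by membership ** fuzzifier.
        centers = []
        for k in range(n_clusters):
            weights = [u[k] ** fuzzifier for u in memberships]
            weight_sum = sum(weights)
            centers.append([
                sum(wgt * row[j] for wgt, row in zip(weights, rows)) / weight_sum
                for j in range(n_features)
            ])
    return memberships, centers

# Hypothetical BOW matrix: 4 documents over the vocabulary
# ["odor", "smell", "matrix", "vector"] (not data from the paper).
bow = [
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
]
reduced, centers = fuzzy_c_means(bow, n_clusters=2)
```

Here `reduced` has one row per document and one column per cluster, so a four-word vocabulary collapses to two soft-membership features. A real corpus would need far more clusters and a sparse representation.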

arXiv.org e-Print Archive

Crossref

Scholar Commons - Institutional Repository of the University of South Carolina

Iterative Residual Rescaling: An Analysis and Generalization of LSI

Author: Ando Rie Kubota
Lee Lillian
Publication date: 01/01/2001

We consider the problem of creating document representations in which inter-document similarity measurements correspond to semantic similarity. We first present a novel subspace-based framework for formalizing this task. Using this framework, we derive a new analysis of Latent Semantic Indexing (LSI), showing a precise relationship between its performance and the uniformity of the underlying distribution of documents over topics. This analysis helps explain the improvements gained by Ando's (2000) Iterative Residual Rescaling (IRR) algorithm: IRR can compensate for distributional non-uniformity. A further benefit of our framework is that it provides a well-motivated, effective method for automatically determining the rescaling factor IRR depends on, leading to further improvements. A series of experiments over various settings and with several evaluation metrics validates our claims.
Comment: To appear in the proceedings of SIGIR 2001. 11 page

arXiv.org e-Print Archive

CiteSeerX

Enhancing Domain Word Embedding via Latent Semantic Imputation

Author: Lai Siwei
Lin Frank
Mikolov Tomas
van der Maaten Laurens
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/05/2019

We present a novel method named Latent Semantic Imputation (LSI) to transfer external knowledge into semantic space for enhancing word embedding. The method integrates graph theory to extract the latent manifold structure of the entities in the affinity space and leverages non-negative least squares with standard simplex constraints and power iteration method to derive spectral embeddings. It provides an effective and efficient approach to combining entity representations defined in different Euclidean spaces. Specifically, our approach generates and imputes reliable embedding vectors for low-frequency words in the semantic space and benefits downstream language tasks that depend on word embedding. We conduct comprehensive experiments on a carefully designed classification problem and language modeling and demonstrate the superiority of the enhanced embedding via LSI over several well-known benchmark embeddings. We also confirm the consistency of the results under different parameter settings of our method.
Comment: ACM SIGKDD 201

arXiv.org e-Print Archive

Crossref

Visualising the structure of document search results: A comparison of graph theoretic approaches

Author: Busing F.
Chen C.
Coxon A.
Leuski A.
Salton G.
Skupin A.
Timothy Cribbin
Van Rijsbergen C.J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/04/2009

This is the post-print of the article - Copyright © 2010 Sage Publications.
Previous work has shown that distance-similarity visualisation or ‘spatialisation’ can provide a potentially useful context in which to browse the results of a query search, enabling the user to adopt a simple local foraging or ‘cluster growing’ strategy to navigate through the retrieved document set. However, faithfully mapping feature-space models to visual space can be problematic owing to their inherent high dimensionality and non-linearity. Conventional linear approaches to dimension reduction tend to fail at this kind of task, sacrificing local structure in order to preserve a globally optimal mapping. In this paper the clustering performance of a recently proposed algorithm called isometric feature mapping (Isomap), which deals with non-linearity by transforming dissimilarities into geodesic distances, is compared to that of non-metric multidimensional scaling (MDS). Various graph pruning methods, for geodesic distance estimation, are also compared. Results show that Isomap is significantly better at preserving local structural detail than MDS, suggesting it is better suited to cluster growing and other semantic navigation tasks. Moreover, it is shown that applying a minimum-cost graph pruning criterion can provide a parameter-free alternative to the traditional K-neighbour method, resulting in spatial clustering that is equivalent to or better than that achieved using an optimal-K criterion.
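The geodesic-distance step that this abstract attributes to Isomap can be sketched as follows: keep only each item's K nearest neighbours as graph edges, then take shortest-path lengths through that graph. This is a toy under stated assumptions, not the paper's implementation. It uses 2-D points and Euclidean distance in place of document dissimilarities, uses only the traditional K-neighbour pruning (not the minimum-cost criterion), and the function name and data are hypothetical.

```python
import math

def knn_geodesic_distances(points, k):
    """Return (direct, geodesic) distance matrices for the given points.

    direct holds plain Euclidean distances; geodesic holds shortest-path
    lengths through a symmetric k-nearest-neighbour graph.
    """
    n = len(points)
    direct = [[math.dist(p, q) for q in points] for p in points]
    geodesic = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    # K-neighbour pruning: keep an edge if either endpoint lists the other
    # among its k nearest neighbours.
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: direct[i][j])[:k]
        for j in nearest:
            geodesic[i][j] = geodesic[j][i] = direct[i][j]
    # Floyd-Warshall all-pairs shortest paths over the pruned graph.
    for via in range(n):
        for i in range(n):
            for j in range(n):
                through = geodesic[i][via] + geodesic[via][j]
                if through < geodesic[i][j]:
                    geodesic[i][j] = through
    return direct, geodesic

# Hypothetical points laid out along an L-shaped path.
path = [(0, 2), (0, 1), (0, 0), (1, 0), (2, 0)]
direct, geodesic = knn_geodesic_distances(path, k=2)
```

For the two end points of the path, the geodesic distance follows the bend and is therefore longer than the straight-line distance. Full Isomap would then embed the geodesic matrix with classical MDS, a step omitted here.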

Crossref

Brunel University Research Archive