Search CORE

33,683 research outputs found

Clustering and Latent Semantic Indexing Aspects of the Nonnegative Matrix Factorization

Author: Mirzal Andri
Publication venue
Publication date: 16/12/2011
Field of study

This paper provides a theoretical support for clustering aspect of the nonnegative matrix factorization (NMF). By utilizing the Karush-Kuhn-Tucker optimality conditions, we show that NMF objective is equivalent to graph clustering objective, so clustering aspect of the NMF has a solid justification. Different from previous approaches which usually discard the nonnegativity constraints, our approach guarantees the stationary point being used in deriving the equivalence is located on the feasible region in the nonnegative orthant. Additionally, since clustering capability of a matrix decomposition technique can sometimes imply its latent semantic indexing (LSI) aspect, we will also evaluate LSI aspect of the NMF by showing its capability in solving the synonymy and polysemy problems in synthetic datasets. And more extensive evaluation will be conducted by comparing LSI performances of the NMF and the singular value decomposition (SVD), the standard LSI method, using some standard datasets.Comment: 28 pages, 5 figure

arXiv.org e-Print Archive

CiteSeerX

Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints

Author: Anandkumar Animashree
Hsu Daniel
Javanmard Adel
Kakade Sham M.
Publication venue
Publication date: 24/09/2012
Field of study

Unsupervised estimation of latent variable models is a fundamental problem central to numerous applications of machine learning and statistics. This work presents a principled approach for estimating broad classes of such models, including probabilistic topic models and latent linear Bayesian networks, using only second-order observed moments. The sufficient conditions for identifiability of these models are primarily based on weak expansion constraints on the topic-word matrix, for topic models, and on the directed acyclic graph, for Bayesian networks. Because no assumptions are made on the distribution among the latent variables, the approach can handle arbitrary correlations among the topics or latent factors. In addition, a tractable learning method via

\ell_1

optimization is proposed and studied in numerical experiments.Comment: 38 pages, 6 figures, 2 tables, applications in topic models and Bayesian networks are studied. Simulation section is adde

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California

Implementation of a Combined OFDM-Demodulation and WCDMA-Equalization Module

Author: Gerez Sabih H.
Hofstra Klaas L.
Kampen David van
Potman Jordy
Publication venue
Publication date: 01/01/2006
Field of study

For a dual-mode baseband receiver for the OFDMWireless LAN andWCDMA standards, integration of the demodulation and equalization tasks on a dedicated hardware module has been investigated. For OFDM demodulation, an FFT algorithm based on cascaded twiddle factor decomposition has been selected. This type of algorithm combines high spatial and temporal regularity in the FFT data-flow graphs with a minimal number of computations. A frequency-domain algorithm based on a circulant channel approximation has been selected for WCDMA equalization. It has good performance, low hardware complexity and a low number of computations. Its main advantage is the reuse of the FFT kernel, which contributes to the integration of both tasks. The demodulation and equalization module has been described at the register transfer level with the in-house developed Arx language. The core of the module is a pipelined radix-23 butterfly combined with a complex multiplier and complex divider. The module has an area of 0.447 mm2 in 0.18 ¿m technology and a power consumption of 10.6 mW. The proposed module compares favorably with solutions reported in literature

CiteSeerX

University of Twente Research Information

A Scalable Asynchronous Distributed Algorithm for Topic Modeling

Author: Asuncion A.
Asuncion A.
Cormen T. H.
Gonzalez J. E.
Snyder P.
Yan F.
Publication venue
Publication date: 16/12/2014
Field of study

Learning meaningful topic models with massive document collections which contain millions of documents and billions of tokens is challenging because of two reasons: First, one needs to deal with a large number of topics (typically in the order of thousands). Second, one needs a scalable and efficient way of distributing the computation across multiple machines. In this paper we present a novel algorithm F+Nomad LDA which simultaneously tackles both these problems. In order to handle large number of topics we use an appropriately modified Fenwick tree. This data structure allows us to sample from a multinomial distribution over

T

items in

O(\log T)

time. Moreover, when topic counts change the data structure can be updated in

O(\log T)

time. In order to distribute the computation across multiple processor we present a novel asynchronous framework inspired by the Nomad algorithm of \cite{YunYuHsietal13}. We show that F+Nomad LDA significantly outperform state-of-the-art on massive problems which involve millions of documents, billions of words, and thousands of topics

arXiv.org e-Print Archive

CiteSeerX

Crossref