DatAR: Your brain, your data, on your desk - A research proposal
We present a research proposal that investigates the use of 3D representations in Augmented Reality (AR) to allow neuroscientists to explore the literature relevant to their own scientific purposes. Neuroscientists need to identify potential real-life experiments that provide the most information for their field with the minimum use of limited resources. This requires understanding both the relationships among concepts that are already known and those that have not yet been discovered. Our assumption is that overviews of the correlations among concepts, built from linked data, will allow neuroscientists to better understand the gaps in their own literature and more quickly identify the most suitable experiments to carry out. We will identify candidate visualizations and improve upon them for a specific information need. We describe our planned prototype 3D AR implementation and the directions we intend to explore.
Two to Five Truths in Non-Negative Matrix Factorization
In this paper, we explore the role of matrix scaling on a matrix of counts
when building a topic model using non-negative matrix factorization (NMF). We
present a scaling inspired by the normalized Laplacian (NL) for graphs that can
greatly improve the quality of a non-negative matrix factorization. The results
parallel those in the spectral graph clustering work of Priebe et al. (2019),
where the authors proved that adjacency spectral embedding (ASE) spectral
clustering was more likely to discover core-periphery partitions and Laplacian
spectral embedding (LSE) was more likely to discover affinity partitions. In
text analysis, NMF is typically used on a matrix of co-occurrence "contexts"
and "terms" counts. The matrix scaling inspired by LSE gives significant
improvement for text topic models in a variety of datasets. We illustrate the
dramatic improvement that matrix scaling in NMF can bring to the quality of a
topic model on three datasets where human annotation is available. Using the
adjusted Rand index (ARI), a measure of cluster similarity, we see an increase
of 50% for Twitter data and over 200% for a newsgroup dataset versus using raw
counts, which is the analogue of ASE. For clean data, such as those from the
Document Understanding Conference, NL gives over 40% improvement over ASE. We
conclude with some analysis of this phenomenon and some connections between
this scaling and other matrix scaling methods.
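As a rough illustration of the comparison the abstract describes, the sketch below applies NMF to a toy document-term count matrix with and without a normalized-Laplacian-style scaling, and scores the induced document clusters with ARI. The specific scaling D_r^{-1/2} X D_c^{-1/2}, the toy data, and the dominant-topic cluster assignment are assumptions for illustration, not the authors' exact construction:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
# Toy document-term count matrix with two planted topics (hypothetical data):
# the first 20 documents favor terms 0-2, the last 20 favor terms 3-5.
X = np.vstack([
    rng.poisson([5, 5, 5, 1, 1, 1], size=(20, 6)),
    rng.poisson([1, 1, 1, 5, 5, 5], size=(20, 6)),
]).astype(float)
labels = np.array([0] * 20 + [1] * 20)

def nl_scale(X, eps=1e-12):
    # D_r^{-1/2} X D_c^{-1/2}: divide by the square roots of row and column
    # sums, mirroring the normalized graph Laplacian on a bipartite graph.
    dr = X.sum(axis=1, keepdims=True)
    dc = X.sum(axis=0, keepdims=True)
    return X / (np.sqrt(dr) * np.sqrt(dc) + eps)

for name, M in [("counts (ASE analogue)", X),
                ("NL scaling (LSE analogue)", nl_scale(X))]:
    W = NMF(n_components=2, init="nndsvd", max_iter=500).fit_transform(M)
    pred = W.argmax(axis=1)  # assign each document to its dominant topic
    print(name, "ARI =", round(adjusted_rand_score(labels, pred), 3))
```

On this easy planted example both variants cluster well; the abstract's reported ARI gaps concern real annotated corpora, which this toy setup does not reproduce.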
Random Separating Hyperplane Theorem and Learning Polytopes
The Separating Hyperplane theorem is a fundamental result in convex geometry
with myriad applications. Our first result, the Random Separating Hyperplane
Theorem (RSH), is a strengthening of this result for polytopes. RSH asserts
that if the distance between a point and a unit-diameter polytope is at least a
fixed constant, then a randomly chosen hyperplane separates the point from the
polytope with at least a guaranteed probability and margin.
An immediate consequence of our result is the first near-optimal bound on the
error increase in the reduction from a separation oracle to an optimization
oracle over a polytope.
RSH has algorithmic applications in learning polytopes. We consider a
fundamental problem, denoted the "Hausdorff problem", of learning a unit-
diameter polytope within a prescribed Hausdorff distance, given an optimization
oracle for the polytope. Using RSH, we show that with polynomially many random
queries to the optimization oracle, the polytope can be approximated within the
prescribed error. To our knowledge this is the first provable algorithm for the
Hausdorff problem. Building on this result, we show that if the vertices of the
polytope are well-separated, then an optimization oracle can be used to
generate a list of points, each within the prescribed Hausdorff distance of the
polytope, with the property that the list contains a point close to each vertex
of the polytope. Further, we show how to prune this list to generate a (unique)
approximation to each vertex of the polytope. We prove that in many latent
variable settings, e.g., topic modeling and LDA, optimization oracles do exist
provided we project to a suitable SVD subspace. Thus, our work yields the first
efficient algorithm for finding approximations to the vertices of the latent
polytope under the well-separatedness assumption.
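The separating-hyperplane phenomenon behind RSH can be illustrated with a small Monte Carlo experiment: for a point at some distance from a polytope, a noticeable fraction of uniformly random hyperplane directions already separate the two with positive margin. The random polytope, the placement of the query point, and the sampling scheme below are illustrative assumptions, not the paper's construction or its quantitative bounds:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
V = rng.random((10, d))          # vertices of a hypothetical polytope K
q = V.max(axis=0) + 1.0          # a query point at distance >= 1 from K

def separation_margin(q, V, u):
    # Signed margin of the hyperplane with unit normal u between q and
    # conv(V); positive means some translate of the hyperplane separates them.
    return q @ u - (V @ u).max()

trials = 10_000
U = rng.standard_normal((trials, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)   # uniform directions on sphere
margins = U @ q - (V @ U.T).max(axis=0)         # margin for every direction
frac = float((margins > 0).mean())
print(f"fraction of random hyperplanes that separate: {frac:.3f}")
print(f"mean margin among separating directions: {margins[margins > 0].mean():.3f}")
```

Only a fraction of directions separate, but that fraction is bounded away from zero, which is the qualitative content a randomized separation argument needs.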
Efficient Algorithms for Sparse Moment Problems without Separation
We consider the sparse moment problem of learning a mixture of a small number
of spikes in high-dimensional space from its noisy moment information in any
dimension. We
measure the accuracy of the learned mixtures using transportation distance.
Previous algorithms either assume certain separation assumptions, use more
recovery moments, or run in (super) exponential time. Our algorithm for the
one-dimensional problem (also called the sparse Hausdorff moment problem) is a
robust version of the classic Prony's method, and our contribution mainly lies
in the analysis. We adopt a global and much tighter analysis than previous work
(which analyzes the perturbation of the intermediate results of Prony's
method). A useful technical ingredient is a connection between the linear
system defined by the Vandermonde matrix and the Schur polynomial, which allows
us to provide tight perturbation bound independent of the separation and may be
useful in other contexts. To tackle the high-dimensional problem, we first
solve the two-dimensional problem by extending the one-dimensional algorithm
and analysis to complex numbers. Our algorithm for the high-dimensional case
determines the coordinates of each spike by aligning a 1d projection of the
mixture to a random vector and a set of 2d projections of the mixture. Our
results have applications to learning topic models and Gaussian mixtures,
implying improved sample complexity or running time over prior work.
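The one-dimensional pipeline the abstract builds on, classic Prony's method, can be sketched in its noise-free form: a Hankel system yields a polynomial whose roots are the spike locations, and a Vandermonde system yields the weights. The spike values below are hypothetical; the paper's contribution is the robust, separation-free analysis of the noisy version, which this sketch does not reproduce:

```python
import numpy as np

def prony(moments, k):
    # Recover k spikes from exact moments m[t] = sum_j w_j * x_j**t, t = 0..2k-1.
    m = np.asarray(moments, dtype=float)
    # Hankel system H c = -b gives coefficients of
    # p(x) = x^k + c_{k-1} x^{k-1} + ... + c_0, whose roots are the spikes:
    # for each i, sum_j c_j m[i+j] + m[i+k] = sum_l w_l x_l^i p(x_l) = 0.
    H = np.array([[m[i + j] for j in range(k)] for i in range(k)])
    b = m[k:2 * k]
    c = np.linalg.solve(H, -b)                       # c = (c_0, ..., c_{k-1})
    roots = np.roots(np.concatenate(([1.0], c[::-1])))
    # Weights from the Vandermonde system V w = (m_0, ..., m_{k-1}),
    # where V[t, j] = x_j**t.
    V = np.vander(roots, k, increasing=True).T
    w = np.linalg.solve(V, m[:k])
    return np.real(roots), np.real(w)

# Exact-moment sanity check on hypothetical spikes.
x_true = np.array([0.2, 0.7, 0.9])
w_true = np.array([0.5, 0.3, 0.2])
moments = [(w_true * x_true ** t).sum() for t in range(6)]
x_hat, w_hat = prony(moments, 3)
print("recovered spikes:", np.sort(x_hat))
```

With noisy moments, the Hankel and root-finding steps are exactly where the perturbation analysis (via the Vandermonde-Schur connection the abstract mentions) becomes delicate.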
Learning topic models -- provably and efficiently
Today, we have both the blessing and the curse of being overloaded with information. Never before has text been more important to how we communicate, or more easily available. But massive text streams far outstrip anyone's ability to read. We need automated tools that can help make sense of their thematic structure, and find strands of meaning that connect documents, all without human supervision. Such methods can also help us organize and navigate large text corpora. Popular tools for this task range from Latent Semantic Analysis (LSA), which uses standard linear algebra, to deep learning, which relies on non-convex optimization. This paper concerns topic modeling, which posits a simple probabilistic model of how a document is generated. We give a formal description of the generative model at the end of the section, but next we will outline its important features.