Finite automata for caching in matrix product algorithms
A diagram is introduced for visualizing matrix product states which makes
transparent a connection between matrix product factorizations of states and
operators, and complex weighted finite state automata. It is then shown how one
can proceed in the opposite direction: writing an automaton that "generates"
an operator gives one an immediate matrix product factorization of it. Matrix
product factorizations have the advantage of reducing the cost of computing
expectation values by facilitating caching of intermediate calculations. Thus
our connection to complex weighted finite state automata yields insight into
what allows for efficient caching in matrix product algorithms. Finally, these
techniques are generalized to the case of multiple dimensions.
Comment: 18 pages, 19 figures, LaTeX; numerous improvements have been made to
the manuscript in response to referee feedback.
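As an illustration of the automaton-to-factorization idea in the abstract above, here is a minimal sketch. The example operator (a sum of single-site Z operators on a chain), the two-state automaton, and all function names are assumptions chosen for illustration; they are not taken from the paper. Operators are handled symbolically as strings, so the matrix product simply enumerates the accepted paths of the automaton.

```python
# Hypothetical example operator: H = sum_i Z_i on a chain of n sites.
# Automaton states: 0 = "Z not yet placed", 1 = "Z already placed".
# W[a][b] lists the single-site operator symbols on the transition a -> b.
W = [
    [["I"], ["Z"]],  # from state 0: emit I and stay, or emit Z and move to 1
    [[], ["I"]],     # from state 1: only I may follow
]


def mat_mul(A, B):
    """Multiply matrices whose entries are formal sums (lists) of operator strings."""
    n = len(A)
    return [
        [
            [x + y for k in range(n) for x in A[i][k] for y in B[k][j]]
            for j in range(n)
        ]
        for i in range(n)
    ]


def generate_terms(n_sites, start=0, accept=1):
    """Terms of the operator, read off as the (start, accept) entry of W**n_sites."""
    M = W
    for _ in range(n_sites - 1):
        M = mat_mul(M, W)  # each partial product is reused for the next site
    return sorted(M[start][accept])


print(generate_terms(3))
```

Replacing the symbols by actual single-site matrices would turn W into a matrix product operator tensor with bond dimension 2 for this example; the partial products are the intermediate quantities that a caching scheme would store.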
Utilisation de l'analyse formelle de concepts pour extraire le plus grand modèle commun
The development of information systems follows a long and complex process in which various actors are involved. We report an experiment in which we observe the evolution of the analysis model of an information system through 15 successive versions. We use indicators on the underlying concept lattices built by applying Relational Concept Analysis (RCA) to each version. RCA is an extension of Formal Concept Analysis (FCA) that groups entities based on the characteristics they share, including links to other entities. Here, it helps analyze their evolution. From this experiment, we establish recommendations to monitor and verify the proper evolution of the analysis process.
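To make the grouping idea concrete, the sketch below enumerates the formal concepts of a tiny context using plain FCA only; it does not implement the relational extension (RCA) used in the paper. The context (three model classes and their attributes) and the function names are hypothetical, and the brute-force enumeration over object subsets is only suitable for very small inputs.

```python
from itertools import combinations

# Hypothetical context: model classes (objects) and the attributes they declare.
context = {
    "Station": {"name", "location"},
    "Sensor": {"name", "unit"},
    "Measure": {"unit", "value"},
}


def shared_attributes(context, objects):
    """Attributes common to every object in `objects` (all attributes if empty)."""
    attrs = set().union(*context.values())
    for obj in objects:
        attrs &= context[obj]
    return attrs


def objects_having(context, attrs):
    """Objects that carry every attribute in `attrs`."""
    return {obj for obj, own in context.items() if attrs <= own}


def formal_concepts(context):
    """Brute-force enumeration of (extent, intent) pairs closed under both maps."""
    concepts = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = shared_attributes(context, subset)
            extent = objects_having(context, intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each printed pair is one node of the concept lattice: a maximal group of classes together with exactly the attributes they all share.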
CUR Decompositions, Similarity Matrices, and Subspace Clustering
A general framework for solving the subspace clustering problem using the CUR
decomposition is presented. The CUR decomposition provides a natural way to
construct similarity matrices for data that come from a union of unknown
subspaces. The similarity
matrices thus constructed give the exact clustering in the noise-free case.
Additionally, this decomposition gives rise to many distinct similarity
matrices from a given set of data, which allow enough flexibility to perform
accurate clustering of noisy data. We also show that two known methods for
subspace clustering can be derived from the CUR decomposition. An algorithm
based on the theoretical construction of similarity matrices is presented, and
experiments on synthetic and real data are presented to test the method.
Additionally, an adaptation of our CUR based similarity matrices is utilized
to provide a heuristic algorithm for subspace clustering; this algorithm yields
the best overall performance to date for clustering the Hopkins155 motion
segmentation dataset.
Comment: Approximately 30 pages. Current version contains improved algorithm
and numerical experiments from the previous version.
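For readers unfamiliar with the CUR decomposition itself, the sketch below checks the exact-reconstruction property A = C U^{-1} R in the special case where the intersection block U is square and invertible, so that rank(U) = rank(A). The matrices, the index choices, and the function names are assumptions for illustration; the construction of similarity matrices described in the abstract is not reproduced here.

```python
from fractions import Fraction


def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse_2x2(M):
    """Exact inverse of a 2x2 matrix; raises if the matrix is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("U must be invertible, i.e. rank(U) == rank(A) == 2")
    return [[d / det, -b / det], [-c / det, a / det]]


def cur_reconstruct(A, rows, cols):
    """Rebuild A from selected columns C, selected rows R, and their overlap U."""
    C = [[A[i][j] for j in cols] for i in range(len(A))]
    R = [A[i] for i in rows]
    U = [[A[i][j] for j in cols] for i in rows]
    return matmul(matmul(C, inverse_2x2(U)), R)


# Hypothetical rank-2 matrix A = X Y, stored with exact rational entries.
X = [[1, 0], [0, 1], [1, 1], [2, -1]]
Y = [[1, 2, 0, 1], [0, 1, 1, 3]]
A = [[Fraction(v) for v in row] for row in matmul(X, Y)]

print(cur_reconstruct(A, rows=[0, 1], cols=[0, 1]) == A)
```

Different valid choices of rows and columns give different factors C, U, and R for the same A, which is consistent with the abstract's remark that many distinct similarity matrices can arise from one data set.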
Curriculum Guidelines for Undergraduate Programs in Data Science
The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program
met for the purpose of composing guidelines for undergraduate programs in Data
Science. The group consisted of 25 undergraduate faculty from a variety of
institutions in the U.S., primarily from the disciplines of mathematics,
statistics and computer science. These guidelines are meant to provide some
structure for institutions planning for or revising a major in Data Science.
Image-based Recommendations on Styles and Substitutes
Humans inevitably develop a sense of the relationships between objects, some
of which are based on their appearance. Some pairs of objects might be seen as
being alternatives to each other (such as two pairs of jeans), while others may
be seen as being complementary (such as a pair of jeans and a matching shirt).
This information guides many of the choices that people make, from buying
clothes to their interactions with each other. We seek here to model this human
sense of the relationships between objects based on their appearance. Our
approach is not based on fine-grained modeling of user annotations but rather
on capturing the largest dataset possible and developing a scalable method for
uncovering human notions of the visual relationships within. We cast this as a
network inference problem defined on graphs of related images, and provide a
large-scale dataset for the training and evaluation of the same. The system we
develop is capable of recommending which clothes and accessories will go well
together (and which will not), amongst a host of other applications.
Comment: 11 pages, 10 figures, SIGIR 201
Sparse and Nonnegative Factorizations For Music Understanding
In this dissertation, we propose methods for sparse and nonnegative factorization that are specifically suited for analyzing musical signals. First, we discuss two constraints that aid factorization of musical signals: harmonic and co-occurrence constraints. We propose a novel dictionary learning method that imposes harmonic constraints upon the atoms of the learned dictionary while allowing the dictionary size to grow appropriately during the learning procedure. When there is significant spectral-temporal overlap among the musical sources, our method outperforms popular existing matrix factorization methods as measured by the recall and precision of learned dictionary atoms. We also propose co-occurrence constraints -- three simple and convenient multiplicative update rules for nonnegative matrix factorization (NMF) that enforce dependence among atoms. Using examples in music transcription, we demonstrate the ability of these updates to represent each musical note with multiple atoms and cluster the atoms for source separation purposes.
Second, we study how spectral and temporal information extracted by nonnegative factorizations can improve upon musical instrument recognition. Musical instrument recognition in melodic signals is difficult, especially for classification systems that rely entirely upon spectral information instead of temporal information. Here, we propose a simple and effective method of combining spectral and temporal information for instrument recognition. While existing classification methods use traditional features such as statistical moments, we extract novel features from spectral and temporal atoms generated by NMF using a biologically motivated multiresolution gamma filterbank. Unlike other methods that require thresholds, safeguards, and hierarchies, the proposed spectral-temporal method requires only simple filtering and a flat classifier.
Finally, we study how to perform sparse factorization when a large dictionary of musical atoms is already known. Sparse coding methods such as matching pursuit (MP) have been applied to problems in music information retrieval such as transcription and source separation with moderate success. However, when the set of dictionary atoms is large, identification of the best match in the dictionary with the residual is slow -- linear in the size of the dictionary. Here, we propose a variant called approximate matching pursuit (AMP) that is faster than MP while maintaining scalability and accuracy. Unlike MP, AMP uses an approximate nearest-neighbor (ANN) algorithm to find the closest match in a dictionary in sublinear time. One such ANN algorithm, locality-sensitive hashing (LSH), is a probabilistic hash algorithm that places similar, yet not identical, observations into the same bin. While the accuracy of AMP is comparable to similar MP methods, the computational complexity is reduced. Also, by using LSH, this method scales easily; the dictionary can be expanded without reorganizing any data structures.
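The sketch below shows standard matching pursuit with an exhaustive search, to make visible the linear scan over the dictionary that the abstract identifies as the bottleneck. It is not the dissertation's AMP: the approximate nearest-neighbor lookup is not implemented, and the toy dictionary, the signal, and the function names are assumptions for illustration. Atoms are assumed to have unit norm.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, dictionary, n_iter):
    """Greedy MP: repeatedly subtract the atom most correlated with the residual."""
    residual = list(signal)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_iter):
        # Exhaustive search: one inner product per atom, linear in dictionary size.
        # AMP, as described above, would replace this scan with an ANN query.
        scores = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(scores[k]))
        coeffs[best] += scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, dictionary[best])]
    return coeffs, residual


# Hypothetical dictionary of unit-norm atoms in R^3.
s = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]

coeffs, residual = matching_pursuit([3.0, 0.0, 1.0], atoms, n_iter=2)
print(coeffs, residual)
```

The cost of each iteration is dominated by the `scores` line, which is why a sublinear lookup matters once the dictionary grows large.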
Tensor factorizations of local second-order Møller–Plesset theory
Efficient electronic structure methods can be built around efficient tensor representations of the wavefunction. Here we first describe a general view of tensor factorization for the compact representation of electronic wavefunctions. Next, we use this language to construct a low-complexity representation of the doubles amplitudes in local second-order Møller–Plesset perturbation theory. We introduce two approximations: the direct orbital-specific virtual approximation and the full orbital-specific virtual approximation. In these approximations, each occupied orbital is associated with a small set of correlating virtual orbitals. Conceptually, the representation lies between the projected atomic orbital representation in Pulay–Saebø local correlation theories and pair natural orbital correlation theories. We have tested the orbital-specific virtual approximations on a variety of systems and properties including total energies, reaction energies, and potential energy curves. Compared to the Pulay–Saebø ansatz, we find that these approximations exhibit favorable accuracy and computational times while yielding smooth potential energy curves.