(Quasi)Periodicity Quantification in Video Data, Using Topology
This work introduces a novel framework for quantifying the presence and
strength of recurrent dynamics in video data. Specifically, we provide
continuous measures of periodicity (perfect repetition) and quasiperiodicity
(superposition of periodic modes with non-commensurate periods), in a way that
does not require segmentation, training, object tracking, or 1-dimensional
surrogate signals; our methodology operates directly on the video data. The
approach combines ideas from nonlinear time series analysis (delay embeddings)
and computational topology (persistent homology), by translating the problem of
finding recurrent dynamics in video data, into the problem of determining the
circularity or toroidality of an associated geometric space. Through extensive
testing, we show that our scores are robust to several noise models and noise
levels, that our periodicity score outperforms other methods when compared
against human-generated periodicity rankings, and that our quasiperiodicity
score clearly indicates the presence of biphonation in videos of vibrating
vocal folds, a quantitative end-to-end capability not demonstrated before.
Comment: 27 pages, 1 column, 23 figures, SIAM Journal on Imaging Sciences, 201
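For intuition, the pipeline can be sketched in a few lines: build a sliding-window (delay) embedding of the raw frames and measure how circular the resulting point cloud is via persistent homology. This is a hedged sketch, not the paper's implementation; the `ripser` package, the window parameters, and the unnormalized max-persistence score are all assumptions:

```python
import numpy as np
from ripser import ripser  # persistent homology library; assumed available

def periodicity_score(frames, dim=20, tau=1):
    """Hedged sketch: sliding-window (delay) embedding of raw video frames,
    scored by the maximum 1-dimensional persistence. The paper's exact
    normalization and its quasiperiodicity (torus/H2) score are omitted."""
    X = np.stack([np.asarray(f, dtype=float).ravel() for f in frames])
    X -= X.mean(axis=0)  # remove the static background component
    n = len(X) - (dim - 1) * tau
    emb = np.stack([X[i:i + dim * tau:tau].ravel() for i in range(n)])
    dgm_h1 = ripser(emb, maxdim=1)['dgms'][1]  # H1 persistence diagram
    if len(dgm_h1) == 0:
        return 0.0
    return float(np.max(dgm_h1[:, 1] - dgm_h1[:, 0]))  # large = more periodic
```

A perfectly periodic video traces out a topological circle in the embedding space, which appears as a single long-lived H1 class; quasiperiodicity would instead require detecting a torus, i.e., additional H1 and H2 classes.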
Row Sampling by Lewis Weights
We give a simple algorithm to efficiently sample the rows of a matrix while
preserving the p-norms of its product with vectors. Given an n-by-d matrix
A, we find, with high probability and in input-sparsity time, a matrix A'
consisting of about d log d rescaled rows of A such that ||A'x||_1 is close to
||Ax||_1 for all vectors x. We also show similar results for all l_p norms that
give nearly optimal sample bounds in input-sparsity time. Our results are based
on sampling by "Lewis weights", which can be viewed as statistical leverage
scores of a reweighted matrix. We also give an elementary proof of the
guarantees of this sampling process for l_1.
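For concreteness, the l_1 Lewis weights admit a simple fixed-point computation: iterate w_i <- sqrt(a_i^T (A^T W^{-1} A)^{-1} a_i). The following is a dense-algebra sketch of that iteration plus the sampling step, not the input-sparsity-time algorithm; the iteration count and sample size are heuristic assumptions:

```python
import numpy as np

def lewis_weights_l1(A, n_iter=30):
    """Fixed-point iteration for l1 Lewis weights (dense sketch). At the
    fixed point, w_i is the leverage score of row i of W^{-1/2} A, i.e.
    w_i^2 = a_i^T (A^T W^{-1} A)^{-1} a_i."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(n_iter):
        M = A.T @ (A / w[:, None])                 # A^T W^{-1} A
        Minv = np.linalg.inv(M)
        q = np.einsum('ij,jk,ik->i', A, Minv, A)   # a_i^T Minv a_i
        w = np.sqrt(q)
    return w

# Sample rows with probability proportional to w and rescale so that
# E[||A'x||_1] = ||Ax||_1 for every x.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 10))
w = lewis_weights_l1(A)
p = w / w.sum()
m = 200                                            # heuristic, about d log d
idx = rng.choice(len(A), size=m, p=p)
A_sampled = A[idx] / (m * p[idx])[:, None]         # rescaled rows
```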
Coresets and Sketches
Geometric data summarization has become an essential tool both in geometric
approximation algorithms and in big data problems with a geometric component.
In linear or near-linear time large data sets can be compressed into a summary,
and then more intricate algorithms can be run on the summaries whose results
approximate those of the full data set. Coresets and sketches are the two most
important classes of these summaries. We survey five types of coresets and
sketches: shape-fitting, density estimation, high-dimensional vectors,
high-dimensional point sets / matrices, and clustering.
Comment: Near-final version of Chapter 49 in Handbook on Discrete and
Computational Geometry, 3rd edition
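As one concrete example from the "high-dimensional point sets / matrices" category, here is a hedged sketch of the Frequent Directions matrix sketch, a well-known member of this family; the simple shrink-by-smallest-singular-value variant and its parameters are illustrative choices:

```python
import numpy as np

def frequent_directions(A, ell):
    """Streaming sketch B (ell x d, assuming ell <= d) such that B^T B
    approximates A^T A. Simple variant: after inserting each row, shrink
    all singular values by the smallest one, freeing a zero row."""
    n, d = A.shape
    B = np.zeros((ell, d))
    for a in A:                     # one pass over the rows (the stream)
        B[-1] = a                   # the last row is kept zero between steps
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        s = np.sqrt(np.maximum(s ** 2 - s[-1] ** 2, 0.0))
        B = s[:, None] * Vt         # smallest direction zeroed -> free row
    return B
```

The small matrix B then stands in for A in downstream computations such as approximate PCA, with approximation error shrinking as ell grows.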
Learning Latent Variable Gaussian Graphical Models
Gaussian graphical models (GGM) have been widely used in many
high-dimensional applications ranging from biological and financial data to
recommender systems. Sparsity in GGMs plays a central role both statistically
and computationally. Unfortunately, real-world data are often not well modeled
by sparse graphical models. In this paper, we focus on a family of latent variable
Gaussian graphical models (LVGGM), where the model is conditionally sparse
given latent variables, but marginally non-sparse. In LVGGM, the inverse
covariance matrix has a low-rank plus sparse structure, and can be learned in a
regularized maximum likelihood framework. We derive novel parameter estimation
error bounds for LVGGM under mild conditions in the high-dimensional setting.
These results complement the existing theory on structural learning and open
up new possibilities for using LVGGM in statistical inference.
Comment: To appear in the 31st International Conference on Machine Learning
(ICML 2014)
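The low-rank-plus-sparse estimator described above can be written as a convex program. Below is a hedged cvxpy sketch of that regularized maximum likelihood problem; the penalty weights are placeholders, and practical implementations typically use specialized first-order solvers rather than a generic one:

```python
import cvxpy as cp
import numpy as np

def fit_lvggm(Sigma_hat, lam=0.1, mu=1.0):
    """Regularized MLE with a low-rank-plus-sparse precision matrix:
    Theta = S - L, with S sparse (l1 penalty) capturing conditional
    structure and L PSD low rank (trace penalty) capturing latent effects."""
    d = Sigma_hat.shape[0]
    S = cp.Variable((d, d), symmetric=True)  # sparse conditional structure
    L = cp.Variable((d, d), PSD=True)        # effect of latent variables
    Theta = S - L                            # marginal precision matrix
    obj = cp.trace(Theta @ Sigma_hat) - cp.log_det(Theta) \
        + lam * cp.sum(cp.abs(S)) + mu * cp.trace(L)
    cp.Problem(cp.Minimize(obj), [Theta >> 0]).solve()
    return S.value, L.value
```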
CLEAR: A Consistent Lifting, Embedding, and Alignment Rectification Algorithm for Multi-View Data Association
Many robotics applications require alignment and fusion of observations
obtained at multiple views to form a global model of the environment. Multi-way
data association methods provide a mechanism to improve alignment accuracy of
pairwise associations and ensure their consistency. However, existing methods
that solve this computationally challenging problem are often too slow for
real-time applications. Furthermore, some of the existing techniques can
violate the cycle consistency principle, thus drastically reducing the fusion
accuracy. This work presents the CLEAR (Consistent Lifting, Embedding, and
Alignment Rectification) algorithm to address these issues. By leveraging
insights from the multi-way matching and spectral graph clustering literature,
CLEAR provides cycle consistent and accurate solutions in a computationally
efficient manner. Numerical experiments on both synthetic and real datasets are
carried out to demonstrate the scalability and superior performance of our
algorithm in real-world problems. This algorithmic framework can significantly
improve the accuracy and efficiency of solutions to existing discrete
assignment problems, which traditionally use pairwise (but potentially
inconsistent) correspondences. An implementation of CLEAR is publicly
available online.
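To make the spectral flavor concrete, here is a toy sketch in the spirit of CLEAR rather than a faithful implementation: embed the aggregate association matrix via the normalized Laplacian, then enforce cycle consistency by giving each view a one-to-one labeling with the Hungarian method. The actual algorithm's pivot selection and universe-size estimation are omitted; the universe size m is assumed known here:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def multiview_labels(P, views, m, seed=0):
    """Toy spectral multi-way association (not the exact CLEAR algorithm).
    P: (n x n) aggregate pairwise-association matrix; views: list of index
    arrays, one per view, each with at most m observations; m: assumed
    universe size (CLEAR estimates this from the spectrum)."""
    d = P.sum(axis=1)
    dis = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(P.shape[0]) - dis[:, None] * P * dis[None, :]  # norm. Laplacian
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :m]                                   # m smallest eigenvectors
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), m, replace=False)] # crude pivot choice
    labels = np.full(P.shape[0], -1, dtype=int)
    for idx in views:
        idx = np.asarray(idx)
        r, c = linear_sum_assignment(-U[idx] @ centers.T)  # maximize similarity
        labels[idx[r]] = c       # one-to-one per view => cycle consistent
    return labels
```

Because each view receives an injective labeling into the same universe of m objects, any cycle of induced pairwise associations composes consistently by construction.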
Hinge-Loss Markov Random Fields and Probabilistic Soft Logic
A fundamental challenge in developing high-impact machine learning
technologies is balancing the need to model rich, structured domains with the
ability to scale to big data. Many important problem areas are both richly
structured and large scale, from social and biological networks, to knowledge
graphs and the Web, to images, video, and natural language. In this paper, we
introduce two new formalisms for modeling structured data, and show that they
can both capture rich structure and scale to big data. The first, hinge-loss
Markov random fields (HL-MRFs), is a new kind of probabilistic graphical model
that generalizes different approaches to convex inference. We unite three
approaches from the randomized algorithms, probabilistic graphical models, and
fuzzy logic communities, showing that all three lead to the same inference
objective. We then define HL-MRFs by generalizing this unified objective. The
second new formalism, probabilistic soft logic (PSL), is a probabilistic
programming language that makes HL-MRFs easy to define using a syntax based on
first-order logic. We introduce an algorithm for inferring most-probable
variable assignments (MAP inference) that is much more scalable than
general-purpose convex optimization methods, because it uses message passing to
take advantage of sparse dependency structures. We then show how to learn the
parameters of HL-MRFs. The learned HL-MRFs are as accurate as analogous
discrete models, but much more scalable. Together, these algorithms enable
HL-MRFs and PSL to model rich, structured data at scales not previously
possible.
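For intuition, a hinge-loss potential is a (possibly squared) hinge of a linear function of continuous truth values in [0,1], so MAP inference is convex minimization. Below is a minimal toy example of one rule under the Lukasiewicz relaxation; the rule, weights, and evidence values are invented for illustration, and PSL itself compiles many such potentials and optimizes them with ADMM-style message passing rather than a generic solver:

```python
import numpy as np
from scipy.optimize import minimize

# Toy HL-MRF: the Lukasiewicz relaxation of the rule
#   Friends(anna, bob) & Smokes(anna) -> Smokes(bob)
# yields the hinge-loss potential max(0, friends + smokes_anna - 1 - y),
# where y in [0, 1] is the soft truth value of Smokes(bob).
friends, smokes_anna = 0.9, 1.0    # observed soft evidence (made up)
w_rule, w_prior = 2.0, 0.5         # illustrative rule weights

def energy(y):
    rule = max(0.0, friends + smokes_anna - 1.0 - y[0]) ** 2  # squared hinge
    return w_rule * rule + w_prior * y[0]   # weak prior against smoking

res = minimize(energy, x0=[0.5], bounds=[(0.0, 1.0)])
print(res.x)                       # MAP soft truth value of Smokes(bob)
```

Since the objective is convex in y, the minimizer balances the rule's pull toward y = 0.9 against the prior, landing strictly between the two.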
Perturbations of Christoffel-Darboux kernels. I: detection of outliers
Two central objects in constructive approximation, the Christoffel-Darboux
kernel and the Christoffel function, encode ample information about the
associated moment data and, ultimately, about the possible generating measures.
We develop a multivariate theory of the Christoffel-Darboux kernel in C^d, with
emphasis on the perturbation of Christoffel functions and their level sets with
respect to perturbations of small norm or low rank. The statistical notion of
leverage score provides a quantitative criterion for the detection of outliers
in large data. Using the refined theory of Bergman orthogonal polynomials, we
illustrate the main results, including some numerical simulations, in the case
of finite atomic perturbations of the area measure of a 2D region. Methods
from function theory of a complex variable and (pluri)potential theory are
widely used in the derivation of our perturbation formulas.
Comment: second version, 53 pages
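The outlier-detection criterion can be illustrated numerically: the diagonal of the Christoffel-Darboux kernel, v(x)^T M^{-1} v(x) for the empirical moment matrix M, equals n times the statistical leverage score of the point, and points where the Christoffel function is small are flagged as outliers. A hedged 2-D sketch, where the monomial feature map and degree are arbitrary choices:

```python
import numpy as np

def christoffel_scores(X, degree=3):
    """Empirical Christoffel function on 2-D data; small values flag outliers."""
    def monomials(P):  # all monomials x^i y^j with i + j <= degree
        cols = [P[:, 0] ** i * P[:, 1] ** j
                for i in range(degree + 1) for j in range(degree + 1 - i)]
        return np.stack(cols, axis=1)
    V = monomials(X)
    M = V.T @ V / len(X)                        # empirical moment matrix
    Minv = np.linalg.pinv(M)
    K = np.einsum('ij,jk,ik->i', V, Minv, V)    # CD kernel diagonal K_n(x, x)
    return 1.0 / K    # Christoffel function; K_i = n * (leverage score of x_i)
```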
Conjecturing-Based Computational Discovery of Patterns in Data
Modern machine learning methods are designed to exploit complex patterns in
data regardless of their form, while not necessarily revealing them to the
investigator. Here we demonstrate situations where modern machine learning
methods are ill-equipped to reveal feature interaction effects and other
nonlinear relationships. We propose the use of a conjecturing machine that
generates feature relationships, in the form of bounds for numerical features
and Boolean expressions for nominal features, of a kind that machine
learning algorithms overlook. The proposed framework is demonstrated for a
classification problem with an interaction effect and a nonlinear regression
problem. In both settings, true underlying relationships are revealed and
generalization performance improves. The framework is then applied to
patient-level data regarding COVID-19 outcomes to suggest possible risk
factors.
Comment: 25 pages, 6 figures
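The idea of a conjecturing machine can be illustrated with a toy generate-and-filter loop: propose simple symbolic bounds between features from a small grammar and keep those that hold on every observation. This is only a cartoon of the Fajtlowicz-style conjecturing the paper builds on; the grammar and acceptance rule are invented for illustration:

```python
import itertools
import numpy as np

def conjecture_bounds(X, names):
    """Cartoon conjecturing machine: propose bounds feature_i <= f(feature_j)
    from a tiny grammar and keep those that hold on every observation."""
    grammar = {'{v} + 1': lambda v: v + 1,
               '2*{v}':   lambda v: 2 * v,
               '{v}**2':  lambda v: v ** 2}
    kept = []
    for i, j in itertools.permutations(range(X.shape[1]), 2):
        for tmpl, f in grammar.items():
            if np.all(X[:, i] <= f(X[:, j])):   # bound never violated
                kept.append(f"{names[i]} <= {tmpl.format(v=names[j])}")
    return kept

# e.g. conjecture_bounds(np.array([[1., 2.], [3., 4.]]), ['a', 'b'])
```

Surviving conjectures can then be handed to a downstream learner as candidate feature interactions, which is the role they play in the paper's experiments.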
How to Train Your Energy-Based Models
Energy-Based Models (EBMs), also known as non-normalized probabilistic
models, specify probability density or mass functions up to an unknown
normalizing constant. Unlike most other probabilistic models, EBMs place no
restriction on the tractability of the normalizing constant; they are thus more
flexible to parameterize and can model a more expressive family of probability
distributions. However, the unknown normalizing constant of EBMs makes training
particularly difficult. Our goal is to provide a friendly introduction to
modern approaches for EBM training. We start by explaining maximum likelihood
training with Markov chain Monte Carlo (MCMC), and proceed to elaborate on
MCMC-free approaches, including Score Matching (SM) and Noise Contrastive
Estimation (NCE). We highlight theoretical connections among these three
approaches, and end with a brief survey on alternative training methods, which
are still under active research. Our tutorial is targeted at an audience with
a basic understanding of generative models who wants to apply EBMs or start a
research project in this direction.
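As a taste of the MCMC-free approaches mentioned above, here is a hedged PyTorch sketch of denoising score matching: the EBM's score (the negative gradient of its energy) is regressed onto the score of a Gaussian noising kernel. The architecture, noise scale, and single training step are illustrative only:

```python
import torch
import torch.nn as nn

# Minimal denoising score matching, one MCMC-free way to train an EBM.
energy = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)
sigma = 0.1                               # noise scale (made up)

def dsm_loss(x):
    noise = torch.randn_like(x) * sigma
    x_noisy = (x + noise).requires_grad_(True)
    e = energy(x_noisy).sum()
    score = -torch.autograd.grad(e, x_noisy, create_graph=True)[0]
    target = -noise / sigma ** 2          # grad log N(x_noisy | x, sigma^2 I)
    return ((score - target) ** 2).sum(dim=-1).mean()

x = torch.randn(256, 2)                   # stand-in for real training data
loss = dsm_loss(x)
opt.zero_grad(); loss.backward(); opt.step()
```

Matching the model score to the noising kernel's score avoids the normalizing constant entirely, which is exactly why SM-style objectives sidestep the training difficulty highlighted in the abstract.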
…