
    Computing Bi-Lipschitz Outlier Embeddings into the Line

    The problem of computing a bi-Lipschitz embedding of a graphical metric into the line with minimum distortion has received a lot of attention. The best-known approximation algorithm computes an embedding with distortion $O(c^2)$, where $c$ denotes the optimal distortion [Bădoiu et al. 2005]. We present a bi-criteria approximation algorithm that extends the above results to the setting of \emph{outliers}. Specifically, we say that a metric space $(X,\rho)$ admits a $(k,c)$-embedding if there exists $K\subset X$, with $|K|=k$, such that $(X\setminus K, \rho)$ admits an embedding into the line with distortion at most $c$. Given $k\geq 0$ and a metric space that admits a $(k,c)$-embedding for some $c\geq 1$, our algorithm computes a $(\mathrm{poly}(k, c, \log n), \mathrm{poly}(c))$-embedding in polynomial time. This is the first algorithmic result for outlier bi-Lipschitz embeddings. Prior to our work, comparable outlier embeddings were known only for the case of additive distortion.
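The notions above (distortion of a line embedding, and a $(k,c)$-embedding that ignores an outlier set $K$) can be checked directly on small inputs. The following is a minimal sketch, not the paper's algorithm; the function names and the dictionary representation of the embedding are our own.

```python
import itertools

def line_distortion(points, dist, f):
    """Distortion of an (injective) map f: X -> R against the metric dist:
    distortion = (max expansion) * (max contraction), where
    expansion  = |f(x) - f(y)| / dist(x, y)  and
    contraction = dist(x, y) / |f(x) - f(y)|."""
    expansion = contraction = 0.0
    for x, y in itertools.combinations(points, 2):
        d, e = dist(x, y), abs(f[x] - f[y])
        expansion = max(expansion, e / d)
        contraction = max(contraction, d / e)
    return expansion * contraction

def admits_outlier_embedding(points, dist, f, outliers, c):
    """Does f restricted to X \\ outliers have distortion at most c?
    (A brute-force check of the (k,c)-embedding definition.)"""
    rest = [p for p in points if p not in outliers]
    return line_distortion(rest, dist, f) <= c
```

For example, on the path metric over `{0,1,2,3}`, the identity map has distortion 1, while a map that stretches one edge by a factor 4 has distortion 4 but becomes isometric once the stretched endpoint is treated as an outlier.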

    Composition of nested embeddings with an application to outlier removal

    We study the design of embeddings into Euclidean space with outliers. Given a metric space $(X,d)$ and an integer $k$, the goal is to embed all but $k$ points in $X$ (called the "outliers") into $\ell_2$ with the smallest possible distortion $c$. Finding the optimal distortion $c$ for a given outlier set size $k$, or alternately the smallest $k$ for a given target distortion $c$, are both NP-hard problems. In fact, it is UGC-hard to approximate $k$ to within a factor smaller than $2$ even when the metric sans outliers is isometrically embeddable into $\ell_2$. We consider bi-criteria approximations. Our main result is a polynomial-time algorithm that approximates the outlier set size to within an $O(\log^4 k)$ factor and the distortion to within a constant factor. The main technical component in our result is an approach for constructing a composition of two given embeddings from subsets of $X$ into $\ell_2$ which inherits the distortions of each to within small multiplicative factors. Specifically, given a low $c_S$ distortion embedding from $S\subset X$ into $\ell_2$ and a high(er) $c_X$ distortion embedding from the entire set $X$ into $\ell_2$, we construct a single embedding that achieves the same distortion $c_S$ over pairs of points in $S$ and an expansion of at most $O(\log k)\cdot c_X$ over the remaining pairs of points, where $k=|X\setminus S|$. Our composition theorem extends to embeddings into arbitrary $\ell_p$ metrics for $p\ge 1$, and may be of independent interest. While unions of embeddings over disjoint sets have been studied previously, to our knowledge, this is the first work to consider compositions of nested embeddings. Comment: 25 pages (including 2 appendices), 5 figures
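As a warm-up to the composition idea (not the paper's nested construction, which is more delicate), the simplest way to combine two $\ell_2$ embeddings is the direct sum $x \mapsto (g(x), h(x))$: squared pairwise distances add, so the expansion of the combined map is at most $\sqrt{\smash[b]{\text{exp}_g^2 + \text{exp}_h^2}}$. A toy sketch with our own helper names:

```python
import math
from itertools import combinations

def concat_embeddings(g, h):
    """Direct-sum of two l2 embeddings given as dicts point -> tuple.
    Combined squared distances are the sums of the two squared distances.
    (Illustrative only; the paper composes *nested* embeddings instead.)"""
    return {x: g[x] + h[x] for x in g}  # tuple concatenation

def l2_distortion(points, dist, f):
    """(max expansion) * (max contraction) of f against the metric dist."""
    expansion = contraction = 0.0
    for x, y in combinations(points, 2):
        d, e = dist(x, y), math.dist(f[x], f[y])
        expansion = max(expansion, e / d)
        contraction = max(contraction, d / e)
    return expansion * contraction
```

Concatenating two uniform scalings of the line (factors 1 and 2) yields a uniform scaling by $\sqrt{5}$, hence distortion exactly 1.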

    Sparse Control of Alignment Models in High Dimension

    For high-dimensional particle systems governed by smooth nonlinearities depending on mutual distances between particles, one can construct low-dimensional representations of the dynamical system that allow the learning of nearly optimal control strategies in high dimension with overwhelming confidence. In this paper we present an instance of this general statement tailored to the sparse control of models of consensus emergence in high dimension, projected to lower dimensions by means of random linear maps. We show that one can steer, nearly optimally and with high probability, a high-dimensional alignment model to consensus by acting at each switching time on one agent of the system only, with a control rule chosen essentially exclusively according to information gathered from a randomly drawn low-dimensional representation of the control system. Comment: 39 pages
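The sparsity pattern described above (one controlled agent per switching time, selected from a random low-dimensional sketch of the state) can be illustrated with a deliberately stripped-down toy: the free dynamics are frozen, the agent with the largest deviation in a random 1-D projection is pulled toward the mean, and consensus still emerges. All names and the control rule here are our own simplifications, not the paper's optimal control law.

```python
import math
import random

def spread(x):
    """Maximum pairwise distance between agents."""
    return max(math.dist(p, q) for p in x for q in x)

def sparse_steer_to_consensus(x, steps=500, pull=0.5, seed=0):
    """Toy sparse control: at each switching time act on ONE agent only.
    The agent is chosen by its deviation in a randomly drawn 1-D projection
    (standing in for the paper's random low-dimensional representation)
    and is pulled a fraction `pull` toward the current mean."""
    rng = random.Random(seed)
    x = [list(p) for p in x]
    n, d = len(x), len(x[0])
    for _ in range(steps):
        mean = [sum(p[j] for p in x) / n for j in range(d)]
        r = [rng.gauss(0.0, 1.0) for _ in range(d)]   # random projection
        proj = lambda v: sum(vj * rj for vj, rj in zip(v, r))
        pm = proj(mean)
        i = max(range(n), key=lambda k: abs(proj(x[k]) - pm))
        x[i] = [pi + pull * (mj - pi) for pi, mj in zip(x[i], mean)]
    return x
```

Each move keeps the chosen agent inside the convex hull of the ensemble, so the diameter is non-increasing and shrinks whenever an extreme agent is selected, which the random projection does with positive probability.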

    Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

    We present a technical survey of the state-of-the-art approaches to data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching, and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.
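Of the techniques the survey covers, uniform random sampling is the simplest to sketch: sample $m$ points, weight each by $n/m$, and evaluate clustering costs on the weighted sample. This is an unbiased estimator but, unlike sensitivity-based coresets, carries no worst-case guarantee; the helper names below are our own.

```python
import random

def uniform_coreset(points, m, seed=0):
    """The simplest data-reduction scheme: m uniformly sampled points,
    each carrying weight n/m so that weighted costs estimate full costs.
    (Real coresets use importance/sensitivity sampling for guarantees.)"""
    rng = random.Random(seed)
    n = len(points)
    return [(p, n / m) for p in rng.sample(points, m)]

def weighted_cost(coreset, center):
    """Weighted 1-median cost of a candidate center on the line."""
    return sum(w * abs(p - center) for p, w in coreset)
```

On 1000 evenly spaced points, a 200-point sample typically estimates the 1-median cost at the true median to within a few percent.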

    Exact Computation of a Manifold Metric, via Lipschitz Embeddings and Shortest Paths on a Graph

    Data-sensitive metrics adapt distances locally based on the density of data points, with the goal of aligning distances with some notion of similarity. In this paper, we give the first exact algorithm for computing a data-sensitive metric called the nearest neighbor metric. In fact, we prove the surprising result that a previously published 3-approximation is an exact algorithm. The nearest neighbor metric can be viewed as a special case of a density-based distance used in machine learning, or it can be seen as an example of a manifold metric. Previous computational research on such metrics despaired of computing exact distances on account of the apparent difficulty of minimizing over all continuous paths between a pair of points. We leverage the exact computation of the nearest neighbor metric to compute sparse spanners and persistent homology. We also explore the behavior of the metric built from point sets drawn from an underlying distribution and consider the more general case of inputs that are finite collections of path-connected compact sets. The main results connect several classical theories such as the conformal change of Riemannian metrics, the theory of positive definite functions of Schoenberg, and the screw function theory of Schoenberg and von Neumann. We develop novel proof techniques based on the combination of screw functions and Lipschitz extensions that may be of independent interest. Comment: 15 pages
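The "shortest paths on a graph" side of the title can be sketched generically: build a weighted graph over the samples in which edge lengths are scaled by local density information, then run Dijkstra. The edge weight below (Euclidean length times the mean nearest-neighbor radius of the endpoints) is an illustrative density-sensitive choice of ours, not the paper's exact nearest neighbor metric.

```python
import heapq
import math

def dijkstra(adj, src):
    """Single-source shortest paths over an adjacency dict {u: [(v, w), ...]}."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def density_sensitive_graph(pts):
    """Complete graph whose edge weight scales Euclidean length by the mean
    nearest-neighbor distance of the two endpoints (illustrative weight)."""
    r = [min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
         for i, p in enumerate(pts)]
    adj = {i: [] for i in range(len(pts))}
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            w = math.dist(pts[i], pts[j]) * (r[i] + r[j]) / 2
            adj[i].append((j, w))
            adj[j].append((i, w))
    return adj
```

On three unit-spaced collinear points, every nearest-neighbor radius is 1, so graph distances coincide with Euclidean ones and the path through the middle point ties the direct edge.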

    One-class classifiers based on entropic spanning graphs

    One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach can also process non-numeric data by means of an embedding procedure. The spanning graph is learned on the embedded input data, and the resulting partition of its vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the $\alpha$-Jensen difference. Once training is completed, in order to associate a confidence level with the classifier decision, a graph-based fuzzy model is constructed. The fuzzification process is based only on topological information about the vertices of the entropic spanning graph. As such, the proposed one-class classifier is also suitable for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations for the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches. Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canada
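A minimum spanning tree is the prototypical entropic spanning graph, and a crude outlier rule on it (flag vertices attached only by unusually long edges) conveys the flavor of partitioning the graph's vertices, though it is a stand-in of ours for the paper's mutual-information criterion, not the proposed method.

```python
import math

def mst_edges(pts):
    """Prim's algorithm on the complete Euclidean graph; returns (u, v, length)
    triples where v is the vertex newly attached to the tree."""
    n = len(pts)
    best = {i: (math.dist(pts[0], pts[i]), 0) for i in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda i: best[i][0])
        d, p = best.pop(j)
        edges.append((p, j, d))
        for i in best:
            nd = math.dist(pts[j], pts[i])
            if nd < best[i][0]:
                best[i] = (nd, j)
    return edges

def one_class_outliers(pts, factor=3.0):
    """Flag vertices joined to the MST by an edge much longer than the
    median edge (a crude stand-in for the paper's entropic partition)."""
    edges = mst_edges(pts)
    med = sorted(d for _, _, d in edges)[len(edges) // 2]
    return {v for _, v, d in edges if d > factor * med}
```

On a tight unit square of points plus one sample far away, the long attaching edge isolates the distant sample.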

    Graph Priors, Optimal Transport, and Deep Learning in Biomedical Discovery

    Recent advances in biomedical data collection allow the collection of massive datasets measuring thousands of features in thousands to millions of individual cells. This data has the potential to advance our understanding of biological mechanisms at a previously impossible resolution. However, there are few methods to understand data of this scale and type. While neural networks have made tremendous progress on supervised learning problems, much work remains in making them useful for discovery in data whose supervision is more difficult to represent. The flexibility and expressiveness of neural networks is sometimes a hindrance in these less supervised domains, as is the case when extracting knowledge from biomedical data. One type of prior knowledge that is more common in biological data comes in the form of geometric constraints. In this thesis, we aim to leverage this geometric knowledge to create scalable and interpretable models to understand this data. Encoding geometric priors into neural network and graph models allows us to characterize the models' solutions as they relate to the fields of graph signal processing and optimal transport. These links allow us to understand and interpret this data type. We divide this work into three sections. The first borrows concepts from graph signal processing to construct more interpretable and performant neural networks by constraining and structuring the architecture. The second borrows from the theory of optimal transport to perform anomaly detection and trajectory inference efficiently and with theoretical guarantees. The third examines how to compare distributions over an underlying manifold, which can be used to understand how different perturbations or conditions relate. For this we design an efficient approximation of optimal transport based on diffusion over a joint cell graph. Together, these works utilize our prior understanding of the data geometry to create more useful models of the data. We apply these methods to molecular graphs, images, single-cell sequencing, and health record data.
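The "compare distributions" task in the third part has a closed form in one dimension that makes the optimal-transport objective concrete: for equal-size empirical samples, the earth mover's distance is the average gap between sorted values. This is a generic OT illustration, not the thesis's graph-diffusion approximation.

```python
def wasserstein_1d(xs, ys):
    """Exact 1-D earth mover's (Wasserstein-1) distance between two
    equal-size empirical samples: the optimal coupling matches the
    sorted values, so the distance is the mean coordinate gap."""
    assert len(xs) == len(ys)
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)
```

Shifting a sample by a constant shifts the distance by exactly that constant, which is the translation-sensitivity that makes OT useful for comparing perturbed conditions.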

    Phase Retrieval From Binary Measurements

    We consider the problem of signal reconstruction from quadratic measurements that are encoded as +1 or -1 depending on whether they exceed a predetermined positive threshold or not. Binary measurements are fast to acquire and inexpensive in terms of hardware. We formulate the problem of signal reconstruction using a consistency criterion, wherein one seeks to find a signal that is in agreement with the measurements. To enforce consistency, we construct a convex cost using a one-sided quadratic penalty and minimize it using an iterative accelerated projected gradient-descent (APGD) technique. The PGD scheme reduces the cost function in each iteration; incorporating momentum into PGD forfeits this descent property, yet empirically exhibits faster convergence than PGD. We refer to the resulting algorithm as binary phase retrieval (BPR). Considering additive white noise contamination prior to quantization, we also derive the Cramér-Rao Bound (CRB) for the binary encoding model. Experimental results demonstrate that the BPR algorithm yields a signal-to-reconstruction error ratio (SRER) of approximately 25 dB in the absence of noise. In the presence of noise prior to quantization, the SRER is within 2 to 3 dB of the CRB.
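The consistency criterion with a one-sided quadratic penalty can be sketched as plain gradient descent on $f(x) = \sum_m \max\bigl(0, -y_m(\langle a_m, x\rangle^2 - \tau)\bigr)^2$, which is zero exactly when every binary measurement agrees with the sign of $\langle a_m, x\rangle^2 - \tau$. This is a simplified sketch under our own naming: the paper uses accelerated *projected* gradient descent, and real measurements would be complex-valued.

```python
import random

def cost(A, y, tau, x):
    """One-sided quadratic consistency cost: penalize only measurements
    whose binary label y_m disagrees with sign(<a_m, x>^2 - tau)."""
    c = 0.0
    for a, ym in zip(A, y):
        q = sum(ai * xi for ai, xi in zip(a, x))
        c += max(0.0, -ym * (q * q - tau)) ** 2
    return c

def bpr_descent(A, y, tau, steps=300, lr=0.05, x0=None):
    """Plain gradient descent on the cost above (a sketch of the BPR idea,
    without the acceleration/projection of the paper's APGD scheme)."""
    n = len(A[0])
    x = list(x0) if x0 is not None else [random.gauss(0, 1) for _ in range(n)]
    for _ in range(steps):
        g = [0.0] * n
        for a, ym in zip(A, y):
            q = sum(ai * xi for ai, xi in zip(a, x))
            v = max(0.0, -ym * (q * q - tau))   # violation magnitude
            if v > 0:
                coef = 2.0 * v * (-ym) * 2.0 * q  # chain rule through v and q^2
                for j in range(n):
                    g[j] += coef * a[j]
        x = [xi - lr * gj for xi, gj in zip(x, g)]
    return x
```

A consistent signal has cost zero and is a fixed point; starting from an inconsistent initialization, the descent drives the cost toward zero on small synthetic instances.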