116 research outputs found

Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

Author: Munteanu Alexander
Schwiegelshohn Chris
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

We present a technical survey of state-of-the-art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower bounding techniques.

Archivio della ricerca- Università di Roma La Sapienza

Probabilistic Smallest Enclosing Ball in High Dimensions via Subgradient Sampling

Author: Krivosija Amer
Munteanu Alexander
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 35th International Symposium on Computational Geometry (SoCG 2019)
Publication date: 01/01/2019

We study a variant of the median problem for a collection of point sets in high dimensions. This generalizes the geometric median as well as the (probabilistic) smallest enclosing ball (pSEB) problems. Our main objective and motivation is to improve the previously best algorithm for the pSEB problem by reducing its exponential dependence on the dimension to linear. This is achieved via a novel combination of sampling techniques for clustering problems in metric spaces with the framework of stochastic subgradient descent. As a result, the algorithm becomes applicable to shape fitting problems in Hilbert spaces of unbounded dimension via kernel functions. We present an exemplary application by extending the support vector data description (SVDD) shape fitting method to the probabilistic case. This is done by simulating the pSEB algorithm implicitly in the feature space induced by the kernel function
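
The combination of sampling and stochastic subgradient descent described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the distance from a centre to a point set is the distance to the farthest point of that set; the abstract does not spell out the objective, and this is not the authors' algorithm. The names `set_median_sgd`, `farthest_point` and `step_scale`, the step-size schedule, and the suffix averaging are all hypothetical choices.

```python
import math
import random

def farthest_point(center, point_set):
    """Return the point of point_set that is farthest from center."""
    return max(point_set, key=lambda p: math.dist(center, p))

def set_median_sgd(point_sets, start, steps, rng, step_scale=0.5):
    """Stochastic subgradient descent on (1/N) * sum_i max_{p in P_i} ||c - p||.

    Assumed objective (not taken from the abstract). Returns the average of
    the iterates from the second half of the run.
    """
    c = list(start)
    avg = [0.0] * len(c)
    kept = 0
    for t in range(1, steps + 1):
        # Sample one point set uniformly and take its farthest point.
        p = farthest_point(c, rng.choice(point_sets))
        dist = math.dist(c, p)
        if dist > 0.0:
            eta = step_scale / math.sqrt(t)  # decreasing step size
            # (c - p) / ||c - p|| is a subgradient of the sampled term at c.
            c = [ci - eta * (ci - pi) / dist for ci, pi in zip(c, p)]
        if t > steps // 2:
            kept += 1
            avg = [a + (ci - a) / kept for a, ci in zip(avg, c)]
    return avg

# Singleton sets reduce the problem to the geometric median; by symmetry the
# optimum for these four points is the origin.
sets = [[(1.0, 0.0)], [(-1.0, 0.0)], [(0.0, 1.0)], [(0.0, -1.0)]]
center = set_median_sgd(sets, start=(2.0, 1.5), steps=20000, rng=random.Random(0))
```

Each iteration touches a single sampled set, so the per-step cost depends on the size of that set and on the dimension only linearly, which is the property the abstract emphasises.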

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Random projections for Bayesian regression

Author: Geppert Leo N.
Ickstadt Katja
Munteanu Alexander
Quedenfeld Jens
Sohler Christian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/11/2015

This article deals with random projections applied as a data reduction technique for Bayesian regression analysis. We show sufficient conditions under which the entire $d$-dimensional distribution is approximately preserved under random projections by reducing the number of data points from $n$ to $k\in O(\operatorname{poly}(d/\varepsilon))$ in the case $n\gg d$. Under mild assumptions, we prove that evaluating a Gaussian likelihood function based on the projected data instead of the original data yields a $(1+O(\varepsilon))$-approximation in terms of the $\ell_2$ Wasserstein distance. Our main result shows that the posterior distribution of Bayesian linear regression is approximated up to a small error depending on only an $\varepsilon$-fraction of its defining parameters. This holds when using arbitrary Gaussian priors or the degenerate case of uniform distributions over $\mathbb{R}^d$ for $\beta$. Our empirical evaluations involve different simulated settings of Bayesian linear regression. Our experiments underline that the proposed method is able to recover the regression model up to small error while considerably reducing the total running time.
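
The reduction from n data points to k projected points can be sketched as follows. This is a toy illustration, not the authors' implementation: a dense Gaussian matrix stands in for whichever random projection the article uses, and the fit shown is plain least squares, which coincides with the posterior mean under a flat prior and a Gaussian likelihood. The function names and the values of `n`, `d`, `k` and the noise level are assumptions chosen for the example.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            factor = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * d
    for row in range(d - 1, -1, -1):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, d))
        x[row] = (m[row][d] - tail) / m[row][row]
    return x

def least_squares(rows, targets):
    """Ordinary least squares via the normal equations (fine for small d)."""
    d = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve_linear_system(gram, rhs)

def gaussian_sketch(rows, targets, k, rng):
    """Compress n data points to k by applying a k x n Gaussian matrix to [X, y]."""
    d = len(rows[0])
    scale = 1.0 / math.sqrt(k)
    sk_rows, sk_targets = [], []
    for _ in range(k):
        g = [rng.gauss(0.0, 1.0) * scale for _ in rows]  # one row of the sketch
        sk_rows.append([sum(gi * r[j] for gi, r in zip(g, rows)) for j in range(d)])
        sk_targets.append(sum(gi * t for gi, t in zip(g, targets)))
    return sk_rows, sk_targets

rng = random.Random(0)
n, d, k = 1000, 3, 100
beta_true = [1.0, -2.0, 0.5]
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [sum(b * x for b, x in zip(beta_true, row)) + rng.gauss(0.0, 0.05) for row in X]

beta_full = least_squares(X, y)          # fit on all n points
SX, Sy = gaussian_sketch(X, y, k, rng)   # reduce n points to k
beta_sketch = least_squares(SX, Sy)      # fit on the k projected points
max_gap = max(abs(a - b) for a, b in zip(beta_full, beta_sketch))
```

Comparing `beta_sketch` with `beta_full` shows how closely the fit on the k projected points tracks the fit on all n points in this simulated setting.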

arXiv.org e-Print Archive

Springer - Publisher Connector

On large-scale probabilistic and statistical data analysis

Author: Munteanu Alexander

In this manuscript we develop and apply modern algorithmic data reduction techniques to tackle scalability issues and enable statistical data analysis of massive data sets. Our algorithms follow a general scheme, where a reduction technique is applied to the large-scale data to obtain a small summary of sublinear size to which a classical algorithm is applied. The techniques for obtaining these summaries depend on the problem that we want to solve. The size of the summaries is usually parametrized by an approximation parameter, expressing the trade-off between efficiency and accuracy. In some cases the data can be reduced to a size that has no or only negligible dependency on the initial number of data items. However, for other problems it turns out that sublinear summaries do not exist in the worst case. In such situations, we exploit statistical or geometric relaxations to obtain useful sublinear summaries under certain mildness assumptions. We present, in particular, the data reduction methods called coresets and subspace embeddings, and several algorithmic techniques to construct these via random projections and sampling

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Streaming statistical models via Merge & Reduce

Author: Geppert Leo N.
Ickstadt Katja
Munteanu Alexander
Sohler Christian
Publication date: 12/06/2020

Merge & Reduce is a general algorithmic scheme in the theory of data structures. Its main purpose is to transform static data structures, which support only queries, into dynamic data structures, which allow insertions of new elements, with as little overhead as possible. This can be used to turn classic offline algorithms for summarizing and analyzing data into streaming algorithms. We transfer these ideas to the setting of statistical data analysis in streaming environments. Our approach is conceptually different from previous settings where Merge & Reduce has been employed. Instead of summarizing the data, we combine the Merge & Reduce framework directly with statistical models. This enables performing computationally demanding data analysis tasks on massive data sets. The computations are divided into small tractable batches whose size is independent of the total number of observations n. The results are combined in a structured way at the cost of a bounded O(log n) factor in their memory requirements. It is only necessary, though nontrivial, to choose an appropriate statistical model and design merge and reduce operations on a casewise basis for the specific type of model. We illustrate our Merge & Reduce schemes on simulated and real-world data employing (Bayesian) linear regression models, Gaussian mixture models and generalized linear models.
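
The bucket structure behind Merge & Reduce can be sketched as a binary counter over fitted models. The example below is a minimal illustration under strong simplifying assumptions: the "statistical model" is just a pair (count, mean), and merging is a weighted average. The authors' merge and reduce operations for regression and mixture models are more involved and are not reproduced here. The names `fit`, `merge_reduce` and `stream_fit` are hypothetical.

```python
import math

def fit(block):
    """Fit the toy model on one block: (number of observations, mean)."""
    return (len(block), sum(block) / len(block))

def merge_reduce(a, b):
    """Merge two models into one model of the same size (weighted mean)."""
    (na, ma), (nb, mb) = a, b
    n = na + nb
    return (n, (na * ma + nb * mb) / n)

def stream_fit(stream, block_size):
    """Process a stream block by block, keeping at most one model per level."""
    levels = []  # levels[i] is None or a model summarising 2**i blocks
    block = []

    def push(model):
        # Carry upwards like a binary counter: two models on the same
        # level are merged and promoted to the next level.
        i = 0
        while i < len(levels) and levels[i] is not None:
            model = merge_reduce(levels[i], model)
            levels[i] = None
            i += 1
        if i == len(levels):
            levels.append(model)
        else:
            levels[i] = model

    for x in stream:
        block.append(x)
        if len(block) == block_size:
            push(fit(block))
            block = []
    if block:  # trailing partial block
        push(fit(block))

    # Combine the surviving models into the final answer.
    result = None
    for model in levels:
        if model is not None:
            result = model if result is None else merge_reduce(result, model)
    return result, len(levels)

data = [float(i) for i in range(1000)]
(model_n, model_mean), num_levels = stream_fit(iter(data), block_size=10)
```

With 100 blocks of size 10, the structure never holds more than floor(log2(100)) + 1 = 7 models at once, which mirrors the logarithmic memory overhead mentioned in the abstract.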

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Glow Discharge Optical Emission Spectrometry (GDOES), an Effectiveness Method for Characterizing Composition of Surfaces and Coatings

Author: Alexander SCHREINER
Daniel MUNTEANU
Publication venue: Galati University Press
Publication date: 01/11/2008

This work presents the technical procedures and practical advantages of using Glow Discharge Optical Emission Spectroscopy (GDOES) to establish depth concentration profiles of surfaces. GDOES can detect low concentrations with high accuracy. It can be used for either quantitative bulk analysis (QBA) or quantitative depth profiling (QDP) in the nanometer to micron range. Both conductive and non-conductive samples can be analysed. The main applications of this spectral method span technology fields such as heat treatment, casting, hot and cold forming, thermochemical treatments, electrochemical processes (galvanic coatings), chemical and physical vapour deposition (CVD, PVD), thermal oxidation and anodizing, thin films and others.

Directory of Open Access Journals

Optimal Sketching Bounds for Sparse Linear Regression

Author: Mai Tung
Munteanu Alexander
Musco Cameron
Rao Anup B.
Schwiegelshohn Chris
Woodruff David P.
Publication date: 05/04/2023

We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression. For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $o(d)$ rows showing that $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffice, where $\mu$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower order terms and the dependence on $\mu$. Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, where one aims to minimize $\|Ax-b\|_2^2+\lambda\|x\|_1$ over $x\in\mathbb{R}^d$. We show that sketching dimension $O(\log(d)/(\lambda \varepsilon)^2)$ suffices and that the dependence on $d$ and $\lambda$ is tight. Comment: AISTATS 202

arXiv.org e-Print Archive