Search CORE

249 research outputs found

Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

Author: Cohen Michael B.
Elder Sam
Musco Cameron
Musco Christopher
Persu Madalina
Publication venue
Publication date: 02/04/2015
Field of study

We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde A}$ that can be used to solve a general class of constrained $k$-rank approximation problems to within $(1+\epsilon)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e., principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For $k$-means dimensionality reduction, we provide $(1+\epsilon)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only 'cover' a good subspace for $\mathbf{A}$, but can be used directly to compute this subspace. Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(\log k/\epsilon^2)$ dimensions. This gives the first result that leverages the specific structure of $k$-means to achieve dimension independent of input size and sublinear in $k$.

arXiv.org e-Print Archive

CiteSeerX
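The final claim, that a Johnson-Lindenstrauss projection to few dimensions preserves good $k$-means partitions, can be illustrated with a small NumPy sketch. Everything here is an assumption chosen for illustration, not the authors' code: a dense Gaussian JL matrix, plain Lloyd's iterations with deterministic farthest-point seeding, well-separated synthetic clusters, and a projected dimension of 20 that is not derived from the $O(\log k/\epsilon^2)$ bound. The partition is computed on the projected points and then scored in the original space.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic farthest-point seeding: start at X[0], then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [X[0]]
    dist = np.sum((X - X[0]) ** 2, axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dist))
        centers.append(X[idx])
        dist = np.minimum(dist, np.sum((X - X[idx]) ** 2, axis=1))
    return np.array(centers)


def lloyd(X, k, iters=50):
    """Plain Lloyd's k-means; returns a cluster label for every row of X."""
    centers = farthest_point_init(X, k)
    for _ in range(iters):
        # squared distances from every point to every center, shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def kmeans_cost(X, labels):
    """Sum of squared distances to cluster means, measured in the space of X."""
    return sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
               for j in np.unique(labels))


def jl_project(X, m, seed=0):
    """Dense Gaussian Johnson-Lindenstrauss map from d to m dimensions."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((X.shape[1], m)) / np.sqrt(m)
    return X @ G


# Synthetic, well-separated data: 4 clusters of 100 points in 200 dimensions.
rng = np.random.default_rng(1)
k, d, m = 4, 200, 20
true_centers = 10.0 * rng.standard_normal((k, d))
X = np.vstack([c + rng.standard_normal((100, d)) for c in true_centers])

labels_full = lloyd(X, k)                 # cluster in the original 200 dims
labels_proj = lloyd(jl_project(X, m), k)  # cluster in 20 projected dims
# Score both partitions in the ORIGINAL space.
ratio = kmeans_cost(X, labels_proj) / kmeans_cost(X, labels_full)
print(f"cost ratio (projected / full): {ratio:.4f}")
```

On data this well separated, clustering in 20 dimensions should recover the same partition as clustering in 200, so the ratio should be essentially 1. The paper's guarantee is a worst-case approximation bound, which a toy run like this does not test.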

Uniform Sampling for Matrix Approximation

Author: Cohen Michael B.
Lee Yin Tat
Musco Cameron
Musco Christopher
Peng Richard
Sidford Aaron
Publication venue
Publication date: 21/08/2014
Field of study

Random sampling has become a critical tool in solving massive matrix problems. For linear regression, a small, manageable set of data rows can be randomly selected to approximate a tall, skinny data matrix, improving processing time significantly. For theoretical performance guarantees, each row must be sampled with probability proportional to its statistical leverage score. Unfortunately, leverage scores are difficult to compute. A simple alternative is to sample rows uniformly at random. While this often works, uniform sampling will eliminate critical row information for many natural instances. We take a fresh look at uniform sampling by examining what information it does preserve. Specifically, we show that uniform sampling yields a matrix that, in some sense, well approximates a large fraction of the original. While this weak form of approximation is not enough for solving linear regression directly, it is enough to compute a better approximation. This observation leads to simple iterative row sampling algorithms for matrix approximation that run in input-sparsity time and preserve row structure and sparsity at all intermediate steps. In addition to an improved understanding of uniform sampling, our main proof introduces a structural result of independent interest: we show that every matrix can be made to have low coherence by reweighting a small subset of its rows.

arXiv.org e-Print Archive

CiteSeerX

Crossref
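The central observation, that a uniform sample is good enough to compute a better approximation, can be sketched as a single sampling round in NumPy. This is a simplified illustration under our own assumptions (one round only, arbitrary sample size `m` and oversampling factor `c`, hypothetical function names), not the paper's iterative algorithm or its constants. Scores are generalized leverage scores $\tilde\tau_i = a_i^\top (A_S^\top A_S)^{+} a_i$ computed against an unweighted uniform subsample $A_S$, with $\tilde\tau_i = \infty$ for rows outside the row space of $A_S$. Because $A_S^\top A_S \preceq A^\top A$, each $\tilde\tau_i$ is at least the true leverage score $\tau_i$, which is what makes these estimates safe sampling probabilities.

```python
import numpy as np


def leverage_scores(A):
    """Exact leverage scores: squared row norms of Q from a thin QR of A."""
    Q, _ = np.linalg.qr(A)
    return np.sum(Q ** 2, axis=1)


def generalized_leverage_scores(A, A_sub, tol=1e-10):
    """tau~_i = a_i^T (A_sub^T A_sub)^+ a_i, or +inf if a_i leaves A_sub's row space."""
    M = A_sub.T @ A_sub
    M_pinv = np.linalg.pinv(M)
    scores = np.einsum("ij,jk,ik->i", A, M_pinv, A)
    P = M_pinv @ M  # orthogonal projector onto the row space of A_sub
    residual = np.linalg.norm(A - A @ P, axis=1)
    scores[residual > tol * np.linalg.norm(A, axis=1)] = np.inf
    return scores


def uniform_then_leverage_sample(A, m, c, rng):
    """Hypothetical one-round scheme: uniform sample -> score estimates -> resample."""
    n = A.shape[0]
    # 1) pick m rows uniformly at random, without reweighting
    idx = rng.choice(n, size=m, replace=False)
    # 2) estimate every row's leverage score against that subsample
    tau_est = generalized_leverage_scores(A, A[idx])
    # 3) keep row i with probability p_i = min(1, c * tau_est_i), rescaled by 1/sqrt(p_i)
    p = np.minimum(1.0, c * tau_est)
    keep = rng.random(n) < p
    return A[keep] / np.sqrt(p[keep])[:, None], tau_est, p


rng = np.random.default_rng(0)
A = rng.standard_normal((20000, 10))
A[:, -1] = 0.0
A[0] = 0.0
A[0, -1] = 5.0  # the only row touching the last coordinate: leverage score 1

SA, tau_est, p = uniform_then_leverage_sample(A, m=1000, c=10.0, rng=rng)
tau = leverage_scores(A)  # exact scores, computed only for comparison
print(SA.shape, p[0])
```

Row 0 shows the failure mode the abstract mentions: a uniform sample will usually miss it, yet its infinite (or unit) generalized score forces it into the resampled matrix. The exact QR-based scores are computed only to check the overestimation property; a practical pipeline would avoid that full factorization.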

Online Row Sampling

Author: Cohen Michael B.
Musco Cameron
Pachocki Jakub
Publication venue
Publication date: 01/01/2016
Field of study

Finding a small spectral approximation for a tall $n \times d$ matrix $A$ is a fundamental numerical primitive. For a number of reasons, one often seeks an approximation whose rows are sampled from those of $A$. Row sampling improves interpretability, saves space when $A$ is sparse, and preserves row structure, which is especially important, for example, when $A$ represents a graph. However, correctly sampling rows from $A$ can be costly when the matrix is large and cannot be stored and processed in memory. Hence, a number of recent publications focus on row sampling in the streaming setting, using little more space than what is required to store the outputted approximation [KL13, KLM+14]. Inspired by a growing body of work on online algorithms for machine learning and data analysis, we extend this work to a more restrictive online setting: we read rows of $A$ one by one and immediately decide whether each row should be kept in the spectral approximation or discarded, without ever retracting these decisions. We present an extremely simple algorithm that approximates $A$ up to multiplicative error $\epsilon$ and additive error $\delta$ using $O(d \log d \log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ online samples, with memory overhead proportional to the cost of storing the spectral approximation. We also present an algorithm that uses $O(d^2)$ memory but only requires $O(d\log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ samples, which we show is optimal. Our methods are clean and intuitive, allow for lower memory usage than prior work, and expose new theoretical properties of leverage score based matrix approximation.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server
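The online rule described above (read a row, decide once, never retract) can be sketched as ridge leverage score sampling against the rows kept so far. This is a minimal sketch under stated assumptions: the ridge parameter `lam` stands in for the additive-error term, the oversampling constant `c = 8 log d / eps^2` is an illustrative choice rather than the paper's, and the inverse of the kept Gram matrix plus `lam * I` is maintained with Sherman-Morrison updates, which needs $O(d^2)$ working memory. Kept rows are rescaled by $1/\sqrt{p}$, so $\tilde A^\top \tilde A$ is an unbiased estimate of $A^\top A$.

```python
import numpy as np


def online_row_sample(rows, lam, c, rng):
    """Simplified online ridge-leverage-score row sampling.

    For each incoming row a: score l = a^T (M + lam*I)^{-1} a, where M is the
    Gram matrix of the rows kept so far; keep a with probability
    p = min(1, c*l), rescaled by 1/sqrt(p). Decisions are never revisited.
    """
    d = rows.shape[1]
    inv = np.eye(d) / lam  # (M + lam*I)^{-1}, with M = 0 initially
    kept = []
    for a in rows:
        p = min(1.0, c * float(a @ inv @ a))
        if rng.random() < p:
            w = a / np.sqrt(p)
            kept.append(w)
            # Sherman-Morrison: (X + w w^T)^{-1} = X^{-1} - X^{-1} w w^T X^{-1} / (1 + w^T X^{-1} w)
            u = inv @ w
            inv -= np.outer(u, u) / (1.0 + w @ u)
    return np.array(kept).reshape(-1, d)


rng = np.random.default_rng(0)
n, d, eps = 20000, 5, 0.5
A = rng.standard_normal((n, d))
lam = 1.0                       # hypothetical ridge / additive-error parameter
c = 8.0 * np.log(d) / eps**2    # hypothetical oversampling constant
A_tilde = online_row_sample(A, lam, c, rng)

# Spectral check: eigenvalues of (A^T A + lam I)^{-1} (A~^T A~ + lam I) should sit near 1.
K = A.T @ A + lam * np.eye(d)
K_tilde = A_tilde.T @ A_tilde + lam * np.eye(d)
evals = np.linalg.eigvals(np.linalg.solve(K, K_tilde)).real
print(A_tilde.shape[0], evals.min(), evals.max())
```

Early rows are almost always kept, because little has been seen yet; later rows have small scores relative to the accumulated sketch and are kept with proportionally small probability.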

Recommended from our members

Online Row Sampling

Author: Cohen Michael B.
Musco Cameron
Pachocki Jakub
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2020
Field of study

Finding a small spectral approximation for a tall n × d matrix A is a fundamental numerical primitive. For a number of reasons, one often seeks an approximation whose rows are sampled from those of A. Row sampling improves interpretability, saves space when A is sparse, and preserves structure, which is important, e.g., when A represents a graph. However, correctly sampling rows from A can be costly when the matrix is large and cannot be stored and processed in memory. Hence, a number of recent publications focus on row sampling in the streaming setting, using little more space than what is required to store the returned approximation (Kelner and Levin, Theory Comput. Sys. 2013; Kapralov et al., SIAM J. Comp. 2017). Inspired by a growing body of work on online algorithms for machine learning and data analysis, we extend this work to a more restrictive online setting: we read rows of A one by one and immediately decide whether each row should be kept in the spectral approximation or discarded, without ever retracting these decisions. We present an extremely simple algorithm that approximates A up to multiplicative error 1+ϵ and additive error δ using O(d log d log(ϵ‖A‖₂²/δ)/ϵ²) online samples, with memory overhead proportional to the cost of storing the spectral approximation. We also present an algorithm that uses O(d²) memory but only requires O(d log(ϵ‖A‖₂²/δ)/ϵ²) samples, which we show is optimal. Our methods are clean and intuitive, allow for lower memory usage than prior work, and expose new theoretical properties of leverage score based matrix approximation.

ScholarWorks@UMass Amherst

Critical Phenomena in Neutron Stars I: Linearly Unstable Nonrotating Models

Author: Brady P
Chirenti C B M H
David Radice
Giacomazzo B
Gundlach C
Hara T
Koike T
Adachi S
Kellerman T
Liebling S L
Lehner L
Neilsen D
Palenzuela C
Luciano Rezzolla
Musco I
Neilsen D W
Noble S C
Thorsten Kellerman
Wan M B
Jin K J
Suen W M
Publication venue: 'IOP Publishing'
Publication date: 01/01/2010
Field of study

We consider the evolution in full general relativity of a family of linearly unstable isolated spherical neutron stars under the effects of very small perturbations, as induced by the truncation error. Using a simple ideal-fluid equation of state, we find that this system exhibits a type-I critical behaviour, thus confirming the conclusions reached by Liebling et al. [1] for rotating magnetized stars. Exploiting the relative simplicity of our system, we are able to carry out a more in-depth study, providing solid evidence of the criticality of this phenomenon and also giving a simple interpretation of the putative critical solution as a spherical solution with the unstable mode being the fundamental F-mode. Hence, for any choice of the polytropic constant, the critical solution will distinguish the set of subcritical models migrating to the stable branch of the models of equilibrium from the set of supercritical models collapsing to a black hole. Finally, we study how the dynamics changes when the numerical perturbation is replaced by a finite-size, resolution-independent velocity perturbation and show that in such cases a nearly-critical solution can be changed into either a sub- or supercritical one. The work reported here also lays the basis for the analysis carried out in a companion paper, where the critical behaviour in the head-on collision of two neutron stars is instead considered [2]. Comment: 15 pages, 9 figures

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Concern about the spread of the invader seaweed Caulerpa taxifolia var. distichophylla (Chlorophyta: Caulerpales) towards the West Mediterranean

Author: ANDALORO F.
BADALAMENTI F.
MIKAC B.
MIRTO S.
MUSCO L.
VEGA FERNANDEZ T.
Publication venue: 'National Documentation Centre (EKT)'
Publication date: 02/04/2014
Field of study

The new Australian alien seaweed Caulerpa taxifolia var. distichophylla, after being established along the Turkish Mediterranean coast in 2006, was recorded in Southern Sicily in 2007. Since then, local fishermen have requested support to counteract the effects of large amounts of the alien strain's wrack becoming entangled in their trammel nets, which makes the gear ineffective. The further northward and westward spread of the new alien strain is presumed to be limited by winter temperature. We present novel data confirming that the new alien strain is fully naturalized in the Central Mediterranean and is expanding its range beyond this limit (i.e. the 15°C February isotherm), thus becoming potentially able to colonize the western basin. Through a preliminary estimate of its effects on native polychaete assemblages, and considering some peculiarities of Sicily (mostly linked to its geographical position in the Mediterranean Sea), we highlight the risk posed by the expanding distribution range of the invasive alga.

Directory of Open Access Journals

Full-text Institutional Repository of the Ruđer Bošković Institute

National Documentation Centre - EKT journals

Optimal Sketching Bounds for Sparse Linear Regression

Author: Mai Tung
Munteanu Alexander
Musco Cameron
Rao Anup B.
Schwiegelshohn Chris
Woodruff David P.
Publication venue
Publication date: 05/04/2023
Field of study

We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression. For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $o(d)$ rows, showing that $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffice, where $\mu$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower order terms and the dependence on $\mu$. Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, where one aims to minimize $\|Ax-b\|_2^2+\lambda\|x\|_1$ over $x\in\mathbb{R}^d$. We show that sketching dimension $O(\log(d)/(\lambda \varepsilon)^2)$ suffices and that the dependence on $d$ and $\lambda$ is tight. Comment: AISTATS 202

arXiv.org e-Print Archive
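The LASSO result concerns solving the sketched problem $\min_x \|SAx - Sb\|_2^2 + \lambda\|x\|_1$ in place of the original. A minimal sketch-and-solve illustration in NumPy follows, assuming a dense Gaussian sketch (not one of the paper's specific sketch distributions), a plain proximal-gradient (ISTA) solver, and arbitrary sizes, penalty, and synthetic data; it does not reproduce the $O(\log(d)/(\lambda\varepsilon)^2)$ dimension, whose scaling assumptions we do not model. Because $\mathbb{E}[S^\top S] = I$, the penalty $\lambda$ is left unchanged in the sketched problem.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: componentwise shrinkage toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(A, b, lam, iters=2000):
    """Minimize ||Ax - b||_2^2 + lam*||x||_1 by proximal gradient descent (ISTA)."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient of ||Ax - b||^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x


def lasso_objective(A, b, x, lam):
    return float(np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x)))


def gaussian_sketch(m, n, rng):
    """Dense Gaussian oblivious sketch S of shape (m, n), scaled so E[S^T S] = I."""
    return rng.standard_normal((m, n)) / np.sqrt(m)


rng = np.random.default_rng(0)
n, d, m, lam = 2000, 20, 400, 5.0
x_true = np.zeros(d)
x_true[:3] = [3.0, -2.0, 1.5]              # 3-sparse ground truth (synthetic)
A = rng.standard_normal((n, d))
b = A @ x_true + 0.5 * rng.standard_normal(n)

S = gaussian_sketch(m, n, rng)            # drawn without looking at A or b
x_full = lasso_ista(A, b, lam)            # original n-row problem
x_sketch = lasso_ista(S @ A, S @ b, lam)  # m-row sketched problem
ratio = lasso_objective(A, b, x_sketch, lam) / lasso_objective(A, b, x_full, lam)
print(f"objective ratio (sketched / full): {ratio:.4f}")
```

Both solutions are scored on the original problem, since that is the quantity a relative-error guarantee bounds.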

In vitro fermentation and chemical characteristics of mediterranean by-products for swine nutrition

Author: Calabro S.
Chiofalo B.
Cutrignelli M. I.
Di Rosa A. R.
Liotta L.
Musco N.
Vastolo A.
Publication venue: 'MDPI AG'
Publication date: 01/01/2019
Field of study

The purpose of the study is to determine the nutritional characteristics of some by-products derived from fruit juice and olive oil production in order to evaluate their use in pig nutrition. Five citrus fruit by-products (three citrus fruit pulps and two molasses) and three olive oil by-products (olive cake) obtained from different varieties are analysed for chemical composition. The fermentation characteristics are evaluated in vitro using the gas production technique with swine faecal inoculum. All the citrus by-products are highly fermentable, producing gas and a high amount of short-chain fatty acids. The fermentation kinetics vary when comparing pulps and molasses: citrus fruit pulps show lower and slower fermentation rates than molasses. Compared to the citrus by-products, the olive oil by-products are richer in NDF and ADL. These characteristics negatively affect all the fermentation parameters. Therefore, the high concentration of fiber and lipids represents a key aspect in the nutrition of fattening pigs. The preliminary results obtained in this study confirm that the use of by-products in pig nutrition could represent a valid opportunity to reduce the economic cost and environmental impact of livestock production.

Archivio della ricerca - Università degli studi di Napoli Federico II

Black hole production in tachyonic preheating

Author: Bruce Bassett
Carr B J
Desroche M
Felder G N
Kratochvil J M
Linde A
Dolgov A
Felder G
Tkachev I
Hideaki Kudoh
Lindley D
Musco I
Nasel’skii P D
Novikov I D
Takahiro Tanaka
Teruaki Suyama
Yokoyama J
Zel’dovich Ya B
Publication venue: 'IOP Publishing'
Publication date: 26/01/2006
Field of study

We present fully non-linear simulations of a self-interacting scalar field in the early universe undergoing tachyonic preheating. We find that density perturbations on sub-horizon scales which are amplified by tachyonic instability maintain long range correlations even during the succeeding parametric resonance, in contrast to the standard models of preheating dominated by parametric resonance. As a result, the final spectrum exhibits memory and is not universal in shape. We find that throughout the subsequent era of parametric resonance the equation of state of the universe is almost dust-like, hence the Jeans wavelength is much smaller than the horizon scale. If our 2D simulations are accurate reflections of the situation in 3D, then there are wide regions of parameter space ruled out by over-production of black holes. It is likely, however, that realistic parameter values, consistent with COBE/WMAP normalisation, are safely outside this black hole over-production region. Comment: 6 pages, 7 figures, figures corrected

arXiv.org e-Print Archive

Crossref