249 research outputs found

    Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

    We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde A}$ that can be used to solve a general class of constrained $k$-rank approximation problems to within $(1+\epsilon)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For $k$-means dimensionality reduction, we provide $(1+\epsilon)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only 'cover' a good subspace for $\mathbf{A}$, but can be used directly to compute this subspace. Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(\log k/\epsilon^2)$ dimensions. This gives the first result that leverages the specific structure of $k$-means to achieve dimension independent of input size and sublinear in $k$.
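
The Johnson-Lindenstrauss step mentioned in this abstract can be illustrated with a minimal sketch, assuming NumPy and a dense Gaussian projection. The function name `jl_project` and the constant `c` are hypothetical and not taken from the paper, which may use a different projection and constants.

```python
import numpy as np

def jl_project(points, k, eps, c=1.0, seed=0):
    """Project the rows of `points` (n x d) down to
    m = ceil(c * log(k) / eps**2) dimensions.

    `c` is an assumed oversampling constant, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = points.shape
    m = max(1, int(np.ceil(c * np.log(k) / eps**2)))
    # Gaussian entries scaled by 1/sqrt(m) preserve squared norms in expectation.
    proj = rng.normal(size=(d, m)) / np.sqrt(m)
    return points @ proj
```

Any k-means solver could then be run on the projected points; the target dimension depends only on `k` and `eps`, not on the number of points or the original dimension.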

    Uniform Sampling for Matrix Approximation

    Random sampling has become a critical tool in solving massive matrix problems. For linear regression, a small, manageable set of data rows can be randomly selected to approximate a tall, skinny data matrix, improving processing time significantly. For theoretical performance guarantees, each row must be sampled with probability proportional to its statistical leverage score. Unfortunately, leverage scores are difficult to compute. A simple alternative is to sample rows uniformly at random. While this often works, uniform sampling will eliminate critical row information for many natural instances. We take a fresh look at uniform sampling by examining what information it does preserve. Specifically, we show that uniform sampling yields a matrix that, in some sense, well approximates a large fraction of the original. While this weak form of approximation is not enough for solving linear regression directly, it is enough to compute a better approximation. This observation leads to simple iterative row sampling algorithms for matrix approximation that run in input-sparsity time and preserve row structure and sparsity at all intermediate steps. In addition to an improved understanding of uniform sampling, our main proof introduces a structural result of independent interest: we show that every matrix can be made to have low coherence by reweighting a small subset of its rows.
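
To make the contrast between leverage-score and uniform sampling concrete, here is a small sketch, assuming NumPy and a matrix with full column rank. It computes exact leverage scores through a thin QR factorization, which is the expensive step the abstract refers to, and is not the paper's iterative algorithm. The helper names `leverage_scores` and `sample_rows` are hypothetical.

```python
import numpy as np

def leverage_scores(A):
    """Exact leverage scores of the rows of A (assumes full column rank).

    With a thin QR factorization A = QR, the score of row i is the
    squared Euclidean norm of row i of Q.
    """
    Q, _ = np.linalg.qr(A)  # default mode is 'reduced' (thin QR)
    return np.sum(Q**2, axis=1)

def sample_rows(A, probs, num_samples, seed=0):
    """Sample rows with replacement according to `probs`, rescaling each
    sampled row so that the sketch S satisfies E[S.T @ S] = A.T @ A."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()
    idx = rng.choice(A.shape[0], size=num_samples, replace=True, p=probs)
    scale = 1.0 / np.sqrt(num_samples * probs[idx])
    return A[idx] * scale[:, None]
```

For the matrix with rows `[1, 0], [0, 1], [0, 1], [0, 1]`, the first row has leverage score 1 because it alone spans its direction, so uniform sampling can miss it while leverage-score sampling favours it. Passing `np.ones(A.shape[0])` as `probs` gives the uniform variant.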

    Online Row Sampling

    Finding a small spectral approximation for a tall $n \times d$ matrix $A$ is a fundamental numerical primitive. For a number of reasons, one often seeks an approximation whose rows are sampled from those of $A$. Row sampling improves interpretability, saves space when $A$ is sparse, and preserves row structure, which is especially important, for example, when $A$ represents a graph. However, correctly sampling rows from $A$ can be costly when the matrix is large and cannot be stored and processed in memory. Hence, a number of recent publications focus on row sampling in the streaming setting, using little more space than what is required to store the outputted approximation [KL13, KLM+14]. Inspired by a growing body of work on online algorithms for machine learning and data analysis, we extend this work to a more restrictive online setting: we read rows of $A$ one by one and immediately decide whether each row should be kept in the spectral approximation or discarded, without ever retracting these decisions. We present an extremely simple algorithm that approximates $A$ up to multiplicative error $\epsilon$ and additive error $\delta$ using $O(d \log d \log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ online samples, with memory overhead proportional to the cost of storing the spectral approximation. We also present an algorithm that uses $O(d^2)$ memory but only requires $O(d\log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ samples, which we show is optimal. Our methods are clean and intuitive, allow for lower memory usage than prior work, and expose new theoretical properties of leverage score based matrix approximation.
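
The online setting described in this abstract can be sketched as follows, assuming NumPy. This is an illustration in the spirit of ridge-leverage-score sampling, not the paper's exact algorithm: the regularizer `lam`, the oversampling factor `c`, and the function name `online_row_sample` are assumptions introduced here.

```python
import numpy as np

def online_row_sample(rows, d, lam, c, seed=0):
    """Read rows one at a time and irrevocably keep or discard each one.

    A row is kept with probability p = min(1, c * score), where score is its
    ridge leverage score with respect to the rows kept so far. Kept rows are
    rescaled by 1/sqrt(p). `lam` and `c` are assumed tuning parameters.
    """
    rng = np.random.default_rng(seed)
    gram = lam * np.eye(d)  # lam * I plus outer products of kept, rescaled rows
    kept = []
    for a in rows:
        a = np.asarray(a, dtype=float)
        score = min(1.0, float(a @ np.linalg.solve(gram, a)))
        p = min(1.0, c * score)
        if rng.random() < p:  # never true when p == 0, so no division by zero
            r = a / np.sqrt(p)
            kept.append(r)
            gram += np.outer(r, r)
    return np.array(kept).reshape(-1, d)
```

This version stores a $d \times d$ matrix, so it uses $O(d^2)$ memory on top of the kept rows; decisions are never revisited once a row has been processed.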

    Critical Phenomena in Neutron Stars I: Linearly Unstable Nonrotating Models

    We consider the evolution in full general relativity of a family of linearly unstable isolated spherical neutron stars under the effects of very small perturbations as induced by the truncation error. Using a simple ideal-fluid equation of state we find that this system exhibits a type-I critical behaviour, thus confirming the conclusions reached by Liebling et al. [1] for rotating magnetized stars. Exploiting the relative simplicity of our system, we are able to carry out a more in-depth study providing solid evidence of the criticality of this phenomenon and also to give a simple interpretation of the putative critical solution as a spherical solution with the unstable mode being the fundamental F-mode. Hence for any choice of the polytropic constant, the critical solution will distinguish the set of subcritical models migrating to the stable branch of the models of equilibrium from the set of supercritical models collapsing to a black hole. Finally, we study how the dynamics changes when the numerical perturbation is replaced by a finite-size, resolution-independent velocity perturbation and show that in such cases a nearly-critical solution can be changed into either a sub- or supercritical one. The work reported here also lays the basis for the analysis carried out in a companion paper, where the critical behaviour in the head-on collision of two neutron stars is instead considered [2]. Comment: 15 pages, 9 figures

    Concern about the spread of the invader seaweed Caulerpa taxifolia var. distichophylla (Chlorophyta: Caulerpales) towards the West Mediterranean

    The new Australian alien seaweed Caulerpa taxifolia var. distichophylla, after becoming established along the Turkish Mediterranean coast in 2006, was recorded in Southern Sicily in 2007. Since then, local fishermen have asked for support to counteract the entanglement of large amounts of the alien strain's wrack in their trammel nets, which renders the gear ineffective. The further northward and westward spread of the new alien strain is thought to be limited by winter temperature. We present novel data confirming that the new alien strain is fully naturalized in the Central Mediterranean and is expanding its range beyond this limit (i.e. the 15°C February isotherm), thus becoming potentially able to colonize the western basin. By means of a preliminary estimation of effects on native polychaete assemblages, and considering some peculiarities of Sicily (mostly linked to its geographical position in the Mediterranean Sea), the risk linked to the increasing range of distribution of the invasive alga is highlighted.

    Optimal Sketching Bounds for Sparse Linear Regression

    We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression. For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $o(d)$ rows, showing that $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffice, where $\mu$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower order terms and the dependence on $\mu$. Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, where one aims to minimize $\|Ax-b\|_2^2+\lambda\|x\|_1$ over $x\in\mathbb{R}^d$. We show that sketching dimension $O(\log(d)/(\lambda \varepsilon)^2)$ suffices and that the dependence on $d$ and $\lambda$ is tight. Comment: AISTATS 202

    In vitro fermentation and chemical characteristics of Mediterranean by-products for swine nutrition

    The purpose of the study is to determine the nutritional characteristics of some by-products derived from fruit juice and olive oil production to evaluate their use in pig nutrition. Five by-products of citrus fruit (three citrus fruit pulps and two molasses) and three by-products of olive oil (olive cake) obtained from different varieties are analysed for chemical composition. The fermentation characteristics are evaluated in vitro using the gas production technique with swine faecal inoculum. All the citrus by-products are highly fermentable, producing gas and a high amount of short-chain fatty acids. The fermentation kinetics vary when comparing pulps and molasses. Citrus fruit pulps show lower and slower fermentation rates than molasses. The olive oil by-products, compared to the citrus fruit ones, are richer in NDF and ADL. These characteristics negatively affect all the fermentation parameters. Therefore, the high concentration of fiber and lipids represents a key aspect in the nutrition of fattening pigs. The preliminary results obtained in this study confirm that the use of by-products in pig nutrition could represent a valid opportunity to reduce the livestock economic cost and environmental impact.

    Black hole production in tachyonic preheating

    We present fully non-linear simulations of a self-interacting scalar field in the early universe undergoing tachyonic preheating. We find that density perturbations on sub-horizon scales which are amplified by tachyonic instability maintain long range correlations even during the succeeding parametric resonance, in contrast to the standard models of preheating dominated by parametric resonance. As a result the final spectrum exhibits memory and is not universal in shape. We find that throughout the subsequent era of parametric resonance the equation of state of the universe is almost dust-like, hence the Jeans wavelength is much smaller than the horizon scale. If our 2D simulations are accurate reflections of the situation in 3D, then there are wide regions of parameter space ruled out by over-production of black holes. It is likely, however, that realistic parameter values, consistent with COBE/WMAP normalisation, are safely outside this black hole over-production region. Comment: 6 pages, 7 figures, figures corrected