
    Revisiting the Nystrom Method for Improved Large-Scale Machine Learning

    We reconsider randomized algorithms for the low-rank approximation of symmetric positive semi-definite (SPSD) matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods; they characterize the effects of common data preprocessing steps on the performance of these algorithms; and they point to important differences between uniform sampling and nonuniform sampling methods based on leverage scores. In addition, our empirical results illustrate that existing theory is so weak that it does not provide even a qualitative guide to practice. Thus, we complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds (e.g., improved additive-error bounds for spectral and Frobenius norm error, and relative-error bounds for trace norm error), and they point to future directions to make these algorithms useful in even larger-scale machine learning applications. (Comment: 60 pages, 15 color figures; updated proof of Frobenius norm bounds, added comparison to projection-based low-rank approximations, and an analysis of the power method applied to SPSD sketches.)
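    As a minimal illustration of the basic method evaluated above, the following Python sketch forms a Nystrom approximation of a synthetic SPSD kernel matrix from uniformly sampled columns; the RBF kernel, sample size, and all names are illustrative assumptions rather than the paper's experimental setup.

        # Nystrom approximation K ~ C W^+ C^T from uniformly sampled columns.
        # Toy setup; not the paper's data or code.
        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.standard_normal((500, 10))                  # synthetic data points
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-sq / 10.0)                              # SPSD RBF kernel matrix

        s = 50                                              # number of sampled columns
        idx = rng.choice(K.shape[0], size=s, replace=False)
        C = K[:, idx]                                       # n x s sampled columns
        W = K[np.ix_(idx, idx)]                             # s x s intersection block

        K_nys = C @ np.linalg.pinv(W) @ C.T                 # Nystrom approximation
        print("relative Frobenius error:",
              np.linalg.norm(K - K_nys) / np.linalg.norm(K))

    Leverage-score sampling, one of the nonuniform schemes the paper compares against, would replace the uniform choice of idx with sampling proportional to the statistical leverage scores of K.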

    Fixed-Rank Approximation of a Positive-Semidefinite Matrix from Streaming Data

    Several important applications, such as streaming PCA and semidefinite programming, involve a large-scale positive-semidefinite (psd) matrix that is presented as a sequence of linear updates. Because of storage limitations, it may only be possible to retain a sketch of the psd matrix. This paper develops a new algorithm for fixed-rank psd approximation from a sketch. The approach combines the Nystrom approximation with a novel mechanism for rank truncation. Theoretical analysis establishes that the proposed method can achieve any prescribed relative error in the Schatten 1-norm and that it exploits the spectral decay of the input matrix. Computer experiments show that the proposed method dominates alternative techniques for fixed-rank psd matrix approximation across a wide range of examples.
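    A minimal sketch of the idea, assuming a dense toy matrix in place of a real stream: the psd input is seen only through the sketch Y = A @ Omega (linear updates to A fold directly into Y), and a shifted Nystrom reconstruction with rank truncation recovers the fixed-rank approximation. The shift, sizes, and names below are loose illustrative choices, not the paper's exact pseudocode.

        # Fixed-rank psd approximation from a single random sketch (illustrative).
        import numpy as np

        rng = np.random.default_rng(1)
        n, k, r = 300, 40, 10                    # dimension, sketch size, target rank

        # Toy psd input with decaying spectrum, standing in for the streamed matrix.
        Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
        A = Q @ np.diag(0.9 ** np.arange(n)) @ Q.T

        Omega = rng.standard_normal((n, k))      # random test matrix
        Y = A @ Omega                            # the only stored quantity (the sketch)
        # In streaming, a linear update A += H is folded in as Y += H @ Omega.

        # Recovery: small shift for numerical stability, then Nystrom reconstruction.
        nu = np.sqrt(n) * np.finfo(float).eps * np.linalg.norm(Y)
        Y_nu = Y + nu * Omega
        B = Omega.T @ Y_nu                       # ~ Omega^T (A + nu*I) Omega, psd
        L = np.linalg.cholesky((B + B.T) / 2)
        E = np.linalg.solve(L, Y_nu.T).T         # E @ E.T = Y_nu @ inv(B) @ Y_nu.T
        U, S, _ = np.linalg.svd(E, full_matrices=False)

        # Fixed-rank truncation: keep the top r directions, remove the shift.
        lam = np.maximum(S[:r] ** 2 - nu, 0.0)
        A_r = U[:, :r] @ np.diag(lam) @ U[:, :r].T
        print("relative error:", np.linalg.norm(A - A_r) / np.linalg.norm(A))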

    Precise expressions for random projections: Low-rank approximation and randomized Newton

    It is often desirable to reduce the dimensionality of a large dataset by projecting it onto a low-dimensional subspace. Matrix sketching has emerged as a powerful technique for performing such dimensionality reduction very efficiently. Even though there is an extensive literature on the worst-case performance of sketching, existing guarantees are typically very different from what is observed in practice. We exploit recent developments in the spectral analysis of random matrices to develop novel techniques that provide provably accurate expressions for the expected value of random projection matrices obtained via sketching. These expressions can be used to characterize the performance of dimensionality reduction in a variety of common machine learning tasks, ranging from low-rank approximation to iterative stochastic optimization. Our results apply to several popular sketching methods, including Gaussian and Rademacher sketches, and they enable precise analysis of these methods in terms of spectral properties of the data. Empirical results show that the expressions we derive reflect the practical performance of these sketching methods, down to lower-order effects and even constant factors. (Comment: Minor corrections and clarifications of the previous version, including additional discussion in Appendix A.)
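    To make the object of study concrete, the following Python sketch applies a Gaussian sketch for low-rank approximation: it projects A onto the range of A @ Omega and measures the residual, which is the kind of quantity whose expectation the paper characterizes precisely. The matrix sizes and spectrum are illustrative assumptions.

        # Randomized low-rank approximation via a Gaussian sketch (illustrative).
        import numpy as np

        rng = np.random.default_rng(2)
        m, n, k = 400, 300, 20                   # matrix sizes and sketch dimension

        # Test matrix with polynomially decaying singular values.
        U, _ = np.linalg.qr(rng.standard_normal((m, n)))
        V, _ = np.linalg.qr(rng.standard_normal((n, n)))
        A = U @ np.diag((1.0 + np.arange(n)) ** -1.0) @ V.T

        Omega = rng.standard_normal((n, k))      # Gaussian sketching matrix
        Q, _ = np.linalg.qr(A @ Omega)           # orthonormal basis for range(A @ Omega)
        resid = A - Q @ (Q.T @ A)                # residual of the random projection
        print("relative residual:", np.linalg.norm(resid) / np.linalg.norm(A))

    A Rademacher sketch, also covered by the paper's results, would replace Omega with rng.choice([-1.0, 1.0], size=(n, k)).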

    Learning with SGD and Random Features

    Sketching and stochastic gradient methods are arguably the most common techniques to derive efficient large-scale learning algorithms. In this paper, we investigate their application in the context of nonparametric statistical learning. More precisely, we study the estimator defined by stochastic gradient descent with mini-batches and random features. The latter can be seen as a form of nonlinear sketching and can be used to define approximate kernel methods. The considered estimator is not explicitly penalized or constrained, and regularization is implicit. Indeed, our study highlights how different parameters, such as the number of features, the number of iterations, the step size, and the mini-batch size, control the learning properties of the solutions. We do this by deriving optimal finite-sample bounds under standard assumptions. The obtained results are corroborated and illustrated by numerical experiments.
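    A minimal sketch of this kind of estimator, assuming an RBF kernel approximated by random Fourier features (one standard form of nonlinear sketching) trained by mini-batch SGD on a squared loss with no explicit penalty, so that the number of features, step size, mini-batch size, and iteration count act as the implicit regularizers. The toy regression task and all parameter values are illustrative.

        # SGD with random (Fourier) features on least squares (illustrative).
        import numpy as np

        rng = np.random.default_rng(3)
        n, d, M = 2000, 5, 200                   # samples, input dim, random features

        X = rng.uniform(-1.0, 1.0, size=(n, d))
        y = np.sin(X.sum(axis=1)) + 0.1 * rng.standard_normal(n)

        # Random Fourier features: phi(x) = sqrt(2/M) * cos(x @ W + b).
        W = rng.standard_normal((d, M))          # frequencies for an RBF kernel
        b = rng.uniform(0.0, 2.0 * np.pi, size=M)

        def phi(Z):
            return np.sqrt(2.0 / M) * np.cos(Z @ W + b)

        theta = np.zeros(M)
        step, batch = 0.5, 32
        for t in range(2000):                    # iterations double as regularization
            idx = rng.choice(n, size=batch, replace=False)
            P = phi(X[idx])
            theta -= step * P.T @ (P @ theta - y[idx]) / batch

        print("training MSE:", np.mean((phi(X) @ theta - y) ** 2))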

    Aquatic macroinvertebrate responses to native and non-native predators

    Non-native species can profoundly affect native ecosystems through trophic interactions with native species. Native prey may respond differently to non-native versus native predators since they lack prior experience with the former. Here we investigate antipredator responses of two common freshwater macroinvertebrates, Gammarus pulex and Potamopyrgus jenkinsi, to olfactory cues from three predators: sympatric native fish (Gasterosteus aculeatus), sympatric native crayfish (Austropotamobius pallipes), and novel invasive crayfish (Pacifastacus leniusculus). G. pulex responded differently to fish and crayfish, showing enhanced locomotion in response to fish but a preference for dark over light conditions in response to the crayfish. P. jenkinsi showed increased vertical migration in response to all three predator cues relative to controls. These different responses to fish and crayfish are hypothesised to reflect the predators' differing predation types: benthic for crayfish and pelagic for fish. However, we found no difference in response to native versus invasive crayfish, indicating that prey naiveté is unlikely to drive the impacts of invasive crayfish. The Predator Recognition Continuum Hypothesis proposes that the benefits of generalisable predator recognition outweigh the costs when predators are diverse. Generalised responses of prey, as observed here, will be adaptive in the presence of an invader and may reduce a novel predator's potential impact.

    Researching intercultural participatory design

    What impact does culture have on tools and techniques that are used to facilitate cooperation amongst stakeholders in Information Communication Technology (ICT) design projects? This is a question facing the ICT development activities at the World Maritime University in Malmö, Sweden. At the university, around 300 staff and students from 90 different countries come together every year. Continuously finding ways to improve how they can actively participate in design activities of useful and usable ICT support to benefit their everyday work is a prioritized area. This short paper presents a case that illustrates the intertwined and negotiated characteristics of culture when working with tools and techniques for cooperation in a student ICT design project. Using the case, an ethnographically based research cycle is explored to make sense of and ultimately further improve the interactions between the actors in an intercultural application domain.

    Streaming Tensor Train Approximation

    Tensor trains are a versatile tool to compress and work with high-dimensional data and functions. In this work we introduce the Streaming Tensor Train Approximation (STTA), a new class of algorithms for approximating a given tensor $\mathcal{T}$ in the tensor train format. STTA accesses $\mathcal{T}$ exclusively via two-sided random sketches of the original data, making it streamable and easy to implement in parallel, unlike existing deterministic and randomized tensor train approximations. This property also allows STTA to conveniently leverage structure in $\mathcal{T}$, such as sparsity and various low-rank tensor formats, as well as linear combinations thereof. When Gaussian random matrices are used for sketching, STTA is amenable to an analysis that builds on and extends existing results on the generalized Nyström approximation for matrices. Our results show that STTA can be expected to attain a nearly optimal approximation error if the sizes of the sketches are suitably chosen. A range of numerical experiments illustrates the performance of STTA compared to existing deterministic and randomized approaches. (Comment: 21 pages, code available at https://github.com/RikVoorhaar/tt-sketc)
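    For intuition, here is a minimal Python sketch of the matrix-level building block referred to above, the generalized Nystrom approximation from a two-sided sketch, A ~ (A Omega)(Psi^T A Omega)^+ (Psi^T A); STTA applies this idea in the tensor train format. The sizes, oversampling, and toy matrix are illustrative assumptions.

        # Generalized Nystrom approximation from a two-sided sketch (illustrative).
        import numpy as np

        rng = np.random.default_rng(4)
        m, n, r, p = 300, 250, 15, 5             # sizes, target rank, oversampling

        # Toy low-rank-plus-noise input.
        A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
        A += 1e-6 * rng.standard_normal((m, n))

        Omega = rng.standard_normal((n, r + p))      # right sketching matrix
        Psi = rng.standard_normal((m, r + 2 * p))    # slightly larger left sketch

        # Only these three sketches need to be stored; linear updates to A
        # translate into linear updates of the sketches, hence streamability.
        AOm = A @ Omega
        PsA = Psi.T @ A
        core = Psi.T @ AOm

        X, *_ = np.linalg.lstsq(core, PsA, rcond=None)   # pseudo-inverse solve
        A_hat = AOm @ X
        print("relative error:", np.linalg.norm(A - A_hat) / np.linalg.norm(A))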