Deterministic algorithms for skewed matrix products

Author: Kutzkov Konstantin
Publication date: 20/09/2012

Recently, Pagh presented a randomized approximation algorithm for the multiplication of real-valued matrices building upon work for detecting the most frequent items in data streams. We continue this line of research and present new deterministic matrix multiplication algorithms. Motivated by applications in data mining, we first consider the case of real-valued, nonnegative $n$-by-$n$ input matrices $A$ and $B$, and show how to obtain a deterministic approximation of the weights of individual entries, as well as the entrywise $p$-norm, of the product $AB$. The algorithm is simple, space efficient and runs in one pass over the input matrices. For a user-defined $b \in (0, n^2)$ the algorithm runs in time $O(nb + n\cdot\text{Sort}(n))$ and space $O(n + b)$ and returns an approximation of the entries of $AB$ within an additive factor of $\|AB\|_{E1}/b$, where $\|C\|_{E1} = \sum_{i, j} |C_{ij}|$ is the entrywise 1-norm of a matrix $C$ and $\text{Sort}(n)$ is the time required to sort $n$ real numbers in linear space. Building upon a result by Berinde et al. we show that for skewed matrix products (a common situation in many real-life applications) the algorithm is more efficient and achieves better approximation guarantees than previously known randomized algorithms. When the input matrices are not restricted to nonnegative entries, we present a new deterministic group testing algorithm detecting nonzero entries in the matrix product with large absolute value. The algorithm is clearly outperformed by randomized matrix multiplication algorithms, but as a byproduct we obtain the first $O(n^{2 + \varepsilon})$-time deterministic algorithm for matrix products with $O(\sqrt{n})$ nonzero entries.
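To make the additive error term $\|AB\|_{E1}/b$ concrete, the following minimal sketch computes the entrywise 1-norm of the product of two nonnegative matrices without forming the product. It relies on the identity $\sum_{i,j}(AB)_{ij} = \sum_k (\sum_i A_{ik})(\sum_j B_{kj})$, which holds for nonnegative entries. The function names are illustrative assumptions, and this is not the approximation algorithm from the abstract, only the quantity that bounds its error.

```python
def entrywise_1_norm_of_product(A, B):
    """Return ||AB||_E1 for nonnegative square matrices A and B (lists of rows).

    Uses column sums of A and row sums of B, so the product AB is never formed.
    Assumes every entry is nonnegative; otherwise this is only the sum of entries.
    """
    n = len(A)
    col_sums_A = [sum(A[i][k] for i in range(n)) for k in range(n)]
    row_sums_B = [sum(B[k]) for k in range(n)]
    return sum(c * r for c, r in zip(col_sums_A, row_sums_B))


def additive_error_bound(A, B, b):
    """Return the additive error ||AB||_E1 / b for a user-defined parameter b."""
    return entrywise_1_norm_of_product(A, B) / b


if __name__ == "__main__":
    A = [[1, 2], [0, 1]]
    B = [[1, 0], [3, 1]]
    print(entrywise_1_norm_of_product(A, B))
    print(additive_error_bound(A, B, b=4))
```

A larger $b$ tightens the bound at the cost of the $O(nb)$ term in the running time and the $O(b)$ term in the space.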

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

The IT University of Copenhagen's Repository

On the Number of Iterations for Dantzig-Wolfe Optimization and Packing-Covering Approximation Algorithms

Author: A. V. Goldberg
F. Shahrokhi
G. B. Dantzig
J. F. Benders
J. F. Shapiro
L. G. Khachiyan
L. R. Ford
M. D. Grigoriadis
M. Held
M. Held
P. Klein
S. Plotkin
T. Leighton
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

We give a lower bound on the iteration complexity of a natural class of Lagrangean-relaxation algorithms for approximately solving packing/covering linear programs. We show that, given an input with $m$ random 0/1-constraints on $n$ variables, with high probability, any such algorithm requires $\Omega(\rho \log(m)/\epsilon^2)$ iterations to compute a $(1+\epsilon)$-approximate solution, where $\rho$ is the width of the input. The bound is tight for a range of the parameters $(m,n,\rho,\epsilon)$. The algorithms in the class include Dantzig-Wolfe decomposition, Benders' decomposition, Lagrangean relaxation as developed by Held and Karp [1971] for lower-bounding TSP, and many others (e.g. by Plotkin, Shmoys, and Tardos [1988] and Grigoriadis and Khachiyan [1996]). To prove the bound, we use a discrepancy argument to show an analogous lower bound on the support size of $(1+\epsilon)$-approximate mixed strategies for random two-player zero-sum 0/1-matrix games.
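The link between iteration count and support size can be illustrated with a generic multiplicative-weights sketch for a zero-sum 0/1-matrix game. Each iteration adds at most one column to the mixed strategy, so its support never exceeds the number of iterations. This is a textbook-style illustration under assumed names (`sparse_column_strategy`, `eta`), not the construction or any specific algorithm analysed in the paper.

```python
import math


def sparse_column_strategy(M, iterations, eta=0.5):
    """Build a column mixed strategy for a 0/1 payoff matrix M (rows gain M[i][j]).

    The row player keeps multiplicative weights; in each iteration the column
    player best-responds with the single column of least weighted payoff.
    """
    m, n = len(M), len(M[0])
    weights = [1.0] * m
    counts = [0] * n
    for _ in range(iterations):
        total = sum(weights)
        col_cost = [sum(weights[i] * M[i][j] for i in range(m)) / total
                    for j in range(n)]
        j = min(range(n), key=col_cost.__getitem__)  # ties go to the lowest index
        counts[j] += 1
        for i in range(m):
            # Rows that gain on the chosen column receive more weight.
            weights[i] *= math.exp(eta * M[i][j])
    return [c / iterations for c in counts]


def max_row_payoff(M, x):
    """Payoff of the best pure row response against the column strategy x."""
    return max(sum(mij * xj for mij, xj in zip(row, x)) for row in M)


if __name__ == "__main__":
    M = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    x = sparse_column_strategy(M, iterations=3)
    print(x, max_row_payoff(M, x))
```

Because the support is bounded by the iteration count, a lower bound on the support size of near-optimal mixed strategies translates into a lower bound on iterations for algorithms of this shape.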

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Improved bounds on sample size for implicit matrix trace estimators

Author: Ascher Uri
Roosta-Khorasani Farbod
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/07/2014

This article is concerned with Monte-Carlo methods for the estimation of the trace of an implicitly given matrix $A$ whose information is only available through matrix-vector products. Such a method approximates the trace by an average of $N$ expressions of the form $\mathbf{w}^T (A\mathbf{w})$, with random vectors $\mathbf{w}$ drawn from an appropriate distribution. We prove, discuss and experiment with bounds on the number of realizations $N$ required in order to guarantee a probabilistic bound on the relative error of the trace estimation upon employing Rademacher (Hutchinson), Gaussian and uniform unit vector (with and without replacement) probability distributions. In total, one necessary bound and six sufficient bounds are proved, improving upon and extending similar estimates obtained in the seminal work of Avron and Toledo (2011) in several dimensions. We first improve their bound on $N$ for the Hutchinson method, dropping a term that relates to $\mathrm{rank}(A)$ and making the bound comparable with that for the Gaussian estimator. We further prove new sufficient bounds for the Hutchinson, Gaussian and the unit vector estimators, as well as a necessary bound for the Gaussian estimator, which depend more specifically on properties of the matrix $A$. As such they may suggest for what type of matrices one distribution or another provides a particularly effective or relatively ineffective stochastic estimation method.
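The Hutchinson estimator described above can be sketched in a few lines. The sketch assumes the matrix is available only through a caller-supplied `matvec` function and draws Rademacher vectors with entries $\pm 1$. The names `hutchinson_trace`, `matvec` and `num_samples` are illustrative and not taken from the paper.

```python
import random


def hutchinson_trace(matvec, n, num_samples, rng=None):
    """Estimate trace(A) as the average of w^T (A w) over Rademacher vectors w.

    matvec: function mapping a length-n list w to the list A w (implicit access).
    rng:    optional random.Random instance; a fixed seed is used by default.
    """
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(num_samples):
        w = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        Aw = matvec(w)
        total += sum(wi * yi for wi, yi in zip(w, Aw))
    return total / num_samples


def dense_matvec(A):
    """Wrap a dense matrix (list of rows) as a matvec function for testing."""
    return lambda w: [sum(a * x for a, x in zip(row, w)) for row in A]


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 3.0]]
    print(hutchinson_trace(dense_matvec(A), n=2, num_samples=1000))
```

For a diagonal matrix every sample equals the trace exactly, since $w_i^2 = 1$. The off-diagonal entries are what contribute variance, which is one way to see why the required $N$ depends on properties of $A$.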

arXiv.org e-Print Archive

University of Queensland eSpace

Kronecker Graphs: An Approach to Modeling Networks

Author: Chakrabarti D
Faloutsos C
Ghahramani Z
Kleinberg J
Leskovec J
Publication date: 21/08/2009

How can we model networks with a mathematically tractable model that allows for rigorous analysis of network properties? Networks exhibit a long list of surprising properties: heavy tails for the degree distribution; small diameters; and densification and shrinking diameters over time. Most present network models either fail to match several of the above properties, are complicated to analyze mathematically, or both. In this paper we propose a generative model for networks that is both mathematically tractable and can generate networks that have the above-mentioned properties. Our main idea is to use the Kronecker product to generate graphs that we refer to as "Kronecker graphs". First, we prove that Kronecker graphs naturally obey common network properties. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present KronFit, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, KronFit takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on large real and synthetic networks show that KronFit finds accurate parameters that closely mimic the properties of target networks. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synthetic graphs can be used for null models, anonymization, extrapolations, and graph summarization.
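As a small illustration of the generation idea, the sketch below builds a deterministic Kronecker graph by taking repeated Kronecker products of a 0/1 initiator adjacency matrix with itself. The initiator and the function names are assumptions chosen for the example. The stochastic variant and the KronFit fitting procedure mentioned in the abstract are not shown.

```python
def kronecker_product(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def kronecker_power(initiator, k):
    """Adjacency matrix of the k-th Kronecker power of the initiator (k >= 1)."""
    result = initiator
    for _ in range(k - 1):
        result = kronecker_product(result, initiator)
    return result


if __name__ == "__main__":
    K1 = [[1, 1],
          [1, 0]]  # hypothetical 2-node initiator with 3 nonzero entries
    K3 = kronecker_power(K1, 3)
    print(len(K3), sum(map(sum, K3)))  # node count and number of nonzero entries
```

With an initiator of $N_1$ nodes and $E_1$ nonzero entries, the $k$-th power has $N_1^k$ nodes and $E_1^k$ nonzero entries, so the edge count grows as a power of the node count.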

arXiv.org e-Print Archive

CiteSeerX

CUED - Cambridge University Engineering Department

Stochastic partial differential equation based modelling of large space-time data sets

Author: Abramowitz
Anderson
Antolik
Applequist
Banerjee
Banerjee
Bardossy
Bell
Berrocal
Borgman
Bremnes
Bronson
Brooks
Brown
Cameletti
Carter
Coe
Cooley
Cramér
Cressie
Cressie
Cressie
Duan
Dudgeon
Folland
Friederichs
Frühwirth-Schnatter
Fuentes
Furrer
Gelfand
Gelfand
Gelman
Gilks
Gneiting
Gneiting
Gneiting
Gneiting
Gneiting
Golightly
Gottlieb
Haberman
Hamill
Hamill
Handcock
Hastings
Heine
Huang
Hutchinson
Johannesson
Jones
Kleiber
Künsch
Lindgren
Ma
Malmberg
Matheson
Metropolis
Neal
Nychka
Paciorek
Paciorek
Palmer
Pedlosky
Ramrez
Robert
Roberts
Roberts
Roberts
Royle
Rue
Rue
Sampson
Sansó
Sansó
Shumway
Sigrist
Simpson
Sloughter
Smith
Solna
Stein
Stein
Stein
Stein
Stensrud
Steppeler
Stidd
Storvik
Stroud
Tobin
Vecchia
Vivar
Whittle
Whittle
Whittle
Wikle
Wikle
Wikle
Wikle
Wikle
Wikle
Wikle
Wilks
Wilks
Xu
Xu
Yussouf
Zheng
Publication venue: 'Wiley'
Publication date: 11/02/2016

Increasingly large space-time data sets call for statistical models and methods that can cope with such data. We show that the solution of a stochastic advection-diffusion partial differential equation provides a flexible model class for spatio-temporal processes which is computationally feasible also for large data sets. The Gaussian process defined through the stochastic partial differential equation has in general a nonseparable covariance structure. Furthermore, its parameters can be physically interpreted as explicitly modeling phenomena such as transport and diffusion that occur in many natural processes in diverse fields ranging from environmental sciences to ecology. In order to obtain computationally efficient statistical algorithms we use spectral methods to solve the stochastic partial differential equation. This has the advantage that approximation errors do not accumulate over time, and that in the spectral space the computational cost grows linearly with the dimension, the total computational costs of Bayesian or frequentist inference being dominated by the fast Fourier transform. The proposed model is applied to postprocessing of precipitation forecasts from a numerical weather prediction model for northern Switzerland. In contrast to the raw forecasts from the numerical model, the postprocessed forecasts are calibrated and quantify prediction uncertainty. Moreover, they outperform the raw forecasts, in the sense that they have a lower mean absolute error.
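The claim that spectral-space cost grows linearly with the dimension can be seen in a simplified setting. The sketch below assumes a one-dimensional advection-diffusion equation with damping, $\partial_t \xi = -\mu\,\partial_x \xi + \sigma^2\,\partial_x^2 \xi - \zeta \xi$, and omits the stochastic forcing entirely. Under that assumption each Fourier coefficient evolves independently, so one time step is a single multiplication per mode. The parameter names `mu`, `sigma2`, `zeta` are illustrative and not taken from the paper.

```python
import cmath


def propagate_modes(coeffs, wavenumbers, mu, sigma2, zeta, dt):
    """Advance Fourier coefficients of the deterministic 1-D equation by time dt.

    For a mode exp(i k x), d/dx becomes i k and d^2/dx^2 becomes -k^2, so each
    coefficient is multiplied by exp(dt * (-sigma2 k^2 - zeta - i mu k)).
    The step is exact for any dt, and the cost is linear in the number of modes.
    """
    return [a * cmath.exp(dt * complex(-sigma2 * k * k - zeta, -mu * k))
            for a, k in zip(coeffs, wavenumbers)]


if __name__ == "__main__":
    coeffs = [1 + 0j, 2 + 0j, 0.5 + 0.5j]
    wavenumbers = [0, 1, 2]
    print(propagate_modes(coeffs, wavenumbers, mu=1.0, sigma2=0.5, zeta=0.1, dt=2.0))
```

Advection only rotates the phase of each coefficient, while diffusion and damping shrink its magnitude. Moving between physical and spectral space is where the fast Fourier transform enters.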

arXiv.org e-Print Archive

Crossref