49 research outputs found

Fast approximation of matrix coherence and statistical leverage

Author: David P. Woodruff
Malik Magdon-Ismail
Mehryar Mohri
Michael W. Mahoney
Petros Drineas
Publication date: 01/01/2011

The statistical leverage scores of a matrix $A$ are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nyström-based low-rank matrix approximation as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary $n \times d$ matrix $A$, with $n \gg d$, and that returns as output relative-error approximations to all $n$ of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of $n$ and $d$) in $O(n d \log n)$ time, as opposed to the $O(nd^2)$ time required by the naïve algorithm that involves computing an orthogonal basis for the range of $A$. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss type ideas. Several practically-important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with $n \approx d$, and the extension to streaming environments. Comment: 29 pages; conference version is in ICML; journal version is in JML
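The definitions in this abstract can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: `exact_leverage_scores` is the naïve route through an orthonormal basis, and `sketched_leverage_scores` uses a dense Gaussian sketch as an assumed stand-in for the fast structured transform the abstract alludes to, so it shows the idea but not the $O(n d \log n)$ running time. Both function names and the `sketch_rows` parameter are hypothetical, and full column rank is assumed.

```python
import numpy as np

def exact_leverage_scores(A):
    """Naive route: squared row norms of an orthonormal basis for range(A)."""
    # Thin QR: Q is n x d with orthonormal columns (assumes full column rank).
    Q, _ = np.linalg.qr(A, mode="reduced")
    return np.sum(Q ** 2, axis=1)

def sketched_leverage_scores(A, sketch_rows, rng):
    """Approximate scores from a sketch of A (illustrative assumption only).

    A dense Gaussian sketch is used for simplicity; it does NOT give the
    fast running time quoted in the abstract.
    """
    n, d = A.shape
    # S compresses the n rows of A down to sketch_rows rows.
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    # R from the QR of the small matrix S A approximately orthogonalizes A.
    _, R = np.linalg.qr(S @ A, mode="reduced")
    # Rows of A R^{-1}, computed with a solve instead of an explicit inverse.
    B = np.linalg.solve(R.T, A.T).T
    return np.sum(B ** 2, axis=1)

rng = np.random.default_rng(0)
A = rng.standard_normal((400, 5))
exact = exact_leverage_scores(A)
approx = sketched_leverage_scores(A, 1000, rng)
coherence = exact.max()  # the coherence is the largest leverage score
```

The exact scores lie in $[0, 1]$ and sum to $d$ when $A$ has full column rank, which gives a quick sanity check on any approximation.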

arXiv.org e-Print Archive

CiteSeerX

Topics in Matrix Sampling Algorithms

Author: Christos Boutsidis
Kristin P. Bennett
Malik Magdon-Ismail Member
Mark Tygert Member
Michael W. Mahoney
Sanmay Das Member
Publication date: 01/01/2011

We study three fundamental problems of Linear Algebra, lying at the heart of various Machine Learning applications, namely: 1) "Low-rank Column-based Matrix Approximation". We are given a matrix A and a target rank k. The goal is to select a subset of columns of A and, by using only these columns, compute a rank k approximation to A that is as good as the rank k approximation that would have been obtained by using all the columns; 2) "Coreset Construction in Least-Squares Regression". We are given a matrix A and a vector b. Consider the (over-constrained) least-squares problem of minimizing ||Ax-b||, over all vectors x in D. The domain D represents the constraints on the solution and can be arbitrary. The goal is to select a subset of the rows of A and b and, by using only these rows, find a solution vector that is as good as the solution vector that would have been obtained by using all the rows; 3) "Feature Selection in K-means Clustering". We are given a set of points described with respect to a large number of features. The goal is to select a subset of the features and, by using only this subset, obtain a k-partition of the points that is as good as the partition that would have been obtained by using all the features. We present novel algorithms for all three problems mentioned above. Our results can be viewed as follow-up research to a line of work known as "Matrix Sampling Algorithms". [Frieze, Kannan, Vempala, 1998] presented the first such algorithm for the Low-rank Matrix Approximation problem. Since then, such algorithms have been developed for several other problems, e.g. Graph Sparsification and Linear Equation Solving. Our contributions to this line of research are: (i) improved algorithms for Low-rank Matrix Approximation and Regression; (ii) algorithms for a new problem domain (K-means Clustering). Comment: PhD Thesis, 150 page

arXiv.org e-Print Archive

CiteSeerX

CERN Document Server

Understanding forest health with Remote sensing-Part II-A review of approaches and data models

Author: Erasmi S. (Stefan)
Heurich M. (Marco)
King D. (Douglas J.)
Lausch A. (Angela)
Magdon P. (Paul)
Publication venue: 'MDPI AG'
Publication date: 01/01/2017

Stress in forest ecosystems (FES) occurs as a result of land-use intensification, disturbances, resource limitations or unsustainable management, causing changes in forest health (FH) at various scales from the local to the global scale. Reactions to such stress depend on the phylogeny of forest species or communities and the characteristics of their impacting drivers and processes. There are many approaches to monitor indicators of FH using in-situ forest inventory and experimental studies, but they are generally limited to sample points or small areas, as well as being time- and labour-inte

Multidisciplinary Digital Publishing Institute

GEO-LEOe-docs

Crossref

Carleton University's Institutional Repository

Directory of Open Access Journals

Extending the definition of modularity to directed graphs with overlapping communities

Author: Arenas A
Baumes J Goldberg M Magdon-Ismail M
Brede M Sinha S
Danon L
Fortunato S
G Mangioni
Holme P
Lancichinetti A
M Malgeri
Newman M E J
Palla G
Tasgin M Herdagdelen A Bingol H
V Carchiolo
V Nicosia
Publication venue: 'IOP Publishing'
Publication date: 24/03/2009

Complex network topologies present interesting and surprising properties, such as community structures, which can be exploited to optimize communication, to find new efficient and context-aware routing algorithms or simply to understand the dynamics and meaning of relationships among nodes. Complex networks are gaining more and more importance as a reference model and are a powerful interpretation tool for many different kinds of natural, biological and social networks, where directed relationships and the contextual belonging of nodes to many different communities are a matter of fact. This paper starts from the definition of the modularity function, given by M. Newman to evaluate the goodness of network community decompositions, and extends it to the more general case of directed graphs with overlapping community structures. Interesting properties of the proposed extension are discussed, a method for finding overlapping communities is proposed and results of its application to benchmark case-studies are reported. We also propose a new dataset which could be used as a reference benchmark for the identification of overlapping community structures. Comment: 22 pages, 11 figure
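As a point of reference for the starting definition mentioned in this abstract, the following minimal sketch computes Newman's modularity for an undirected graph with non-overlapping communities, $Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$. It does not implement the paper's extension to directed graphs or overlapping communities; the function name `modularity` and the toy graph are illustrative assumptions.

```python
import numpy as np

def modularity(adj, labels):
    """Newman modularity for an undirected graph and a hard partition.

    adj    : symmetric adjacency matrix (no self-loops assumed)
    labels : labels[i] is the community of node i
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    two_m = adj.sum()                # each undirected edge is counted twice
    degrees = adj.sum(axis=1)        # k_i
    # delta(c_i, c_j): True where both nodes share a community.
    same = labels[:, None] == labels[None, :]
    # Expected edge weight between i and j under the degree-preserving null model.
    null = np.outer(degrees, degrees) / two_m
    return float(((adj - null) * same).sum() / two_m)

# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])  # everything in one community
```

On this toy graph the two-triangle partition gives $Q = 5/14$, while placing all nodes in a single community gives $Q = 0$.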

arXiv.org e-Print Archive

Crossref

Radar vision in the mapping of forest biodiversity from space

Author: Bae S.
Bässler C.
Culmsee H.
Doerfler I.
Fischer M.
Gerlach T.
Gossner M.
Heibl C.
Heidrich L.
Heurich M.
Hothorn T.
Jung K.
Krah F.
Krzystek P.
Leutner B.
Levick S.
Magdon P.
Müller J.
Nauss T.
Schall P.
Schulze E.
Seibold S.
Serebryanyk A.
Thorn S.
Weisser W.
Wöllauer S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/10/2019

Recent progress in remote sensing provides much-needed, large-scale spatio-temporal information on habitat structures important for biodiversity conservation. Here we examine the potential of a newly launched satellite-borne radar system (Sentinel-1) to map the biodiversity of twelve taxa across five temperate forest regions in central Europe. We show that the sensitivity of radar to habitat structure is similar to that of airborne laser scanning (ALS), the current gold standard in the measurement of forest structure. Our models of different facets of biodiversity reveal that radar performs as well as ALS; median R² over twelve taxa by ALS and radar are 0.51 and 0.57 respectively for the first non-metric multidimensional scaling axes representing assemblage composition. We further demonstrate the promising predictive ability of radar-derived data with external validation based on the species composition of birds and saproxylic beetles. Establishing new area-wide biodiversity monitoring by remote sensing will require the coupling of radar data to stratified and standardized collected local species data

Institute of Transport Research:Publications

Bern Open Repository and Information System (BORIS)

MPG.PuRe

A Survey of Bayesian Statistical Approaches for Big Data

Author: A Akusok
A Baldominos
A Belle
A Beskos
A Bouchard-Côté
A De Mauro
A Fahad
A Gandomi
A Lee
A Lee
A Marshall
A O’Driscoll
A Siddiqa
A Vyas
AB Owen
AF Wise
AR Linero
AT Azar
AT Porter
AÇ Pehlivanlı
B Franke
B Liquet
B Liu
B Oancea
C Loebbecke
C Wang
C Wang
C Yang
CA McGrory
CC Drovandi
CE Rasmussen
Changwon Yoo
CK Emani
D Apiletti
D Oprea
D Talia
DB Dunson
DM Blei
DN Politis
DT Frazier
DV Shah
DW Bates
E Raguseo
ED Schifano
ET Bradlow
F Lindsten
Florian Buettner
Florian Maire
G Bello-Orgaz
G Jifa
GI Allen
GJ Lasinio
GM Allenby
H Cai
H Demirkan
H Hassani
H Kousar
HA Chipman
HH Huang
HJ Watson
I Ben-Gal
J Fan
J Roski
J Zhu
Jake Luo
JE Bibault
JJ Chen
JN Cappella
JS Rumsfeld
K Chalupka
Kath Albury
KL Mengersen
KS Divya
L Breiman
L Liu
L Mählmann
L Wang
L Yu
L Zhang
L Zhou
LG Nongxa
M Hilbert
M Viceconti
MA Suchard
Matias Quiroz
MD Assunção
MD Hoffman
MT Moores
N Moustafa
N. Chopin
NA Lazar
O Sysoev
Oliver Müller
OY Al-Jarrah
P Ducange
P Müller
P Pudlo
PF Brennan
R Bardenet
R Burrows
R Guhaniyogi
R Izbicki
RF Mansour
Richard Branch
Robin Genuer
RW Hoerl
S Atkinson
S Castruccio
S Chaudhuri
S Fosso Wamba
S Guha
S Kaisler
S Li
S Minsker
S Pandey
S Sagiroglu
S Sisson
S Srivastava
S Suthaharan
S White
SF Wamba
Shahriar Akter
Shweta Bansal
Simon I. Hay
SL Scott
SM Schennach
Sudipto Banerjee
T Magdon-Ismail
T Zhang
Tengyao Wang
TH McCormick
TJ McKinley
U Sivarajah
VD Katkar
X Zhang
XF Wang
XG Xia
Xing Ju Lee
Y Tang
Y Webb-Vargas
Y Zhang
Yang Ni
YW Teh
Z Ma
Z Sun
Z Zhang
Ziad Obermeyer
Zoubin Ghahramani
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/05/2020

The modern era is characterised as an era of information or Big Data. This has motivated a huge literature on new methods for extracting information and insights from these data. A natural question is how these approaches differ from those that were available prior to the advent of Big Data. We present a review of published studies that present Bayesian statistical approaches specifically for Big Data and discuss the reported and perceived benefits of these approaches. We conclude by addressing the question of whether focusing only on improving computational algorithms and infrastructure will be enough to face the challenges of Big Data

arXiv.org e-Print Archive

Crossref

Queensland University of Technology ePrints Archive

Uncertainties of forest area estimates caused by the minimum crown cover criterion

Author: A Strahler
A Zingg
B Bennett
C Kleinn
C Vidal
C Woodcock
Christoph Kleinn
D Ko
D Zheng
DE Myers
DJ Mareceau
F Colson
G Matheron
HG Lund
JA Blackard
L Eysn
L Korhonen
L Mathys
L Verchot
M Hansen
M Lawrence
M Nelson
M Schlather
MA Wulder
N Sasaki
P Atkinson
P Lemmon
Paul Magdon
R DeFries
R Development Core Team
R Zomer
RE McRoberts
S Berberoglu
SB Jennings
T Geschwantner
T Gregoire
Publication venue: 'Springer Science and Business Media LLC'

Crossref

PAMR: Passive aggressive mean reversion strategy for portfolio selection

Crossref

CanopyShotNoise – An individual-based tree canopy modelling framework for projecting remote-sensing data and ecological sensitivity analysis

Author: Gaulton R
Magdon P
Myllymaki M
Pommerening A
Publication venue: Taylor and Francis

Newcastle University E-Prints

Distance Matrix Reconstruction from Incomplete Distance Information for Sensor Network Localization

Author: Andreas Savvides
Gopal P
Malik Magdon-Ismail
Petros Drineas
Reino Virrankoski
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

This paper initiates the principled study of distance reconstruction for distance-based node localization. We address an important issue in node localization by showing that the highly incomplete set of inter-node distance measurements obtained in ad-hoc node deployments carries sufficient information for the accurate reconstruction of the missing distances, even in the presence of noise. We provide an efficient and provably accurate algorithm for this reconstruction, and we show that the resulting error is bounded, decreasing at a rate that is inversely proportional to $\sqrt{n}$, the square root of the number of nodes in the region of deployment. Although this result is applicable to many localization schemes, in this paper we illustrate its use in conjunction with the popular MultiDimensional Scaling algorithm. Our analysis reveals valuable insights and key factors to consider during the sensor network setup phase, to improve the quality of the position estimates.
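For context on the MultiDimensional Scaling step named in this abstract, here is a minimal sketch of classical MDS applied to a complete, noise-free distance matrix. It is not the paper's reconstruction algorithm for missing distances; it only shows how coordinates can be recovered once a full distance matrix is available. The function name `classical_mds` and the synthetic points are illustrative assumptions.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Recover coordinates (up to rotation, reflection, translation) from distances.

    D   : n x n matrix of pairwise Euclidean distances (complete, symmetric)
    dim : target embedding dimension
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double centering turns squared distances into a Gram matrix of centered points.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; keep the largest `dim`.
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by round-off before the square root.
    w_top = np.clip(w[idx], 0.0, None)
    return V[:, idx] * np.sqrt(w_top)

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(12, 2))  # hypothetical node positions
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
X = classical_mds(D, dim=2)
D_rec = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
```

With exact distances, `D_rec` matches `D` up to floating-point error, even though `X` may differ from `points` by a rigid motion.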

CiteSeerX

Crossref