
    Lazy stochastic principal component analysis

    Stochastic principal component analysis (SPCA) has become a popular dimensionality reduction strategy for large, high-dimensional datasets. We derive a simplified algorithm, called Lazy SPCA, which has reduced computational complexity and is better suited for large-scale distributed computation. We prove that SPCA and Lazy SPCA find the same approximations to the principal subspace, and that the pairwise distances between samples in the lower-dimensional space are invariant to whether SPCA is executed lazily or not. Empirical studies find downstream predictive performance to be identical for both methods, and superior to random projections, across a range of predictive models (linear regression, logistic lasso, and random forests). In our largest experiment, with 4.6 million samples, Lazy SPCA reduced 43.7 hours of computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix multiplications, apart from an operation on a small square matrix whose size depends only on the target dimensionality.

    Comment: To be published in: 2017 IEEE International Conference on Data Mining Workshops (ICDMW).
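
    The description above suggests the familiar randomized-projection pattern: one large matrix multiplication to form a sketch, followed by an eigen-solve of a small matrix whose size depends only on the target dimensionality. The sketch below is a minimal single-machine illustration of that pattern, assuming a Gaussian test matrix and a k x k Gram-matrix eigendecomposition; the function name and exact steps are illustrative assumptions, not the authors' Lazy SPCA algorithm.

    ```python
    import numpy as np

    def randomized_pca_sketch(A, k, seed=0):
        """Illustrative randomized PCA in the spirit of the abstract: only
        matrix multiplications, plus one eigendecomposition of a small k x k
        matrix. An assumed reconstruction, not the authors' exact method."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        Omega = rng.standard_normal((d, k))        # random test matrix
        Y = A @ Omega                              # large matmul (easy to distribute)
        G = Y.T @ Y                                # small k x k Gram matrix
        evals, V = np.linalg.eigh(G)               # the only non-matmul operation
        W = V / np.sqrt(np.maximum(evals, 1e-12))  # whiten the sketch
        return Y @ W                               # n x k low-dimensional embedding

    # toy usage
    A = np.random.default_rng(1).standard_normal((1000, 50))
    Z = randomized_pca_sketch(A, k=10)
    print(Z.shape)  # (1000, 10)
    ```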

    Random Walks Along the Streets and Canals in Compact Cities: Spectral analysis, Dynamical Modularity, Information, and Statistical Mechanics

    Different models of random walks on the dual graphs of compact urban structures are considered. Analysis of access times between streets helps to detect the city's modularity. A statistical mechanics approach to ensembles of lazy random walkers is developed. The complexity of city modularity can be measured by an information-like parameter which plays the role of an individual fingerprint of the Genius loci. Global structural properties of a city can be characterized by thermodynamic parameters calculated in the random walk problem.

    Comment: 44 pages, 22 figures, 2 tables.
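
    As a rough illustration of the quantities mentioned above, the sketch below builds a lazy walk P_lazy = (I + P)/2 on a small graph and computes mean access (hitting) times to a target vertex by solving the standard linear system; the toy graph and helper names are illustrative and are not the street networks studied in the paper.

    ```python
    import numpy as np
    import networkx as nx

    def lazy_transition_matrix(G):
        """Lazy random walk: stay put with probability 1/2, otherwise
        move to a uniformly chosen neighbour."""
        A = nx.to_numpy_array(G)
        P = A / A.sum(axis=1, keepdims=True)   # simple random walk
        return 0.5 * (np.eye(len(G)) + P)      # lazy version

    def mean_access_times(P, target):
        """Expected number of steps to first reach `target` from each node,
        via the linear system (I - Q) h = 1 on the non-target states."""
        n = P.shape[0]
        idx = [i for i in range(n) if i != target]
        Q = P[np.ix_(idx, idx)]
        h = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
        times = np.zeros(n)
        times[idx] = h
        return times

    # toy usage on a small ring "street" graph (illustrative only)
    G = nx.cycle_graph(6)
    print(mean_access_times(lazy_transition_matrix(G), target=0))
    ```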

    Tournament Rewards and Risk Taking

    I consider two seemingly unrelated puzzles: 1. Why is relative performance evaluation (RPE) used less in CEO compensation than agency theory suggests? 2. Why is a mediocre performance sometimes, e.g. for fund managers, rewarded more highly than excellence? I consider a simple tournament model in which agents can influence the spread of output in addition to its mean. I show that standard tournament rewards induce risky and lazy behavior from the agents. This finding sheds light on Puzzle 1. Second, I consider a scheme that ranks agents according to their relative closeness to a benchmark k. I show that there exist intermediate values of k such that the risky-lazy problem of the standard tournament can be mitigated. This result sheds light on Puzzle 2.
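
    A toy calculation (not the paper's model) illustrates why rank-order prizes reward risk: when outputs are independent normals and an agent trails in expected output, a mean-preserving increase in that agent's spread raises the probability of finishing first, so a prize for the top rank encourages low effort on the mean and high variance.

    ```python
    from math import erf, sqrt

    def win_probability(mu_a, mu_b, sigma_a, sigma_b):
        """P(agent A outscores agent B) with independent normal outputs:
        A - B ~ N(mu_a - mu_b, sigma_a**2 + sigma_b**2)."""
        z = (mu_a - mu_b) / sqrt(sigma_a**2 + sigma_b**2)
        return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

    # A trails B in expectation; raising A's spread raises A's win probability
    for sigma_a in (0.5, 1.0, 2.0, 4.0):
        print(sigma_a, round(win_probability(0.0, 1.0, sigma_a, 1.0), 3))
    ```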

    A general and intuitive envelope theorem

    We present an envelope theorem for establishing first-order conditions in decision problems involving continuous and discrete choices. Our theorem accommodates general dynamic programming problems, even with unbounded marginal utilities, and, unlike classical envelope theorems that focus only on differentiating value functions, it accommodates other endogenous functions such as default probabilities and interest rates. Our main technical ingredient is how we establish the differentiability of a function at a point: we sandwich the function between two differentiable functions from above and below. Our theory is widely applicable. In unsecured credit models, neither interest rates nor continuation values are globally differentiable; nevertheless, we establish an Euler equation involving marginal prices and values. In adjustment cost models, we show that first-order conditions apply universally, even if optimal policies are not (S,s). Finally, we incorporate indivisible choices into a classic dynamic insurance analysis.
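
    One generic form of the sandwiching step described above (a standard statement, not the paper's exact theorem) is the following: if a function is trapped between two functions that are differentiable at a point, agree with it there, and have equal derivatives, then it inherits that derivative.

    ```latex
    % Generic sandwich (squeeze) differentiability statement, as an illustrative sketch.
    \[
      l(x) \le f(x) \le u(x) \ \text{near } x_0, \quad
      l(x_0) = f(x_0) = u(x_0), \quad
      l'(x_0) = u'(x_0)
      \;\Longrightarrow\;
      f'(x_0) = l'(x_0) = u'(x_0).
    \]
    ```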

    Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

    We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity), and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling, and bioimaging. The data matrices are tall and skinny, which enables the algorithms to map conveniently into Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
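
    The tall-and-skinny shape is what makes the data-parallel mapping natural: each worker can form the small d x d Gram matrix of its row block, and only those small matrices need to be combined before a tiny eigenproblem is solved. The sketch below is a rough single-machine illustration of that map/reduce pattern (centering omitted for brevity); it is not the paper's Spark or C+MPI code.

    ```python
    import numpy as np

    def tall_skinny_pca(row_blocks, k):
        """PCA for a tall-and-skinny matrix given as a list of row blocks.
        Each block contributes a small d x d Gram matrix; only these small
        matrices are combined, mirroring the data-parallel pattern."""
        d = row_blocks[0].shape[1]
        gram = np.zeros((d, d))
        for block in row_blocks:              # "map" over partitions
            gram += block.T @ block           # small d x d contribution
        evals, evecs = np.linalg.eigh(gram)   # "reduce": tiny eigenproblem
        order = np.argsort(evals)[::-1][:k]
        return evecs[:, order]                # d x k principal directions

    # toy usage: 4 partitions of a 4000 x 20 matrix, top 5 components
    rng = np.random.default_rng(0)
    blocks = [rng.standard_normal((1000, 20)) for _ in range(4)]
    print(tall_skinny_pca(blocks, k=5).shape)  # (20, 5)
    ```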

    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tends to contain noise, uncertainty, errors, redundancies, or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. In this paper, some details about how this can be achieved are provided, and the role of pre- and post-processing in the whole Knowledge Discovery process for environmental systems is discussed.

    Quantum walks can find a marked element on any graph

    We solve an open problem by constructing quantum walks that not only detect but also find marked vertices in a graph. In the case when the marked set M consists of a single vertex, the number of steps of the quantum walk is quadratically smaller than the classical hitting time HT(P,M) of any reversible random walk P on the graph. In the case of multiple marked elements, the number of steps is given in terms of a related quantity HT^+(P,M) which we call the extended hitting time. Our approach is new, simpler, and more general than previous ones. We introduce a notion of interpolation between the random walk P and the absorbing walk P′, whose marked states are absorbing. Then our quantum walk is simply the quantum analogue of this interpolation. Contrary to previous approaches, our results remain valid when the random walk P is not state-transitive. We also provide algorithms in the cases when only approximations or bounds on the parameters p_M (the probability of picking a marked vertex from the stationary distribution) and HT^+(P,M) are known.

    Comment: 50 pages.
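
    On the classical side, the interpolation mentioned above can be written down directly: replace the rows of P at the marked vertices with absorbing self-loops to obtain P′, and take P(s) = (1 - s) P + s P′. The sketch below builds this interpolation and computes a classical hitting time from the stationary distribution; it covers only the classical ingredient, not the quantum walk itself, and the helper names and the precise definition used for HT(P,M) are assumptions for illustration.

    ```python
    import numpy as np

    def interpolated_walk(P, marked, s):
        """P(s) = (1 - s) * P + s * P', where P' makes marked vertices absorbing."""
        P_abs = P.copy()
        for m in marked:
            P_abs[m, :] = 0.0
            P_abs[m, m] = 1.0
        return (1 - s) * P + s * P_abs

    def classical_hitting_time(P, marked):
        """Expected steps to reach the marked set, starting from the
        stationary distribution of P (assumed ergodic)."""
        n = P.shape[0]
        evals, evecs = np.linalg.eig(P.T)
        pi = np.real(evecs[:, np.argmax(np.real(evals))])
        pi = pi / pi.sum()                               # stationary distribution
        unmarked = [i for i in range(n) if i not in marked]
        Q = P[np.ix_(unmarked, unmarked)]
        h = np.linalg.solve(np.eye(len(unmarked)) - Q, np.ones(len(unmarked)))
        return float(pi[unmarked] @ h)                   # marked starts need 0 steps

    # toy usage: lazy walk on a 4-cycle with vertex 0 marked
    A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
    P = 0.5 * (np.eye(4) + A / A.sum(axis=1, keepdims=True))
    print(classical_hitting_time(P, marked={0}))
    print(interpolated_walk(P, marked={0}, s=0.5))
    ```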