Lazy stochastic principal component analysis
Stochastic principal component analysis (SPCA) has become a popular
dimensionality reduction strategy for large, high-dimensional datasets. We
derive a simplified algorithm, called Lazy SPCA, which has reduced
computational complexity and is better suited for large-scale distributed
computation. We prove that SPCA and Lazy SPCA find the same approximations to
the principal subspace, and that the pairwise distances between samples in the
lower-dimensional space are invariant to whether SPCA is executed lazily or not.
Empirical studies find downstream predictive performance to be identical for
both methods, and superior to random projections, across a range of predictive
models (linear regression, logistic lasso, and random forests). In our largest
experiment with 4.6 million samples, Lazy SPCA reduced 43.7 hours of
computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix
multiplications, besides an operation on a small square matrix whose size
depends only on the target dimensionality.
Comment: To be published in: 2017 IEEE International Conference on Data Mining Workshops (ICDMW)
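The abstract's closing claim (everything is matrix multiplication except one operation on a small square matrix) matches the general shape of randomized PCA. As a rough illustration only, here is a minimal NumPy sketch of that shape; the function name and the specific steps are our assumptions, not the authors' published Lazy SPCA algorithm:

```python
import numpy as np

def lazy_spca_sketch(X, k, seed=0):
    """Hypothetical sketch of a randomized PCA with the structure the
    abstract describes: only matrix multiplications, plus a single
    eigendecomposition of a k x k matrix (size depends only on k)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Omega = rng.standard_normal((d, k))   # random test matrix
    Y = X @ Omega                         # big data-parallel matmul: n x k
    G = Y.T @ Y                           # small k x k Gram matrix
    _, V = np.linalg.eigh(G)              # only non-matmul step: k x k eig
    # V is orthogonal, so pairwise distances between rows of Y @ V
    # equal those between rows of Y.
    return Y @ V
```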
Random Walks Along the Streets and Canals in Compact Cities: Spectral analysis, Dynamical Modularity, Information, and Statistical Mechanics
Different models of random walks on the dual graphs of compact urban
structures are considered. Analysis of access times between streets helps to
detect city modularity. A statistical mechanics approach to ensembles
of lazy random walkers is developed. The complexity of city modularity can be
measured by an information-like parameter which plays the role of an individual
fingerprint of the Genius loci.
Global structural properties of a city can be characterized by the
thermodynamic parameters calculated in the random walk problem.
Comment: 44 pages, 22 figures, 2 tables
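For orientation, a lazy random walk is commonly defined as the walk that stays put with some probability and otherwise moves like the ordinary walk. A minimal NumPy sketch of this standard construction (the laziness parameter and the toy graph are our illustration, not the paper's specific city models):

```python
import numpy as np

def lazy_walk_matrix(A, beta=0.5):
    """Standard lazy random walk on a graph with adjacency matrix A:
    stay in place with probability beta, otherwise step to a uniformly
    random neighbour.  For beta >= 1/2 all eigenvalues are non-negative,
    which is convenient for spectral analysis of access times."""
    P = A / A.sum(axis=1, keepdims=True)   # ordinary walk transition matrix
    return beta * np.eye(A.shape[0]) + (1.0 - beta) * P

# Toy example: path graph on 3 vertices (a tiny stand-in for a dual graph)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
eigvals = np.linalg.eigvals(lazy_walk_matrix(A))
```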
Tournament Rewards and Risk Taking
I consider two seemingly unrelated puzzles: (1) Why is relative performance evaluation (RPE) used less in CEO compensation than agency theory suggests? (2) Why is a mediocre performance sometimes, e.g., for fund managers, rewarded more highly than excellence? I consider a simple tournament model in which agents can influence the spread of output in addition to its mean. I show that standard tournament rewards induce risky and lazy behavior from the agents. This finding sheds light on Puzzle 1. Second, I consider a scheme that ranks agents according to their relative closeness to a benchmark k. I show that there exist intermediate values of k such that the risky-lazy problem of the standard tournament can be mitigated. This result sheds light on Puzzle 2.
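To make the mechanism concrete, here is a hypothetical Monte Carlo sketch; all distributions and numbers are our illustration, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
# Hypothetical strategies: the rival is diligent (high mean, low spread);
# the lazy agent has a lower mean and chooses how much risk to take.
rival      = rng.normal(1.0, 0.5, n)
lazy_safe  = rng.normal(0.5, 0.5, n)   # lazy, low spread
lazy_risky = rng.normal(0.5, 3.0, n)   # lazy, high spread

def win_top(a, b):            # standard tournament: higher output wins
    return np.mean(a > b)

def win_benchmark(a, b, k):   # benchmark scheme: output closer to k wins
    return np.mean(np.abs(a - k) < np.abs(b - k))

# Under the standard tournament, gambling substitutes for effort:
# the lazy agent's winning odds rise from roughly 0.24 to roughly 0.44.
print(win_top(lazy_safe, rival), win_top(lazy_risky, rival))
# Ranking by closeness to an intermediate benchmark penalizes the gamble.
print(win_benchmark(lazy_safe, rival, 1.0), win_benchmark(lazy_risky, rival, 1.0))
```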
A general and intuitive envelope theorem
We present an envelope theorem for establishing first-order conditions in decision problems involving continuous and discrete choices. Our theorem accommodates general dynamic programming problems, even with unbounded marginal utilities. Unlike classical envelope theorems, which focus only on differentiating value functions, we also accommodate other endogenous functions such as default probabilities and interest rates. Our main technical ingredient is how we establish the differentiability of a function at a point: we sandwich the function between two differentiable functions from above and below. Our theory is widely applicable. In unsecured credit models, neither interest rates nor continuation values are globally differentiable; nevertheless, we establish an Euler equation involving marginal prices and values. In adjustment cost models, we show that first-order conditions apply universally, even if optimal policies are not (S,s). Finally, we incorporate indivisible choices into a classic dynamic insurance analysis.
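The differentiability step the abstract describes can be made precise with a standard sandwich argument; the following statement is our paraphrase of that idea, not the paper's exact theorem:

```latex
\textbf{Sandwich lemma.} Let $g \le f \le h$ on a neighborhood of $x_0$, with
$g(x_0) = f(x_0) = h(x_0)$, and suppose $g$ and $h$ are differentiable at
$x_0$. Comparing difference quotients from the right and from the left forces
$g'(x_0) = h'(x_0)$, and then $f$ is differentiable at $x_0$ with
\[
  f'(x_0) = g'(x_0) = h'(x_0).
\]
```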
Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies
We explore the trade-offs of performing linear algebra using Apache Spark,
compared to traditional C and MPI implementations on HPC platforms. Spark is
designed for data analytics on cluster computing platforms with access to local
disks and is optimized for data-parallel tasks. We examine three widely-used
and important matrix factorizations: NMF (for physical plausibility), PCA (for
its ubiquity) and CX (for data interpretability). We apply these methods to
TB-sized problems in particle physics, climate modeling and bioimaging. The
data matrices are tall and skinny, which enables the algorithms to map
conveniently into Spark's data-parallel model. We perform scaling experiments
on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide
tuning guidance to obtain high performance.
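A sense of why tall-and-skinny matrices map well onto Spark's data-parallel model: the n x d matrix is partitioned by rows, while the d x d Gram matrix is small (d << n) and can be aggregated back to the driver, where the factorization becomes a cheap local operation. A minimal sketch using the stock PySpark MLlib RowMatrix API; this is generic library usage, not the paper's tuned implementations:

```python
from pyspark import SparkContext
from pyspark.mllib.linalg.distributed import RowMatrix

sc = SparkContext(appName="tall-skinny-pca-sketch")

# Tall-and-skinny data: many rows (distributed), few columns (local).
rows = sc.parallelize([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0],
                       [7.0, 8.0, 9.0],
                       [2.0, 0.0, 1.0]])
mat = RowMatrix(rows)

gram = mat.computeGramianMatrix()          # small d x d matrix on the driver
pcs = mat.computePrincipalComponents(2)    # top-2 principal components
proj = mat.multiply(pcs)                   # project rows into PC space
```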
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge is highly dependent on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems is discussed.
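The data-quality problems listed here (noise, redundancies, missing values, irrelevant information) map onto routine cleaning steps. A small pandas sketch; the concrete choices (duplicate removal, median imputation, percentile clipping) are our assumptions, not the paper's prescriptions:

```python
import pandas as pd

def basic_preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative pre-processing of the kind discussed in the paper."""
    df = df.drop_duplicates()                          # redundancies
    df = df.loc[:, df.nunique(dropna=False) > 1]       # irrelevant (constant) columns
    num = df.select_dtypes("number").columns
    df[num] = df[num].fillna(df[num].median())         # missing values
    # Crude noise handling: clip each numeric column to its 1st-99th percentiles.
    df[num] = df[num].clip(df[num].quantile(0.01),
                           df[num].quantile(0.99), axis=1)
    return df
```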
Quantum walks can find a marked element on any graph
We solve an open problem by constructing quantum walks that not only detect
but also find marked vertices in a graph. In the case when the marked set
consists of a single vertex, the number of steps of the quantum walk is
quadratically smaller than the classical hitting time of any
reversible random walk on the graph. In the case of multiple marked
elements, the number of steps is given in terms of a related quantity
which we call the extended hitting time.
Our approach is new, simpler, and more general than previous ones. We
introduce a notion of interpolation between the random walk and the
absorbing walk, whose marked states are absorbing. Our quantum walk
is then simply the quantum analogue of this interpolation. Contrary to
previous approaches, our results remain valid when the random walk is not
state-transitive. We also provide algorithms for the cases when only
approximations or bounds on the relevant parameters (the probability of
picking a marked vertex from the stationary distribution, and the extended
hitting time) are known.
Comment: 50 pages
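The classical interpolation the abstract refers to is easy to write down: if P is the original transition matrix and P' the walk with marked rows made absorbing, the interpolated walk is the convex combination P(s) = (1 - s) P + s P'. A small NumPy sketch of this classical object (the quantum analogue is beyond a few lines; this only illustrates the interpolation):

```python
import numpy as np

def interpolated_walk(P, marked, s):
    """Classical interpolation P(s) = (1 - s) * P + s * P', where P' equals P
    except that every marked vertex is made absorbing (its row becomes a
    self-loop).  s = 0 gives the original walk, s = 1 the absorbing walk."""
    P_abs = P.copy()
    for m in marked:
        P_abs[m, :] = 0.0
        P_abs[m, m] = 1.0
    return (1.0 - s) * P + s * P_abs

# Toy example: simple random walk on a 4-cycle with vertex 0 marked.
P = 0.5 * np.roll(np.eye(4), 1, axis=1) + 0.5 * np.roll(np.eye(4), -1, axis=1)
P_half = interpolated_walk(P, marked=[0], s=0.5)
```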