
    Lazy stochastic principal component analysis

    Stochastic principal component analysis (SPCA) has become a popular dimensionality reduction strategy for large, high-dimensional datasets. We derive a simplified algorithm, called Lazy SPCA, which has reduced computational complexity and is better suited for large-scale distributed computation. We prove that SPCA and Lazy SPCA find the same approximations to the principal subspace, and that the pairwise distances between samples in the lower-dimensional space are invariant to whether SPCA is executed lazily or not. Empirical studies find downstream predictive performance to be identical for both methods, and superior to random projections, across a range of predictive models (linear regression, logistic lasso, and random forests). In our largest experiment, with 4.6 million samples, Lazy SPCA reduced 43.7 hours of computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix multiplications, apart from an operation on a small square matrix whose size depends only on the target dimensionality.

    Comment: To be published in: 2017 IEEE International Conference on Data Mining Workshops (ICDMW).
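
    The description above suggests the familiar randomized-projection pattern: one large matrix multiplication to form a sketch, followed by an eigen-solve of a small matrix whose size depends only on the target dimensionality. The sketch below is a minimal single-machine illustration of that pattern, assuming a Gaussian test matrix and a k x k Gram-matrix eigendecomposition; the function name and exact steps are illustrative assumptions, not the authors' Lazy SPCA algorithm.

    ```python
    import numpy as np

    def randomized_pca_sketch(A, k, seed=0):
        """Illustrative randomized PCA in the spirit of the abstract: only
        matrix multiplications, plus one eigendecomposition of a small k x k
        matrix. An assumed reconstruction, not the authors' exact method."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        Omega = rng.standard_normal((d, k))        # random test matrix
        Y = A @ Omega                              # large matmul (easy to distribute)
        G = Y.T @ Y                                # small k x k Gram matrix
        evals, V = np.linalg.eigh(G)               # the only non-matmul operation
        W = V / np.sqrt(np.maximum(evals, 1e-12))  # whiten the sketch
        return Y @ W                               # n x k low-dimensional embedding

    # toy usage
    A = np.random.default_rng(1).standard_normal((1000, 50))
    Z = randomized_pca_sketch(A, k=10)
    print(Z.shape)  # (1000, 10)
    ```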

    Random Walks Along the Streets and Canals in Compact Cities: Spectral analysis, Dynamical Modularity, Information, and Statistical Mechanics

    Different models of random walks on the dual graphs of compact urban structures are considered. Analysis of access times between streets helps to detect the city's modularity. A statistical mechanics approach to ensembles of lazy random walkers is developed. The complexity of city modularity can be measured by an information-like parameter which plays the role of an individual fingerprint of the Genius loci. Global structural properties of a city can be characterized by thermodynamic parameters calculated in the random walk problem.

    Comment: 44 pages, 22 figures, 2 tables.
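
    As a rough illustration of the quantities mentioned above, the sketch below builds a lazy walk P_lazy = (I + P)/2 on a small graph and computes mean access (hitting) times to a target vertex by solving the standard linear system; the toy graph and helper names are illustrative and are not the street networks studied in the paper.

    ```python
    import numpy as np
    import networkx as nx

    def lazy_transition_matrix(G):
        """Lazy random walk: stay put with probability 1/2, otherwise
        move to a uniformly chosen neighbour."""
        A = nx.to_numpy_array(G)
        P = A / A.sum(axis=1, keepdims=True)   # simple random walk
        return 0.5 * (np.eye(len(G)) + P)      # lazy version

    def mean_access_times(P, target):
        """Expected number of steps to first reach `target` from each node,
        via the linear system (I - Q) h = 1 on the non-target states."""
        n = P.shape[0]
        idx = [i for i in range(n) if i != target]
        Q = P[np.ix_(idx, idx)]
        h = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
        times = np.zeros(n)
        times[idx] = h
        return times

    # toy usage on a small ring "street" graph (illustrative only)
    G = nx.cycle_graph(6)
    print(mean_access_times(lazy_transition_matrix(G), target=0))
    ```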

    Tournament Rewards and Risk Taking

    I consider two seemingly unrelated puzzles: 1. Why is relative performance evaluation (RPE) used less in CEO compensation than agency theory suggests? 2. Why is a mediocre performance sometimes, e.g. for fund managers, rewarded more highly than excellence? I consider a simple tournament model in which agents can influence the spread of output in addition to its mean. I show that standard tournament rewards induce risky and lazy behavior from the agents. This finding sheds light on Puzzle 1. Second, I consider a scheme that ranks agents according to their relative closeness to a benchmark k. I show that there exist intermediate values of k such that the risky-lazy problem of the standard tournament can be mitigated. This result sheds light on Puzzle 2.
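
    A toy calculation (not the paper's model) illustrates why rank-order prizes reward risk: when outputs are independent normals and an agent trails in expected output, a mean-preserving increase in that agent's spread raises the probability of finishing first, so a prize for the top rank encourages low effort on the mean and high variance.

    ```python
    from math import erf, sqrt

    def win_probability(mu_a, mu_b, sigma_a, sigma_b):
        """P(agent A outscores agent B) with independent normal outputs:
        A - B ~ N(mu_a - mu_b, sigma_a**2 + sigma_b**2)."""
        z = (mu_a - mu_b) / sqrt(sigma_a**2 + sigma_b**2)
        return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

    # A trails B in expectation; raising A's spread raises A's win probability
    for sigma_a in (0.5, 1.0, 2.0, 4.0):
        print(sigma_a, round(win_probability(0.0, 1.0, sigma_a, 1.0), 3))
    ```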

    A general and intuitive envelope theorem

    We present an envelope theorem for establishing first-order conditions in decision problems involving continuous and discrete choices. Our theorem accommodates general dynamic programming problems, even with unbounded marginal utilities, and, unlike classical envelope theorems that focus only on differentiating value functions, it accommodates other endogenous functions such as default probabilities and interest rates. Our main technical ingredient is how we establish the differentiability of a function at a point: we sandwich the function between two differentiable functions from above and below. Our theory is widely applicable. In unsecured credit models, neither interest rates nor continuation values are globally differentiable; nevertheless, we establish an Euler equation involving marginal prices and values. In adjustment cost models, we show that first-order conditions apply universally, even if optimal policies are not (S,s). Finally, we incorporate indivisible choices into a classic dynamic insurance analysis.
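
    One generic form of the sandwiching step described above (a standard statement, not the paper's exact theorem) is the following: if a function is trapped between two functions that are differentiable at a point, agree with it there, and have equal derivatives, then it inherits that derivative.

    ```latex
    % Generic sandwich (squeeze) differentiability statement, as an illustrative sketch.
    \[
      l(x) \le f(x) \le u(x) \ \text{near } x_0, \quad
      l(x_0) = f(x_0) = u(x_0), \quad
      l'(x_0) = u'(x_0)
      \;\Longrightarrow\;
      f'(x_0) = l'(x_0) = u'(x_0).
    \]
    ```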

    Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

    We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity), and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling, and bioimaging. The data matrices are tall and skinny, which enables the algorithms to map conveniently into Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
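
    The tall-and-skinny shape is what makes the data-parallel mapping natural: each worker can form the small d x d Gram matrix of its row block, and only those small matrices need to be combined before a tiny eigenproblem is solved. The sketch below is a rough single-machine illustration of that map/reduce pattern (centering omitted for brevity); it is not the paper's Spark or C+MPI code.

    ```python
    import numpy as np

    def tall_skinny_pca(row_blocks, k):
        """PCA for a tall-and-skinny matrix given as a list of row blocks.
        Each block contributes a small d x d Gram matrix; only these small
        matrices are combined, mirroring the data-parallel pattern."""
        d = row_blocks[0].shape[1]
        gram = np.zeros((d, d))
        for block in row_blocks:              # "map" over partitions
            gram += block.T @ block           # small d x d contribution
        evals, evecs = np.linalg.eigh(gram)   # "reduce": tiny eigenproblem
        order = np.argsort(evals)[::-1][:k]
        return evecs[:, order]                # d x k principal directions

    # toy usage: 4 partitions of a 4000 x 20 matrix, top 5 components
    rng = np.random.default_rng(0)
    blocks = [rng.standard_normal((1000, 20)) for _ in range(4)]
    print(tall_skinny_pca(blocks, k=5).shape)  # (20, 5)
    ```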

    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tends to contain noise, uncertainty, errors, redundancies, or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. In this paper, some details about how this can be achieved are provided, and the role of pre- and post-processing in the whole Knowledge Discovery process for environmental systems is discussed.

    Quantum walks can find a marked element on any graph

    We solve an open problem by constructing quantum walks that not only detect but also find marked vertices in a graph. In the case when the marked set M consists of a single vertex, the number of steps of the quantum walk is quadratically smaller than the classical hitting time HT(P,M) of any reversible random walk P on the graph. In the case of multiple marked elements, the number of steps is given in terms of a related quantity HT^+(P,M) which we call the extended hitting time. Our approach is new, simpler, and more general than previous ones. We introduce a notion of interpolation between the random walk P and the absorbing walk P′, whose marked states are absorbing. Then our quantum walk is simply the quantum analogue of this interpolation. Contrary to previous approaches, our results remain valid when the random walk P is not state-transitive. We also provide algorithms in the cases when only approximations or bounds on the parameters p_M (the probability of picking a marked vertex from the stationary distribution) and HT^+(P,M) are known.

    Comment: 50 pages.
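
    On the classical side, the interpolation mentioned above can be written down directly: replace the rows of P at the marked vertices with absorbing self-loops to obtain P′, and take P(s) = (1 - s) P + s P′. The sketch below builds this interpolation and computes a classical hitting time from the stationary distribution; it covers only the classical ingredient, not the quantum walk itself, and the helper names and the precise definition used for HT(P,M) are assumptions for illustration.

    ```python
    import numpy as np

    def interpolated_walk(P, marked, s):
        """P(s) = (1 - s) * P + s * P', where P' makes marked vertices absorbing."""
        P_abs = P.copy()
        for m in marked:
            P_abs[m, :] = 0.0
            P_abs[m, m] = 1.0
        return (1 - s) * P + s * P_abs

    def classical_hitting_time(P, marked):
        """Expected steps to reach the marked set, starting from the
        stationary distribution of P (assumed ergodic)."""
        n = P.shape[0]
        evals, evecs = np.linalg.eig(P.T)
        pi = np.real(evecs[:, np.argmax(np.real(evals))])
        pi = pi / pi.sum()                               # stationary distribution
        unmarked = [i for i in range(n) if i not in marked]
        Q = P[np.ix_(unmarked, unmarked)]
        h = np.linalg.solve(np.eye(len(unmarked)) - Q, np.ones(len(unmarked)))
        return float(pi[unmarked] @ h)                   # marked starts need 0 steps

    # toy usage: lazy walk on a 4-cycle with vertex 0 marked
    A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
    P = 0.5 * (np.eye(4) + A / A.sum(axis=1, keepdims=True))
    print(classical_hitting_time(P, marked={0}))
    print(interpolated_walk(P, marked={0}, s=0.5))
    ```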