247,033 research outputs found
Local ensemble transform Kalman filter, a fast non-stationary control law for adaptive optics on ELTs: theoretical aspects and first simulation results
We propose a new algorithm for an adaptive optics system control law, based
on the Linear Quadratic Gaussian approach and a Kalman Filter adaptation with
localizations. It allows to handle non-stationary behaviors, to obtain
performance close to the optimality defined with the residual phase variance
minimization criterion, and to reduce the computational burden with an
intrinsically parallel implementation on the Extremely Large Telescopes (ELTs).Comment: This paper was published in Optics Express and is made available as
an electronic reprint with the permission of OSA. The paper can be found at
the following URL on the OSA website: http://www.opticsinfobase.org/oe/ .
Systematic or multiple reproduction or distribution to multiple locations via
electronic or other means is prohibited and is subject to penalties under la
Random Forests for Big Data
Big Data is one of the major challenges of statistical science and has
numerous consequences from algorithmic and theoretical viewpoints. Big Data
always involve massive data but they also often include online data and data
heterogeneity. Recently some statistical methods have been adapted to process
Big Data, like linear regression models, clustering methods and bootstrapping
schemes. Based on decision trees combined with aggregation and bootstrap ideas,
random forests were introduced by Breiman in 2001. They are a powerful
nonparametric statistical method allowing to consider in a single and versatile
framework regression problems, as well as two-class and multi-class
classification problems. Focusing on classification problems, this paper
proposes a selective review of available proposals that deal with scaling
random forests to Big Data problems. These proposals rely on parallel
environments or on online adaptations of random forests. We also describe how
related quantities -- such as out-of-bag error and variable importance -- are
addressed in these methods. Then, we formulate various remarks for random
forests in the Big Data context. Finally, we experiment five variants on two
massive datasets (15 and 120 millions of observations), a simulated one as well
as real world data. One variant relies on subsampling while three others are
related to parallel implementations of random forests and involve either
various adaptations of bootstrap to Big Data or to "divide-and-conquer"
approaches. The fifth variant relates on online learning of random forests.
These numerical experiments lead to highlight the relative performance of the
different variants, as well as some of their limitations
Modeling Scalability of Distributed Machine Learning
Present day machine learning is computationally intensive and processes large
amounts of data. It is implemented in a distributed fashion in order to address
these scalability issues. The work is parallelized across a number of computing
nodes. It is usually hard to estimate in advance how many nodes to use for a
particular workload. We propose a simple framework for estimating the
scalability of distributed machine learning algorithms. We measure the
scalability by means of the speedup an algorithm achieves with more nodes. We
propose time complexity models for gradient descent and graphical model
inference. We validate our models with experiments on deep learning training
and belief propagation. This framework was used to study the scalability of
machine learning algorithms in Apache Spark.Comment: 6 pages, 4 figures, appears at ICDE 201
Machine Learning-Based Elastic Cloud Resource Provisioning in the Solvency II Framework
The Solvency II Directive (Directive 2009/138/EC) is a European Directive issued in November 2009 and effective from January 2016, which has been enacted by the European Union to regulate the insurance and reinsurance sector through the discipline of risk management. Solvency II requires European insurance companies to conduct consistent evaluation and continuous monitoring of risksâa process which is computationally complex and extremely resource-intensive. To this end, companies are required to equip themselves with adequate IT infrastructures, facing a significant outlay.
In this paper we present the design and the development of a Machine Learning-based approach to transparently deploy on a cloud environment the most resource-intensive portion of the Solvency II-related computation. Our proposal targets DISARÂź, a Solvency II-oriented system initially designed to work on a grid of conventional computers. We show how our solution allows to reduce the overall expenses associated with the computation, without hampering the privacy of the companiesâ data (making it suitable for conventional public cloud environments), and allowing to meet the strict temporal requirements required by the Directive. Additionally, the system is organized as a self-optimizing loop, which allows to use information gathered from actual (useful) computations, thus requiring a shorter training phase. We present an experimental study conducted on Amazon EC2 to assess the validity and the efficiency of our proposal
- âŠ