Local ensemble transform Kalman filter, a fast non-stationary control law for adaptive optics on ELTs: theoretical aspects and first simulation results

Author: Bertino Laurent
Bocquet Marc
Ferrari Marc
Fusco Thierry
Gray Morgan
Petit Cyril
Rodionov Sergey
Publication venue: 'The Optical Society'
Publication date: 01/08/2014

We propose a new algorithm for an adaptive optics system control law, based on the Linear Quadratic Gaussian approach and a Kalman Filter adaptation with localizations. It handles non-stationary behaviors, achieves performance close to the optimum defined by the residual phase variance minimization criterion, and reduces the computational burden through an intrinsically parallel implementation on Extremely Large Telescopes (ELTs). Comment: This paper was published in Optics Express and is made available as an electronic reprint with the permission of OSA. The paper can be found at the following URL on the OSA website: http://www.opticsinfobase.org/oe/. Systematic or multiple reproduction or distribution to multiple locations via electronic or other means is prohibited and is subject to penalties under la

arXiv.org e-Print Archive

HAL AMU

INRIA a CCSD electronic archive server

HAL-INSU

HAL-Ecole des Ponts ParisTech

Random Forests for Big Data

Author: Genuer Robin
Poggi Jean-Michel
Tuleau-Malot Christine
Villa-Vialaneix Nathalie
Publication date: 19/11/2015

Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involves massive data, but it also often includes online data and data heterogeneity. Recently, some statistical methods have been adapted to process Big Data, such as linear regression models, clustering methods and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests were introduced by Breiman in 2001. They are a powerful nonparametric statistical method that handles regression problems, as well as two-class and multi-class classification problems, in a single and versatile framework. Focusing on classification problems, this paper proposes a selective review of available proposals that deal with scaling random forests to Big Data problems. These proposals rely on parallel environments or on online adaptations of random forests. We also describe how related quantities, such as out-of-bag error and variable importance, are addressed in these methods. Then, we formulate various remarks for random forests in the Big Data context. Finally, we experiment with five variants on two massive datasets (15 and 120 million observations), a simulated one as well as real-world data. One variant relies on subsampling, while three others are related to parallel implementations of random forests and involve either various adaptations of bootstrap to Big Data or "divide-and-conquer" approaches. The fifth variant relies on online learning of random forests. These numerical experiments highlight the relative performance of the different variants, as well as some of their limitations.
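The abstract above refers to bootstrap samples and the out-of-bag error without showing what "out-of-bag" means in practice. The following is a minimal sketch of the standard bootstrap idea only, not of any of the five variants the paper evaluates; the function names `bootstrap_split` and `oob_fraction` are hypothetical and chosen for illustration.

```python
import random


def bootstrap_split(n_rows, rng):
    """Draw n_rows row indices with replacement.

    Returns (in_bag, out_of_bag): the drawn indices (with repeats) and the
    indices that were never drawn. In a random forest, the out-of-bag rows
    of a tree can serve as held-out data for that tree.
    """
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag


def oob_fraction(n_rows, seed=0):
    """Fraction of rows left out of one bootstrap sample (illustrative helper)."""
    rng = random.Random(seed)
    _, out_of_bag = bootstrap_split(n_rows, rng)
    return len(out_of_bag) / n_rows
```

For large samples the out-of-bag fraction is close to 1/e (about 0.37), since each row is missed by a single draw with probability 1 - 1/n. The sketch draws a full-size sample of n indices, which becomes costly on massive data.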

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

ProdInra

Hal-Diderot

Collaborative decision making by ensemble rule based classification systems

Author: Gegov Alexander Emilov
Liu Han
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Portsmouth University Research Portal (Pure)

Modeling Scalability of Distributed Machine Learning

Author: Marwah Manish
Simanovsky Andrey
Ulanov Alexander
Publication date: 24/03/2017

Present-day machine learning is computationally intensive and processes large amounts of data. It is implemented in a distributed fashion to address these scalability issues, with the work parallelized across a number of computing nodes. It is usually hard to estimate in advance how many nodes to use for a particular workload. We propose a simple framework for estimating the scalability of distributed machine learning algorithms. We measure scalability by the speedup an algorithm achieves with more nodes. We propose time complexity models for gradient descent and graphical model inference. We validate our models with experiments on deep learning training and belief propagation. This framework was used to study the scalability of machine learning algorithms in Apache Spark. Comment: 6 pages, 4 figures, appears at ICDE 201
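The notion of speedup mentioned in the abstract above can be illustrated with a toy cost model. This is a sketch under stated assumptions, not the paper's time complexity models: it assumes per-iteration compute time divides evenly across nodes and that communication cost grows with log2 of the node count. All function and parameter names are hypothetical.

```python
import math


def iteration_time(nodes, compute_seconds, comm_seconds):
    """Assumed toy model: compute splits evenly, communication costs
    comm_seconds * log2(nodes) (zero on a single node)."""
    return compute_seconds / nodes + comm_seconds * math.log2(nodes)


def speedup(nodes, compute_seconds, comm_seconds):
    """Speedup relative to one node: T(1) / T(nodes)."""
    single = iteration_time(1, compute_seconds, comm_seconds)
    return single / iteration_time(nodes, compute_seconds, comm_seconds)


def best_node_count(max_nodes, compute_seconds, comm_seconds):
    """Node count in 1..max_nodes with the highest modeled speedup."""
    return max(
        range(1, max_nodes + 1),
        key=lambda n: speedup(n, compute_seconds, comm_seconds),
    )
```

Under this assumed model, speedup rises while the compute term dominates and falls once communication outweighs the savings, so the model yields a finite best node count for a given workload.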

arXiv.org e-Print Archive

Crossref

Machine Learning-Based Elastic Cloud Resource Provisioning in the Solvency II Framework

Author: Casarano Giuseppe
Castellani Gilberto
Ciciani Bruno
La Rizza Andrea
Passalacqua Luca
Pellegrini Alessandro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

The Solvency II Directive (Directive 2009/138/EC) is a European Directive issued in November 2009 and effective from January 2016, which has been enacted by the European Union to regulate the insurance and reinsurance sector through the discipline of risk management. Solvency II requires European insurance companies to conduct consistent evaluation and continuous monitoring of risks, a process that is computationally complex and extremely resource-intensive. To this end, companies are required to equip themselves with adequate IT infrastructures, facing a significant outlay. In this paper we present the design and development of a Machine Learning-based approach to transparently deploy the most resource-intensive portion of the Solvency II-related computation on a cloud environment. Our proposal targets DISAR®, a Solvency II-oriented system initially designed to work on a grid of conventional computers. We show how our solution reduces the overall expenses associated with the computation without hampering the privacy of the companies' data (making it suitable for conventional public cloud environments), while meeting the strict temporal requirements imposed by the Directive. Additionally, the system is organized as a self-optimizing loop, which uses information gathered from actual (useful) computations and thus requires a shorter training phase. We present an experimental study conducted on Amazon EC2 to assess the validity and efficiency of our proposal.

ART

Archivio della ricerca- Università di Roma La Sapienza