Adapting to Unknown Smoothness by Aggregation of Thresholded Wavelet Estimators

Authors: Chesneau Christophe, Lecué Guillaume
Publication date: 01/01/2006

We study the performance of an adaptive procedure based on a convex combination, with data-driven weights, of term-by-term thresholded wavelet estimators. For the bounded regression model with random uniform design and for the nonparametric density model, we show that the resulting estimator is optimal in the minimax sense over all Besov balls under the $L^2$ risk, without any logarithmic factor.
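
To make the idea of a convex combination with data-driven weights concrete, here is a minimal sketch. It is not the authors' procedure: it assumes hard thresholding, exponential weights computed from a squared error against an independent validation sample, and a hypothetical `temperature` parameter. All function names are illustrative.

```python
import math

def hard_threshold(coeffs, t):
    """Term-by-term hard thresholding: keep a coefficient only if |c| > t."""
    return [c if abs(c) > t else 0.0 for c in coeffs]

def exponential_weights(risks, temperature):
    """Convex weights proportional to exp(-risk / temperature)."""
    best = min(risks)  # subtract the minimum for numerical stability
    raw = [math.exp(-(r - best) / temperature) for r in risks]
    total = sum(raw)
    return [w / total for w in raw]

def aggregate(train_coeffs, valid_coeffs, thresholds, temperature=1.0):
    """Combine thresholded estimators with weights learned from validation data.

    Assumption: train_coeffs and valid_coeffs are empirical wavelet
    coefficients computed from two independent halves of the sample.
    """
    candidates = [hard_threshold(train_coeffs, t) for t in thresholds]
    risks = [
        sum((c - v) ** 2 for c, v in zip(cand, valid_coeffs))
        for cand in candidates
    ]
    weights = exponential_weights(risks, temperature)
    combined = [
        sum(w * cand[k] for w, cand in zip(weights, candidates))
        for k in range(len(train_coeffs))
    ]
    return combined, weights

# Toy coefficients (made up for illustration only).
train = [2.0, 0.1, -1.5, 0.05]
valid = [1.9, -0.05, -1.6, 0.0]
combined, weights = aggregate(train, valid, thresholds=[0.0, 0.5, 3.0])
```

In this toy run the middle threshold removes the two small coefficients and keeps the two large ones, so it has the smallest validation error and receives the largest weight.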

arXiv.org e-Print Archive

CiteSeerX

Hal-Diderot

HAL - UPEC / UPEM

Pointwise adaptive estimation for robust and quantile regression

Authors: Cuenod Charles-Andre, Reiss Markus, Rozenholc Yves
Publication date: 03/04/2009

A nonparametric procedure for robust regression estimation and for quantile regression is proposed which is completely data-driven and adapts locally to the regularity of the regression function. This is achieved by considering at each point M-estimators over different local neighbourhoods and by a local model selection procedure based on sequential testing. Non-asymptotic risk bounds are obtained, which yield rate-optimality for large sample asymptotics under weak conditions. Simulations for different univariate median regression models show good finite sample properties, also in comparison to traditional methods. The approach is extended to image denoising and applied to CT scans in cancer research.
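
The combination of local M-estimators with sequential testing can be illustrated with a simplified sketch. This is an assumption-laden toy version, not the procedure from the paper: it uses the local median as the M-estimator and a single constant `critical_value`, whereas a real procedure would calibrate the critical values per neighbourhood. The names `local_median` and `adaptive_local_median` are hypothetical.

```python
import statistics

def local_median(xs, ys, x0, h):
    """Median of the responses whose design point lies within h of x0."""
    window = [y for x, y in zip(xs, ys) if abs(x - x0) <= h]
    return statistics.median(window) if window else None

def adaptive_local_median(xs, ys, x0, bandwidths, critical_value):
    """Enlarge the neighbourhood while the new estimate agrees with all
    previously accepted ones; return the last accepted estimate."""
    accepted = []
    for h in sorted(bandwidths):
        est = local_median(xs, ys, x0, h)
        if est is None:
            continue  # empty neighbourhood, try the next larger one
        if any(abs(est - prev) > critical_value for prev in accepted):
            break  # test rejected: the larger window is too biased
        accepted.append(est)
    return accepted[-1] if accepted else None

# Toy step function with a jump at x = 1 (made up for illustration).
xs = [i / 10 for i in range(20)]
ys = [0.0 if x < 1.0 else 10.0 for x in xs]
estimate = adaptive_local_median(xs, ys, 0.5, [0.2, 0.4, 1.5], 1.0)
naive = local_median(xs, ys, 0.5, 1.5)
```

Here the widest neighbourhood straddles the jump, so its median disagrees with the estimates from the smaller neighbourhoods and the procedure stops before accepting it.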

arXiv.org e-Print Archive

CiteSeerX

HAL Descartes

Model selection in logistic regression

Authors: Kwemou Marius, Taupin Marie-Luce, Tocquet Anne-Sophie
Publication date: 29/08/2015

This paper is devoted to model selection in logistic regression. We extend the model selection principle introduced by Birgé and Massart (2001) to the logistic regression model. This selection is done by using penalized maximum likelihood criteria. We propose in this context a completely data-driven criterion based on the slope heuristics. We prove non-asymptotic oracle inequalities for the selected estimators. Theoretical results are illustrated through simulation studies.

arXiv.org e-Print Archive

HAL Evry

HAL Descartes

Scalable aggregation predictive analytics: a query-driven machine learning approach

Authors: Anagnostopoulos Christos, Savva Fotis, Triantafillou Peter
Publication venue: Springer Science and Business Media LLC
Publication date: 12/12/2017

We introduce a predictive modeling solution that provides high quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict ad-hoc queries’ answers. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression and associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to as the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high quality prediction results. The significance of our contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model, evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark, showing its superior performance compared to Spark’s COUNT method.
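
The four goals listed in the abstract (learn from past queries, fit local linear models, define query similarity, predict unseen counts) can be sketched in a few lines. This is only an illustrative toy under strong assumptions, not the authors' model: queries are hypothetical one-dimensional range queries encoded as `(center, radius)`, similarity is Euclidean distance in that encoding, and the local model is a least-squares line of count against radius fitted on the `k` most similar past queries.

```python
import math

def query_distance(q1, q2):
    """Similarity between two (center, radius) queries: Euclidean distance."""
    return math.hypot(q1[0] - q2[0], q1[1] - q2[1])

def predict_count(history, query, k=3):
    """Predict COUNT for an unseen query from past (query, count) pairs.

    Fits count ~ a + b * radius by least squares on the k nearest past
    queries, without touching the underlying data.
    """
    nearest = sorted(history, key=lambda item: query_distance(item[0], query))[:k]
    radii = [q[1] for q, _ in nearest]
    counts = [c for _, c in nearest]
    mean_r = sum(radii) / len(radii)
    mean_c = sum(counts) / len(counts)
    var_r = sum((r - mean_r) ** 2 for r in radii)
    if var_r == 0.0:
        return mean_c  # no variation in radius: fall back to the local mean
    slope = sum((r - mean_r) * (c - mean_c) for r, c in zip(radii, counts)) / var_r
    return mean_c + slope * (query[1] - mean_r)

# Made-up query log: dense data near 0, sparse data near 5.
history = [
    ((0.0, 1.0), 200), ((0.1, 2.0), 400), ((0.2, 3.0), 600),
    ((5.0, 1.0), 50), ((5.1, 2.0), 100),
]
prediction = predict_count(history, (0.1, 2.5), k=3)
```

Because only the three past queries near center 0 are used, the prediction follows the local trend of 200 results per unit of radius and ignores the sparser region around center 5.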

Crossref

Warwick Research Archives Portal Repository

Enlighten