Adapting to Unknown Smoothness by Aggregation of Thresholded Wavelet Estimators
We study the performance of an adaptive procedure based on a convex
combination, with data-driven weights, of term-by-term thresholded wavelet
estimators. For the bounded regression model with random uniform design, and
for the nonparametric density model, we show that the resulting estimator is
optimal in the minimax sense over all Besov balls under the risk, without
any logarithmic factor.
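The abstract describes the estimator only at a high level. The following is a minimal sketch of the general idea under loudly stated assumptions: a Haar wavelet basis, a regular grid of 2^J noisy samples (not the random uniform design of the abstract), hard thresholding at a few candidate thresholds, and exponential weights computed on held-out odd-indexed samples to form the convex combination. The function names, the even/odd split, and the `temperature` parameter are hypothetical illustrations, not the authors' procedure.

```python
import math
import random


def haar_forward(signal):
    """Orthonormal Haar transform of a list whose length is a power of two.

    Returns the coarsest scaling coefficient and the detail levels, coarse to fine.
    """
    approx = list(signal)
    details = []
    while len(approx) > 1:
        half = len(approx) // 2
        details.append([(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
    return approx[0], details[::-1]


def haar_inverse(scaling, details):
    """Invert haar_forward (details ordered coarse to fine)."""
    approx = [scaling]
    for level in details:
        nxt = []
        for a, d in zip(approx, level):
            nxt.extend([(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)])
        approx = nxt
    return approx


def hard_threshold_estimate(y, lam):
    """Term-by-term hard thresholding: keep a detail coefficient only if |d| > lam."""
    scaling, details = haar_forward(y)
    kept = [[d if abs(d) > lam else 0.0 for d in level] for level in details]
    return haar_inverse(scaling, kept)


def aggregate(y, thresholds, temperature):
    """Convex combination of thresholded estimators with exponential weights.

    Estimators are built on the even-indexed samples; each one is scored on the
    neighbouring odd-indexed samples (assumes the signal varies little between
    adjacent grid points).
    """
    train, valid = y[0::2], y[1::2]
    estimates = [hard_threshold_estimate(train, lam) for lam in thresholds]
    losses = [sum((v - f) ** 2 for v, f in zip(valid, est)) for est in estimates]
    best = min(losses)
    raw = [math.exp(-(loss - best) / temperature) for loss in losses]  # shifted for stability
    weights = [r / sum(raw) for r in raw]
    combined = [sum(w * est[i] for w, est in zip(weights, estimates)) for i in range(len(train))]
    return weights, combined


if __name__ == "__main__":
    random.seed(0)
    n, sigma = 256, 0.3
    grid = [i / n for i in range(n)]
    truth = [math.sin(4 * math.pi * t) + (1.0 if t > 0.5 else 0.0) for t in grid]
    y = [f + random.gauss(0, sigma) for f in truth]
    weights, fit = aggregate(y, thresholds=[0.0, 0.3, 0.6, 0.9], temperature=4 * sigma**2)
    print("weights:", [round(w, 3) for w in weights])
```

The weights are nonnegative and sum to one, so the output is a convex combination of the thresholded estimators. A temperature of order σ² is a common choice for exponential weighting, but it is an assumption here rather than the weighting analysed in the paper.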
Pointwise adaptive estimation for robust and quantile regression
A nonparametric procedure for robust regression estimation and for quantile
regression is proposed that is completely data-driven and adapts locally to
the regularity of the regression function. This is achieved by considering,
at each point, M-estimators over different local neighbourhoods and by a local
model selection procedure based on sequential testing. Non-asymptotic risk
bounds are obtained, which yield rate-optimality for large-sample asymptotics
under weak conditions. Simulations for several univariate median regression
models show good finite-sample properties, also in comparison with traditional
methods. The approach is extended to image denoising and applied to CT scans in
cancer research.
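The local model selection step can be illustrated with a Lepski-type rule. This minimal sketch assumes the median as the M-estimator (absolute loss, matching median regression), nested symmetric windows around the point of interest, a known noise scale, and a critical value of `z * noise_scale / sqrt(n)` with a hypothetical constant `z`. It is not the paper's test statistic or its calibrated critical values.

```python
import math
import random
import statistics


def local_median(xs, ys, x0, h):
    """Median of the responses in the window |x - x0| <= h (absolute-loss M-estimator)."""
    window = [y for x, y in zip(xs, ys) if abs(x - x0) <= h]
    if not window:
        return None, 0
    return statistics.median(window), len(window)


def adaptive_pointwise_median(xs, ys, x0, bandwidths, noise_scale, z=2.5):
    """Lepski-type sequential test over increasing bandwidths at the point x0.

    A wider window is accepted only if its median lies within
    z * noise_scale / sqrt(n_j) of the median of every accepted narrower
    window j; the first rejection stops the search.
    Returns (bandwidth, estimate, number_of_points) or None.
    """
    accepted = []
    for h in sorted(bandwidths):
        est, m = local_median(xs, ys, x0, h)
        if m == 0:
            continue  # empty window: try a wider one
        if all(abs(est - e) <= z * noise_scale / math.sqrt(n) for _, e, n in accepted):
            accepted.append((h, est, m))
        else:
            break
    return accepted[-1] if accepted else None


if __name__ == "__main__":
    random.seed(0)
    xs = [i / 500 for i in range(500)]
    truth = [abs(x - 0.5) for x in xs]  # kink at x = 0.5
    # Gaussian noise plus about 5% gross outliers, where a median is more robust than a mean.
    ys = [f + random.gauss(0, 0.1) + (5.0 if random.random() < 0.05 else 0.0) for f in truth]
    for x0 in (0.2, 0.5):
        h, est, m = adaptive_pointwise_median(xs, ys, x0, [0.01, 0.02, 0.05, 0.1, 0.2], noise_scale=0.1)
        print(f"x0={x0}: bandwidth={h}, estimate={est:.3f}, points={m}")
```

Where the function is smooth, the test tends to accept wide windows; near the kink, the bias of wide windows tends to trigger an earlier rejection, which is the local adaptivity the abstract refers to.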
Model selection in logistic regression
This paper is devoted to model selection in logistic regression. We extend
the model selection principle introduced by Birgé and Massart (2001) to the
logistic regression model. The selection is done using penalized maximum
likelihood criteria. In this context, we propose a completely data-driven
criterion based on the slope heuristics. We prove non-asymptotic oracle
inequalities for the selected estimators. The theoretical results are
illustrated through simulation studies.
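A minimal sketch of penalized maximum likelihood selection calibrated by the slope heuristics, assuming nested models formed by the first D covariates, a penalty of the form kappa * D / n, approximate fitting by gradient ascent, and a least-squares slope estimated on the larger half of the models. The constant is then set to twice the estimated minimal penalty constant. All function names and these choices are illustrative assumptions, not the criterion calibrated in the paper.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(X, y, steps=300, lr=1.0):
    """Approximate maximum likelihood by full-batch gradient ascent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            r = yi - sigmoid(sum(a * b for a, b in zip(w, xi)))
            for j in range(d):
                grad[j] += r * xi[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def neg_log_lik(X, y, w):
    """Negative log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(sigmoid(sum(a * b for a, b in zip(w, xi))), 1e-12), 1 - 1e-12)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


def slope_heuristic_select(X, y, max_dim):
    """Select D among nested models (first D columns) with pen(D) = kappa * D / n."""
    if max_dim < 3:
        raise ValueError("need at least 3 nested models to estimate a slope")
    n = len(X)
    dims = list(range(1, max_dim + 1))
    risks = []
    for D in dims:
        XD = [row[:D] for row in X]
        risks.append(neg_log_lik(XD, y, fit_logistic(XD, y)) / n)
    # Slope of the empirical risk against D / n over the larger half of the models;
    # its magnitude estimates the minimal penalty constant.
    half = len(dims) // 2
    xs, rs = [D / n for D in dims[half:]], risks[half:]
    mx, mr = sum(xs) / len(xs), sum(rs) / len(rs)
    slope = sum((a - mx) * (b - mr) for a, b in zip(xs, rs)) / sum((a - mx) ** 2 for a in xs)
    kappa = 2.0 * max(-slope, 0.0)  # "twice the minimal penalty"
    crit = [r + kappa * D / n for r, D in zip(risks, dims)]
    return dims[crit.index(min(crit))], kappa, risks


if __name__ == "__main__":
    random.seed(0)
    n, p = 200, 6
    true_w = [0.5, 2.0, -1.5, 0.0, 0.0, 0.0]  # only the first 3 columns matter
    X = [[1.0] + [random.gauss(0, 1) for _ in range(p - 1)] for _ in range(n)]
    y = [1 if random.random() < sigmoid(sum(a * b for a, b in zip(true_w, row))) else 0 for row in X]
    D_hat, kappa, _ = slope_heuristic_select(X, y, max_dim=p)
    print(f"selected D = {D_hat}, kappa = {kappa:.3f}")
```

The data-driven part is that `kappa` is read off the behaviour of the empirical risk of the largest models rather than fixed in advance; the linear penalty shape and the "larger half" cut-off are simplifications for illustration.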
Scalable aggregation predictive analytics: a query-driven machine learning approach
We introduce a predictive modeling solution that provides high-quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators such as COUNT to be executed over it. In this context, our methodology uses historical queries and their answers to accurately predict the answers of ad-hoc queries. We focus on the widely used set-cardinality (i.e., COUNT) aggregation query, as COUNT is a fundamental operator both for internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from previously issued queries, (ii) associate the query space with local linear regression and associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to as the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms to ensure high-quality prediction results. The significance of our contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted to arriving ad-hoc queries, which is well suited to query-driven data exploration, and (iii) offers performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) superior to that of data-centric approaches. We provide a comprehensive performance evaluation of our model, assessing its sensitivity, scalability, and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark, showing its superior performance compared with Spark's COUNT method.
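A minimal sketch of query-driven COUNT prediction in the spirit of the abstract, assuming each past range query is encoded as a vector (here center and radius of a 1-D range), query similarity is Euclidean distance between those vectors, and the answer to a new query is predicted by a kernel-weighted local linear regression over its k most similar past queries. The class `QueryCountModel`, its methods, and the parameters `k` and `ridge` are hypothetical; the paper's actual query-space quantization and associative function estimators are not reproduced here.

```python
import bisect
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class QueryCountModel:
    """Stores answered queries and predicts COUNT by kernel-weighted local linear regression."""

    def __init__(self, k=20, ridge=1e-8):
        self.k, self.ridge = k, ridge
        self.queries, self.counts = [], []

    def observe(self, query, count):
        """Incremental update: an answered query simply becomes one more training pair."""
        self.queries.append(list(query))
        self.counts.append(float(count))

    def predict(self, query):
        if not self.queries:
            raise ValueError("no past queries observed yet")
        dists = [math.dist(query, q) for q in self.queries]  # query similarity = Euclidean distance
        nearest = sorted(range(len(dists)), key=dists.__getitem__)[: self.k]
        bandwidth = max(dists[i] for i in nearest) or 1.0
        p = len(query) + 1  # intercept + one slope per query feature
        XtWX = [[0.0] * p for _ in range(p)]
        XtWy = [0.0] * p
        for i in nearest:
            w = math.exp(-((dists[i] / bandwidth) ** 2))  # Gaussian kernel weight
            x = [1.0] + [a - b for a, b in zip(self.queries[i], query)]  # centred at the new query
            for r in range(p):
                XtWy[r] += w * x[r] * self.counts[i]
                for c in range(p):
                    XtWX[r][c] += w * x[r] * x[c]
        for r in range(1, p):
            XtWX[r][r] += self.ridge  # small ridge on slopes guards against singular systems
        return solve(XtWX, XtWy)[0]  # with a centred design, the intercept is the prediction


if __name__ == "__main__":
    random.seed(0)
    data = sorted(random.random() for _ in range(10_000))  # hypothetical 1-D attribute

    def true_count(center, radius):
        """Exact COUNT of the range query [center - radius, center + radius]."""
        return bisect.bisect_right(data, center + radius) - bisect.bisect_left(data, center - radius)

    model = QueryCountModel(k=25)
    for _ in range(500):
        c, r = random.random(), random.uniform(0.01, 0.2)
        model.observe((c, r), true_count(c, r))
    print("predicted:", round(model.predict((0.42, 0.07))), "actual:", true_count(0.42, 0.07))
```

Once trained, the model never touches the underlying data: it only needs past (query, COUNT) pairs, which is what makes this style of approach usable when the data owner exposes nothing but aggregate answers.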