854,740 research outputs found

    Adapting to Unknown Smoothness by Aggregation of Thresholded Wavelet Estimators

    Get PDF
    We study the performances of an adaptive procedure based on a convex combination, with data-driven weights, of term-by-term thresholded wavelet estimators. For the bounded regression model, with random uniform design, and the nonparametric density model, we show that the resulting estimator is optimal in the minimax sense over all Besov balls under the L2L^2 risk, without any logarithm factor

    Pointwise adaptive estimation for robust and quantile regression

    Full text link
    A nonparametric procedure for robust regression estimation and for quantile regression is proposed which is completely data-driven and adapts locally to the regularity of the regression function. This is achieved by considering in each point M-estimators over different local neighbourhoods and by a local model selection procedure based on sequential testing. Non-asymptotic risk bounds are obtained, which yield rate-optimality for large sample asymptotics under weak conditions. Simulations for different univariate median regression models show good finite sample properties, also in comparison to traditional methods. The approach is extended to image denoising and applied to CT scans in cancer research

    Model selection in logistic regression

    Full text link
    This paper is devoted to model selection in logistic regression. We extend the model selection principle introduced by Birg\'e and Massart (2001) to logistic regression model. This selection is done by using penalized maximum likelihood criteria. We propose in this context a completely data-driven criteria based on the slope heuristics. We prove non asymptotic oracle inequalities for selected estimators. Theoretical results are illustrated through simulation studies

    Scalable aggregation predictive analytics: a query-driven machine learning approach

    Get PDF
    We introduce a predictive modeling solution that provides high quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict ad-hoc queries’ answers. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression & associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high quality prediction results. The significance of contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers a performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark showing its superior performance compared to the Spark’s COUNT method
    corecore