Ensemble methods for solving problems of medical diagnosis
A consolidating method for analyzing series of observations is proposed. It is
based on a fitted model of a mixture of catalysts of the main components and
makes it possible to study any number of markers. In contrast to the
longitudinal approach, it eliminates the need to bring in regression analysis
methods, which carry their own uncertainties in the choice of particular
models. The consolidating method yields an original result in the early
diagnosis of disease: all options for using markers show an increase in
classification accuracy as the length of a series of examinations
increases. Comment: 4 pages, 1 figure, 4 tables
Churn Prediction Task in MOOC
Churn prediction is a common task for machine learning applications in business. In this paper, the task is adapted to address the low efficiency of massive open online courses (only 5% of all students finish their course). The approach is demonstrated on the course “Methods and algorithms of the graph theory”, hosted on the national platform of online education in Russia. The paper covers all the steps needed to build an intelligent system that identifies students who are active during the course but unlikely to finish it. The first part consists of constructing the right sample for prediction, exploratory data analysis (EDA), and choosing the most appropriate week of the course on which to make predictions. The second part covers choosing the right metric and building models. An approach using ensembles such as stacking is also proposed to increase the accuracy of predictions. As a result, a general approach to building a churn prediction model for an online course is reviewed. This approach can be used to make online education adaptive and intelligent for each individual student.
Averaging of density kernel estimators
Averaging provides an alternative to bandwidth selection for density kernel estimation. We propose a procedure to linearly combine several kernel estimators of a density obtained from different, possibly data-driven, bandwidths. The method relies on minimizing an easily tractable approximation of the integrated square error of the combination. It provides, at a small computational cost, a final solution that improves on the initial estimators in most cases. The average estimator is proved to be asymptotically as efficient as the best possible combination (the oracle), with an error term that decreases faster than the minimax rate obtained with separated learning and validation samples. Performance is tested numerically, with results that compare favorably to other existing procedures in terms of mean integrated square error.
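To make the idea of combining kernel estimators concrete, here is a minimal sketch in Python. It is not the procedure of the paper: as a stand-in for the paper's approximation of the integrated square error, it uses the classical least-squares cross-validation criterion, and it combines only two Gaussian kernel estimators with weights w and 1 - w. The bandwidths, the simulated sample and all function names are assumptions chosen for illustration.

```python
import math
import random


def gauss(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def kde(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    return sum(gauss(x - xi, h) for xi in data) / len(data)


def cross_term(data, h_a, h_b):
    """Integral of the product of two Gaussian KDEs (closed form)."""
    s = math.hypot(h_a, h_b)  # two Gaussian kernels convolve to a Gaussian
    n = len(data)
    return sum(gauss(xi - xj, s) for xi in data for xj in data) / n ** 2


def loo_mean(data, h):
    """Mean leave-one-out density estimate at the sample points."""
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data):
        total += sum(gauss(xi - xj, h) for j, xj in enumerate(data) if j != i)
    return total / (n * (n - 1))


def fit_weight(data, h1, h2):
    """Weight w on the h1 estimator (1 - w on h2) minimizing the LSCV criterion."""
    g11 = cross_term(data, h1, h1)
    g12 = cross_term(data, h1, h2)
    g22 = cross_term(data, h2, h2)
    b1, b2 = loo_mean(data, h1), loo_mean(data, h2)

    def criterion(w):
        # Estimates the ISE of the combination up to a constant.
        quad = w * w * g11 + 2 * w * (1 - w) * g12 + (1 - w) ** 2 * g22
        return quad - 2 * (w * b1 + (1 - w) * b2)

    # The criterion is a convex quadratic in w; this is its stationary point.
    w_star = (g22 - g12 + b1 - b2) / (g11 - 2 * g12 + g22)
    return w_star, criterion


# Hypothetical data: a two-component Gaussian mixture.
rng = random.Random(1)
sample = [
    rng.gauss(-1.5, 0.5) if rng.random() < 0.4 else rng.gauss(1.5, 1.0)
    for _ in range(150)
]
H_SMALL, H_LARGE = 0.2, 0.8  # assumed candidate bandwidths
w_star, criterion = fit_weight(sample, H_SMALL, H_LARGE)


def averaged_kde(x):
    """Linear combination of the two initial estimators."""
    return w_star * kde(x, sample, H_SMALL) + (1 - w_star) * kde(x, sample, H_LARGE)
```

Because the weights sum to one, the combination still integrates to one, although a weight outside [0, 1] can make it locally negative. Extending the sketch to more than two bandwidths would replace the scalar formula with a small constrained linear system.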
Prediction of infectious disease epidemics via weighted density ensembles
Accurate and reliable predictions of infectious disease dynamics can be
valuable to public health organizations that plan interventions to decrease or
prevent disease transmission. A great variety of models have been developed for
this task, using different model structures, covariates, and targets for
prediction. Experience has shown that the performance of these models varies;
some tend to do better or worse in different seasons or at different points
within a season. Ensemble methods combine multiple models to obtain a single
prediction that leverages the strengths of each model. We considered a range of
ensemble methods that each form a predictive density for a target of interest
as a weighted sum of the predictive densities from component models. In the
simplest case, equal weight is assigned to each component model; in the most
complex case, the weights vary with the region, prediction target, week of the
season when the predictions are made, a measure of component model uncertainty,
and recent observations of disease incidence. We applied these methods to
predict measures of influenza season timing and severity in the United States,
both at the national and regional levels, using three component models. We
trained the models on retrospective predictions from 14 seasons (1997/1998 -
2010/2011) and evaluated each model's prospective, out-of-sample performance in
the five subsequent influenza seasons. In this test phase, the ensemble methods
showed overall performance that was similar to the best of the component
models, but offered more consistent performance across seasons than the
component models. Ensemble methods offer the potential to deliver more reliable
predictions to public health decision makers. Comment: 20 pages, 6 figures
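The weighted-sum construction described in this abstract can be sketched as follows. The abstract does not say how the weights are estimated, so this sketch assumes constant weights fitted by an EM iteration that maximizes the mean log score; the three component models, their biases and spreads, and the simulated targets are all hypothetical. The paper's more complex variants, where weights depend on region, target or week, are not shown.

```python
import math
import random


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_weights(dens, n_iter=200):
    """EM for constant ensemble weights.

    dens[i][k] is the predictive density that component k assigned to the
    value actually observed in case i.
    """
    k = len(dens[0])
    weights = [1.0 / k] * k  # start from the equal-weight ensemble
    for _ in range(n_iter):
        resp = [0.0] * k
        for row in dens:
            mix = sum(w * d for w, d in zip(weights, row))
            for j in range(k):
                resp[j] += weights[j] * row[j] / mix
        weights = [r / len(dens) for r in resp]
    return weights


def mean_log_score(dens, weights):
    """Average log of the ensemble density at the observed values."""
    return sum(
        math.log(sum(w * d for w, d in zip(weights, row))) for row in dens
    ) / len(dens)


# Hypothetical component models, each described by (bias, spread).
SPECS = [(0.0, 1.0), (0.8, 1.5), (-2.0, 3.0)]
rng = random.Random(0)
dens = []
for _ in range(300):
    truth = rng.gauss(0.0, 1.0)
    row = []
    for bias, spread in SPECS:
        forecast_mean = truth + bias + rng.gauss(0.0, spread)
        row.append(normal_pdf(truth, forecast_mean, spread))
    dens.append(row)

equal = [1.0 / len(SPECS)] * len(SPECS)
fitted = fit_weights(dens)
```

Here the weights are fitted and scored on the same cases, which only shows the mechanics. The abstract describes training on retrospective predictions and evaluating on later, out-of-sample seasons, so a faithful comparison would score `fitted` against `equal` on held-out cases.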
Aggregating density estimators: an empirical study
We present some new density estimation algorithms obtained by bootstrap
aggregation, as in bagging. Our algorithms are analyzed and empirically
compared to other methods found in the statistical literature, such as stacking
and boosting for density estimation. We show by extensive simulations that
ensemble learning is effective for density estimation, as it is for
classification. Although our algorithms do not always outperform other methods,
some of them are as simple as bagging, more intuitive, and have a lower
computational cost.
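As an illustration of bootstrap aggregation applied to density estimation, the sketch below averages Gaussian kernel estimators fitted on bootstrap resamples. It is not one of the algorithms of the paper, which the abstract does not specify; the choice of base estimator, the rule-of-thumb bandwidth, the number of resamples and the simulated data are assumptions.

```python
import math
import random
import statistics


def gauss(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth, floored to avoid a zero bandwidth."""
    return max(1.06 * statistics.stdev(data) * len(data) ** -0.2, 1e-6)


def fit_kde(data):
    """Return a Gaussian KDE fitted on data, as a function of x."""
    h = rule_of_thumb_bandwidth(data)
    points = list(data)
    return lambda x: sum(gauss(x - xi, h) for xi in points) / len(points)


def bagged_kde(data, n_boot=20, seed=0):
    """Average of KDEs fitted on bootstrap resamples of data."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_boot):
        # Resample with replacement, same size as the original sample.
        resample = [rng.choice(data) for _ in data]
        members.append(fit_kde(resample))
    return lambda x: sum(m(x) for m in members) / len(members)


# Hypothetical data: 80 draws from a standard normal distribution.
data_rng = random.Random(42)
observations = [data_rng.gauss(0.0, 1.0) for _ in range(80)]
density = bagged_kde(observations)
```

Each member is a proper density, so their equal-weight average is one as well. Replacing `fit_kde` with another base estimator, for example a histogram, leaves the aggregation step unchanged.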