Comment: Boosting Algorithms: Regularization, Prediction and Model Fitting
The authors are doing the readers of Statistical Science a true service with
a well-written and up-to-date overview of boosting that originated with the
seminal algorithms of Freund and Schapire. Equally, we are grateful for
high-level software that will permit a larger readership to experiment with, or
simply apply, boosting-inspired model fitting. The authors show us a world of
methodology that illustrates how a fundamental innovation can penetrate every
nook and cranny of statistical thinking and practice. They introduce the reader
to one particular interpretation of boosting and then give a display of its
potential with extensions from classification (where it all started) to least
squares, exponential family models, survival analysis, to base-learners other
than trees such as smoothing splines, to degrees of freedom and regularization,
and to fascinating recent work in model selection. The uninitiated reader will
find that the authors did a nice job of presenting a certain coherent and
useful interpretation of boosting. The other reader, though, who has watched
the business of boosting for a while, may have quibbles with the authors over
details of the historic record and, more importantly, over their optimism about
the current state of theoretical knowledge. In fact, as much as ``the
statistical view'' has proven fruitful, it has also resulted in some ideas
about why boosting works that may be misconceived, and in some recommendations
that may be misguided. [arXiv:0804.2752]
Comment: Published at http://dx.doi.org/10.1214/07-STS242B in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
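The "boosting as model fitting" view discussed above can be made concrete with a small example. The sketch below, a minimal illustration and not the authors' software, shows L2 boosting with componentwise least-squares base learners and a shrinkage step playing the role of regularization.

```python
import numpy as np

def l2_boost(X, y, n_steps=100, nu=0.1):
    """Minimal L2 boosting sketch: repeatedly fit a componentwise
    least-squares base learner to the current residuals.
    Illustrative only; not the implementation discussed in the paper."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    for _ in range(n_steps):
        # fit each single-variable least-squares learner to the residuals
        betas = X.T @ resid / (X ** 2).sum(axis=0)
        sse = ((resid[:, None] - X * betas) ** 2).sum(axis=0)
        j = np.argmin(sse)                  # best-fitting component
        coef[j] += nu * betas[j]            # shrunken (regularized) update
        resid = y - intercept - X @ coef    # refresh residuals
    return intercept, coef

# toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 3] - 1.0 * X[:, 7] + rng.normal(scale=0.5, size=200)
b0, b = l2_boost(X, y)
```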
A Comparative Study of Machine Learning Models for Tabular Data Through Challenge of Monitoring Parkinson's Disease Progression Using Voice Recordings
People with Parkinson's disease must be regularly monitored by their
physician to observe how the disease is progressing and potentially adjust
treatment plans to mitigate the symptoms. Monitoring the progression of the
disease through a voice recording captured by the patient at their own home can
make the process faster and less stressful. Using a dataset of voice recordings
of 42 people with early-stage Parkinson's disease over a time span of 6 months,
we applied multiple machine learning techniques to find a correlation between
the voice recording and the patient's motor UPDRS score. We approached this
problem using a multitude of both regression and classification techniques.
Much of this paper is dedicated to mapping the voice data to motor UPDRS scores
using regression techniques in order to obtain a more precise value for unknown
instances. Through this comparative study of various machine learning methods,
we found that some older machine learning methods, such as tree-based models,
outperform cutting-edge deep learning models on numerous tabular datasets.
Comment: Accepted at "HIMS'20 - The 6th Int'l Conf on Health Informatics and
Medical Systems"; https://americancse.org/events/csce2020/conferences/hims2
Evidential Bagging: Combining Heterogeneous Classifiers in the Belief Functions Framework
In machine learning, ensemble learning methods are known to improve predictive accuracy and robustness. They consist of learning many classifiers whose outputs are finally combined according to different techniques. Bagging, or Bootstrap Aggregating, is one of the most famous ensemble methods and is usually applied with a single base classification algorithm, i.e. the same type of classifier is learnt multiple times on bootstrapped versions of the initial learning dataset. In this paper, we propose a bagging methodology that involves different types of classifiers. The classifiers' probabilistic outputs are used to build mass functions which are further combined within the belief functions framework. Three different ways of building mass functions are proposed; preliminary experiments on benchmark datasets showing the relevance of the approach are presented.
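To make the combination step concrete, the sketch below turns each classifier's probabilistic output for a two-class problem into a discounted mass function over {A, B, A∪B} and merges the masses with Dempster's rule. The discounting construction is one common choice for illustration only and is not claimed to be any of the three constructions proposed in the paper.

```python
def prob_to_mass(p_a, reliability=0.8):
    """Simple discounting: part of the belief stays on the whole frame A∪B."""
    return {"A": reliability * p_a,
            "B": reliability * (1.0 - p_a),
            "AB": 1.0 - reliability}

def dempster(m1, m2):
    """Dempster's rule of combination on the frame {A, B}."""
    conflict = m1["A"] * m2["B"] + m1["B"] * m2["A"]
    k = 1.0 - conflict  # normalization by the non-conflicting mass
    return {"A": (m1["A"] * m2["A"] + m1["A"] * m2["AB"] + m1["AB"] * m2["A"]) / k,
            "B": (m1["B"] * m2["B"] + m1["B"] * m2["AB"] + m1["AB"] * m2["B"]) / k,
            "AB": (m1["AB"] * m2["AB"]) / k}

# combine the outputs of two heterogeneous classifiers for one instance
m = dempster(prob_to_mass(0.9), prob_to_mass(0.6))
```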
Isoelastic Agents and Wealth Updates in Machine Learning Markets
Recently, prediction markets have shown considerable promise for developing
flexible mechanisms for machine learning. In this paper, agents with isoelastic
utilities are considered. It is shown that the costs associated with
homogeneous markets of agents with isoelastic utilities produce equilibrium
prices corresponding to alpha-mixtures, with a particular form of mixing
component relating to each agent's wealth. We also demonstrate that wealth
accumulation for logarithmic and other isoelastic agents (through payoffs on
prediction of training targets) can implement both Bayesian model updates and
mixture weight updates by imposing different market payoff structures. An
iterative algorithm is given for market equilibrium computation. We demonstrate
that inhomogeneous markets of agents with isoelastic utilities outperform state
of the art aggregate classifiers such as random forests, as well as single
classifiers (neural networks, decision trees) on a number of machine learning
benchmarks, and show that isoelastic combination methods are generally better
than their logarithmic counterparts.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012)
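For the logarithmic special case, the claimed correspondence between wealth updates and Bayesian mixture updates can be illustrated directly: the equilibrium price is the wealth-weighted mixture of the agents' beliefs, and paying out on the realized outcome rescales wealths like posterior mixture weights. The sketch below assumes log-utility agents only; general isoelastic (alpha-mixture) agents would need the iterative equilibrium computation described in the paper.

```python
import numpy as np

def market_round(wealth, beliefs, outcome):
    """wealth: (n_agents,), beliefs: (n_agents, n_outcomes), outcome: index."""
    w = wealth / wealth.sum()
    price = w @ beliefs                               # equilibrium price = wealth-weighted mixture
    new_wealth = wealth * beliefs[:, outcome] / price[outcome]  # posterior-style reweighting
    return price, new_wealth

wealth = np.array([1.0, 1.0, 2.0])
beliefs = np.array([[0.9, 0.1],
                    [0.5, 0.5],
                    [0.2, 0.8]])
price, wealth = market_round(wealth, beliefs, outcome=0)
```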
Collaborative Training in Sensor Networks: A graphical model approach
Graphical models have been widely applied in solving distributed inference
problems in sensor networks. In this paper, the problem of coordinating a
network of sensors to train a unique ensemble estimator under communication
constraints is discussed. The information structure of graphical models with
specific potential functions is employed, which converts the collaborative
training task into a problem of local training plus global inference. Two
important classes of graphical model inference algorithms, message-passing
algorithms and sampling algorithms, are employed to tackle low-dimensional
parametric and high-dimensional nonparametric problems, respectively. The
efficacy of this approach is demonstrated by concrete examples.
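A minimal illustration of the message-passing ingredient, assuming a chain of sensors with a shared discrete state and toy potentials: sum-product belief propagation computes the per-node marginals that the global-inference step would need. The potentials here are placeholders, not the locally trained ones the paper describes.

```python
import numpy as np

def chain_bp(local_potentials, pairwise):
    """Forward-backward sum-product on a chain; returns per-node marginals."""
    n = len(local_potentials)
    fwd = [None] * n
    bwd = [None] * n
    fwd[0] = local_potentials[0]
    for i in range(1, n):
        # fold the previous node's message through the pairwise potential
        fwd[i] = local_potentials[i] * (pairwise.T @ fwd[i - 1])
    bwd[-1] = np.ones_like(local_potentials[-1])
    for i in range(n - 2, -1, -1):
        bwd[i] = pairwise @ (local_potentials[i + 1] * bwd[i + 1])
    return [f * b / (f * b).sum() for f, b in zip(fwd, bwd)]

pairwise = np.array([[0.8, 0.2], [0.2, 0.8]])   # toy smoothness between neighbouring sensors
locals_ = [np.array([0.9, 0.1]), np.array([0.4, 0.6]), np.array([0.7, 0.3])]
print(chain_bp(locals_, pairwise))
```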