9,726 research outputs found
Prediction of infectious disease epidemics via weighted density ensembles
Accurate and reliable predictions of infectious disease dynamics can be
valuable to public health organizations that plan interventions to decrease or
prevent disease transmission. A great variety of models have been developed for
this task, using different model structures, covariates, and targets for
prediction. Experience has shown that the performance of these models varies;
some tend to do better or worse in different seasons or at different points
within a season. Ensemble methods combine multiple models to obtain a single
prediction that leverages the strengths of each model. We considered a range of
ensemble methods that each form a predictive density for a target of interest
as a weighted sum of the predictive densities from component models. In the
simplest case, equal weight is assigned to each component model; in the most
complex case, the weights vary with the region, prediction target, week of the
season when the predictions are made, a measure of component model uncertainty,
and recent observations of disease incidence. We applied these methods to
predict measures of influenza season timing and severity in the United States,
both at the national and regional levels, using three component models. We
trained the models on retrospective predictions from 14 seasons (1997/1998 -
2010/2011) and evaluated each model's prospective, out-of-sample performance in
the five subsequent influenza seasons. In this test phase, the ensemble methods
showed overall performance that was similar to the best of the component
models, but offered more consistent performance across seasons than the
component models. Ensemble methods offer the potential to deliver more reliable
predictions to public health decision makers.Comment: 20 pages, 6 figure
Classifying pairs with trees for supervised biological network inference
Networks are ubiquitous in biology and computational approaches have been
largely investigated for their inference. In particular, supervised machine
learning methods can be used to complete a partially known network by
integrating various measurements. Two main supervised frameworks have been
proposed: the local approach, which trains a separate model for each network
node, and the global approach, which trains a single model over pairs of nodes.
Here, we systematically investigate, theoretically and empirically, the
exploitation of tree-based ensemble methods in the context of these two
approaches for biological network inference. We first formalize the problem of
network inference as classification of pairs, unifying in the process
homogeneous and bipartite graphs and discussing two main sampling schemes. We
then present the global and the local approaches, extending the later for the
prediction of interactions between two unseen network nodes, and discuss their
specializations to tree-based ensemble methods, highlighting their
interpretability and drawing links with clustering techniques. Extensive
computational experiments are carried out with these methods on various
biological networks that clearly highlight that these methods are competitive
with existing methods.Comment: 22 page
Identifying Real Estate Opportunities using Machine Learning
The real estate market is exposed to many fluctuations in prices because of
existing correlations with many variables, some of which cannot be controlled
or might even be unknown. Housing prices can increase rapidly (or in some
cases, also drop very fast), yet the numerous listings available online where
houses are sold or rented are not likely to be updated that often. In some
cases, individuals interested in selling a house (or apartment) might include
it in some online listing, and forget about updating the price. In other cases,
some individuals might be interested in deliberately setting a price below the
market price in order to sell the home faster, for various reasons. In this
paper, we aim at developing a machine learning application that identifies
opportunities in the real estate market in real time, i.e., houses that are
listed with a price substantially below the market price. This program can be
useful for investors interested in the housing market. We have focused in a use
case considering real estate assets located in the Salamanca district in Madrid
(Spain) and listed in the most relevant Spanish online site for home sales and
rentals. The application is formally implemented as a regression problem that
tries to estimate the market price of a house given features retrieved from
public online listings. For building this application, we have performed a
feature engineering stage in order to discover relevant features that allows
for attaining a high predictive performance. Several machine learning
algorithms have been tested, including regression trees, k-nearest neighbors,
support vector machines and neural networks, identifying advantages and
handicaps of each of them.Comment: 24 pages, 13 figures, 5 table
Structured Learning of Tree Potentials in CRF for Image Segmentation
We propose a new approach to image segmentation, which exploits the
advantages of both conditional random fields (CRFs) and decision trees. In the
literature, the potential functions of CRFs are mostly defined as a linear
combination of some pre-defined parametric models, and then methods like
structured support vector machines (SSVMs) are applied to learn those linear
coefficients. We instead formulate the unary and pairwise potentials as
nonparametric forests---ensembles of decision trees, and learn the ensemble
parameters and the trees in a unified optimization problem within the
large-margin framework. In this fashion, we easily achieve nonlinear learning
of potential functions on both unary and pairwise terms in CRFs. Moreover, we
learn class-wise decision trees for each object that appears in the image. Due
to the rich structure and flexibility of decision trees, our approach is
powerful in modelling complex data likelihoods and label relationships. The
resulting optimization problem is very challenging because it can have
exponentially many variables and constraints. We show that this challenging
optimization can be efficiently solved by combining a modified column
generation and cutting-planes techniques. Experimental results on both binary
(Graz-02, Weizmann horse, Oxford flower) and multi-class (MSRC-21, PASCAL VOC
2012) segmentation datasets demonstrate the power of the learned nonlinear
nonparametric potentials.Comment: 10 pages. Appearing in IEEE Transactions on Neural Networks and
Learning System
- …