91,689 research outputs found
Boosting insights in insurance tariff plans with tree-based machine learning methods
Pricing actuaries typically operate within the framework of generalized
linear models (GLMs). With the upswing of data analytics, our study puts focus
on machine learning methods to develop full tariff plans built from both the
frequency and severity of claims. We adapt the loss functions used in the
algorithms such that the specific characteristics of insurance data are
carefully incorporated: highly unbalanced count data with excess zeros and
varying exposure on the frequency side combined with scarce, but potentially
long-tailed data on the severity side. A key requirement is the need for
transparent and interpretable pricing models which are easily explainable to
all stakeholders. We therefore focus on machine learning with decision trees:
starting from simple regression trees, we work towards more advanced ensembles
such as random forests and boosted trees. We show how to choose the optimal
tuning parameters for these models in an elaborate cross-validation scheme, we
present visualization tools to obtain insights from the resulting models and
the economic value of these new modeling approaches is evaluated. Boosted trees
outperform the classical GLMs, allowing the insurer to form profitable
portfolios and to guard against potential adverse risk selection
Multiscale Discriminant Saliency for Visual Attention
The bottom-up saliency, an early stage of humans' visual attention, can be
considered as a binary classification problem between center and surround
classes. Discriminant power of features for the classification is measured as
mutual information between features and two classes distribution. The estimated
discrepancy of two feature classes very much depends on considered scale
levels; then, multi-scale structure and discriminant power are integrated by
employing discrete wavelet features and Hidden markov tree (HMT). With wavelet
coefficients and Hidden Markov Tree parameters, quad-tree like label structures
are constructed and utilized in maximum a posterior probability (MAP) of hidden
class variables at corresponding dyadic sub-squares. Then, saliency value for
each dyadic square at each scale level is computed with discriminant power
principle and the MAP. Finally, across multiple scales is integrated the final
saliency map by an information maximization rule. Both standard quantitative
tools such as NSS, LCC, AUC and qualitative assessments are used for evaluating
the proposed multiscale discriminant saliency method (MDIS) against the
well-know information-based saliency method AIM on its Bruce Database wity
eye-tracking data. Simulation results are presented and analyzed to verify the
validity of MDIS as well as point out its disadvantages for further research
direction.Comment: 16 pages, ICCSA 2013 - BIOCA sessio
DeepWalk: Online Learning of Social Representations
We present DeepWalk, a novel approach for learning latent representations of
vertices in a network. These latent representations encode social relations in
a continuous vector space, which is easily exploited by statistical models.
DeepWalk generalizes recent advancements in language modeling and unsupervised
feature learning (or deep learning) from sequences of words to graphs. DeepWalk
uses local information obtained from truncated random walks to learn latent
representations by treating walks as the equivalent of sentences. We
demonstrate DeepWalk's latent representations on several multi-label network
classification tasks for social networks such as BlogCatalog, Flickr, and
YouTube. Our results show that DeepWalk outperforms challenging baselines which
are allowed a global view of the network, especially in the presence of missing
information. DeepWalk's representations can provide scores up to 10%
higher than competing methods when labeled data is sparse. In some experiments,
DeepWalk's representations are able to outperform all baseline methods while
using 60% less training data. DeepWalk is also scalable. It is an online
learning algorithm which builds useful incremental results, and is trivially
parallelizable. These qualities make it suitable for a broad class of real
world applications such as network classification, and anomaly detection.Comment: 10 pages, 5 figures, 4 table
Modeling Binary Time Series Using Gaussian Processes with Application to Predicting Sleep States
Motivated by the problem of predicting sleep states, we develop a mixed
effects model for binary time series with a stochastic component represented by
a Gaussian process. The fixed component captures the effects of covariates on
the binary-valued response. The Gaussian process captures the residual
variations in the binary response that are not explained by covariates and past
realizations. We develop a frequentist modeling framework that provides
efficient inference and more accurate predictions. Results demonstrate the
advantages of improved prediction rates over existing approaches such as
logistic regression, generalized additive mixed model, models for ordinal data,
gradient boosting, decision tree and random forest. Using our proposed model,
we show that previous sleep state and heart rates are significant predictors
for future sleep states. Simulation studies also show that our proposed method
is promising and robust. To handle computational complexity, we utilize Laplace
approximation, golden section search and successive parabolic interpolation.
With this paper, we also submit an R-package (HIBITS) that implements the
proposed procedure.Comment: Journal of Classification (2018
- …