Boosting insights in insurance tariff plans with tree-based machine learning methods
Pricing actuaries typically operate within the framework of generalized
linear models (GLMs). With the upswing of data analytics, our study puts focus
on machine learning methods to develop full tariff plans built from both the
frequency and severity of claims. We adapt the loss functions used in the
algorithms such that the specific characteristics of insurance data are
carefully incorporated: highly unbalanced count data with excess zeros and
varying exposure on the frequency side combined with scarce, but potentially
long-tailed data on the severity side. A key requirement is the need for
transparent and interpretable pricing models which are easily explainable to
all stakeholders. We therefore focus on machine learning with decision trees:
starting from simple regression trees, we work towards more advanced ensembles
such as random forests and boosted trees. We show how to choose the optimal
tuning parameters for these models in an elaborate cross-validation scheme,
present visualization tools to obtain insights from the resulting models, and
evaluate the economic value of these new modeling approaches. Boosted trees
outperform the classical GLMs, allowing the insurer to form profitable
portfolios and to guard against potential adverse risk selection.
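
The abstract above mentions adapting loss functions to claim counts with varying exposure and to long-tailed claim severities. As a rough illustration only, the sketch below assumes the Poisson deviance (with exposure scaling the expected count) for frequency and the gamma deviance for severity; these are common actuarial choices, not necessarily the exact losses used in the paper. The function and argument names are hypothetical.

```python
import math


def poisson_deviance(counts, exposures, rates):
    """Mean Poisson deviance for claim counts under varying exposure.

    counts[i]    : observed number of claims for policy i
    exposures[i] : fraction of a year policy i was in force (> 0)
    rates[i]     : predicted annual claim frequency (> 0)
    All three names are illustrative assumptions.
    """
    total = 0.0
    for n, e, lam in zip(counts, exposures, rates):
        mu = e * lam  # expected number of claims over the exposure period
        # The term n * log(n / mu) is taken as 0 when n == 0.
        log_term = n * math.log(n / mu) if n > 0 else 0.0
        total += 2.0 * (log_term - (n - mu))
    return total / len(counts)


def gamma_deviance(severities, predictions):
    """Mean gamma deviance for strictly positive claim amounts."""
    total = 0.0
    for y, mu in zip(severities, predictions):
        total += 2.0 * ((y - mu) / mu - math.log(y / mu))
    return total / len(severities)
```

Both functions return 0 for a perfect prediction and grow as predictions move away from the observations, so either could serve as a splitting or boosting criterion in a tree-based learner.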
A Survey on Metric Learning for Feature Vectors and Structured Data
The need for appropriate ways to measure the distance or similarity between
data is ubiquitous in machine learning, pattern recognition and data mining,
but handcrafting such good metrics for specific problems is generally
difficult. This has led to the emergence of metric learning, which aims at
automatically learning a metric from data and has attracted a lot of interest
in machine learning and related fields for the past ten years. This survey
paper proposes a systematic review of the metric learning literature,
highlighting the pros and cons of each approach. We pay particular attention to
Mahalanobis distance metric learning, a well-studied and successful framework,
but additionally present a wide range of methods that have recently emerged as
powerful alternatives, including nonlinear metric learning, similarity learning
and local metric learning. Recent trends and extensions, such as
semi-supervised metric learning, metric learning for histogram data and the
derivation of generalization guarantees, are also covered. Finally, this survey
addresses metric learning for structured data, in particular edit distance
learning, and attempts to give an overview of the remaining challenges in
metric learning for the years to come.
Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved
presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new
method
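
The survey abstract above singles out Mahalanobis distance metric learning. As a minimal sketch of the quantity being learned, the code below evaluates d_M(x, y) = sqrt((x - y)^T M (x - y)) for a given matrix M and builds a positive semidefinite M as L^T L. It assumes M is already known; it does not implement any learning algorithm from the survey, and the function names are hypothetical.

```python
import math


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semidefinite M."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    # Matrix-vector product M d, then the quadratic form d^T (M d).
    Md = [sum(M[i][j] * d[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(d[i] * Md[i] for i in range(n)))


def gram(L):
    """Return M = L^T L, which is positive semidefinite by construction."""
    k, n = len(L), len(L[0])
    return [
        [sum(L[r][i] * L[r][j] for r in range(k)) for j in range(n)]
        for i in range(n)
    ]
```

With M equal to the identity this reduces to the Euclidean distance. Writing M = L^T L makes d_M the Euclidean distance after mapping the points through L, which is why learning M can be viewed as learning a linear transformation of the feature space.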
- …