75 research outputs found

    Causal Representation Learning Made Identifiable by Grouping of Observational Variables

    A topic of great current interest is Causal Representation Learning (CRL), whose goal is to learn a causal model for hidden features in a data-driven manner. Unfortunately, CRL is severely ill-posed since it combines two notoriously ill-posed problems: representation learning and causal discovery. Finding identifiability conditions that guarantee a unique solution is therefore crucial for its practical applicability. Most approaches so far have been based on assumptions on the latent causal mechanisms, such as temporal causality, or on the existence of supervision or interventions; these can be too restrictive in actual applications. Here, we show identifiability based on novel, weak constraints, which require no temporal structure, interventions, or weak supervision. The approach is based on assuming that the observational mixing exhibits a suitable grouping of the observational variables. We also propose a novel self-supervised estimation framework consistent with the model, prove its statistical consistency, and experimentally show its superior CRL performance compared to state-of-the-art baselines. We further demonstrate its robustness against latent confounders and causal cycles.

    Application of Machine Learning Algorithms to Actuarial Ratemaking within Property and Casualty Insurance

    A scientific pricing assessment is essential for maintaining viable customer relationship management (CRM) solutions for various stakeholders, including consumers, insurance intermediaries, and insurers. The thesis aims to examine research problems neighboring the ratemaking process, including relaxing the conventional loss model assumptions of homogeneity and independence. The thesis identified three major research scopes within multiperil insurance settings: heterogeneity in consumer behaviour on pricing decisions, loss trending under non-linearity and temporal dependencies, and loss modelling in the presence of inflationary pressure. Heterogeneity in consumer pricing decisions was examined using a demand- and loyalty-based strategy. A hybrid decision tree classification framework is implemented that includes a semi-supervised learning model, a variable selection technique, and a partitioning approach with different treatment effects in order to achieve adequate risk profiling. The thesis also explored a supervised tree learning mechanism under highly imbalanced, overlapping classes and a non-linear response-predictor relationship. The two-phase classification framework is applied to an owner-occupied property portfolio from a personal insurance brokerage powered by a digital platform within the Canadian market. The hybrid three-phase tree algorithm, which includes conditional inference trees, random forest wrapped by the Boruta algorithm, and model-based recursive partitioning under a multinomial generalized linear model, is proposed to study the price sensitivity ranking of digital consumers. The empirical results suggest a well-defined segmentation of digital consumers with differential price sensitivity. Further, with highly imbalanced and overlapping classes, the resampling technique was modelled together with the decision tree algorithm, providing a more scientific approach to overcoming classification problems than the traditional multinomial regression.
The resulting segmentation was able to identify the high-sensitivity consumer group, for which premium rate reductions are recommended to reduce the churn rate. Other consumers are classified as an insensitive group, for which a price strategy of increasing the premium rate is expected to have only a slight impact on the closing ratio and retention rate. Incurred insurance losses strongly exhibit abnormal characteristics such as temporal dependence, a nonlinear relationship between dependent and independent variables, seasonal variation, and a mixture distribution resulting from the implicit claim inflation component. With such abnormal variable characteristics, the severity and frequency components may exhibit an altered trending pattern that changes over time and never repeats. This could have a profound impact on the experience rating model, where the estimates of the pure premium and the rate relativity of a tariff class are likely to be under- or over-estimated. A discussion of the pros and cons of the conventional loss trending approach leads to an alternative framework for the loss cost structure. The conventional pure premium is further split into base severity and severity deflator random variables using a do(·) operator within causal inference. The components are modelled separately based on different time-basis predictors using the semiparametric generalized additive model (GAM) with a spline curve. To maximize the claim inflation calendar year effect and improve the efficiency of severity trending, this thesis refines the claim inflation estimation by adapting Taylor’s [86] separation method, which estimates the inflation index from a loss development triangle. In the second phase of developing the severity trend model, we integrated both the base severity and the severity deflator under a new generalized mechanism known as Discount, Model, and Trend (DMT). The two-phase modelling was built to overcome the mixture distribution effect on final trend estimates.
A simulation study constructed using the claims paid development triangle from a Canadian Insurtech broker’s houseowners/householders portfolio was used in a severity trend movement prediction analysis. We discovered that the conventional framework understated the severity trends more than the separation cum DMT framework. GAM provides a flexible and effective mechanism for modelling nonlinear time series in studies of the frequency loss trend. However, GAM assumes that residuals are independent and identically distributed (iid), while a frequency loss time series can be correlated at adjacent time points. This thesis introduces a new model called the Generalized Additive Model with Seasonal Autoregressive term (GAMSAR), which accounts for temporal dependency and seasonal variation in order to improve prediction confidence intervals. Parameters of the GAMSAR model are estimated by maximum partial likelihood using a modified Newton’s method developed by Yang et al. [97], and the goodness-of-fit of GAM and GAMSAR is compared using a simulation study. Simulation results show that the mean estimates from GAM are strongly biased relative to their true values. The proposed GAMSAR model proves to be superior, especially in the presence of seasonal variation. Further, a comparison study is conducted between GAMSAR and the Generalized Additive Model with Autoregressive term (GAMAR) developed by Yang et al. [97], and the coverage rate of the 95% confidence interval confirms that the GAMSAR model is able to incorporate the nonlinear trend effects as well as capture the serial correlation between the observations. In the empirical analysis, a claim dataset of personal property insurance obtained from digital brokers in Canada is used to show that the GAMSAR(1)12 captures the periodic dependence structure of the data more precisely than standard regression models.
The proposed frequency and severity trend models support the thesis’s goal of establishing a scientific approach to pricing that is robust under different trending processes.

    Volume II Acquisition Research Creating Synergy for Informed Change, Thursday 19th Annual Acquisition Research Proceedings

    Proceedings. Approved for public release; distribution is unlimited.

    Proceedings of the 36th International Workshop Statistical Modelling July 18-22, 2022 - Trieste, Italy

    The 36th International Workshop on Statistical Modelling (IWSM) is the first one held in person after a two-year hiatus due to the COVID-19 pandemic. This edition was quite lively, with 60 oral presentations and 53 posters covering a wide variety of topics. As usual, the extended abstracts of the papers are collected in the IWSM proceedings, but unlike previous workshops, this year the proceedings will not be printed on paper and are available online only. The workshop proudly maintains its almost unique feature of scheduling one plenary session for the whole week. This choice, combined with the informal character of the conference, has always contributed to its stimulating atmosphere, encouraging the exchange of ideas and cross-fertilization among different areas. As a distinguished tradition of the workshop, student participation has been strongly encouraged. This IWSM edition is particularly successful in this respect, as testified by the large number of students included in the program.

    SIS 2017. Statistics and Data Science: new challenges, new generations

    The 2017 SIS Conference aims to highlight the crucial role of Statistics in Data Science. In this new domain of ‘meaning’ extracted from data, the increasing amount of data produced and available in databases has brought new challenges. These involve different fields: statistics, machine learning, information and computer science, optimization, and pattern recognition. Together, these fields make a considerable contribution to the analysis of ‘Big data’, open data, and relational and complex data, both structured and unstructured. The aim is to collect contributions from the different domains of Statistics on high-dimensional data quality validation, sampling extraction, dimensional reduction, pattern selection, data modelling, testing hypotheses, and confirming conclusions drawn from the data.

    Statistical Modelling

    The book collects the proceedings of the 19th International Workshop on Statistical Modelling, held in Florence in July 2004. Statistical modelling is an important cornerstone in many scientific disciplines, and the workshop has provided a rich environment for cross-fertilization of ideas from different disciplines. It consists of four invited lectures, 48 contributed papers, and 47 posters. The contributions are arranged in sessions: Statistical Modelling; Statistical Modelling in Genomics; Semi-parametric Regression Models; Generalized Linear Mixed Models; Correlated Data Modelling; Missing Data, Measurement of Error and Survival Analysis; Spatial Data Modelling; and Time Series and Econometrics.

    Migration and Global Health

    This book attests to the ample research needs and opportunities around migration and health, with a focus on recent as well as earlier migration to Europe. It sheds light on several issues, ranging from non-communicable disease epidemiology and health services utilization to aspects of quality of life and some methodological challenges.