    Big Data and Reliability Applications: The Complexity Dimension

    Big data features not only large volumes of data but also data with complicated structures. Complexity poses unique challenges for big data analytics. Meeker and Hong (2014, Quality Engineering, pp. 102-116) provided an extensive discussion of the opportunities and challenges in big data and reliability, and described engineering systems that generate big data usable in reliability analysis. Meeker and Hong (2014) focused on large-scale system operating and environmental data (i.e., high-frequency multivariate time series data), and provided examples of how to link such data as covariates to traditional reliability responses such as time to failure, time to recurrence of events, and degradation measurements. This paper extends that discussion by focusing on how to use data with complicated structures for reliability analysis. Such data types include high-dimensional sensor data, functional curve data, and image streams. We first review recent developments in these directions, and then discuss how analytical methods can be developed to tackle the challenges that arise from the complexity of big data in reliability applications. The use of modern statistical methods such as variable selection, functional data analysis, scalar-on-image regression, spatio-temporal data models, and machine learning techniques is also discussed. Comment: 28 pages, 7 figures

    The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions

    Discovering interaction effects on a response of interest is a fundamental problem in biology, medicine, economics, and many other scientific disciplines. In theory, Bayesian methods for discovering pairwise interactions enjoy many benefits, such as coherent uncertainty quantification, the ability to incorporate background knowledge, and desirable shrinkage properties. In practice, however, Bayesian methods are often computationally intractable even for moderate-dimensional problems. Our key insight is that many hierarchical models of practical interest admit a particular Gaussian process (GP) representation; the GP allows us to capture the posterior with a vector of O(p) kernel hyper-parameters rather than O(p^2) interactions and main effects. With this implicit representation, we can run Markov chain Monte Carlo (MCMC) over model hyper-parameters in time and memory linear in p per iteration. We focus on sparsity-inducing models and show on datasets with a variety of covariate behaviors that our method (1) reduces runtime by orders of magnitude compared with naive applications of MCMC, (2) provides lower Type I and Type II error rates than state-of-the-art LASSO-based approaches, and (3) offers improved computational scaling in high dimensions relative to existing Bayesian and LASSO-based approaches. Comment: Accepted at ICML 2019. 20 pages, 4 figures, 3 tables
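
    To see why a kernel can stand in for explicit pairwise interactions, the sketch below compares a degree-2 polynomial kernel against the inner product of its explicit feature map. The kernel has one scale per covariate. The feature map contains the intercept, the main effects, the squares, and all p(p-1)/2 pairwise products. This is an illustrative assumption, not necessarily the kernel or hierarchical model used in the paper. The names `interaction_kernel`, `explicit_features`, and the scales `kappa` are hypothetical. The kernel costs O(p) to evaluate, while the explicit features grow as O(p^2).

```python
import itertools

import numpy as np


def interaction_kernel(x, y, kappa):
    """Degree-2 polynomial kernel with per-covariate scales kappa (assumed form).

    Evaluates in O(p) time, yet implicitly covers all pairwise interactions.
    """
    return (1.0 + np.sum(kappa * x * y)) ** 2


def explicit_features(x, kappa):
    """Explicit O(p^2) feature map whose inner product reproduces interaction_kernel."""
    u = np.sqrt(kappa) * x
    pairs = itertools.combinations(range(u.size), 2)
    return np.concatenate([
        [1.0],                                           # intercept
        np.sqrt(2.0) * u,                                # main effects
        u ** 2,                                          # squared terms
        [np.sqrt(2.0) * u[i] * u[j] for i, j in pairs],  # pairwise interactions
    ])


# Usage: the O(p) kernel matches the O(p^2) explicit inner product.
rng = np.random.default_rng(0)
p = 5
x, y = rng.normal(size=p), rng.normal(size=p)
kappa = rng.uniform(0.1, 1.0, size=p)

k_fast = interaction_kernel(x, y, kappa)
k_slow = explicit_features(x, kappa) @ explicit_features(y, kappa)
n_features = explicit_features(x, kappa).size  # 1 + 2p + p(p-1)/2
```

    In this simplified setting, a sampler would only need to move the p entries of `kappa` (plus any global hyper-parameters) rather than p(p-1)/2 interaction coefficients.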

    Estimating Risk Preferences from Deductible Choice

    We use a large data set of deductible choices in auto insurance contracts to estimate the distribution of risk preferences in our sample. To do so, we develop a structural econometric model that accounts for adverse selection by allowing for unobserved heterogeneity in both risk (the probability of an accident) and risk aversion. Ex-post claim information separately identifies the marginal distribution of risk, while the deductible choice identifies the joint distribution of risk and risk aversion. We find that the average estimated absolute risk aversion in our sample is higher than other estimates found in the literature. Using annual income as a measure of wealth, we find an average coefficient of relative risk aversion in the double digits. We also find that women tend to be more risk averse than men, that proxies for income and wealth are positively related to absolute risk aversion, that unobserved heterogeneity in risk preferences is greater, in relative terms, than unobserved heterogeneity in risk, and that unobserved risk is positively correlated with unobserved risk aversion. Finally, we use our results for counterfactual exercises that assess the profitability of insurance contracts under various assumptions.
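
    The identification argument rests on a trade-off: a lower deductible costs a higher premium, and only sufficiently risk-averse individuals with a given accident probability find it worth paying. The sketch below computes that indifference threshold. Its simplifying assumptions are ours, not the paper's: constant absolute risk aversion (CARA) utility, at most one claim per period occurring with probability `lam`, and every claim exceeding the deductible. The two contracts and all numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq


def certainty_equivalent_cost(r, lam, premium, deductible):
    """Certainty-equivalent total cost of a contract under CARA utility (simplified model).

    Out-of-pocket cost is `deductible` with probability `lam`, else 0, so
    CE = premium + (1/r) * log(1 - lam + lam * exp(r * deductible)).
    """
    # logaddexp keeps exp(r * deductible) from overflowing for large r
    log_mgf = np.logaddexp(np.log1p(-lam), np.log(lam) + r * deductible)
    return premium + log_mgf / r


def indifference_risk_aversion(lam, low, high, r_min=1e-6, r_max=1.0):
    """Absolute risk aversion at which both contracts have equal CE cost.

    `low` and `high` are (premium, deductible) tuples; individuals with risk
    aversion above the returned value prefer the low-deductible contract.
    """
    def gap(r):
        return (certainty_equivalent_cost(r, lam, *high)
                - certainty_equivalent_cost(r, lam, *low))

    return brentq(gap, r_min, r_max)


# Hypothetical contracts: (premium, deductible)
low_deductible = (1000.0, 100.0)
high_deductible = (900.0, 500.0)
r_star = indifference_risk_aversion(0.1, low_deductible, high_deductible)
```

    In this toy setting, choosing the low deductible reveals a risk aversion above `r_star` for the given claim probability. A higher `lam` lowers the threshold. This is how the choice, combined with claim data, constrains the joint distribution of risk and risk aversion.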

    Claim Models: Granular Forms and Machine Learning Forms

    This collection of articles addresses the most modern forms of loss reserving methodology: granular models and machine learning models. New methodologies come with questions about their applicability. One article discusses these questions, focusing on the relative merits of granular and machine learning models. Others illustrate applications with real-world data. The examples include neural networks, which, though well known in some disciplines, have so far seen limited use in the actuarial literature. This volume expands on that literature, with specific attention to the application of neural networks to loss reserving. For example, one article introduces neural networks of the gated recurrent unit form to the actuarial literature, whereas another uses a penalized neural network. Neural networks are not the only form of machine learning, and two other papers outline applications of gradient boosting and regression trees, respectively. Both articles construct loss reserves at the individual claim level, so these models resemble granular models. One of these two articles provides a practical application of the model to claim watching, the practice of monitoring claim development and anticipating major features. Such watching can serve as an early warning system or for other administrative purposes. Overall, this volume is an extremely useful addition to the libraries of those working at the loss reserving frontier.

    Modeling and pricing cyber insurance: Idiosyncratic, systematic, and systemic risks

    The paper provides a comprehensive overview of modeling and pricing cyber insurance, with clear and easily understandable explanations of the underlying mathematical concepts. We distinguish three main types of cyber risk: idiosyncratic, systematic, and systemic. While classical actuarial and financial mathematics appear well suited to idiosyncratic and systematic cyber risks, systemic cyber risks require more sophisticated approaches that capture both network and strategic interactions. In pricing cyber insurance policies, issues of interdependence arise for both systematic and systemic cyber risks, so classical actuarial valuation needs to be extended with more complex methods, such as risk-neutral valuation and (set-valued) monetary risk measures.
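
    The difference between idiosyncratic and systematic risk can be illustrated with expected shortfall (ES), one example of a monetary risk measure. In the sketch below, claims across a portfolio are either independent (idiosyncratic) or driven by a common "bad year" factor (systematic). Both cases share the same mean claim frequency. The following are all hypothetical assumptions: the parameters, a unit loss per claim, and the helper names. The model is far simpler than anything in the paper. Systemic risk, which involves contagion through a network, is not modeled here.

```python
import numpy as np


def expected_shortfall(losses, alpha=0.99):
    """Empirical ES: average loss over the worst (1 - alpha) share of scenarios."""
    sorted_losses = np.sort(losses)
    n_tail = max(1, int(np.ceil((1.0 - alpha) * losses.size)))
    return sorted_losses[-n_tail:].mean()


def simulate_loss_per_policy(n_policies, n_sims, p_good, p_bad, prob_bad_year, rng):
    """Average claims per policy (unit loss per claim); claims independent given year type."""
    bad_year = rng.random(n_sims) < prob_bad_year  # common (systematic) factor
    claim_prob = np.where(bad_year, p_bad, p_good)
    return rng.binomial(n_policies, claim_prob) / n_policies


rng = np.random.default_rng(42)
n_policies, n_sims = 1000, 20000

# Idiosyncratic: every year has the same 2% claim probability.
idio = simulate_loss_per_policy(n_policies, n_sims, 0.02, 0.02, 0.0, rng)
# Systematic: 10% bad years at 11%, otherwise 1%; the mean is still 2%.
syst = simulate_loss_per_policy(n_policies, n_sims, 0.01, 0.11, 0.1, rng)

es_idio = expected_shortfall(idio)
es_syst = expected_shortfall(syst)
```

    Pooling pushes the per-policy ES toward the 2% expected loss only in the independent case. The common factor keeps the tail heavy no matter how many policies are pooled.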

    Models for high dimensional spatially correlated risks and application to thunderstorm loss data in Texas

    Insurance claims caused by natural disasters exhibit spatial dependence, with the strength of dependence determined by factors such as physical distance and population density. Accounting for spatial dependence is therefore crucial when modeling these types of claims. In this work, we present an approach to assessing spatially dependent insurance risks using a combination of linear regression and factor copula models. In loss modeling, observed dependence patterns are highly nonlinear, so copula-based models seem appropriate because they can handle both linear and nonlinear dependence. The factor copula approach reduces the estimation of a complex spatial dependence structure to the relatively easier task of estimating a spatial dependence parameter. We therefore use a weighted sum of radial basis functions to model the spatial dependence parameter, which determines the influence of each location. The methodology is illustrated using a Texas thunderstorm wind loss dataset. Extensions to Matérn covariance functions and spatiotemporal models are briefly discussed.
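
    A minimal sketch of the two ingredients named above follows, under assumptions of our own. The spatial dependence parameter at each location is modeled as a weighted sum of Gaussian radial basis functions. It is then mapped into (-1, 1) with tanh, so that it can serve as the factor loading of a one-factor Gaussian copula. The paper's copula family, basis, and link may differ. The centers, weights, and bandwidth below are hypothetical; in practice they would be estimated from the loss data.

```python
import numpy as np
from scipy.stats import norm


def rbf_dependence(locs, centers, weights, bandwidth):
    """Spatial dependence parameter: weighted sum of Gaussian RBFs at each location."""
    sq_dist = ((locs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    basis = np.exp(-sq_dist / (2.0 * bandwidth ** 2))  # shape (n_locs, n_centers)
    return basis @ weights


def sample_factor_copula(loadings, n_draws, rng):
    """Draw uniforms from a one-factor Gaussian copula with location-specific loadings."""
    common = rng.standard_normal((n_draws, 1))              # shared latent factor
    idio = rng.standard_normal((n_draws, loadings.size))    # location-specific noise
    latent = loadings * common + np.sqrt(1.0 - loadings ** 2) * idio
    return norm.cdf(latent)                                 # uniform margins


# Hypothetical sites: two near the first center, two near the second.
locs = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
centers = np.array([[0.5, 0.0], [5.5, 5.0]])
weights = np.array([1.5, 0.3])
loadings = np.tanh(rbf_dependence(locs, centers, weights, bandwidth=1.0))

rng = np.random.default_rng(1)
u = sample_factor_copula(loadings, 5000, rng)
```

    Under this construction, sites i and j have latent correlation `loadings[i] * loadings[j]`. The two sites near the heavily weighted center are therefore strongly dependent, while the other pair is nearly independent. The regression part of the approach (modeling the marginal losses) is not shown.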