5 research outputs found

    Conformal prediction for frequency-severity modeling

    We present a nonparametric, model-agnostic framework for building prediction intervals of insurance claims, with finite-sample statistical guarantees, extending the technique of split conformal prediction to the domain of two-stage frequency-severity modeling. The effectiveness of the framework is showcased with simulated and real datasets. When the underlying severity model is a random forest, we extend the two-stage split conformal prediction procedure, showing how the out-of-bag mechanism can be leveraged to eliminate the need for a calibration set and to enable the production of prediction intervals with adaptive width.
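
    For orientation, plain (single-stage) split conformal prediction can be sketched as follows. This is a minimal illustration, not the two-stage frequency-severity procedure or the out-of-bag extension described in the abstract. The function names, the absolute-residual score, and the least-squares stand-in model are all assumptions made for the example.

```python
import numpy as np

def split_conformal_interval(fit, X_train, y_train, X_cal, y_cal, X_new, alpha=0.1):
    """Split conformal interval with absolute-residual scores (illustrative sketch).

    `fit` is a hypothetical callable: fit(X, y) returns a predict(X) function.
    """
    predict = fit(X_train, y_train)            # model is fitted on the training split only
    scores = np.abs(y_cal - predict(X_cal))    # conformity scores on the calibration split
    n = len(scores)
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # If the calibration set is too small for the requested level, the interval is unbounded.
    q = np.inf if k > n else np.sort(scores)[k - 1]
    center = predict(X_new)
    return center - q, center + q

def fit_least_squares(X, y):
    """Stand-in model (an assumption): ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef
```

    The interval width here is constant across inputs; the adaptive-width intervals mentioned in the abstract require a different score and are not reproduced in this sketch.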

    Bayesian tests for marginal homogeneity in contingency tables

    Tests of hypotheses for marginal proportions in contingency tables play a fundamental role, for instance, in the investigation of behaviour (or opinion) change. However, most texts in the literature are concerned with tests that assume independent populations (e.g., tests of homogeneity of proportions). Some works explore hypothesis tests for dependent proportions, such as the McNemar test for 2 x 2 contingency tables. The generalization of the McNemar test to k x k contingency tables, called the marginal homogeneity test, usually requires asymptotic approximations under the classical approach. Nevertheless, for small sample sizes or sparse tables, such methods may occasionally produce imprecise results. In this work, we review some classical and Bayesian measures of evidence commonly applied to compare two marginal proportions. We also develop the Full Bayesian Significance Test (FBST) to investigate marginal homogeneity in two-way and multidimensional contingency tables. The FBST is based on a measure of evidence, called the e-value, which does not depend on asymptotic results, does not violate the likelihood principle, and satisfies logical properties that are expected from hypothesis tests. Consequently, the FBST approach to testing marginal homogeneity overcomes several limitations usually met by other procedures.
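
    As a point of reference for the 2 x 2 case mentioned above, the classical exact McNemar test can be sketched as follows. This is the standard binomial version of the classical test, not the FBST proposed in the work. The function name and the argument names `b` and `c` (the two discordant cell counts) are assumptions made for the example.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under marginal homogeneity, b ~ Binomial(b + c, 1/2), so the p-value is
    twice the smaller binomial tail, capped at 1.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence against homogeneity
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n  # P(X <= k) for X ~ Bin(n, 1/2)
    return min(1.0, 2 * tail)
```

    Because it uses the exact binomial distribution, this version avoids the asymptotic approximation in the 2 x 2 case; the k x k generalization discussed in the abstract has no equally simple exact form.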

    Algumas generalizações bayesianas do modelo autorregressivo de valores inteiros [Some Bayesian generalizations of the integer-valued autoregressive model]

    In this thesis, we develop Bayesian generalized models for analyzing time series of counts. In our first proposal, we use a finite mixture to define the marginal distribution of the innovation process, in order to potentially account for overdispersion in the time series. Our second contribution places a Dirichlet process on the distribution of the time-varying innovation rates, which are softly clustered through time. Finally, we examine issues of prior sensitivity in a semi-parametric extended model in which the distribution of the innovation rates follows a Pitman-Yor process. A graphical criterion for choosing the hyperparameters of the Pitman-Yor base measure is proposed, showing explicitly that the Pitman-Yor discount and concentration parameters can interact with the chosen base measure to yield robust inferential results. The posterior distribution of the model parameters is obtained through data-augmentation schemes, which allow us to obtain tractable full conditional distributions. The predictive performance of the proposed models is put to the test in the analysis of two real data sets, with favorable results.

    Prior sensitivity analysis in a semi-parametric integer-valued time series model

    We examine issues of prior sensitivity in a semi-parametric hierarchical extension of the INAR(p) model with innovation rates clustered according to a Pitman–Yor process placed at the top of the model hierarchy. Our main finding is a graphical criterion that guides the specification of the hyperparameters of the Pitman–Yor process base measure. We show how the discount and concentration parameters interact with the chosen base measure to yield a gain in terms of the robustness of the inferential results. The forecasting performance of the model is exemplified in the analysis of a time series of worldwide earthquake events, for which the new model outperforms the original INAR(p) model.
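
    For readers unfamiliar with the baseline, the simplest member of the INAR(p) family can be simulated as follows. This sketch assumes the textbook INAR(1) model with binomial thinning and Poisson innovations; it does not implement the Pitman–Yor hierarchical extension studied in the paper. The function name `simulate_inar1` and its parameters are assumptions made for the example.

```python
import numpy as np

def simulate_inar1(alpha, lam, n, seed=None):
    """Simulate X_t = alpha o X_{t-1} + e_t with e_t ~ Poisson(lam), 0 <= alpha < 1.

    'o' is binomial thinning: each of the X_{t-1} counts survives
    independently with probability alpha.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=np.int64)
    # Start from the stationary distribution, which is Poisson(lam / (1 - alpha)).
    x[0] = rng.poisson(lam / (1 - alpha))
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)   # thinning of the previous count
        x[t] = survivors + rng.poisson(lam)         # plus new arrivals (innovations)
    return x
```

    In this baseline the innovation rate `lam` is constant; the extension in the abstract instead lets the innovation rates vary over time and cluster.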

    Model choice for quantitative health impact assessment and modelling: an expert consultation and narrative literature review

    Background: Health impact assessment (HIA) is a widely used process that aims to identify the health impacts, positive or negative, of a policy or intervention that is not necessarily placed in the health sector. Most HIAs are done prospectively and aim to forecast expected health impacts under assumed policy implementation. HIAs may quantitatively and/or qualitatively assess health impacts, with this study focusing on the former. A variety of quantitative modelling methods are used for forecasting health impacts; however, they differ in application area, data requirements, assumptions, risk modelling, complexities, limitations, strengths, and comprehensibility. We reviewed relevant models, so as to provide public health researchers with considerations for HIA model choice. Methods: Based on an HIA expert consultation, combined with a narrative literature review, we identified the most relevant models that can be used for health impact forecasting. We narratively and comparatively reviewed the models, according to their fields of application, their configuration and purposes, counterfactual scenarios, underlying assumptions, health risk modelling, limitations and strengths. Results: Seven relevant models for health impact forecasting were identified, consisting of (i) comparative risk assessment (CRA), (ii) time series analysis (TSA), (iii) compartmental models (CMs), (iv) structural models (SMs), (v) agent-based models (ABMs), (vi) microsimulations (MS), and (vii) artificial intelligence (AI)/machine learning (ML). These models represent a variety of approaches and vary in their fields of HIA application, complexity and comprehensibility. We provide a set of criteria for HIA model choice. Researchers must check that model input assumptions match the available data, parameter structures and resources, and that model outputs match the research question, meet expectations and are comprehensible to end-users.
Conclusion: The reviewed models have specific characteristics, related to available data and parameter structures, computational implementation, interpretation and comprehensibility, which the researcher should critically consider before choosing an HIA model.