
    When Composite Likelihood Meets Stochastic Approximation

    A composite likelihood is an inference function derived by multiplying a set of likelihood components. This approach provides a flexible framework for drawing inference when the likelihood function of a statistical model is computationally intractable. While composite likelihood has computational advantages, it can still be demanding when dealing with numerous likelihood components and a large sample size. This paper tackles this challenge by employing an approximation of the conventional composite likelihood estimator, derived from an optimization procedure relying on stochastic gradients. This novel estimator is shown to be asymptotically normally distributed around the true parameter. In particular, depending on the relative divergence rates of the sample size and the number of optimization iterations, the variance of the limiting distribution is shown to compound two sources of uncertainty: the sampling variability of the data and the optimization noise, with the latter depending on the sampling distribution used to construct the stochastic gradients. The advantages of the proposed framework are illustrated through simulation studies on two working examples: an Ising model for binary data and a gamma frailty model for count data. Finally, a real-data application is presented, showing its effectiveness in a large-scale mental health survey.
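
    To make the idea concrete, below is a minimal sketch of the kind of scheme the paper studies: stochastic gradient ascent on a composite log-likelihood, where each iteration evaluates only a random subsample of the likelihood components. All names and tuning choices (step size, batch size, averaging) are illustrative assumptions, not the paper's exact algorithm.

    ```python
    import numpy as np

    def stochastic_composite_mle(component_grads, theta0, n_iter=2000,
                                 batch=32, lr=0.1, rng=None):
        """Maximise a composite log-likelihood sum_i ell_i(theta) by
        stochastic gradient ascent over subsampled components.

        component_grads: list of callables, each returning grad ell_i(theta);
        batch must not exceed the number of components.
        Illustrative sketch only; step sizes and averaging are assumptions.
        """
        rng = rng or np.random.default_rng(0)
        theta = np.asarray(theta0, dtype=float)
        avg = theta.copy()
        for t in range(1, n_iter + 1):
            idx = rng.choice(len(component_grads), size=batch, replace=False)
            grad = sum(component_grads[i](theta) for i in idx) / batch
            theta = theta + (lr / np.sqrt(t)) * grad   # Robbins-Monro step size
            avg += (theta - avg) / t                   # Polyak-Ruppert averaging
        return avg
    ```

    The averaged iterate is the natural candidate for the asymptotic normality result described above, with the subsampling of components contributing the optimization-noise term in the limiting variance.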

    Stochastic estimation methods for latent variable models

    Latent variables are widely used in many application fields, such as psychometrics and the social sciences, as a mathematical tool to account for dependence structures in multivariate data. While practical from a modelling perspective, they come with computational and scalability challenges related to the analytical intractability of the marginal likelihood of the data. A natural solution is therefore to optimise an approximation of the likelihood function of interest. Such approximations allow fitting the model at the cost of some loss of statistical efficiency compared to the theoretical maximum likelihood estimator. This thesis is organised into three main chapters, which tackle the intractable likelihood problem by means of stochastic approximations. The first chapter considers the optimisation of a Monte Carlo approximation of the likelihood of interest. By focusing on the efficiency of the sampling procedure, it shows that it is possible to improve on standard estimation algorithms even when the dimensionality of the problem is not considered particularly challenging. The second chapter combines stochastic optimisation with approximations based on composite likelihoods. Merging ideas from the two literatures, it provides a stochastic algorithm for scalable and efficient composite likelihood estimation. The final chapter focuses on factor models for large-scale ordinal surveys and extends the algorithm introduced in the second chapter by incorporating projections and regularisers to deal with more complex estimation problems.
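
    As a small illustration of the Monte Carlo approximation discussed in the first chapter, the sketch below approximates the marginal log-likelihood of a two-parameter logistic item response model for a single respondent by averaging the conditional likelihood over draws from the latent-variable prior. The model choice and all names are assumptions for exposition; the thesis studies more refined sampling schemes.

    ```python
    import numpy as np

    def mc_marginal_loglik(y, a, b, n_draws=500, rng=None):
        """Monte Carlo approximation of log p(y) = log E_z[p(y | z)], z ~ N(0, 1),
        for a 2PL-style model: P(y_j = 1 | z) = logistic(a_j * z - b_j).
        Illustrative only; variance-reduced sampling would replace plain draws."""
        rng = rng or np.random.default_rng(0)
        z = rng.standard_normal(n_draws)                 # draws from the prior
        logits = np.outer(z, a) - b                      # (n_draws, n_items)
        p = 1.0 / (1.0 + np.exp(-logits))
        lik = np.prod(np.where(y == 1, p, 1.0 - p), axis=1)  # p(y | z_s)
        return np.log(lik.mean())                        # log of the MC average
    ```

    Optimising this noisy objective over the item parameters is exactly where the efficiency of the sampling procedure matters.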

    Pairwise stochastic approximation for confirmatory factor analysis of categorical data

    Pairwise likelihood is a limited-information method widely used to estimate latent variable models, including factor analysis of categorical data. It can often avoid evaluating high-dimensional integrals and is thus computationally more efficient than relying on the full likelihood. Despite this computational advantage, the pairwise likelihood approach can still be demanding for large-scale problems involving many observed variables. We tackle this challenge by employing an approximation of the pairwise likelihood estimator, derived from an optimisation procedure relying on stochastic gradients. The stochastic gradients are constructed by subsampling the pairwise log-likelihood contributions, with the subsampling scheme controlling the per-iteration computational complexity. The stochastic estimator is shown to be asymptotically equivalent to the pairwise likelihood one. However, finite-sample performance can be improved by compounding the sampling variability of the data with the uncertainty introduced by the subsampling scheme. We demonstrate the performance of the proposed method through simulation studies and two real-data applications.
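
    The core computational device is easy to sketch: instead of summing the gradient over all p(p-1)/2 bivariate contributions, each iteration evaluates a random subsample of pairs and rescales by the inverse inclusion probability so that the stochastic gradient stays unbiased. The function names below are hypothetical; the paper's actual subsampling scheme may differ.

    ```python
    import itertools
    import numpy as np

    def subsampled_pairwise_gradient(theta, pair_grad, n_items, n_pairs, rng=None):
        """One unbiased stochastic gradient of a pairwise log-likelihood.

        pair_grad(theta, j, k): gradient of the bivariate log-likelihood
        contribution of items (j, k). Only n_pairs of the p(p-1)/2 pairs are
        evaluated, so n_pairs controls the per-iteration cost."""
        rng = rng or np.random.default_rng(0)
        pairs = list(itertools.combinations(range(n_items), 2))
        idx = rng.choice(len(pairs), size=n_pairs, replace=False)
        scale = len(pairs) / n_pairs        # inverse inclusion probability
        return scale * sum(pair_grad(theta, *pairs[i]) for i in idx)
    ```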

    Observed expenditures vs estimated burden of health care: a comparative evaluation based on spatial analysis

    In the context of increasing life expectancy and growth of the elderly population, assessing the temporal and spatial patterns of the healthcare needs of the over-65 population is a key step towards better management of public resources (Gray, 2005). The aim of this study is to highlight the existence of spatial heterogeneity in the elderly healthcare burden by comparing alternative modelling approaches, in the context of Regione Friuli Venezia Giulia (FVG). Data on the estimated health burden in 2017 and 2018 were aggregated by age class within each municipality. The population size, the male-to-female ratio, the death rate, the counts of 21 chronic conditions, the Resource Utilization Band (RUB) indicator, and the expenditures for healthcare services (pharmaceutical, hospital, and outpatient) for the years 2002 to 2017 were also collected. A descriptive analysis of both the ageing phenomenon and healthcare expenditure trends was performed. The availability of the RUB indicator, provided by the Johns Hopkins ACG System (version 11.1.2), allows comparing observed healthcare expenditures (HCE) with the estimated healthcare burdens. In particular, different spatial econometric models (such as those discussed in Elhorst, 2014; Moscone and Tosetti, 2014; LeSage and Pace, 2009) were compared to explore the spatial heterogeneity of the differences between demand and health need. The analyses were developed on the full population and also focusing on the elderly population only. The empirical evidence shows that while HCE does not present any spatial pattern, the RUB indicator is characterized by strong geographical clustering even after controlling for the demographic structure of municipalities. To model the spatial heterogeneity, a spatial Durbin model (SDM) specification was chosen after an appropriate set of tests. The spatial patterns of morbidities play an important role in explaining the healthcare burden, together with the economic characteristics of the municipality. The model estimation based on the elderly subpopulation provides further insights into the diseases most influencing the healthcare burden, namely age-related macular degeneration, human immunodeficiency virus, and low back pain. Surprisingly, the focus on the subpopulation points out that elderly people living in areas with higher shares of elderly population are healthier and need fewer resources than their peers in other areas.
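
    For readers unfamiliar with the SDM specification chosen here, the sketch below shows the model structure, y = rho*W y + Z delta + eps with Z = [1, X, WX], fitted by grid search on the concentrated log-likelihood (LeSage and Pace, 2009). It is a didactic sketch under the assumption of a row-standardised weight matrix W; the actual analyses would rely on dedicated spatial econometrics software.

    ```python
    import numpy as np

    def fit_sdm(y, X, W, rho_grid=np.linspace(-0.9, 0.9, 181)):
        """Spatial Durbin Model, y = rho*W y + Z delta + eps, Z = [1, X, W X],
        fitted by grid search on the concentrated log-likelihood (up to constants).
        Didactic sketch; assumes W is row-standardised so |rho| < 1 is admissible."""
        n = len(y)
        Z = np.column_stack([np.ones(n), X, W @ X])    # covariates + spatial lags
        Wy = W @ y
        best_ll, best_rho, best_delta = -np.inf, None, None
        for rho in rho_grid:
            resp = y - rho * Wy                        # spatially filtered response
            delta, *_ = np.linalg.lstsq(Z, resp, rcond=None)
            resid = resp - Z @ delta
            sigma2 = resid @ resid / n
            _, logdet = np.linalg.slogdet(np.eye(n) - rho * W)  # Jacobian term
            ll = logdet - 0.5 * n * np.log(sigma2)
            if ll > best_ll:
                best_ll, best_rho, best_delta = ll, rho, delta
        return best_rho, best_delta, best_ll
    ```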