89 research outputs found

    Semi-Parametric Empirical Best Prediction for small area estimation of unemployment indicators

    The Italian National Institute for Statistics regularly provides estimates of unemployment indicators using data from the Labor Force Survey. However, direct estimates of unemployment incidence cannot be released for Local Labor Market Areas. These are unplanned domains defined as clusters of municipalities; many are out-of-sample areas and the majority are characterized by a small sample size, which renders direct estimates inadequate. The Empirical Best Predictor represents an appropriate model-based alternative. However, for non-Gaussian responses, its computation and the computation of the analytic approximation to its Mean Squared Error require the solution of (possibly) multiple integrals that generally do not have a closed form. Monte Carlo methods and parametric bootstrap are common choices for solving this issue, although their computational burden is non-trivial. In this paper, we propose a Semi-Parametric Empirical Best Predictor for a (possibly) non-linear mixed effect model by leaving the distribution of the area-specific random effects unspecified and estimating it from the observed data. This approach is known to lead to a discrete mixing distribution, which helps avoid unverifiable parametric assumptions and heavy integral approximations. We also derive a second-order, bias-corrected, analytic approximation to the corresponding Mean Squared Error. Finite sample properties of the proposed approach are tested via a large-scale simulation study. Furthermore, the proposal is applied to unit-level data from the 2012 Italian Labor Force Survey to estimate unemployment incidence for 611 Local Labor Market Areas using auxiliary information from administrative registers and the 2011 Census.

    Finite mixture clustering of human tissues with different levels of IGF-1 splice variants mRNA transcripts

    BACKGROUND: This study addresses a recurrent biological problem: defining a formal clustering structure for a set of tissues on the basis of the relative abundance of multiple alternatively spliced mRNA isoforms generated by the same gene. To this aim, we used a model-based clustering approach, based on a finite mixture of multivariate Gaussian densities. However, since we had several technical replicates from the same tissue for each quantitative measurement, we also employed a finite mixture of linear mixed models, with tissue-specific random effects. RESULTS: A panel of human tissues was analysed through quantitative real-time PCR methods, to quantify the relative amount of mRNA encoding different IGF-1 alternative splicing variants. After an appropriate preliminary equalization of the quantitative data, we estimated the distribution of the observed concentrations for the different IGF-1 mRNA splice variants in the cohort of tissues by employing suitable kernel density estimators. We observed that the analysed IGF-1 mRNA splice variants were characterized by multimodal distributions, which could be interpreted as describing the presence of several sub-populations, i.e. potential tissue clusters. In this context, a formal clustering approach based on a finite mixture model (FMM) with Gaussian components is proposed. Because of the potential dependence between the technical replicates (originating from repeated quantitative measurements of the same mRNA splice isoform in the same tissue), we also employed the finite mixture of linear mixed models (FMLMM), which allowed us to account for this kind of within-tissue dependence. CONCLUSIONS: The FMM and the FMLMM provided a convenient yet formal setting for a model-based clustering of the human tissues into sub-populations, characterized by homogeneous values of concentrations of the mRNAs for one or multiple IGF-1 alternative splicing isoforms. The proposed approaches can be applied to any cohort of tissues expressing several alternatively spliced mRNAs generated by the same gene, and can overcome the limitations of clustering methods based on simple comparisons between splice isoform expression levels.
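To make the idea of model-based clustering with Gaussian components more concrete, the following is a minimal sketch of an EM algorithm for a two-component univariate Gaussian mixture. It is a deliberate simplification and not the authors' procedure: the abstract refers to multivariate densities and to mixtures of linear mixed models, whereas this sketch is univariate with no random effects. The function names, the starting values (component means at the minimum and maximum of the data) and the toy measurements are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_mixture(data, n_iter=100, var_floor=1e-6):
    """EM for a two-component univariate Gaussian mixture (illustrative only)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n

    # Assumed starting values: means at the data extremes, common variance.
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]

    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to each component.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])

        # M-step: update mixing weights, means and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, var_floor)  # floor guards against degenerate fits

    # Hard clustering: assign each observation to its most probable component.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels


# Made-up measurements forming two well-separated groups.
toy_data = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8]
weights, means, variances, labels = fit_two_component_mixture(toy_data)
```

In this toy setting each observation is assigned to the component with the highest posterior probability, which is the usual way a fitted mixture is turned into a clustering.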

    Mixed hidden Markov quantile regression models for longitudinal data with possibly incomplete sequences

    Quantile regression provides a detailed and robust picture of the distribution of a response variable, conditional on a set of observed covariates. Recently, it has been extended to the analysis of longitudinal continuous outcomes using either time-constant or time-varying random parameters. However, in real-life data, we frequently observe both temporal shocks in the overall trend and individual-specific heterogeneity in model parameters. A benchmark dataset on HIV progression gives a clear example. Here, the evolution of the CD4 log counts exhibits both sudden temporal changes in the overall trend and heterogeneity in the effect of the time since seroconversion on the response dynamics. To accommodate such situations, we propose a quantile regression model in which time-varying and time-constant random coefficients are jointly considered. Since observed data may be incomplete due to early drop-out, we also extend the proposed model in a pattern mixture perspective. We assess the performance of the proposals via a large-scale simulation study and the analysis of the CD4 count data.
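As background to the quantile regression framework mentioned above, the sketch below shows the standard check (pinball) loss and verifies numerically that, without covariates, minimising it recovers a sample quantile. This is a generic textbook illustration and not the mixed hidden Markov model proposed in the paper. The function names and the toy values are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: tau * u for u >= 0 and (tau - 1) * u for u < 0."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def sample_quantile_by_check_loss(values, tau):
    """Return the observed value that minimises the total check loss.

    A minimiser of the total check loss can always be found among the
    observed values, so a search over them is sufficient here.
    """
    def total_loss(candidate):
        return sum(check_loss(y - candidate, tau) for y in values)

    return min(sorted(values), key=total_loss)


# Made-up numbers, used only to exercise the functions.
toy_values = [2.0, 7.0, 1.0, 9.0, 4.0]
median = sample_quantile_by_check_loss(toy_values, 0.5)

toy_grid = [float(v) for v in range(1, 11)]
lower_quartile = sample_quantile_by_check_loss(toy_grid, 0.25)
```

Quantile regression replaces the single candidate value with a linear predictor and minimises the same loss over the regression coefficients.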

    Finite mixtures of quantile and M-quantile regression models

    In this paper we define a finite mixture of quantile and M-quantile regression models for heterogeneous and/or dependent/clustered data. Components of the finite mixture represent clusters of individuals with homogeneous values of model parameters. Owing to their flexibility and ease of estimation, the proposed approaches can be extended to random coefficients with a higher dimension than the simple random intercept case. Model parameters are estimated by maximum likelihood, implemented through an EM-type algorithm. The standard error estimates for model parameters are obtained using the inverse of the observed information matrix, derived through the Oakes (J R Stat Soc Ser B 61:479–482, 1999) formula in the M-quantile setting, and through nonparametric bootstrap in the quantile case. We present a large-scale simulation study to analyse the practical behaviour of the proposed model and to evaluate the empirical performance of the proposed standard error estimates for model parameters. We considered a variety of empirical settings in both the random intercept and the random coefficient case. The proposed modelling approaches are also applied to two well-known datasets, which give further insights into their empirical behaviour.

    A flexible ratio regression approach for zero-truncated capture–recapture counts

    Capture–recapture methods are used to estimate the size of a population of interest which is only partially observed. In such studies, each member of the population carries a count of the number of times it has been identified during the observational period. In real-life applications, only positive counts are recorded, so the observed distribution is truncated at zero. This truncated count distribution must be used to estimate the number of unobserved units. We consider ratios of neighboring count probabilities, estimated by ratios of observed frequencies, regardless of whether we have a zero-truncated or an untruncated distribution. Rocchetti et al. (2011) have shown that, for densities in the Katz family, these ratios can be modeled by a regression approach, and Rocchetti et al. (2014) have specialized the approach to the beta-binomial distribution. Once the regression model has been estimated, the unobserved frequency of zero counts can be simply derived. The guiding principle is that it is often easier to find an appropriate regression model than a proper model for the count distribution. However, a full analysis of the connection between the regression model and the associated count distribution has been missing. In this manuscript, we fill the gap and show that the regression model approach leads, under general conditions, to a valid count distribution; we also consider a wider class of regression models, based on fractional polynomials. The proposed approach is illustrated by analyzing various empirical applications, and by means of a simulation study.
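The ratio idea described above can be sketched in a few lines. For a Katz-family distribution the ratios r_x = (x + 1) p_{x+1} / p_x are linear in x, r_x = alpha + beta * x, and since r_0 = p_1 / p_0 the zero-count frequency can be estimated as f_1 divided by the fitted value of r_0. The sketch below fits that line by unweighted least squares on the observed ratios. The unweighted fit, the function name and the toy frequencies are assumptions made for illustration and are not taken from the cited papers, which may use weighted or transformed regressions.

```python
import math


def ratio_regression_f0(freq):
    """Estimate the unobserved zero-count frequency from zero-truncated counts.

    freq maps each positive count x to its observed frequency f_x.
    Returns (f0_hat, alpha_hat, beta_hat) for the line r_x = alpha + beta * x.
    """
    # Observed ratios r_x = (x + 1) * f_{x+1} / f_x for x >= 1.
    xs = sorted(x for x in freq if x + 1 in freq and freq[x] > 0)
    ratios = [(x + 1) * freq[x + 1] / freq[x] for x in xs]

    # Ordinary least squares for the straight line through (x, r_x).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxr = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, ratios))
    beta = sxr / sxx if sxx > 0 else 0.0
    alpha = mean_r - beta * mean_x

    if alpha <= 0:
        raise ValueError("fitted ratio at x = 0 is not positive")

    # r_0 = p_1 / p_0, hence f_0 is estimated by f_1 / r_0 with r_0 = alpha.
    return freq[1] / alpha, alpha, beta


# Toy check with exact Poisson(2) expected frequencies for a population of 1000.
population_size, rate = 1000.0, 2.0
toy_freq = {
    x: population_size * math.exp(-rate) * rate ** x / math.factorial(x)
    for x in range(1, 7)
}
f0_hat, alpha_hat, beta_hat = ratio_regression_f0(toy_freq)
true_f0 = population_size * math.exp(-rate)
```

With exact Poisson frequencies the ratios are constant and equal to the rate, so the fitted slope is essentially zero and the estimate of the zero-count frequency matches the true expected value up to rounding error. With real data the observed ratios are noisy, which is where the choice of regression model matters.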

    Predictors of Lung Cancer Risk: An Ecological Study Using Mortality and Environmental Data by Municipalities in Italy

    Lung cancer (LC) mortality remains a substantial part of the total deaths occurring worldwide. Its etiology is complex, as it involves multifactorial components. This work aims to provide an epidemiological assessment of occupational and environmental factors associated with LC risk by means of an ecological study involving the 8092 Italian municipalities for the period 2006–2015. We consider mortality data from mesothelioma as a proxy of asbestos exposure, and PM2.5 and radon levels as proxies of environmental exposure. The compensated cases for occupational respiratory diseases, urbanization and deprivation were included as predictors. We used a negative binomial distribution for the response, with the analysis stratified by gender. We estimated that asbestos is responsible for about 1.1% (95% CI: 0.8, 1.4) and 0.5% (95% CI: 0.2, 0.8) of LC mortality in males and females, respectively. The corresponding figures are 14.0% (95% CI: 12.5, 15.7) and 16.3% (95% CI: 16.2, 16.3) for PM2.5 exposure, and 3.9% (95% CI: 3.5, 4.2) and 1.6% (95% CI: 1.4, 1.7) for radon exposure. Assessing the contribution of these determinants to observed LC deaths is crucial for improving awareness of their origin and for increasing the equity of the welfare system.

    Monitoring extreme meteo-marine events in the Mediterranean area using the microseism (Medicane Apollo case study)

    Microseism is the continuous background seismic signal caused by the interaction between the atmosphere, the hydrosphere and the solid Earth. Several studies have dealt with the relationship between microseisms and tropical cyclones, but none has focused on the small-scale tropical cyclones that occur in the Mediterranean Sea, called Medicanes. In this work, we analysed the Medicane Apollo, which impacted the eastern part of Sicily during the period 25 October–5 November 2021, causing heavy rainfall, strong wind gusts and violent sea waves. We investigated the microseism accompanying this extreme Mediterranean weather event, and its relationship with the sea state retrieved from hindcast maps and wave buoys. The spectral and amplitude analyses showed the space–time variation of the microseism amplitude. In addition, we tracked the position of Apollo over time using two different methods: (i) a grid search method; (ii) an array analysis. We obtained a good match between the real position of Apollo and the locations constrained by both methods. This work shows that it is possible to extract information on Medicanes from microseisms for both research and monitoring purposes. Peer-reviewed.