    BAYESIAN NONPARAMETRIC DISCLOSURE RISK ESTIMATION VIA MIXED EFFECTS LOG-LINEAR MODELS

    Statistical agencies and other institutions collect data under the promise to protect the confidentiality of respondents. When releasing microdata samples, the risk that records can be identified must be assessed. To this aim, a widely adopted approach is to isolate categorical variables key to the identification and analyze multi-way contingency tables of such variables. Common disclosure risk measures focus on sample unique cells in these tables and adopt parametric log-linear models as the standard statistical tools for the problem. Such models often have to deal with large and extremely sparse tables that pose a number of challenges to risk estimation. This paper proposes to overcome these problems by studying nonparametric alternatives based on Dirichlet process random effects. The main finding is that the inclusion of such random effects allows us to reduce considerably the number of fixed effects required to achieve reliable risk estimates. This is studied on applications to real data, suggesting, in particular, that our mixed models with main effects only produce roughly equivalent estimates compared to the all two-way interactions models, and are effective in defusing potential shortcomings of traditional log-linear models. This paper adopts a fully Bayesian approach that accounts for all sources of uncertainty, including that about the population frequencies, and supplies unconditional (posterior) variances and credible intervals.

    Colorectal Cancer Stage at Diagnosis Before vs During the COVID-19 Pandemic in Italy

    IMPORTANCE Delays in screening programs and the reluctance of patients to seek medical attention because of the outbreak of SARS-CoV-2 could be associated with the risk of more advanced colorectal cancers at diagnosis. OBJECTIVE To evaluate whether the SARS-CoV-2 pandemic was associated with more advanced oncologic stage and change in clinical presentation for patients with colorectal cancer. DESIGN, SETTING, AND PARTICIPANTS This retrospective, multicenter cohort study included all 17 938 adult patients who underwent surgery for colorectal cancer from March 1, 2020, to December 31, 2021 (pandemic period), and from January 1, 2018, to February 29, 2020 (prepandemic period), in 81 participating centers in Italy, including tertiary centers and community hospitals. Follow-up was 30 days from surgery. EXPOSURES Any type of surgical procedure for colorectal cancer, including explorative surgery, palliative procedures, and atypical or segmental resections. MAIN OUTCOMES AND MEASURES The primary outcome was advanced stage of colorectal cancer at diagnosis. Secondary outcomes were distant metastasis, T4 stage, aggressive biology (defined as cancer with at least 1 of the following characteristics: signet ring cells, mucinous tumor, budding, lymphovascular invasion, perineural invasion, and lymphangitis), stenotic lesion, emergency surgery, and palliative surgery. The independent association between the pandemic period and the outcomes was assessed using multivariate random-effects logistic regression, with hospital as the cluster variable. RESULTS A total of 17 938 patients (10 007 men [55.8%]; mean [SD] age, 70.6 [12.2] years) underwent surgery for colorectal cancer: 7796 (43.5%) during the pandemic period and 10 142 (56.5%) during the prepandemic period. Logistic regression indicated that the pandemic period was significantly associated with an increased rate of advanced-stage colorectal cancer (odds ratio [OR], 1.07; 95% CI, 1.01-1.13; P = .03), aggressive biology (OR, 1.32; 95% CI, 1.15-1.53; P < .001), and stenotic lesions (OR, 1.15; 95% CI, 1.01-1.31; P = .03). CONCLUSIONS AND RELEVANCE This cohort study suggests a significant association between the SARS-CoV-2 pandemic and the risk of a more advanced oncologic stage at diagnosis among patients undergoing surgery for colorectal cancer and might indicate a potential reduction of survival for these patients.

    Assessing Bayesian Semi‐Parametric Log‐Linear Models: An Application to Disclosure Risk Estimation

    We propose a method for identifying models with good predictive performance in the family of Bayesian log-linear mixed models with Dirichlet process random effects for count data. Their wide applicability makes the assessment of model performance crucial in many fields, including disclosure risk estimation, which is the focus of the present work. Rather than assessing models on the whole contingency table, we target the specific objective of the analysis and propose a two-stage model selection procedure aimed at limiting a form of bias arising in the process of model selection. Our proposal combines two different criteria: at the first stage, a path in the model search space is identified through a strongly penalized log-likelihood; at the second, a small number of semi-parametric models are evaluated through a context-dependent score-based information criterion. Tested on a variety of contingency tables, our method proves able to identify models with good predictive performance in a few steps, even in the presence of large tables with many sampling and structural zeros. We carefully discuss the proposed method in the context of the literature on model assessment and contextualize the illustrative application in the recent debate on statistical disclosure limitation. Finally, we provide examples of further applications in different research areas.

    Disclosure risk estimation via nonparametric log-linear models

    A major concern in releasing microdata sets is protecting the privacy of individuals in the sample. Consider a data set in the form of a high-dimensional contingency table. If an individual belongs to a cell with small frequency, an intruder with certain knowledge about the individual may identify him and learn sensitive information about him in the data. To estimate the risk of such a breach of confidentiality, we introduce several nonparametric models which represent progressive extensions of the one adopted by Skinner and Holmes (1998). The latter is a Poisson model with rates modeled through a mixed effects log-linear model with normal random effects. In the first extension, we assume Dirichlet process random effects and, mimicking Skinner and Holmes (1998), we keep the fixed effects constant. Next, we relax the latter assumption and consider a model in which all effects are unknown. In both extended models the total mass parameter of the Dirichlet process is also unknown. The MCMC methods used for inference are extensively discussed. An application to real data concludes the article.
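The model hierarchy described in this abstract can be written out schematically. The notation below is an assumption introduced for illustration only (cell index $k$, cell count $F_k$, design vector $x_k$, fixed effects $\beta$, random effect $\omega_k$, base measure $G_0$, total mass $m$); it is not taken from the paper and omits details such as the sampling mechanism and the priors.

```latex
% Parametric starting point: Poisson counts with a mixed effects log-linear rate
F_k \mid \lambda_k \sim \mathrm{Poisson}(\lambda_k), \qquad
\log \lambda_k = x_k^{\top}\beta + \omega_k, \qquad
\omega_k \sim \mathcal{N}(0, \sigma^2).

% Nonparametric extension: Dirichlet process random effects
\omega_k \mid G \sim G, \qquad
G \sim \mathrm{DP}(m, G_0), \qquad
m \ \text{unknown (assigned a prior)}.
```

In the first extension $\beta$ is held fixed; in the second, $\beta$ is also treated as unknown.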

    Bayesian semiparametric disclosure risk estimation via mixed effects log-linear models

    In releasing data arising from sample surveys, statistical agencies must face the obligation to protect the confidentiality of respondents. Prior to any data release, disclosure risk assessment is necessary. Disclosure may occur because ill-intentioned users might exploit their own information to link records in the released data to target individuals by matching on common characteristics (keys) that permit identification. Following the literature, we focus on categorical key variables, which is typical for socio-demographic surveys. Intuitively, individuals who are unique or rare in the population with respect to the key variables are at high risk of disclosure. Indeed the number of observations that are unique in a sample and also unique, or rare, in the population is commonly used to measure the overall risk of disclosure in the sample data. Many authors have attempted to estimate risk by employing parametric models on cross classifications of the keys, i.e. multi-way continge

    Semi parametric log-linear models for Bayesian re-identification risk assessment

    The number of categorical observations that are unique in a sample and also unique, or rare, in the population is usually taken as a measure of the overall risk of disclosure in the sample data. Another important and commonly used risk measure is the expected number of correct guesses if each sample unique is matched with an individual chosen at random from the corresponding population cell. Many authors have attempted to estimate these quantities in cross classifications of the key variables, i.e. multi-way contingency tables of those categorical (key) variables that permit the identification of individuals in the sample. Methods based on parametric assumptions dominate this literature. On the one hand, assuming exchangeability of cells, elaborations of the Poisson model (Poisson-gamma, Poisson-lognormal, etc.) have been extensively applied; on the other hand, relaxing the exchangeability assumption, logistic or log-linear models have been introduced to capture the underlying probability structure of the contingency table. Our Bayesian semi-parametric approach considers a Poisson model with rates explained by a mixed effects log-linear model with Dirichlet process random effects. Suitable specifications of the base measure of the Dirichlet process allow for useful and interesting extensions of many parametric models for disclosure risk estimation. An application to real data is also considered.
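The two risk measures named in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that the population cell counts are known; in practice they are not, which is why the models above are needed to estimate them. The function name and the toy counts are hypothetical and do not come from the paper.

```python
from typing import Sequence, Tuple


def disclosure_risk_measures(
    sample_counts: Sequence[int], population_counts: Sequence[int]
) -> Tuple[int, float]:
    """Return (tau1, tau2) for one contingency table, cell by cell.

    tau1: number of cells that are unique in the sample AND in the population.
    tau2: expected number of correct guesses when each sample unique is
          matched to a random individual in its population cell (sum of 1/F).
    Assumes population counts are known, which is only true in toy settings.
    """
    if len(sample_counts) != len(population_counts):
        raise ValueError("both tables must have the same number of cells")
    tau1 = 0
    tau2 = 0.0
    for f, F in zip(sample_counts, population_counts):
        if f > F:
            raise ValueError("a sample count cannot exceed its population count")
        if f == 1:  # sample unique cell
            if F == 1:
                tau1 += 1
            tau2 += 1.0 / F
    return tau1, tau2


# Hypothetical five-cell table: sample counts f_k and population counts F_k.
sample = [1, 0, 1, 2, 1]
population = [1, 3, 4, 5, 2]
tau1, tau2 = disclosure_risk_measures(sample, population)
print(tau1, tau2)
```

Here three cells are sample uniques, with population counts 1, 4, and 2, so only the first contributes to the first measure while all three contribute to the second.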