301 research outputs found

Looking Back at the Gifi System of Nonlinear Multivariate Analysis

Author: van Buuren Stef
van der Heijden Peter G. M.
Publication venue: 'Foundation for Open Access Statistics'
Publication date: 01/01/2016

Gifi was the nom de plume for a group of researchers led by Jan de Leeuw at the University of Leiden. Between 1970 and 1990 the group produced a stream of highly innovative theoretical papers and computer programs in the area of nonlinear multivariate analysis. This paper informally discusses the so-called Gifi system of nonlinear multivariate analysis, which comprises homogeneity analysis (closely related to multiple correspondence analysis) and its generalizations. The history is discussed, with attention to the scientific philosophy of the group, and links to machine learning are indicated.

Directory of Open Access Journals

Journal of Statistical Software

Utrecht University Repository

On estimating the size of overcoverage with the latent class model. A critique of the paper "Population Size Estimation Using Multiple Incomplete Lists with Overcoverage" by di Cecco, di Zio, Filipponi and Rocchetti (2018, JOS 34 557-572)

Author: Smith Paul A.
van der Heijden Peter G M
Publication date: 11/05/2020

We read with interest the article by di Cecco et al. (2018), but have reservations about the usefulness of the latent class model specifically for estimating overcoverage. In particular, we question the interpretation of the parameters of the fitted latent class model.

Comment: 5 pages

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

Item Randomized-Response Models for Measuring Noncompliance: Risk-Return Perceptions, Social Influences, and Self-Protective Responses

Author: Ulf Böckenholt (McGill University)
Peter G. M. van der Heijden
DOI: 10.1007/s11336-005-1495-y

Randomized response (RR) is a well-known method for measuring sensitive behavior. Yet this method is not often applied because of: (i) its lower efficiency and the resulting need for larger sample sizes, which make applications of RR costly; (ii) the possibility that, despite its privacy-protection mechanism, the RR design is not followed by every respondent; and (iii) the incorrect belief that RR yields estimates only of aggregate-level behavior that cannot be linked to individual-level covariates. This paper addresses the efficiency problem by applying item randomized-response (IRR) models for the analysis of multivariate RR data. In these models, a person parameter is estimated based on multiple measures of a sensitive behavior under study, which allows for more powerful analyses of individual differences than are available from univariate RR data. Response behavior that does not follow the RR design is approached by introducing mixture components in the IRR models, with one component consisting of respondents who answer truthfully and another component consisting of respondents who do not provide truthful responses. An analysis of data from two large-scale Dutch surveys conducted among recipients of invalidity insurance benefits shows that the willingness of a respondent to answer truthfully is related to the educational level of the respondent and the perceived clarity of the instructions. A person is more willing to comply when the expected benefits of noncompliance are minor and social control is strong.

Key words: randomized response, item response theory, cheating, concomitant variable, sensitive behavior, efficiency

CiteSeerX

Improving information retrieval through correspondence analysis instead of latent semantic analysis

Author: Hessen David J.
Qi Qianqian
van der Heijden Peter G. M.
Publication date: 14/03/2023

Both latent semantic analysis (LSA) and correspondence analysis (CA) are dimensionality reduction techniques that use singular value decomposition (SVD) for information retrieval. Theoretically, the results of LSA display both the association between documents and terms, and marginal effects; in comparison, CA only focuses on the associations between documents and terms. Marginal effects are usually not relevant for information retrieval, and therefore, from a theoretical perspective, CA is more suitable for information retrieval. In this paper, we empirically compare LSA and CA. The elements of the raw document-term matrix are weighted, and the weighting exponent of singular values is adjusted to improve the performance of LSA. We explore whether these two weightings also improve the performance of CA. In addition, we compare the optimal singular value weighting exponents for LSA and CA to identify what the initial dimensions in LSA correspond to. The results for four empirical datasets show that CA always performs better than LSA. Weighting the elements of the raw data matrix can improve CA; however, it is data dependent and the improvement is small. Adjusting the singular value weighting exponent usually improves the performance of CA; however, the extent of the improved performance depends on the dataset and number of dimensions. In general, CA needs a larger singular value weighting exponent than LSA to obtain the optimal performance. This indicates that CA emphasizes initial dimensions more than LSA, and thus, margins play an important role in the initial dimensions in LSA.

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

Bias correction in multiple-systems estimation

Author: Bakker Bart F. M.
van der Heijden Peter G. M.
Zult Daan B.
Publication date: 03/11/2023

If part of a population is hidden but two or more sources are available that each cover parts of this population, dual- or multiple-system(s) estimation can be applied to estimate the size of this population. For this it is common to use the log-linear model, estimated with maximum likelihood. These maximum likelihood estimates are based on a non-linear model and therefore suffer from finite-sample bias, which can be substantial in the case of small samples or a small population size. This problem was recognised by Chapman, who derived an estimator with good small-sample properties for the case of two available sources. However, he did not derive an estimator for more than two sources. We propose an estimator that is an extension of Chapman's estimator to three or more sources and compare this estimator with other bias-reduced estimators in a simulation study. The proposed estimator performs well, and much better than the other estimators. A real data example on homelessness in the Netherlands shows that our proposed model can make a substantial difference.

arXiv.org e-Print Archive

Functionality of the Crosswise Model for Assessing Sensitive or Transgressive Behavior: A Systematic Review and Meta-Analysis

Author: Andrea Petróczi
Dominic Sagoe
Maarten Cruyff
Martial Saugy
Olivier de Hon
Owen Spendiff
Peter G. M. van der Heijden
Razieh Chegeni
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2021

Tools for reliable assessment of socially sensitive or transgressive behavior warrant constant development. Among them, the Crosswise Model (CM) has gained considerable attention. We systematically reviewed and meta-analyzed empirical applications of CM and addressed a gap for quality assessment of indirect estimation models. Guided by the PRISMA protocol, we identified 45 empirical studies from electronic database and reference searches. Thirty of these were comparative validation studies (CVS) comparing CM and direct question (DQ) estimates. Six prevalence studies exclusively used CM. One was a qualitative study. Behaviors investigated were substance use and misuse (k = 13), academic misconduct (k = 8), and corruption, tax evasion, and theft (k = 7), among others. The majority of studies (k = 39) applied the “more is better” hypothesis. Thirty-five studies relied on birthday distribution and 22 of these used P = 0.25 for the non-sensitive item. Overall, 11 studies were assessed as high-, 31 as moderate-, and two as low-quality (excluding the qualitative study). The effect of non-compliance was assessed in eight studies. From mixed CVS results, the meta-analysis indicates that CM outperforms DQ on the “more is better” validation criterion, and increasingly so with higher behavior sensitivity. However, little difference was observed between DQ and CM estimates for items with a DQ prevalence estimate around 50%. Based on empirical evidence available to date, our study provides support for the superiority of CM to DQ in assessing sensitive/transgressive behavior. Despite some limitations, CM is a valuable and promising tool for population-level investigation.

University of Bergen

Directory of Open Access Journals

Kingston University Research Repository

NORA - Norwegian Open Research Archives

Utrecht University Repository

Introduction to special issue on quasi-symmetry and categorical data analysis

Author: Peter G. M. van der Heijden
Stephen E. Fienberg
Publication date: 01/01/2002

Annales de la Faculté des Sciences de Toulouse

Accounting for self-protective responses in randomized response data from a social security survey using the zero-inflated Poisson model

Author: Böckenholt Ulf
Cruyff Maarten J. L. F.
Hout Ardo van den
van der Heijden Peter G. M.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2008

In 2004 the Dutch Department of Social Affairs conducted a survey to assess the extent of noncompliance with social security regulations. The survey was conducted among 870 recipients of social security benefits and included a series of sensitive questions about regulatory noncompliance. Due to the sensitive nature of the questions the randomized response design was used. Although randomized response protects the privacy of the respondent, it is unlikely that all respondents followed the design. In this paper we introduce a model that allows for respondents displaying self-protective response behavior by consistently giving the nonincriminating response, irrespective of the outcome of the randomizing device. The dependent variable denoting the total number of incriminating responses is assumed to be generated by the application of randomized response to a latent Poisson variable denoting the true number of rule violations. Since self-protective responses result in an excess of observed zeros in relation to the Poisson randomized response distribution, these are modeled as observed zero-inflation. The model includes predictors of the Poisson parameters, as well as predictors of the probability of self-protective response behavior.

Comment: Published at http://dx.doi.org/10.1214/07-AOAS135 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

Crossref

Utrecht University Repository

Circulating angiopoietin-2 levels in the course of septic shock: relation with fluid balance, pulmonary dysfunction and mortality

Author: Bouw Martijn P. W. J. M.
Groeneveld A. B. Johan
Pickkers Peter
van der Heijden Melanie
van der Hoeven Johannes G.
van Hinsbergh Victor W. M.
van Nieuw Amerongen Geerten P.
Publication venue: Springer-Verlag
Publication date: 01/01/2009

PURPOSE: To investigate whether angiopoietin-2, von Willebrand factor (VWF) and angiopoietin-1 relate to surrogate indicators of vascular permeability, pulmonary dysfunction and intensive care unit (ICU) mortality throughout the course of septic shock. METHODS: In 50 consecutive mechanically ventilated septic shock patients, plasma angiopoietin-2, VWF and angiopoietin-1 levels and fluid balance, partial pressure of oxygen/inspiratory oxygen fraction and the oxygenation index as indicators of vascular permeability and pulmonary dysfunction, respectively, were measured until day 28. RESULTS: Angiopoietin-2 positively related to the fluid balance and pulmonary dysfunction, was higher in non-survivors than in survivors and independently predicted non-survival throughout the course of septic shock. VWF inversely related to the fluid balance and pulmonary dysfunction throughout the course of septic shock, was comparable between survivors and non-survivors and predicted non-survival on day 0 only. Angiopoietin-1 positively related to pulmonary dysfunction throughout the course, but did not differ between survivors and non-survivors. CONCLUSIONS: In contrast to VWF, plasma angiopoietin-2 positively relates to fluid balance, pulmonary dysfunction and mortality throughout the course of septic shock, in line with a suggested mediator role of the protein.

Springer - Publisher Connector

PubMed Central

Radboud Repository