Looking Back at the Gifi System of Nonlinear Multivariate Analysis
Gifi was the nom de plume for a group of researchers led by Jan de Leeuw at the University of Leiden. Between 1970 and 1990 the group produced a stream of highly innovative theoretical papers and computer programs in the area of nonlinear multivariate analysis. This paper informally discusses the so-called Gifi system of nonlinear multivariate analysis, which comprises homogeneity analysis (closely related to multiple correspondence analysis) and its generalizations. The history is discussed, with attention to the scientific philosophy of the group, and links to machine learning are indicated.
On estimating the size of overcoverage with the latent class model. A critique of the paper "Population Size Estimation Using Multiple Incomplete Lists with Overcoverage" by di Cecco, di Zio, Filipponi and Rocchetti (2018, JOS 34 557-572)
We read with interest the article by di Cecco et al. (2018), but have
reservations about the usefulness of the latent class model specifically for
estimating overcoverage. In particular, we question the interpretation of the
parameters of the fitted latent class model.
Item Randomized-Response Models for Measuring Noncompliance: Risk-Return Perceptions, Social Influences, and Self-Protective Responses (DOI: 10.1007/s11336-005-1495-y)
Randomized response (RR) is a well-known method for measuring sensitive behavior. Yet this method is not often applied because of (i) its lower efficiency and the resulting need for larger sample sizes, which make applications of RR costly; (ii) the possibility that, despite its privacy-protection mechanism, the RR design is not followed by every respondent; and (iii) the incorrect belief that RR yields estimates only of aggregate-level behavior and that these estimates cannot be linked to individual-level covariates. This paper addresses the efficiency problem by applying item randomized-response (IRR) models for the analysis of multivariate RR data. In these models, a person parameter is estimated based on multiple measures of the sensitive behavior under study, which allows for more powerful analyses of individual differences than are available from univariate RR data. Response behavior that does not follow the RR design is approached by introducing mixture components in the IRR models, with one component consisting of respondents who answer truthfully and another component consisting of respondents who do not provide truthful responses. An analysis of data from two large-scale Dutch surveys conducted among recipients of invalidity insurance benefits shows that the willingness of a respondent to answer truthfully is related to the educational level of the respondent and the perceived clarity of the instructions. A person is more willing to comply when the expected benefits of noncompliance are minor and social control is strong.
Key words: randomized response, item response theory, cheating, concomitant variable, sensitive behavior, efficiency
Improving information retrieval through correspondence analysis instead of latent semantic analysis
Both latent semantic analysis (LSA) and correspondence analysis (CA) are
dimensionality reduction techniques that use singular value decomposition (SVD)
for information retrieval. Theoretically, the results of LSA display both the
association between documents and terms, and marginal effects; in comparison,
CA only focuses on the associations between documents and terms. Marginal
effects are usually not relevant for information retrieval, and therefore, from
a theoretical perspective CA is more suitable for information retrieval.
In this paper, we empirically compare LSA and CA. The elements of the raw
document-term matrix are weighted, and the weighting exponent of singular
values is adjusted to improve the performance of LSA. We explore whether these
two weightings also improve the performance of CA. In addition, we compare the
optimal singular value weighting exponents for LSA and CA to identify what the
initial dimensions in LSA correspond to.
The results for four empirical datasets show that CA always performs better
than LSA. Weighting the elements of the raw data matrix can improve CA;
however, it is data dependent and the improvement is small. Adjusting the
singular value weighting exponent usually improves the performance of CA;
however, the extent of the improved performance depends on the dataset and
number of dimensions. In general, CA needs a larger singular value weighting
exponent than LSA to obtain the optimal performance. This indicates that CA
emphasizes initial dimensions more than LSA, and thus, margins play an
important role in the initial dimensions in LSA.
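The contrast described in this abstract can be illustrated with a minimal sketch, assuming NumPy is available. The function names `lsa_coords` and `ca_coords`, the parameter `alpha` (the singular value weighting exponent), and the toy matrix are illustrative assumptions and are not taken from the paper. LSA decomposes the document-term matrix itself, whereas CA decomposes the standardized residuals from the independence model, so that effects of row and column margins are removed before the SVD.

```python
import numpy as np


def lsa_coords(X, k, alpha=1.0):
    """Document coordinates from an SVD of the (possibly weighted) matrix X."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # Singular values raised to the weighting exponent alpha.
    return U[:, :k] * s[:k] ** alpha


def ca_coords(X, k, alpha=1.0):
    """Document coordinates from correspondence analysis of X.

    Assumes every row and column of X has a positive sum.
    With alpha = 1 these are the usual principal row coordinates.
    """
    P = X / X.sum()                 # correspondence matrix
    r = P.sum(axis=1)               # row masses
    c = P.sum(axis=0)               # column masses
    E = np.outer(r, c)              # expected proportions under independence
    S = (P - E) / np.sqrt(E)        # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :k] / np.sqrt(r)[:, None]) * s[:k] ** alpha


# Toy matrix whose only structure is its margins (rank one, no association).
X = np.outer([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
lsa_first = lsa_coords(X, 1)   # dominated by the margins
ca_first = ca_coords(X, 1)     # numerically zero: no association to display
```

On a matrix that contains nothing but marginal effects, the first LSA dimension is large while the CA coordinates vanish, which is one way to see why margins can dominate the initial LSA dimensions.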
Bias correction in multiple-systems estimation
If part of a population is hidden but two or more sources are available that
each cover parts of this population, dual- or multiple-system(s) estimation can
be applied to estimate this population. For this it is common to use the
log-linear model, estimated with maximum likelihood. These maximum likelihood
estimates are based on a non-linear model and therefore suffer from
finite-sample bias, which can be substantial in case of small samples or a
small population size. This problem was recognised by Chapman, who derived an
estimator with good small sample properties in case of two available sources.
However, he did not derive an estimator for more than two sources. We propose
an estimator that is an extension of Chapman's estimator to three or more
sources and compare this estimator with other bias-reduced estimators in a
simulation study. The proposed estimator performs well, and much better than
the other estimators. A real data example on homelessness in the Netherlands
shows that our proposed model can make a substantial difference.
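For the two-source case mentioned in this abstract, the difference between the classical estimate and Chapman's bias-reduced estimate can be sketched as follows. This is a minimal illustration of the well-known two-source formulas only; the function names and the example counts are assumptions, and the extension to three or more sources proposed in the paper is not reproduced here.

```python
def lincoln_petersen(n1, n2, m):
    """Classical two-source estimate: n1 * n2 / m.

    n1, n2: number of individuals observed in source 1 and source 2.
    m: number of individuals observed in both sources.
    """
    if m == 0:
        raise ValueError("estimate is undefined when the overlap m is zero")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's two-source estimate: (n1 + 1)(n2 + 1) / (m + 1) - 1.

    Remains defined when the overlap m is zero.
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts, chosen for illustration only.
lp_estimate = lincoln_petersen(50, 40, 10)
ch_estimate = chapman(50, 40, 10)
```

With small overlap counts the two estimates can differ noticeably, which is the small-sample setting the abstract refers to.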
Functionality of the Crosswise Model for Assessing Sensitive or Transgressive Behavior: A Systematic Review and Meta-Analysis
Tools for reliable assessment of socially sensitive or transgressive behavior warrant constant development. Among them, the Crosswise Model (CM) has gained considerable attention. We systematically reviewed and meta-analyzed empirical applications of CM and addressed a gap in quality assessment of indirect estimation models. Guided by the PRISMA protocol, we identified 45 empirical studies from electronic database and reference searches. Thirty of these were comparative validation studies (CVS) comparing CM and direct question (DQ) estimates. Six prevalence studies exclusively used CM. One was a qualitative study. Behaviors investigated were substance use and misuse (k = 13), academic misconduct (k = 8), and corruption, tax evasion, and theft (k = 7), among others. The majority of studies (k = 39) applied the "more is better" hypothesis. Thirty-five studies relied on the birthday distribution, and 22 of these used P = 0.25 for the non-sensitive item. Overall, 11 studies were assessed as high-, 31 as moderate-, and two as low-quality (excluding the qualitative study). The effect of non-compliance was assessed in eight studies. From mixed CVS results, the meta-analysis indicates that CM outperforms DQ on the "more is better" validation criterion, and increasingly so with higher behavior sensitivity. However, little difference was observed between DQ and CM estimates for items with a DQ prevalence estimate around 50%. Based on the empirical evidence available to date, our study provides support for the superiority of CM to DQ in assessing sensitive/transgressive behavior. Despite some limitations, CM is a valuable and promising tool for population-level investigation.
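How a prevalence estimate is obtained under the Crosswise Model can be sketched with the standard moment estimator. This sketch assumes that each respondent reports only whether the answers to the sensitive and the non-sensitive item are the same, and that the non-sensitive item has a known "yes" probability p (for example P = 0.25, as in many of the reviewed studies). The function name and the counts are hypothetical.

```python
import math


def crosswise_estimate(n_same, n_total, p):
    """Moment estimator of sensitive-trait prevalence under the Crosswise Model.

    n_same: number of respondents reporting that both answers are the same.
    n_total: number of respondents.
    p: known probability of "yes" on the non-sensitive item (must not be 0.5).
    Returns (prevalence estimate, standard error).
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if abs(p - 0.5) < 1e-12:
        raise ValueError("p = 0.5 makes the prevalence unidentifiable")
    lam = n_same / n_total                      # observed share of "same"
    # P(same) = pi * p + (1 - pi) * (1 - p), solved for pi.
    pi_hat = (lam + p - 1) / (2 * p - 1)
    se = math.sqrt(lam * (1 - lam) / (n_total * (2 * p - 1) ** 2))
    return pi_hat, se


# Hypothetical data: 650 of 1000 respondents answer "same", with p = 0.25.
pi_hat, se = crosswise_estimate(650, 1000, 0.25)
```

The factor (2p - 1) in the denominator of the standard error shows the efficiency cost of the design: the closer p is to 0.5, the more privacy protection and the larger the sampling variance.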
Accounting for self-protective responses in randomized response data from a social security survey using the zero-inflated Poisson model
In 2004 the Dutch Department of Social Affairs conducted a survey to assess
the extent of noncompliance with social security regulations. The survey was
conducted among 870 recipients of social security benefits and included a
series of sensitive questions about regulatory noncompliance. Due to the
sensitive nature of the questions the randomized response design was used.
Although randomized response protects the privacy of the respondent, it is
unlikely that all respondents followed the design. In this paper we introduce a
model that allows for respondents displaying self-protective response behavior
by consistently giving the nonincriminating response, irrespective of the
outcome of the randomizing device. The dependent variable denoting the total
number of incriminating responses is assumed to be generated by the application
of randomized response to a latent Poisson variable denoting the true number of
rule violations. Since self-protective responses result in an excess of
observed zeros in relation to the Poisson randomized response distribution,
these are modeled as observed zero-inflation. The model includes predictors of
the Poisson parameters, as well as predictors of the probability of
self-protective response behavior.
Comment: Published at http://dx.doi.org/10.1214/07-AOAS135 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Circulating angiopoietin-2 levels in the course of septic shock: relation with fluid balance, pulmonary dysfunction and mortality
PURPOSE: To investigate whether angiopoietin-2, von Willebrand factor (VWF) and angiopoietin-1 relate to surrogate indicators of vascular permeability, pulmonary dysfunction and intensive care unit (ICU) mortality throughout the course of septic shock. METHODS: In 50 consecutive mechanically ventilated septic shock patients, plasma angiopoietin-2, VWF and angiopoietin-1 levels and fluid balance, partial pressure of oxygen/inspiratory oxygen fraction and the oxygenation index as indicators of vascular permeability and pulmonary dysfunction, respectively, were measured until day 28. RESULTS: Angiopoietin-2 positively related to the fluid balance and pulmonary dysfunction, was higher in non-survivors than in survivors and independently predicted non-survival throughout the course of septic shock. VWF inversely related to the fluid balance and pulmonary dysfunction throughout the course of septic shock, was comparable between survivors and non-survivors and predicted non-survival on day 0 only. Angiopoietin-1 positively related to pulmonary dysfunction throughout the course, but did not differ between survivors and non-survivors. CONCLUSIONS: In contrast to VWF, plasma angiopoietin-2 positively relates to fluid balance, pulmonary dysfunction and mortality throughout the course of septic shock, in line with a suggested mediator role of the protein.