
    Detecting Differential Item and Step Functioning with Rating Scale and Partial Credit Trees

    Several statistical procedures have been suggested for detecting differential item functioning (DIF) and differential step functioning (DSF) in polytomous items. However, standard procedures are designed for the comparison of pre-specified reference and focal groups, such as males and females. Here, we propose a framework for the detection of DIF and DSF in polytomous items under the rating scale and partial credit models that employs a model-based recursive partitioning algorithm. In contrast to existing procedures, this approach requires no pre-specification of reference and focal groups, because the groups are detected in a data-driven way. The resulting groups are characterized by (combinations of) covariates and are thus directly interpretable. The statistical background and construction of the new procedures are introduced along with an instructive example. Four simulation studies illustrate and compare their statistical properties to the well-established likelihood ratio test (LRT). While both the LRT and the new procedures respect a given significance level, the new procedures are in most cases equally powerful (simple DIF groups) or more powerful (complex DIF groups) and can also detect DSF. The sensitivity to model misspecification is investigated. An application example with empirical data illustrates the practical use. A software implementation of the new procedures is freely available in the R system for statistical computing.
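
    For reference, the two item response models named in the abstract can be written in standard notation (this notation is generic, not taken from the paper): the partial credit model gives the probability that person p scores category x on item i, and the rating scale model is the special case in which the step parameters share a common set of thresholds.

```latex
% Partial credit model: person ability \theta_p, item-step parameters \delta_{ik};
% the empty sum for x = 0 is taken to be zero.
\[
  P(X_{pi} = x) =
    \frac{\exp\!\Bigl(\sum_{k=1}^{x} (\theta_p - \delta_{ik})\Bigr)}
         {\sum_{r=0}^{m_i} \exp\!\Bigl(\sum_{k=1}^{r} (\theta_p - \delta_{ik})\Bigr)},
  \qquad x = 0, 1, \dots, m_i .
\]
% Rating scale model: the restriction \delta_{ik} = \delta_i + \tau_k,
% i.e. one location parameter per item plus thresholds \tau_k common to all items.
```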

    Multilingual Twitter Sentiment Classification: The Role of Human Annotators

    What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of the training data than on the type of model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreement, since doing so helps improve the training datasets and, consequently, the model performance. Finally, we show strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.
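
    A minimal sketch of the kind of agreement monitoring described above, using Cohen's kappa from scikit-learn on a hypothetical pair of annotation vectors (the labels, data, and variable names are illustrative, not taken from the paper):

```python
# Sketch: compare inter-annotator agreement with a model's agreement against one annotator.
# The three ordered sentiment classes follow the abstract.
from sklearn.metrics import accuracy_score, cohen_kappa_score

annotator_a = ["negative", "neutral", "positive", "neutral", "positive", "negative"]
annotator_b = ["negative", "neutral", "positive", "positive", "positive", "neutral"]
model_preds = ["negative", "neutral", "neutral", "neutral", "positive", "negative"]

# Inter-annotator agreement: a rough upper bound on achievable model performance.
print("human-human kappa:   ", cohen_kappa_score(annotator_a, annotator_b))
print("human-human accuracy:", accuracy_score(annotator_a, annotator_b))

# Model-vs-annotator agreement, measured the same way for a fair comparison.
print("model-human kappa:   ", cohen_kappa_score(annotator_a, model_preds))
print("model-human accuracy:", accuracy_score(annotator_a, model_preds))
```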

    Detection of Uniform and Non-Uniform Differential Item Functioning by Item Focussed Trees

    Detection of differential item functioning (DIF) by means of the logistic modelling approach has a long tradition. One big advantage of the approach is that it can be used to investigate non-uniform DIF as well as uniform DIF. The classical approach detects DIF by distinguishing between multiple pre-specified groups. We propose an alternative method that combines recursive partitioning methods (trees) with logistic regression methodology to detect uniform and non-uniform DIF in a nonparametric way. The output of the method is a set of trees that visualizes, in a simple way, the structure of DIF in an item, showing which variables interact in which way when generating DIF. In addition, we consider a logistic regression method in which DIF can be induced by a vector of covariates, which may include categorical as well as continuous covariates. The methods are investigated in simulation studies and illustrated by two applications.
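
    The logistic modelling approach referred to above can be illustrated with a small sketch of the standard nested-model likelihood-ratio comparisons (this is the classical group-based test, not the paper's tree-based method; the data and variable names are hypothetical):

```python
# Sketch: uniform vs. non-uniform DIF via nested logistic regressions.
# Model 1: response ~ total score
# Model 2: adds a group main effect          -> uniform DIF if it improves fit
# Model 3: adds a score x group interaction  -> non-uniform DIF if it improves fit
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 500
score = rng.normal(size=n)                 # matching / ability variable
group = rng.integers(0, 2, size=n)         # reference vs. focal group
logit = -0.2 + 1.0 * score + 0.5 * group   # data simulated with uniform DIF
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

def fit(X):
    return sm.Logit(y, sm.add_constant(X)).fit(disp=0)

m1 = fit(np.column_stack([score]))
m2 = fit(np.column_stack([score, group]))
m3 = fit(np.column_stack([score, group, score * group]))

lr_uniform = 2 * (m2.llf - m1.llf)         # 1 df test for uniform DIF
lr_nonuniform = 2 * (m3.llf - m2.llf)      # 1 df test for non-uniform DIF
print("uniform DIF p-value:    ", chi2.sf(lr_uniform, df=1))
print("non-uniform DIF p-value:", chi2.sf(lr_nonuniform, df=1))
```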

    Stepup procedures for control of generalizations of the familywise error rate

    Consider the multiple testing problem of testing null hypotheses $H_1, \dots, H_s$. A classical approach to dealing with the multiplicity problem is to restrict attention to procedures that control the familywise error rate (FWER), the probability of even one false rejection. But if $s$ is large, control of the FWER is so stringent that the ability of a procedure that controls the FWER to detect false null hypotheses is limited. It is therefore desirable to consider other measures of error control. This article considers two generalizations of the FWER. The first is the $k$-FWER, in which one is willing to tolerate $k$ or more false rejections for some fixed $k \geq 1$. The second is based on the false discovery proportion (FDP), defined to be the number of false rejections divided by the total number of rejections (and defined to be 0 if there are no rejections). Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289--300] proposed control of the false discovery rate (FDR), by which they meant that, for fixed $\alpha$, $E(\mathrm{FDP}) \leq \alpha$. Here, we consider control of the FDP in the sense that, for fixed $\gamma$ and $\alpha$, $P\{\mathrm{FDP} > \gamma\} \leq \alpha$. Beginning with any nondecreasing sequence of constants and $p$-values for the individual tests, we derive stepup procedures that control each of these two measures of error control without imposing any assumptions on the dependence structure of the $p$-values. We use our results to point out a few interesting connections with some closely related stepdown procedures. We then compare and contrast two FDP-controlling procedures obtained using our results with the stepup procedure for control of the FDR of Benjamini and Yekutieli [Ann. Statist. 29 (2001) 1165--1188].
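
    A generic step-up procedure of the kind described can be sketched in a few lines: given sorted p-values and a nondecreasing sequence of critical constants, reject the hypotheses with the k smallest p-values, where k is the largest index whose ordered p-value falls below its constant. The constants used in the example below are the Benjamini-Hochberg sequence, chosen purely for illustration; the paper's k-FWER and FDP-controlling constants differ.

```python
# Sketch of a generic step-up procedure.
import numpy as np

def stepup(pvalues, constants):
    p = np.asarray(pvalues)
    order = np.argsort(p)
    below = p[order] <= np.asarray(constants)   # ordered p-values vs. their constants
    reject = np.zeros(len(p), dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0]) + 1    # largest index meeting its constant
        reject[order[:k]] = True                # reject the k smallest p-values
    return reject

# Example constants: the Benjamini-Hochberg sequence (i * alpha / s), i = 1..s.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.36]
s, alpha = len(pvals), 0.05
bh_constants = [(i + 1) * alpha / s for i in range(s)]
print(stepup(pvals, bh_constants))
```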

    Controlling the False Discovery Rate in Astrophysical Data Analysis

    The False Discovery Rate (FDR) is a new statistical procedure to control the number of mistakes made when performing multiple hypothesis tests, i.e. when comparing many data against a given model hypothesis. The key advantage of FDR is that it allows one to a priori control the average fraction of false rejections made (when comparing to the null hypothesis) over the total number of rejections performed. We compare FDR to the standard procedure of rejecting all tests that do not match the null hypothesis above some arbitrarily chosen confidence limit, e.g. 2 sigma, or at the 95% confidence level. When using FDR, we find a similar rate of correct detections, but with significantly fewer false detections. Moreover, the FDR procedure is quick and easy to compute and can be trivially adapted to work with correlated data. The purpose of this paper is to introduce the FDR procedure to the astrophysics community. We illustrate the power of FDR through several astronomical examples, including the detection of features against a smooth one-dimensional function, e.g. seeing the "baryon wiggles" in a power spectrum of matter fluctuations, and source pixel detection in imaging data. In this era of large datasets and high precision measurements, FDR provides the means to adaptively control a scientifically meaningful quantity -- the number of false discoveries made when conducting multiple hypothesis tests.
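
    A minimal sketch of BH-style FDR thresholding on a vector of per-pixel (or per-bin) p-values, contrasted with a fixed 2-sigma cut; the simulated data and parameter choices below are illustrative, not taken from the paper:

```python
# Sketch: FDR-controlling detection vs. a fixed per-test threshold.
import numpy as np
from scipy.stats import norm
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
# 10,000 "pixels": mostly pure noise, a small fraction with real signal added.
z = rng.normal(size=10_000)
z[:300] += 4.0
pvals = norm.sf(z)                          # one-sided p-values under the null

# Fixed-threshold detection at ~2 sigma vs. FDR control at 5%.
fixed = pvals < norm.sf(2.0)
fdr_reject, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

print("fixed 2-sigma detections:", fixed.sum())
print("FDR(5%) detections:      ", fdr_reject.sum())
```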

    Measurement in marketing

    We distinguish three senses of the concept of measurement (measurement as the selection of observable indicators of theoretical concepts, measurement as the collection of data from respondents, and measurement as the formulation of measurement models linking observable indicators to latent factors representing the theoretical concepts), and we review important issues related to measurement in each of these senses. With regard to measurement in the first sense, we distinguish the steps of construct definition and item generation, and we review scale development efforts reported in three major marketing journals since 2000 to illustrate these steps and derive practical guidelines. With regard to measurement in the second sense, we look at the survey process from the respondent's perspective and discuss the goals that may guide participants' behavior during a survey, the cognitive resources that respondents devote to answering survey questions, and the problems that may occur at the various steps of the survey process. Finally, with regard to measurement in the third sense, we cover both reflective and formative measurement models, and we explain how researchers can assess the quality of measurement in both types of measurement models and how they can ascertain the comparability of measurements across different populations of respondents or conditions of measurement. We also provide a detailed empirical example of measurement analysis for reflective measurement models.
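
    One routine check on a reflective measurement model of the kind discussed above is internal-consistency reliability; the sketch below computes Cronbach's alpha for a hypothetical multi-item scale (the data are simulated, and this coefficient is only one standard criterion, not necessarily the full set of quality checks covered in the paper):

```python
# Sketch: Cronbach's alpha for a reflective scale, from a (respondents x items) matrix.
import numpy as np

def cronbach_alpha(items):
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()    # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)      # variance of the summed scale score
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(2)
trait = rng.normal(size=200)                                      # latent factor
ratings = trait[:, None] + rng.normal(scale=0.8, size=(200, 4))   # 4 reflective items
print("Cronbach's alpha:", round(cronbach_alpha(ratings), 3))
```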

    Invariant Causal Prediction for Sequential Data

    We investigate the problem of inferring the causal predictors of a response $Y$ from a set of $d$ explanatory variables $(X^1, \dots, X^d)$. Classical ordinary least squares regression includes all predictors that reduce the variance of $Y$. Using only the causal predictors instead leads to models that have the advantage of remaining invariant under interventions; loosely speaking, they lead to invariance across different "environments" or "heterogeneity patterns". More precisely, the conditional distribution of $Y$ given its causal predictors remains invariant for all observations. Recent work exploits such stability to infer causal relations from data with different but known environments. We show that even without knowledge of the environments or heterogeneity pattern, inferring causal relations is possible for time-ordered (or any other type of sequentially ordered) data. In particular, this allows the detection of instantaneous causal relations in multivariate linear time series, which is usually not possible with Granger causality. Besides novel methodology, we provide statistical confidence bounds and asymptotic detection results for inferring causal predictors, and present an application to monetary policy in macroeconomics.
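
    The invariance idea behind this family of methods can be sketched as follows: for every candidate predictor set, regress the response on that set and test whether the residual distribution is the same across environments; the estimated causal set is the intersection of all sets whose invariance is not rejected. This is the plain known-environment ICP idea with a crude residual-invariance test, not the paper's sequential-data procedure, and all data and names below are hypothetical.

```python
# Sketch of invariant causal prediction with known environments.
import itertools
import numpy as np
from scipy.stats import f_oneway, levene

rng = np.random.default_rng(3)
n_per_env, envs = 200, 3
env = np.repeat(np.arange(envs), n_per_env)
x1 = rng.normal(size=env.size) + env              # shifted across environments
x2 = rng.normal(size=env.size)                    # irrelevant variable
y = 2.0 * x1 + rng.normal(size=env.size)          # true causal parent: x1
x3 = y + rng.normal(size=env.size)                # effect of y, not a cause
X = np.column_stack([x1, x2, x3])

def residuals(cols):
    if not cols:
        return y - y.mean()
    A = np.column_stack([np.ones(y.size), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

accepted = []
for r in range(X.shape[1] + 1):
    for cols in itertools.combinations(range(X.shape[1]), r):
        res = residuals(list(cols))
        groups = [res[env == e] for e in range(envs)]
        # crude invariance test: equal residual means and variances across environments
        p = 2 * min(f_oneway(*groups).pvalue, levene(*groups).pvalue)
        if p > 0.05:
            accepted.append(set(cols))

causal = set.intersection(*accepted) if accepted else set()
print("estimated causal predictors (column indices):", causal)
```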

    Reconstructing DNA copy number by joint segmentation of multiple sequences

    The variation in DNA copy number carries information on the modalities of genome evolution and the misregulation of DNA replication in cancer cells; its study can help localize tumor suppressor genes, distinguish different populations of cancerous cells, and identify genomic variations responsible for disease phenotypes. A number of different high-throughput technologies can be used to identify copy number variable sites, and the literature documents multiple effective algorithms. We focus here on the specific problem of detecting regions where variation in copy number is relatively common in the sample at hand: this encompasses the cases of copy number polymorphisms, related samples, technical replicates, and cancerous sub-populations from the same individual. We present an algorithm based on regularization approaches that has significant computational advantages and competitive accuracy. We illustrate its applicability with simulated and real data sets.
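
    The regularization idea can be illustrated on a single sequence with a fused-lasso (total-variation) penalty: jumps in the piecewise-constant fit mark candidate segment boundaries. This is a generic single-profile sketch using cvxpy, not the paper's joint multi-sequence algorithm, and the penalty weight and threshold below are arbitrary:

```python
# Sketch: piecewise-constant fit to a noisy copy-number profile via a fused-lasso penalty.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
truth = np.concatenate([np.full(100, 2.0), np.full(40, 3.0), np.full(100, 2.0)])
y = truth + rng.normal(scale=0.4, size=truth.size)   # simulated noisy profile

beta = cp.Variable(y.size)
lam = 5.0                                             # penalty weight (arbitrary here)
objective = cp.Minimize(0.5 * cp.sum_squares(y - beta) + lam * cp.norm1(cp.diff(beta)))
cp.Problem(objective).solve()

fit = beta.value
changepoints = np.where(np.abs(np.diff(fit)) > 0.1)[0] + 1   # crude jump detection
print("estimated breakpoints:", changepoints)
```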