Detecting Differential Item and Step Functioning with Rating Scale and Partial Credit Trees
Several statistical procedures have been suggested for detecting
differential item functioning (DIF) and differential step
functioning (DSF) in polytomous items. However, standard
procedures are designed for the comparison of pre-specified
reference and focal groups, such as males and females.
Here, we propose a framework for the detection of DIF and DSF in
polytomous items under the rating scale and partial credit model
that employs a model-based recursive partitioning algorithm. In contrast
to existing procedures, this approach requires no pre-specification of
reference and focal groups, because the groups are detected in a
data-driven way. The resulting groups are characterized by
(combinations of) covariates and are thus directly interpretable.
The statistical background and construction of the new procedures
are introduced along with an instructive example. Four simulation
studies illustrate their statistical properties and compare them to
those of the well-established likelihood ratio test (LRT). While both
the LRT and the new procedures respect a given significance level,
the new procedures are in most cases equally powerful (simple DIF
groups) or more powerful (complex DIF groups) and can also detect
DSF. Their sensitivity to model misspecification is also investigated,
and an application example with empirical data illustrates their
practical use.
A software implementation of the new procedures is freely
available in the R system for statistical computing.
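The freely available R implementation mentioned above provides the actual procedures; to make the partitioning idea concrete, here is a minimal Python sketch of the node-splitting loop under strong simplifications. It uses a crude per-item Bernoulli likelihood as a stand-in for the rating scale and partial credit models, a plain likelihood-ratio test in place of the structural change tests used in model-based recursive partitioning, and made-up names (`item_loglik`, `dif_tree`).

```python
import numpy as np
from scipy import stats

def item_loglik(resp):
    """Bernoulli log-likelihood with per-item proportions as a crude
    stand-in for the item parameters of an IRT model."""
    p = resp.mean(axis=0).clip(1e-6, 1 - 1e-6)
    return float(np.sum(resp * np.log(p) + (1 - resp) * np.log(1 - p)))

def dif_tree(resp, covariate, alpha=0.05, min_size=30):
    """Recursively split on a numeric covariate whenever a likelihood-ratio
    test suggests the item parameters differ between the two halves (DIF)."""
    n, k = resp.shape
    if n < 2 * min_size:
        return {"leaf": True, "n": n}
    best = None
    for cut in np.unique(covariate)[1:]:
        left, right = covariate < cut, covariate >= cut
        if left.sum() < min_size or right.sum() < min_size:
            continue
        lr = 2 * (item_loglik(resp[left]) + item_loglik(resp[right])
                  - item_loglik(resp))
        if best is None or lr > best[0]:
            best = (lr, cut, left, right)
    if best is None:
        return {"leaf": True, "n": n}
    lr, cut, left, right = best
    if stats.chi2.sf(lr, df=k) > alpha:  # no significant DIF at this node
        return {"leaf": True, "n": n}
    return {"leaf": False, "cut": cut,
            "left": dif_tree(resp[left], covariate[left], alpha, min_size),
            "right": dif_tree(resp[right], covariate[right], alpha, min_size)}

# Illustrative use with simulated responses and an age covariate.
rng = np.random.default_rng(0)
resp = rng.binomial(1, 0.6, size=(200, 5))
age = rng.integers(18, 70, size=200)
print(dif_tree(resp, age))
```

Note that because the cutpoint is chosen by maximizing the statistic, the plain chi-squared threshold used here is anti-conservative; the actual procedures rely on tests that account for this maximal selection.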
Multilingual Twitter Sentiment Classification: The Role of Human Annotators
What are the limits of automated Twitter sentiment classification? We analyze
a large set of manually labeled tweets in different languages, use them as
training data, and construct automated classification models. It turns out that
the quality of classification models depends much more on the quality and size
of training data than on the type of model trained. Experimental results
indicate that there is no statistically significant difference between the
performance of the top classification models. We quantify the quality of
training data by applying various annotator agreement measures, and identify
the weakest points of different datasets. We show that the model performance
approaches the inter-annotator agreement when the size of the training set is
sufficiently large. However, it is crucial to monitor the self- and
inter-annotator agreement regularly, since doing so improves the training
datasets and consequently the model performance. Finally, we show strong
evidence that humans perceive the sentiment classes (negative, neutral,
and positive) as ordered.
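The abstract does not list the specific agreement measures used, but Cohen's kappa is one standard choice, and its weighted variant respects the negative < neutral < positive ordering noted above; a minimal sketch with illustrative labels:

```python
from sklearn.metrics import cohen_kappa_score

# Two annotators' labels for the same tweets, coded as ordered classes.
ann_a = [0, 1, 2, 2, 1, 0, 1, 2, 0, 1]   # 0=negative, 1=neutral, 2=positive
ann_b = [0, 1, 2, 1, 1, 0, 2, 2, 0, 0]

plain = cohen_kappa_score(ann_a, ann_b)                       # nominal classes
weighted = cohen_kappa_score(ann_a, ann_b, weights="linear")  # ordinal classes
print(f"kappa={plain:.3f}, weighted kappa={weighted:.3f}")
```

The weighted variant penalizes a negative/positive disagreement more than a negative/neutral one, which is exactly the sense in which the classes are ordered.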
Detection of Uniform and Non-Uniform Differential Item Functioning by Item Focussed Trees
Detection of differential item functioning (DIF) by means of the logistic
modelling approach has a long tradition. A major advantage of the approach
is that it can be used to investigate non-uniform DIF as well as uniform
DIF. The classical approach detects DIF by distinguishing between multiple
groups. We propose an alternative method that combines recursive
partitioning methods (or trees) with logistic regression methodology to
detect uniform and non-uniform DIF in a nonparametric way. The output of
the method is a set of trees that visualize in a simple way the structure
of DIF in an item, showing which variables interact in which way when
generating DIF. In addition, we consider a logistic regression method in
which DIF can be induced by a vector of covariates, which may include both
categorical and continuous covariates. The methods are investigated in
simulation studies and illustrated by two applications.
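As a rough sketch of the classical logistic modelling approach referred to above (with simulated data and made-up column names, not the paper's item-focussed trees): the item response is regressed on an ability proxy, a group indicator, and their interaction; the group main effect captures uniform DIF and the interaction captures non-uniform DIF.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
score = rng.normal(size=n)              # ability proxy (e.g., rest score)
group = rng.integers(0, 2, size=n)      # 0 = reference, 1 = focal group
# Simulate uniform DIF: the focal group gets a shifted item intercept.
logit = -0.2 + 1.0 * score + 0.8 * group
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

df = pd.DataFrame({"y": y, "score": score, "group": group})
fit = smf.logit("y ~ score + group + score:group", data=df).fit(disp=0)
print(fit.summary())
# A significant 'group' coefficient indicates uniform DIF;
# a significant 'score:group' interaction indicates non-uniform DIF.
```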
Stepup procedures for control of generalizations of the familywise error rate
Consider the multiple testing problem of testing null hypotheses
$H_1, \ldots, H_s$. A classical approach to dealing with the multiplicity
problem is to restrict attention to procedures that control the familywise
error rate (FWER), the probability of even one false rejection. But if $s$
is large, control of the FWER is so stringent that the ability of a
procedure that controls the FWER to detect false null hypotheses is
limited. It is therefore desirable to consider other measures of error
control. This article considers two generalizations of the FWER. The first
is the $k$-FWER, in which one is willing to tolerate $k$ or more false
rejections for some fixed $k$. The second is based on the false discovery
proportion (FDP), defined to be the number of false rejections divided by
the total number of rejections (and defined to be 0 if there are no
rejections). Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995)
289--300] proposed control of the false discovery rate (FDR), by which they
meant that, for fixed $\alpha$, $E(\mathrm{FDP}) \leq \alpha$. Here, we
consider control of the FDP in the sense that, for fixed $\gamma$ and
$\alpha$, $P\{\mathrm{FDP} > \gamma\} \leq \alpha$. Beginning with any
nondecreasing sequence of constants and $p$-values for the individual
tests, we derive stepup procedures that control each of these two measures
of error control without imposing any assumptions on the dependence
structure of the $p$-values. We use our results to point out a few
interesting connections with some closely related stepdown procedures. We
then compare and contrast two FDP-controlling procedures obtained using our
results with the stepup procedure for control of the FDR of Benjamini and
Yekutieli [Ann. Statist. 29 (2001) 1165--1188].
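A stepup procedure of the kind described here is mechanically simple. The sketch below implements the generic skeleton and, for one concrete choice of constants, plugs in the Benjamini-Yekutieli FDR constants mentioned in the abstract; the paper's own $k$-FWER and FDP constants differ and are not reproduced here.

```python
import numpy as np

def stepup(pvals, alphas):
    """Generic stepup procedure: given p-values and a nondecreasing
    sequence of critical constants alpha_1 <= ... <= alpha_s, reject
    H_(1), ..., H_(j*) where j* = max{j : p_(j) <= alpha_j}."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    below = p[order] <= np.asarray(alphas)
    reject = np.zeros(len(p), dtype=bool)
    if below.any():
        jstar = np.max(np.nonzero(below)[0])  # last j with p_(j) <= alpha_j
        reject[order[: jstar + 1]] = True
    return reject

# Benjamini-Yekutieli FDR constants, valid under arbitrary dependence:
# alpha_i = i * alpha / (s * sum_{j=1}^{s} 1/j).
s, alpha = 10, 0.05
c = np.sum(1.0 / np.arange(1, s + 1))
by_alphas = np.arange(1, s + 1) * alpha / (s * c)

pvals = [0.001, 0.002, 0.01, 0.03, 0.04, 0.2, 0.3, 0.5, 0.7, 0.9]
print(stepup(pvals, by_alphas))   # rejects the two smallest p-values
```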
Controlling the False Discovery Rate in Astrophysical Data Analysis
The False Discovery Rate (FDR) is a new statistical procedure to control the
number of mistakes made when performing multiple hypothesis tests, i.e. when
comparing many data against a given model hypothesis. The key advantage of FDR
is that it allows one to control, a priori, the average fraction of false
rejections made (when comparing to the null hypothesis) over the total number
of rejections performed. We compare FDR to the standard procedure of rejecting
all tests that do not match the null hypothesis above some arbitrarily chosen
confidence limit, e.g. 2 sigma, or at the 95% confidence level. When using FDR,
we find a similar rate of correct detections, but with significantly fewer
false detections. Moreover, the FDR procedure is quick and easy to compute and
can be trivially adapted to work with correlated data. The purpose of this
paper is to introduce the FDR procedure to the astrophysics community. We
illustrate the power of FDR through several astronomical examples, including
the detection of features against a smooth one-dimensional function, e.g.
seeing the "baryon wiggles" in a power spectrum of matter fluctuations, and
source pixel detection in imaging data. In this era of large datasets and high
precision measurements, FDR provides the means to adaptively control a
scientifically meaningful quantity -- the number of false discoveries made when
conducting multiple hypothesis tests.
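The FDR recipe the abstract refers to is the Benjamini-Hochberg stepup rule; a minimal sketch with simulated p-values (the mixture of nulls and sources below is purely illustrative):

```python
import numpy as np

def fdr_threshold(pvals, alpha=0.05):
    """Benjamini-Hochberg procedure: find the largest k such that
    p_(k) <= k * alpha / m, and reject all tests with p <= p_(k)."""
    p = np.sort(np.asarray(pvals))
    m = len(p)
    below = p <= np.arange(1, m + 1) * alpha / m
    if not below.any():
        return 0.0                        # reject nothing
    return p[np.nonzero(below)[0].max()]  # data-adaptive p-value cutoff

# Example: testing many pixels against a smooth null model.
rng = np.random.default_rng(1)
pvals = np.concatenate([rng.uniform(size=950),           # true nulls
                        rng.uniform(0, 1e-3, size=50)])  # real sources
cut = fdr_threshold(pvals, alpha=0.10)
print(f"cutoff={cut:.2e}, detections={np.sum(pvals <= cut)}")
```

Unlike a fixed 2-sigma cut, the rejection threshold adapts to the data, which is what keeps the expected fraction of false discoveries below alpha.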
Measurement in marketing
We distinguish three senses of the concept of measurement (measurement as the selection of observable indicators of theoretical concepts, measurement as the collection of data from respondents, and measurement as the formulation of measurement models linking observable indicators to latent factors representing the theoretical concepts), and we review important issues related to measurement in each of these senses.
With regard to measurement in the first sense, we distinguish the steps of construct definition and item generation, and we review scale development efforts reported in three major marketing journals since 2000 to illustrate these steps and derive practical guidelines.
With regard to measurement in the second sense, we look at the survey process from the respondent's perspective and discuss the goals that may guide participants' behavior during a survey, the cognitive resources that respondents devote to answering survey questions, and the problems that may occur at the various steps of the survey process.
Finally, with regard to measurement in the third sense, we cover both reflective and formative measurement models, and we explain how researchers can assess the quality of measurement in both types of measurement models and how they can ascertain the comparability of measurements across different populations of respondents or conditions of measurement. We also provide a detailed empirical example of measurement analysis for reflective measurement models.
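As one concrete example of the kind of quality assessment mentioned for reflective measurement models (not the paper's own empirical example), Cronbach's alpha estimates internal consistency from the item variances; a minimal sketch with made-up ratings:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) matrix of a
    reflective scale: k/(k-1) * (1 - sum of item variances / variance
    of the total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Illustrative 5-point ratings of one construct by six respondents.
ratings = [[4, 5, 4], [2, 2, 3], [5, 5, 4], [3, 4, 3], [1, 2, 2], [4, 4, 5]]
print(f"alpha = {cronbach_alpha(ratings):.2f}")
```

High alpha is meaningful only for reflective indicators, which are assumed to be interchangeable reflections of the same latent factor; for formative models it is not an appropriate criterion.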
Invariant Causal Prediction for Sequential Data
We investigate the problem of inferring the causal predictors of a response
$Y$ from a set of $d$ explanatory variables $(X^1, \ldots, X^d)$. Classical
ordinary least squares regression includes all predictors that reduce the
variance of $Y$. Using only the causal predictors instead leads to models
that have the advantage of remaining invariant under interventions; loosely
speaking, they lead to invariance across different "environments" or
"heterogeneity patterns". More precisely, the conditional distribution of
$Y$ given its causal predictors remains invariant for all observations.
Recent work exploits such a stability to infer causal relations from data
with different but known environments. We show that even without having
knowledge of the environments or heterogeneity patterns, inferring causal
relations is possible for time-ordered (or any other type of sequentially
ordered) data. In particular, this allows detecting instantaneous causal
relations in multivariate linear time series, which is usually not the case
for Granger causality. Besides novel methodology, we provide statistical
confidence bounds and asymptotic detection results for inferring causal
predictors, and present an application to monetary policy in macroeconomics.
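The invariance idea can be sketched in a few lines: for each candidate predictor set, fit a pooled regression and test whether the residuals behave the same in every environment. The version below is a deliberately simplified stand-in for invariant causal prediction (two known environments, mean and variance checks only), with all names invented for illustration:

```python
import itertools
import numpy as np
from scipy import stats

def invariant_sets(X, y, env, alpha=0.05):
    """For every predictor subset S, regress y on X[:, S] with pooled OLS
    and test whether the residuals look alike across the two environments
    (equal means and variances). Subsets that pass are candidate invariant
    (causal) sets; ICP would report their intersection."""
    n, d = X.shape
    accepted = []
    for r in range(d + 1):
        for S in itertools.combinations(range(d), r):
            Z = np.column_stack([np.ones(n)] + [X[:, j] for j in S])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            res = y - Z @ beta
            r0, r1 = res[env == 0], res[env == 1]
            _, p_mean = stats.ttest_ind(r0, r1, equal_var=False)
            _, p_var = stats.levene(r0, r1)
            if min(p_mean, p_var) > alpha / 2:  # crude Bonferroni split
                accepted.append(S)
    return accepted

# Toy example: the environment intervenes on the causal parent x1.
rng = np.random.default_rng(2)
n = 500
env = np.repeat([0, 1], n // 2)
x1 = (1 + env) * rng.normal(size=n)   # causal parent, larger scale in env 1
y = 2.0 * x1 + rng.normal(size=n)
x2 = y + rng.normal(size=n)           # a consequence of y, not a cause
print(invariant_sets(np.column_stack([x1, x2]), y, env))
# Only subsets containing x1 (index 0) pass the invariance checks.
```

The paper's contribution is precisely to remove the assumption made here that the environments are known, by exploiting the sequential ordering of the data instead.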
Reconstructing DNA copy number by joint segmentation of multiple sequences
The variation in DNA copy number carries information on the modalities of
genome evolution and misregulation of DNA replication in cancer cells; its
study can be helpful to localize tumor suppressor genes, distinguish
different populations of cancerous cells, and identify genomic variations
responsible for disease phenotypes. A number of different high-throughput
technologies can
be used to identify copy number variable sites, and the literature documents
multiple effective algorithms. We focus here on the specific problem of
detecting regions where variation in copy number is relatively common in the
sample at hand: this encompasses the cases of copy number polymorphisms,
related samples, technical replicates, and cancerous sub-populations from the
same individual. We present an algorithm based on regularization approaches
with significant computational advantages and competitive accuracy. We
illustrate its applicability with simulated and real data sets.
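The paper's regularization algorithm is not reproduced here; as a simpler stand-in that targets the same structure (breakpoints shared across sequences), the sketch below pools squared CUSUM statistics over sequences inside a binary segmentation loop. The threshold and segment sizes are illustrative.

```python
import numpy as np

def joint_binseg(X, thresh=30.0, min_len=20):
    """Binary segmentation for breakpoints shared across sequences:
    split where the sum over sequences of squared CUSUM statistics
    peaks, then recurse on both halves."""
    m, n = X.shape
    bkps = []

    def recurse(lo, hi):
        L = hi - lo
        if L < 2 * min_len:
            return
        seg = X[:, lo:hi]
        t = np.arange(1, L)                 # candidate split offsets
        S = np.cumsum(seg, axis=1)[:, :-1]  # partial sums, shape (m, L-1)
        tot = seg.sum(axis=1, keepdims=True)
        diff = S / t - (tot - S) / (L - t)  # left mean minus right mean
        stat = (t * (L - t) / L) * diff**2  # squared CUSUM per sequence
        score = stat.sum(axis=0)            # pool evidence across sequences
        k = int(np.argmax(score))
        if score[k] > thresh:
            bkps.append(lo + k + 1)
            recurse(lo, lo + k + 1)
            recurse(lo + k + 1, hi)

    recurse(0, n)
    return sorted(bkps)

# Example: three noisy sequences sharing a copy-number jump at probe 120.
rng = np.random.default_rng(3)
X = rng.normal(size=(3, 300))
X[:, 120:] += 1.0
print(joint_binseg(X))
```

Summing the per-sequence statistics is what lets weak but concordant shifts (e.g., a variant common to related samples or replicates) reach significance jointly even when no single sequence would.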