The United States COVID-19 Forecast Hub dataset
Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) partnered with an academic research lab at the University of Massachusetts Amherst to create the US COVID-19 Forecast Hub. Launched in April 2020, the Forecast Hub is a dataset with point and probabilistic forecasts of incident cases, incident hospitalizations, incident deaths, and cumulative deaths due to COVID-19 at county, state, and national levels in the United States. Included forecasts represent a variety of modeling approaches, data sources, and assumptions regarding the spread of COVID-19. The goal of this dataset is to establish a standardized and comparable set of short-term forecasts from modeling teams. These data can be used to develop ensemble models, communicate forecasts to the public, create visualizations, compare models, and inform policies regarding COVID-19 mitigation. These open-source data are available via download from GitHub, through an online API, and through R packages.
An approach to assessing unidimensionality revisited
A reanalysis of data from Hambleton and Rovinelli (1986) argues that the methods suggested by Bejar (1980) are a valuable descriptive tool for assessing the unidimensionality assumption when a priori information is available about possible response factors. Index terms: achievement testing, item response theory, unidimensionality.
Bejar, Isaac I. (1988). An approach to assessing unidimensionality revisited. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/104312
Subject matter experts' assessment of item statistics
This study was conducted to determine the degree to which subject matter experts could predict the difficulty and discrimination of items on the Test of Standard Written English. It was concluded that, despite an extended training period, the raters did not approach a high level of accuracy, nor were they able to pinpoint the factors that contribute to item difficulty and discrimination. Further research should attempt to uncover those factors by examining the items from a linguistic and psycholinguistic perspective. It is argued that by coupling linguistic features of the items with subject matter ratings it may be possible to attain more accurate predictions of item difficulty and discrimination.
An application of the continuous response level model to personality measurement
This paper reports an application of Samejima’s latent trait model for continuous responses. A brief review of latent trait theory is presented, including an elaboration of the theory for test responses other than dichotomous responses, in order to put the continuous model in perspective. The model is then applied using the Impulsivity and Harmavoidance scales of Jackson’s Personality Research Form. Special attention is given to the requirement that the model be invariant across populations and sex groups. Results showed that responses from males fit the model better than those from females, especially for the Harmavoidance scale. The practical and theoretical implications of the study are discussed.
Towards automatic scoring of non-native spontaneous speech
This paper investigates the feasibility of automated scoring of spoken English proficiency of non-native speakers. Unlike existing automated assessments of spoken English, our data consists of spontaneous spoken responses to complex test items. We first compute a set of features relevant for measuring communicative competence based on speech recognition output. We then perform both a quantitative and a qualitative analysis of these features using two different machine learning approaches. (1) We use support vector machines to produce a score and evaluate it with respect to a mode baseline and to human rater agreement. We find that scoring based on support vector machines yields accuracies approaching inter-rater agreement in some cases. (2) We use classification and regression trees to understand the role of different features and feature classes in the characterization of speaking proficiency by human scorers. Our analysis shows that across all the test items most or all the feature classes are used in the nodes of the trees, suggesting that the scores are, appropriately, a combination of multiple components of speaking proficiency. Future research will concentrate on extending the set of features and introducing new feature classes to arrive at a scoring model that comprises additional relevant aspects of speaking proficiency.
A study of pre-equating based on item response theory
This paper reports a feasibility study using item response theory (IRT) as a means of equating the Test of Standard Written English (TSWE). The study focused on the possibility of pre-equating, that is, deriving the equating transformation prior to the final administration of the test. The three-parameter logistic model was postulated as the response model and its fit was assessed at the item, subscore, and total score level. Minor problems were found at each of these levels, but on the whole the three-parameter model was found to portray the data well. The adequacy of the equating provided by IRT procedures was investigated in two TSWE forms. It was concluded that pre-equating does not appear to present problems beyond those inherent to IRT-equating.
Bejar, Isaac I.; Wingersky, Marilyn S. (1982). A study of pre-equating based on item response theory. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/101475
Factorial invariance in student ratings on instruction
The factorial invariance of student ratings of instruction across three curricular areas was investigated by means of maximum likelihood factor analysis. The results indicate that a one-factor model was not completely adequate from a statistical point of view. Nevertheless, a single factor was accepted as reasonable from a practical point of view. It was concluded that the single factor was invariant across three curricular groups. The reliability of the single factor was essentially the same in the three groups, and in every case it was very high. Some of the theoretical and practical implications of the study were discussed.
Bejar, Isaac I.; Doyle, Kenneth O. (1981). Factorial invariance in student ratings on instruction. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/100400