Psychometrics in Practice at RCEC
A broad range of topics is dealt with in this volume: from combining the psychometric generalizability and item response theories to the ideas for an integrated formative use of data-driven decision making, assessment for learning and diagnostic testing. A number of chapters pay attention to computerized (adaptive) and classification testing. Other chapters treat the quality of testing in a general sense, but for topics like maintaining standards or the testing of writing ability, the quality of testing is dealt with more specifically.
All authors are connected to RCEC as researchers. Each presents a current research topic and provides some insight into the focus of RCEC. The topics were selected and edited so that the book is of special interest to educational researchers, psychometricians and practitioners in educational assessment.
Transparency in Complex Computational Systems
Scientists depend on complex computational systems that are often ineliminably opaque, to the detriment of our ability to give scientific explanations and detect artifacts. Some philosophers have s..
CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks
Data quality affects machine learning (ML) model performance, and data
scientists spend a considerable amount of time on data cleaning before model
training. However, to date, there does not exist a rigorous study on how
exactly cleaning affects ML: the ML community usually focuses on developing ML
algorithms that are robust to some particular noise types of certain
distributions, while the database (DB) community has been mostly studying the
problem of data cleaning alone without considering how data is consumed by
downstream ML analytics. We propose a CleanML study that systematically
investigates the impact of data cleaning on ML classification tasks. The
open-source and extensible CleanML study currently includes 14 real-world
datasets with real errors, five common error types, seven different ML models,
and multiple cleaning algorithms for each error type (including both commonly
used algorithms in practice as well as state-of-the-art solutions in academic
literature). We control the randomness in ML experiments using statistical
hypothesis testing, and we also control false discovery rate in our experiments
using the Benjamini-Yekutieli (BY) procedure. We analyze the results in a
systematic way to derive many interesting and nontrivial observations. We also
put forward multiple research directions for researchers.
Comment: published in ICDE 202
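The abstract above mentions controlling the false discovery rate with the Benjamini-Yekutieli (BY) procedure. As a rough illustration of the textbook form of that procedure, and not the CleanML authors' code, the sketch below takes a list of p-values and marks which hypotheses are rejected. The function name `benjamini_yekutieli` and the default `alpha=0.05` are assumptions made for this example.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Textbook BY step-up procedure (hypothetical helper, not from CleanML):
    sort the p-values, find the largest rank k with
    p_(k) <= k * alpha / (m * c(m)), where c(m) = sum_{i=1..m} 1/i,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic correction that makes the bound valid under arbitrary dependence.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            max_rank = rank
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected
```

For example, `benjamini_yekutieli([0.001, 0.04, 0.03, 0.5])` rejects only the first hypothesis, because the BY threshold for rank 1 is roughly 0.006 and the remaining p-values exceed their thresholds.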
Object Classification in Astronomical Multi-Color Surveys
We present a photometric method for identifying stars, galaxies and quasars
in multi-color surveys, which uses a library of >65000 color templates. The
method aims to extract the information content of object colors in a
statistically correct way and performs a classification as well as a redshift
estimation for galaxies and quasars in a unified approach. For the redshift
estimation, we use an advanced version of the MEV estimator which determines
the redshift error from the redshift dependent probability density function.
The method was originally developed for the CADIS survey, where we checked
its performance by spectroscopy. The method provides high reliability (6 errors
among 151 objects with R<24), especially for quasar selection, and redshifts
accurate within sigma ~ 0.03 for galaxies and sigma ~ 0.1 for quasars.
We compare a few model surveys using the same telescope time but different
sets of broad-band and medium-band filters. Their performance is investigated
by Monte-Carlo simulations as well as by analytic evaluation in terms of
classification and redshift estimation. In practice, medium-band surveys show
superior performance. Finally, we discuss the relevance of color calibration
and derive important conclusions for the issues of library design and choice of
filters. The calibration accuracy poses strong constraints on an accurate
classification, and is most critical for surveys with few, broad and deeply
exposed filters, but less severe for many, narrow and less deep filters.
Comment: 21 pages including 10 figures. Accepted for publication in Astronomy
& Astrophysics
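The abstract above says the redshift error is determined from the redshift-dependent probability density function. A minimal sketch of that idea, assuming that MEV stands for a minimum-error-variance estimator that takes the mean of the density as the estimate and its standard deviation as the error, could look as follows. The function name `mev_redshift`, the uniformly spaced grid, and the sample values are hypothetical; the "advanced version" used in the paper is not reproduced here.

```python
import math


def mev_redshift(z_grid, pdf):
    """Estimate a redshift and its error from a sampled probability density.

    Assumes z_grid is uniformly spaced, so the grid spacing cancels out
    when the density values are normalized to weights.
    """
    total = sum(pdf)
    if total <= 0:
        raise ValueError("pdf must have positive total weight")
    weights = [p / total for p in pdf]
    # Mean of the density: the estimate that minimizes the expected squared error.
    z_est = sum(w * z for w, z in zip(weights, z_grid))
    # Standard deviation of the density, used as the redshift error.
    variance = sum(w * (z - z_est) ** 2 for w, z in zip(weights, z_grid))
    return z_est, math.sqrt(variance)


# Hypothetical density sampled at three redshifts.
z_est, z_err = mev_redshift([0.1, 0.2, 0.3], [1.0, 2.0, 1.0])
```

A broad or multi-peaked density yields a large error estimate under this scheme, which is one way such objects could be flagged as having uncertain redshifts.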
An Overview of Classifier Fusion Methods
A number of classifier fusion methods have been
recently developed opening an alternative approach
leading to a potential improvement in the
classification performance. As there is little theory of
information fusion itself, currently we are faced with
different methods designed for different problems and
producing different results. This paper gives an
overview of classifier fusion methods and attempts to
identify new trends that may dominate this area of
research in future. A taxonomy of fusion methods
trying to bring some order into the existing "pudding
of diversities" is also provided.
Visual Integration of Data and Model Space in Ensemble Learning
Ensembles of classifier models typically deliver superior performance and can
outperform single classifier models given a dataset and classification task at
hand. However, the gain in performance comes with a loss of
comprehensibility, making it hard to understand how each model affects the
classification outputs and where the errors come from. We propose a tight
visual integration of the data and the model space for exploring and combining
classifier models. We introduce a workflow that builds upon the visual
integration and enables the effective exploration of classification outputs and
models. We then present a use case in which we start with an ensemble
automatically selected by a standard ensemble selection algorithm, and show how
we can manipulate models and alternative combinations.
Comment: 8 pages, 7 pictures