356,798 research outputs found

Psychometrics in Practice at RCEC

Author: Eggen T.J.H.M.
Veldkamp B.P.
Publication venue: Ipskamp Drukkers
Publication date: 01/01/2012
A broad range of topics is dealt with in this volume: from combining the psychometric generalizability and item response theories to ideas for an integrated formative use of data-driven decision making, assessment for learning and diagnostic testing. A number of chapters pay attention to computerized (adaptive) and classification testing. Other chapters treat the quality of testing in a general sense, but for topics like maintaining standards or the testing of writing ability, the quality of testing is dealt with more specifically. All authors are connected to RCEC as researchers. They present one of their current research topics and provide some insight into the focus of RCEC. The topics were selected and edited so that the book should be of special interest to educational researchers, psychometricians and practitioners in educational assessment.

University of Twente Research Information

Transparency in Complex Computational Systems

Author: Creel Kathleen A.
Publication date: 28/11/2019
Scientists depend on complex computational systems that are often ineliminably opaque, to the detriment of our ability to give scientific explanations and detect artifacts. Some philosophers have s..

PhilPapers

PhilSci Archive

CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks

Author: Blase Jennifer
Chu Xu
Li Peng
Rao Xi
Zhang Ce
Zhang Yue
Publication date: 01/01/2020
Data quality affects machine learning (ML) model performance, and data scientists spend a considerable amount of time on data cleaning before model training. However, to date, there is no rigorous study of how exactly cleaning affects ML: the ML community usually focuses on developing ML algorithms that are robust to particular noise types of certain distributions, while the database (DB) community has mostly studied the problem of data cleaning alone, without considering how data is consumed by downstream ML analytics. We propose a CleanML study that systematically investigates the impact of data cleaning on ML classification tasks. The open-source and extensible CleanML study currently includes 14 real-world datasets with real errors, five common error types, seven different ML models, and multiple cleaning algorithms for each error type (including both algorithms commonly used in practice and state-of-the-art solutions in the academic literature). We control the randomness in ML experiments using statistical hypothesis testing, and we also control the false discovery rate in our experiments using the Benjamini-Yekutieli (BY) procedure. We analyze the results in a systematic way to derive many interesting and nontrivial observations. We also put forward multiple research directions for researchers. Comment: published in ICDE 202
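The abstract names the Benjamini-Yekutieli (BY) procedure for controlling the false discovery rate across many hypothesis tests. The following is a minimal sketch of the textbook BY step-up rule, not the CleanML authors' code; the function name `benjamini_yekutieli`, the level `q`, and the example p-values are illustrative assumptions.

```python
def benjamini_yekutieli(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Textbook BY step-up rule (hypothetical helper, not from CleanML):
    with m tests and c(m) = 1 + 1/2 + ... + 1/m, find the largest rank k
    such that p_(k) <= k * q / (m * c(m)) and reject the k smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic correction that makes the bound valid under arbitrary dependence.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / (m * c_m):
            largest_rank = rank
    rejected = [False] * m
    for idx in order[:largest_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    print(benjamini_yekutieli([0.001, 0.04, 0.03, 0.5], q=0.05))
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold; the loop therefore tracks the largest passing rank rather than stopping at the first failure.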

arXiv.org e-Print Archive

Repository for Publications and Research Data

Object Classification in Astronomical Multi-Color Surveys

Author: Abramowitz
Bahcall
Baum
Boyle
C. Wolf
Francis
Gunn
H.-J. Röser
Hartwick
K. Meisenheimer
Kinney
Lanzetta
Metcalfe
Oke
Phleps
Pickles
Schmidt
Steidel
Storrie-Lombardi
Warren
Wolf
Wolf
Wolf
York
Publication venue: EDP Sciences
Publication date: 04/10/2000
We present a photometric method for identifying stars, galaxies and quasars in multi-color surveys, which uses a library of >65000 color templates. The method aims to extract the information content of object colors in a statistically correct way and performs classification as well as redshift estimation for galaxies and quasars in a unified approach. For the redshift estimation, we use an advanced version of the MEV estimator, which determines the redshift error from the redshift-dependent probability density function. The method was originally developed for the CADIS survey, where we checked its performance by spectroscopy. The method provides high reliability (6 errors among 151 objects with R<24), especially for quasar selection, and redshifts accurate within sigma ~ 0.03 for galaxies and sigma ~ 0.1 for quasars. We compare a few model surveys using the same telescope time but different sets of broad-band and medium-band filters. Their performance is investigated by Monte-Carlo simulations as well as by analytic evaluation in terms of classification and redshift estimation. In practice, medium-band surveys show superior performance. Finally, we discuss the relevance of color calibration and derive important conclusions for the issues of library design and choice of filters. The calibration accuracy poses strong constraints on an accurate classification, and is most critical for surveys with few, broad and deeply exposed filters, but less severe for many, narrow and less deep filters. Comment: 21 pages including 10 figures. Accepted for publication in Astronomy & Astrophysic

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

CERN Document Server

An Overview of Classifier Fusion Methods

Author: Gabrys Bogdan
Ruta Dymitr
Publication date: 01/01/2000
A number of classifier fusion methods have recently been developed, opening an alternative approach that may improve classification performance. As there is little theory of information fusion itself, we are currently faced with different methods designed for different problems and producing different results. This paper gives an overview of classifier fusion methods and attempts to identify new trends that may dominate this area of research in the future. A taxonomy of fusion methods, trying to bring some order into the existing “pudding of diversities”, is also provided.
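As a concrete instance of what "classifier fusion" can mean, the sketch below combines the label outputs of several classifiers by plurality (majority) voting. This is one commonly cited fusion rule chosen here for illustration; the abstract does not say which methods the paper covers, and the function name `majority_vote` and the tie-breaking choice are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse label predictions from several classifiers by plurality vote.

    predictions: one list of labels per classifier, all of equal length
    (one label per sample). Ties are broken in favour of the earliest
    classifier whose label reaches the top count (an illustrative choice).
    """
    if not predictions:
        return []
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")
    fused = []
    # zip(*predictions) yields, per sample, the labels given by each classifier.
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        fused.append(next(label for label in labels if counts[label] == top))
    return fused


if __name__ == "__main__":
    # Three hypothetical classifiers labelling three samples.
    outputs = [
        ["a", "b", "a"],
        ["a", "a", "b"],
        ["b", "b", "b"],
    ]
    print(majority_vote(outputs))
```

Voting only needs the predicted labels, so it applies even when the individual classifiers do not expose scores or probabilities.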

CiteSeerX

Bournemouth University Research Online

Visual Integration of Data and Model Space in Ensemble Learning

Author: Diehl Alexandra
Fuchs Johannes
Jäckle Dominik
Keim Daniel
Schneider Bruno
Stoffel Florian
Publication date: 01/01/2017
Ensembles of classifier models typically deliver superior performance and can outperform single classifier models for a given dataset and classification task. However, the gain in performance comes with a lack of comprehensibility, making it challenging to understand how each model affects the classification outputs and where the errors come from. We propose a tight visual integration of the data and the model space for exploring and combining classifier models. We introduce a workflow that builds upon the visual integration and enables the effective exploration of classification outputs and models. We then present a use case in which we start with an ensemble automatically selected by a standard ensemble selection algorithm, and show how we can manipulate models and alternative combinations. Comment: 8 pages, 7 picture

arXiv.org e-Print Archive

Crossref