Search CORE

109,009 research outputs found

Making inferences with small numbers of training sets

Author: Albrecht
Boehm
C. Kirsopp
Ebert
Efron
Kadoda
Kemerer
Kitchenham
Kitchenham
Kok
M. Shepperd
MacDonell
Mair
Miyazaki
Shepperd
Shepperd
Srinivasan
Walston
Wittig
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2002
Field of study

A potential methodological problem with empirical studies that assess project effort prediction system is discussed. Frequently, a hold-out strategy is deployed so that the data set is split into a training and a validation set. Inferences are then made concerning the relative accuracy of the different prediction techniques under examination. This is typically done on very small numbers of sampled training sets. It is shown that such studies can lead to almost random results (particularly where relatively small effects are being studied). To illustrate this problem, two data sets are analysed using a configuration problem for case-based prediction and results generated from 100 training sets. This enables results to be produced with quantified confidence limits. From this it is concluded that in both cases using less than five training sets leads to untrustworthy results, and ideally more than 20 sets should be deployed. Unfortunately, this raises a question over a number of empirical validations of prediction techniques, and so it is suggested that further research is needed as a matter of urgency

CiteSeerX

Crossref

Brunel University Research Archive

Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality

Author: Drechsler Jörg
Reiter Jerome P.
Publication venue
Publication date
Field of study

"To protect the cofidentiality of survey respondents' identities and sensitive attributes, statistical agencies can release data in which cofidential values are replaced with multiple imputations. These are called synthetic data. We propose a two-stage approach to generating synthetic data that enables agencies to release different numbers of imputations for different variables. Generation in two stages can reduce computational burdens, decrease disclosure risk, and increase inferential accuracy relative to generation in one stage. We present methods for obtaining inferences from such data. We describe the application of two stage synthesis to creating a public use file for a German business database." (Author's abstract, IAB-Doku) ((en))IAB-Betriebspanel, Datenaufbereitung, Datenanonymisierung, Datenschutz, angewandte Statistik, statistische Methode, Arbeitsmarktforschung, Imputationsverfahren

Research Papers in Economics

Fuzzy Interval-Valued Multi Criteria Based Decision Making for Ranking Features in Multi-Modal 3D Face Recognition

Author: Ramalingam Soodamani
Publication venue: 'Elsevier BV'
Publication date: 13/06/2017
Field of study

Soodamani Ramalingam, 'Fuzzy interval-valued multi criteria based decision making for ranking features in multi-modal 3D face recognition', Fuzzy Sets and Systems, In Press version available online 13 June 2017. This is an Open Access paper, made available under the Creative Commons license CC BY 4.0 https://creativecommons.org/licenses/by/4.0/This paper describes an application of multi-criteria decision making (MCDM) for multi-modal fusion of features in a 3D face recognition system. A decision making process is outlined that is based on the performance of multi-modal features in a face recognition task involving a set of 3D face databases. In particular, the fuzzy interval valued MCDM technique called TOPSIS is applied for ranking and deciding on the best choice of multi-modal features at the decision stage. It provides a formal mechanism of benchmarking their performances against a set of criteria. The technique demonstrates its ability in scaling up the multi-modal features.Peer reviewedProo

University of Hertfordshire Research Archive

An Algorithmic Framework for Computing Validation Performance Bounds by Using Suboptimal Models

Author: Ogawa Kohei
Shinmura Yuki
Suzuki Yoshiki
Takeuchi Ichiro
Publication venue
Publication date: 10/02/2014
Field of study

Practical model building processes are often time-consuming because many different models must be trained and validated. In this paper, we introduce a novel algorithm that can be used for computing the lower and the upper bounds of model validation errors without actually training the model itself. A key idea behind our algorithm is using a side information available from a suboptimal model. If a reasonably good suboptimal model is available, our algorithm can compute lower and upper bounds of many useful quantities for making inferences on the unknown target model. We demonstrate the advantage of our algorithm in the context of model selection for regularized learning problems

arXiv.org e-Print Archive

CiteSeerX