315,629 research outputs found
Quantification of the Impact of Feature Selection on the Variance of Cross-Validation Error Estimation
<p/> <p>Given the relatively small number of microarrays typically used in gene-expression-based classification, all of the data must be used to train a classifier and therefore the same training data is used for error estimation. The key issue regarding the quality of an error estimator in the context of small samples is its accuracy, and this is most directly analyzed via the deviation distribution of the estimator, this being the distribution of the difference between the estimated and true errors. Past studies indicate that given a prior set of features, cross-validation does not perform as well in this regard as some other training-data-based error estimators. The purpose of this study is to quantify the degree to which feature selection increases the variation of the deviation distribution in addition to the variation in the absence of feature selection. To this end, we propose the coefficient of relative increase in deviation dispersion (CRIDD), which gives the relative increase in the deviation-distribution variance using feature selection as opposed to using an optimal feature set without feature selection. The contribution of feature selection to the variance of the deviation distribution can be significant, contributing to over half of the variance in many of the cases studied. We consider linear-discriminant analysis, 3-nearest-neighbor, and linear support vector machines for classification; sequential forward selection, sequential forward floating selection, and the <inline-formula><graphic file="1687-4153-2007-16354-i1.gif"/></inline-formula>-test for feature selection; and <inline-formula><graphic file="1687-4153-2007-16354-i2.gif"/></inline-formula>-fold and leave-one-out cross-validation for error estimation. We apply these to three feature-label models and patient data from a breast cancer study. In sum, the cross-validation deviation distribution is significantly flatter when there is feature selection, compared with the case when cross-validation is performed on a given feature set. This is reflected by the observed positive values of the CRIDD, which is defined to quantify the contribution of feature selection towards the deviation variance.</p
Small Sample Issues for Microarray-Based Classification
In order to study the molecular biological differences between normal and diseased tissues,
it is desirable to perform classification among diseases and stages of disease using
microarray-based gene-expression values. Owing to the limited number of microarrays
typically used in these studies, serious issues arise with respect to the design, performance
and analysis of classifiers based on microarray data. This paper reviews some fundamental
issues facing small-sample classification: classification rules, constrained classifiers, error
estimation and feature selection. It discusses both unconstrained and constrained classifier
design from sample data, and the contributions to classifier error from constrained
optimization and lack of optimality owing to design from sample data. The difficulty with
estimating classifier error when confined to small samples is addressed, particularly
estimating the error from training data. The impact of small samples on the ability to
include more than a few variables as classifier features is explained
Croatian Emotional Speech Analyses on a Basis of Acoustic and Linguistic Features
Acoustic and linguistic speech features are used for emotional state estimation of utterances collected within the Croatian emotional speech corpus. Analyses are performed for the classification of 5 discrete emotions, i.e. happiness, sadness, fear, anger and neutral state, as well as for the estimation of two emotional dimensions: valence and arousal. Acoustic and linguistic cues of emotional speech are analyzed separately, and are also combined in two types of fusion: a feature level fusion and a decision level fusion. The Random Forest method is used for all analyses, with the combination of Info Gain feature selection method for classification tasks and Univariate Linear Regression method for regression tasks. The main hypothesis is confirmed, i.e. an increase of classification accuracy is achieved in the cases of fusion analyses (compared with separate acoustic or linguistic feature sets usages), as well as a decrease of root mean squared error when estimating emotional dimensions. Most of other hypothesis are also confirmed, which suggest that acoustic and linguistic cues of Croatian language are showing similar behavior as other languages in the context of emotional impact on speech
Multi-sensor fusion based on multiple classifier systems for human activity identification
Multimodal sensors in healthcare applications have been increasingly researched because it facilitates automatic and comprehensive monitoring of human behaviors, high-intensity sports management, energy expenditure estimation, and postural detection. Recent studies have shown the importance of multi-sensor fusion to achieve robustness, high-performance generalization, provide diversity and tackle challenging issue that maybe difficult with single sensor values. The aim of this study is to propose an innovative multi-sensor fusion framework to improve human activity detection performances and reduce misrecognition rate. The study proposes a multi-view ensemble algorithm to integrate predicted values of different motion sensors. To this end, computationally efficient classification algorithms such as decision tree, logistic regression and k-Nearest Neighbors were used to implement diverse, flexible and dynamic human activity detection systems. To provide compact feature vector representation, we studied hybrid bio-inspired evolutionary search algorithm and correlation-based feature selection method and evaluate their impact on extracted feature vectors from individual sensor modality. Furthermore, we utilized Synthetic Over-sampling minority Techniques (SMOTE) algorithm to reduce the impact of class imbalance and improve performance results. With the above methods, this paper provides unified framework to resolve major challenges in human activity identification. The performance results obtained using two publicly available datasets showed significant improvement over baseline methods in the detection of specific activity details and reduced error rate. The performance results of our evaluation showed 3% to 24% improvement in accuracy, recall, precision, F-measure and detection ability (AUC) compared to single sensors and feature-level fusion. The benefit of the proposed multi-sensor fusion is the ability to utilize distinct feature characteristics of individual sensor and multiple classifier systems to improve recognition accuracy. In addition, the study suggests a promising potential of hybrid feature selection approach, diversity-based multiple classifier systems to improve mobile and wearable sensor-based human activity detection and health monitoring system. - 2019, The Author(s).This research is supported by University of Malaya BKP Special Grant no vote BKS006-2018.Scopu
A New Methodology for Quantifying the Impact of Non-Functional Requirements on Software Effort Estimation
The effort estimation techniques used in the software industry often tend to ignore the impact of Non-functional Requirements (NFR) on effort and reuse standard effort estimation models without local calibration. Moreover, the effort estimation models are calibrated using data of previous projects that may belong to problem domains different from the project which is being estimated. The approach described in this thesis suggests a novel effort estimation methodology that can be used in the early stages of software development projects. The proposed methodology initially clusters the historical data from the previous projects into different problem domains and generates domain specific effort estimation models, each incorporating the impact of NFRs on effort by sets of objectively measured nominal features. The complexity of these models is reduced using a feature subset selection algorithm. In this thesis, our approach is discussed in detail, and the results of our experiments using different supervised machine learning algorithms are presented. The results show that our approach performs well by increasing the correlation coefficient and decreasing the error rate of the generated effort estimation models and achieving more accurate effort estimates for the new projects
An investigation of machine learning based prediction systems
Traditionally, researchers have used either o�f-the-shelf models such as COCOMO, or developed local models using statistical techniques such as stepwise regression, to obtain software eff�ort estimates. More recently, attention has turned to a variety of machine learning methods such as artifcial neural networks (ANNs), case-based reasoning (CBR) and rule induction (RI). This paper outlines some comparative research into the use of these three machine learning methods to build software e�ort prediction
systems. We briefly describe each method and then apply the techniques to a dataset of 81 software projects derived from a Canadian software house in the late 1980s. We compare the prediction systems in terms of three factors: accuracy, explanatory value and configurability. We show that ANN methods have superior accuracy and that RI methods are least accurate. However, this view is somewhat counteracted by problems with explanatory value and configurability. For example, we found that considerable
eff�ort was required to configure the ANN and that this compared very unfavourably with the other techniques, particularly CBR and least squares regression (LSR). We suggest that further work be carried out, both to further explore interaction between the enduser and the prediction system, and also to facilitate configuration, particularly of ANNs
A Differential Approach for Gaze Estimation
Non-invasive gaze estimation methods usually regress gaze directions directly
from a single face or eye image. However, due to important variabilities in eye
shapes and inner eye structures amongst individuals, universal models obtain
limited accuracies and their output usually exhibit high variance as well as
biases which are subject dependent. Therefore, increasing accuracy is usually
done through calibration, allowing gaze predictions for a subject to be mapped
to his/her actual gaze. In this paper, we introduce a novel image differential
method for gaze estimation. We propose to directly train a differential
convolutional neural network to predict the gaze differences between two eye
input images of the same subject. Then, given a set of subject specific
calibration images, we can use the inferred differences to predict the gaze
direction of a novel eye sample. The assumption is that by allowing the
comparison between two eye images, annoyance factors (alignment, eyelid
closing, illumination perturbations) which usually plague single image
prediction methods can be much reduced, allowing better prediction altogether.
Experiments on 3 public datasets validate our approach which constantly
outperforms state-of-the-art methods even when using only one calibration
sample or when the latter methods are followed by subject specific gaze
adaptation.Comment: Extension to our paper A differential approach for gaze estimation
with calibration (BMVC 2018) Submitted to PAMI on Aug. 7th, 2018 Accepted by
PAMI short on Dec. 2019, in IEEE Transactions on Pattern Analysis and Machine
Intelligenc
- …