192 research outputs found
Practical selection of SVM parameters and noise estimation for SVM regression”, Neural
Abstract We investigate practical selection of hyper-parameters for support vector machines (SVM) regression (that is, ε-insensitive zone and regularization parameter C). The proposed methodology advocates analytic parameter selection directly from the training data, rather than re-sampling approaches commonly used in SVM applications. In particular, we describe a new analytical prescription for setting the value of insensitive zone ε as a function of training sample size. Good generalization performance of the proposed parameter selection is demonstrated empirically using several low- and high-dimensional regression problems. Further, we point out the importance of Vapnik's ε-insensitive loss for regression problems with finite samples. To this end, we compare generalization performance of SVM regression (using proposed selection of ε-values) with regression using 'least-modulus' loss (ε = 0) and standard squared loss. These comparisons indicate superior generalization performance of SVM regression under sparse sample settings, for various types of additive noise.
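To make the three loss functions compared in this abstract concrete, here is a minimal sketch in plain Python. The function names and the sample residuals are illustrative assumptions, not taken from the paper, and the paper's analytical prescription for ε is not reproduced here because the abstract does not state it.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Vapnik's eps-insensitive loss: zero inside the tube |error| <= eps, linear outside."""
    return max(0.0, abs(y_true - y_pred) - eps)


def least_modulus_loss(y_true, y_pred):
    """'Least-modulus' (absolute) loss, i.e. the eps-insensitive loss with eps = 0."""
    return eps_insensitive_loss(y_true, y_pred, 0.0)


def squared_loss(y_true, y_pred):
    """Standard squared-error loss."""
    return (y_true - y_pred) ** 2


if __name__ == "__main__":
    eps = 0.1  # illustrative tube width, chosen arbitrarily
    for error in (0.05, 0.2, 1.5):  # made-up residuals
        print(
            error,
            eps_insensitive_loss(error, 0.0, eps),
            least_modulus_loss(error, 0.0),
            squared_loss(error, 0.0),
        )
```

Errors smaller than ε cost nothing under the ε-insensitive loss, while large errors grow only linearly rather than quadratically as they do under squared loss.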
Use of data mining tools for cut soil slope condition state identification
Introduction: Transportation systems play a fundamental role in today's society. Indeed, every developed or developing
country has invested and keeps investing in building a complete, safe and functional transportation network. Now, the
main concern, particularly for developed countries, is to keep it operational under all security conditions. However,
due to the extent of the network and increasing budget constraints, such a task is difficult to accomplish. In the framework
of transportation networks, particularly highways and railways, slopes are perhaps the element whose
failure can have the strongest impact at several levels. Although there are some models and systems to detect slope
failures, most of them were developed for natural slopes, presenting some constraints when applied to man-made
slopes. Moreover, most of the existing systems were developed based on particular case studies or require
information gathered from complex/expensive tests, which can represent an important applicability limitation.
Aiming to overcome this drawback, we are taking advantage of the learning capabilities of flexible data mining (DM)
algorithms, such as Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), which can model
complex nonlinear mappings. Both algorithms were fitted to predict the condition state of a given slope according to
a pre-defined classification scale comprising four levels (classes). One of the premises of this work is to try to
identify the real condition state of a given slope using information collected during routine inspections complemented
with geometric, geologic and geographic data.
Cylindrical roller bearing fault diagnosis based on VMD-SVD and Adaboost classifier method
Fault diagnosis for cylindrical roller bearings is of great significance for industry. In order to extract the features of the vibration signal adequately, and to construct an effective classifier for complex vibration signals, this paper proposes a new fault diagnosis method based on Variational Mode Decomposition (VMD), Singular Value Decomposition (SVD) and an Adaboost classifier. First, VMD is applied to decompose the sampled vibration signal in the time-frequency domain. Subsequently, features are extracted using SVD. Finally, an Adaboost classifier, trained on the extracted features, is employed for fault detection and diagnosis. Experimental data measured on a rotating machinery fault diagnosis experiment platform were used to verify the proposed method. The results demonstrate that the proposed method is effective in detecting and diagnosing outer ring faults and rolling element faults in cylindrical roller bearings.
Feature Extraction for Murmur Detection Based on Support Vector Regression of Time-Frequency Representations
This paper presents a nonlinear approach for the analysis of time-frequency representation (TFR) data, based on a statistical learning methodology, support vector regression (SVR), which, being a nonlinear framework, matches recent findings on the underlying dynamics of cardiac mechanical activity and phonocardiographic (PCG) recordings. The proposed methodology aims to model the estimated TFRs and extract relevant features to classify PCG recordings as normal or pathologic (with murmur). Modeling of the TFRs is done by means of SVR, and the distance between regressions is calculated through dissimilarity measures based on the dot product. Finally, a k-NN classifier is used for the classification stage, obtaining a validation performance of 97.85%.
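The final classification stage described above (k-NN over a dot-product-based dissimilarity) can be sketched as follows. This is a minimal illustration under stated assumptions: cosine dissimilarity stands in for the paper's dissimilarity measures, which the abstract does not specify, and the feature vectors and function names are made up rather than derived from real SVR models of TFRs.

```python
import math
from collections import Counter


def cosine_dissimilarity(a, b):
    """1 - cosine similarity; one possible dot-product-based dissimilarity (an assumption).

    Both vectors must be non-zero, otherwise the norm product is zero.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors least dissimilar to the query.

    train is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: cosine_dissimilarity(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy feature vectors, invented for illustration only.
TOY_TRAIN = [
    ([1.0, 0.1], "normal"),
    ([0.9, 0.2], "normal"),
    ([1.0, 0.0], "normal"),
    ([0.1, 1.0], "murmur"),
    ([0.2, 0.9], "murmur"),
    ([0.0, 1.0], "murmur"),
]

if __name__ == "__main__":
    print(knn_predict(TOY_TRAIN, [0.15, 0.95], k=3))
```

Any other dissimilarity built from dot products could be passed in place of `cosine_dissimilarity` without changing the voting logic.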
Support Vector Machine Approach for Non-Technical Losses Identification in Power Distribution Systems
Electricity consumer fraud is a problem faced by all power utilities. Finding efficient measures for detecting fraudulent electricity consumption has been an active research area in recent years. In this paper, an approach to non-technical loss (NTL) detection in power utilities using an artificial-intelligence-based technique, the Support Vector Machine (SVM), is presented. This approach provides a method of data mining, which involves feature extraction from past consumption data. The SVM-based approach uses customer load profile information and additional attributes to expose abnormal behavior that is known to be highly correlated with NTL activities. Some key advantages of SVM in data clustering are discussed, among them the ease with which it fits data with a wide range of features. Finally, some major weaknesses of using SVM in clustering for NTL identification are identified, which motivates the use of Optimum-Path Forest, a new model for NTL identification.
Uplift Modeling with Multiple Treatments and General Response Types
Randomized experiments have been used to assist decision-making in many
areas. They help people select the optimal treatment for the test population
with certain statistical guarantee. However, subjects can show significant
heterogeneity in response to treatments. The problem of customizing treatment
assignment based on subject characteristics is known as uplift modeling,
differential response analysis, or personalized treatment learning in the
literature. A key feature of uplift modeling is that the data is unlabeled. It
is impossible to know whether the chosen treatment is optimal for an individual
subject because response under alternative treatments is unobserved. This
presents a challenge to both the training and the evaluation of uplift models.
In this paper we describe how to obtain an unbiased estimate of the key
performance metric of an uplift model, the expected response. We present a new
uplift algorithm which creates a forest of randomized trees. The trees are
built with a splitting criterion designed to directly optimize their uplift
performance based on the proposed evaluation method. Both the evaluation method
and the algorithm apply to an arbitrary number of treatments and general response
types. Experimental results on synthetic data and industry-provided data show
that our algorithm leads to significant performance improvement over other
applicable methods.
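The idea of estimating an uplift model's expected response from randomized-experiment data can be sketched with a standard inverse-probability-weighted estimator. This is a minimal illustration, assuming each treatment is assigned at random with a known probability that does not depend on the subject. The abstract does not give the paper's formula, so the exact estimator there may differ, and all names and data below are hypothetical.

```python
def expected_response(records, policy, propensity):
    """Estimate the mean response if every subject received policy(x).

    records    : list of (x, assigned_treatment, observed_response)
    policy     : function mapping subject features x to a treatment
    propensity : dict mapping each treatment to its known assignment probability

    Only records whose randomly assigned treatment matches the policy's choice
    contribute, each re-weighted by 1 / propensity to correct for how often
    that treatment was assigned.
    """
    total = 0.0
    for x, treatment, response in records:
        if policy(x) == treatment:
            total += response / propensity[treatment]
    return total / len(records)


# Made-up experiment: two treatments, each assigned with probability 0.5.
TOY_RECORDS = [
    (0, "A", 1.0),
    (0, "B", 0.0),
    (1, "A", 0.0),
    (1, "B", 1.0),
]
TOY_PROPENSITY = {"A": 0.5, "B": 0.5}


def personalized(x):
    """Hypothetical policy: treatment A for x == 0, treatment B otherwise."""
    return "A" if x == 0 else "B"


def always_a(x):
    """Hypothetical baseline policy: treatment A for everyone."""
    return "A"


if __name__ == "__main__":
    print(expected_response(TOY_RECORDS, personalized, TOY_PROPENSITY))
    print(expected_response(TOY_RECORDS, always_a, TOY_PROPENSITY))
```

On this toy data the personalized policy scores 1.0 and the always-A policy scores 0.5. Candidate policies can be compared this way even though no subject's response under an alternative treatment is observed.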