35 research outputs found

    Incremental learning algorithm based on support vector machine with Mahalanobis distance (ISVMM) for intrusion prevention

    Get PDF
    In this paper we propose a new classifier called an incremental learning algorithm based on support vector machine with Mahalanobis distance (ISVMM). Prediction of the incoming data type by supervised learning of support vector machine (SVM), reducing the step of calculation and complexity of the algorithm by finding a support set, error set and remaining set, providing of hard and soft decisions, saving the time for repeatedly training the datasets by applying the incremental learning, a new approach for building an ellipsoidal kernel for multidimensional data instead of a sphere kernel by using Mahalanobis distance, and the concept of handling the covariance matrix from dividing by zero are various features of this new algorithm. To evaluate the classification performance of the algorithm, it was applied on intrusion prevention by employing the data from the third international knowledge discovery and data mining tools competition (KDDcup'99). According to the experimental results, ISVMM can predict well on all of the 41 features of incoming datasets without even reducing the enlarged dimensions and it can compete with the similar algorithm which uses a Euclidean measurement at the kernel distance

    An incremental dual nu-support vector regression algorithm

    Full text link
    Š 2018, Springer International Publishing AG, part of Springer Nature. Support vector regression (SVR) has been a hot research topic for several years as it is an effective regression learning algorithm. Early studies on SVR mostly focus on solving large-scale problems. Nowadays, an increasing number of researchers are focusing on incremental SVR algorithms. However, these incremental SVR algorithms cannot handle uncertain data, which are very common in real life because the data in the training example must be precise. Therefore, to handle the incremental regression problem with uncertain data, an incremental dual nu-support vector regression algorithm (dual-v-SVR) is proposed. In the algorithm, a dual-v-SVR formulation is designed to handle the uncertain data at first, then we design two special adjustments to enable the dual-v-SVR model to learn incrementally: incremental adjustment and decremental adjustment. Finally, the experiment results demonstrate that the incremental dual-v-SVR algorithm is an efficient incremental algorithm which is not only capable of solving the incremental regression problem with uncertain data, it is also faster than batch or other incremental SVR algorithms

    Scalable aggregation predictive analytics: a query-driven machine learning approach

    Get PDF
    We introduce a predictive modeling solution that provides high quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict ad-hoc queries’ answers. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression & associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high quality prediction results. The significance of contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers a performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark showing its superior performance compared to the Spark’s COUNT method

    Radio Location of Partial Discharge Sources: A Support Vector Regression Approach

    Get PDF
    Partial discharge (PD) can provide a useful forewarning of asset failure in electricity substations. A significant proportion of assets are susceptible to PD due to incipient weakness in their dielectrics. This paper examines a low cost approach for uninterrupted monitoring of PD using a network of inexpensive radio sensors to sample the spatial patterns of PD received signal strength. Machine learning techniques are proposed for localisation of PD sources. Specifically, two models based on Support Vector Machines (SVMs) are developed: Support Vector Regression (SVR) and Least-Squares Support Vector Regression (LSSVR). These models construct an explicit regression surface in a high dimensional feature space for function estimation. Their performance is compared to that of artificial neural network (ANN) models. The results show that both SVR and LSSVR methods are superior to ANNs in accuracy. LSSVR approach is particularly recommended as practical alternative for PD source localisation due to it low complexity

    Concept Drift Adaptation with Incremental–Decremental SVM

    Get PDF
    Data classification in streams where the underlying distribution changes over time is known to be difficult. This problem—known as concept drift detection—involves two aspects: (i) detecting the concept drift and (ii) adapting the classifier. Online training only considers the most recent samples; they form the so-called shifting window. Dynamic adaptation to concept drift is performed by varying the width of the window. Defining an online Support Vector Machine (SVM) classifier able to cope with concept drift by dynamically changing the window size and avoiding retraining from scratch is currently an open problem. We introduce the Adaptive Incremental–Decremental SVM (AIDSVM), a model that adjusts the shifting window width using the Hoeffding statistical test. We evaluate AIDSVM performance on both synthetic and real-world drift datasets. Experiments show a significant accuracy improvement when encountering concept drift, compared with similar drift detection models defined in the literature. The AIDSVM is efficient, since it is not retrained from scratch after the shifting window slides

    On-Line Learning and Wavelet-Based Feature Extraction Methodology for Process Monitoring using High-Dimensional Functional Data

    Get PDF
    The recent advances in information technology, such as the various automatic data acquisition systems and sensor systems, have created tremendous opportunities for collecting valuable process data. The timely processing of such data for meaningful information remains a challenge. In this research, several data mining methodology that will aid information streaming of high-dimensional functional data are developed. For on-line implementations, two weighting functions for updating support vector regression parameters were developed. The functions use parameters that can be easily set a priori with the slightest knowledge of the data involved and have provision for lower and upper bounds for the parameters. The functions are applicable to time series predictions, on-line predictions, and batch predictions. In order to apply these functions for on-line predictions, a new on-line support vector regression algorithm that uses adaptive weighting parameters was presented. The new algorithm uses varying rather than fixed regularization constant and accuracy parameter. The developed algorithm is more robust to the volume of data available for on-line training as well as to the relative position of the available data in the training sequence. The algorithm improves prediction accuracy by reducing uncertainty in using fixed values for the regression parameters. It also improves prediction accuracy by reducing uncertainty in using regression values based on some experts’ knowledge rather than on the characteristics of the incoming training data. The developed functions and algorithm were applied to feedwater flow rate data and two benchmark time series data. The results show that using adaptive regression parameters performs better than using fixed regression parameters. In order to reduce the dimension of data with several hundreds or thousands of predictors and enhance prediction accuracy, a wavelet-based feature extraction procedure called step-down thresholding procedure for identifying and extracting significant features for a single curve was developed. The procedure involves transforming the original spectral into wavelet coefficients. It is based on multiple hypothesis testing approach and it controls family-wise error rate in order to guide against selecting insignificant features without any concern about the amount of noise that may be present in the data. Therefore, the procedure is applicable for data-reduction and/or data-denoising. The procedure was compared to six other data-reduction and data-denoising methods in the literature. The developed procedure is found to consistently perform better than most of the popular methods and performs at the same level with the other methods. Many real-world data with high-dimensional explanatory variables also sometimes have multiple response variables; therefore, the selection of the fewest explanatory variables that show high sensitivity to predicting the response variable(s) and low sensitivity to the noise in the data is important for better performance and reduced computational burden. In order to select the fewest explanatory variables that can predict each of the response variables better, a two-stage wavelet-based feature extraction procedure is proposed. The first stage uses step-down procedure to extract significant features for each of the curves. Then, representative features are selected out of the extracted features for all curves using voting selection strategy. Other selection strategies such as union and intersection were also described and implemented. The essence of the first stage is to reduce the dimension of the data without any consideration for whether or not they can predict the response variables accurately. The second stage uses Bayesian decision theory approach to select some of the extracted wavelet coefficients that can predict each of the response variables accurately. The two stage procedure was implemented using near-infrared spectroscopy data and shaft misalignment data. The results show that the second stage further reduces the dimension and the prediction results are encouraging

    Incremental and Decremental SVM for Regression

    Get PDF
    Training a support vector machine (SVM) for regression (function approximation) in an incremental/decremental way consists essentially in migrating the input vectors in and out of the support vector set with specific modification of the associated thresholds. We introduce with full details such a method, which allows for defining the exact increments or decrements associated with the thresholds before vector migrations take place. Two delicate issues are especially addressed: the variation of the regularization parameter (for tuning the model performance) and the extreme situations where the support vector set becomes empty. We experimentally compare our method with several regression methods: the multilayer perceptron, two standard SVM implementations, and two models based on adaptive resonance theory
    corecore