278,564 research outputs found

    Optimization of distributions differences for classification

    In this paper we introduce a new classification algorithm called Optimization of Distributions Differences (ODD). The algorithm aims to find a transformation from the feature space to a new space where the instances in the same class are as close as possible to one another while the gravity centers of these classes are as far as possible from one another. This aim is formulated as a multiobjective optimization problem that is solved by a hybrid of an evolutionary strategy and the Quasi-Newton method. The choice of the transformation function is flexible and could be any continuous space function. We experiment with a linear and a non-linear transformation in this paper. We show that the algorithm can outperform six other state-of-the-art classification methods, namely naive Bayes, support vector machines, linear discriminant analysis, multi-layer perceptrons, decision trees, and k-nearest neighbors, on 12 standard classification datasets. Our results show that the method is less sensitive to an imbalanced number of instances than these methods. We also show that ODD maintains its performance better than the other classification methods on these datasets and hence offers better generalization ability.
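
    The two competing goals in this abstract can be sketched as follows. This is a minimal illustration, assuming a linear transformation and squared Euclidean distances; the function names (`transform`, `odd_objectives`) and the toy data are hypothetical, and the paper's actual objective and its hybrid evolutionary/Quasi-Newton optimizer are not reproduced here.

```python
from itertools import combinations


def transform(W, x):
    """Apply a linear map W (a list of rows) to a feature vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def centroid(points):
    """Gravity center of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def odd_objectives(W, X, y):
    """Return (within, between) for the data mapped through W.

    within:  mean squared distance of each instance to its own class
             centroid (to be minimized).
    between: mean squared distance between pairs of class centroids
             (to be maximized). Requires at least two classes.
    """
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(transform(W, x))
    centers = {c: centroid(pts) for c, pts in by_class.items()}
    within = sum(
        sq_dist(p, centers[c]) for c, pts in by_class.items() for p in pts
    ) / len(X)
    pairs = list(combinations(centers.values(), 2))
    between = sum(sq_dist(a, b) for a, b in pairs) / len(pairs)
    return within, between


# Hypothetical toy data: two classes separated along the first feature.
X = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
y = [0, 0, 1, 1]
identity = [[1.0, 0.0], [0.0, 1.0]]
project_x = [[1.0, 0.0]]  # keeps only the first feature
```

    On this toy data, projecting onto the first feature removes all within-class spread while leaving the centroid separation unchanged, which is the kind of trade-off a multiobjective search over transformations would explore.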

    HUBFIRE - A multi-class SVM based JPEG steganalysis using HBCL statistics and FR Index

    Blind steganalysis attempts to detect steganographic data without prior knowledge of either the embedding algorithm or the 'cover' image. This paper proposes new features for JPEG blind steganalysis using a combination of Huffman Bit Code Length (HBCL) Statistics and File size to Resolution ratio (FR Index); the proposed Huffman Bit File Index Resolution (HUBFIRE) algorithm uses these functionals to build the classifier using a multi-class Support Vector Machine (SVM). JPEG images spanning a wide range of resolutions are used to create a 'stego-image' database employing three embedding schemes: the advanced Least Significant Bit encoding technique, which embeds in the spatial domain; JPEG Hide-and-Seek, a transform-domain embedding scheme; and Model Based Steganography, which employs an adaptive embedding technique. This work employs a multi-class SVM over the proposed 'HUBFIRE' algorithm for statistical steganalysis, an approach not yet explored by steganalysts. Experiments demonstrate the model's accuracy over a wide range of payloads and embedding schemes.

    Solution Path Algorithm for Twin Multi-class Support Vector Machine

    The twin support vector machine and its extensions have achieved great success on binary classification problems, but they still face difficulties such as model selection and solving multi-class problems quickly. This paper is devoted to a fast regularization parameter tuning algorithm for the twin multi-class support vector machine. A new sample dataset division method is adopted, and the Lagrangian multipliers are proved to be piecewise linear with respect to the regularization parameters by combining the linear equations and block matrix theory. Eight kinds of events are defined to find the starting event, and then the solution path algorithm is designed, which greatly reduces the computational cost. In addition, only a few points are needed to complete the initialization, and the Lagrangian multipliers are proved to be 1 as the regularization parameter tends to infinity. Simulation results based on UCI datasets show that the proposed method achieves good classification performance while reducing the computational cost of the grid search method from the exponential level to the constant level.

    Classification of EMI discharge sources using time–frequency features and multi-class support vector machine

    This paper introduces the first application of feature extraction and machine learning to Electromagnetic Interference (EMI) signals for discharge source classification in high voltage power generating plants. This work presents an investigation of signals that represent different discharge sources, which are measured using EMI techniques from operating electrical machines within a power plant. The analysis involves Time-Frequency image calculation of EMI signals using General Linear Chirplet Analysis (GLCT), which reveals both time and frequency varying characteristics. Histograms of uniform Local Binary Patterns (LBP) are implemented as a feature reduction and extraction technique for the classification of discharge sources using a Multi-Class Support Vector Machine (MCSVM). The novelty that this paper introduces is the combination of GLCT and LBP to develop a new feature extraction algorithm applied to EMI signal classification. The proposed algorithm is demonstrated to be successful, with excellent classification accuracy being achieved. For the first time, this work transfers expert knowledge on EMI faults to an intelligent system which could potentially be exploited to develop an automatic condition monitoring system.
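
    The uniform LBP histogram step mentioned in this abstract can be illustrated with a small sketch. It assumes the common 8-neighbour, radius-1 variant with 58 uniform patterns plus one shared bin for non-uniform codes; the paper's exact LBP configuration and its GLCT image computation are not given here, and the toy 3x3 images stand in for a time-frequency image.

```python
def lbp_code(img, r, c):
    """8-neighbour LBP code of the pixel at (r, c), radius 1."""
    center = img[r][c]
    # Neighbours in circular order, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def transitions(code):
    """Number of 0/1 changes when the 8 bits are read circularly."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))


# A pattern is "uniform" if it has at most two circular transitions.
UNIFORM = [code for code in range(256) if transitions(code) <= 2]
BIN_OF = {code: i for i, code in enumerate(UNIFORM)}


def uniform_lbp_histogram(img):
    """Histogram with one bin per uniform pattern plus a final
    catch-all bin for every non-uniform pattern."""
    hist = [0] * (len(UNIFORM) + 1)
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[BIN_OF.get(lbp_code(img, r, c), len(UNIFORM))] += 1
    return hist


# Hypothetical toy images (only the centre pixel has a full neighbourhood).
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
checker = [[9, 0, 9], [0, 5, 0], [9, 0, 9]]
```

    The resulting fixed-length histogram is what would be passed to a classifier in place of the raw image, which is why the abstract describes it as both feature extraction and feature reduction.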

    Bayesian Kernel Methods for Non-Gaussian Distributions: Binary and Multi-class Classification Problems

    Project Objective: The objective of this project is to develop a Bayesian kernel model built around non-Gaussian prior distributions to address binary and multi-class classification problems. Recent advances in data mining have integrated kernel functions with Bayesian probabilistic analysis of Gaussian distributions. These machine learning approaches can incorporate prior information with new data to calculate probabilistic rather than deterministic values for unknown parameters. This paper analyzes extensively a specific Bayesian kernel model that uses a kernel function to calculate a posterior beta distribution that is conjugate to the prior beta distribution. Numerical testing of the beta kernel model on several benchmark data sets reveals that this model's accuracy is comparable with those of the support vector machine and relevance vector machine, and the model runs more quickly than the other algorithms. When one class occurs much more frequently than the other class, the beta kernel model often outperforms other strategies to handle imbalanced data sets. If data arrive sequentially over time, the beta kernel model easily and quickly updates the probability distribution, and this model is more accurate than an incremental support vector machine algorithm for online learning when fewer than 50 data points are available. U.S. Army Research Office. Sponsor/Monitor's Report Number(s): 61414-MA-II.3W911NF-12-1-040
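
    A conjugate beta update driven by a kernel can be sketched as below. This is an assumption about the general shape of such a model, not the paper's formulation: each training point is assumed to add its kernel similarity to the query as a fractional count on the side of its class, and the RBF kernel, the uniform Beta(1, 1) prior, and all function names are illustrative choices.

```python
import math


def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel; equals 1.0 when x == z."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(x, z)))


def beta_posterior(x, X, y, a=1.0, b=1.0, gamma=1.0):
    """Posterior Beta(alpha, beta) for P(class 1) at query point x.

    Assumed update rule: a training point with label 1 adds its kernel
    weight to alpha, and a point with label 0 adds it to beta.
    """
    alpha = a + sum(rbf(x, xi, gamma) for xi, yi in zip(X, y) if yi == 1)
    beta = b + sum(rbf(x, xi, gamma) for xi, yi in zip(X, y) if yi == 0)
    return alpha, beta


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)
```

    Under this assumed rule, a newly arrived labelled point only adds one kernel term to `alpha` or `beta`, which is one way to picture the cheap sequential updating the abstract describes.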

    Multi-Class Classification for Identifying JPEG Steganography Embedding Methods

    Over 725 steganography tools are available on the Internet, each providing a method for covert transmission of secret messages. This research presents four steganalysis advancements that result in an algorithm that identifies the steganography tool used to embed a secret message in a JPEG image file. The algorithm includes feature generation, feature preprocessing, multi-class classification and classifier fusion. The first contribution is a new feature generation method which is based on the decomposition of discrete cosine transform (DCT) coefficients used in the JPEG image encoder. The generated features are better suited to identifying discrepancies in each area of the decomposed DCT coefficients. Second, the classification accuracy is further improved with the development of a feature ranking technique in the preprocessing stage for the kernel Fisher's discriminant (KFD) and support vector machine (SVM) classifiers in the kernel space during the training process. Third, for the KFD and SVM two-class classifiers, a classification tree is designed from the kernel space to provide a multi-class classification solution for both methods. Fourth, by analyzing a set of classifiers, signature detectors, and multi-class classification methods, a classifier fusion system is developed to increase the detection accuracy of identifying the embedding method used in generating the steganography images. Based on classifying stego images created from research and commercial JPEG steganography techniques (F5, JP Hide, JSteg, Model-based, Model-based Version 1.2, OutGuess, Steganos, StegHide and UTSA embedding methods), the performance of the system shows a statistically significant increase in classification accuracy of 5%. In addition, this system provides a solution for identifying steganographic fingerprints as well as the ability to include future multi-class classification tools.

    Saliency guided local and global descriptors for effective action recognition

    This paper presents a novel framework for human action recognition based on salient object detection and a new combination of local and global descriptors. We first detect salient objects in video frames and only extract features for such objects. We then use a simple strategy to identify and process only those video frames that contain salient objects. Processing salient objects instead of all frames not only makes the algorithm more efficient, but more importantly also suppresses the interference of background pixels. We combine this approach with a new combination of local and global descriptors, namely 3D-SIFT and histograms of oriented optical flow (HOOF), respectively. The resulting saliency guided 3D-SIFT–HOOF (SGSH) feature is used along with a multi-class support vector machine (SVM) classifier for human action recognition. Experiments conducted on the standard KTH and UCF-Sports action benchmarks show that our new method outperforms the competing state-of-the-art spatiotemporal feature-based human action recognition methods.
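
    The global descriptor named in this abstract, a histogram of oriented optical flow, can be sketched in simplified form. The version below assumes flow vectors are already available as `(dx, dy)` pairs, bins them by direction over the full circle, weights each by its magnitude, and normalizes the result; the binning used in the paper may differ, and the function name and toy flow field are hypothetical.

```python
import math


def hoof(flow, bins=8):
    """Magnitude-weighted, normalized histogram of flow directions.

    flow: iterable of (dx, dy) optical-flow vectors.
    Returns a list of `bins` values that sums to 1, or all zeros
    when every vector has zero length.
    """
    hist = [0.0] * bins
    for dx, dy in flow:
        mag = math.hypot(dx, dy)
        if mag == 0.0:
            continue  # a zero vector has no direction to vote for
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map into [0, 2*pi)
        idx = min(int(angle / (2 * math.pi) * bins), bins - 1)
        hist[idx] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# Hypothetical flow field: mostly up-right motion, some down-left/down-right.
toy_flow = [(1, 1), (1, 1), (-1, -1), (1, -1)]
```

    Normalizing the histogram makes the descriptor independent of how many pixels contribute, which matters when features are extracted only from salient regions of varying size.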

    The combination approach of SVM and ECOC for powerful identification and classification of transcription factor

    Background: Transcription factors (TFs) are core functional proteins which play important roles in gene expression control, and they are key factors for gene regulation network construction. Traditionally, they were identified and classified through experimental approaches. In order to save time and reduce costs, many computational methods have been developed to identify TFs from new proteins and to classify the resulting TFs. Though these methods have facilitated screening of TFs to some extent, low accuracy is still a common problem. With the fast-growing number of new proteins, more precise algorithms for identifying TFs from new proteins and classifying the resulting TFs are in high demand. Results: The support vector machine (SVM) algorithm was utilized to construct an automatic detector for TF identification, where protein domains and functional sites were employed as feature vectors. The error-correcting output coding (ECOC) algorithm, which originated in the information and communication engineering fields, was combined with the SVM methodology for TF classification. The overall success rates of identification and classification reached 88.22% and 97.83%, respectively. Finally, a web site was constructed to let users access our tools (see Availability and requirements section for URL). Conclusion: The SVM method was a valid and stable means of TF identification with protein domains and functional sites as feature vectors. The ECOC algorithm is a powerful method for multi-class classification problems. When combined with the SVM method, it can remarkably increase the accuracy of TF classification using protein domains and functional sites as feature vectors. In addition, our work suggests that the ECOC algorithm may succeed in a broad range of applications in biological data mining.
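
    The decoding side of error-correcting output coding can be shown with a short sketch. It assumes each class is assigned a binary codeword, each bit is predicted by one binary classifier (an SVM in the abstract), and the predicted bit string is mapped to the class with the nearest codeword in Hamming distance. The four-class codebook and class labels below are hypothetical, and the binary classifiers themselves are not implemented.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(bits, codebook):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(codebook, key=lambda label: hamming(bits, codebook[label]))


# Hypothetical codebook: 4 classes, 7 binary classifiers (one per column).
# Every pair of codewords differs in 4 positions, so any single wrong
# bit prediction can still be decoded to the correct class.
CODEBOOK = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (0, 0, 0, 0, 1, 1, 1),
    "C": (0, 0, 1, 1, 0, 0, 1),
    "D": (0, 1, 0, 1, 0, 1, 0),
}

# Codeword for class "C" with its last bit flipped by a classifier error.
noisy_c = (0, 0, 1, 1, 0, 0, 0)
```

    Here `ecoc_decode(noisy_c, CODEBOOK)` still returns `"C"`, since the corrupted string is at distance 1 from the codeword for C and at distance 3 or more from every other codeword.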

    Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers

    In this paper, we propose a new covering technique localized for the trajectories of SGD. This localization provides an algorithm-specific complexity measured by the covering number, which can have dimension-independent cardinality in contrast to standard uniform covering arguments that result in exponential dimension dependency. Based on this localized construction, we show that if the objective function is a finite perturbation of a piecewise strongly convex and smooth function with $P$ pieces, i.e. non-convex and non-smooth in general, the generalization error can be upper bounded by $O(\sqrt{(\log n \log(nP))/n})$, where $n$ is the number of data samples. In particular, this rate is independent of dimension and does not require early stopping and decaying step size. Finally, we employ these results in various contexts and derive generalization bounds for multi-index linear models, multi-class support vector machines, and $K$-means clustering for both hard and soft label setups, improving the known state-of-the-art rates.