
    A new model for iris data set classification based on linear support vector machine parameters' optimization

    Data mining is the process of discovering patterns in large amounts of data; it is a process of knowledge discovery. Classification is a data-analysis task that extracts a model describing important data classes. One of the outstanding classification methods in data mining is the support vector machine (SVM). It is capable of predicting outcomes and is often more effective than other classification methods. The SVM is a well-known supervised machine-learning technique that has been applied successfully to a variety of regression, classification, and clustering problems in diverse domains such as gene expression analysis and web text mining. In this study, we propose a new model for classifying the iris data set that uses an SVM classifier with a genetic algorithm to optimize the C and gamma parameters of the linear SVM; in addition, the principal component analysis (PCA) algorithm is used for feature reduction.
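The GA-over-SVM-hyperparameters idea in this abstract can be sketched as follows. This is an illustrative toy implementation, not the authors' code: the population size, mutation scheme, and the use of an RBF kernel (gamma only affects non-linear kernels such as RBF) are all assumptions, and scikit-learn's iris loader, PCA, and `cross_val_score` stand in for the paper's pipeline.

```python
# Toy GA over SVM hyperparameters (C, gamma) on iris, with PCA feature reduction.
# Illustrative sketch only; not the paper's implementation.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
X = PCA(n_components=2).fit_transform(X)       # feature-reduction step

def fitness(ind):
    # Individuals encode (log10 C, log10 gamma); fitness is 5-fold CV accuracy.
    C, gamma = 10 ** ind[0], 10 ** ind[1]
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

pop = rng.uniform(-3, 3, size=(10, 2))          # initial random population
for _ in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-5:]]      # selection: keep the best half
    children = parents[rng.integers(0, 5, 5)] + rng.normal(0, 0.3, (5, 2))  # mutation
    pop = np.vstack([parents, children])

best = max(pop, key=fitness)
best_acc = fitness(best)
print(f"best C=10^{best[0]:.2f}, gamma=10^{best[1]:.2f}, CV accuracy={best_acc:.3f}")
```

A real GA would also include crossover and a stopping criterion; the selection-plus-mutation loop above is the minimal version of the idea.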

    Predicting human microRNA precursors based on an optimized feature subset generated by GA–SVM

    MicroRNAs (miRNAs) are non-coding RNAs that play important roles in post-transcriptional regulation. Identification of miRNAs is crucial to understanding their biological mechanisms. Recently, machine-learning approaches have been employed to predict miRNA precursors (pre-miRNAs). However, the features used vary across studies and consequently yield different performance, so feature selection is critical for pre-miRNA prediction. We generated an optimized feature subset of 13 features using a hybrid of a genetic algorithm and a support vector machine (GA–SVM). Under five-fold cross-validation with an SVM, the classification performance of the optimized feature subset is much higher than that of the two feature sets used in microPred and miPred. Finally, we constructed the classifier miR-SF to predict the most recently identified human pre-miRNAs in miRBase (version 16). Compared with microPred and miPred, miR-SF achieved much higher classification performance: accuracies were 93.97%, 86.21%, and 64.66% for miR-SF, microPred, and miPred, respectively. Thus, miR-SF is effective for identifying pre-miRNAs.
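The GA–SVM feature-selection loop described above can be sketched roughly as follows. The synthetic dataset, population size, and mutation rate are illustrative assumptions, not the paper's settings, and generic feature columns stand in for the paper's miRNA features.

```python
# Toy GA-SVM feature selection: evolve binary feature masks, score each
# subset by linear-SVM cross-validation accuracy. Illustrative sketch only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           n_redundant=2, random_state=1)

def subset_score(mask):
    # Fitness of a feature subset = mean 5-fold CV accuracy of a linear SVM.
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(kernel="linear"), X[:, mask], y, cv=5).mean()

pop = rng.random((12, 20)) < 0.5                # random binary masks
for _ in range(10):
    scores = np.array([subset_score(m) for m in pop])
    parents = pop[np.argsort(scores)[-6:]]      # keep the best half
    children = parents[rng.integers(0, 6, 6)] ^ (rng.random((6, 20)) < 0.05)  # bit-flip mutation
    pop = np.vstack([parents, children])

best = max(pop, key=subset_score)
best_score = subset_score(best)
print("selected features:", np.flatnonzero(best), "CV acc:", round(best_score, 3))
```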

    A Robust Model for Gene Analysis and Classification


    A Fast Two-Stage Classification Method of Support Vector Machines

    Classification of high-dimensional data generally requires enormous processing time. In this paper, we present a fast two-stage support vector machine method that combines a feature-reduction algorithm with a fast multiclass method. First, principal component analysis is applied to the data for feature reduction and decorrelation, and a feature selection method then further reduces the dimensionality. The criterion based on the Bhattacharyya distance is revised to remove the influence of binary problems with large distances. Moreover, a simple method is proposed to reduce the processing time of multiclass problems: the binary SVM with the fewest support vectors (SVs) is selected iteratively to exclude the least similar class until the final result is obtained. In experiments on the 92AV3C hyperspectral data set, the results demonstrate that the proposed method achieves much faster classification while preserving the high classification accuracy of SVMs.
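A minimal sketch of the Bhattacharyya-distance idea mentioned above, assuming the standard univariate-Gaussian form of the distance (the paper's revised criterion is not reproduced here); the two-feature synthetic data are an assumption for illustration.

```python
# Rank features by the 1-D Gaussian Bhattacharyya distance between two classes.
# Standard textbook formula; not the paper's revised criterion.
import numpy as np

def bhattacharyya_1d(a, b):
    """Bhattacharyya distance between two 1-D samples under a Gaussian model:
    D = (mu1-mu2)^2 / (4*(v1+v2)) + 0.5*ln((v1+v2) / (2*sqrt(v1*v2)))."""
    m1, m2 = a.mean(), b.mean()
    v1, v2 = a.var(), b.var()
    return 0.25 * (m1 - m2) ** 2 / (v1 + v2) \
        + 0.5 * np.log((v1 + v2) / (2 * np.sqrt(v1 * v2)))

rng = np.random.default_rng(2)
# Feature 0 separates the classes; feature 1 is pure noise.
class_a = np.column_stack([rng.normal(0, 1, 300), rng.normal(0, 1, 300)])
class_b = np.column_stack([rng.normal(3, 1, 300), rng.normal(0, 1, 300)])

dists = [bhattacharyya_1d(class_a[:, j], class_b[:, j]) for j in range(2)]
ranking = np.argsort(dists)[::-1]          # most discriminative feature first
print("distances:", np.round(dists, 3), "ranking:", ranking)
```

A larger distance means better class separation under the Gaussian model, so feature 0 should be ranked first here.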

    Application of the Honeybee Mating Optimization Algorithm to Patent Document Classification in Combination with the Support Vector Machine


    Kernel-Based Data Mining Approach with Variable Selection for Nonlinear High-Dimensional Data

    In statistical data mining research, datasets often exhibit nonlinearity and high dimensionality, which makes comprehensive analysis with traditional statistical methodologies difficult. Kernel-based data mining is one of the most effective statistical methodologies for investigating problems in areas including pattern recognition, machine learning, bioinformatics, chemometrics, and statistics. In particular, statistically sophisticated procedures that emphasize the reliability of results and computational efficiency are required for the analysis of high-dimensional data. In this dissertation, first, a novel wrapper method called SVM-ICOMP-RFE is introduced, hybridizing a support vector machine (SVM) with recursive feature elimination (RFE) and an information-theoretic measure of complexity (ICOMP); it classifies high-dimensional data sets and selects the subset of variables in the original data space that best discriminates between groups, with RFE ranking variables by the ICOMP criterion. Second, a dual-variables functional support vector machine approach is proposed that uses both the first and second derivatives of degradation profiles. A modified floating search algorithm for repeated variable selection, with newly added degradation path points, is presented to find a few good variables while reducing computation time for on-line implementation. Third, a two-stage scheme for the classification of near-infrared (NIR) spectral data is proposed: in the first stage, the proposed multi-scale vertical energy thresholding (MSVET) procedure reduces the dimension of the high-dimensional spectral data; in the second stage, a few important wavelet coefficients are selected using the proposed SVM gradient-recursive feature elimination (RFE).
Fourth, a novel methodology for discriminant analysis based on a human decision-making process, called PDCM, is proposed. It consists of three basic steps that emulate the thinking process: perception, decision, and cognition. In these steps, two concepts, support vector machines for classification and information complexity, are integrated to evaluate learning models.
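The SVM-with-RFE component of the first contribution can be sketched with scikit-learn's plain weight-based RFE. Note this substitutes ordinary coefficient-magnitude ranking for the dissertation's ICOMP-based ranking, and the synthetic dataset is an assumption.

```python
# Recursive feature elimination with a linear SVM: repeatedly fit, drop the
# lowest-weight features. Plain weight-based RFE, not the ICOMP variant.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, n_features=30, n_informative=4,
                           random_state=3)
rfe = RFE(SVC(kernel="linear"), n_features_to_select=4, step=2)
rfe.fit(X, y)                              # step=2: remove two features per round
selected = np.flatnonzero(rfe.support_)    # indices of the surviving features
print("kept features:", selected)
```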

    Data mining methodologies for supporting engineers during system identification

    Data alone are worth almost nothing. While data collection is increasing exponentially worldwide, a clear distinction must be made between retrieving data and obtaining knowledge. Data are retrieved by measuring phenomena or gathering facts; knowledge refers to the patterns and trends in data that are useful for decision making. Data interpretation is particularly challenging in system identification, where thousands of models may explain a given set of measurements, and manually interpreting such data is not reliable. One solution is data mining. This thesis therefore integrates techniques from data mining, a field of research whose aim is to find knowledge in data, into an existing multiple-model system identification methodology. It is shown that, within a framework for decision support, data mining techniques are a valuable tool for engineers performing system identification. For example, clustering techniques group similar models together in order to guide subsequent decisions, since the clusters may indicate possible states of a structure. A main issue is the number of clusters, which is usually unknown. A score function is proposed for determining the correct number of clusters in a data set and for estimating the quality of a clustering algorithm; it is a reliable index of the number of clusters and thus increases understanding of the results. Furthermore, feature selection techniques give engineers performing system identification useful information by selecting the relevant parameters that explain candidate models; the core algorithm is a feature selection strategy based on global search. In addition to providing information about the candidate model space, data mining is found to be a valuable tool for supporting decisions related to subsequent sensor placement.
When integrated into a methodology for iterative sensor placement, clustering provides a rational basis for decisions about placing additional sensors on existing structures. Greedy and global search strategies should be selected according to the context: experiments show that global search is more efficient for initial sensor placement, whereas a greedy strategy is more suitable for iterative sensor placement.
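The select-the-number-of-clusters step can be sketched as below. The thesis proposes its own score function, which is not reproduced here; the widely used silhouette score is substituted to show the selection loop, and the blob data and candidate range of k are assumptions.

```python
# Estimate the number of clusters by scanning k and scoring each clustering.
# Silhouette score stands in for the thesis's proposed score function.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0]])   # four well-separated groups
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.6, random_state=4)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=4).fit_predict(X)
    scores[k] = silhouette_score(X, labels)            # higher = better-defined clusters

best_k = max(scores, key=scores.get)
print("estimated number of clusters:", best_k)
```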

    FAULT DETECTION AND PREDICTION IN ELECTROMECHANICAL SYSTEMS VIA THE DISCRETIZED STATE VECTOR-BASED PATTERN ANALYSIS OF MULTI-SENSOR SIGNALS

    Department of System Design and Control Engineering
    In recent decades, operation and maintenance strategies for industrial applications have evolved from corrective and preventive maintenance to condition-based monitoring and, eventually, predictive maintenance. High-performance sensors and data-logging technologies have enabled us to monitor the operational states of systems and predict fault occurrences. Several time series analysis methods have been proposed in the literature to classify system states from multi-sensor signals. Because the time series of sensor signals are often very short, intermittent, transient, highly nonlinear, and non-stationary random signals, their analysis is complex. Therefore, time series discretization has been widely applied to extract meaningful features from the original complex signals. Several important issues must be addressed in discretization for fault detection and prediction: (i) what fault pattern represents a system's faulty states, (ii) how fault patterns can be searched for effectively, (iii) what symptom pattern predicts fault occurrences, and (iv) what a systematic procedure for online fault detection and prediction looks like. In this regard, this study proposes a fault detection and prediction framework that consists of (i) definition of a system's operational states, (ii) definitions of fault and symptom patterns, (iii) multivariate discretization, (iv) severity and criticality analyses, and (v) online detection and prediction procedures. Given the time markers of fault occurrences, we can divide a system's operational states into fault and no-fault states. We postulate that a symptom state precedes the occurrence of a fault within a certain time period, and hence a no-fault state consists of normal and symptom states.
Fault patterns are therefore found only in fault states, whereas symptom patterns are either found only in the system's symptom states (being absent in the normal states) or not found in the given time series at all but similar to fault patterns. To determine the length of a symptom state, we present a symptom pattern-based iterative search method. To identify the distinctive behaviors of multi-sensor signals, we propose a multivariate discretization approach that consists mainly of label definition, label specification, and event codification. Discretization parameters are carefully controlled by considering the key characteristics of multi-sensor signals. We discuss how to measure the severity of fault and symptom patterns and how to assess the criticality of fault states, and we apply the fault and symptom pattern extraction and severity assessment methods to online fault detection and prediction. Finally, we demonstrate the performance of the proposed framework through six case studies: abnormal cylinder temperature in a marine diesel engine, automotive gasoline engine knocking, laser weld defects, buzz, squeak, and rattle (BSR) noises from a car door trim (using a typical acoustic sensor array and acoustic emission sensors, respectively), and visual stimuli cognition tests based on the P300 experiment.
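The core discretization idea, windowed averaging of a sensor signal followed by mapping the averages to symbolic labels, can be sketched as follows. The window length, number of symbols, quantile binning, and synthetic sine signal are illustrative assumptions, not the study's label-definition, label-specification, and event-codification procedure.

```python
# Discretize a continuous sensor signal into a symbol sequence: window-average,
# then quantile-bin the averages into a small alphabet. Illustrative sketch.
import numpy as np

rng = np.random.default_rng(5)
# Synthetic "sensor" signal: a sine wave with additive noise.
signal = np.sin(np.linspace(0, 6 * np.pi, 600)) + 0.1 * rng.normal(size=600)

def discretize(x, window=20, n_symbols=4):
    # Mean of each non-overlapping window, trimming any leftover samples.
    means = x[: len(x) // window * window].reshape(-1, window).mean(axis=1)
    # Quantile-based bin edges so each symbol is roughly equally frequent.
    edges = np.quantile(means, np.linspace(0, 1, n_symbols + 1)[1:-1])
    return np.digitize(means, edges)        # labels in 0..n_symbols-1

labels = discretize(signal)
print("symbol sequence (first 15):", labels[:15])
```

Recurring fault or symptom patterns then become recurring substrings in such symbol sequences, which is what makes pattern search tractable.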