
    A new model for iris data set classification based on linear support vector machine parameters' optimization

    Data mining is the process of discovering patterns in large amounts of data; it is a process of knowledge discovery. Classification is a data-analysis task that extracts a model describing important data classes. One of the outstanding classification methods in data mining is the support vector machine (SVM). It is capable of predicting outcomes and is often more effective than other classification methods. The SVM is a well-known supervised machine-learning technique that has been applied successfully to a variety of regression, classification, and clustering problems in diverse domains such as gene expression analysis and web text mining. In this study, we propose a new model for classifying the iris data set that uses an SVM classifier with a genetic algorithm to optimize the C and gamma parameters of the linear SVM; in addition, the principal component analysis (PCA) algorithm is used for feature reduction.
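The GA-over-SVM-hyperparameters idea in this abstract can be sketched as follows. This is an illustrative toy implementation, not the authors' code: the population size, mutation scheme, and the use of an RBF kernel (gamma only affects non-linear kernels such as RBF) are all assumptions, and scikit-learn's iris loader, PCA, and `cross_val_score` stand in for the paper's pipeline.

```python
# Toy GA over SVM hyperparameters (C, gamma) on iris, with PCA feature reduction.
# Illustrative sketch only; not the paper's implementation.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
X = PCA(n_components=2).fit_transform(X)       # feature-reduction step

def fitness(ind):
    # Individuals encode (log10 C, log10 gamma); fitness is 5-fold CV accuracy.
    C, gamma = 10 ** ind[0], 10 ** ind[1]
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

pop = rng.uniform(-3, 3, size=(10, 2))          # initial random population
for _ in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-5:]]      # selection: keep the best half
    children = parents[rng.integers(0, 5, 5)] + rng.normal(0, 0.3, (5, 2))  # mutation
    pop = np.vstack([parents, children])

best = max(pop, key=fitness)
best_acc = fitness(best)
print(f"best C=10^{best[0]:.2f}, gamma=10^{best[1]:.2f}, CV accuracy={best_acc:.3f}")
```

A real GA would also include crossover and a stopping criterion; the selection-plus-mutation loop above is the minimal version of the idea.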

    Predicting human microRNA precursors based on an optimized feature subset generated by GA–SVM

    MicroRNAs (miRNAs) are non-coding RNAs that play important roles in post-transcriptional regulation. Identification of miRNAs is crucial to understanding their biological mechanisms. Recently, machine-learning approaches have been employed to predict miRNA precursors (pre-miRNAs). However, the features used vary across studies and consequently yield different performance, so feature selection is critical for pre-miRNA prediction. We generated an optimized feature subset of 13 features using a hybrid of a genetic algorithm and a support vector machine (GA–SVM). Under five-fold cross-validation with an SVM, the classification performance of the optimized feature subset is much higher than that of the two feature sets used in microPred and miPred. Finally, we constructed the classifier miR-SF to predict the most recently identified human pre-miRNAs in miRBase (version 16). Compared with microPred and miPred, miR-SF achieved much higher classification performance: accuracies were 93.97%, 86.21%, and 64.66% for miR-SF, microPred, and miPred, respectively. Thus, miR-SF is effective for identifying pre-miRNAs.
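The GA–SVM feature-selection loop described above can be sketched roughly as follows. The synthetic dataset, population size, and mutation rate are illustrative assumptions, not the paper's settings, and generic feature columns stand in for the paper's miRNA features.

```python
# Toy GA-SVM feature selection: evolve binary feature masks, score each
# subset by linear-SVM cross-validation accuracy. Illustrative sketch only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           n_redundant=2, random_state=1)

def subset_score(mask):
    # Fitness of a feature subset = mean 5-fold CV accuracy of a linear SVM.
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(kernel="linear"), X[:, mask], y, cv=5).mean()

pop = rng.random((12, 20)) < 0.5                # random binary masks
for _ in range(10):
    scores = np.array([subset_score(m) for m in pop])
    parents = pop[np.argsort(scores)[-6:]]      # keep the best half
    children = parents[rng.integers(0, 6, 6)] ^ (rng.random((6, 20)) < 0.05)  # bit-flip mutation
    pop = np.vstack([parents, children])

best = max(pop, key=subset_score)
best_score = subset_score(best)
print("selected features:", np.flatnonzero(best), "CV acc:", round(best_score, 3))
```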

    A Robust Model for Gene Analysis and Classification


    A Fast Two-Stage Classification Method of Support Vector Machines

    Classification of high-dimensional data generally requires enormous processing time. In this paper, we present a fast two-stage support vector machine method that combines a feature-reduction algorithm with a fast multiclass method. First, principal component analysis is applied to the data for feature reduction and decorrelation, and a feature selection method then further reduces the dimensionality. The criterion based on the Bhattacharyya distance is revised to remove the influence of binary problems with large distances. Moreover, a simple method is proposed to reduce the processing time of multiclass problems: the binary SVM with the fewest support vectors (SVs) is selected iteratively to exclude the least similar class until the final result is obtained. In experiments on the 92AV3C hyperspectral data set, the results demonstrate that the proposed method achieves much faster classification while preserving the high classification accuracy of SVMs.
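A minimal sketch of the Bhattacharyya-distance idea mentioned above, assuming the standard univariate-Gaussian form of the distance (the paper's revised criterion is not reproduced here); the two-feature synthetic data are an assumption for illustration.

```python
# Rank features by the 1-D Gaussian Bhattacharyya distance between two classes.
# Standard textbook formula; not the paper's revised criterion.
import numpy as np

def bhattacharyya_1d(a, b):
    """Bhattacharyya distance between two 1-D samples under a Gaussian model:
    D = (mu1-mu2)^2 / (4*(v1+v2)) + 0.5*ln((v1+v2) / (2*sqrt(v1*v2)))."""
    m1, m2 = a.mean(), b.mean()
    v1, v2 = a.var(), b.var()
    return 0.25 * (m1 - m2) ** 2 / (v1 + v2) \
        + 0.5 * np.log((v1 + v2) / (2 * np.sqrt(v1 * v2)))

rng = np.random.default_rng(2)
# Feature 0 separates the classes; feature 1 is pure noise.
class_a = np.column_stack([rng.normal(0, 1, 300), rng.normal(0, 1, 300)])
class_b = np.column_stack([rng.normal(3, 1, 300), rng.normal(0, 1, 300)])

dists = [bhattacharyya_1d(class_a[:, j], class_b[:, j]) for j in range(2)]
ranking = np.argsort(dists)[::-1]          # most discriminative feature first
print("distances:", np.round(dists, 3), "ranking:", ranking)
```

A larger distance means better class separation under the Gaussian model, so feature 0 should be ranked first here.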

    Application of the Honeybee Mating Optimization Algorithm to Patent Document Classification in Combination with the Support Vector Machine


    Kernel-Based Data Mining Approach with Variable Selection for Nonlinear High-Dimensional Data

    In statistical data mining research, datasets often exhibit nonlinearity and high dimensionality, which makes comprehensive analysis with traditional statistical methodologies difficult. Kernel-based data mining is one of the most effective statistical methodologies for investigating problems in areas including pattern recognition, machine learning, bioinformatics, chemometrics, and statistics. In particular, statistically sophisticated procedures that emphasize the reliability of results and computational efficiency are required for the analysis of high-dimensional data. In this dissertation, first, a novel wrapper method called SVM-ICOMP-RFE is introduced, hybridizing a support vector machine (SVM) with recursive feature elimination (RFE) and an information-theoretic measure of complexity (ICOMP); it classifies high-dimensional data sets and selects the subset of variables in the original data space that best discriminates between groups, with RFE ranking variables by the ICOMP criterion. Second, a dual-variables functional support vector machine approach is proposed that uses both the first and second derivatives of degradation profiles. A modified floating search algorithm for repeated variable selection, with newly added degradation path points, is presented to find a few good variables while reducing computation time for on-line implementation. Third, a two-stage scheme for the classification of near-infrared (NIR) spectral data is proposed: in the first stage, the proposed multi-scale vertical energy thresholding (MSVET) procedure reduces the dimension of the high-dimensional spectral data; in the second stage, a few important wavelet coefficients are selected using the proposed SVM gradient-recursive feature elimination (RFE).
Fourth, a novel methodology for discriminant analysis based on a human decision-making process, called PDCM, is proposed. It consists of three basic steps that emulate the thinking process: perception, decision, and cognition. In these steps, two concepts, support vector machines for classification and information complexity, are integrated to evaluate learning models.
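The SVM-with-RFE component of the first contribution can be sketched with scikit-learn's plain weight-based RFE. Note this substitutes ordinary coefficient-magnitude ranking for the dissertation's ICOMP-based ranking, and the synthetic dataset is an assumption.

```python
# Recursive feature elimination with a linear SVM: repeatedly fit, drop the
# lowest-weight features. Plain weight-based RFE, not the ICOMP variant.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, n_features=30, n_informative=4,
                           random_state=3)
rfe = RFE(SVC(kernel="linear"), n_features_to_select=4, step=2)
rfe.fit(X, y)                              # step=2: remove two features per round
selected = np.flatnonzero(rfe.support_)    # indices of the surviving features
print("kept features:", selected)
```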

    Data mining methodologies for supporting engineers during system identification

    Data alone are worth almost nothing. While data collection is increasing exponentially worldwide, a clear distinction must be made between retrieving data and obtaining knowledge. Data are retrieved by measuring phenomena or gathering facts; knowledge refers to the patterns and trends in data that are useful for decision making. Data interpretation is particularly challenging in system identification, where thousands of models may explain a given set of measurements, and manually interpreting such data is not reliable. One solution is data mining. This thesis therefore integrates techniques from data mining, a field of research whose aim is to find knowledge in data, into an existing multiple-model system identification methodology. It is shown that, within a framework for decision support, data mining techniques are a valuable tool for engineers performing system identification. For example, clustering techniques group similar models together in order to guide subsequent decisions, since the clusters may indicate possible states of a structure. A main issue is the number of clusters, which is usually unknown. A score function is proposed for determining the correct number of clusters in a data set and for estimating the quality of a clustering algorithm; it is a reliable index of the number of clusters and thus increases understanding of the results. Furthermore, feature selection techniques give engineers performing system identification useful information by selecting the relevant parameters that explain candidate models; the core algorithm is a feature selection strategy based on global search. In addition to providing information about the candidate model space, data mining is found to be a valuable tool for supporting decisions related to subsequent sensor placement.
When integrated into a methodology for iterative sensor placement, clustering provides a rational basis for decisions about placing additional sensors on existing structures. Greedy and global search strategies should be selected according to the context: experiments show that global search is more efficient for initial sensor placement, whereas a greedy strategy is more suitable for iterative sensor placement.
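The select-the-number-of-clusters step can be sketched as below. The thesis proposes its own score function, which is not reproduced here; the widely used silhouette score is substituted to show the selection loop, and the blob data and candidate range of k are assumptions.

```python
# Estimate the number of clusters by scanning k and scoring each clustering.
# Silhouette score stands in for the thesis's proposed score function.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0]])   # four well-separated groups
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.6, random_state=4)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=4).fit_predict(X)
    scores[k] = silhouette_score(X, labels)            # higher = better-defined clusters

best_k = max(scores, key=scores.get)
print("estimated number of clusters:", best_k)
```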

    FAULT DETECTION AND PREDICTION IN ELECTROMECHANICAL SYSTEMS VIA THE DISCRETIZED STATE VECTOR-BASED PATTERN ANALYSIS OF MULTI-SENSOR SIGNALS

    Department of System Design and Control Engineering
    In recent decades, operation and maintenance strategies for industrial applications have evolved from corrective and preventive maintenance to condition-based monitoring and, eventually, predictive maintenance. High-performance sensors and data-logging technologies have enabled us to monitor the operational states of systems and predict fault occurrences. Several time series analysis methods have been proposed in the literature to classify system states from multi-sensor signals. Because the time series of sensor signals are often very short, intermittent, transient, highly nonlinear, and non-stationary random signals, their analysis is complex. Therefore, time series discretization has been widely applied to extract meaningful features from the original complex signals. Several important issues must be addressed in discretization for fault detection and prediction: (i) what fault pattern represents a system's faulty states, (ii) how fault patterns can be searched for effectively, (iii) what symptom pattern predicts fault occurrences, and (iv) what a systematic procedure for online fault detection and prediction looks like. In this regard, this study proposes a fault detection and prediction framework that consists of (i) definition of a system's operational states, (ii) definitions of fault and symptom patterns, (iii) multivariate discretization, (iv) severity and criticality analyses, and (v) online detection and prediction procedures. Given the time markers of fault occurrences, we can divide a system's operational states into fault and no-fault states. We postulate that a symptom state precedes the occurrence of a fault within a certain time period, and hence a no-fault state consists of normal and symptom states.
Fault patterns are therefore found only in fault states, whereas symptom patterns are either found only in the system's symptom states (being absent in the normal states) or not found in the given time series at all but similar to fault patterns. To determine the length of a symptom state, we present a symptom pattern-based iterative search method. To identify the distinctive behaviors of multi-sensor signals, we propose a multivariate discretization approach that consists mainly of label definition, label specification, and event codification. Discretization parameters are carefully controlled by considering the key characteristics of multi-sensor signals. We discuss how to measure the severity of fault and symptom patterns and how to assess the criticality of fault states, and we apply the fault and symptom pattern extraction and severity assessment methods to online fault detection and prediction. Finally, we demonstrate the performance of the proposed framework through six case studies: abnormal cylinder temperature in a marine diesel engine, automotive gasoline engine knocking, laser weld defects, buzz, squeak, and rattle (BSR) noises from a car door trim (using a typical acoustic sensor array and acoustic emission sensors, respectively), and visual stimuli cognition tests based on the P300 experiment.
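The core discretization idea, windowed averaging of a sensor signal followed by mapping the averages to symbolic labels, can be sketched as follows. The window length, number of symbols, quantile binning, and synthetic sine signal are illustrative assumptions, not the study's label-definition, label-specification, and event-codification procedure.

```python
# Discretize a continuous sensor signal into a symbol sequence: window-average,
# then quantile-bin the averages into a small alphabet. Illustrative sketch.
import numpy as np

rng = np.random.default_rng(5)
# Synthetic "sensor" signal: a sine wave with additive noise.
signal = np.sin(np.linspace(0, 6 * np.pi, 600)) + 0.1 * rng.normal(size=600)

def discretize(x, window=20, n_symbols=4):
    # Mean of each non-overlapping window, trimming any leftover samples.
    means = x[: len(x) // window * window].reshape(-1, window).mean(axis=1)
    # Quantile-based bin edges so each symbol is roughly equally frequent.
    edges = np.quantile(means, np.linspace(0, 1, n_symbols + 1)[1:-1])
    return np.digitize(means, edges)        # labels in 0..n_symbols-1

labels = discretize(signal)
print("symbol sequence (first 15):", labels[:15])
```

Recurring fault or symptom patterns then become recurring substrings in such symbol sequences, which is what makes pattern search tractable.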