38,104 research outputs found
Model-Based Feature Selection Based on Radial Basis Functions and Information Measures
In this paper the development of a new embedded feature selection method is presented, based on a Radial-Basis-Function Neural-Fuzzy modelling structure. The proposed method is created to find the relative importance of features in a given dataset (or process in general), with special focus on manufacturing processes. The proposed approach evaluates the impact/importance of processes features by using information theoretic measures to measure the correlation between the process features and the modelling performance. Crucially, the proposed method acts during the training of the process model; hence it is an embedded method, achieving the modelling/classification task in parallel to the feature selection task. The latter is achieved by taking advantage of the information in the output layer of the Neural Fuzzy structure; in the presented case this is a TSK-type polynomial function. Two information measures are evaluated in this work, both based on information entropy: mutual information, and cross-sample entropy. The proposed methodology is tested against two popular datasets in the literature (IRIS - plant data, AirFoil - manufacturing/design data), and one more case study relevant to manufacturing - the heat treatment of steel. Results show the good and reliable performance of the developed modelling structure, on par with existing published work, as well as the good performance of the feature selection task in terms of correctly identifying important process features
Online Tool Condition Monitoring Based on Parsimonious Ensemble+
Accurate diagnosis of tool wear in metal turning process remains an open
challenge for both scientists and industrial practitioners because of
inhomogeneities in workpiece material, nonstationary machining settings to suit
production requirements, and nonlinear relations between measured variables and
tool wear. Common methodologies for tool condition monitoring still rely on
batch approaches which cannot cope with a fast sampling rate of metal cutting
process. Furthermore they require a retraining process to be completed from
scratch when dealing with a new set of machining parameters. This paper
presents an online tool condition monitoring approach based on Parsimonious
Ensemble+, pENsemble+. The unique feature of pENsemble+ lies in its highly
flexible principle where both ensemble structure and base-classifier structure
can automatically grow and shrink on the fly based on the characteristics of
data streams. Moreover, the online feature selection scenario is integrated to
actively sample relevant input attributes. The paper presents advancement of a
newly developed ensemble learning algorithm, pENsemble+, where online active
learning scenario is incorporated to reduce operator labelling effort. The
ensemble merging scenario is proposed which allows reduction of ensemble
complexity while retaining its diversity. Experimental studies utilising
real-world manufacturing data streams and comparisons with well known
algorithms were carried out. Furthermore, the efficacy of pENsemble was
examined using benchmark concept drift data streams. It has been found that
pENsemble+ incurs low structural complexity and results in a significant
reduction of operator labelling effort.Comment: this paper has been published by IEEE Transactions on Cybernetic
One-class classifiers based on entropic spanning graphs
One-class classifiers offer valuable tools to assess the presence of outliers
in data. In this paper, we propose a design methodology for one-class
classifiers based on entropic spanning graphs. Our approach takes into account
the possibility to process also non-numeric data by means of an embedding
procedure. The spanning graph is learned on the embedded input data and the
outcoming partition of vertices defines the classifier. The final partition is
derived by exploiting a criterion based on mutual information minimization.
Here, we compute the mutual information by using a convenient formulation
provided in terms of the -Jensen difference. Once training is
completed, in order to associate a confidence level with the classifier
decision, a graph-based fuzzy model is constructed. The fuzzification process
is based only on topological information of the vertices of the entropic
spanning graph. As such, the proposed one-class classifier is suitable also for
data characterized by complex geometric structures. We provide experiments on
well-known benchmarks containing both feature vectors and labeled graphs. In
addition, we apply the method to the protein solubility recognition problem by
considering several representations for the input samples. Experimental results
demonstrate the effectiveness and versatility of the proposed method with
respect to other state-of-the-art approaches.Comment: Extended and revised version of the paper "One-Class Classification
Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN,
Vancouver, Canad
Neural Network and Bioinformatic Methods for Predicting HIV-1 Protease Inhibitor Resistance
This article presents a new method for predicting viral resistance to seven protease inhibitors from the HIV-1 genotype, and for identifying the positions in the protease gene at which the specific nature of the mutation affects resistance. The neural network Analog ARTMAP predicts protease inhibitor resistance from viral genotypes. A feature selection method detects genetic positions that contribute to resistance both alone and through interactions with other positions. This method has identified positions 35, 37, 62, and 77, where traditional feature selection methods have not detected a contribution to resistance.
At several positions in the protease gene, mutations confer differing degress of resistance, depending on the specific amino acid to which the sequence has mutated. To find these positions, an Amino Acid Space is introduced to represent genes in a vector space that captures the functional similarity between amino acid pairs. Feature selection identifies several new positions, including 36, 37, and 43, with amino acid-specific contributions to resistance. Analog ARTMAP networks applied to inputs that represent specific amino acids at these positions perform better than networks that use only mutation locations.Air Force Office of Scientific Research (F49620-01-1-0423); National Geospatial-Intelligence Agency (NMA 201-01-1-2016); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624
- …