    Information Theory Filters for Wavelet Packet Coefficient Selection with Application to Corrosion Type Identification from Acoustic Emission Signals

    The damage caused by corrosion in chemical process installations can lead to unexpected plant shutdowns and the leakage of potentially toxic chemicals into the environment. When a material corrodes, structural changes occur that release energy as acoustic waves. This acoustic activity can in turn be used for corrosion monitoring, and even for predicting the type of corrosion. Here we apply wavelet packet decomposition to extract features from acoustic emission signals. We then use the extracted wavelet packet coefficients to distinguish between the most important types of corrosion processes in the chemical process industry: uniform corrosion, pitting and stress corrosion cracking. The local discriminant basis selection algorithm can be considered a standard for selecting the most discriminative wavelet coefficients. However, it does not take the statistical dependencies between wavelet coefficients into account. We show that ignoring these dependencies lowers the accuracy of corrosion type prediction. We compare several mutual information filters that take these dependencies into account in order to arrive at a more accurate prediction.
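As an illustration of the feature-extraction step described above, a minimal wavelet packet decomposition can be sketched in a few lines. This is not the paper's implementation (its wavelet family and depth are not stated here); the Haar filter, the depth, and the toy signal are assumptions:

```python
import numpy as np

def haar_wpt(signal, depth):
    """Full Haar wavelet packet decomposition.

    Returns the 2**depth leaf-node coefficient arrays of the packet tree,
    ordered left to right. Each level splits every node into a low-pass
    (approximation) and a high-pass (detail) half-band.
    """
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(depth):
        next_nodes = []
        for x in nodes:
            a = (x[0::2] + x[1::2]) / np.sqrt(2)  # low-pass branch
            d = (x[0::2] - x[1::2]) / np.sqrt(2)  # high-pass branch
            next_nodes.extend([a, d])
        nodes = next_nodes
    return nodes

# A depth-2 decomposition of an 8-sample signal yields 4 leaves of length 2;
# these leaf coefficients are the candidate features to be filtered.
leaves = haar_wpt([1., 2., 3., 4., 5., 6., 7., 8.], depth=2)
```

Because the Haar transform is orthonormal, the total signal energy is preserved across the leaves, which is what makes per-subband coefficient selection meaningful.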

    Speeding Up Feature Subset Selection through Mutual Information Relevance Filtering

    A relevance filter is proposed which removes features based on the mutual information between class labels and features. It is proven that both feature independence and class conditional feature independence are required for the filter to be statistically optimal. This is shown by establishing a relationship with the conditional relative entropy framework for feature selection. Removing features at various significance levels as a preprocessing step to sequential forward search leads to a large increase in speed, without a decrease in classification accuracy. These results are demonstrated in experiments on 5 high-dimensional, publicly available gene expression data sets.
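A minimal sketch of such a relevance filter, assuming discrete features and an empirical plug-in estimate of mutual information (the paper's estimator and its significance-level test are not reproduced here; the threshold and toy data are illustrative assumptions):

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information I(X; Y) in bits between discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # I(X;Y) = sum over (a,b) of p(a,b) * log2( p(a,b) / (p(a) p(b)) )
    return sum((c / n) * np.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def relevance_filter(X, labels, threshold):
    """Indices of features whose MI with the class labels exceeds the threshold."""
    return [j for j in range(X.shape[1])
            if mutual_information(X[:, j].tolist(), labels) > threshold]

# Toy data: feature 0 determines the class (MI = 1 bit),
# feature 1 is constant and therefore carries no information (MI = 0).
labels = [0, 0, 1, 1]
X = np.array([[0, 7], [0, 7], [1, 7], [1, 7]])
kept = relevance_filter(X, labels, threshold=0.5)
```

Features surviving the filter would then be handed to sequential forward search, which is where the reported speed-up comes from: the wrapper search only sees the reduced feature set.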

    Fractal dimension for clustering and unsupervised and supervised feature selection

    Data mining refers to the automation of data analysis to extract patterns from large amounts of data. A major breakthrough in modelling natural patterns is the recognition that nature is fractal, not Euclidean. Fractals can model self-similarity, infinite detail, infinite length and the absence of smoothness. This research aimed to simplify the discovery and detection of groups in data using the fractal dimension, addressing three data mining tasks efficiently: the first defines groups of instances (clustering), the second selects useful features from non-defined (unsupervised) groups of instances, and the third selects useful features from pre-defined (supervised) groups of instances. Improvements are shown on two data mining models: hierarchical clustering and Artificial Neural Networks (ANN).
    For clustering, a new two-phase algorithm based on the Fractal Dimension (FD), compactness and closeness of clusters is presented. The proposed method exploits the self-similarity properties of the data: it first divides the data into sufficiently large sub-clusters with high compactness, and in the second stage merges sub-clusters that are close to each other and have similar complexity. The final clusters are obtained in a natural and fully deterministic way. The selection of different feature subspaces leads to different cluster interpretations.
    An unsupervised embedded feature selection algorithm, able to detect relevant and redundant features, is also presented. This algorithm is based on the concept of fractal dimension. The level of relevance of the features is quantified using a newly proposed entropy measure, which is less complex than current state-of-the-art measures. The proposed algorithm maintains, and in some cases improves, the quality of the clusters in reduced feature spaces.
    For supervised feature selection, aimed at classification, a new algorithm is proposed that simultaneously maximises the relevance and minimises the redundancy of the features. It combines the FD and Mutual Information (MI) techniques to create a new measure of feature usefulness and to produce a simpler, non-heuristic algorithm. The similar nature of the two techniques makes the proposed algorithm well suited to a straightforward global analysis of the data.
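The fractal dimension at the core of these algorithms is typically estimated by box counting. The thesis abstract does not specify its estimator, so the following is a generic sketch under assumed box sizes and data: occupied boxes N(s) are counted at several scales s, and the slope of log N(s) versus log s gives the dimension.

```python
import numpy as np

def box_counting_dimension(points, box_sizes):
    """Estimate the box-counting fractal dimension of a point set.

    Counts occupied grid cells N(s) at each box size s, then fits
    log N(s) ~ -D log s by least squares; the negated slope is D.
    """
    points = np.asarray(points, dtype=float)
    counts = []
    for s in box_sizes:
        cells = np.floor(points / s).astype(int)      # grid cell index per point
        counts.append(len({tuple(c) for c in cells})) # number of occupied cells
    slope, _ = np.polyfit(np.log(box_sizes), np.log(counts), 1)
    return -slope

# Points filling the unit square should have a dimension close to 2.
rng = np.random.default_rng(0)
pts = rng.random((20000, 2))
dim = box_counting_dimension(pts, box_sizes=[0.02, 0.05, 0.1])
```

A feature subset whose estimated dimension stays close to that of the full space preserves the data's intrinsic structure, which is the intuition behind using FD as a relevance criterion.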