45,946 research outputs found

    Characterization and Robust Classification of EEG Signal from Image RSVP Events with Independent Time-Frequency Features

    This paper considers the problem of automatic characterization and detection of target images in a rapid serial visual presentation (RSVP) task based on EEG data. A novel method that identifies single-trial event-related potentials (ERPs) in the time-frequency domain is proposed, and a robust classifier with feature clustering is developed to better utilize the correlated ERP features. The method is applied to EEG recordings of an RSVP experiment with multiple sessions and subjects. The results show that target image events are mainly characterized by three distinct patterns in the time-frequency domain: a theta-band (4.3 Hz) power boost 300–700 ms after target image onset, an alpha-band (12 Hz) power boost 500–1000 ms after stimulus onset, and a delta-band (2 Hz) power boost after 500 ms. The most discriminant time-frequency features are power boosts and are relatively consistent across sessions and subjects. Since the original discriminant time-frequency features are highly correlated, we constructed uncorrelated features using hierarchical clustering for better classification of target and non-target images. With feature clustering, performance (area under the ROC curve) improved from 0.85 to 0.89 on within-session tests, and from 0.76 to 0.84 on cross-subject tests. The constructed uncorrelated features were more robust than the original discriminant features and corresponded to a number of local regions on the time-frequency plane.
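The decorrelation step described above — grouping highly correlated features with hierarchical clustering and collapsing each group into one meta-feature — can be sketched as follows. This is a minimal illustration only: the averaging rule and the average-linkage choice are assumptions, not the paper's exact method.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_correlated_features(X, n_clusters):
    """Group correlated columns of X, averaging each group into one meta-feature."""
    corr = np.corrcoef(X, rowvar=False)   # feature-by-feature correlation
    dist = 1.0 - np.abs(corr)             # strongly correlated -> small distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # Average the features in each cluster to build (roughly) uncorrelated features
    X_new = np.column_stack([X[:, labels == k].mean(axis=1)
                             for k in range(1, labels.max() + 1)])
    return X_new, labels
```

The resulting meta-features can then be fed to any standard classifier in place of the raw, highly correlated time-frequency features.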

    Modularity-Guided Graph Topology Optimization And Self-Boosting Clustering

    Full text link
    Existing modularity-based community detection methods attempt to find community memberships that maximize modularity in a fixed graph topology. In this work, we propose to optimize the graph topology itself through the modularity maximization process. We introduce a modularity-guided graph optimization approach for learning a sparse, high-modularity graph from algorithmically generated clustering results by iteratively pruning edges between distant clusters. To the best of our knowledge, this is the first attempt to use modularity to guide graph topology learning. Extensive experiments on various real-world data sets show that our method outperforms state-of-the-art graph construction methods by a large margin. Our experiments also show that as modularity increases, the accuracy of graph-based clustering algorithms increases with it, validating modularity theory through numerical results on real-world data sets. From a clustering perspective, our method can also be seen as a self-boosting clustering method.
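A toy version of the pruning idea — removing edges between different clusters and checking that modularity rises — might look like this. It is a sketch with networkx; the paper's notion of "distant clusters" and its iterative schedule are not reproduced here.

```python
import networkx as nx
from networkx.algorithms.community import modularity

def prune_intercluster_edges(G, labels, frac=1.0):
    """Remove a fraction of edges whose endpoints lie in different clusters."""
    inter = [(u, v) for u, v in G.edges() if labels[u] != labels[v]]
    G2 = G.copy()
    G2.remove_edges_from(inter[: int(len(inter) * frac)])
    return G2

# Two triangles joined by two bridging edges
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (0, 4)])
labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
communities = [{0, 1, 2}, {3, 4, 5}]
q_before = modularity(G, communities)
q_after = modularity(prune_intercluster_edges(G, labels), communities)
```

Pruning the two bridging edges raises the modularity of the fixed two-community partition, which is the quantity the paper's iterative procedure drives upward.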

    Interpretable Sequence Clustering

    Categorical sequence clustering plays a crucial role in various fields, but the lack of interpretability in cluster assignments poses significant challenges. Sequences inherently lack explicit features, and existing sequence clustering algorithms rely heavily on complex representations, making their results difficult to explain. To address this issue, we propose Interpretable Sequence Clustering Tree (ISCT), a method that combines sequential patterns with a concise and interpretable tree structure. ISCT leverages k-1 patterns to generate k leaf nodes, corresponding to k clusters, which provides an intuitive explanation of how each cluster is formed. More precisely, ISCT first projects sequences into random subspaces and then applies the k-means algorithm to obtain high-quality initial cluster assignments. Subsequently, it constructs a pattern-based decision tree using a boosting-based construction strategy, in which sequences are re-projected and re-clustered at each node before mining the top-1 discriminative splitting pattern. Experimental results on 14 real-world data sets demonstrate that our proposed method provides an interpretable tree structure while delivering fast and accurate cluster assignments. Comment: 11 pages, 6 figures
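The initial-assignment step — projecting an encoding of the sequences into a random subspace before running k-means — could be sketched like this. The bag-of-symbols encoding and the projection dimension are illustrative assumptions; the paper's actual representation may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def initial_sequence_clusters(seqs, k, dim=4, seed=0):
    """Bag-of-symbols encoding + random projection + k-means initial assignment."""
    alphabet = sorted({s for seq in seqs for s in seq})
    index = {s: i for i, s in enumerate(alphabet)}
    X = np.zeros((len(seqs), len(alphabet)))
    for row, seq in enumerate(seqs):
        for s in seq:
            X[row, index[s]] += 1              # symbol counts per sequence
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(len(alphabet), dim))  # random subspace projection
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X @ P)
```

In ISCT these initial assignments would then seed the boosting-based construction of the pattern tree, with re-projection and re-clustering repeated at each node.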

    Clustering of variables for enhanced interpretability of predictive models

    A new strategy is proposed for building easy-to-interpret predictive models in the context of a high-dimensional dataset with a large number of highly correlated explanatory variables. The strategy is based on a first step of variable clustering using the CLustering of Variables around Latent Variables (CLV) method. The hierarchical clustering dendrogram is then explored in order to sequentially select the explanatory variables in a group-wise fashion. For model fitting, the dendrogram is used as the base learner in an L2-boosting procedure. The proposed approach, named lmCLV, is illustrated on a toy simulated example in which the clusters and predictive equation are known, and on a real case study dealing with the authentication of orange juices based on 1H-NMR spectroscopic analysis. In both illustrative examples, the procedure was shown to have predictive efficiency similar to other methods, with additional interpretability. It is available in the R package ClustVarLV. Comment: 24 pages, 7 figures
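The group-wise L2-boosting idea can be illustrated with a componentwise sketch in which each variable cluster contributes one latent variable — here simply the mean of its standardized columns, which is an assumption for illustration (CLV defines its latent variables differently, and lmCLV operates on the dendrogram itself).

```python
import numpy as np

def l2_boost_groups(X_groups, y, steps=200, nu=0.1):
    """Componentwise L2-boosting over one latent variable per variable cluster."""
    # Latent variable for each group: the mean of its standardized columns
    Z = np.column_stack([((g - g.mean(0)) / g.std(0)).mean(axis=1)
                         for g in X_groups])
    coef = np.zeros(Z.shape[1])
    resid = y - y.mean()
    for _ in range(steps):
        b = (Z * resid[:, None]).sum(0) / (Z ** 2).sum(0)  # per-component LS fit
        j = int(np.argmin(((resid[:, None] - Z * b) ** 2).sum(0)))
        coef[j] += nu * b[j]                               # shrunken update
        resid = resid - nu * b[j] * Z[:, j]
    return y.mean(), coef
```

At each step the procedure fits every latent component to the current residual, keeps only the best one, and applies a shrunken update — so groups that genuinely drive the response accumulate large coefficients while irrelevant groups stay near zero.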

    CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

    Class imbalance classification is a challenging research problem in data mining and machine learning, as most real-life datasets are imbalanced in nature. Existing learning algorithms maximise classification accuracy by correctly classifying the majority class while misclassifying the minority class. However, in real-life applications the minority class instances represent the concept of greater interest. Recently, several techniques based on sampling methods (under-sampling the majority class and over-sampling the minority class), cost-sensitive learning, and ensemble learning have been used in the literature for classifying imbalanced datasets. In this paper, we introduce a new clustering-based under-sampling approach combined with the AdaBoost algorithm, called CUSBoost, for effective imbalanced classification. The proposed algorithm provides an alternative to the RUSBoost (random under-sampling with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost) algorithms. We evaluated CUSBoost against state-of-the-art ensemble methods such as AdaBoost, RUSBoost, and SMOTEBoost on 13 imbalanced binary and multi-class datasets with various imbalance ratios. The experimental results show that CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets. Comment: CSITSS-201
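The core CUSBoost recipe — k-means clustering of the majority class, even per-cluster under-sampling, then AdaBoost on the balanced subsample — can be sketched as follows for the binary case. The cluster count and per-cluster sample sizes are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier

def cusboost_fit(X, y, majority=0, n_clusters=3, seed=0):
    """Cluster the majority class, sample evenly per cluster, then fit AdaBoost."""
    rng = np.random.default_rng(seed)
    maj_idx = np.where(y == majority)[0]
    min_idx = np.where(y != majority)[0]
    km = KMeans(n_clusters=n_clusters, n_init=10,
                random_state=seed).fit(X[maj_idx])
    per_cluster = max(1, len(min_idx) // n_clusters)  # roughly balance the classes
    keep = []
    for c in range(n_clusters):
        members = maj_idx[km.labels_ == c]
        keep.extend(rng.choice(members, size=min(per_cluster, len(members)),
                               replace=False))
    sel = np.concatenate([np.asarray(keep, dtype=int), min_idx])
    return AdaBoostClassifier(n_estimators=50, random_state=seed).fit(X[sel], y[sel])
```

Sampling from each majority cluster, rather than uniformly at random as in RUSBoost, is what preserves the structure of the majority class in the reduced training set.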