Characterization and Robust Classification of EEG Signal from Image RSVP Events with Independent Time-Frequency Features
This paper considers the problem of automatically characterizing and detecting target images in a rapid serial visual presentation (RSVP) task based on EEG data. A novel method is proposed that identifies single-trial event-related potentials (ERPs) in the time-frequency domain, and a robust classifier with feature clustering is developed to better utilize the correlated ERP features. The method is applied to EEG recordings of an RSVP experiment with multiple sessions and subjects.
The results show that target image events are mainly characterized by three distinct patterns in the time-frequency domain: a theta-band (4.3 Hz) power boosting 300–700 ms after the target image onset, an alpha-band (12 Hz) power boosting 500–1000 ms after the stimulus onset, and a delta-band (2 Hz) power boosting after 500 ms. The most discriminant time-frequency features all take the form of power boosting and are relatively consistent across sessions and subjects.
Since the original discriminant time-frequency features are highly correlated, we constructed uncorrelated features using hierarchical clustering for better classification of target and non-target images. With feature clustering, performance (area under the ROC curve) improved from 0.85 to 0.89 on within-session tests and from 0.76 to 0.84 on cross-subject tests. The constructed uncorrelated features were more robust than the original discriminant features and corresponded to a number of local regions on the time-frequency plane.
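The feature-clustering step described above can be sketched with a small toy example. This is an illustrative numpy/scipy sketch under invented data, thresholds, and dimensions, not the authors' pipeline; real input would be single-trial time-frequency power features.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Toy stand-in for discriminant time-frequency features: 3 latent
# sources, each observed through 4 highly correlated noisy copies.
latent = rng.normal(size=(200, 3))
X = np.hstack([latent[:, [i]] + 0.1 * rng.normal(size=(200, 4))
               for i in range(3)])                 # 12 correlated features

# Group features by correlation distance (1 - |r|) with hierarchical
# clustering, mirroring the idea of grouping correlated ERP features.
dist = 1.0 - np.abs(np.corrcoef(X.T))
Z = linkage(dist[np.triu_indices_from(dist, k=1)], method="average")
groups = fcluster(Z, t=0.5, criterion="distance")

# One constructed (decorrelated) feature per group: the group mean.
X_new = np.column_stack([X[:, groups == g].mean(axis=1)
                         for g in np.unique(groups)])
print(X_new.shape)                                 # (200, 3)
```

The cut threshold `t=0.5` and the use of a simple group mean as the constructed feature are arbitrary choices for the sketch; any within-cluster summary would illustrate the same decorrelation idea.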
Modularity-Guided Graph Topology Optimization And Self-Boosting Clustering
Existing modularity-based community detection methods attempt to find community memberships that maximize modularity in a fixed graph topology. In this work, we propose to optimize the graph topology itself through the modularity maximization process. We introduce a modularity-guided graph optimization approach that learns a sparse, high-modularity graph from algorithmically generated clustering results by iteratively pruning edges between two distant clusters. To the best of our knowledge, this is the first attempt to use modularity to guide graph topology learning. Extensive experiments on various real-world data sets show that our method outperforms state-of-the-art graph construction methods by a large margin. Our experiments also show that as modularity increases, the accuracy of graph-based clustering algorithms increases with it, providing numerical evidence for the validity of modularity theory on real-world data sets. From a clustering perspective, our method can also be seen as a self-boosting clustering method.
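A minimal, self-contained illustration of the pruning idea follows. The toy graph, the hand-set cluster labels, and the single pruning pass are invented simplifications; the paper's actual procedure iterates pruning with re-clustering.

```python
import numpy as np

def modularity(A, labels):
    """Newman modularity of an undirected graph with adjacency matrix A."""
    m = A.sum() / 2.0                      # number of edges
    k = A.sum(axis=1)                      # node degrees
    Q = 0.0
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        Q += A[np.ix_(idx, idx)].sum() / (2 * m) \
             - (k[idx].sum() / (2 * m)) ** 2
    return Q

# Two planted communities joined by a couple of inter-community edges.
A = np.zeros((8, 8))
for i, j in [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),   # community 0
             (4, 5), (4, 6), (5, 6), (5, 7), (6, 7),   # community 1
             (0, 4), (3, 7)]:                          # inter-community
    A[i, j] = A[j, i] = 1
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Modularity-guided pruning, simplified to one pass: drop every edge
# whose endpoints fall in different clusters ("distant" clusters).
q_before = modularity(A, labels)
A_pruned = A.copy()
for i, j in zip(*np.nonzero(np.triu(A, k=1))):
    if labels[i] != labels[j]:
        A_pruned[i, j] = A_pruned[j, i] = 0
q_after = modularity(A_pruned, labels)
print(q_before, q_after)    # modularity rises from 1/3 to 1/2 here
```

On this toy graph, removing the two inter-community edges raises modularity, which is the signal the paper uses to guide topology learning.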
Interpretable Sequence Clustering
Categorical sequence clustering plays a crucial role in various fields, but
the lack of interpretability in cluster assignments poses significant
challenges. Sequences inherently lack explicit features, and existing sequence
clustering algorithms heavily rely on complex representations, making it
difficult to explain their results. To address this issue, we propose a method
called Interpretable Sequence Clustering Tree (ISCT), which combines sequential
patterns with a concise and interpretable tree structure. ISCT leverages k-1
patterns to generate k leaf nodes, corresponding to k clusters, which provides
an intuitive explanation of how each cluster is formed. More precisely, ISCT
first projects sequences into random subspaces and then utilizes the k-means
algorithm to obtain high-quality initial cluster assignments. Subsequently, it
constructs a pattern-based decision tree using a boosting-based construction
strategy in which sequences are re-projected and re-clustered at each node
before mining the top-1 discriminative splitting pattern. Experimental results
on 14 real-world data sets demonstrate that our proposed method provides an
interpretable tree structure while delivering fast and accurate cluster
assignments.
Comment: 11 pages, 6 figures
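One node-level step of the tree construction might look like the sketch below. The toy sequences, the bigram-only pattern space, and the coverage-weighted purity score are invented simplifications; the paper mines general sequential patterns with a boosting-based strategy.

```python
# Toy categorical sequences with two evident groups: one built around
# the motif "ab", the other around "cd".
seqs = [list("xabx"), list("abyy"), list("zab"),
        list("cdzz"), list("ycdy"), list("xcd")]
labels = [0, 0, 0, 1, 1, 1]   # initial cluster assignment (e.g. from k-means)

def contains(seq, pat):
    """True if pat occurs as a contiguous subsequence of seq."""
    n = len(pat)
    return any(tuple(seq[i:i + n]) == pat for i in range(len(seq) - n + 1))

# Candidate patterns: all bigrams occurring in the data (a stand-in
# for the mined sequential patterns).
cands = sorted({tuple(s[i:i + 2]) for s in seqs for i in range(len(s) - 1)})

def score(pat):
    """Discriminative score: purity of the covered set, weighted by coverage."""
    hit = [l for s, l in zip(seqs, labels) if contains(s, pat)]
    if not hit or len(hit) == len(seqs):
        return 0.0
    return max(hit.count(0), hit.count(1)) / len(hit) * len(hit) / len(seqs)

best = max(cands, key=score)           # top-1 discriminative split pattern
left = [s for s in seqs if contains(s, best)]
print(best, len(left))
```

Here the selected pattern sends the three "ab" sequences to one child and the rest to the other; ISCT would then re-project and re-cluster within each child before mining the next pattern.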
Clustering of variables for enhanced interpretability of predictive models
A new strategy is proposed for building easy to interpret predictive models
in the context of a high-dimensional dataset, with a large number of highly
correlated explanatory variables. The strategy is based on a first step of
variables clustering using the CLustering of Variables around Latent Variables
(CLV) method. The exploration of the hierarchical clustering dendrogram is
undertaken in order to sequentially select the explanatory variables in a
group-wise fashion. For model setting implementation, the dendrogram is used as
the base-learner in an L2-boosting procedure. The proposed approach, named lmCLV, is illustrated on a simulated toy example in which the clusters and predictive equation are known, and on a real case study dealing
with the authentication of orange juices based on 1H-NMR spectroscopic
analysis. In both illustrative examples, this procedure was shown to have
similar predictive efficiency to other methods, with additional
interpretability capacity. It is available in the R package ClustVarLV.
Comment: 24 pages, 7 figures
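The two-stage idea, CLV-style clustering of variables followed by L2-boosting over cluster components, can be sketched as below. The data, cut threshold, learning rate, and the use of a plain cluster mean in place of the CLV latent variable are all invented simplifications of the lmCLV procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
n, reps = 100, 5

# Two latent variables, each observed through 5 correlated proxies:
# a toy version of "many highly correlated explanatory variables".
t1, t2 = rng.normal(size=(2, n))
X = np.hstack([t1[:, None] + 0.2 * rng.normal(size=(n, reps)),
               t2[:, None] + 0.2 * rng.normal(size=(n, reps))])
y = 2.0 * t1 - 1.0 * t2 + 0.1 * rng.normal(size=n)

# Stage 1 (CLV-like): cluster variables by correlation distance and
# summarize each cluster by its mean component.
d = 1.0 - np.abs(np.corrcoef(X.T))
grp = fcluster(linkage(d[np.triu_indices_from(d, k=1)], "average"),
               t=0.5, criterion="distance")
comps = np.column_stack([X[:, grp == g].mean(axis=1)
                         for g in np.unique(grp)])

# Stage 2: L2-boosting with a componentwise linear base learner that
# selects one cluster component per iteration (group-wise selection).
coef = np.zeros(comps.shape[1])
resid = y - y.mean()
for _ in range(50):
    b = np.array([c @ resid / (c @ c) for c in comps.T])
    # pick the component giving the largest residual-sum-of-squares drop
    j = int(np.argmax([abs(bj) * np.linalg.norm(c)
                       for bj, c in zip(b, comps.T)]))
    coef[j] += 0.1 * b[j]              # shrunken update (nu = 0.1)
    resid = y - y.mean() - comps @ coef
print(np.round(coef, 2))
```

With this toy design the two cluster components recover (approximately) the latent coefficients 2 and -1, and the selected components name whole groups of variables, which is the interpretability gain the abstract describes.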
CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Class imbalance classification is a challenging research problem in data
mining and machine learning, as most of the real-life datasets are often
imbalanced in nature. Existing learning algorithms maximise the classification
accuracy by correctly classifying the majority class, but misclassify the
minority class. However, in real-life applications the minority class instances typically represent the concept of greater interest. Recently, several techniques based on sampling methods
(under-sampling of the majority class and over-sampling the minority class),
cost-sensitive learning methods, and ensemble learning have been used in the
literature for classifying imbalanced datasets. In this paper, we introduce a
new clustering-based under-sampling approach with boosting (AdaBoost)
algorithm, called CUSBoost, for effective imbalanced classification. The
proposed algorithm provides an alternative to RUSBoost (random under-sampling
with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost)
algorithms. We evaluated the performance of the CUSBoost algorithm against state-of-the-art ensemble learning methods, namely AdaBoost, RUSBoost, and SMOTEBoost, on 13 imbalanced binary and multi-class datasets with various imbalance ratios. The experimental results show that CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets.
Comment: CSITSS-201
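A compact sketch of the two CUSBoost ingredients, using scikit-learn on invented toy data. The cluster count, sample sizes, and hyper-parameters are arbitrary choices for illustration, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

# Imbalanced toy data: 500 majority (class 0) vs 50 minority (class 1).
X_maj = rng.normal(0.0, 1.0, size=(500, 2))
X_min = rng.normal(2.5, 1.0, size=(50, 2))

# Step 1 (cluster-based under-sampling): cluster the majority class and
# draw the same number of points from every cluster, so the retained
# subset preserves the majority class's internal structure (unlike
# purely random under-sampling as in RUSBoost).
k, per_cluster = 5, 10
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_maj)
keep = np.concatenate([
    rng.choice(np.where(km.labels_ == c)[0], size=per_cluster, replace=False)
    for c in range(k)
])
X_bal = np.vstack([X_maj[keep], X_min])
y_bal = np.array([0] * len(keep) + [1] * len(X_min))

# Step 2: boost (AdaBoost, decision stumps by default) on the
# cluster-balanced sample.
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_bal, y_bal)
print(clf.score(X_bal, y_bal))
```

The balanced set here has 50 majority and 50 minority points; in practice the per-cluster quota would be tuned to the desired imbalance ratio.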