60,397 research outputs found
A novel ensemble clustering for operational transients classification with application to a nuclear power plant turbine
International audienceThe objective of the present work is to develop a novel approach for combining in an ensemble multiple base clusterings of operational transients of industrial equipment, when the number of clusters in the final consensus clustering is unknown. A measure of pairwise similarity is used to quantify the co-association matrix that describes the similarity among the different base clusterings. Then, a Spectral Clustering technique of literature, embedding the unsupervised K-Means algorithm, is applied to the co-association matrix for finding the optimum number of clusters of the final consensus clustering, based on Silhouette validity index calculation. The proposed approach is developed with reference to an artificial case study, properly designed to mimic the signal trend behavior of a Nuclear Power Plant (NPP) turbine during shutdown. The results of the artificial case have been compared with those achieved by a state-of-art approach, known as Cluster-based Similarity Partitioning and Serial Graph Partitioning and Fill-reducing Matrix Ordering Algorithms (CSPA-METIS). The comparison shows that the proposed approach is able to identify a final consensus clustering that classifies the transients with better accuracy and robustness compared to the CSPA-METIS approach. The approach is, then, validated on an industrial case concerning 149 shutdown transients of a NPP turbine
An algebraic approach to ensemble clustering
International audienceIn clustering, consensus clustering aims at providing a single partition fitting a consensus from a set of independently generated. Common procedures, which are mainly statistical and graph-based, are recognized for their robustness and ability to scale-up. In this paper, we provide a complementary and original viewpoint over consensus clustering, by means of algebraic definitions which allow to ascertain the nature of available inferences in a systematic approach (e.g. a knowledge base). We found our approach on the lattice of partitions, for which we shall disclose how some operators can be added with the aim to express a formula representing the consensus. We show that adopting an incremental approach may assist to retain significant amount of aggregate data which fits well with the set of input clusterings. Beyond that ability to model formulae, we also note that its potential cannot be easily captured through such a logical system. It is due to the volatile nature of handling partitions which finally impacts on ability to draw some valuable conclusions
Contribution to Graph-based Multi-view Clustering: Algorithms and Applications
185 p.In this thesis, we study unsupervised learning, specifically, clustering methods for dividing data into meaningful groups. One major challenge is how to find an efficient algorithm with low computational complexity to deal with different types and sizes of datasets.For this purpose, we propose two approaches. The first approach is named "Multi-view Clustering via Kernelized Graph and Nonnegative Embedding" (MKGNE), and the second approach is called "Multi-view Clustering via Consensus Graph Learning and Nonnegative Embedding" (MVCGE). These two approaches jointly solve four tasks. They jointly estimate the unified similarity matrix over all views using the kernel tricks, the unified spectral projection of the data, the clusterindicator matrix, and the weight of each view without additional parameters. With these two approaches, there is no need for any postprocessing such as k-means clustering.In a further study, we propose a method named "Multi-view Spectral Clustering via Constrained Nonnegative Embedding" (CNESE). This method can overcome the drawbacks of the spectral clustering approaches, since they only provide a nonlinear projection of the data, on which an additional step of clustering is required. This can degrade the quality of the final clustering due to various factors such as the initialization process or outliers. Overcoming these drawbacks can be done by introducing a nonnegative embedding matrix which gives the final clustering assignment. In addition, some constraints are added to the targeted matrix to enhance the clustering performance.In accordance with the above methods, a new method called "Multi-view Spectral Clustering with a self-taught Robust Graph Learning" (MCSRGL) has been developed. Different from other approaches, this method integrates two main paradigms into the one-step multi-view clustering model. First, we construct an additional graph by using the cluster label space in addition to the graphs associated with the data space. Second, a smoothness constraint is exploited to constrain the cluster-label matrix and make it more consistent with the data views and the label view.Moreover, we propose two unified frameworks for multi-view clustering in Chapter 9. In these frameworks, we attempt to determine a view-based graphs, the consensus graph, the consensus spectral representation, and the soft clustering assignments. These methods retain the main advantages of the aforementioned methods and integrate the concepts of consensus and unified matrices. By using the unified matrices, we enforce the matrices of different views to be similar, and thus the problem of noise and inconsistency between different views will be reduced.Extensive experiments were conducted on several public datasets with different types and sizes, varying from face image datasets, to document datasets, handwritten datasets, and synthetics datasets. We provide several analyses of the proposed algorithms, including ablation studies, hyper-parameter sensitivity analyses, and computational costs. The experimental results show that the developed algorithms through this thesis are relevant and outperform several competing methods
A Robust Unified Graph Model Based on Molecular Data Binning for Subtype Discovery in High-dimensional Spaces
Machine learning (ML) is a subfield of artificial intelligence (AI) that has already revolutionised the world around us. It is a widely employed process for discovering patterns and groups within datasets. It has a wide range of applications including disease subtyping, which aims to discover intrinsic subtypes of disease in large-scale unlabelled data. Whilst the groups discovered in multi-view high-dimensional data by ML algorithms are promising, their capacity to identify pertinent and meaningful groups is limited by the presence of data variability and outliers. Since outlier values represent potential but unlikely outcomes, they are statistically and philosophically fascinating.
Therefore, the primary aim of this thesis was to propose a robust approach that discovers meaningful groups while considering the presence of data variability and outliers in the data. To achieve this aim, a novel robust approach (ROMDEX) was developed that utilised the proposed intermediate graph models (IMGs) for robust computation of proximity between observations in the data. Finally, a robust multi-view graph-based clustering approach was developed based on ROMDEX that improved the discovery of meaningful groups that were hidden behind the noise in the data.
The proposed approach was validated on real-world, and synthetic data for disease subtyping. Additionally, the stability of the approach was assessed by evaluating its performance across different levels of noise in clustering data. The results were evaluated through Kaplan-Meier survival time analysis for disease subtyping. Also, the concordance index (CI) and normalised mutual information (NMI) are used to evaluate the predictive ability of the proposed clustering model. Additionally, the accuracy, Kappa statistic and rand index are computed to evaluate the clustering stability against various levels of Gaussian noise. The proposed approach outperformed the existing state-of-the-art approaches MRGC, PINS, SNF, Consensus Clustering, and Icluster+ on these datasets. The findings for all datasets were outstanding, demonstrating the predictive ability of the proposed unsupervised graph-based clustering approach
DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep Neural Networks
Deep clustering has recently emerged as a promising technique for complex
data clustering. Despite the considerable progress, previous deep clustering
works mostly build or learn the final clustering by only utilizing a single
layer of representation, e.g., by performing the K-means clustering on the last
fully-connected layer or by associating some clustering loss to a specific
layer, which neglect the possibilities of jointly leveraging multi-layer
representations for enhancing the deep clustering performance. In view of this,
this paper presents a Deep Clustering via Ensembles (DeepCluE) approach, which
bridges the gap between deep clustering and ensemble clustering by harnessing
the power of multiple layers in deep neural networks. In particular, we utilize
a weight-sharing convolutional neural network as the backbone, which is trained
with both the instance-level contrastive learning (via an instance projector)
and the cluster-level contrastive learning (via a cluster projector) in an
unsupervised manner. Thereafter, multiple layers of feature representations are
extracted from the trained network, upon which the ensemble clustering process
is further conducted. Specifically, a set of diversified base clusterings are
generated from the multi-layer representations via a highly efficient
clusterer. Then the reliability of clusters in multiple base clusterings is
automatically estimated by exploiting an entropy-based criterion, based on
which the set of base clusterings are re-formulated into a weighted-cluster
bipartite graph. By partitioning this bipartite graph via transfer cut, the
final consensus clustering can be obtained. Experimental results on six image
datasets confirm the advantages of DeepCluE over the state-of-the-art deep
clustering approaches.Comment: To appear in IEEE Transactions on Emerging Topics in Computational
Intelligenc
Structure-Preserving Model Reduction of Physical Network Systems
This paper considers physical network systems where the energy storage is naturally associated to the nodes of the graph, while the edges of the graph correspond to static couplings. The first sections deal with the linear case, covering examples such as mass-damper and hydraulic systems, which have a structure that is similar to symmetric consensus dynamics. The last section is concerned with a specific class of nonlinear physical network systems; namely detailed-balanced chemical reaction networks governed by mass action kinetics. In both cases, linear and nonlinear, the structure of the dynamics is similar, and is based on a weighted Laplacian matrix, together with an energy function capturing the energy storage at the nodes. We discuss two methods for structure-preserving model reduction. The first one is clustering; aggregating the nodes of the underlying graph to obtain a reduced graph. The second approach is based on neglecting the energy storage at some of the nodes, and subsequently eliminating those nodes (called Kron reduction).</p
- …