Multilabel Consensus Classification
In the era of big data, a large amount of noisy and incomplete data can be
collected from multiple sources for prediction tasks. Combining multiple models
or data sources helps to counteract the effects of low data quality and the
bias of any single model or data source, and thus can improve the robustness
and the performance of predictive models. Out of privacy, storage and bandwidth
considerations, in certain circumstances one has to combine the predictions
from multiple models or data sources to obtain the final predictions without
accessing the raw data. Consensus-based prediction combination algorithms are
effective for such situations. However, current research on prediction
combination focuses on the single label setting, where an instance can have one
and only one label. Yet data nowadays are often multilabeled, so
that more than one label has to be predicted at the same time. Direct
applications of existing prediction combination methods to multilabel settings
can lead to degraded performance. In this paper, we address the challenges
of combining predictions from multiple multilabel classifiers and propose two
novel algorithms, MLCM-r (MultiLabel Consensus Maximization for ranking) and
MLCM-a (MLCM for microAUC). These algorithms can capture label correlations
that are common in multilabel classification, and optimize the corresponding
performance metrics. Experimental results on popular multilabel classification
tasks verify the theoretical analysis and effectiveness of the proposed
methods.
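For contrast with the proposed methods, the per-label baseline that the abstract warns about can be sketched as follows. This is not MLCM-r or MLCM-a: it is a plain score average with a fixed threshold, and the function name `average_scores`, the 0.5 threshold, and the toy scores are all assumptions made for illustration. Each label is treated independently, so label correlations are ignored.

```python
def average_scores(score_lists, threshold=0.5):
    """Average per-label scores across models, then threshold each label.

    score_lists[m][i][j] is model m's score for label j on instance i.
    Hypothetical baseline: every label is handled independently.
    """
    n_models = len(score_lists)
    n_instances = len(score_lists[0])
    n_labels = len(score_lists[0][0])
    mean_scores = [
        [sum(model[i][j] for model in score_lists) / n_models
         for j in range(n_labels)]
        for i in range(n_instances)
    ]
    labels = [[1 if s >= threshold else 0 for s in row] for row in mean_scores]
    return mean_scores, labels


# Toy predictions from three models on two instances and three labels.
model_a = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.4]]
model_b = [[0.7, 0.4, 0.2], [0.3, 0.6, 0.7]]
model_c = [[0.8, 0.1, 0.5], [0.2, 0.9, 0.3]]

mean_scores, labels = average_scores([model_a, model_b, model_c])
```

Only the models' predictions are needed here, never the raw data, which matches the setting described in the abstract.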
A META CLUSTERING APPROACH FOR ENSEMBLE PROBLEM
A critical problem in cluster ensemble research is how to combine multiple clusterings to yield a superior clustering result. Leveraging advanced graph partitioning techniques, we solve this problem by reducing it to a graph partitioning problem. We introduce a new reduction method that constructs a bipartite graph from a given cluster ensemble. The resulting graph models both the instances and the clusters of the ensemble simultaneously as vertices. Our approach retains all of the information provided by a given ensemble, allowing the similarity among instances and the similarity among clusters to be considered jointly in forming the final clustering. Further, the resulting graph partitioning problem can be solved efficiently. We empirically evaluate the proposed approach against two commonly used graph formulations and show that it is more robust and achieves comparable or better performance than its competitors.
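The bipartite construction described above might be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `build_bipartite_graph`, the unweighted edges, and the vertex numbering are assumptions, and the graph partitioning step is left out because the abstract does not say which partitioner is used.

```python
def build_bipartite_graph(ensemble):
    """Build a bipartite instance-cluster graph from a cluster ensemble.

    ensemble: list of base clusterings, each a list with one cluster
    label per instance. Instance vertices are numbered 0..n-1 and cluster
    vertices follow from n onward (an assumed numbering scheme).
    """
    n_instances = len(ensemble[0])
    cluster_vertex = {}  # (clustering index, label) -> vertex id
    edges = []           # (instance vertex, cluster vertex) pairs
    for k, labels in enumerate(ensemble):
        if len(labels) != n_instances:
            raise ValueError("all clusterings must cover the same instances")
        for i, label in enumerate(labels):
            key = (k, label)
            if key not in cluster_vertex:
                cluster_vertex[key] = n_instances + len(cluster_vertex)
            # An instance is linked to every cluster that contains it.
            edges.append((i, cluster_vertex[key]))
    return n_instances, cluster_vertex, edges


# Two base clusterings of four instances.
ensemble = [[0, 0, 1, 1], [0, 1, 1, 1]]
n_instances, cluster_vertex, edges = build_bipartite_graph(ensemble)
```

Edges only ever join an instance vertex to a cluster vertex, so partitioning this graph groups instances and clusters at the same time.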
Consensus clustering with differential evolution
Consensus clustering algorithms are used to improve the properties of traditional clustering methods, especially their accuracy and robustness. In this article, we introduce an approach that is based on a refinement of the set of initial partitions and uses a differential evolution algorithm to find the most valid solution. Properties of the algorithm are demonstrated on four benchmark datasets.
APPLICATION OF ALGORITHM ENSEMBLES TO IMPROVE THE STABILITY OF CLUSTERING RESULTS
A review of existing approaches to applying algorithm ensembles in cluster analysis has been conducted. An information technology for improving the stability of clustering results for patient medical examination data is proposed.
Directionally Dependent Multi-View Clustering Using Copula Model
In recent biomedical research, a fundamental problem is to integratively
cluster a set of objects using datasets from multiple sources. Such
problems are mostly encountered in genomics, where data are collected from
various sources and typically represent distinct yet complementary
information. Integrating these data sources for multi-source clustering is
challenging due to their complex dependence structure including directional
dependency. Particularly in genomics studies, it is known that there is a
certain directional dependence between DNA expression, DNA methylation, and
RNA expression, widely called The Central Dogma.
Most of the existing multi-view clustering methods either assume an
independent structure or pair-wise (non-directional) dependency, thereby
ignoring the directional relationship. Motivated by this, we propose a
copula-based multi-view clustering model where a copula enables the model to
accommodate the directional dependence existing in the datasets. We conduct a
simulation experiment in which the simulated datasets exhibit inherent
directional dependence; the results show that ignoring the directional
dependence negatively affects the clustering performance. As a real
application, we apply our model to breast cancer tumor samples collected from
The Cancer Genome Atlas (TCGA).