20,689 research outputs found
SMART: Unique splitting-while-merging framework for gene clustering
Copyright @ 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original author and source are credited.Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset to a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the the most reliable clustering results, by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested in demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Based on the performance of many metrics, all numerical results show that SMART is superior to compared existing self-splitting algorithms and traditional algorithms. Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms.National Institute for Health Researc
Benchmarking in cluster analysis: A white paper
To achieve scientific progress in terms of building a cumulative body of
knowledge, careful attention to benchmarking is of the utmost importance. This
means that proposals of new methods of data pre-processing, new data-analytic
techniques, and new methods of output post-processing, should be extensively
and carefully compared with existing alternatives, and that existing methods
should be subjected to neutral comparison studies. To date, benchmarking and
recommendations for benchmarking have been frequently seen in the context of
supervised learning. Unfortunately, there has been a dearth of guidelines for
benchmarking in an unsupervised setting, with the area of clustering as an
important subdomain. To address this problem, discussion is given to the
theoretical conceptual underpinnings of benchmarking in the field of cluster
analysis by means of simulated as well as empirical data. Subsequently, the
practicalities of how to address benchmarking questions in clustering are dealt
with, and foundational recommendations are made
Opaque Service Virtualisation: A Practical Tool for Emulating Endpoint Systems
Large enterprise software systems make many complex interactions with other
services in their environment. Developing and testing for production-like
conditions is therefore a very challenging task. Current approaches include
emulation of dependent services using either explicit modelling or
record-and-replay approaches. Models require deep knowledge of the target
services while record-and-replay is limited in accuracy. Both face
developmental and scaling issues. We present a new technique that improves the
accuracy of record-and-replay approaches, without requiring prior knowledge of
the service protocols. The approach uses Multiple Sequence Alignment to derive
message prototypes from recorded system interactions and a scheme to match
incoming request messages against prototypes to generate response messages. We
use a modified Needleman-Wunsch algorithm for distance calculation during
message matching. Our approach has shown greater than 99% accuracy for four
evaluated enterprise system messaging protocols. The approach has been
successfully integrated into the CA Service Virtualization commercial product
to complement its existing techniques.Comment: In Proceedings of the 38th International Conference on Software
Engineering Companion (pp. 202-211). arXiv admin note: text overlap with
arXiv:1510.0142
Many-to-Many Graph Matching: a Continuous Relaxation Approach
Graphs provide an efficient tool for object representation in various
computer vision applications. Once graph-based representations are constructed,
an important question is how to compare graphs. This problem is often
formulated as a graph matching problem where one seeks a mapping between
vertices of two graphs which optimally aligns their structure. In the classical
formulation of graph matching, only one-to-one correspondences between vertices
are considered. However, in many applications, graphs cannot be matched
perfectly and it is more interesting to consider many-to-many correspondences
where clusters of vertices in one graph are matched to clusters of vertices in
the other graph. In this paper, we formulate the many-to-many graph matching
problem as a discrete optimization problem and propose an approximate algorithm
based on a continuous relaxation of the combinatorial problem. We compare our
method with other existing methods on several benchmark computer vision
datasets.Comment: 1
- …