Identifying meaningful clusters in malware data
Finding meaningful clusters in drive-by-download malware data is a particularly difficult task. Malware data tends to contain overlapping clusters with wide variations of cardinality. This happens because there can be considerable similarity between malware samples (some are even said to belong to the same family), and these tend to appear in bursts. Clustering algorithms are usually applied to normalised data sets. However, the process of normalisation aims at setting features with different range values to have a similar contribution to the clustering. It does not favour more meaningful features over those that are less meaningful, an effect one should perhaps expect of the data pre-processing stage.
In this paper we introduce a method that addresses precisely this problem: an iterative data pre-processing method that helps increase the separation between clusters. It does so by calculating the within-cluster degree of relevance of each feature and using these values as data rescaling factors. Repeating this until convergence separated our malware data into clear clusters, leading to a higher average silhouette width.
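The loop described in this abstract (cluster, estimate per-cluster feature relevance, rescale, repeat) can be illustrated with a rough sketch. The abstract does not give the relevance formula, so everything specific here is an assumption made for illustration: plain k-means as the clustering step, relevance taken as the inverse square root of each feature's within-cluster dispersion, and "convergence" taken to mean that the cluster labels stop changing. All function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means, initialised from k randomly chosen data points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def rescale_by_relevance(X, labels, centroids, eps=1e-9):
    """Assumed relevance: inverse square root of within-cluster dispersion,
    normalised so that the weights of each cluster average to 1."""
    X_new = X.copy()
    weights = np.ones_like(centroids)
    for c in range(len(centroids)):
        members = labels == c
        if not members.any():
            continue  # empty cluster: keep unit weights
        dispersion = ((X[members] - centroids[c]) ** 2).sum(axis=0) + eps
        inv = 1.0 / np.sqrt(dispersion)
        weights[c] = inv / inv.mean()
        X_new[members] = X[members] * weights[c]
    return X_new, weights

def iterative_rescaling(X, k, max_rounds=20, seed=0):
    """Cluster and rescale until the labels stop changing (or max_rounds)."""
    X_cur = np.asarray(X, dtype=float)
    prev = None
    for _ in range(max_rounds):
        labels, centroids = kmeans(X_cur, k, seed=seed)
        if prev is not None and np.array_equal(labels, prev):
            break
        X_cur, _ = rescale_by_relevance(X_cur, labels, centroids)
        prev = labels
    return X_cur, labels

def average_silhouette_width(X, labels):
    """Mean silhouette over all points; singleton clusters score 0."""
    if len(np.unique(labels)) < 2:
        return 0.0
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() < 2:
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Under these assumptions, features that vary little inside a cluster are stretched and noisy ones are shrunk, which is one way a pre-processing step could favour more meaningful features. Comparing `average_silhouette_width` before and after `iterative_rescaling` gives a rough check of whether separation improved on a given data set.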
Low-rank Similarity Measure for Role Model Extraction
Computing meaningful clusters of nodes is crucial for analyzing large networks.
In this paper, we present a pairwise node similarity measure that allows the
extraction of roles, i.e. groups of nodes sharing similar flow patterns within a
network. We propose a low-rank iterative scheme to approximate the similarity
measure for very large networks. Finally, we show that our low-rank similarity
score successfully extracts the different roles in random graphs and that its
performance is similar to that of the pairwise similarity measure.
Comment: 7 pages, 2 columns, 4 figures, conference paper for MTNS201
A unified framework for detecting groups and application to shape recognition
A unified a contrario detection method is proposed to solve three classical problems in clustering analysis. The first is to evaluate the validity of a cluster candidate. The second is that meaningful clusters can contain or be contained in other meaningful clusters, so a rule is needed to define locally optimal clusters by inclusion. The third is the definition of a correct merging rule between meaningful clusters, which decides whether they should stay separate or unite. The motivation of this theory is shape recognition. Matching algorithms usually compute correspondences between more or less local features (called shape elements) in the images to be compared. This paper aims to group matching shape elements into spatially coherent shapes. Each pair of matching shape elements leads to a unique transformation (a similarity or an affine map). As an application, the present theory on the choice of the right clusters is used to group these shape elements into shapes by detecting clusters in the transformation space.
Exploring the similarity of medical imaging classification problems
Supervised learning is ubiquitous in medical image analysis. In this paper we
consider the problem of meta-learning -- predicting which methods will perform
well in an unseen classification problem, given previous experience with other
classification problems. We investigate the first step of such an approach: how
to quantify the similarity of different classification problems. We
characterize datasets sampled from six classification problems by performance
ranks of simple classifiers, and define the similarity by the inverse of
Euclidean distance in this meta-feature space. We visualize the similarities in
a 2D space, where meaningful clusters start to emerge, and show that the
proposed representation can be used to classify datasets according to their
origin with 89.3% accuracy. These findings, together with the observations of
recent trends in machine learning, suggest that meta-learning could be a
valuable tool for the medical imaging community.
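The similarity described in this abstract (datasets represented by performance ranks of simple classifiers, compared via the inverse of Euclidean distance) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the accuracy table are hypothetical, and ties in accuracy are broken by column order, which the abstract does not specify.

```python
import numpy as np

def rank_meta_features(accuracies):
    """One row per dataset, one column per classifier.
    Returns ranks per row, where rank 1 is the best classifier on that dataset."""
    acc = np.asarray(accuracies, dtype=float)
    order = (-acc).argsort(axis=1, kind="stable")  # best classifier first
    ranks = np.empty_like(order)
    rows = np.arange(acc.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, acc.shape[1] + 1)
    return ranks.astype(float)

def inverse_distance_similarity(features):
    """Pairwise similarity = 1 / Euclidean distance in meta-feature space.
    Identical rank vectors (and the diagonal) come out as infinity."""
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    with np.errstate(divide="ignore"):
        return 1.0 / dist

# Hypothetical accuracies of three simple classifiers on three datasets.
accuracies = [
    [0.90, 0.80, 0.70],  # dataset A
    [0.85, 0.75, 0.60],  # dataset B: same classifier ordering as A
    [0.60, 0.70, 0.90],  # dataset C: reversed ordering
]
ranks = rank_meta_features(accuracies)
similarity = inverse_distance_similarity(ranks)
```

In this toy table, datasets A and B share the same rank vector and so are maximally similar, while A and C are far apart. Because only ranks are used, the representation ignores how large the accuracy gaps are, which is a design choice worth keeping in mind when interpreting the similarities.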