4,117 research outputs found

Identifying meaningful clusters in malware data

Author: Amorim Renato
Lopez Ruiz Carlos D
Publication venue: 'Elsevier BV'
Publication date: 01/09/2021

Finding meaningful clusters in drive-by-download malware data is a particularly difficult task. Malware data tends to contain overlapping clusters with wide variations in cardinality. This happens because there can be considerable similarity between malware samples (some are even said to belong to the same family), and these tend to appear in bursts. Clustering algorithms are usually applied to normalised data sets. However, normalisation only aims to give features with different value ranges a similar contribution to the clustering. It does not favour more meaningful features over less meaningful ones, an effect one might expect of the data pre-processing stage. In this paper we introduce a method that addresses precisely this problem. It is an iterative data pre-processing method that helps increase the separation between clusters. It does so by calculating the within-cluster degree of relevance of each feature and then using these values as data rescaling factors. By repeating this until convergence, our malware data was separated into clear clusters, leading to a higher average silhouette width.

University of Essex Research Repository
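
The malware-clustering abstract above reports its result as a higher average silhouette width. As a reading aid, here is a minimal sketch of that standard measure, assuming Euclidean distance and NumPy. The function name `average_silhouette_width` is illustrative and is not taken from the paper, and the sketch does not implement the authors' rescaling method.

```python
import numpy as np

def average_silhouette_width(X, labels):
    """Mean silhouette value over all points (Euclidean distance assumed)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Full pairwise Euclidean distance matrix; fine for small data sets.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: silhouette is 0 by convention
        # a: mean distance to the other members of the point's own cluster
        # (dist[i, i] is 0, so dividing by n_same - 1 excludes the point itself).
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            scores[i] = (b - a) / denom
    return scores.mean()

# Toy data: two tight, well-separated groups.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = average_silhouette_width(X_demo, [0, 0, 1, 1])
bad = average_silhouette_width(X_demo, [0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest points assigned to the wrong cluster. In the toy data the natural grouping scores much higher than the deliberately mixed one.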

Low-rank Similarity Measure for Role Model Extraction

Author: Arnaud Browet
Paul Michel
Paul Van Dooren
Publication date: 24/07/2014

Computing meaningful clusters of nodes is crucial for analyzing large networks. In this paper, we present a pairwise node similarity measure that allows roles to be extracted, i.e. groups of nodes sharing similar flow patterns within a network. We propose a low-rank iterative scheme to approximate the similarity measure for very large networks. Finally, we show that our low-rank similarity score successfully extracts the different roles in random graphs and that its performance is similar to that of the pairwise similarity measure. Comment: 7 pages, 2 columns, 4 figures, conference paper for MTNS201

arXiv.org e-Print Archive

CiteSeerX

A unified framework for detecting groups and application to shape recognition

Author: Cao Frédéric
Delon Julie
Desolneux Agnès
Musé Pablo
Sur Frédéric
Publication venue: HAL CCSD
Publication date: 01/01/2005

A unified a contrario detection method is proposed to solve three classical problems in clustering analysis. The first one is to evaluate the validity of a cluster candidate. The second problem is that meaningful clusters can contain or be contained in other meaningful clusters, so a rule is needed to define locally optimal clusters by inclusion. The third problem is the definition of a correct merging rule between meaningful clusters, which decides whether they should stay separate or be united. The motivation of this theory is shape recognition. Matching algorithms usually compute correspondences between more or less local features (called shape elements) of the images to be compared. This paper aims to group matching shape elements into spatially coherent shapes. Each pair of matching shape elements leads to a unique transformation (similarity or affine map). As an application, the present theory on the choice of the right clusters is used to group these shape elements into shapes by detecting clusters in the transformation space.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Rennes 1

Exploring the similarity of medical imaging classification problems

Author: Bozorg Behdad Dasht
Cheplygina Veronika
Moeskops Pim
Pluim Josien
Veta Mitko
Publication date: 01/01/2017

Supervised learning is ubiquitous in medical image analysis. In this paper we consider the problem of meta-learning: predicting which methods will perform well in an unseen classification problem, given previous experience with other classification problems. We investigate the first step of such an approach: how to quantify the similarity of different classification problems. We characterize datasets sampled from six classification problems by performance ranks of simple classifiers, and define the similarity by the inverse of Euclidean distance in this meta-feature space. We visualize the similarities in a 2D space, where meaningful clusters start to emerge, and show that the proposed representation can be used to classify datasets according to their origin with 89.3% accuracy. These findings, together with the observations of recent trends in machine learning, suggest that meta-learning could be a valuable tool for the medical imaging community.

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Erasmus University Digital Repository

Utrecht University Repository
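
The medical-imaging abstract above characterises each dataset by the performance ranks of simple classifiers and defines similarity as the inverse of Euclidean distance in that meta-feature space. The sketch below illustrates that definition with NumPy under stated assumptions: the function names, the tie-breaking rule, and the small `eps` added to avoid division by zero are illustrative choices, not details from the paper, and the accuracy values are made up for the demonstration.

```python
import numpy as np

def rank_meta_features(scores):
    """Rank classifiers within each dataset (1 = best score).

    scores has shape (n_datasets, n_classifiers). Ties are broken by
    column order, which is an assumption of this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, axis=1, kind="stable")  # best classifier first
    ranks = np.empty_like(order)
    rows = np.arange(scores.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, scores.shape[1] + 1)
    return ranks.astype(float)

def inverse_distance_similarity(meta, eps=1e-9):
    """Similarity = 1 / (Euclidean distance + eps) between meta-feature rows."""
    meta = np.asarray(meta, dtype=float)
    diff = meta[:, None, :] - meta[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return 1.0 / (dist + eps)

# Hypothetical accuracies: 3 datasets (rows) x 3 classifiers (columns).
demo_scores = [
    [0.90, 0.80, 0.70],
    [0.95, 0.85, 0.60],
    [0.50, 0.70, 0.90],
]
demo_ranks = rank_meta_features(demo_scores)
demo_sim = inverse_distance_similarity(demo_ranks)
```

In this toy example the first two datasets rank the classifiers identically, so their similarity is capped only by `eps`, while the third dataset reverses the ranking and is far less similar to both.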