31,291 research outputs found
A similarity-based community detection method with multiple prototype representation
Communities are of great importance for understanding graph structures in
social networks. Some existing community detection algorithms use a single
prototype to represent each group. In real applications, this may not
adequately model the different types of communities and hence limits the
clustering performance on social networks. To address this problem, a
Similarity-based Multi-Prototype (SMP) community detection approach is proposed
in this paper. In SMP, vertices in each community carry various weights to
describe their degree of representativeness. This mechanism enables each
community to be represented by more than one node. The centrality of nodes is
used to calculate prototype weights, while similarity is utilized to guide us
to partitioning the graph. Experimental results on computer generated and
real-world networks clearly show that SMP performs well for detecting
communities. Moreover, the method could provide richer information for the
inner structure of the detected communities with the help of prototype weights
compared with the existing community detection models
Semantic Concept Co-Occurrence Patterns for Image Annotation and Retrieval.
Describing visual image contents by semantic concepts is an effective and straightforward way to facilitate various high level applications. Inferring semantic concepts from low-level pictorial feature analysis is challenging due to the semantic gap problem, while manually labeling concepts is unwise because of a large number of images in both online and offline collections. In this paper, we present a novel approach to automatically generate intermediate image descriptors by exploiting concept co-occurrence patterns in the pre-labeled training set that renders it possible to depict complex scene images semantically. Our work is motivated by the fact that multiple concepts that frequently co-occur across images form patterns which could provide contextual cues for individual concept inference. We discover the co-occurrence patterns as hierarchical communities by graph modularity maximization in a network with nodes and edges representing concepts and co-occurrence relationships separately. A random walk process working on the inferred concept probabilities with the discovered co-occurrence patterns is applied to acquire the refined concept signature representation. Through experiments in automatic image annotation and semantic image retrieval on several challenging datasets, we demonstrate the effectiveness of the proposed concept co-occurrence patterns as well as the concept signature representation in comparison with state-of-the-art approaches
Android Malware Clustering through Malicious Payload Mining
Clustering has been well studied for desktop malware analysis as an effective
triage method. Conventional similarity-based clustering techniques, however,
cannot be immediately applied to Android malware analysis due to the excessive
use of third-party libraries in Android application development and the
widespread use of repackaging in malware development. We design and implement
an Android malware clustering system through iterative mining of malicious
payload and checking whether malware samples share the same version of
malicious payload. Our system utilizes a hierarchical clustering technique and
an efficient bit-vector format to represent Android apps. Experimental results
demonstrate that our clustering approach achieves precision of 0.90 and recall
of 0.75 for Android Genome malware dataset, and average precision of 0.98 and
recall of 0.96 with respect to manually verified ground-truth.Comment: Proceedings of the 20th International Symposium on Research in
Attacks, Intrusions and Defenses (RAID 2017
Evidential relational clustering using medoids
In real clustering applications, proximity data, in which only pairwise
similarities or dissimilarities are known, is more general than object data, in
which each pattern is described explicitly by a list of attributes.
Medoid-based clustering algorithms, which assume the prototypes of classes are
objects, are of great value for partitioning relational data sets. In this
paper a new prototype-based clustering method, named Evidential C-Medoids
(ECMdd), which is an extension of Fuzzy C-Medoids (FCMdd) on the theoretical
framework of belief functions is proposed. In ECMdd, medoids are utilized as
the prototypes to represent the detected classes, including specific classes
and imprecise classes. Specific classes are for the data which are distinctly
far from the prototypes of other classes, while imprecise classes accept the
objects that may be close to the prototypes of more than one class. This soft
decision mechanism could make the clustering results more cautious and reduce
the misclassification rates. Experiments in synthetic and real data sets are
used to illustrate the performance of ECMdd. The results show that ECMdd could
capture well the uncertainty in the internal data structure. Moreover, it is
more robust to the initializations compared with FCMdd.Comment: in The 18th International Conference on Information Fusion, July
2015, Washington, DC, USA , Jul 2015, Washington, United State
GOGGLES: Automatic Image Labeling with Affinity Coding
Generating large labeled training data is becoming the biggest bottleneck in
building and deploying supervised machine learning models. Recently, the data
programming paradigm has been proposed to reduce the human cost in labeling
training data. However, data programming relies on designing labeling functions
which still requires significant domain expertise. Also, it is prohibitively
difficult to write labeling functions for image datasets as it is hard to
express domain knowledge using raw features for images (pixels).
We propose affinity coding, a new domain-agnostic paradigm for automated
training data labeling. The core premise of affinity coding is that the
affinity scores of instance pairs belonging to the same class on average should
be higher than those of pairs belonging to different classes, according to some
affinity functions. We build the GOGGLES system that implements affinity coding
for labeling image datasets by designing a novel set of reusable affinity
functions for images, and propose a novel hierarchical generative model for
class inference using a small development set.
We compare GOGGLES with existing data programming systems on 5 image labeling
tasks from diverse domains. GOGGLES achieves labeling accuracies ranging from a
minimum of 71% to a maximum of 98% without requiring any extensive human
annotation. In terms of end-to-end performance, GOGGLES outperforms the
state-of-the-art data programming system Snuba by 21% and a state-of-the-art
few-shot learning technique by 5%, and is only 7% away from the fully
supervised upper bound.Comment: Published at 2020 ACM SIGMOD International Conference on Management
of Dat
- …