34,552 research outputs found
apk2vec: Semi-supervised multi-view representation learning for profiling Android applications
Building behavior profiles of Android applications (apps) with holistic, rich
and multi-view information (e.g., incorporating several semantic views of an
app such as API sequences, system calls, etc.) would help catering downstream
analytics tasks such as app categorization, recommendation and malware analysis
significantly better. Towards this goal, we design a semi-supervised
Representation Learning (RL) framework named apk2vec to automatically generate
a compact representation (aka profile/embedding) for a given app. More
specifically, apk2vec has the three following unique characteristics which make
it an excellent choice for largescale app profiling: (1) it encompasses
information from multiple semantic views such as API sequences, permissions,
etc., (2) being a semi-supervised embedding technique, it can make use of
labels associated with apps (e.g., malware family or app category labels) to
build high quality app profiles, and (3) it combines RL and feature hashing
which allows it to efficiently build profiles of apps that stream over time
(i.e., online learning). The resulting semi-supervised multi-view hash
embeddings of apps could then be used for a wide variety of downstream tasks
such as the ones mentioned above. Our extensive evaluations with more than
42,000 apps demonstrate that apk2vec's app profiles could significantly
outperform state-of-the-art techniques in four app analytics tasks namely,
malware detection, familial clustering, app clone detection and app
recommendation.Comment: International Conference on Data Mining, 201
Statistical Mechanics of Semi-Supervised Clustering in Sparse Graphs
We theoretically study semi-supervised clustering in sparse graphs in the
presence of pairwise constraints on the cluster assignments of nodes. We focus
on bi-cluster graphs, and study the impact of semi-supervision for varying
constraint density and overlap between the clusters. Recent results for
unsupervised clustering in sparse graphs indicate that there is a critical
ratio of within-cluster and between-cluster connectivities below which clusters
cannot be recovered with better than random accuracy. The goal of this paper is
to examine the impact of pairwise constraints on the clustering accuracy. Our
results suggests that the addition of constraints does not provide automatic
improvement over the unsupervised case. When the density of the constraints is
sufficiently small, their only impact is to shift the detection threshold while
preserving the criticality. Conversely, if the density of (hard) constraints is
above the percolation threshold, the criticality is suppressed and the
detection threshold disappears.Comment: 8 pages, 4 figure
Artificial neural networks in geospatial analysis
Artificial neural networks are computational models widely used in geospatial analysis for data classification, change detection, clustering, function approximation, and forecasting or prediction. There are many types of neural networks based on learning paradigm and network architectures. Their use is expected to grow with increasing availability of massive data from remote sensing and mobile platforms
Low-shot learning with large-scale diffusion
This paper considers the problem of inferring image labels from images when
only a few annotated examples are available at training time. This setup is
often referred to as low-shot learning, where a standard approach is to
re-train the last few layers of a convolutional neural network learned on
separate classes for which training examples are abundant. We consider a
semi-supervised setting based on a large collection of images to support label
propagation. This is possible by leveraging the recent advances on large-scale
similarity graph construction.
We show that despite its conceptual simplicity, scaling label propagation up
to hundred millions of images leads to state of the art accuracy in the
low-shot learning regime
- …