940 research outputs found
apk2vec: Semi-supervised multi-view representation learning for profiling Android applications
Building behavior profiles of Android applications (apps) with holistic, rich
and multi-view information (e.g., incorporating several semantic views of an
app such as API sequences, system calls, etc.) would help catering downstream
analytics tasks such as app categorization, recommendation and malware analysis
significantly better. Towards this goal, we design a semi-supervised
Representation Learning (RL) framework named apk2vec to automatically generate
a compact representation (aka profile/embedding) for a given app. More
specifically, apk2vec has the three following unique characteristics which make
it an excellent choice for largescale app profiling: (1) it encompasses
information from multiple semantic views such as API sequences, permissions,
etc., (2) being a semi-supervised embedding technique, it can make use of
labels associated with apps (e.g., malware family or app category labels) to
build high quality app profiles, and (3) it combines RL and feature hashing
which allows it to efficiently build profiles of apps that stream over time
(i.e., online learning). The resulting semi-supervised multi-view hash
embeddings of apps could then be used for a wide variety of downstream tasks
such as the ones mentioned above. Our extensive evaluations with more than
42,000 apps demonstrate that apk2vec's app profiles could significantly
outperform state-of-the-art techniques in four app analytics tasks namely,
malware detection, familial clustering, app clone detection and app
recommendation.Comment: International Conference on Data Mining, 201
Deep Learning para BigData
We live in a world where data is becoming increasingly valuable and increasingly abundant in volume. Every company produces data, be it from sales, sensors, and various other sources. Since the dawn of the smartphone, virtually every person in the world is connected to the internet and contributes to data generation. Social networks are big contributors to this Big Data boom. How do we extract insight from such a rich data environment? Is Deep Learning capable of circumventing Big Data’s challenges? This is what we intend to understand. To reach a conclusion, Social Network data is used as a case study for predicting sentiment changes in the Stock Market. The objective of this dissertation is to develop a computational study and analyse its performance. The outputs will contribute to understand Deep Learning’s usage with Big Data and how it acts in Sentiment analysis.Vivemos num mundo onde dados são cada vez mais valiosos e abundantes. Todas as empresas produzem dados, sejam eles provenientes de valores de vendas, parâmetros de sensores bem como de outras diversas fontes. Desde que os smartphones se tornaram pessoais, o mundo tornou-se mais conectado, já que virtualmente todas as pessoas passaram a ter a internet na ponta dos dedos. Esta explosão tecnológica foi acompanhada por uma explosão de dados. As redes sociais têm um grande contributo para a quantidade de dados produzida. Mas como se analisam estes dados? Será que Deep Learning poderá dar a volta aos desafios que Big Data traz inerentemente? É isso se pretende perceber. Para chegar a uma conclusão, foi utilizado um caso de estudo de redes sociais para previsão de alterações nas ações de mercados financeiros relacionadas com as opiniões dos utilizadores destas. O objetivo desta dissertação é o desenvolvimento de um estudo computacional e a análise da sua performance. Os resultados contribuirão para entender o uso de Deep Learning com Big Data, com especial foco em análise de sentimento. The objective of this dissertation is to develop a computational study and analyse its performance. The outputs will contribute to understand Deep Learning’s usage with Big Data and how it acts in Sentiment analysis
edge2vec: Representation learning using edge semantics for biomedical knowledge discovery
Representation learning provides new and powerful graph analytical approaches
and tools for the highly valued data science challenge of mining knowledge
graphs. Since previous graph analytical methods have mostly focused on
homogeneous graphs, an important current challenge is extending this
methodology for richly heterogeneous graphs and knowledge domains. The
biomedical sciences are such a domain, reflecting the complexity of biology,
with entities such as genes, proteins, drugs, diseases, and phenotypes, and
relationships such as gene co-expression, biochemical regulation, and
biomolecular inhibition or activation. Therefore, the semantics of edges and
nodes are critical for representation learning and knowledge discovery in real
world biomedical problems. In this paper, we propose the edge2vec model, which
represents graphs considering edge semantics. An edge-type transition matrix is
trained by an Expectation-Maximization approach, and a stochastic gradient
descent model is employed to learn node embedding on a heterogeneous graph via
the trained transition matrix. edge2vec is validated on three biomedical domain
tasks: biomedical entity classification, compound-gene bioactivity prediction,
and biomedical information retrieval. Results show that by considering
edge-types into node embedding learning in heterogeneous graphs,
\textbf{edge2vec}\ significantly outperforms state-of-the-art models on all
three tasks. We propose this method for its added value relative to existing
graph analytical methodology, and in the real world context of biomedical
knowledge discovery applicability.Comment: 10 page
- …