2,738 research outputs found

    AiDroid: When Heterogeneous Information Network Marries Deep Neural Network for Real-time Android Malware Detection

    Full text link
    The explosive growth and increasing sophistication of Android malware call for new defensive techniques that are capable of protecting mobile users against novel threats. In this paper, we first extract the runtime Application Programming Interface (API) call sequences from Android apps, and then analyze higher-level semantic relations within the ecosystem to comprehensively characterize the apps. To model different types of entities (i.e., app, API, IMEI, signature, affiliation) and the rich semantic relations among them, we then construct a structural heterogeneous information network (HIN) and present meta-path based approach to depict the relatedness over apps. To efficiently classify nodes (e.g., apps) in the constructed HIN, we propose the HinLearning method to first obtain in-sample node embeddings and then learn representations of out-of-sample nodes without rerunning/adjusting HIN embeddings at the first attempt. Afterwards, we design a deep neural network (DNN) classifier taking the learned HIN representations as inputs for Android malware detection. A comprehensive experimental study on the large-scale real sample collections from Tencent Security Lab is performed to compare various baselines. Promising experimental results demonstrate that our developed system AiDroid which integrates our proposed method outperforms others in real-time Android malware detection. AiDroid has already been incorporated into Tencent Mobile Security product that serves millions of users worldwide.Comment: The revised version will be published in IJCAI'2019 entitled "Out-of-sample Node Representation Learning for Heterogeneous Graph in Real-time Android Malware Detection

    Motif-based Convolutional Neural Network on Graphs

    Full text link
    This paper introduces a generalization of Convolutional Neural Networks (CNNs) to graphs with irregular linkage structures, especially heterogeneous graphs with typed nodes and schemas. We propose a novel spatial convolution operation to model the key properties of local connectivity and translation invariance, using high-order connection patterns or motifs. We develop a novel deep architecture Motif-CNN that employs an attention model to combine the features extracted from multiple patterns, thus effectively capturing high-order structural and feature information. Our experiments on semi-supervised node classification on real-world social networks and multiple representative heterogeneous graph datasets indicate significant gains of 6-21% over existing graph CNNs and other state-of-the-art techniques

    A Survey on Embedding Dynamic Graphs

    Full text link
    Embedding static graphs in low-dimensional vector spaces plays a key role in network analytics and inference, supporting applications like node classification, link prediction, and graph visualization. However, many real-world networks present dynamic behavior, including topological evolution, feature evolution, and diffusion. Therefore, several methods for embedding dynamic graphs have been proposed to learn network representations over time, facing novel challenges, such as time-domain modeling, temporal features to be captured, and the temporal granularity to be embedded. In this survey, we overview dynamic graph embedding, discussing its fundamentals and the recent advances developed so far. We introduce the formal definition of dynamic graph embedding, focusing on the problem setting and introducing a novel taxonomy for dynamic graph embedding input and output. We further explore different dynamic behaviors that may be encompassed by embeddings, classifying by topological evolution, feature evolution, and processes on networks. Afterward, we describe existing techniques and propose a taxonomy for dynamic graph embedding techniques based on algorithmic approaches, from matrix and tensor factorization to deep learning, random walks, and temporal point processes. We also elucidate main applications, including dynamic link prediction, anomaly detection, and diffusion prediction, and we further state some promising research directions in the area.Comment: 41 pages, 10 figure

    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Full text link
    Big data research has attracted great attention in science, technology, industry and society. It is developing with the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenge have not been recognized, and its own methodology has not been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and science paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing.Comment: 59 page

    A Comprehensive Survey on Graph Neural Networks

    Full text link
    Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.Comment: Minor revision (updated tables and references

    Attention-based Graph Neural Network for Semi-supervised Learning

    Full text link
    Recently popularized graph neural networks achieve the state-of-the-art accuracy on a number of standard benchmark datasets for graph-based semi-supervised learning, improving significantly over existing approaches. These architectures alternate between a propagation layer that aggregates the hidden states of the local neighborhood and a fully-connected layer. Perhaps surprisingly, we show that a linear model, that removes all the intermediate fully-connected layers, is still able to achieve a performance comparable to the state-of-the-art models. This significantly reduces the number of parameters, which is critical for semi-supervised learning where number of labeled examples are small. This in turn allows a room for designing more innovative propagation layers. Based on this insight, we propose a novel graph neural network that removes all the intermediate fully-connected layers, and replaces the propagation layers with attention mechanisms that respect the structure of the graph. The attention mechanism allows us to learn a dynamic and adaptive local summary of the neighborhood to achieve more accurate predictions. In a number of experiments on benchmark citation networks datasets, we demonstrate that our approach outperforms competing methods. By examining the attention weights among neighbors, we show that our model provides some interesting insights on how neighbors influence each other

    Unsupervised learning of latent edge types from multi-relational data

    Get PDF
    Many relational datasets, including relational databases, feature links of different types (e.g., actors act in movies, users rate movies), known as multi-relational, heterogeneous, or multilayer networks. Edge types/network layers are often not explicitly labeled, even when they influence the underlying graph generation process. For example, IMDb lists Tom Cruise as a cast member of Mission Impossible, but not as its star. Inferring latent layers is useful for relational prediction tasks (e.g., predict Tom Cruise’s salary or his presence in other movies). This thesis discusses Latent Layer Generative Framework - LLGF, a generative framework for learning latent layers that generalizes Variational Graph Auto-Encoders (VGAEs) with arbitrary node representation encoders and link generation decoders. The decoder treats the observed edge type signal as a linear combination of latent layer decoders. The encoder infers parallel node representations, one for each latent layer. We evaluate our proposed framework, LLGF, on eight benchmark graph learning datasets for this study. Four of the datasets are heterogeneous (originally labeled with edge types); we apply LLGF after removing the edge labels to assess how well it recovers ground-truth layers. LLGF increases link prediction accuracy, especially for heterogeneous datasets (up to 5% AUC), and recovers the ground-truth layers exceptionally well

    Physical Attribute Prediction Using Deep Residual Neural Networks

    Full text link
    Images taken from the Internet have been used alongside Deep Learning for many different tasks such as: smile detection, ethnicity, hair style, hair colour, gender and age prediction. After witnessing these usages, we were wondering what other attributes can be predicted from facial images available on the Internet. In this paper we tackle the prediction of physical attributes from face images using Convolutional Neural Networks trained on our dataset named FIRW. We crawled around 61, 000 images from the web, then use face detection to crop faces from these real world images. We choose ResNet-50 as our base network architecture. This network was pretrained for the task of face recognition by using the VGG-Face dataset, and we finetune it by using our own dataset to predict physical attributes. Separate networks are trained for the prediction of body type, ethnicity, gender, height and weight; our models achieve the following accuracies for theses tasks, respectively: 84.58%, 87.34%, 97.97%, 70.51%, 63.99%. To validate our choice of ResNet-50 as the base architecture, we also tackle the famous CelebA dataset. Our models achieve an averagy accuracy of 91.19% on CelebA, which is comparable to state-of-the-art approaches

    Multimodal Deep Network Embedding with Integrated Structure and Attribute Information

    Full text link
    Network embedding is the process of learning low-dimensional representations for nodes in a network, while preserving node features. Existing studies only leverage network structure information and focus on preserving structural features. However, nodes in real-world networks often have a rich set of attributes providing extra semantic information. It has been demonstrated that both structural and attribute features are important for network analysis tasks. To preserve both features, we investigate the problem of integrating structure and attribute information to perform network embedding and propose a Multimodal Deep Network Embedding (MDNE) method. MDNE captures the non-linear network structures and the complex interactions among structures and attributes, using a deep model consisting of multiple layers of non-linear functions. Since structures and attributes are two different types of information, a multimodal learning method is adopted to pre-process them and help the model to better capture the correlations between node structure and attribute information. We employ both structural proximity and attribute proximity in the loss function to preserve the respective features and the representations are obtained by minimizing the loss function. Results of extensive experiments on four real-world datasets show that the proposed method performs significantly better than baselines on a variety of tasks, which demonstrate the effectiveness and generality of our method.Comment: 15 pages, 10 figure

    COSINE: Compressive Network Embedding on Large-scale Information Networks

    Full text link
    There is recently a surge in approaches that learn low-dimensional embeddings of nodes in networks. As there are many large-scale real-world networks, it's inefficient for existing approaches to store amounts of parameters in memory and update them edge after edge. With the knowledge that nodes having similar neighborhood will be close to each other in embedding space, we propose COSINE (COmpresSIve NE) algorithm which reduces the memory footprint and accelerates the training process by parameters sharing among similar nodes. COSINE applies graph partitioning algorithms to networks and builds parameter sharing dependency of nodes based on the result of partitioning. With parameters sharing among similar nodes, COSINE injects prior knowledge about higher structural information into training process which makes network embedding more efficient and effective. COSINE can be applied to any embedding lookup method and learn high-quality embeddings with limited memory and shorter training time. We conduct experiments of multi-label classification and link prediction, where baselines and our model have the same memory usage. Experimental results show that COSINE gives baselines up to 23% increase on classification and up to 25% increase on link prediction. Moreover, time of all representation learning methods using COSINE decreases from 30% to 70%
    • …
    corecore