1,418 research outputs found
Mask Combination of Multi-layer Graphs for Global Structure Inference
Structure inference is an important task for network data processing and
analysis in data science. In recent years, quite a few approaches have been
developed to learn the graph structure underlying a set of observations
captured in a data space. Although real-world data is often acquired in
settings where relationships are influenced by a priori known rules, such
domain knowledge is still not well exploited in structure inference problems.
In this paper, we identify the structure of signals defined in a data space
whose inner relationships are encoded by multi-layer graphs. We aim at properly
exploiting the information originating from each layer to infer the global
structure underlying the signals. We thus present a novel method for combining
the multiple graphs into a global graph using mask matrices, which are
estimated through an optimization problem that accommodates the multi-layer
graph information and a signal representation model. The proposed mask
combination method also estimates the contribution of each graph layer in the
structure of signals. The experiments conducted both on synthetic and
real-world data suggest that integrating the multi-layer graph representation
of the data in the structure inference framework enhances the learning
procedure considerably by adapting to the quality and the quantity of the input
data
Intelligent Malware Detection Using File-to-file Relations and Enhancing its Security against Adversarial Attacks
With computing devices and the Internet being indispensable in people\u27s everyday life, malware has posed serious threats to their security, making its detection of utmost concern. To protect legitimate users from the evolving malware attacks, machine learning-based systems have been successfully deployed and offer unparalleled flexibility in automatic malware detection. In most of these systems, resting on the analysis of different content-based features either statically or dynamically extracted from the file samples, various kinds of classifiers are constructed to detect malware. However, besides content-based features, file-to-file relations, such as file co-existence, can provide valuable information in malware detection and make evasion harder. To better understand the properties of file-to-file relations, we construct the file co-existence graph. Resting on the constructed graph, we investigate the semantic relatedness among files, and leverage graph inference, active learning and graph representation learning for malware detection. Comprehensive experimental results on the real sample collections from Comodo Cloud Security Center demonstrate the effectiveness of our proposed learning paradigms.
As machine learning-based detection systems become more widely deployed, the incentive for defeating them increases. Therefore, we go further insight into the arms race between adversarial malware attack and defense, and aim to enhance the security of machine learning-based malware detection systems. In particular, we first explore the adversarial attacks under different scenarios (i.e., different levels of knowledge the attackers might have about the targeted learning system), and define a general attack strategy to thoroughly assess the adversarial behaviors. Then, considering different skills and capabilities of the attackers, we propose the corresponding secure-learning paradigms to counter the adversarial attacks and enhance the security of the learning systems while not compromising the detection accuracy. We conduct a series of comprehensive experimental studies based on the real sample collections from Comodo Cloud Security Center and the promising results demonstrate the effectiveness of our proposed secure-learning models, which can be readily applied to other detection tasks
Network Representation Learning: From Traditional Feature Learning to Deep Learning
Network representation learning (NRL) is an effective graph analytics
technique and promotes users to deeply understand the hidden characteristics of
graph data. It has been successfully applied in many real-world tasks related
to network science, such as social network data processing, biological
information processing, and recommender systems. Deep Learning is a powerful
tool to learn data features. However, it is non-trivial to generalize deep
learning to graph-structured data since it is different from the regular data
such as pictures having spatial information and sounds having temporal
information. Recently, researchers proposed many deep learning-based methods in
the area of NRL. In this survey, we investigate classical NRL from traditional
feature learning method to the deep learning-based model, analyze relationships
between them, and summarize the latest progress. Finally, we discuss open
issues considering NRL and point out the future directions in this field
A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions
In recent decades, social network anonymization has become a crucial research
field due to its pivotal role in preserving users' privacy. However, the high
diversity of approaches introduced in relevant studies poses a challenge to
gaining a profound understanding of the field. In response to this, the current
study presents an exhaustive and well-structured bibliometric analysis of the
social network anonymization field. To begin our research, related studies from
the period of 2007-2022 were collected from the Scopus Database then
pre-processed. Following this, the VOSviewer was used to visualize the network
of authors' keywords. Subsequently, extensive statistical and network analyses
were performed to identify the most prominent keywords and trending topics.
Additionally, the application of co-word analysis through SciMAT and the
Alluvial diagram allowed us to explore the themes of social network
anonymization and scrutinize their evolution over time. These analyses
culminated in an innovative taxonomy of the existing approaches and
anticipation of potential trends in this domain. To the best of our knowledge,
this is the first bibliometric analysis in the social network anonymization
field, which offers a deeper understanding of the current state and an
insightful roadmap for future research in this domain.Comment: 73 pages, 28 figure
Robust graph neural networks via ensemble learning
Graph neural networks (GNNs) have demonstrated a remarkable ability in the task of semi-supervised node classification. However, most existing GNNs suffer from the nonrobustness issues, which poses a great challenge for applying GNNs into sensitive scenarios. Some researchers concentrate on constructing an ensemble model to mitigate the nonrobustness issues. Nevertheless, these methods ignore the interaction among base models, leading to similar graph representations. Moreover, due to the deterministic propagation applied in most existing GNNs, each node highly relies on its neighbors, leaving the nodes to be sensitive to perturbations. Therefore, in this paper, we propose a novel framework of graph ensemble learning based on knowledge passing (called GEL) to address the above issues. In order to achieve interaction, we consider the predictions of prior models as knowledge to obtain more reliable predictions. Moreover, we design a multilayer DropNode propagation strategy to reduce each node’s dependence on particular neighbors. This strategy also empowers each node to aggregate information from diverse neighbors, alleviating oversmoothing issues. We conduct experiments on three benchmark datasets, including Cora, Citeseer, and Pubmed. GEL outperforms GCN by more than 5% in terms of accuracy across all three datasets and also performs better than other state-of-the-art baselines. Extensive experimental results also show that the GEL alleviates the nonrobustness and oversmoothing issues. © 2022 by the authors. Licensee MDPI, Basel, Switzerland
Network representation learning: From traditional feature learning to deep learning
Network representation learning (NRL) is an effective graph analytics technique and promotes users to deeply understand the hidden characteristics of graph data. It has been successfully applied in many real-world tasks related to network science, such as social network data processing, biological information processing, and recommender systems. Deep Learning is a powerful tool to learn data features. However, it is non-trivial to generalize deep learning to graph-structured data since it is different from the regular data such as pictures having spatial information and sounds having temporal information. Recently, researchers proposed many deep learning-based methods in the area of NRL. In this survey, we investigate classical NRL from traditional feature learning method to the deep learning-based model, analyze relationships between them, and summarize the latest progress. Finally, we discuss open issues considering NRL and point out the future directions in this field. © 2020 Institute of Electrical and Electronics Engineers Inc.. All rights reserved
Development and evaluation of machine learning algorithms for biomedical applications
Gene network inference and drug response prediction are two important problems in computational biomedicine. The former helps scientists better understand the functional elements and regulatory circuits of cells. The latter helps a physician gain full understanding of the effective treatment on patients. Both problems have been widely studied, though current solutions are far from perfect. More research is needed to improve the accuracy of existing approaches.
This dissertation develops machine learning and data mining algorithms, and applies these algorithms to solve the two important biomedical problems. Specifically, to tackle the gene network inference problem, the dissertation proposes (i) new techniques for selecting topological features suitable for link prediction in gene networks; a graph sparsification method for network sampling; (iii) combined supervised and unsupervised methods to infer gene networks; and (iv) sampling and boosting techniques for reverse engineering gene networks. For drug sensitivity prediction problem, the dissertation presents (i) an instance selection technique and hybrid method for drug sensitivity prediction; (ii) a link prediction approach to drug sensitivity prediction; a noise-filtering method for drug sensitivity prediction; and (iv) transfer learning approaches for enhancing the performance of drug sensitivity prediction. Substantial experiments are conducted to evaluate the effectiveness and efficiency of the proposed algorithms. Experimental results demonstrate the feasibility of the algorithms and their superiority over the existing approaches
- …