Hypergraph Neural Networks
In this paper, we present a hypergraph neural network (HGNN) framework for
data representation learning, which can encode high-order data correlation in a
hypergraph structure. Confronting the challenges of learning representations for
complex data in real practice, we propose to incorporate such data structure in
a hypergraph, which is more flexible in data modeling, especially when dealing
with complex data. In this method, a hyperedge convolution operation is
designed to handle the data correlation during representation learning. In this
way, the traditional hypergraph learning procedure can be conducted efficiently
using hyperedge convolution operations. HGNN is able to learn hidden-layer
representations that take the high-order data structure into account, and it is
a general framework for modeling complex data correlations. We have conducted
experiments on citation network classification and visual object recognition
tasks and compared HGNN with graph convolutional networks and other traditional
methods. Experimental results demonstrate that the proposed HGNN method
outperforms recent state-of-the-art methods. The results also show that the
proposed HGNN is superior to existing methods when dealing with multi-modal data.
Comment: Accepted in AAAI'201
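The hyperedge convolution can be illustrated with a small dependency-free sketch. It assumes the layer form commonly reported for HGNN, X' = σ(Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Θ), which this abstract does not spell out; the function names and the choice of ReLU for σ are illustrative, and a practical implementation would use a tensor library with sparse incidence matrices.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product of two lists-of-rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hyperedge_conv(x: Matrix, h: Matrix, theta: Matrix,
                   w: Optional[List[float]] = None) -> Matrix:
    """One hyperedge convolution layer (sketch):
    ReLU(Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta).

    x: n x d node features; h: n x m incidence matrix (h[v][e] = 1 if node v
    lies in hyperedge e); theta: d x d' weights; w: m hyperedge weights
    (all 1 by default). Assumes no isolated nodes and no empty hyperedges.
    """
    n, m = len(h), len(h[0])
    w = w if w is not None else [1.0] * m
    # Vertex degree d(v) = sum_e w(e) h(v, e); hyperedge degree delta(e) = sum_v h(v, e).
    dv = [sum(w[e] * h[v][e] for e in range(m)) for v in range(n)]
    de = [sum(h[v][e] for v in range(n)) for e in range(m)]
    # n x n propagation matrix: node u receives features from node v through
    # every hyperedge they share, normalised by hyperedge and vertex degrees.
    p = [[sum(h[u][e] * w[e] * h[v][e] / de[e] for e in range(m)) / math.sqrt(dv[u] * dv[v])
          for v in range(n)] for u in range(n)]
    out = matmul(matmul(p, x), theta)
    return [[max(0.0, val) for val in row] for row in out]  # ReLU non-linearity
```

For example, with a single hyperedge containing three nodes whose features are 1, 2 and 3 and Θ = [[1]], every node's output is their mean, 2.0 (up to floating-point rounding).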
Efficient Query-Based Attack against ML-Based Android Malware Detection under Zero Knowledge Setting
The widespread adoption of the Android operating system has made malicious
Android applications an appealing target for attackers. Machine learning-based
(ML-based) Android malware detection (AMD) methods are crucial in addressing
this problem; however, their vulnerability to adversarial examples raises
concerns. Current attacks against ML-based AMD methods demonstrate remarkable
performance but rely on strong assumptions that may not hold in real-world
scenarios, e.g., knowledge of the feature space, model parameters, and training
dataset. To address this limitation, we introduce AdvDroidZero, an efficient
query-based attack framework against ML-based AMD methods that operates under
the zero-knowledge setting. Our extensive evaluation shows that AdvDroidZero is
effective against various mainstream ML-based AMD methods, including
state-of-the-art methods and real-world antivirus solutions.
Comment: To Appear in the ACM Conference on Computer and Communications
Security, November, 202
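To make "query-based attack under zero knowledge" concrete, here is a minimal sketch of a generic hard-label random search. It is not AdvDroidZero's search strategy; it only assumes the attacker can submit app variants and observe the detector's verdict. Apps are abstracted as sets of feature strings, and `is_malicious`, the perturbation names and the budget are hypothetical.

```python
import random
from typing import Callable, FrozenSet, Optional, Sequence, Tuple


def random_query_attack(
    app: FrozenSet[str],
    perturbations: Sequence[str],
    is_malicious: Callable[[FrozenSet[str]], bool],
    budget: int = 5,
    max_queries: int = 200,
    seed: int = 0,
) -> Tuple[Optional[FrozenSet[str]], int]:
    """Hard-label, query-only random search for an evading variant of `app`.

    The attacker sees only the detector's verdict (no feature space, weights
    or training data). Perturbations are only ever added, so the app's
    original behaviour is preserved.
    Returns (evading variant or None, number of queries used).
    """
    if not perturbations:
        return None, 0
    rng = random.Random(seed)
    queries = 0
    while queries < max_queries:
        # Pick between 1 and `budget` perturbations to add in this attempt.
        k = rng.randint(1, min(budget, len(perturbations)))
        candidate = frozenset(app) | frozenset(rng.sample(list(perturbations), k))
        queries += 1
        if not is_malicious(candidate):
            return candidate, queries
    return None, queries
```

Random search is the simplest query-only baseline; a query-efficient attack would instead steer which perturbations to try based on earlier verdicts.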
Static Semantics Reconstruction for Enhancing JavaScript-WebAssembly Multilingual Malware Detection
The emergence of WebAssembly allows attackers to hide the malicious
functionalities of JavaScript malware in cross-language interoperations, termed
JavaScript-WebAssembly multilingual malware (JWMM). However, existing
anti-virus solutions based on static program analysis are still limited to
monolingual code. As a result, their detection effectiveness decreases
significantly against JWMM. The detection of JWMM is challenging due to the
complex interoperations and semantic diversity between JavaScript and
WebAssembly. To bridge this gap, we present JWBinder, the first technique aimed
at enhancing the static detection of JWMM. JWBinder performs a
language-specific data-flow analysis to capture the cross-language
interoperations and then characterizes the functionalities of JWMM through a
unified high-level structure called Inter-language Program Dependency Graph.
An extensive evaluation on one of the most representative real-world
anti-virus platforms, VirusTotal, shows that JWBinder effectively enhances
anti-virus systems from various vendors and increases the overall successful
detection rate against JWMM from 49.1% to 86.2%. Additionally, we assess the
side effects and runtime overhead of JWBinder, corroborating its practical
viability in real-world applications.
Comment: Accepted to ESORICS 202
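The idea of a unified cross-language graph can be sketched as a toy data structure in which JavaScript and WebAssembly nodes share one graph, so a flow that leaves JavaScript, passes through a Wasm function and comes back can be found with ordinary reachability. The `IPDG` class, node names and edge semantics below are hypothetical simplifications, not JWBinder's actual Inter-language Program Dependency Graph.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class IPDG:
    """Toy inter-language dependency graph: nodes are tagged "js" or "wasm"."""
    lang: Dict[str, str] = field(default_factory=dict)
    succ: Dict[str, Set[str]] = field(default_factory=dict)

    def add_node(self, name: str, lang: str) -> None:
        self.lang[name] = lang
        self.succ.setdefault(name, set())

    def add_edge(self, src: str, dst: str) -> None:
        # Data flows from src to dst (e.g. an argument or a return value).
        self.succ[src].add(dst)

    def cross_language_edges(self) -> Set[Tuple[str, str]]:
        """Edges whose endpoints live in different languages."""
        return {(s, d) for s, ds in self.succ.items() for d in ds
                if self.lang[s] != self.lang[d]}

    def reaches(self, src: str, dst: str) -> bool:
        """Breadth-first search: does data from src flow (transitively) into dst?"""
        seen, queue = {src}, deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                return True
            for nxt in self.succ.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False
```

Usage: a payload fetched in JavaScript, decoded in Wasm and handed back to `eval` shows up as one path that neither a JS-only nor a Wasm-only analysis would see in full.

```python
g = IPDG()
for name, lang in [("js:fetch", "js"), ("wasm:decode", "wasm"), ("js:eval", "js")]:
    g.add_node(name, lang)
g.add_edge("js:fetch", "wasm:decode")  # JS passes downloaded bytes into a Wasm export
g.add_edge("wasm:decode", "js:eval")   # Wasm returns a string that JS evaluates
print(g.reaches("js:fetch", "js:eval"))
```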
OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving
Visual Odometry (VO) plays a pivotal role in autonomous systems, with a
principal challenge being the lack of depth information in camera images. This
paper introduces OCC-VO, a novel framework that capitalizes on recent advances
in deep learning to transform 2D camera images into 3D semantic occupancy,
thereby circumventing the traditional need for concurrent estimation of ego
poses and landmark locations. Within this framework, we utilize the TPV-Former
to convert images from surround-view cameras into 3D semantic occupancy. To
address the challenges presented by this transformation, we have specifically
tailored a pose estimation and mapping algorithm that incorporates a Semantic
Label Filter and a Dynamic Object Filter, and finally utilizes a Voxel PFilter
to maintain a consistent global semantic map. Evaluations on Occ3D-nuScenes
not only show a 20.6% improvement in Success Ratio and a 29.6% improvement
in trajectory accuracy over ORB-SLAM3, but also demonstrate the ability to
construct a comprehensive map. Our implementation is open-sourced and available
at: https://github.com/USTCLH/OCC-VO.
Comment: 7 pages, 3 figures
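A rough sketch of how filtered semantic occupancy might be fused into a global map: dynamic classes are dropped, remaining voxels are transformed with the estimated pose, and each map cell keeps a majority-vote label. The class list, the yaw-only pose and the voting rule are assumptions for illustration; this is not the paper's Semantic Label Filter, Dynamic Object Filter or Voxel PFilter.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

Point = Tuple[float, float, float]
Pose = Tuple[float, float, float, float]  # (tx, ty, tz, yaw): simplified planar pose (assumption)

DYNAMIC_CLASSES = {"car", "truck", "pedestrian", "bicycle"}  # hypothetical label set


def to_world(p: Point, pose: Pose) -> Point:
    """Rotate a vehicle-frame point by yaw about the z axis, then translate."""
    x, y, z = p
    tx, ty, tz, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)


class SemanticVoxelMap:
    def __init__(self, voxel_size: float = 0.5) -> None:
        self.voxel_size = voxel_size
        # voxel index -> label counts accumulated over all frames
        self.votes: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)

    def _key(self, p: Point) -> Tuple[int, ...]:
        return tuple(math.floor(v / self.voxel_size) for v in p)

    def integrate(self, frame: Iterable[Tuple[Point, str]], pose: Pose) -> None:
        """Add one frame of labelled occupied voxel centres, skipping dynamic classes."""
        for point, label in frame:
            if label in DYNAMIC_CLASSES:  # moving objects would corrupt a static map
                continue
            self.votes[self._key(to_world(point, pose))][label] += 1

    def label_at(self, p: Point) -> Optional[str]:
        """Majority-vote label of the map voxel containing world point p, if any."""
        votes = self.votes.get(self._key(p))
        return votes.most_common(1)[0][0] if votes else None
```

Voting over many frames smooths out per-frame label noise, which is one simple way to keep a global semantic map consistent.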
Linear-Communication Asynchronous Complete Secret Sharing with Optimal Resilience
Secure multiparty computation (MPC) allows a set of $n$ parties to jointly compute a function on their private inputs. In this work, we focus on information-theoretic MPC in the \emph{asynchronous network} setting with optimal resilience ($t < n/3$). The best-known result in this setting is achieved by Choudhury and Patra [J. Cryptol '23], which requires $O(n^4\kappa)$ bits per multiplication gate, where $\kappa$ is the size of a field element.
An asynchronous complete secret sharing (ACSS) protocol allows a dealer to share a batch of Shamir sharings such that all parties eventually receive their shares. ACSS is an important building block in asynchronous MPC (AMPC). The best-known result for ACSS is due to Choudhury and Patra [J. Cryptol '23], which requires $O(n^3\kappa)$ bits per sharing. In the synchronous setting, on the other hand, distributing Shamir sharings is known to be achievable with $O(n\kappa)$ bits per sharing. This leaves a gap of $n^2$ in communication between the synchronous and asynchronous settings.
Our work closes this gap by presenting the first ACSS protocol that achieves $O(n\kappa)$ bits per sharing. When combined with the compiler from ACSS to AMPC by Choudhury and Patra [IEEE Trans. Inf. Theory '17], we obtain an AMPC with $O(n^2\kappa)$ bits per multiplication gate, improving the previously best-known result by a factor of $n^2$. Moreover, combined with a concurrent work that improves the compiler of Choudhury and Patra by a factor of $n$, we obtain the first AMPC with $O(n\kappa)$ bits per multiplication gate.
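For readers unfamiliar with Shamir sharings, the sketch below shows a single degree-$t$ sharing over a prime field and reconstruction by Lagrange interpolation, with $n = 4$ and $t = 1$ so that $t < n/3$. It omits everything that makes ACSS hard (asynchrony, a possibly corrupt dealer, share verification, batching); the field size $2^{61}-1$ and the function names are arbitrary choices for illustration.

```python
import random
from typing import List, Optional, Tuple

P = 2**61 - 1  # a Mersenne prime, used here as the field size (arbitrary choice)


def share(secret: int, n: int, t: int, rng: Optional[random.Random] = None) -> List[Tuple[int, int]]:
    """Split `secret` into n Shamir shares using a random degree-t polynomial f with f(0) = secret."""
    rng = rng or random.Random()
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, highest-degree coefficient first
            acc = (acc * x + c) % P
        return acc

    # Party i receives the evaluation point i and the value f(i).
    return [(i, f(i)) for i in range(1, n + 1)]


def reconstruct(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0; needs at least t + 1 correct shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for k, (xk, _) in enumerate(shares):
            if k != j:
                num = num * (-xk) % P
                den = den * (xj - xk) % P
        secret = (secret + yj * num * pow(den, -1, P)) % P  # pow(..., -1, P): modular inverse (Python 3.8+)
    return secret
```

Any $t + 1 = 2$ of the four shares recover the secret, while a single share reveals nothing about it; an ACSS protocol must additionally guarantee that all honest parties end up with consistent shares of this form even when messages are arbitrarily delayed.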
$P^{3}O$: Transferring Visual Representations for Reinforcement Learning via Prompting
It is important for deep reinforcement learning (DRL) algorithms to transfer
their learned policies to new environments that have different visual inputs.
In this paper, we introduce Prompt based Proximal Policy Optimization
($P^{3}O$), a three-stage DRL algorithm that transfers visual representations
from a target to a source environment by applying prompting. The process of
$P^{3}O$ consists of three stages: pre-training, prompting, and predicting. In
particular, we specify a prompt-transformer for representation conversion and
propose a two-step training process to train the prompt-transformer for the
target environment, while the rest of the DRL pipeline remains unchanged. We
implement $P^{3}O$ and evaluate it on the OpenAI CarRacing video game. The
experimental results show that $P^{3}O$ outperforms the state-of-the-art visual
transferring schemes. In particular, $P^{3}O$ allows the learned policies to
perform well in environments with different visual inputs, which is much more
effective than retraining the policies in these environments.
Comment: This paper has been accepted to be presented at the upcoming IEEE
International Conference on Multimedia & Expo (ICME) in 202
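The prompting idea (train only a small module that maps target observations into the source representation while the pre-trained policy stays frozen) can be sketched with a per-dimension affine map fitted by least squares on paired observations. This is a hypothetical stand-in for the prompt-transformer and its two-step training, which the abstract does not detail; `frozen_policy` and the toy observations are illustrative.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

Obs = Sequence[float]


def fit_affine_prompt(target_obs: List[Obs], source_obs: List[Obs]) -> List[Tuple[float, float]]:
    """Per dimension, fit (a, b) minimising sum (a * t + b - s)^2 over paired observations."""
    params = []
    for d in range(len(target_obs[0])):
        ts = [o[d] for o in target_obs]
        ss = [o[d] for o in source_obs]
        mt, ms = fmean(ts), fmean(ss)
        var = sum((t - mt) ** 2 for t in ts)
        cov = sum((t - mt) * (s - ms) for t, s in zip(ts, ss))
        a = cov / var if var else 1.0  # constant dimension: fall back to a pure shift
        params.append((a, ms - a * mt))
    return params


def apply_prompt(params: List[Tuple[float, float]], obs: Obs) -> List[float]:
    """Map a target-environment observation into the source representation."""
    return [a * x + b for (a, b), x in zip(params, obs)]


def frozen_policy(obs: Obs) -> str:
    """Hypothetical pre-trained source-domain policy; never updated during prompting."""
    return "left" if obs[0] < 0.5 else "right"
```

Usage: if target frames are a brightness/contrast shift of source frames (here target = 2 * source + 1), the raw target observation `[1.0, 3.0]` makes the frozen policy choose "right", whereas the prompted observation maps back to roughly `[0.0, 1.0]` and yields "left", as the source observation does. Only the prompt parameters are learned, which is the property the abstract highlights: the rest of the pipeline is reused unchanged.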
Learning to Imagine: Visually-Augmented Natural Language Generation
People often imagine relevant scenes to aid in the writing process. In this
work, we aim to utilize visual information for composition in the same manner
as humans. We propose a method, LIVE, that makes pre-trained language models
(PLMs) Learn to Imagine for Visually-augmented natural language gEneration.
First, we imagine the scene based on the text: we use a diffusion model to
synthesize high-quality images conditioned on the input texts. Second, we use
CLIP to determine, in a posterior manner, whether the text can evoke the
imagination. Finally, our imagination is dynamic: we conduct synthesis for each
sentence rather than generating only one image for an entire paragraph.
Technically, we propose a novel plug-and-play fusion layer to obtain
visually-augmented representations for each text. Our vision-text fusion layer
is compatible with Transformer-based architectures. We have conducted extensive
experiments on four generation tasks using BART and T5, and both automatic and
human evaluations demonstrate the effectiveness of our proposed method. We will
release the code, model, and data at https://github.com/RUCAIBox/LIVE.
Comment: Accepted by ACL 202
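The per-sentence "imagine, then check with CLIP" loop can be sketched as follows. `synthesize`, `embed_text` and `embed_image` are hypothetical stand-ins for a diffusion model and CLIP's text and image encoders, the 0.3 threshold is arbitrary, and `fuse` is a toy gated addition rather than LIVE's plug-and-play fusion layer.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, as used to compare CLIP-style text and image embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def imagine_per_sentence(
    sentences: List[str],
    synthesize: Callable[[str], Vector],
    embed_text: Callable[[str], Vector],
    embed_image: Callable[[Vector], Vector],
    threshold: float = 0.3,
) -> List[Optional[Vector]]:
    """Synthesize one image per sentence; keep it only if it matches the sentence well enough."""
    kept: List[Optional[Vector]] = []
    for s in sentences:
        img = synthesize(s)
        score = cosine(embed_text(s), embed_image(img))
        kept.append(img if score >= threshold else None)  # posterior check
    return kept


def fuse(text_vec: Vector, image_vec: Optional[Vector], gate: float = 0.5) -> Vector:
    """Toy fusion: add a gated image vector; fall back to text only when no image was kept."""
    if image_vec is None:
        return list(text_vec)
    return [t + gate * v for t, v in zip(text_vec, image_vec)]
```

Returning `None` for rejected sentences lets the downstream model run text-only where the imagination does not help, which mirrors the abstract's point that imagery is used only when the text can evoke it.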