
    Hypergraph Neural Networks

    In this paper, we present a hypergraph neural network (HGNN) framework for data representation learning that encodes high-order data correlations in a hypergraph structure. Confronting the challenge of learning representations for complex data in practice, we propose to model such data with a hypergraph, which is more flexible for data modeling, especially when dealing with complex data. In this method, a hyperedge convolution operation is designed to handle the data correlations during representation learning, so that the traditional hypergraph learning procedure can be carried out efficiently with hyperedge convolutions. HGNN learns hidden-layer representations that account for high-order data structure, providing a general framework for complex data correlations. We have conducted experiments on citation network classification and visual object recognition tasks, comparing HGNN with graph convolutional networks and other traditional methods. Experimental results demonstrate that the proposed HGNN outperforms recent state-of-the-art methods, and that it is superior to existing methods when dealing with multi-modal data. Comment: Accepted in AAAI'2019.
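
    A minimal NumPy sketch of the hyperedge convolution the abstract describes, following the propagation rule from the HGNN paper, $X' = D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2} X \Theta$; the toy incidence matrix, feature sizes, and weights below are illustrative assumptions, not the authors' released code.

        import numpy as np

        rng = np.random.default_rng(0)

        def hgnn_conv(X, H, Theta, w=None):
            # X: (n_nodes, d_in) features; H: (n_nodes, n_edges) incidence matrix;
            # Theta: (d_in, d_out) learnable weights; w: optional hyperedge weights.
            w = np.ones(H.shape[1]) if w is None else w
            Dv = (H * w).sum(axis=1)                 # node degrees
            De = H.sum(axis=0)                       # hyperedge degrees
            Dv_is = np.diag(1.0 / np.sqrt(Dv))
            # Node -> hyperedge -> node propagation with symmetric normalization.
            G = Dv_is @ H @ np.diag(w) @ np.diag(1.0 / De) @ H.T @ Dv_is
            return np.maximum(G @ X @ Theta, 0.0)    # ReLU

        # Toy hypergraph: 4 nodes, 2 hyperedges ({0,1,2} and {2,3}).
        H = np.array([[1, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
        X = rng.normal(size=(4, 8))
        Theta = rng.normal(size=(8, 16))
        print(hgnn_conv(X, H, Theta).shape)          # (4, 16)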

    Efficient Query-Based Attack against ML-Based Android Malware Detection under Zero Knowledge Setting

    The widespread adoption of the Android operating system has made malicious Android applications an appealing target for attackers. Machine learning-based (ML-based) Android malware detection (AMD) methods are crucial in addressing this problem; however, their vulnerability to adversarial examples raises concerns. Current attacks against ML-based AMD methods demonstrate remarkable performance but rely on strong assumptions that may not be realistic in real-world scenarios, e.g., knowledge of the feature space, model parameters, or training dataset. To address this limitation, we introduce AdvDroidZero, an efficient query-based attack framework against ML-based AMD methods that operates under the zero-knowledge setting. Our extensive evaluation shows that AdvDroidZero is effective against various mainstream ML-based AMD methods, in particular state-of-the-art methods and real-world antivirus solutions. Comment: To appear in the ACM Conference on Computer and Communications Security, November 2023.
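
    The abstract does not spell out AdvDroidZero's perturbation-selection machinery, so the sketch below shows only the generic shape of a zero-knowledge, query-based evasion loop: the attacker sees nothing but a prediction oracle and spends one query per candidate edit. The oracle, the binary feature encoding, and the set of semantics-preserving flips are all hypothetical stand-ins, not the paper's method.

        import numpy as np

        def query_based_evasion(predict, x, flippable, budget=100, seed=0):
            # predict: black-box oracle, 1 = malicious, 0 = benign (one query per call)
            # x: binary feature vector of the malware sample
            # flippable: indices the attacker may toggle without breaking the malware
            rng = np.random.default_rng(seed)
            x_adv = x.copy()
            for _ in range(budget):
                if predict(x_adv) == 0:              # the detector now says benign
                    return x_adv
                i = rng.choice(flippable)            # try one semantics-preserving edit
                x_adv[i] = 1 - x_adv[i]
            return None                              # query budget exhausted

        # Toy oracle: "detects" malware iff feature 3 is set.
        oracle = lambda v: int(v[3] == 1)
        sample = np.array([1, 0, 1, 1, 0])
        print(query_based_evasion(oracle, sample, flippable=[1, 3, 4]))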

    Static Semantics Reconstruction for Enhancing JavaScript-WebAssembly Multilingual Malware Detection

    The emergence of WebAssembly allows attackers to hide the malicious functionality of JavaScript malware in cross-language interoperations, termed JavaScript-WebAssembly multilingual malware (JWMM). However, existing anti-virus solutions based on static program analysis are still limited to monolingual code; as a result, their detection effectiveness decreases significantly against JWMM. Detecting JWMM is challenging due to the complex interoperations and semantic diversity between JavaScript and WebAssembly. To bridge this gap, we present JWBinder, the first technique aimed at enhancing the static detection of JWMM. JWBinder performs a language-specific data-flow analysis to capture the cross-language interoperations and then characterizes the functionality of JWMM through a unified high-level structure called the Inter-language Program Dependency Graph. An extensive evaluation on one of the most representative real-world anti-virus platforms, VirusTotal, shows that JWBinder effectively enhances anti-virus systems from various vendors, increasing the overall successful detection rate against JWMM from 49.1% to 86.2%. Additionally, we assess the side effects and runtime overhead of JWBinder, corroborating its practical viability in real-world applications. Comment: Accepted to ESORICS 2023.
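
    A toy illustration, using networkx, of the kind of unified structure the abstract calls an Inter-language Program Dependency Graph: statements from both languages become nodes, and cross-language calls become ordinary edges, so a data flow from a JavaScript download through a WebAssembly decoder back into eval becomes one traceable path. The node names and edge kinds are invented for illustration, not JWBinder's actual schema.

        import networkx as nx

        # Toy Inter-language Program Dependency Graph: statements from both
        # languages are nodes; "interop" edges mark calls across the JS/Wasm boundary.
        ipdg = nx.DiGraph()
        for node, lang in [("js:fetchPayload()", "js"),
                           ("js:instantiateModule()", "js"),
                           ("wasm:$decode", "wasm"),
                           ("js:eval(decoded)", "js")]:
            ipdg.add_node(node, lang=lang)

        ipdg.add_edge("js:fetchPayload()", "js:instantiateModule()", kind="data")
        ipdg.add_edge("js:instantiateModule()", "wasm:$decode", kind="interop")  # cross-language call
        ipdg.add_edge("wasm:$decode", "js:eval(decoded)", kind="data")           # bytes flow back to JS

        # A static detector can now follow a malicious data flow end to end,
        # even though it starts in JavaScript and passes through WebAssembly:
        for path in nx.all_simple_paths(ipdg, "js:fetchPayload()", "js:eval(decoded)"):
            print(" -> ".join(path))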

    OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving

    Visual Odometry (VO) plays a pivotal role in autonomous systems, with a principal challenge being the lack of depth information in camera images. This paper introduces OCC-VO, a novel framework that capitalizes on recent advances in deep learning to transform 2D camera images into 3D semantic occupancy, thereby circumventing the traditional need to estimate ego poses and landmark locations concurrently. Within this framework, we utilize TPV-Former to convert surround-view camera images into 3D semantic occupancy. To address the challenges presented by this transformation, we tailor a pose estimation and mapping algorithm that incorporates a Semantic Label Filter and a Dynamic Object Filter and, finally, utilizes a Voxel PFilter to maintain a consistent global semantic map. Evaluations on Occ3D-nuScenes not only showcase a 20.6% improvement in Success Ratio and a 29.6% enhancement in trajectory accuracy over ORB-SLAM3, but also emphasize our ability to construct a comprehensive map. Our implementation is open-sourced and available at: https://github.com/USTCLH/OCC-VO. Comment: 7 pages, 3 figures.
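
    A rough sketch of the registration idea behind occupancy-based odometry: drop dynamic-object voxels, then estimate the ego pose by rigidly aligning the remaining voxel centers to the global map. The paper's Voxel PFilter is probabilistic and does not assume known correspondences; this sketch substitutes a plain Kabsch alignment with known correspondences, and the class names are assumptions.

        import numpy as np

        DYNAMIC = {"car", "pedestrian"}            # assumed class names, for illustration

        def filter_voxels(points, labels):
            # Rough analogue of the Semantic Label / Dynamic Object Filters:
            # discard voxels belonging to dynamic classes before registration.
            keep = np.array([lab not in DYNAMIC for lab in labels])
            return points[keep]

        def kabsch(P, Q):
            # Least-squares rigid transform (R, t) with Q ~ R @ P + t, rows are points.
            cp, cq = P.mean(0), Q.mean(0)
            U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
            d = np.sign(np.linalg.det(Vt.T @ U.T))
            R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
            return R, cq - R @ cp

        rng = np.random.default_rng(1)
        pts = rng.normal(size=(50, 3))             # current-frame occupancy centers
        labels = ["road"] * 45 + ["car"] * 5
        pts = filter_voxels(pts, labels)           # 45 static voxels remain
        a = 0.3                                    # ground-truth ego rotation (yaw)
        R_true = np.array([[np.cos(a), -np.sin(a), 0],
                           [np.sin(a),  np.cos(a), 0],
                           [0,          0,         1]])
        map_pts = pts @ R_true.T + np.array([1.0, -2.0, 0.5])
        R, t = kabsch(pts, map_pts)
        print(np.allclose(R, R_true))              # True: pose recovered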

    Linear-Communication Asynchronous Complete Secret Sharing with Optimal Resilience

    Secure multiparty computation (MPC) allows a set of $n$ parties to jointly compute a function on their private inputs. In this work, we focus on information-theoretic MPC in the asynchronous network setting with optimal resilience ($t < n/3$). The best-known result in this setting is achieved by Choudhury and Patra [J. Cryptol '23], which requires $O(n^4\kappa)$ bits per multiplication gate, where $\kappa$ is the size of a field element. An asynchronous complete secret sharing (ACSS) protocol allows a dealer to share a batch of Shamir sharings such that all parties eventually receive their shares. ACSS is an important building block in asynchronous MPC (AMPC). The best-known ACSS result is also due to Choudhury and Patra [J. Cryptol '23] and requires $O(n^3\kappa)$ bits per sharing. In the synchronous setting, on the other hand, it is known that distributing Shamir sharings can be achieved with $O(n\kappa)$ bits per sharing, leaving a gap of $n^2$ in communication between the synchronous and asynchronous settings. Our work closes this gap by presenting the first ACSS protocol that achieves $O(n\kappa)$ bits per sharing. When combined with the compiler from ACSS to AMPC by Choudhury and Patra [IEEE Trans. Inf. Theory '17], we obtain an AMPC with $O(n^2\kappa)$ bits per multiplication gate, improving the previously best-known result by a factor of $n^2$. Moreover, with a concurrent work that improves that compiler by a factor of $n$, we obtain the first AMPC with $O(n\kappa)$ bits per multiplication gate.
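
    For reference, the Shamir sharings the abstract builds on can be sketched in a few lines: the dealer hides the secret in the constant term of a random degree-$t$ polynomial over a prime field and hands party $i$ the evaluation $f(i)$; any $t+1$ shares reconstruct the secret via Lagrange interpolation. This is only the plain (synchronous, honest-dealer) primitive, not the ACSS protocol itself, which must additionally guarantee that every honest party eventually receives a consistent share despite a malicious dealer and asynchrony.

        import random

        P = 2**31 - 1                         # prime modulus; all arithmetic is in F_P

        def share(secret, t, n, seed=0):
            # Sample a random degree-t polynomial f with f(0) = secret,
            # and give party i the share (i, f(i)).
            rng = random.Random(seed)
            coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]
            f = lambda x: sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
            return [(i, f(i)) for i in range(1, n + 1)]

        def reconstruct(shares):
            # Lagrange interpolation at x = 0 from any t+1 shares.
            secret = 0
            for xi, yi in shares:
                num = den = 1
                for xj, _ in shares:
                    if xj != xi:
                        num = num * (-xj) % P
                        den = den * (xi - xj) % P
                secret = (secret + yi * num * pow(den, P - 2, P)) % P
            return secret

        n, t = 7, 2                           # optimal resilience: t < n/3
        shares = share(42, t, n)
        print(reconstruct(shares[:t + 1]))    # 42, from any t+1 of the n shares
        print(reconstruct(shares[3:6]))       # 42 again, from a different subset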

    $P^3O$: Transferring Visual Representations for Reinforcement Learning via Prompting

    It is important for deep reinforcement learning (DRL) algorithms to transfer their learned policies to new environments with different visual inputs. In this paper, we introduce Prompt-based Proximal Policy Optimization ($P^3O$), a three-stage DRL algorithm that transfers visual representations from a target to a source environment by applying prompting. The process of $P^3O$ consists of three stages: pre-training, prompting, and predicting. In particular, we specify a prompt-transformer for representation conversion and propose a two-step training process to train the prompt-transformer for the target environment, while the rest of the DRL pipeline remains unchanged. We implement $P^3O$ and evaluate it on the OpenAI CarRacing video game. The experimental results show that $P^3O$ outperforms state-of-the-art visual transfer schemes; in particular, it allows the learned policies to perform well in environments with different visual inputs, which is much more effective than retraining the policies in those environments. Comment: Accepted for presentation at the IEEE International Conference on Multimedia & Expo (ICME) 2023.
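
    A speculative PyTorch sketch of the prompt-transformer idea: learnable prompt tokens are prepended to embedded observations from the new environment, and a small transformer maps them into the representation space the frozen, pretrained policy expects, so only the prompt module is trained. The module name, token counts, and sizes are assumptions, not the paper's exact architecture.

        import torch
        import torch.nn as nn

        class PromptTransformer(nn.Module):
            def __init__(self, dim=128, n_prompts=8, n_layers=2):
                super().__init__()
                self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
                layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, n_layers)

            def forward(self, patch_emb):              # patch_emb: (B, n_patches, dim)
                b = patch_emb.size(0)
                p = self.prompts.unsqueeze(0).expand(b, -1, -1)
                z = self.encoder(torch.cat([p, patch_emb], dim=1))
                return z[:, 0]                         # pooled representation for the policy

        # Only the prompt-transformer is trained; the pretrained policy stays frozen.
        policy_head = nn.Linear(128, 3)                # stand-in for the frozen source policy
        for param in policy_head.parameters():
            param.requires_grad_(False)

        pt = PromptTransformer()
        frames = torch.randn(4, 16, 128)               # stand-in for embedded target-env frames
        print(policy_head(pt(frames)).shape)           # torch.Size([4, 3])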

    Learning to Imagine: Visually-Augmented Natural Language Generation

    People often imagine relevant scenes to aid in the writing process. In this work, we aim to utilize visual information for composition in the same manner as humans. We propose LIVE, a method that makes pre-trained language models (PLMs) Learn to Imagine for Visually-augmented natural language gEneration. First, we imagine the scene based on the text: we use a diffusion model to synthesize high-quality images conditioned on the input text. Second, we use CLIP to determine, in a posterior way, whether the text can evoke the imagination. Finally, our imagination is dynamic: we conduct synthesis for each sentence rather than generating only one image for an entire paragraph. Technically, we propose a novel plug-and-play fusion layer to obtain visually-augmented representations for each text. Our vision-text fusion layer is compatible with Transformer-based architectures. We have conducted extensive experiments on four generation tasks using BART and T5, and both the automatic results and human evaluation demonstrate the effectiveness of our proposed method. We will release the code, model, and data at: https://github.com/RUCAIBox/LIVE. Comment: Accepted by ACL 2023.
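
    A minimal PyTorch sketch of a plug-and-play vision-text fusion layer in the spirit the abstract describes: token states cross-attend to per-sentence image features, and the result is added back through a zero-initialized gate, so the layer starts as an exact identity and can be slotted into BART/T5 blocks without disturbing the pretrained model. The dimensions and the gating choice are assumptions, not the released LIVE code.

        import torch
        import torch.nn as nn

        class VisionTextFusion(nn.Module):
            # Text tokens cross-attend to image features; a zero-initialized gate
            # makes the layer an exact identity at the start of training.
            def __init__(self, dim=768, nhead=8):
                super().__init__()
                self.attn = nn.MultiheadAttention(dim, nhead, batch_first=True)
                self.norm = nn.LayerNorm(dim)
                self.gate = nn.Parameter(torch.zeros(1))

            def forward(self, text_h, img_feats):
                # text_h: (B, T, dim) PLM token states; img_feats: (B, K, dim) image features
                visual, _ = self.attn(self.norm(text_h), img_feats, img_feats)
                return text_h + torch.tanh(self.gate) * visual

        fusion = VisionTextFusion()
        text_h = torch.randn(2, 20, 768)       # stand-in for BART/T5 hidden states
        img_feats = torch.randn(2, 49, 768)    # stand-in for features of a synthesized image
        print(torch.equal(fusion(text_h, img_feats), text_h))  # True at initialization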