249 research outputs found

    A-JEPA: Joint-Embedding Predictive Architecture Can Listen

    Full text link
    This paper shows that the masked-modeling principle driving the success of large foundational vision models can be effectively applied to audio by making predictions in a latent space. We introduce the Audio-based Joint-Embedding Predictive Architecture (A-JEPA), a simple extension method for self-supervised learning from audio spectrograms. Following the design of I-JEPA, A-JEPA encodes visible audio spectrogram patches with a curriculum masking strategy via a context encoder and predicts the representations of regions sampled at well-designed locations. The target representations of those regions are extracted by an exponential moving average of the context encoder, i.e., the target encoder, applied to the whole spectrogram. We find it beneficial to move from random block masking to time-frequency-aware masking in a curriculum manner, given the strong local correlations in time and frequency in audio spectrograms. To enhance contextual semantic understanding and robustness, we fine-tune the encoder with regularized masking on target datasets, instead of dropping inputs or setting them to zero. Empirically, when built on the Vision Transformer structure, A-JEPA is highly scalable and sets new state-of-the-art performance on multiple audio and speech classification tasks, outperforming other recent models that use externally supervised pre-training. Comment: arXiv admin note: text overlap with arXiv:2207.06405 by other authors
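
    A minimal sketch of two ingredients the abstract describes, assuming a PyTorch setting: the target encoder maintained as an exponential moving average of the context encoder, and a contiguous time-frequency block mask over spectrogram patches. Function and parameter names are illustrative assumptions, not the authors' code.

    ```python
    # Sketch only: EMA target-encoder update and a time-frequency block mask
    # over a grid of spectrogram patches (True = masked / to be predicted).
    import torch

    @torch.no_grad()
    def ema_update(target_encoder, context_encoder, momentum=0.999):
        """Keep the target encoder as an exponential moving average of the context encoder."""
        for t, c in zip(target_encoder.parameters(), context_encoder.parameters()):
            t.mul_(momentum).add_(c, alpha=1.0 - momentum)

    def time_frequency_mask(n_time, n_freq, t_span, f_span):
        """Mask a contiguous time-frequency block of patch positions."""
        mask = torch.zeros(n_time, n_freq, dtype=torch.bool)
        t0 = torch.randint(0, n_time - t_span + 1, (1,)).item()
        f0 = torch.randint(0, n_freq - f_span + 1, (1,)).item()
        mask[t0:t0 + t_span, f0:f0 + f_span] = True
        return mask
    ```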

    Electronic-Mechanical Coupling in Graphene from in situ Nanoindentation Experiments and Multiscale Atomistic Simulations

    Get PDF
    We present in situ nanoindentation experiments performed on suspended graphene devices to introduce homogeneous tensile strain while simultaneously carrying out electrical measurements. We find that the electrical resistance shows only a marginal change even under severe strain, and the electronic transport measurements confirm that no band gap opens in graphene under moderate uniform strain, consistent with our results from first-principles-informed molecular dynamics simulations.

    Uncover the Premeditated Attacks: Detecting Exploitable Reentrancy Vulnerabilities by Identifying Attacker Contracts

    Full text link
    Reentrancy, a notorious vulnerability in smart contracts, has led to millions of dollars in financial loss. However, current smart contract vulnerability detection tools suffer from a high false positive rate in identifying contracts with reentrancy vulnerabilities. Moreover, only a small portion of the detected reentrant contracts can actually be exploited by hackers, making these tools less effective in securing the Ethereum ecosystem in practice. In this paper, we propose BlockWatchdog, a tool that focuses on detecting reentrancy vulnerabilities by identifying attacker contracts. These attacker contracts are deployed by hackers to exploit vulnerable contracts automatically. By focusing on attacker contracts, BlockWatchdog effectively detects truly exploitable reentrancy vulnerabilities by identifying reentrant call flow. Additionally, BlockWatchdog is capable of detecting new types of reentrancy vulnerabilities caused by poor designs when using ERC tokens or user-defined interfaces, which cannot be detected by current rule-based tools. We implement BlockWatchdog using cross-contract static dataflow techniques based on attack logic obtained from an empirical study that analyzes attacker contracts from 281 attack incidents. BlockWatchdog is evaluated on 421,889 Ethereum contract bytecodes and identifies 113 attacker contracts that target 159 victim contracts, leading to the theft of Ether and tokens valued at approximately 908.6 million USD. Notably, only 18 of the identified 159 victim contracts can be reported by current reentrancy detection tools. Comment: Accepted by ICSE 202
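
    The core signal the abstract describes, a reentrant call flow that reaches back into a contract whose earlier call has not yet returned, can be illustrated with a small sketch; the trace format and function name below are assumptions, not BlockWatchdog's actual implementation.

    ```python
    # Sketch only: flag a reentrant call flow in a recorded call trace, i.e. a
    # contract being entered again while one of its earlier calls is still open.
    def has_reentrant_flow(trace):
        """trace: list of ("enter" | "exit", contract_address) events."""
        stack = []
        for event, addr in trace:
            if event == "enter":
                if addr in stack:      # re-entered before the earlier call returned
                    return True
                stack.append(addr)
            else:                      # "exit"
                stack.pop()
        return False

    # Victim V calls attacker contract A, which calls back into V: reentrancy.
    print(has_reentrant_flow([("enter", "V"), ("enter", "A"), ("enter", "V")]))  # True
    ```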

    Scalable Diffusion Models with State Space Backbone

    Full text link
    This paper presents a new exploration into a category of diffusion models built upon a state space architecture. We endeavor to train diffusion models for image data in which the traditional U-Net backbone is supplanted by a state space backbone operating on raw patches or latent space. Given the notable efficacy of state space models in accommodating long-range dependencies, Diffusion State Space Models (DiS) are distinguished by treating all inputs, including time, condition, and noisy image patches, as tokens. Our assessment of DiS encompasses both unconditional and class-conditional image generation scenarios, revealing that DiS exhibits comparable, if not superior, performance to CNN-based or Transformer-based U-Net architectures of commensurate size. Furthermore, we analyze the scalability of DiS, gauged by the forward pass complexity quantified in Gflops. DiS models with higher Gflops, achieved through augmentation of depth/width or of input tokens, consistently demonstrate lower FID. In addition to demonstrating commendable scalability characteristics, DiS-H/2 models in latent space achieve performance levels akin to prior diffusion models on class-conditional ImageNet benchmarks at resolutions of 256×256 and 512×512, while significantly reducing the computational burden. The code and models are available at: https://github.com/feizc/DiS
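
    As a rough illustration of the "all inputs as tokens" design the abstract describes, the sketch below patchifies the noisy (latent) image and prepends timestep and class-condition tokens before handing the sequence to a backbone; module names and dimensions are assumptions, not the released DiS code.

    ```python
    # Sketch only: build the token sequence (time token, condition token, patch
    # tokens) that a state-space or transformer backbone would consume.
    import torch
    import torch.nn as nn

    class TokenizedDiffusionInput(nn.Module):
        def __init__(self, channels=4, patch=2, dim=768, num_classes=1000):
            super().__init__()
            self.patchify = nn.Conv2d(channels, dim, kernel_size=patch, stride=patch)
            self.time_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
            self.class_embed = nn.Embedding(num_classes, dim)

        def forward(self, x_noisy, t, y):
            patches = self.patchify(x_noisy).flatten(2).transpose(1, 2)    # (B, N, dim)
            t_tok = self.time_embed(t.float().unsqueeze(-1)).unsqueeze(1)  # (B, 1, dim)
            y_tok = self.class_embed(y).unsqueeze(1)                       # (B, 1, dim)
            return torch.cat([t_tok, y_tok, patches], dim=1)               # token sequence
    ```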

    On the Trustworthiness Landscape of State-of-the-art Generative Models: A Comprehensive Survey

    Full text link
    Diffusion models and large language models have emerged as leading-edge generative models and have had a revolutionary impact on various aspects of human life. However, the practical implementation of these models has also exposed inherent risks, highlighting their dual nature and raising concerns regarding their trustworthiness. Despite the abundance of literature on this subject, a comprehensive survey specifically delving into the intersection of large-scale generative models and their trustworthiness remains largely absent. To bridge this gap, this paper investigates both the long-standing and emerging threats associated with these models across four fundamental dimensions: privacy, security, fairness, and responsibility. In this way, we construct an extensive map outlining the trustworthiness of these models, while also providing practical recommendations and identifying future directions. These efforts are crucial for promoting the trustworthy deployment of these models, ultimately benefiting society as a whole. Comment: draft version

    Interfacial energy release rates of SiN/GaAs film/substrate systems determined using a cyclic loading dual-indentation method

    Get PDF
    Our previous study developed a dual-indentation method for measuring the interfacial energy release rate, Gin, of SiN/GaAs film/substrate systems. However, for film/substrate systems with relatively high interfacial toughness, the dual-indentation method was unable to generate interfacial delamination. In this study, a cyclic loading dual-indentation method was proposed, in which the first monotonic loading of the dual-indentation method was replaced by cyclic loading. Cyclic loading was demonstrated to be effective at inducing delamination in relatively "tough" SiN/GaAs interfaces that could not be delaminated by the dual-indentation method. The Gin values obtained from the cyclic loading indentation were in good agreement with those obtained from the dual-indentation method for the less tough interfaces. The delamination mechanism in the cyclic loading indentation was attributed to the hardening effect on the films induced by cyclic loading, which permits sufficient elastic strain energy to accumulate to initiate delamination.

    To Healthier Ethereum: A Comprehensive and Iterative Smart Contract Weakness Enumeration

    Full text link
    With the increasing popularity of cryptocurrencies and blockchain technology, smart contracts have become a prominent feature in developing decentralized applications. However, these smart contracts are susceptible to vulnerabilities that hackers can exploit, resulting in significant financial losses. In response to this growing concern, various initiatives have emerged. Notably, the SWC vulnerability list played an important role in raising awareness and understanding of smart contract weaknesses. However, the SWC list lacks maintenance and has not been updated with new vulnerabilities since 2020. To address this gap, this paper introduces the Smart Contract Weakness Enumeration (SWE), a comprehensive and practical vulnerability list up until 2023. We collect 273 vulnerability descriptions from 86 top conference papers and journal papers, employing open card sorting techniques to deduplicate and categorize these descriptions. This process results in the identification of 40 common contract weaknesses, which are further classified into 20 sub-research fields through thorough discussion and analysis. SWE provides a systematic and comprehensive list of smart contract vulnerabilities, covering existing and emerging vulnerabilities in the last few years. Moreover, SWE is a scalable, continuously iterative program. We propose two update mechanisms for the maintenance of SWE. Regular updates involve the inclusion of new vulnerabilities from future top papers, while irregular updates enable individuals to report new weaknesses for review and potential addition to SWE.

    On the Robustness of Split Learning against Adversarial Attacks

    Full text link
    Split learning enables collaborative deep learning model training while preserving data privacy and model security by avoiding direct sharing of raw data and model details (i.e., the server and clients only hold partial sub-networks and exchange intermediate computations). However, existing research has mainly focused on examining its reliability for privacy protection, with little investigation into model security. Specifically, by exploiting access to full models, attackers can launch adversarial attacks, and split learning can mitigate this severe threat by disclosing only part of the model to untrusted servers. This paper aims to evaluate the robustness of split learning against adversarial attacks, particularly in the most challenging setting where untrusted servers only have access to the intermediate layers of the model. Existing adversarial attacks mostly target the centralized setting rather than the collaborative setting; thus, to better evaluate the robustness of split learning, we develop a tailored attack called SPADV, which comprises two stages: 1) shadow model training, which addresses the issue of lacking part of the model, and 2) a local adversarial attack that produces adversarial examples for evaluation. The first stage requires only a small amount of unlabeled non-IID data, and in the second stage, SPADV perturbs the intermediate output of natural samples to craft adversarial ones. The overall cost of the proposed attack is relatively low, yet its empirical effectiveness is significantly high, demonstrating the surprising vulnerability of split learning to adversarial attacks. Comment: accepted by ECAI 2023, camera-ready version
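
    A hedged sketch of the second stage as the abstract describes it, perturbing the intermediate output of a natural sample so that the remaining layers of a trained shadow model misclassify it; the function, its arguments, and the PGD-style update are illustrative assumptions rather than the paper's implementation.

    ```python
    # Sketch only: craft an adversarial intermediate activation by gradient
    # ascent on the shadow model's remaining layers (shadow_tail).
    import torch

    def perturb_intermediate(shadow_tail, z_natural, label, eps=0.1, steps=10, lr=0.01):
        z_adv = z_natural.clone().detach().requires_grad_(True)
        loss_fn = torch.nn.CrossEntropyLoss()
        for _ in range(steps):
            loss = loss_fn(shadow_tail(z_adv), label)
            loss.backward()
            with torch.no_grad():
                z_adv += lr * z_adv.grad.sign()                          # increase the loss
                z_adv.copy_(torch.max(torch.min(z_adv, z_natural + eps),
                                      z_natural - eps))                  # stay in the eps-ball
            z_adv.grad.zero_()
        return z_adv.detach()
    ```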

    Progressive Denoising Model for Fine-Grained Text-to-Image Generation

    Full text link
    Recently, vector quantized autoregressive (VQ-AR) models have shown remarkable results in text-to-image synthesis by predicting discrete image tokens from the top left to the bottom right of the latent space, treating every position equally. Although this simple generative process works surprisingly well, is it the best way to generate an image? For instance, human creation tends to proceed from the outline of an image to its fine details, whereas VQ-AR models do not consider the relative importance of each component. In this paper, we present a progressive denoising model for high-fidelity text-to-image generation. The proposed method creates new image tokens from coarse to fine based on the existing context in a parallel manner, and this procedure is applied recursively until an image sequence is completed. The resulting coarse-to-fine hierarchy makes the image generation process intuitive and interpretable. Extensive experiments demonstrate that the progressive model achieves significantly better FID scores than the previous VQ-AR method across a wide variety of categories and aspects. Moreover, the text-to-image generation time of traditional AR models increases linearly with the output image resolution and is hence quite time-consuming even for normal-size images. In contrast, our approach achieves a better trade-off between generation quality and speed. Comment: Technical report. arXiv admin note: text overlap with arXiv:2206.10789 by other authors
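
    The coarse-to-fine, parallel token filling described above resembles the following sketch, in which each step predicts all remaining tokens at once and keeps only the most confident ones until the sequence is complete; the predictor interface, mask token, and keep schedule are assumptions, not the paper's code.

    ```python
    # Sketch only: progressive parallel refinement of a discrete token sequence.
    import torch

    def progressive_generate(predict_logits, seq_len, steps=8, mask_id=-1):
        """predict_logits(tokens) -> (1, seq_len, vocab_size) logits."""
        tokens = torch.full((1, seq_len), mask_id, dtype=torch.long)
        for step in range(steps):
            logits = predict_logits(tokens)                 # predict all positions in parallel
            probs, preds = logits.softmax(-1).max(-1)       # per-position confidence
            still_masked = tokens == mask_id
            n_keep = max(1, int(still_masked.sum().item() * (step + 1) / steps))
            conf = probs.masked_fill(~still_masked, -1.0)   # only fill still-masked slots
            keep = conf.topk(n_keep, dim=-1).indices[0]
            tokens[0, keep] = preds[0, keep]
        return tokens
    ```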