A-JEPA: Joint-Embedding Predictive Architecture Can Listen
This paper demonstrates that the masked-modeling principle driving the success of
large foundational vision models can be effectively applied to audio by making
predictions in a latent space. We introduce the Audio-based Joint-Embedding
Predictive Architecture (A-JEPA), a simple extension for self-supervised
learning from audio spectrograms. Following the design of I-JEPA, our A-JEPA
encodes visible audio spectrogram patches with a curriculum masking strategy
via a context encoder, and predicts the representations of regions sampled at
well-designed locations. The target representations of those regions are
extracted by an exponential moving average of the context encoder, i.e., a
target encoder, applied to the whole spectrogram. We find it beneficial to
transition from random block masking to time-frequency-aware masking in a
curriculum manner, given the strong local correlations in time and frequency in
audio spectrograms. To enhance contextual semantic understanding and
robustness, we fine-tune the encoder with regularized masking on target
datasets, instead of dropping or zeroing inputs. Empirically, when built on the
Vision Transformer architecture, A-JEPA proves highly scalable and sets
new state-of-the-art performance on multiple audio and speech classification
tasks, outperforming other recent models that use externally supervised
pre-training.
Comment: arXiv admin note: text overlap with arXiv:2207.06405 by other authors
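The target-encoder update described above can be sketched as a simple exponential moving average of the context encoder's parameters. This is an illustrative sketch, not A-JEPA's actual code; the function name, parameter representation, and momentum value are assumptions.

```python
# Minimal sketch of an EMA target-encoder update (illustrative, not
# the paper's implementation). Parameters are modeled as flat lists
# of floats; real encoders would use tensors.

def ema_update(target_params, context_params, momentum=0.999):
    """Blend target-encoder parameters toward the context encoder's:
    target <- momentum * target + (1 - momentum) * context."""
    return [momentum * t + (1.0 - momentum) * c
            for t, c in zip(target_params, context_params)]

# Toy example with two scalar "parameters".
target = [1.0, 0.0]
context = [0.0, 1.0]
target = ema_update(target, context, momentum=0.9)
```

A high momentum keeps the target encoder a slowly moving, stable version of the context encoder, which is what makes its outputs usable as prediction targets.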
Electronic-Mechanical Coupling in Graphene from in situ Nanoindentation Experiments and Multiscale Atomistic Simulations
We present in situ nanoindentation experiments performed on suspended graphene devices to introduce homogeneous tensile strain,
while simultaneously carrying out electrical measurements. We find that
the electrical resistance shows only a marginal change even under
severe strain, and the electronic transport measurements confirm that
there is no band gap opening for graphene under moderate uniform
strain, which is consistent with our results from first-principles-informed
molecular dynamics simulations.
Uncover the Premeditated Attacks: Detecting Exploitable Reentrancy Vulnerabilities by Identifying Attacker Contracts
Reentrancy, a notorious vulnerability in smart contracts, has led to millions
of dollars in financial loss. However, current smart contract vulnerability
detection tools suffer from a high false positive rate in identifying contracts
with reentrancy vulnerabilities. Moreover, only a small portion of the detected
reentrant contracts can actually be exploited by hackers, making these tools
less effective in securing the Ethereum ecosystem in practice.
In this paper, we propose BlockWatchdog, a tool that focuses on detecting
reentrancy vulnerabilities by identifying attacker contracts. These attacker
contracts are deployed by hackers to exploit vulnerable contracts
automatically. By focusing on attacker contracts, BlockWatchdog effectively
detects truly exploitable reentrancy vulnerabilities by identifying reentrant
call flows. Additionally, BlockWatchdog is capable of detecting new types of
reentrancy vulnerabilities caused by poor designs when using ERC tokens or
user-defined interfaces, which cannot be detected by current rule-based tools.
We implement BlockWatchdog using cross-contract static dataflow techniques
based on attack logic obtained from an empirical study that analyzes attacker
contracts from 281 attack incidents. BlockWatchdog is evaluated on 421,889
Ethereum contract bytecodes and identifies 113 attacker contracts that target
159 victim contracts, leading to the theft of Ether and tokens valued at
approximately 908.6 million USD. Notably, only 18 of the identified 159 victim
contracts can be reported by current reentrancy detection tools.
Comment: Accepted by ICSE 202
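At its core, identifying a reentrant call flow amounts to finding a call path that leads back into the contract where it started. The sketch below illustrates that idea as cycle detection on a cross-contract call graph; the names and graph representation are hypothetical and not BlockWatchdog's actual dataflow analysis.

```python
# Illustrative sketch (not BlockWatchdog's implementation): a reentrant
# call flow is a path in the cross-contract call graph that re-enters
# the contract it started from.

def has_reentrant_flow(call_graph, attacker):
    """Return True if some call path from `attacker` leads back to it.
    call_graph maps a contract name to the contracts it calls."""
    stack, visited = [attacker], set()
    while stack:
        node = stack.pop()
        for callee in call_graph.get(node, []):
            if callee == attacker:
                return True            # call flow re-enters the attacker
            if callee not in visited:
                visited.add(callee)
                stack.append(callee)
    return False

# Attacker calls Victim; Victim's callback re-enters the Attacker contract.
graph = {"Attacker": ["Victim"], "Victim": ["Attacker"]}
print(has_reentrant_flow(graph, "Attacker"))  # True
```

The real tool works on contract bytecode with cross-contract static dataflow, but the exploitability criterion it checks reduces to this kind of re-entering call cycle.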
Scalable Diffusion Models with State Space Backbone
This paper presents a new exploration into a category of diffusion models
built upon state space architecture. We endeavor to train diffusion models for
image data, wherein the traditional U-Net backbone is supplanted by a state
space backbone, functioning on raw patches or latent space. Owing to the state
space architecture's notable efficacy in accommodating long-range dependencies,
Diffusion State Space Models (DiS) are distinguished by treating all inputs,
including time, condition, and noisy image patches, as tokens. Our assessment of DiS encompasses both
unconditional and class-conditional image generation scenarios, revealing that
DiS exhibits comparable, if not superior, performance to CNN-based or
Transformer-based U-Net architectures of commensurate size. Furthermore, we
analyze the scalability of DiS, gauged by the forward pass complexity
quantified in Gflops. DiS models with higher Gflops, achieved by increasing
depth/width or the number of input tokens, consistently
demonstrate lower FID. In addition to demonstrating commendable scalability
characteristics, DiS-H/2 models in latent space achieve performance levels akin
to prior diffusion models on class-conditional ImageNet benchmarks at the
resolutions of 256×256 and 512×512, while significantly reducing
the computational burden. The code and models are available at:
https://github.com/feizc/DiS
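The "all inputs as tokens" design can be illustrated by how a DiS-style model might assemble its input sequence: the timestep and condition are embedded as single tokens and concatenated with the noisy image's patch tokens. This is a hypothetical sketch; the function name, patch size, and embedding shapes are assumptions, not the repository's code.

```python
import numpy as np

# Hypothetical sketch of DiS-style tokenization: time and condition
# embeddings become one token each, prepended to the patch tokens of
# the noisy image.

def build_token_sequence(noisy_image, t_embed, cond_embed, patch=2):
    """noisy_image: (H, W, C); t_embed, cond_embed: (D,) vectors with
    D = patch * patch * C. Returns a (1 + 1 + N, D) token sequence."""
    H, W, C = noisy_image.shape
    patches = (noisy_image
               .reshape(H // patch, patch, W // patch, patch, C)
               .transpose(0, 2, 1, 3, 4)
               .reshape(-1, patch * patch * C))   # (N, D) patch tokens
    return np.concatenate([t_embed[None], cond_embed[None], patches], axis=0)

tokens = build_token_sequence(np.zeros((4, 4, 3)), np.zeros(12), np.zeros(12))
print(tokens.shape)  # (6, 12): 1 time + 1 condition + 4 patch tokens
```

Because everything is a token, the state space backbone can process conditioning and image content in one uniform sequence, with no separate conditioning pathway.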
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Comprehensive Survey
Diffusion models and large language models have emerged as leading-edge
generative models and have sparked a revolutionary impact on various aspects of
human life. However, the practical implementation of these models has also
exposed inherent risks, highlighting their dual nature and raising concerns
regarding their trustworthiness. Despite the abundance of literature on this
subject, a comprehensive survey specifically delving into the intersection of
large-scale generative models and their trustworthiness remains largely absent.
To bridge this gap, this paper investigates both the long-standing and emerging
threats associated with these models across four fundamental dimensions:
privacy, security, fairness, and responsibility. In this way, we construct an
extensive map outlining the trustworthiness of these models, while also
providing practical recommendations and identifying future directions. These
efforts are crucial for promoting the trustworthy deployment of these models,
ultimately benefiting society as a whole.
Comment: draft version
Interfacial energy release rates of SiN/GaAs film/substrate systems determined using a cyclic loading dual-indentation method
Our previous study developed a dual-indentation method for testing the interfacial energy release rate, Gin, of SiN/GaAs film/substrate systems. However, for film/substrate systems with relatively high interfacial toughness, the dual-indentation method was unable to generate interfacial delamination. In this study, a cyclic loading dual-indentation method was proposed, in which the first monotonic loading in the dual-indentation method was replaced by cyclic loading. It was demonstrated that cyclic loading was effective at inducing delamination in relatively "tough" SiN/GaAs interfaces that could not be delaminated by the dual-indentation method. The Gin values obtained from the cyclic loading indentation were in good agreement with those obtained from the dual-indentation method for the less tough interfaces. The delamination mechanism in the cyclic loading indentation was attributed to the hardening effect on the films induced by cyclic loading, permitting sufficient elastic strain energy to be accumulated to initiate the delamination.
To Healthier Ethereum: A Comprehensive and Iterative Smart Contract Weakness Enumeration
With the increasing popularity of cryptocurrencies and blockchain technology,
smart contracts have become a prominent feature in developing decentralized
applications. However, these smart contracts are susceptible to vulnerabilities
that hackers can exploit, resulting in significant financial losses. In
response to this growing concern, various initiatives have emerged. Notably,
the SWC vulnerability list played an important role in raising awareness and
understanding of smart contract weaknesses. However, the SWC list lacks
maintenance and has not been updated with new vulnerabilities since 2020. To
address this gap, this paper introduces the Smart Contract Weakness Enumeration
(SWE), a comprehensive and practical vulnerability list up until 2023. We
collect 273 vulnerability descriptions from 86 top conference and journal
papers, employing open card sorting techniques to deduplicate and
categorize these descriptions. This process results in the identification of 40
common contract weaknesses, which are further classified into 20 sub-research
fields through thorough discussion and analysis. SWE provides a systematic and
comprehensive list of smart contract vulnerabilities, covering existing and
emerging vulnerabilities in the last few years. Moreover, SWE is a scalable,
continuously iterative program. We propose two update mechanisms for the
maintenance of SWE. Regular updates involve the inclusion of new
vulnerabilities from future top papers, while irregular updates enable
individuals to report new weaknesses for review and potential addition to SWE.
On the Robustness of Split Learning against Adversarial Attacks
Split learning enables collaborative deep learning model training while
preserving data privacy and model security by avoiding direct sharing of raw
data and model details (i.e., server and clients only hold partial sub-networks
and exchange intermediate computations). However, existing research has mainly
focused on examining its reliability for privacy protection, with little
investigation into model security. Specifically, with access to full models,
attackers can launch adversarial attacks, and split learning can mitigate this
severe threat by disclosing only part of the model to untrusted servers. This paper
aims to evaluate the robustness of split learning against adversarial attacks,
particularly in the most challenging setting where untrusted servers only have
access to the intermediate layers of the model. Existing adversarial attacks
mostly focus on the centralized setting instead of the collaborative setting,
thus, to better evaluate the robustness of split learning, we develop a
tailored attack called SPADV, which comprises two stages: 1) shadow model
training that addresses the issue of lacking part of the model and 2) local
adversarial attack that produces adversarial examples for evaluation. The first
stage requires only a small amount of unlabeled non-IID data, and, in the second stage,
SPADV perturbs the intermediate output of natural samples to craft the
adversarial ones. The overall cost of the proposed attack process is relatively
low, yet the empirical attack effectiveness is significantly high,
demonstrating the surprising vulnerability of split learning to adversarial
attacks.
Comment: accepted by ECAI 2023, camera-ready version
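The second stage of the attack, perturbing the intermediate output of a natural sample, can be sketched as a single FGSM-style step on the intermediate activation. This is an illustrative sketch under assumed names; SPADV's actual procedure uses a trained shadow model to obtain the gradient.

```python
import numpy as np

# Illustrative sketch (names hypothetical) of SPADV's second stage:
# nudge the intermediate-layer activation of a natural sample in the
# direction that increases the shadow model's loss.

def perturb_intermediate(z, loss_grad, epsilon=0.1):
    """z: intermediate activation sent to the server; loss_grad: gradient
    of the shadow model's loss w.r.t. z. Returns the adversarial
    intermediate output (a signed step of size epsilon)."""
    return z + epsilon * np.sign(loss_grad)

z = np.array([0.5, -0.2, 0.0])       # natural intermediate activation
grad = np.array([1.0, -3.0, 0.0])    # loss gradient from the shadow model
z_adv = perturb_intermediate(z, grad)
```

Because the perturbation is applied to the activations the server already receives, the attacker never needs the server-side layers themselves, which is what makes the setting realistic for split learning.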
Progressive Denoising Model for Fine-Grained Text-to-Image Generation
Recently, vector quantized autoregressive (VQ-AR) models have shown
remarkable results in text-to-image synthesis by equally predicting discrete
image tokens from the top left to bottom right in the latent space. Although
the simple generative process surprisingly works well, is this the best way to
generate an image? For instance, human creation tends to proceed from the
outline to the fine details of an image, while VQ-AR models do not consider any
relative importance of each component. In this paper, we present a progressive
denoising model for high-fidelity text-to-image generation. The proposed
method works by creating new image tokens from coarse to fine based on
the existing context in a parallel manner and this procedure is recursively
applied until an image sequence is completed. The resulting coarse-to-fine
hierarchy makes the image generation process intuitive and interpretable.
Extensive experiments demonstrate that the progressive model produces
significantly better results when compared with the previous VQ-AR method in
FID score across a wide variety of categories and aspects. Moreover, the
text-to-image generation time of traditional AR increases linearly with the
output image resolution and hence is quite time-consuming even for normal-size
images. In contrast, our approach allows achieving a better trade-off between
generation quality and speed.
Comment: Technical report. arXiv admin note: text overlap with arXiv:2206.10789 by other authors
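The coarse-to-fine, parallel generation loop described above can be sketched as repeatedly filling a batch of masked token positions until the sequence is complete. All names here are illustrative; in the real model the predictor is a neural network and the positions are chosen by the coarse-to-fine hierarchy rather than in index order.

```python
# Hypothetical sketch of the recursive coarse-to-fine loop: at each
# step, several remaining masked positions are filled in parallel,
# and the loop repeats until the image token sequence is complete.

def progressive_fill(num_tokens, tokens_per_step, predict_fn):
    """predict_fn(seq, positions) -> token values for those positions."""
    seq = [None] * num_tokens                 # None marks a masked position
    while any(t is None for t in seq):
        masked = [i for i, t in enumerate(seq) if t is None]
        chosen = masked[:tokens_per_step]     # stand-in for the model's choice
        for i, v in zip(chosen, predict_fn(seq, chosen)):
            seq[i] = v
    return seq

# Toy predictor: fill each chosen position with its own index.
seq = progressive_fill(6, 2, lambda s, pos: list(pos))
print(seq)  # [0, 1, 2, 3, 4, 5]
```

Filling several tokens per step is what decouples generation time from the raster-order token count, giving the quality/speed trade-off the abstract claims over traditional AR decoding.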