Face Alignment Assisted by Head Pose Estimation
In this paper we propose a supervised initialisation scheme for cascaded face
alignment based on explicit head pose estimation. We first investigate the
failure cases of most state-of-the-art face alignment approaches and observe
that these failures often share one common global property: the head pose
variation is usually large. Inspired by this, we propose a deep convolutional
network model for reliable and accurate head pose estimation. Instead of using
a mean face shape, or randomly selected shapes, for cascaded face alignment
initialisation, we propose two schemes for generating the initialisation: the
first projects a mean 3D face shape (represented by 3D facial landmarks) onto
the 2D image under the estimated head pose; the second searches for
nearest-neighbour shapes in the training set according to head-pose distance.
By doing so, the initialisation gets closer to the actual shape, which
increases the likelihood of convergence and in turn improves face alignment
performance. We demonstrate the proposed method on the 300W benchmark dataset
and show very competitive performance in both head pose estimation and face
alignment.
Comment: Accepted by BMVC201
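To make the first initialisation scheme concrete, here is a minimal numpy sketch (not the authors' code) of projecting a mean 3D landmark shape onto the image plane under an estimated head pose; the weak-perspective camera and the Euler-angle convention are illustrative assumptions.

```python
import numpy as np

def euler_to_rotation(yaw, pitch, roll):
    """Rotation matrix from head-pose Euler angles in radians (Z-X-Y order assumed)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # yaw about y
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # pitch about x
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])   # roll about z
    return Rz @ Rx @ Ry

def initialise_shape(mean_shape_3d, yaw, pitch, roll, scale, t_xy):
    """Weak-perspective projection of a mean 3D shape (N x 3 landmarks) under
    the estimated pose, giving a 2D initialisation (N x 2)."""
    rotated = mean_shape_3d @ euler_to_rotation(yaw, pitch, roll).T
    return scale * rotated[:, :2] + t_xy    # drop depth, place in the face box

# Example: 68 landmarks at 30 degrees yaw, centred in a 200 px face box.
mean_shape_3d = np.random.randn(68, 3)      # stand-in for a learned mean shape
init_2d = initialise_shape(mean_shape_3d, np.deg2rad(30), 0.0, 0.0,
                           scale=100.0, t_xy=np.array([100.0, 100.0]))
print(init_2d.shape)                        # (68, 2)
```

The second scheme needs no projection at all: compute the head-pose distance between the test image and every training image, then reuse the shapes of the nearest neighbours as initialisations.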
Text Alignment Is An Efficient Unified Model for Massive NLP Tasks
Large language models (LLMs), typically designed as a function of next-word
prediction, have excelled across extensive NLP tasks. Despite the generality,
next-word prediction is often not an efficient formulation for many of the
tasks, demanding an extreme scale of model parameters (10s or 100s of billions)
and sometimes yielding suboptimal performance. In practice, it is often
desirable to build more efficient models -- despite being less versatile, they
still apply to a substantial subset of problems, delivering on par or even
superior performance with much smaller model sizes. In this paper, we propose
text alignment as an efficient unified model for a wide range of crucial tasks
involving text entailment, similarity, question answering (and answerability),
factual consistency, and so forth. Given a pair of texts, the model measures
the degree of alignment between their information. We instantiate an alignment
model (Align) through lightweight finetuning of RoBERTa (355M parameters) using
5.9M examples from 28 datasets. Despite its compact size, extensive experiments
show the model's efficiency and strong performance: (1) On over 20 datasets of
the aforementioned diverse tasks, the model matches or surpasses FLAN-T5 models
that have around 2x or 10x more parameters; the single unified model also
outperforms task-specific models finetuned on individual datasets; (2) When
applied to evaluate factual consistency of language generation on 23 datasets,
our model improves over various baselines, including the much larger GPT-3.5
(ChatGPT) and sometimes even GPT-4; (3) The lightweight model can also serve as
an add-on component for LLMs such as GPT-3.5 in question answering tasks,
improving the average exact match (EM) score by 17.94 and F1 score by 15.05
through identifying unanswerable questions.
Comment: NeurIPS 2023 Camera Ready. Code available at https://github.com/yuh-zha/Align
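The abstract does not spell out the scoring interface, but the core idea can be sketched as a RoBERTa cross-encoder over text pairs; the checkpoint name, the three-way label layout, and the untrained head below are illustrative assumptions, not the released Align model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "roberta-large"   # 355M-parameter stand-in, not the released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The paper finetunes on 5.9M pairs; this freshly initialised 3-way head is untrained.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
model.eval()

def alignment_score(text_a: str, text_b: str) -> float:
    """P(aligned) under an assumed aligned / neutral / contradicted label order."""
    inputs = tokenizer(text_a, text_b, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)
    return probs[0, 0].item()   # index 0 = "aligned" in the assumed order

# One scorer covers many tasks, e.g. factual consistency of a generated summary:
print(alignment_score("A cat sleeps on the mat.", "An animal is resting."))
```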
Revisiting the Evaluation of Image Synthesis with GANs
A good metric, which promises a reliable comparison between solutions, is
essential for any well-defined task. Unlike most vision tasks that have
per-sample ground-truth, image synthesis tasks target generating unseen data
and hence are usually evaluated through a distributional distance between one
set of real samples and another set of generated samples. This study presents
an empirical investigation into the evaluation of synthesis performance, with
generative adversarial networks (GANs) as a representative of generative
models. In particular, we make in-depth analyses of various factors, including
how to represent a data point in the representation space, how to calculate a
fair distance using selected samples, and how many instances to use from each
set. Extensive experiments conducted on multiple datasets and settings reveal
several important findings. Firstly, a group of models that includes both
CNN-based and ViT-based architectures serves as a set of reliable and robust
feature extractors for measuring synthesis performance. Secondly, Centered Kernel Alignment
(CKA) provides a better comparison across various extractors and hierarchical
layers in one model. Finally, CKA is more sample-efficient and enjoys better
agreement with human judgment in characterizing the similarity between two
internal data correlations. These findings contribute to the development of a
new measurement system, which enables a consistent and reliable re-evaluation
of current state-of-the-art generative models.
Comment: NeurIPS 2023 Datasets and Benchmarks Track
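Linear CKA, the comparison tool highlighted above, has a compact closed form; the numpy sketch below follows the standard linear-CKA formulation, while the paper's exact kernel choice and feature pooling may differ.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between feature matrices X (n x d1) and Y (n x d2), whose
    rows are the same n samples embedded by two extractors (or two layers)."""
    X = X - X.mean(axis=0, keepdims=True)   # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2        # cross-covariance strength
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# e.g. compare real and generated samples through one frozen extractor:
real_feats = np.random.randn(512, 768)      # stand-ins for extracted features
fake_feats = np.random.randn(512, 768)
print(linear_cka(real_feats, fake_feats))   # 1.0 = identical correlation structure
```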
Rethinking Model Ensemble in Transfer-based Adversarial Attacks
It is widely recognized that deep learning models lack robustness to
adversarial examples. An intriguing property of adversarial examples is that
they can transfer across different models, which enables black-box attacks
without any knowledge of the victim model. An effective strategy to improve the
transferability is attacking an ensemble of models. However, previous works
simply average the outputs of different models, lacking an in-depth analysis
of how and why model ensembles can strongly improve transferability. In this
paper, we rethink the ensemble in adversarial attacks and define the common
weakness of a model ensemble by two properties: 1) the flatness of the loss
landscape; and 2) the closeness to the local optimum of each model. We
empirically and theoretically show that both properties are strongly correlated
with the transferability and propose a Common Weakness Attack (CWA) to generate
more transferable adversarial examples by promoting these two properties.
Experimental results on both image classification and object detection tasks
validate the effectiveness of our approach to improving the adversarial
transferability, especially when attacking adversarially trained models. We
also successfully apply our method to attack a black-box large vision-language
model -- Google's Bard, showing the practical effectiveness. Code is available
at https://github.com/huanranchen/AdversarialAttacks
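For context, the simple loss-averaging ensemble baseline that the paper analyses looks like the PyTorch sketch below; CWA itself goes further by also promoting loss-landscape flatness and closeness to each model's local optimum, which this baseline does not attempt.

```python
import torch

def ensemble_pgd(models, x, y, eps=8/255, alpha=2/255, steps=10):
    """PGD that ascends the mean cross-entropy loss of an ensemble."""
    loss_fn = torch.nn.CrossEntropyLoss()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = sum(loss_fn(m(x_adv), y) for m in models) / len(models)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x + (x_adv + alpha * grad.sign() - x).clamp(-eps, eps)
            x_adv = x_adv.clamp(0, 1).detach()   # stay a valid image
    return x_adv

# Toy stand-ins for the surrogate ensemble:
models = [torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
          for _ in range(3)]
x_adv = ensemble_pgd(models, torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,)))
```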
Federated Learning with Quantum Secure Aggregation
This article presents a novel Quantum Secure Aggregation (QSA) scheme
designed to provide highly secure and efficient aggregation of local model
parameters for federated learning. The scheme protects private model
parameters from disclosure to semi-honest attackers by representing them with
quantum bits, i.e., qubits. The proposed security mechanism ensures that any
attempt to eavesdrop on private model parameters can be immediately detected
and stopped. The scheme is also efficient in terms of
the low computational complexity of transmitting and aggregating model
parameters through entangled qubits. Benefits of the proposed QSA scheme are
showcased in a horizontal federated learning setting in which both
centralized and decentralized architectures are considered. It is empirically
demonstrated that the proposed QSA can be readily applied to aggregate
different types of local models, including logistic regression (LR),
convolutional neural networks (CNNs), and quantum neural networks (QNNs),
indicating the versatility of the scheme. The performance of the global model
improves to various extents over the local models obtained by individual
participants, while no private model parameters are disclosed to semi-honest
adversaries.
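The abstract leaves the qubit-level protocol out, so the sketch below shows only the horizontal federated-learning loop that QSA slots into, with the quantum aggregation step reduced to a clearly labelled classical placeholder.

```python
import numpy as np

def local_update(global_params, rng):
    """Stand-in for a participant's local training step (LR / CNN / QNN)."""
    return global_params + 0.1 * rng.standard_normal(global_params.shape)

def quantum_secure_aggregate(client_params):
    """Placeholder: in QSA the parameters travel as (entangled) qubits, so
    eavesdropping is detectable and the server sees only the aggregate.
    Classically we can show just the aggregation itself."""
    return client_params.mean(axis=0)

rng = np.random.default_rng(0)
global_params = np.zeros(10)
for round_idx in range(5):                   # a few federated rounds
    updates = np.stack([local_update(global_params, rng) for _ in range(4)])
    global_params = quantum_secure_aggregate(updates)
print(global_params)
```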
How Robust is Google's Bard to Adversarial Image Attacks?
Multimodal Large Language Models (MLLMs) that integrate text and other
modalities (especially vision) have achieved unprecedented performance in
various multimodal tasks. However, due to the unsolved adversarial robustness
problem of vision models, MLLMs can have more severe safety and security risks
by introducing the vision inputs. In this work, we study the adversarial
robustness of Google's Bard, a chatbot competitive with ChatGPT that recently
released its multimodal capability, to better understand the vulnerabilities of
commercial MLLMs. By attacking white-box surrogate vision encoders or MLLMs,
the generated adversarial examples can mislead Bard to output wrong image
descriptions with a 22% success rate based solely on transferability. We show
that the adversarial examples can also attack other MLLMs, e.g., a 26% attack
success rate against Bing Chat and an 86% attack success rate against
ERNIE bot. Moreover, we identify two defense mechanisms of Bard, including face
detection and toxicity detection of images. We design corresponding attacks to
evade these defenses, demonstrating that the current defenses of Bard are also
vulnerable. We hope this work can deepen our understanding of the robustness of
MLLMs and facilitate future research on defenses. Our code is available at
https://github.com/thu-ml/Attack-Bard.
Update: GPT-4V became available in October 2023. We further evaluated its
robustness under the same set of adversarial examples, achieving a 45% attack
success rate.
Comment: Technical report
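A hedged sketch of the transfer recipe described above: perturb the image against a white-box surrogate vision encoder so that its embedding drifts away from the clean one, then hand the result to the black-box MLLM. Using a CLIP vision tower as the surrogate is an assumption for illustration; the paper attacks several surrogate encoders and MLLMs.

```python
import torch
from transformers import CLIPVisionModel

encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32").eval()

def embedding_attack(x, eps=8/255, alpha=1/255, steps=20):
    """Maximise the distance between adversarial and clean image embeddings."""
    with torch.no_grad():
        clean_emb = encoder(pixel_values=x).pooler_output
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        emb = encoder(pixel_values=x_adv).pooler_output
        grad = torch.autograd.grad((emb - clean_emb).norm(), x_adv)[0]
        with torch.no_grad():
            x_adv = x + (x_adv + alpha * grad.sign() - x).clamp(-eps, eps)
            x_adv = x_adv.clamp(0, 1).detach()
    return x_adv

adv = embedding_attack(torch.rand(1, 3, 224, 224))  # feed `adv` to the target MLLM
```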
Overcoming the Size Limit of First Principles Molecular Dynamics Simulations with an In-Distribution Substructure Embedding Active Learner
Large-scale first-principles molecular dynamics simulations are crucial for
modeling complex processes in chemical, biomedical, and materials sciences.
However, the unfavorable time complexity with respect to system size leads to
prohibitive computational costs in practice once a simulation contains more
than a few hundred atoms. We present an In-Distribution substructure Embedding
Active Learner
(IDEAL) to enable efficient simulation of large complex systems with quantum
accuracy by maintaining a machine learning force field (MLFF) as an accurate
surrogate to the first principles methods. By extracting high-uncertainty
substructures into low-uncertainty atom environments, the active learner is
allowed to concentrate on and learn from small substructures of interest rather
than carrying out intractable quantum chemical computations on large
structures. IDEAL is benchmarked on various systems and shows sub-linear
complexity, accelerating the simulation thousands of times compared with
conventional active learning and millions of times compared with pure first
principles simulations. To demonstrate the capability of IDEAL in practical
applications, we simulated a polycrystalline lithium system composed of one
million atoms and the full ammonia formation process in a Haber-Bosch reaction
on a 3-nm iridium nanoparticle catalyst, all on a computing node comprising a
single A100 GPU and 24 CPU cores.
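A schematic numpy sketch of the loop described above: estimate per-atom uncertainty from an MLFF ensemble, cut the flagged atoms out into small clusters for quantum-chemical labelling, and retrain. The force models, threshold, and cutoff below are toy stand-ins, not the IDEAL implementation.

```python
import numpy as np

def per_atom_uncertainty(force_models, positions):
    """Ensemble force disagreement as a per-atom uncertainty estimate."""
    forces = np.stack([f(positions) for f in force_models])  # (k, n_atoms, 3)
    return forces.std(axis=0).max(axis=-1)                   # (n_atoms,)

def extract_substructure(positions, center, cutoff=5.0):
    """Indices of the small cluster within `cutoff` of one flagged atom."""
    d = np.linalg.norm(positions - positions[center], axis=1)
    return np.where(d < cutoff)[0]

# Toy stand-ins: two "models" that disagree slightly on a 100-atom structure.
force_models = [lambda p: -p + 0.01 * np.random.randn(*p.shape) for _ in range(2)]
positions = np.random.rand(100, 3) * 20.0

u = per_atom_uncertainty(force_models, positions)
for i in np.where(u > u.mean() + 2 * u.std())[0]:
    cluster = extract_substructure(positions, i)
    # IDEAL would now run the quantum-chemistry calculation on this small
    # cluster only, and update the MLFF with the new label (omitted here),
    # instead of labelling the full large structure.
```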
Construction of a serum diagnostic signature based on m5C-related miRNAs for cancer detection
Currently, no clinically relevant non-invasive biomarkers are available for screening of multiple cancer types. In this study, we developed a serum diagnostic signature based on 5-methylcytosine (m5C)-related miRNAs (m5C-miRNAs) for multiple-cancer detection. Serum miRNA expression data and the corresponding clinical information of patients were collected from the Gene Expression Omnibus database. Serum samples were then randomly assigned to the training or validation cohort at a 1:1 ratio. Using the identified m5C-miRNAs, an m5C-miRNA signature for cancer detection was established using a support vector machine algorithm. The constructed m5C-miRNA signature displayed excellent accuracy, with areas under the curve of 0.977, 0.934, and 0.965 in the training cohort, the validation cohort, and the combined training and validation cohort, respectively. Moreover, the diagnostic capability of the m5C-miRNA signature was unaffected by patient age, sex, or the presence of noncancerous disease. The m5C-miRNA signature also displayed satisfactory performance in distinguishing tumor types. Importantly, in the detection of early-stage cancers, the diagnostic performance of the m5C-miRNA signature was clearly superior to that of conventional tumor biomarkers. In summary, this work reveals the value of serum m5C-miRNAs in cancer detection and provides a new strategy for developing non-invasive and cost-effective tools for large-scale cancer screening.
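A minimal scikit-learn sketch of the classifier setup the study describes: a support vector machine over serum m5C-miRNA expression with a 1:1 training/validation split, evaluated by area under the ROC curve. The data below are synthetic stand-ins for the GEO expression matrices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 12))      # 400 sera x 12 m5C-miRNA expression features
y = rng.integers(0, 2, size=400)    # 1 = cancer, 0 = non-cancer control
X[y == 1] += 0.8                    # inject a detectable signal for the demo

# 1:1 training/validation split, mirroring the study's design
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.5, random_state=0)

clf = SVC(kernel="rbf", probability=True).fit(X_tr, y_tr)
auc = roc_auc_score(y_va, clf.predict_proba(X_va)[:, 1])
print(f"validation AUC: {auc:.3f}")   # the study reports 0.934 on its validation cohort
```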
Electrical 180° switching of Néel vector in spin-splitting antiferromagnet
Antiferromagnetic spintronics has attracted wide attention due to its great
potential for constructing the ultra-dense and ultra-fast antiferromagnetic
memory that modern high-performance information technology demands. Electrical
180° switching of the Néel vector is a long-standing goal for developing
electrically controllable antiferromagnetic memory that encodes binary "0" and
"1" in opposite Néel vectors. However, state-of-the-art antiferromagnetic
switching mechanisms have long been limited to 90° or 120° switching of the
Néel vector, which unavoidably requires multiple writing channels and thus
conflicts with ultra-dense integration. Here, we propose a deterministic
switching mechanism based on spin-orbit torque with an asymmetric energy
barrier, and experimentally achieve electrical 180° switching of the
spin-splitting antiferromagnet Mn5Si3. This 180° switching is read out via the
Néel vector-induced anomalous Hall effect. Based on these writing and readout
methods, we fabricate an antiferromagnetic device with electrically
controllable high- and low-resistance states that sustains robust write and
read cycles. Beyond this fundamental advance, our work promotes practical
devices based on spin-splitting antiferromagnets.
Comment: 19 pages, 4 figures