Deep Neural Networks Guided Ensemble Learning for Point Estimation
In modern statistics, interest has shifted from pursuing the uniformly minimum
variance unbiased estimator to reducing mean squared error (MSE) or residual
squared error. Shrinkage-based estimation and regression methods offer better
prediction accuracy and improved interpretability. However, the characterization
of such optimal statistics in terms of minimizing MSE remains open and
challenging in many problems, for example estimating treatment effect in
adaptive clinical trials with pre-planned modifications to design aspects based
on accumulated data. From an alternative perspective, we propose a deep neural
network based automatic method to construct an improved estimator from existing
ones. Theoretical properties are studied to provide guidance on when our
estimator is applicable and can achieve improvement. Simulation studies demonstrate
that the proposed method has considerable finite-sample efficiency gain as
compared with several common estimators. In the Adaptive COVID-19 Treatment
Trial (ACTT), an important application, our ensemble estimator contributes to a
more ethical and efficient adaptive clinical trial with fewer patients
enrolled. The proposed framework can be applied to a broad range of statistical
problems, and can serve as a reference to guide statistical research.
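A minimal numerical sketch of the ensemble idea: combine several common point estimators of a location parameter by a learned weighting. The simulation setup and the least-squares combiner below are illustrative assumptions, standing in for the deep network the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_estimators(x):
    # Three common point estimators of a location parameter.
    return np.array([np.mean(x), np.median(x), np.mean(np.sort(x)[2:-2])])

# Simulate replicates: a true parameter and a heavy-tailed sample per replicate.
thetas, feats = [], []
for _ in range(2000):
    theta = rng.normal(0.0, 2.0)
    x = theta + rng.standard_t(df=3, size=30)
    thetas.append(theta)
    feats.append(base_estimators(x))
F, y = np.array(feats), np.array(thetas)

# Learn a linear combination of the base estimators by least squares,
# a stand-in for the deep network described in the abstract.
w, *_ = np.linalg.lstsq(F, y, rcond=None)

mse_mean = np.mean((F[:, 0] - y) ** 2)   # MSE of the sample mean alone
mse_ens = np.mean((F @ w - y) ** 2)      # MSE of the learned ensemble
print(mse_ens <= mse_mean)               # True: the fit can only improve in-sample
```

Because the least-squares combiner can always recover any single base estimator as a special case, the in-sample MSE of the ensemble never exceeds that of the sample mean.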
What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Attention layers -- which map a sequence of inputs to a sequence of outputs
-- are core building blocks of the Transformer architecture, which has achieved
significant breakthroughs in modern artificial intelligence. This paper
presents a rigorous theoretical study on the learning and generalization of a
single multi-head attention layer, with a sequence of key vectors and a
separate query vector as input. We consider the random feature setting where
the attention layer has a large number of heads, with randomly sampled frozen
query and key matrices, and trainable value matrices. We show that such a
random-feature attention layer can express a broad class of target functions
that are permutation invariant to the key vectors. We further provide
quantitative excess risk bounds for learning these target functions from finite
samples, using random feature attention with finitely many heads.
Our results feature several implications unique to the attention structure
compared with existing random features theory for neural networks, such as (1)
Advantages in the sample complexity over standard two-layer random-feature
networks; (2) Concrete and natural classes of functions that can be learned
efficiently by a random-feature attention layer; and (3) The effect of the
sampling distribution of the query-key weight matrix (the product of the query
and key matrix), where Gaussian random weights with a non-zero mean result in
better sample complexities over the zero-mean counterpart for learning certain
natural target functions. Experiments on simulated data corroborate our
theoretical findings and further illustrate the interplay between the sample
size and the complexity of the target function.
Comment: 41 pages, 5 figures
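The random-feature setting described above — frozen random query-key matrices per head, with only the value/readout part trainable — can be sketched in a few lines, which also exhibit the permutation invariance to the key vectors. Dimensions and distributions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, N, M = 8, 5, 64   # key dimension, number of keys, number of heads

# Frozen random query-key weight matrices, one d x d matrix per head.
W = rng.normal(size=(M, d, d))

def rf_attention_features(q, K):
    """Per-head softmax-attention outputs for query q (d,) and keys K (N, d).

    The trainable value matrices would act as a linear readout on the
    returned features, so only the frozen part is shown here.
    """
    scores = np.einsum('mij,i,nj->mn', W, q, K)   # q^T W_m k_n per head, key
    scores -= scores.max(axis=1, keepdims=True)   # stable softmax over keys
    p = np.exp(scores)
    p /= p.sum(axis=1, keepdims=True)
    return (p @ K).reshape(-1)                    # (M * d,) random features

q = rng.normal(size=d)
K = rng.normal(size=(N, d))

phi = rf_attention_features(q, K)
phi_perm = rf_attention_features(q, K[::-1])      # reorder the key sequence
print(np.allclose(phi, phi_perm))                 # True: invariant to key order
```

Reordering the keys permutes the softmax weights identically, so the attention-weighted sum — and hence anything the trainable readout computes from it — is unchanged, matching the class of permutation-invariant targets discussed above.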
DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting
Subgraph counting is the problem of counting the occurrences of a given query
graph in a large target graph. Large-scale subgraph counting is useful in
various domains, such as motif counting for social network analysis and loop
counting for money laundering detection on transaction networks. Recently,
neural methods have been proposed to address the exponential runtime complexity
of exact subgraph counting. However, existing neural counting approaches fall
short in three aspects. Firstly, the counts of the same query can vary from
zero to millions on different target graphs, posing a much larger challenge
than most graph regression tasks. Secondly, current scalable graph neural
networks have limited expressive power and fail to efficiently distinguish
graphs in count prediction. Furthermore, existing neural approaches cannot
predict the occurrence position of queries in the target graph.
Here we design DeSCo, a scalable neural deep subgraph counting pipeline,
which aims to accurately predict the query count and occurrence position on any
target graph after one-time training. Firstly, DeSCo uses a novel canonical
partition and divides the large target graph into small neighborhood graphs.
This technique greatly reduces count variation while guaranteeing that no
occurrence is missed or double-counted. Secondly, neighborhood counting uses an expressive
subgraph-based heterogeneous graph neural network to accurately perform
counting in each neighborhood. Finally, gossip propagation propagates
neighborhood counts with learnable gates to harness the inductive biases of
motif counts. DeSCo is evaluated on eight real-world datasets from various
domains. It outperforms state-of-the-art neural methods with 137x improvement
in the mean squared error of count prediction, while maintaining the polynomial
runtime complexity.
Comment: 8 pages main text, 10 pages appendix
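The canonical-partition idea — assign each occurrence of the query to exactly one node-centered neighborhood so that summing neighborhood counts neither misses nor double-counts — can be sketched for triangle counting. This plain-Python toy is a hypothetical stand-in for the neural pipeline, not DeSCo itself.

```python
from itertools import combinations

# Undirected target graph as an adjacency map.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (1, 4)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def neighborhood_count(v):
    # Count only triangles whose largest-id node is v ("canonical" node),
    # so each triangle is counted in exactly one neighborhood.
    nbrs = [u for u in adj[v] if u < v]
    return sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])

total = sum(neighborhood_count(v) for v in adj)  # aggregate neighborhood counts
print(total)  # → 3
```

The canonical assignment (largest node id) is one simple choice; any rule that maps each occurrence to a unique neighborhood gives the same no-missing, no-double-counting guarantee.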
Tight Collision Probability for UAV Motion Planning in Uncertain Environment
Operating unmanned aerial vehicles (UAVs) in complex environments that
feature dynamic obstacles and external disturbances poses significant
challenges, primarily due to the inherent uncertainty in such scenarios.
Additionally, inaccurate robot localization and modeling errors further
exacerbate these challenges. Recent research on UAV motion planning in static
environments has been unable to cope with the rapidly changing surroundings,
resulting in trajectories that may not be feasible. Moreover, previous
approaches that have addressed dynamic obstacles or external disturbances in
isolation are insufficient to handle the complexities of such environments.
This paper proposes a reliable motion planning framework for UAVs, integrating
various uncertainties into a chance constraint that characterizes the
uncertainty in a probabilistic manner. The chance constraint provides a
probabilistic safety certificate by calculating the collision probability
between the robot's Gaussian-distributed forward reachable set and states of
obstacles. To reduce the conservatism of the planned trajectory, we propose a
tight upper bound of the collision probability and evaluate it both exactly and
approximately. The approximated solution is used to generate motion primitives
as a reference trajectory, while the exact solution is leveraged to iteratively
optimize the trajectory for better results. Our method is thoroughly tested in
simulation and real-world experiments, verifying its reliability and
effectiveness in uncertain environments.
Comment: Paper accepted by IROS 202
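The generic shape of such a probabilistic safety certificate can be illustrated with a simple closed-form bound: enclose a circular obstacle in an axis-aligned box and bound the probability that a Gaussian-distributed robot position falls inside it. This loose box bound is an illustrative assumption, not the tight bound proposed in the paper.

```python
import math

def axis_aligned_collision_bound(mu, sigma, obs, r):
    """Upper-bound P(robot within radius r of obstacle center obs).

    Assumes an independent 2D Gaussian position with mean mu and per-axis
    standard deviation sigma. The disc of radius r is enclosed in a square,
    and the product of per-axis Gaussian slab probabilities (via erf)
    over-approximates the disc probability. Loose, but a valid certificate.
    """
    p = 1.0
    for m, c in zip(mu, obs):
        lo = (c - r - m) / (sigma * math.sqrt(2))
        hi = (c + r - m) / (sigma * math.sqrt(2))
        p *= 0.5 * (math.erf(hi) - math.erf(lo))
    return p

# Robot 3 m from a 1 m obstacle with 0.5 m position uncertainty:
# the bound certifies a very small collision probability.
print(axis_aligned_collision_bound((0.0, 0.0), 0.5, (3.0, 0.0), 1.0))
```

Tighter bounds, like the one in the abstract, shrink the gap between the certified probability and the true one, which is what reduces conservatism in the planned trajectory.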
Learning Meta Model for Zero- and Few-shot Face Anti-spoofing
Face anti-spoofing is crucial to the security of face recognition systems.
Most previous methods formulate face anti-spoofing as a supervised learning
problem to detect various predefined presentation attacks, which requires
large-scale training data to cover as many attacks as possible. However, the
trained model easily overfits to a few common attacks and remains vulnerable to
unseen attacks. To overcome this challenge, the detector should: 1) learn
discriminative features that can generalize to unseen spoofing types from
predefined presentation attacks; 2) quickly adapt to new spoofing types by
learning from both the predefined attacks and a few examples of the new
spoofing types. Therefore, we define face anti-spoofing as a zero- and few-shot
learning problem. In this paper, we propose a novel Adaptive Inner-update Meta
Face Anti-Spoofing (AIM-FAS) method to tackle this problem through
meta-learning. Specifically, AIM-FAS trains a meta-learner focusing on the task
of detecting unseen spoofing types by learning from predefined living and
spoofing faces and a few examples of new attacks. To assess the proposed
approach, we establish several benchmarks for zero- and few-shot FAS.
Experiments show that it outperforms existing methods on the presented
benchmarks and in existing zero-shot FAS protocols.
Comment: Accepted by AAAI202
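The inner-update loop at the core of such meta-learning methods can be sketched generically on a toy regression family. This is a MAML-style scalar example, not AIM-FAS itself; the task distribution, model, and learning rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 0.1, 0.05   # inner / outer learning rates (illustrative)
w = 0.0                   # meta-learned initialization of the model f(x) = w * x

def grad(w, x, y):
    # Gradient of the mean squared error for f(x) = w * x.
    return np.mean(2 * (w * x - y) * x)

for _ in range(500):
    a = rng.uniform(0.5, 1.5)          # sample a task: targets y = a * x
    xs = rng.normal(size=10)           # support set for the inner update
    w_adapted = w - alpha * grad(w, xs, a * xs)   # inner (adaptation) step
    xq = rng.normal(size=10)           # query set of the same task
    # Outer step: backpropagate the query loss through the inner update;
    # for this linear model, d(w_adapted)/dw = 1 - alpha * mean(2 * xs**2).
    dwa_dw = 1 - alpha * np.mean(2 * xs ** 2)
    w -= beta * grad(w_adapted, xq, a * xq) * dwa_dw

print(round(w, 2))  # near 1.0, the mean slope of the task family
```

The meta-learner does not fit any single task; it moves the initialization to a point from which one inner gradient step adapts well across the task distribution — the same mechanism that lets a FAS detector adapt quickly to a few examples of a new spoofing type.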
Differentially Private Learning with Per-Sample Adaptive Clipping
Privacy in AI is a topic that has drawn attention from researchers and the
general public in recent years. As one way to implement privacy-preserving AI,
differentially private learning is a framework that enables AI models to use
differential privacy (DP). To achieve DP in the learning process, existing
algorithms typically limit the magnitude of gradients with a constant clipping
threshold, which requires careful tuning due to its significant impact on model
performance. To address this issue, recent works NSGD and Auto-S propose
using normalization instead of clipping to avoid hyperparameter tuning.
However, normalization-based approaches like NSGD and
Auto-S rely on a monotonic weight function, which imposes excessive weight on
small gradient samples and introduces extra deviation to the update. In this
paper, we propose a Differentially Private Per-Sample Adaptive Clipping
(DP-PSAC) algorithm based on a non-monotonic adaptive weight function, which
guarantees privacy without the typical hyperparameter tuning process of using a
constant clipping while significantly reducing the deviation between the update
and true batch-averaged gradient. We provide a rigorous theoretical convergence
analysis and show that, with a convergence rate of the same order, the proposed
algorithm achieves a lower non-vanishing bound than NSGD/Auto-S, maintained
over training iterations. In addition, through extensive
experimental evaluation, we show that DP-PSAC outperforms or matches the
state-of-the-art methods on multiple mainstream vision and language tasks.
Comment: To appear in AAAI 2023; revised acknowledgments and citations
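The contrast between constant clipping and a non-monotonic per-sample weight can be sketched on a batch of per-sample gradients. The specific weight function, regularizer r, and noise scale below are illustrative assumptions, not the exact DP-PSAC formulation; the key property shown is that both schemes bound per-sample sensitivity while the adaptive weight shrinks (rather than inflates) small-norm gradients.

```python
import numpy as np

rng = np.random.default_rng(3)
C, r, sigma = 1.0, 0.1, 0.5   # clip norm, regularizer, noise scale (illustrative)

grads = rng.normal(size=(32, 10))           # per-sample gradients of one batch
norms = np.linalg.norm(grads, axis=1, keepdims=True)

# Constant clipping: rescale each gradient to norm at most C.
clipped = grads * np.minimum(1.0, C / norms)

# Non-monotonic adaptive weight (illustrative form): the contribution norm
# C * ||g||^2 / (||g||^2 + r) stays below C, and small-norm samples are
# down-weighted instead of inflated the way pure normalization would do.
adaptive = grads * C / (norms + r / norms)

# Both schemes bound per-sample sensitivity by C, so Gaussian noise
# calibrated to C yields differential privacy for the averaged update.
noisy_mean = (adaptive.sum(axis=0)
              + rng.normal(scale=sigma * C, size=10)) / len(grads)
print(np.max(np.linalg.norm(adaptive, axis=1)) <= C)  # True
```

Because every per-sample contribution has norm at most C under either scheme, the added Gaussian noise needs the same scale; the adaptive weighting only changes how faithfully the noisy average tracks the true batch-averaged gradient.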