    Deep Neural Networks Guided Ensemble Learning for Point Estimation

    In modern statistics, interest has shifted from pursuing the uniformly minimum variance unbiased estimator to reducing mean squared error (MSE) or residual squared error. Shrinkage-based estimation and regression methods offer better prediction accuracy and improved interpretation. However, characterizing the statistics that are optimal in terms of MSE remains open and challenging in many problems, for example estimating the treatment effect in adaptive clinical trials with pre-planned modifications to design aspects based on accumulated data. From an alternative perspective, we propose a deep neural network based automatic method to construct an improved estimator from existing ones. Theoretical properties are studied to provide guidance on when our estimator can be applied to seek potential improvement. Simulation studies demonstrate that the proposed method has considerable finite-sample efficiency gain compared with several common estimators. In an important application, the Adaptive COVID-19 Treatment Trial (ACTT), our ensemble estimator contributes to a more ethical and efficient adaptive clinical trial with fewer patients enrolled. The proposed framework can be applied generally to various statistical problems and can serve as a reference measure to guide statistical research.

    What can a Single Attention Layer Learn? A Study Through the Random Features Lens

    Attention layers -- which map a sequence of inputs to a sequence of outputs -- are core building blocks of the Transformer architecture, which has achieved significant breakthroughs in modern artificial intelligence. This paper presents a rigorous theoretical study on the learning and generalization of a single multi-head attention layer, with a sequence of key vectors and a separate query vector as input. We consider the random feature setting where the attention layer has a large number of heads, with randomly sampled frozen query and key matrices, and trainable value matrices. We show that such a random-feature attention layer can express a broad class of target functions that are permutation invariant to the key vectors. We further provide quantitative excess risk bounds for learning these target functions from finite samples, using random feature attention with finitely many heads. Our results feature several implications unique to the attention structure compared with existing random features theory for neural networks, such as (1) advantages in the sample complexity over standard two-layer random-feature networks; (2) concrete and natural classes of functions that can be learned efficiently by a random-feature attention layer; and (3) the effect of the sampling distribution of the query-key weight matrix (the product of the query and key matrix), where Gaussian random weights with a non-zero mean result in better sample complexities over the zero-mean counterpart for learning certain natural target functions. Experiments on simulated data corroborate our theoretical findings and further illustrate the interplay between the sample size and the complexity of the target function. Comment: 41 pages, 5 figure
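The random-feature setting described above can be illustrated with a minimal sketch. It assumes softmax attention, a scalar output, and one frozen Gaussian query-key matrix plus one value vector per head; these choices, and the names `make_heads` and `rf_attention`, are hypothetical and are not taken from the paper. In training, only the value vectors would be updated.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_heads(num_heads, dim, seed=0):
    """Sample frozen query-key matrices W and initial value vectors V.

    Assumption: entries are i.i.d. standard Gaussian. W stays frozen;
    V is the only part a learner would train.
    """
    rng = random.Random(seed)
    W_heads = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
               for _ in range(num_heads)]
    V_heads = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
               for _ in range(num_heads)]
    return W_heads, V_heads


def rf_attention(query, keys, W_heads, V_heads):
    """Average over heads of softmax(query^T W key_i) weighted <v, key_i>."""
    dim = len(query)
    out = 0.0
    for W, v in zip(W_heads, V_heads):
        # query^T W, computed once per head
        qW = [sum(query[a] * W[a][b] for a in range(dim)) for b in range(dim)]
        scores = [dot(qW, k) / math.sqrt(dim) for k in keys]
        attn = softmax(scores)
        out += sum(a * dot(v, k) for a, k in zip(attn, keys))
    return out / len(W_heads)
```

Because the output is a weighted sum over keys, reordering the key vectors leaves it unchanged, which matches the permutation invariance mentioned in the abstract.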

    DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting

    Subgraph counting is the problem of counting the occurrences of a given query graph in a large target graph. Large-scale subgraph counting is useful in various domains, such as motif counting for social network analysis and loop counting for money laundering detection on transaction networks. Recently, neural methods have been proposed to address the exponential runtime complexity of subgraph counting at scale. However, existing neural counting approaches fall short in three aspects. Firstly, the counts of the same query can vary from zero to millions on different target graphs, posing a much larger challenge than most graph regression tasks. Secondly, current scalable graph neural networks have limited expressive power and fail to efficiently distinguish graphs in count prediction. Thirdly, existing neural approaches cannot predict the occurrence position of queries in the target graph. Here we design DeSCo, a scalable neural deep subgraph counting pipeline, which aims to accurately predict the query count and occurrence position on any target graph after one-time training. Firstly, DeSCo uses a novel canonical partition to divide the large target graph into small neighborhood graphs. The technique greatly reduces the count variation while guaranteeing no missing or double counting. Secondly, neighborhood counting uses an expressive subgraph-based heterogeneous graph neural network to accurately perform counting in each neighborhood. Finally, gossip propagation propagates neighborhood counts with learnable gates to harness the inductive biases of motif counts. DeSCo is evaluated on eight real-world datasets from various domains. It outperforms state-of-the-art neural methods with a 137x improvement in the mean squared error of count prediction, while maintaining polynomial runtime complexity. Comment: 8 pages main text, 10 pages appendi
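To make the "no missing or double counting" property concrete, here is a small exact (non-neural) sketch for the triangle query. It attributes each triangle to exactly one node, the one with the largest id, so the per-node counts sum to the global count. The largest-id rule and the function names are assumptions chosen for illustration; the abstract does not specify how DeSCo's canonical partition is defined.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(adj):
    """Brute force: count node triples that are mutually adjacent."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def per_node_triangle_counts(adj):
    """Attribute each triangle only to its largest-id node (assumed rule)."""
    counts = {v: 0 for v in adj}
    for v in adj:
        smaller = sorted(u for u in adj[v] if u < v)
        for a, b in combinations(smaller, 2):
            if b in adj[a]:
                counts[v] += 1
    return counts


edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges)
total = count_triangles(adj)
local = per_node_triangle_counts(adj)
```

Each per-node count only needs the neighborhood of that node, and the local counts also indicate where in the target graph the occurrences sit.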

    Tight Collision Probability for UAV Motion Planning in Uncertain Environment

    Operating unmanned aerial vehicles (UAVs) in complex environments that feature dynamic obstacles and external disturbances poses significant challenges, primarily due to the inherent uncertainty in such scenarios. Additionally, inaccurate robot localization and modeling errors further exacerbate these challenges. Recent research on UAV motion planning in static environments has been unable to cope with rapidly changing surroundings, resulting in trajectories that may not be feasible. Moreover, previous approaches that have addressed dynamic obstacles or external disturbances in isolation are insufficient to handle the complexities of such environments. This paper proposes a reliable motion planning framework for UAVs, integrating various uncertainties into a chance constraint that characterizes the uncertainty in a probabilistic manner. The chance constraint provides a probabilistic safety certificate by calculating the collision probability between the robot's Gaussian-distributed forward reachable set and states of obstacles. To reduce the conservatism of the planned trajectory, we propose a tight upper bound of the collision probability and evaluate it both exactly and approximately. The approximated solution is used to generate motion primitives as a reference trajectory, while the exact solution is leveraged to iteratively optimize the trajectory for better results. Our method is thoroughly tested in simulation and real-world experiments, verifying its reliability and effectiveness in uncertain environments. Comment: Paper Accepted by IROS 202
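A chance constraint of this kind can be checked numerically with a plain Monte Carlo estimate. The sketch below is not the tight upper bound proposed in the paper; it assumes a robot position with independent Gaussian coordinates, a single spherical obstacle whose radius already includes the robot's size, and a hypothetical risk threshold `delta`.

```python
import math
import random


def collision_probability(mean, std, obstacle, radius, samples=20000, seed=0):
    """Monte Carlo estimate of P(||p - obstacle|| <= radius).

    p has independent Gaussian coordinates with the given mean and std
    (an assumed uncertainty model, used here for illustration only).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        p = [rng.gauss(m, s) for m, s in zip(mean, std)]
        if math.dist(p, obstacle) <= radius:
            hits += 1
    return hits / samples


def satisfies_chance_constraint(probability, delta):
    """The state is accepted when the collision probability is at most delta."""
    return probability <= delta


mean = [0.0, 0.0, 0.0]
std = [0.2, 0.2, 0.2]
p_far = collision_probability(mean, std, [5.0, 0.0, 0.0], 0.5)
p_near = collision_probability(mean, std, [0.3, 0.0, 0.0], 0.3)
```

Sampling is simple but costly inside a planner, which is one reason closed-form or bounded approximations of the collision probability are attractive.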

    Learning Meta Model for Zero- and Few-shot Face Anti-spoofing

    Face anti-spoofing is crucial to the security of face recognition systems. Most previous methods formulate face anti-spoofing as a supervised learning problem to detect various predefined presentation attacks, which requires large-scale training data to cover as many attacks as possible. However, the trained model easily overfits to several common attacks and is still vulnerable to unseen attacks. To overcome this challenge, the detector should: 1) learn discriminative features that can generalize to unseen spoofing types from predefined presentation attacks; 2) quickly adapt to new spoofing types by learning from both the predefined attacks and a few examples of the new spoofing types. Therefore, we define face anti-spoofing as a zero- and few-shot learning problem. In this paper, we propose a novel Adaptive Inner-update Meta Face Anti-Spoofing (AIM-FAS) method to tackle this problem through meta-learning. Specifically, AIM-FAS trains a meta-learner focusing on the task of detecting unseen spoofing types by learning from predefined living and spoofing faces and a few examples of new attacks. To assess the proposed approach, we propose several benchmarks for zero- and few-shot FAS. Experiments show that it outperforms existing methods both on the presented benchmarks and in existing zero-shot FAS protocols. Comment: Accepted by AAAI202

    Differentially Private Learning with Per-Sample Adaptive Clipping

    Privacy in AI has remained a topic that draws attention from researchers and the general public in recent years. As one way to implement privacy-preserving AI, differentially private learning is a framework that enables AI models to use differential privacy (DP). To achieve DP in the learning process, existing algorithms typically limit the magnitude of gradients with a constant clipping threshold, which requires careful tuning due to its significant impact on model performance. As a solution to this issue, the recent works NSGD and Auto-S propose to use normalization instead of clipping to avoid hyperparameter tuning. However, normalization-based approaches like NSGD and Auto-S rely on a monotonic weight function, which imposes excessive weight on small gradient samples and introduces extra deviation to the update. In this paper, we propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function, which guarantees privacy without the typical hyperparameter tuning process of constant clipping while significantly reducing the deviation between the update and the true batch-averaged gradient. We provide a rigorous theoretical convergence analysis and show that, with a convergence rate of the same order, the proposed algorithm achieves a lower non-vanishing bound, which is maintained over training iterations, compared with NSGD/Auto-S. In addition, through extensive experimental evaluation, we show that DP-PSAC outperforms or matches the state-of-the-art methods on multiple mainstream vision and language tasks. Comment: To appear in AAAI 2023, Revised acknowledgments and citation
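The contrast between the three per-sample weighting schemes can be sketched as scalar functions of the gradient norm. Constant clipping and the normalization form `C / (norm + r)` follow how these schemes are commonly written. The non-monotonic form below is an assumption chosen for illustration, since the abstract does not state DP-PSAC's actual weight function, and the parameter names `C` and `r` are likewise hypothetical.

```python
def clip_weight(norm, C=1.0):
    """Constant clipping: leave small gradients alone, rescale large ones to C."""
    return min(1.0, C / norm) if norm > 0 else 1.0


def normalize_weight(norm, C=1.0, r=0.1):
    """Normalization: monotonically decreasing, so tiny gradients get large weight."""
    return C / (norm + r)


def nonmonotonic_weight(norm, C=1.0, r=0.1):
    """Illustrative non-monotonic weight (assumed form, not taken from the abstract).

    The extra term r / (norm + r) damps the weight when the norm is very small,
    while the weight still behaves like C / norm when the norm is large.
    """
    return C / (norm + r / (norm + r))


def scaled_norm(weight_fn, norm, **kwargs):
    """Norm of the per-sample gradient after weighting; must stay <= C for DP."""
    return weight_fn(norm, **kwargs) * norm


norms = [0.01, 0.3, 1.0, 10.0]
table = [(n, clip_weight(n), normalize_weight(n), nonmonotonic_weight(n))
         for n in norms]
```

In all three cases the weighted per-sample gradient has norm at most `C`, which is what bounds the sensitivity before noise is added; the schemes differ only in how much relative weight small gradients receive.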
