2,062 research outputs found

    Emerging technologies for learning report (volume 3)

    Get PDF

    Adversarial Robustness and Robust Meta-Learning for Neural Networks

    Get PDF
    Despite the overwhelming success of neural networks for pattern recognition, these models behave categorically differently from humans. Adversarial examples, small perturbations that are often undetectable to the human eye, easily fool neural networks, demonstrating that neural networks lack the robustness of human classifiers. This thesis comprises three parts. First, we motivate the study of defenses against adversarial examples with a case study on algorithmic trading, in which robustness may be critical for security reasons. Second, we develop methods for hardening neural networks against an adversary, especially in the low-data regime, where meta-learning methods achieve state-of-the-art results. Finally, we discuss several properties of the neural network models we use. These properties are of interest beyond robustness to adversarial examples, and they extend to the broad setting of deep learning.
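
    The perturbations described above are typically computed from a model's own gradients. As a minimal, generic illustration in PyTorch (not the thesis's own attack or defense), the fast gradient sign method below crafts such an adversarial example; model, x, and y are placeholders supplied by the caller.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=8 / 255):
    """Craft an adversarial example with the fast gradient sign method (FGSM).

    model, x (input batch) and y (labels) are caller-supplied placeholders;
    eps bounds the per-pixel perturbation so it stays visually imperceptible.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the loss, then clamp to a valid image range.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```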

    Practical Robust Learning Under Domain Shifts

    Get PDF
    As devices are constantly upgraded, the data we capture shifts over time. Despite the domain shifts among images, we as humans can put aside the differences and still recognize the content. For machines, however, these shifts are a far bigger challenge. Humans naturally adapt to visual changes in the environment without learning everything over again, yet to make machines work in a changed environment we need new annotations from humans. The fundamental question is: can we make machines as adaptive as humans? In this thesis, we work towards addressing this question through advances in the study of robust learning under domain shifts via domain adaptation. Our goal is to facilitate the transfer of information to machines while minimizing the need for human supervision. To enable real systems with demonstrated robustness, the study of domain adaptation needs to move from ideals to realities. Current domain adaptation research rests on a few idealized assumptions that are not consistent with reality: i) that domains are perfectly sliced and domain labels are available; ii) that annotations from the target domain should be treated the same as those from the source domain; iii) that samples from the target domain are constantly accessible. In this thesis, we address the facts that true domain labels are hard to obtain, that target-domain labels can be exploited in better ways, and that in reality the target domain is often time-sensitive. In terms of problem settings, this thesis covers the following scenarios of practical value: unsupervised multi-source domain adaptation, semi-supervised domain adaptation, and online domain adaptation. Three completed works are reviewed, one for each problem setting. The first work proposes an adversarial learning strategy that learns a dynamic curriculum for source samples to maximize the utility of source labels from multiple domains. The model iteratively learns which domains or samples are best suited for aligning to the target. The intuition is to force the adversarial agent to constantly re-measure the transferability of latent domains over time so as to adversarially raise the error rate of the domain discriminator. The method removes the need for domain labels, yet it outperforms other methods on four well-known benchmarks by significant margins. The second work addresses the problem that current methods do not use target supervision effectively, treating source and target supervision without distinction. The work points out that labeled target data needs to be distinguished from the source, and proposes to explicitly decompose the task into two sub-tasks: a semi-supervised learning task in the target domain and an unsupervised domain adaptation task across domains. By doing so, the two sub-tasks can better leverage the corresponding supervision and thus yield very different classifiers. The third work is proposed in the context of online privacy, i.e. each online sample of the target domain is permanently deleted after being processed. The proposed framework utilizes labels from public data and predicts on unlabeled, sensitive private data. To tackle the inevitable distribution shift from the public data to the private data, the work proposes a novel domain adaptation algorithm that directly targets the fundamental challenge of this online setting: the lack of diverse source-target data pairs.
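
    For context on the adversarial alignment idea used in the first work, the sketch below shows a generic domain discriminator behind a gradient reversal layer, in the spirit of domain-adversarial training. It is a simplified, assumed illustration and does not reproduce the thesis's dynamic curriculum over latent domains or its transferability re-measurement.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated (scaled) gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainDiscriminator(nn.Module):
    """Predicts source vs. target from features; the reversed gradient pushes
    the feature extractor toward domain-invariant representations."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, features, lambd=1.0):
        return self.net(GradReverse.apply(features, lambd))
```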

    Cognitive Architectures for Language Agents

    Full text link
    Recent efforts have incorporated large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning. However, these efforts have largely been piecemeal, lacking a systematic framework for constructing a fully-fledged language agent. To address this challenge, we draw on the rich history of agent design in symbolic artificial intelligence to develop a blueprint for a new wave of cognitive language agents. We first show that LLMs have many of the same properties as production systems, and recent efforts to improve their grounding or reasoning mirror the development of cognitive architectures built around production systems. We then propose Cognitive Architectures for Language Agents (CoALA), a conceptual framework to systematize diverse methods for LLM-based reasoning, grounding, learning, and decision making as instantiations of language agents in the framework. Finally, we use the CoALA framework to highlight gaps and propose actionable directions toward more capable language agents in the future.
    Comment: 16 pages of main content, 10 pages of references, 5 figures. Equal contribution among the first two authors, order decided by coin flip. A CoALA-based repo of recent work on language agents: https://github.com/ysymyth/awesome-language-agent
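
    As a purely hypothetical sketch of the kind of decision cycle such a framework organizes (llm_call, retrieve, memory, and tools are invented stand-ins, not CoALA's API), one step might ground an observation, retrieve related memories, reason with the LLM, and then take an internal or external action:

```python
# Hypothetical stand-ins throughout: llm_call, retrieve, memory and tools are
# illustrative, not part of the CoALA paper or any specific library.
def agent_step(observation, memory, tools, llm_call, retrieve):
    """One decision cycle: ground the observation, retrieve relevant memories,
    reason over them with the LLM, then choose an internal or external action."""
    memory["working"].append(observation)                     # grounding into working memory
    recalled = retrieve(memory["episodic"], observation)      # retrieval from long-term memory
    thought = llm_call(f"Context: {recalled}\nObservation: {observation}\nNext action?")
    if thought.startswith("TOOL:"):                           # external action via a tool
        name, _, arg = thought[len("TOOL:"):].strip().partition(" ")
        result = tools[name](arg)
        memory["episodic"].append((observation, thought, result))  # learning: store the outcome
        return result
    memory["episodic"].append((observation, thought, None))   # internal reasoning only
    return thought
```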

    DPL: Decoupled Prompt Learning for Vision-Language Models

    Full text link
    Prompt learning has emerged as an efficient and effective approach for transferring foundational Vision-Language Models (e.g., CLIP) to downstream tasks. However, current methods tend to overfit to seen categories, thereby limiting their generalization ability for unseen classes. In this paper, we propose a new method, Decoupled Prompt Learning (DPL), which reformulates the attention in prompt learning to alleviate this problem. Specifically, we theoretically investigate the collaborative process between prompts and instances (i.e., image patches/text tokens) by reformulating the original self-attention into four separate sub-processes. Through detailed analysis, we observe that certain sub-processes can be strengthened to bolster robustness and generalizability through some approximation techniques. Furthermore, we introduce language-conditioned textual prompting based on decoupled attention to naturally preserve the generalization of text input. Our approach is flexible for both the visual and textual modalities, making it easily extendable to multi-modal prompt learning. By combining the proposed techniques, our approach achieves state-of-the-art performance on three representative benchmarks encompassing 15 image recognition datasets, while remaining parameter-efficient. Moreover, our DPL does not rely on any auxiliary regularization task or extra training data, further demonstrating its remarkable generalization ability.
    Comment: 11 pages, 5 figures, 8 tables
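
    To make the decomposition concrete, the sketch below splits a joint attention score matrix over concatenated prompt and instance tokens into four blocks (prompt-to-prompt, prompt-to-instance, instance-to-prompt, instance-to-instance). The block split is an assumed illustration only; query/key projections, softmax, and DPL's actual strengthening and approximation of particular sub-processes are omitted.

```python
import torch

def decomposed_attention_scores(prompts, patches):
    """Split the joint attention score matrix over [prompts; patches] into four
    sub-blocks. Illustrative only: query/key projections and softmax are omitted,
    and this is not DPL's exact formulation.

    prompts: (B, P, D) prompt tokens; patches: (B, N, D) instance tokens.
    """
    x = torch.cat([prompts, patches], dim=1)          # (B, P + N, D)
    scale = x.shape[-1] ** -0.5
    scores = (x @ x.transpose(-2, -1)) * scale        # (B, P + N, P + N)
    P = prompts.shape[1]
    pp = scores[:, :P, :P]    # prompt   -> prompt
    pi = scores[:, :P, P:]    # prompt   -> instance
    ip = scores[:, P:, :P]    # instance -> prompt
    ii = scores[:, P:, P:]    # instance -> instance
    return pp, pi, ip, ii
```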

    Meta learning for few shot learning

    Get PDF
    Few-shot learning aims to scale visual recognition to the open-ended growth of new classes with limited labelled examples, thus alleviating the data and computation bottlenecks of conventional deep learning. This thesis proposes a meta-learning (a.k.a. learning to learn) paradigm to tackle real-world few-shot learning challenges. Firstly, we present a parameterized multi-metric meta-learning algorithm (RelationNet2). Existing metric learning algorithms are typically based on training a single global deep embedding and metric to support image similarity matching, but we propose a deep comparison network comprising embedding and relation modules that learn multiple non-linear distance metrics on different levels of features simultaneously. Furthermore, images are represented as distributions rather than vectors by learning a parameterized Gaussian noise regularization, which reduces overfitting and enables the use of deeper embeddings. We next consider the fact that several recent competitors develop effective few-shot learners through strong conventional representations combined with very simple classifiers, questioning whether “meta-learning” is necessary or whether highly effective features are sufficient. To defend meta-learning, we take an approach agnostic to the off-the-shelf features and focus exclusively on meta-learning the final classifier layer. Specifically, we introduce MetaQDA, a Bayesian meta-learning extension of the quadratic discriminant analysis classifier, which is complementary to advances in feature representations and leads to high accuracy and state-of-the-art uncertainty calibration in its predictions. Finally, we investigate the extension of MetaQDA to more generalized real-world scenarios beyond the narrow standard few-shot benchmarks. Our model achieves strong classification accuracy on both many-shot and few-shot classes in generalized few-shot learning. For few-shot class-incremental learning, MetaQDA is inherently suited to scenarios where novel classes grow over time. For open-set recognition, we compute the probability of belonging to a novel class via Bayes' rule, maintaining high accuracy in both closed-set recognition and open-set rejection. Overall, our contributions to few-shot meta-learning advance the state of the art under both accuracy and calibration metrics, and explore a series of increasingly realistic problem settings to support more researchers and practitioners in future exploration.
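
    For readers unfamiliar with the underlying classifier, the sketch below runs classical quadratic discriminant analysis on frozen support-set features: one Gaussian is fit per class and queries are scored by log-likelihood. This is only the base classifier family that MetaQDA extends; the Bayesian treatment of the class means and covariances is not reproduced, and the shrinkage constant is an assumption.

```python
import numpy as np

def qda_fit_predict(support_x, support_y, query_x, shrink=0.1):
    """Classical QDA on fixed features: one Gaussian per class on the few-shot
    support set, queries assigned to the class with the highest log-likelihood.
    Not MetaQDA itself; its Bayesian priors over class parameters are omitted."""
    classes = np.unique(support_y)
    dim = support_x.shape[1]
    scores = []
    for c in classes:
        xc = support_x[support_y == c]
        mu = xc.mean(axis=0)
        centered = xc - mu
        # Shrink toward the identity so the covariance stays invertible even
        # with only a handful of support examples per class.
        cov = centered.T @ centered / len(xc) + shrink * np.eye(dim)
        diff = query_x - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("nd,dk,nk->n", diff, np.linalg.inv(cov), diff)
        scores.append(-0.5 * (maha + logdet))
    return classes[np.argmax(np.stack(scores, axis=1), axis=1)]
```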

    Conditional Positional Encodings for Vision Transformers

    Full text link
    We propose a conditional positional encoding (CPE) scheme for vision Transformers. Unlike previous fixed or learnable positional encodings, which are pre-defined and independent of the input tokens, CPE is dynamically generated and conditioned on the local neighborhood of the input tokens. As a result, CPE can easily generalize to input sequences longer than any the model has seen during training. Besides, CPE can keep the desired translation invariance in the image classification task, resulting in improved classification accuracy. CPE can be effortlessly implemented with a simple Position Encoding Generator (PEG), and it can be seamlessly incorporated into the current Transformer framework. Built on PEG, we present the Conditional Position encoding Vision Transformer (CPVT). We demonstrate that CPVT has visually similar attention maps to those produced with learned positional encodings. Benefiting from the conditional positional encoding scheme, we obtain state-of-the-art results on the ImageNet classification task compared with vision Transformers to date. Our code will be made available at https://github.com/Meituan-AutoML/CPVT.
    Comment: A general-purpose conditional position encoding for vision transformers
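
    Following the abstract's description of PEG as a simple generator conditioned on local neighborhoods, a plausible minimal implementation is a depthwise 3x3 convolution applied to the patch tokens reshaped back onto their 2-D grid and added residually; the exact kernel size, residual form, and handling of any class token are assumptions here.

```python
import torch
import torch.nn as nn

class PEG(nn.Module):
    """Position Encoding Generator sketch: a depthwise 3x3 convolution over the
    tokens reshaped to their 2-D grid, so each position's encoding is generated
    from its local neighborhood (zero padding breaks translation symmetry only
    at the borders). Hyper-parameters here are assumptions, not the paper's."""

    def __init__(self, dim=768):
        super().__init__()
        self.proj = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, tokens, height, width):
        # tokens: (B, H*W, C) patch tokens, excluding any class token
        b, n, c = tokens.shape
        feat = tokens.transpose(1, 2).reshape(b, c, height, width)
        feat = self.proj(feat) + feat                 # conditional encoding added residually
        return feat.flatten(2).transpose(1, 2)        # back to (B, H*W, C)
```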