A Cookbook of Self-Supervised Learning
Self-supervised learning, dubbed the dark matter of intelligence, is a
promising path to advance machine learning. Yet, much like cooking, training
SSL methods is a delicate art with a high barrier to entry. While many
components are familiar, successfully training an SSL method involves a dizzying
set of choices from the pretext tasks to training hyper-parameters. Our goal is
to lower the barrier to entry into SSL research by laying the foundations and
latest SSL recipes in the style of a cookbook. We hope to empower the curious
researcher to navigate the terrain of methods, understand the role of the
various knobs, and gain the know-how required to explore how delicious SSL can
be.
Learning Global Additive Explanations for Neural Nets Using Model Distillation
Interpretability has largely focused on local explanations, i.e. explaining
why a model made a particular prediction for a sample. These explanations are
appealing due to their simplicity and local fidelity. However, they do not
provide information about the general behavior of the model. We propose to
leverage model distillation to learn global additive explanations that describe
the relationship between input features and model predictions. These global
explanations take the form of feature shapes, which are more expressive than
feature attributions. Through careful experimentation, we show qualitatively
and quantitatively that global additive explanations are able to describe model
behavior and yield insights about models such as neural nets. A visualization
of our approach applied to a neural net as it is trained is available at
https://youtu.be/ErQYwNqzEdc.
Comment: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018
arXiv:1811.0721
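The idea of distilling a black-box model into global additive feature shapes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `teacher` function stands in for a trained neural net, the shapes are piecewise-constant over equal-width bins, and the student is fitted by simple backfitting on the teacher's outputs.

```python
import math
import random


def teacher(x):
    # Hypothetical stand-in for a trained black-box model (assumption).
    return math.sin(x[0]) + 0.5 * x[1] ** 2


def bin_index(value, lo, hi, n_bins):
    # Map a feature value in [lo, hi] to one of n_bins equal-width bins.
    pos = int((value - lo) / (hi - lo) * n_bins)
    return min(max(pos, 0), n_bins - 1)


def distill_additive(X, y, lo, hi, n_bins=20, n_rounds=10):
    """Fit y ~ intercept + sum_j shape_j(x_j) to teacher outputs y by backfitting."""
    n, d = len(X), len(X[0])
    intercept = sum(y) / n
    shapes = [[0.0] * n_bins for _ in range(d)]
    bins = [[bin_index(x[j], lo, hi, n_bins) for j in range(d)] for x in X]
    for _ in range(n_rounds):
        for j in range(d):
            sums = [0.0] * n_bins
            counts = [0] * n_bins
            for i in range(n):
                # Partial residual: teacher output minus every term except shape j.
                others = sum(shapes[k][bins[i][k]] for k in range(d) if k != j)
                sums[bins[i][j]] += y[i] - intercept - others
                counts[bins[i][j]] += 1
            # Each bin of shape j becomes the mean partial residual in that bin.
            shapes[j] = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return intercept, shapes


def student_predict(x, intercept, shapes, lo, hi):
    # Additive prediction: intercept plus one shape value per feature.
    n_bins = len(shapes[0])
    return intercept + sum(
        shape[bin_index(x[j], lo, hi, n_bins)] for j, shape in enumerate(shapes)
    )


if __name__ == "__main__":
    random.seed(0)
    X = [[random.uniform(-2, 2), random.uniform(-2, 2)] for _ in range(1000)]
    y = [teacher(x) for x in X]  # labels come from the teacher, not ground truth
    intercept, shapes = distill_additive(X, y, -2.0, 2.0)
    print("shape of feature 0 per bin:", [round(v, 2) for v in shapes[0]])
```

Each list in `shapes` is a curve over one feature's range, so it can be plotted directly. That is the sense in which a feature shape carries more information than a single attribution score per feature.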
Rethinking Interpretability in the Era of Large Language Models
Interpretable machine learning has exploded as an area of interest over the
last decade, sparked by the rise of increasingly large datasets and deep neural
networks. Simultaneously, large language models (LLMs) have demonstrated
remarkable capabilities across a wide array of tasks, offering a chance to
rethink opportunities in interpretable machine learning. Notably, the
capability to explain in natural language allows LLMs to expand the scale and
complexity of patterns that can be given to a human. However, these new
capabilities raise new challenges, such as hallucinated explanations and
immense computational costs.
In this position paper, we start by reviewing existing methods to evaluate
the emerging field of LLM interpretation (both interpreting LLMs and using LLMs
for explanation). We contend that, despite their limitations, LLMs hold the
opportunity to redefine interpretability with a more ambitious scope across
many applications, including in auditing LLMs themselves. We highlight two
emerging research priorities for LLM interpretation: using LLMs to directly
analyze new datasets and to generate interactive explanations.
Comment: 7 pages
OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System
Automated machine learning (AutoML) seeks to build ML models with minimal
human effort. While considerable research has been conducted in the area of
AutoML in general, aiming to take humans out of the loop when building
artificial intelligence (AI) applications, scant literature has focused on how
AutoML works well in open-environment scenarios such as the process of training
and updating large models, industrial supply chains or the industrial
metaverse, where people often face open-loop problems during the search
process: they must continuously collect data, update data and models, satisfy
the requirements of the development and deployment environment, support massive
devices, modify evaluation metrics, etc. Addressing the open-environment issue
with pure data-driven approaches requires considerable data, computing
resources, and effort from dedicated data engineers, making current AutoML
systems and platforms inefficient and computationally intractable.
Human-computer interaction is a practical and feasible way to tackle the
problem of open-environment AI. In this paper, we introduce OmniForce, a
human-centered AutoML (HAML) system that yields both human-assisted ML and
ML-assisted human techniques, to put an AutoML system into practice and build
adaptive AI in open-environment scenarios. Specifically, we present OmniForce
in terms of ML version management; pipeline-driven development and deployment
collaborations; a flexible search strategy framework; and widely provisioned
and crowdsourced application algorithms, including large models. Furthermore,
the (large) models constructed by OmniForce can be automatically turned into
remote services in a few minutes; this process is dubbed model as a service
(MaaS). Experimental results obtained in multiple search spaces and real-world
use cases demonstrate the efficacy and efficiency of OmniForce.
A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Chain-of-thought reasoning, a cognitive process fundamental to human
intelligence, has garnered significant attention in the realm of artificial
intelligence and natural language processing. However, the field still lacks a
comprehensive survey. To this end, we take the first step and present a
thorough, wide-ranging survey of this research field.
We use X-of-Thought to refer to Chain-of-Thought in a broad sense. In detail,
we systematically organize the current research according to the taxonomies of
methods, including XoT construction, XoT structure variants, and enhanced XoT.
Additionally, we describe XoT with frontier applications, covering planning,
tool use, and distillation. Furthermore, we address challenges and discuss some
future directions, including faithfulness, multi-modal, and theory. We hope
this survey serves as a valuable resource for researchers seeking to innovate
within the domain of chain-of-thought reasoning.
Comment: 26 pages. Resources are available at
https://github.com/zchuz/CoT-Reasoning-Surve
Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement
We propose Dataset Reinforcement, a strategy to improve a dataset once such
that the accuracy of any model architecture trained on the reinforced dataset
is improved at no additional training cost for users. We propose a Dataset
Reinforcement strategy based on data augmentation and knowledge distillation.
Our generic strategy is designed based on extensive analysis across CNN- and
transformer-based models and on a large-scale study of distillation with
state-of-the-art models and various data augmentations. We create a reinforced
version of the ImageNet training dataset, called ImageNet+, as well as
reinforced datasets CIFAR-100+, Flowers-102+, and Food-101+. Models trained
with ImageNet+ are more accurate, robust, and calibrated, and transfer well to
downstream tasks (e.g., segmentation and detection). As an example, the
accuracy of ResNet-50 improves by 1.7% on the ImageNet validation set, 3.5% on
ImageNetV2, and 10.0% on ImageNet-R. Expected Calibration Error (ECE) on the
ImageNet validation set is also reduced by 9.9%. Using this backbone with
Mask-RCNN for object detection on MS-COCO, the mean average precision improves
by 0.8%. We reach similar gains for MobileNets, ViTs, and Swin-Transformers.
For MobileNetV3 and Swin-Tiny, we observe significant improvements on
ImageNet-R/A/C of up to 20% improved robustness. Models pretrained on ImageNet+
and fine-tuned on CIFAR-100+, Flowers-102+, and Food-101+, reach up to 3.4%
improved accuracy. The code, datasets, and pretrained models are available at
https://github.com/apple/ml-dr.
Comment: Accepted at International Conference on Computer Vision (ICCV) 2023.
Camera-ready version with new Tables 9 and 1
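For readers unfamiliar with the calibration metric named in this abstract, Expected Calibration Error can be sketched as follows. This is a minimal illustration of the common equal-width binning form, not the authors' evaluation code. The function name, the default of 10 bins, and the half-open bin edges are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average over bins of |accuracy - mean confidence|.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: truthy where the prediction matched the label.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins
    for c, hit in zip(confidences, correct):
        # Equal-width bins [k/n_bins, (k+1)/n_bins); confidence 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if hit else 0.0
        count[idx] += 1

    ece = 0.0
    for s_conf, s_hit, cnt in zip(conf_sum, hit_sum, count):
        if cnt == 0:
            continue  # empty bins contribute nothing
        accuracy = s_hit / cnt
        mean_conf = s_conf / cnt
        ece += (cnt / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Toy inputs (made up): an overconfident model that is right half the time.
    print(expected_calibration_error([1.0, 1.0], [True, False]))
```

A lower value means the reported confidences track observed accuracy more closely. The number of bins affects the estimate, so comparisons are only meaningful under the same binning.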
Realistic adversarial machine learning to improve network intrusion detection
Modern organizations can significantly benefit from the use of Artificial Intelligence (AI), and more specifically Machine Learning (ML), to tackle the growing number and increasing sophistication of cyber-attacks targeting their business processes. However, there are several technological and ethical challenges that undermine the trustworthiness of AI. One of the main challenges is the lack of robustness, which is an essential property to ensure that ML is used in a secure way. Improving robustness is no easy task because ML is inherently susceptible to adversarial examples: data samples with subtle perturbations that cause unexpected behaviors in ML models. ML engineers and security practitioners still lack the knowledge and tools to prevent such disruptions, so adversarial examples pose a major threat to ML and to the intelligent Network Intrusion Detection (NID) systems that rely on it. This thesis presents a methodology for a trustworthy adversarial robustness analysis of multiple ML models, and an intelligent method for the generation of realistic adversarial examples in complex tabular data domains like the NID domain: Adaptative Perturbation Pattern Method (A2PM). It is demonstrated that a successful adversarial attack is not guaranteed to be a successful cyber-attack, and that adversarial data perturbations can only be realistic if they are simultaneously valid and coherent, complying with the domain constraints of a real communication network and the class-specific constraints of a certain cyber-attack class. A2PM can be used for adversarial attacks, to iteratively cause misclassifications, and adversarial training, to perform data augmentation with slightly perturbed data samples. Two case studies were conducted to evaluate its suitability for the NID domain. The first verified that the generated perturbations preserved both validity and coherence in Enterprise and Internet-of-Things (IoT) network scenarios, achieving realism.
The second verified that adversarial training with simple perturbations enables the models to retain a good generalization to regular IoT network traffic flows, in addition to being more robust to adversarial examples. The key takeaway of this thesis is: ML models can be incredibly valuable to improve a cybersecurity system, but their own vulnerabilities must not be disregarded. It is essential to continue the research efforts to improve the security and trustworthiness of ML and of the intelligent systems that rely on it.
Generalization through the lens of learning dynamics
A machine learning (ML) system must learn not only to match the output of a target function on a training set, but also to generalize to novel situations in order to yield accurate predictions at deployment. In most practical applications, the user cannot exhaustively enumerate every possible input to the model; strong generalization performance is therefore crucial to the development of ML systems which are performant and reliable enough to be deployed in the real world. While generalization is well-understood theoretically in a number of hypothesis classes, the impressive generalization performance of deep neural networks has stymied theoreticians. In deep reinforcement learning (RL), our understanding of generalization is further complicated by the conflict between generalization and stability in widely-used RL algorithms. This thesis will provide insight into generalization by studying the learning dynamics of deep neural networks in both supervised and reinforcement learning tasks.
We begin with a study of generalization in supervised learning. We propose new PAC-Bayes generalization bounds for invariant models and for models trained with data augmentation. We go on to consider more general forms of inductive bias, connecting a notion of training speed with Bayesian model selection. This connection yields a family of marginal likelihood estimators which require only sampled losses from an iterative gradient descent trajectory, and analogous performance estimators for neural networks. We then turn our attention to reinforcement learning, laying out the learning dynamics framework for the RL setting which will be leveraged throughout the remainder of the thesis. We identify a new phenomenon which we term capacity loss, whereby neural networks lose their ability to adapt to new target functions over the course of training in deep RL problems, for which we propose a novel regularization approach. Follow-up analysis studying more subtle forms of capacity loss reveals that deep RL agents are prone to memorization due to the unstructured form of early prediction targets, and highlights a solution in the form of distillation. We conclude by returning to a notion of invariance different from the one that opened this thesis, presenting a novel representation learning method which promotes invariance to spurious factors of variation in the environment.