    On the Generalization Effects of Linear Transformations in Data Augmentation

    Data augmentation is a powerful technique for improving performance in applications such as image and text classification. Yet there is little rigorous understanding of why and how various augmentations work. In this work, we consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. First, we show that transformations which preserve the labels of the data can improve estimation by enlarging the span of the training data. Second, we show that transformations which mix data can improve estimation through a regularization effect. Finally, we validate our theoretical insights on MNIST. Based on these insights, we propose an augmentation scheme that searches over the space of transformations according to how uncertain the model is about the transformed data. We validate the proposed scheme on image and text datasets. For example, our method outperforms RandAugment by 1.24% on CIFAR-100 using Wide-ResNet-28-10. Furthermore, we achieve accuracy comparable to the state-of-the-art Adversarial AutoAugment on the CIFAR datasets.
    Comment: International Conference on Machine Learning (ICML) 2020. Added experimental results on ImageNet.
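
    As a concrete illustration of the uncertainty-driven search described above, here is a minimal numpy sketch in an over-parametrized ridge-regression setting. The transformation family (random near-identity matrices) and the uncertainty proxy (residual norm on the transformed data) are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy over-parametrized regression: n samples, d > n features.
    n, d = 20, 50
    X = rng.normal(size=(n, d))
    w_star = rng.normal(size=d)
    y = X @ w_star

    def ridge(X, y, lam=1e-2):
        # Closed-form ridge estimator via the dual form (handles d > n).
        n = X.shape[0]
        return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

    # A hypothetical family of label-preserving linear transformations:
    # random perturbations of the identity, standing in for crops/flips.
    transforms = [np.eye(d) + 0.1 * rng.normal(size=(d, d)) / np.sqrt(d)
                  for _ in range(10)]

    w_hat = ridge(X, y)

    # Pick the transformation the current model is most uncertain about,
    # proxied here by the residual norm on the transformed data.
    scores = [np.linalg.norm(y - (X @ A.T) @ w_hat) for A in transforms]
    A_best = transforms[int(np.argmax(scores))]

    # Augment the training set (labels are reused: label-preserving) and refit.
    X_new = np.vstack([X, X @ A_best.T])
    y_new = np.concatenate([y, y])
    w_aug = ridge(X_new, y_new)

    print("estimation error before:", np.linalg.norm(w_hat - w_star))
    print("estimation error after :", np.linalg.norm(w_aug - w_star))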

    Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees

    We consider transfer learning approaches that fine-tune a pretrained deep neural network on a target task. We study the generalization properties of fine-tuning to understand the problem of overfitting, which commonly occurs in practice. Previous works have shown that constraining the distance from the initialization of fine-tuning improves generalization. Using a PAC-Bayesian analysis, we observe that besides the distance from initialization, Hessians affect generalization through the noise stability of deep neural networks against noise injections. Motivated by this observation, we develop Hessian-distance-based generalization bounds for a wide range of fine-tuning methods. Additionally, we study the robustness of fine-tuning in the presence of noisy labels. Motivated by our theory, we design an algorithm that incorporates consistent losses and distance-based regularization for fine-tuning, along with a generalization error guarantee under class-conditional independent noise in the training-set labels. We perform a detailed empirical study of our algorithm in various noisy environments and across architectures. On six image classification tasks whose training labels are generated with programmatic labeling, we find a 3.26% accuracy gain over prior fine-tuning methods. Meanwhile, the Hessian distance measure of the fine-tuned model decreases by six times more than with existing approaches.
    Comment: 36 pages, 5 figures, 8 tables; ICML 2022.
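
    A hedged PyTorch sketch of the two ingredients named above, distance-based regularization and a consistency loss. The coefficient values, the input-noise consistency term (a cheap proxy for noise stability), and the name finetune_step are illustrative assumptions, not the paper's exact algorithm.

    import copy
    import torch
    import torch.nn.functional as F

    def finetune_step(model, pretrained, batch, optimizer,
                      dist_coef=0.01, sigma=0.01):
        # One fine-tuning step combining (i) a distance-from-initialization
        # penalty and (ii) a consistency term under small input perturbations.
        x, y = batch
        optimizer.zero_grad()
        logits = model(x)
        loss = F.cross_entropy(logits, y)

        # Distance-based regularization: stay close to the pretrained weights.
        dist = sum((p - p0).pow(2).sum()
                   for p, p0 in zip(model.parameters(), pretrained.parameters()))
        loss = loss + dist_coef * dist

        # Consistency: predictions should be stable under noise injection.
        logits_noisy = model(x + sigma * torch.randn_like(x))
        loss = loss + F.mse_loss(logits_noisy, logits.detach())

        loss.backward()
        optimizer.step()
        return loss.item()

    # Usage: keep a frozen copy of the initialization before fine-tuning.
    # pretrained = copy.deepcopy(model).eval()
    # for p in pretrained.parameters():
    #     p.requires_grad_(False)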

    Generalization in Graph Neural Networks: Improved PAC-Bayesian Bounds on Graph Diffusion

    Graph neural networks are widely used tools for graph prediction tasks. Motivated by their empirical performance, prior works have developed generalization bounds for graph neural networks that scale with graph structures in terms of the maximum degree. In this paper, we present generalization bounds that instead scale with the largest singular value of the graph neural network's feature diffusion matrix. These bounds are numerically much smaller than prior bounds for real-world graphs. We also construct a lower bound on the generalization gap that matches our upper bound asymptotically. To achieve these results, we analyze a unified model that includes prior works' settings (i.e., convolutional and message-passing networks) and new settings (i.e., graph isomorphism networks). Our key idea is to measure the stability of graph neural networks against noise perturbations using Hessians. Empirically, we find that Hessian-based measurements correlate accurately with the observed generalization gaps of graph neural networks. Optimizing noise stability properties for fine-tuning pretrained graph neural networks also improves test performance on several graph-level classification tasks.
    Comment: 36 pages, 2 tables, 3 figures. Appeared in AISTATS 2023.
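
    The quantity the new bounds scale with can be computed directly. Below is a small numpy sketch comparing the spectral norm (largest singular value) of a GCN-style symmetric normalized diffusion matrix against the maximum degree that prior bounds scale with; this particular normalization is one common choice and an assumption here, since the paper covers several diffusion settings.

    import numpy as np

    def diffusion_spectral_norm(adj):
        # Largest singular value of D^{-1/2} (A + I) D^{-1/2}, a common
        # GCN-style feature diffusion matrix (one choice among several).
        a = adj + np.eye(adj.shape[0])        # add self-loops
        d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
        return np.linalg.norm(d_inv_sqrt @ a @ d_inv_sqrt, 2)

    # Toy comparison on a sparse random undirected graph.
    rng = np.random.default_rng(0)
    adj = (rng.random((100, 100)) < 0.05).astype(float)
    adj = np.triu(adj, 1)
    adj = adj + adj.T

    print("max degree   :", adj.sum(axis=1).max())        # scale of prior bounds
    print("spectral norm:", diffusion_spectral_norm(adj))  # equals 1 here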

    Improved Worst-Group Robustness via Classifier Retraining on Independent Splits

    High-capacity deep neural networks (DNNs) trained with Empirical Risk Minimization (ERM) often suffer from poor worst-group accuracy despite good on-average performance, where worst-group accuracy measures a model's robustness towards certain subpopulations of the input space. This degradation in performance is typically attributed to spurious correlations and memorization behaviors of ERM-trained DNNs. We develop a method, called CRIS, that addresses these issues by performing robust classifier retraining on independent splits of the dataset. This results in a simple method that improves upon state-of-the-art methods, such as Group DRO, on standard datasets while relying on far fewer group labels and little additional hyperparameter tuning.
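
    A minimal sketch of the retraining idea, assuming a fixed feature extractor trained via ERM on a disjoint split of the data (so the head never sees the extractor's training examples). The group-balanced resampling and logistic-regression head below are illustrative choices, not necessarily the paper's exact procedure.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def retrain_head(features, labels, groups, seed=0):
        # features: extractor outputs on the held-out (independent) split.
        # groups: integer group labels used only for balanced resampling.
        rng = np.random.default_rng(seed)

        # Resample so every group contributes equally many examples, which
        # targets worst-group rather than on-average accuracy.
        per_group = min(np.sum(groups == g) for g in np.unique(groups))
        idx = np.concatenate([
            rng.choice(np.where(groups == g)[0], per_group, replace=False)
            for g in np.unique(groups)
        ])
        return LogisticRegression(max_iter=1000).fit(features[idx], labels[idx])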

    Role of Human-Mediated Dispersal in the Spread of the Pinewood Nematode in China

    Background: Intensification of world trade is responsible for an increase in the number of alien species introductions. Human-mediated dispersal promotes not only introductions but also expansion of a species' distribution via long-distance dispersal. Understanding the role of anthropogenic pathways in the spread of invading species has therefore become one of today's most important challenges.
    Methodology/Principal Findings: We analysed the invasion pattern of the pinewood nematode in China based on invasion data from 1982 to 2005 and monitoring data on 7 locations over 15 years. Short-distance spread mediated by long-horned beetles was estimated at 7.5 km per year. Infested sites located further away represented more than 90% of observations, and the mean long-distance spread was estimated at 111–339 km. Railways, river ports, and lakes had significant effects on the spread pattern. Human population density levels explained 87% of the variation in the invasion probability (P < 0.05). Since 2001, the number of new records of the nematode has increased by a factor of 5 and the spread distance by a factor of 2. We combined a diffusion model describing the short-distance spread with a stochastic, individual-based model describing the long-distance jumps. This combined model generated an error of only 13% when used to predict the presence of the nematode. Under two climate scenarios (stable climate or moderate warming), projections of the invasion probability suggest that this pest could expand its distribution by 40–55% by 2025.
    Conclusions/Significance: This study provides evidence that human-induced dispersal plays a fundamental role in the spread of the pinewood nematode, and appropriate control measures should be taken to stop or slow its expansion. The model can be applied to Europe, where the nematode was introduced later and is currently expanding its distribution. Similar models could also be derived for other species that may be accidentally transported by humans.
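
    The combined spread model lends itself to a compact simulation. The sketch below is a toy one-dimensional version: a deterministic front advancing at the estimated short-distance rate, plus rare stochastic long-distance jumps. The jump probability p_jump and the 1-D geometry are simplifying assumptions; the 7.5 km/year rate and 111–339 km jump distances come from the abstract.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_front(years, p_jump=0.3, local_rate=7.5,
                       jump_range=(111.0, 339.0)):
        # Toy 1-D front: beetle-mediated diffusion (km/year) plus occasional
        # human-mediated long-distance jumps drawn uniformly from jump_range.
        front = 0.0
        for _ in range(years):
            front += local_rate               # short-distance spread
            if rng.random() < p_jump:         # long-distance jump event
                front += rng.uniform(*jump_range)
        return front

    # Distribution of invasion extent over the 15-year monitoring horizon.
    extents = [simulate_front(15) for _ in range(1000)]
    print("median 15-year extent: %.0f km" % float(np.median(extents)))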

    AI is a viable alternative to high throughput screening: a 318-target study

    High-throughput screening (HTS) is routinely used to identify bioactive small molecules. This requires physical compounds, which limits coverage of accessible chemical space. Computational approaches combined with vast on-demand chemical libraries can access far greater chemical space, provided that the predictive accuracy is sufficient to identify useful molecules. Through the largest and most diverse virtual HTS campaign reported to date, comprising 318 individual projects, we demonstrate that our AtomNet® convolutional neural network successfully finds novel hits across every major therapeutic area and protein class. We address historical limitations of computational screening by demonstrating success for target proteins without known binders, without high-quality X-ray crystal structures, and without manual cherry-picking of compounds. We show that the molecules selected by the AtomNet® model are novel drug-like scaffolds rather than minor modifications to known bioactive compounds. Our empirical results suggest that computational methods can substantially replace HTS as the first step of small-molecule drug discovery.
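
    For context, a generic virtual-screening loop looks like the sketch below: stream a large compound library through a learned scoring model and keep only the top-ranked candidates for physical testing. The score_fn stand-in and all names here are hypothetical; this is a generic pattern, not Atomwise's AtomNet pipeline.

    import heapq

    def virtual_screen(library, score_fn, top_k=1000, batch_size=4096):
        # Stream compounds in batches through a scoring model, keeping only
        # the top_k highest-scoring candidates in a min-heap.
        best = []

        def flush(batch):
            for compound, score in zip(batch, score_fn(batch)):
                if len(best) < top_k:
                    heapq.heappush(best, (score, compound))
                elif score > best[0][0]:
                    heapq.heapreplace(best, (score, compound))

        batch = []
        for compound in library:
            batch.append(compound)
            if len(batch) == batch_size:
                flush(batch)
                batch = []
        if batch:
            flush(batch)
        return sorted(best, reverse=True)     # best (score, compound) first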

    Information Transfer in Multitask Learning, Data Augmentation, and Beyond

    A hallmark of human intelligence is that we continue to learn new information and then extrapolate the learned information to new tasks and domains (see, e.g., Thrun and Pratt (1998)). While this is a fairly intuitive observation, formalizing such ideas has proved to be a challenging research problem and continues to inspire new studies. Recently, there has been increasing interest in AI/ML in building models that generalize across tasks, even in the presence of distribution shifts. How can we ground this research in a solid framework to develop principled methods for better practice? This talk presents my recent works addressing this research question, in three parts: revisiting multitask learning through the lens of deep learning theory, designing principled methods for robust transfer, and algorithmic implications for data augmentation.