Tied-Augment: Controlling Representation Similarity Improves Data Augmentation
Data augmentation methods have played an important role in recent advances in
deep learning, and have become an indispensable component of
state-of-the-art models in semi-supervised, self-supervised, and supervised
training for vision. Despite incurring no additional latency at test time, data
augmentation often requires more epochs of training to be effective. For
example, even the simple flips-and-crops augmentation requires training for
more than 5 epochs to improve performance, whereas RandAugment requires more
than 90 epochs. We propose a general framework called Tied-Augment, which
improves the efficacy of data augmentation in a wide range of applications by
adding a simple term to the loss that can control the similarity of
representations under distortions. Tied-Augment can improve state-of-the-art
methods from data augmentation (e.g. RandAugment, mixup), optimization (e.g.
SAM), and semi-supervised learning (e.g. FixMatch). For example,
Tied-RandAugment can outperform RandAugment by 2.0% on ImageNet. Notably, using
Tied-Augment, data augmentation can be made to improve generalization even when
training for a few epochs and when fine-tuning. We open source our code at
https://github.com/ekurtulus/tied-augment/tree/main.
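The added loss term described above can be sketched in a few lines. The following is a minimal PyTorch-style illustration, assuming a model that returns both features and logits; the two-view setup follows the abstract, but the function name and the choice of an L2 feature distance as the similarity term are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def tied_augment_loss(model, x_view1, x_view2, y, tie_weight=1.0):
    """Supervised loss on two augmented views of the same batch, plus a
    term controlling how similar their representations remain under
    distortion. Assumes model(x) returns (features, logits); the L2
    feature distance is one plausible choice of similarity term."""
    f1, logits1 = model(x_view1)   # e.g. a flips-and-crops view
    f2, logits2 = model(x_view2)   # e.g. a RandAugment view
    ce = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)
    tie = F.mse_loss(f1, f2)       # representation-similarity term
    return ce + tie_weight * tie
```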
Specialization as an Optimal Strategy Under Varying External Conditions
We present an investigation of specialization when considering the execution of collaborative tasks by a robot swarm. Specifically, we consider the stick-pulling problem first proposed by Martinoli et al. [1], [2] and develop a macroscopic analytical model for the swarm executing a set of tasks that require the collaboration of two robots. We show that, for constant external conditions, maximum productivity can be achieved by a single-species swarm with carefully chosen operational parameters. While the same applies to a two-species swarm, we show that specialization is a strategy best employed under changing external conditions.
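For intuition on why the gripping-time parameter matters, here is a toy Monte Carlo sketch of the stick-pulling setting (not the authors' macroscopic analytical model): a stick needs two robots, the first gripper waits at most grip_time steps for a partner, and gripping robots cannot search. All parameter values are illustrative.

```python
import random

def simulate(n_robots=10, n_sticks=16, grip_time=20, steps=20000, p_find=0.05):
    """Toy stick-pulling swarm; returns sticks pulled per time step."""
    state = [None] * n_robots   # None = searching, [stick, t_left] = gripping
    gripper = {}                # stick id -> index of the robot gripping it
    pulled = 0
    for _ in range(steps):
        for r in range(n_robots):
            if state[r] is None:
                if random.random() < p_find:      # robot encounters a stick
                    stick = random.randrange(n_sticks)
                    if stick in gripper:          # a partner is waiting: pull it
                        state[gripper.pop(stick)] = None
                        pulled += 1
                    else:                         # first robot grips and waits
                        gripper[stick] = r
                        state[r] = [stick, grip_time]
            else:
                state[r][1] -= 1
                if state[r][1] <= 0:              # waited too long: give up
                    gripper.pop(state[r][0], None)
                    state[r] = None
    return pulled / steps

# Sweeping grip_time exposes the trade-off between wasted encounters
# (giving up too early) and tied-up robots (waiting too long).
for gt in (2, 10, 50, 250):
    print(gt, simulate(grip_time=gt))
```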
Computational Caches
Caching is a well-known technique for speeding up computation. We cache data from file systems and databases; we cache dynamically generated code blocks; we cache page translations in TLBs. We propose to cache the act of computation, so that we can apply it later and in different contexts. We use a state-space model of computation to support such caching, involving two interrelated parts: speculatively memoized predicted/resultant state pairs that we use to accelerate sequential computation, and trained probabilistic models that we use to generate predicted states from which to speculatively execute. The key techniques that make this approach feasible are designing probabilistic models that automatically focus on regions of program execution state space in which prediction is tractable and identifying state space equivalence classes so that predictions need not be exact.
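As a heavily simplified illustration of caching the act of computation, the sketch below memoizes predicted/resultant state pairs for a pure step function, so computation restarted from an already-seen state can be fast-forwarded. The probabilistic model that proposes states for speculative execution is omitted, and all names are hypothetical.

```python
import hashlib
import pickle

class ComputationCache:
    """Memoizes (starting state -> resultant state) pairs for a pure
    step function; repeated runs from a seen state hit the cache."""

    def __init__(self, step, n_steps=1000):
        self.step = step        # pure function: state -> next state
        self.n_steps = n_steps  # how far each cached pair fast-forwards
        self.pairs = {}         # hash of state -> resultant state

    def _key(self, state):
        return hashlib.sha256(pickle.dumps(state)).hexdigest()

    def run(self, state):
        key = self._key(state)
        if key in self.pairs:             # cache hit: skip the execution
            return self.pairs[key]
        for _ in range(self.n_steps):     # cache miss: actually compute
            state = self.step(state)
        self.pairs[key] = state
        return state

# Example: fast-forwarding a toy automaton state.
cache = ComputationCache(step=lambda s: (s[0] + 1, s[1] * 2 % 97))
print(cache.run((0, 1)))   # computed
print(cache.run((0, 1)))   # served from the cache
```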
Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
We introduce a machine learning approach to determine the transition dynamics
of silicon atoms on a single layer of carbon atoms when stimulated by the
electron beam of a scanning transmission electron microscope (STEM). Our method
is data-centric, leveraging data collected on a STEM. The data samples are
processed and filtered to produce symbolic representations, which we use to
train a neural network to predict transition probabilities. These learned
transition dynamics are then leveraged to guide a single silicon atom
throughout the lattice to pre-determined target destinations. We present
empirical analyses that demonstrate the efficacy and generality of our
approach.
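To make the atom-guidance step concrete, here is a hypothetical greedy controller on top of a learned transition model; the transition_model(src, dst) interface, the square lattice, and the scoring rule are illustrative assumptions rather than the paper's method.

```python
import numpy as np

def guide_atom(start, target, transition_model, max_shots=200, rng=None):
    """Greedy beam placement: score each neighboring lattice site by
    (learned transition probability) x (progress toward the target),
    stimulate the best one, and sample the stochastic outcome."""
    rng = rng or np.random.default_rng()
    pos = np.asarray(start, dtype=float)
    target = np.asarray(target, dtype=float)
    moves = [np.array(m, dtype=float) for m in ((1, 0), (-1, 0), (0, 1), (0, -1))]
    for _ in range(max_shots):
        if np.array_equal(pos, target):
            break
        def score(dst):
            progress = np.linalg.norm(pos - target) - np.linalg.norm(dst - target)
            return transition_model(pos, dst) * progress
        best = max((pos + m for m in moves), key=score)
        if rng.random() < transition_model(pos, best):   # transition occurred
            pos = best
    return pos
```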
PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions
Cross-entropy loss and focal loss are the most common choices when training
deep neural networks for classification problems. Generally speaking, however,
a good loss function can take on much more flexible forms, and should be
tailored for different tasks and datasets. Motivated by how functions can be
approximated via Taylor expansion, we propose a simple framework, named
PolyLoss, to view and design loss functions as a linear combination of
polynomial functions. Our PolyLoss allows the importance of different
polynomial bases to be easily adjusted depending on the target tasks and
datasets, while naturally subsuming the aforementioned cross-entropy loss and
focal loss as special cases. Extensive experimental results show that the
optimal choice within the PolyLoss is indeed dependent on the task and dataset.
Simply by introducing one extra hyperparameter and adding one line of code, our
Poly-1 formulation outperforms the cross-entropy loss and focal loss on 2D
image classification, instance segmentation, object detection, and 3D object
detection tasks, sometimes by a large margin.
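The one-extra-hyperparameter claim refers to the Poly-1 formulation: cross-entropy plus a single first-order term epsilon * (1 - Pt), where Pt is the predicted probability of the true class. A minimal PyTorch-style sketch (the function name is ours):

```python
import torch
import torch.nn.functional as F

def poly1_cross_entropy(logits, targets, epsilon=1.0):
    """Poly-1 loss: cross-entropy plus epsilon * (1 - Pt).
    epsilon is the single extra hyperparameter; epsilon = 0
    recovers plain cross-entropy."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return (ce + epsilon * (1.0 - pt)).mean()
```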
Accurate Surface and Finite Temperature Bulk Properties of Lithium Metal at Large Scales using Machine Learning Interaction Potentials
The properties of lithium metal are key parameters in the design of lithium-ion
and lithium-metal batteries. They are difficult to probe experimentally due to
the high reactivity and low melting point of lithium, as well as the
microscopic scales at which lithium exists in batteries, where it is found to
have enhanced strength, with implications for dendrite-suppression strategies.
Computationally, there is a lack of empirical potentials that are consistently
quantitatively accurate across all properties, and ab initio calculations are
too costly. In this work, we train Machine Learning Interaction Potentials
(MLIPs) on Density Functional Theory (DFT) data to state-of-the-art accuracy in
reproducing experimental and ab initio results across a wide range of
simulations at large length and time scales. We accurately predict
thermodynamic properties, phonon spectra, the temperature dependence of elastic
constants, and various surface properties inaccessible using DFT. We establish
that there exists a Bell-Evans-Polanyi relation correlating the self-adsorption
energy and the minimum surface diffusion barrier for high Miller index facets.
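A Bell-Evans-Polanyi relation of the kind reported here is a linear correlation between a barrier and an energy; schematically, such a claim is tested with a straight-line fit, as in the sketch below. The per-facet numbers are placeholders, not the paper's data.

```python
import numpy as np

# Placeholder per-facet values in eV; the paper's actual data differ.
E_ads     = np.array([-0.45, -0.52, -0.60, -0.66, -0.71])  # self-adsorption energy
E_barrier = np.array([ 0.12,  0.10,  0.07,  0.05,  0.04])  # min. diffusion barrier

# Bell-Evans-Polanyi form: E_barrier ~ alpha + beta * E_ads
beta, alpha = np.polyfit(E_ads, E_barrier, 1)
residuals = E_barrier - (alpha + beta * E_ads)
r2 = 1.0 - residuals.var() / E_barrier.var()
print(f"E_barrier ~ {alpha:.3f} + {beta:.3f} * E_ads  (R^2 = {r2:.2f})")
```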