FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Convolution models with long filters have demonstrated state-of-the-art
reasoning abilities in many long-sequence tasks but lag behind the most
optimized Transformers in wall-clock time. A major bottleneck is the Fast
Fourier Transform (FFT)--which allows long convolutions to run in
$O(N \log N)$ time in sequence length but has poor hardware utilization. In this paper,
we study how to optimize the FFT convolution. We find two key bottlenecks: the
FFT does not effectively use specialized matrix multiply units, and it incurs
expensive I/O between layers of the memory hierarchy. In response, we propose
FlashFFTConv. FlashFFTConv uses a matrix decomposition that computes the FFT
using matrix multiply units and enables kernel fusion for long sequences,
reducing I/O. We also present two sparse convolution algorithms--1) partial
convolutions and 2) frequency-sparse convolutions--which can be implemented
simply by skipping blocks in the matrix decomposition, enabling further
opportunities for memory and compute savings. FlashFFTConv speeds up exact FFT
convolutions by up to 7.93× over PyTorch and achieves up to 4.4×
speedup end-to-end. Given the same compute budget, FlashFFTConv allows
Hyena-GPT-s to achieve 2.3 points better perplexity on the PILE and
M2-BERT-base to achieve 3.3 points higher GLUE score--matching models with
twice the parameter count. FlashFFTConv also achieves 96.1% accuracy on
Path-512, a high-resolution vision task where no model had previously achieved
better than 50%. Furthermore, partial convolutions enable longer-sequence
models--yielding the first DNA model that can process the longest human genes
(2.3M base pairs)--and frequency-sparse convolutions speed up pretrained models
while maintaining or improving model quality.
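The FFT convolution these kernels accelerate is easy to state in code. Below is a minimal PyTorch sketch of an $O(N \log N)$ long convolution, a reference baseline rather than the fused FlashFFTConv kernel itself; the function and variable names are illustrative.

```python
import torch

def fft_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal long convolution of input u (B, L) with filter k (L,)
    via the FFT: O(L log L) instead of O(L^2) for direct convolution.
    A reference sketch only; FlashFFTConv fuses and decomposes these steps."""
    L = u.shape[-1]
    # Zero-pad to 2L so the circular FFT convolution equals linear convolution.
    n = 2 * L
    u_f = torch.fft.rfft(u, n=n)          # (B, n//2 + 1), complex
    k_f = torch.fft.rfft(k, n=n)          # (n//2 + 1,), complex
    y = torch.fft.irfft(u_f * k_f, n=n)   # pointwise multiply in frequency
    return y[..., :L]                     # keep the causal part

# Usage: a batch of 4 sequences of length 1024 with a learned long filter.
u = torch.randn(4, 1024)
k = torch.randn(1024)
y = fft_conv(u, k)
print(y.shape)  # torch.Size([4, 1024])
```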
Mapping the unconventional orbital texture in topological crystalline insulators
The newly discovered topological crystalline insulators (TCIs) harbor a
complex band structure involving multiple Dirac cones. These materials are
potentially highly tunable by external electric field, temperature or strain
and could find future applications in field-effect transistors, photodetectors,
and nano-mechanical systems. Theoretically, it has been predicted that
different Dirac cones, offset in energy and momentum-space, might harbor vastly
different orbital character, a unique property which, if experimentally
realized, would present an ideal platform for building new spintronic
devices. However, the orbital texture of the Dirac cones, which is of immense
importance in determining a variety of materials properties, still remains
elusive in TCIs. Here, we unveil the orbital texture in a prototypical TCI
Pb$_{1-x}$Sn$_x$Se. By using Fourier-transform (FT) scanning tunneling
spectroscopy (STS) we measure the interference patterns produced by the
scattering of surface state electrons. We discover that the intensity and
energy dependences of the FTs show distinct characteristics, which can be directly
attributed to orbital effects. Our experiments reveal the complex band topology
involving two Lifshitz transitions and establish the orbital nature of the
Dirac bands in this new class of topological materials, which could provide a
different pathway towards future quantum applications.
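The FT-STS analysis step itself reduces to taking the 2D Fourier magnitude of real-space conductance maps at each energy. A minimal NumPy sketch of that step, on synthetic data and with illustrative names, assuming only the standard quasiparticle-interference workflow:

```python
import numpy as np

def ft_sts(didv_map: np.ndarray) -> np.ndarray:
    """Return the Fourier-transform magnitude of a real-space dI/dV map.
    Peaks in |FT| correspond to quasiparticle-interference wavevectors
    connecting points on the constant-energy contours."""
    # Subtract the mean so the q = 0 peak does not dominate the transform.
    centered = didv_map - didv_map.mean()
    ft = np.fft.fftshift(np.fft.fft2(centered))
    return np.abs(ft)

# Illustrative synthetic map: standing waves from a single scattering vector.
x, y = np.meshgrid(np.arange(256), np.arange(256))
didv = np.cos(2 * np.pi * 0.1 * x) + 0.05 * np.random.randn(256, 256)
q_space = ft_sts(didv)  # two symmetric peaks at q = (+-0.1, 0)
```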
Semi-Supervised Learning for Sparsely-Labeled Sequential Data: Application to Healthcare Video Processing
Labeled data is a critical resource for training and evaluating machine
learning models. However, many real-life datasets are only partially labeled.
We propose a semi-supervised machine learning training strategy to improve
event detection performance on sequential data, such as video recordings, when
only sparse labels are available, such as event start times without their
corresponding end times. Our method uses noisy guesses of the events' end times
to train event detection models. Depending on how conservative these guesses
are, mislabeled false positives may be introduced into the training set (i.e.,
negative sequences mislabeled as positives). We further propose a mathematical
model for estimating how many inaccurate labels a model is exposed to, based on
how noisy the end time guesses are. Finally, we show that neural networks can
improve their detection performance by leveraging more training data with less
conservative approximations despite the higher proportion of incorrect labels.
We adapt sequential versions of MNIST and CIFAR-10 to empirically evaluate our
method, and find that our risk-tolerant strategy outperforms conservative
estimates by 12 points of mean average precision for MNIST, and 3.5 points for
CIFAR. Then, we leverage the proposed training strategy to tackle a real-life
application: processing continuous video recordings of epilepsy patients to
improve seizure detection, and show that our method outperforms baseline
labeling methods by 10 points of average precision.
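The labeling step can be sketched simply. Below is a minimal Python illustration assuming a fixed guessed event duration; the function name and windowing scheme are hypothetical stand-ins for the authors' pipeline.

```python
import numpy as np

def pseudo_label_frames(n_frames: int, start_times: list[int],
                        guessed_duration: int) -> np.ndarray:
    """Label each frame of a recording given only event start times.

    Frames within [start, start + guessed_duration) are marked positive.
    A longer guessed_duration is less conservative: it yields more positive
    training frames but risks mislabeling post-event frames as positives.
    """
    labels = np.zeros(n_frames, dtype=np.int64)
    for start in start_times:
        end_guess = min(start + guessed_duration, n_frames)
        labels[start:end_guess] = 1
    return labels

# Usage: a 1000-frame video with events starting at frames 120 and 640.
labels = pseudo_label_frames(1000, [120, 640], guessed_duration=80)
print(labels.sum())  # 160 frames pseudo-labeled positive
```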
Singularity in the boundary resistance between superfluid $^4$He and a solid surface
We report new measurements in four cells of the thermal boundary resistance
$R_b$ between copper and superfluid $^4$He below but near the
superfluid-transition temperature $T_\lambda$. Fits of a power law
$R_b = R_0 t^{x_b}$ in the reduced temperature $t \equiv 1 - T/T_\lambda$ to
the data yielded an exponent $x_b$ that differed from the value obtained by
fitting the same form to theoretical values based on the
renormalization-group theory. Alternatively, a good fit of the theory to the
data could be obtained if the {\it amplitude} of the prediction was reduced
by a factor close to two. The results raise the question whether the boundary
conditions used in the theory should be modified.
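Assuming the fit takes the singular form $R_b = R_0 t^{x_b}$ above, the analysis can be sketched with SciPy on synthetic data; all numeric values below are illustrative stand-ins, not the paper's results.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(t, R0, xb):
    """Assumed singular form of the boundary resistance, R_b = R0 * t**xb,
    in the reduced temperature t = 1 - T/T_lambda."""
    return R0 * t**xb

# Synthetic data standing in for measured R_b(t); values are illustrative.
t = np.logspace(-5, -2, 40)
Rb = 2.0 * t**0.23 * (1 + 0.02 * np.random.randn(t.size))

(R0_fit, xb_fit), _ = curve_fit(power_law, t, Rb, p0=(1.0, 0.3))
print(f"R0 = {R0_fit:.3f}, x_b = {xb_fit:.3f}")
```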
Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision
Foundation models offer an exciting new paradigm for constructing models with
out-of-the-box embeddings and a few labeled examples. However, it is not clear
how to best apply foundation models without labeled data. A potential approach
is to fuse foundation models with weak supervision frameworks, which use weak
label sources -- pre-trained models, heuristics, crowd-workers -- to construct
pseudolabels. The challenge is building a combination that best exploits the
signal available in both foundation models and weak sources. We propose Liger,
a combination that uses foundation model embeddings to improve two crucial
elements of existing weak supervision techniques. First, we produce finer
estimates of weak source quality by partitioning the embedding space and
learning per-part source accuracies. Second, we improve source coverage by
extending source votes in embedding space. Despite the black-box nature of
foundation models, we prove results characterizing how our approach improves
performance and show that lift scales with the smoothness of label
distributions in embedding space. On six benchmark NLP and video tasks, Liger
outperforms vanilla weak supervision by 14.1 points, weakly-supervised kNN and
adapters by 11.8 points, and kNN and adapters supervised by traditional hand
labels by 7.2 points.
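The two embedding-based components can be sketched in simplified form. The sketch below uses k-means partitions and a majority-vote stand-in for Liger's latent-variable accuracy estimates; all names, and the estimator itself, are illustrative simplifications of the actual method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

# votes[i, j] in {-1, 0, +1}: source j's label for point i (0 = abstain).
def liger_sketch(embeddings: np.ndarray, votes: np.ndarray, n_parts: int = 4):
    parts = KMeans(n_clusters=n_parts, n_init=10).fit_predict(embeddings)

    # (1) Per-partition source quality: agreement with the majority vote,
    # a crude stand-in for Liger's latent-variable accuracy estimates.
    majority = np.sign(votes.sum(axis=1))
    acc = np.full((n_parts, votes.shape[1]), 0.5)
    for p in range(n_parts):
        for j in range(votes.shape[1]):
            m = (parts == p) & (votes[:, j] != 0) & (majority != 0)
            if m.any():
                acc[p, j] = (votes[m, j] == majority[m]).mean()

    # (2) Extend coverage: where a source abstains, copy its vote from the
    # nearest point (in embedding space) where it did not abstain.
    extended = votes.copy()
    for j in range(votes.shape[1]):
        has = votes[:, j] != 0
        if has.any() and (~has).any():
            nn = NearestNeighbors(n_neighbors=1).fit(embeddings[has])
            idx = nn.kneighbors(embeddings[~has], return_distance=False)[:, 0]
            extended[~has, j] = votes[has, j][idx]
    return parts, acc, extended
```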
Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning
An ideal learned representation should display transferability and
robustness. Supervised contrastive learning (SupCon) is a promising method for
training accurate models, but produces representations that do not capture
these properties due to class collapse -- when all points in a class map to the
same representation. Recent work suggests that "spreading out" these
representations improves them, but the precise mechanism is poorly understood.
We argue that creating spread alone is insufficient for better representations,
since spread is invariant to permutations within classes. Instead, both the
correct degree of spread and a mechanism for breaking this invariance are
necessary. We first prove that adding a weighted class-conditional InfoNCE loss
to SupCon controls the degree of spread. Next, we study three mechanisms to
break permutation invariance: using a constrained encoder, adding a
class-conditional autoencoder, and using data augmentation. We show that the
latter two encourage clustering of latent subclasses under more realistic
conditions than the former. Using these insights, we show that adding a
properly-weighted class-conditional InfoNCE loss and a class-conditional
autoencoder to SupCon achieves 11.1 points of lift on coarse-to-fine transfer
across 5 standard datasets and 4.7 points on worst-group robustness on 3
datasets, setting state-of-the-art on CelebA by 11.5 points.
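One reading of the weighted class-conditional InfoNCE term can be sketched in PyTorch: each anchor's positive is its own augmented view, and its negatives are restricted to points of the same class, so minimizing the loss spreads same-class points apart. The implementation below is an illustrative interpretation, not the authors' code.

```python
import torch
import torch.nn.functional as F

def class_conditional_infonce(z: torch.Tensor, z_aug: torch.Tensor,
                              labels: torch.Tensor, tau: float = 0.1):
    """Class-conditional InfoNCE: positives are augmented views; negatives
    are drawn only from the SAME class, which creates intra-class spread."""
    z = F.normalize(z, dim=1)
    z_aug = F.normalize(z_aug, dim=1)
    sims = z @ torch.cat([z_aug, z]).T / tau      # (B, 2B) similarities
    B = z.shape[0]
    idx = torch.arange(B)
    pos = sims[idx, idx]                          # sim(z_i, z_i') / tau
    # Keep only same-class entries; drop each anchor's self-similarity.
    all_labels = torch.cat([labels, labels])
    same_class = labels[:, None] == all_labels[None, :]
    same_class[idx, idx + B] = False              # exclude z_i vs z_i
    logits = sims.masked_fill(~same_class, float('-inf'))
    return (torch.logsumexp(logits, dim=1) - pos).mean()

# Usage: add to SupCon with a weight w that controls the degree of spread,
# e.g. loss = supcon_loss + w * class_conditional_infonce(z, z_aug, labels).
z, z_aug = torch.randn(8, 128), torch.randn(8, 128)
labels = torch.randint(0, 2, (8,))
print(class_conditional_infonce(z, z_aug, labels))
```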