
    FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores

    Convolution models with long filters have demonstrated state-of-the-art reasoning abilities in many long-sequence tasks but lag behind the most optimized Transformers in wall-clock time. A major bottleneck is the Fast Fourier Transform (FFT), which allows long convolutions to run in $O(N \log N)$ time in sequence length $N$ but has poor hardware utilization. In this paper, we study how to optimize the FFT convolution. We find two key bottlenecks: the FFT does not effectively use specialized matrix multiply units, and it incurs expensive I/O between layers of the memory hierarchy. In response, we propose FlashFFTConv. FlashFFTConv uses a matrix decomposition that computes the FFT using matrix multiply units and enables kernel fusion for long sequences, reducing I/O. We also present two sparse convolution algorithms, 1) partial convolutions and 2) frequency-sparse convolutions, which can be implemented simply by skipping blocks in the matrix decomposition, enabling further opportunities for memory and compute savings. FlashFFTConv speeds up exact FFT convolutions by up to 7.93$\times$ over PyTorch and achieves up to 4.4$\times$ speedup end-to-end. Given the same compute budget, FlashFFTConv allows Hyena-GPT-s to achieve 2.3 points better perplexity on the PILE and M2-BERT-base to achieve 3.3 points higher GLUE score, matching models with twice the parameter count. FlashFFTConv also achieves 96.1% accuracy on Path-512, a high-resolution vision task where no model had previously achieved better than 50%. Furthermore, partial convolutions enable longer-sequence models, yielding the first DNA model that can process the longest human genes (2.3M base pairs), and frequency-sparse convolutions speed up pretrained models while maintaining or improving model quality.
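
    As a point of reference for the operation this paper optimizes, the sketch below shows the baseline $O(N \log N)$ FFT convolution in PyTorch (pointwise multiplication of zero-padded FFTs). It is a minimal reference implementation, not the paper's fused tensor-core kernel; the function name and tensor shapes are illustrative assumptions.

```python
# Baseline FFT long convolution: the operation FlashFFTConv accelerates.
import torch

def fft_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Convolve input u of shape (batch, channels, N) with per-channel filters k of shape (channels, N)."""
    N = u.shape[-1]
    # Zero-pad to length 2N so the circular convolution matches a linear (causal) one.
    u_f = torch.fft.rfft(u, n=2 * N)
    k_f = torch.fft.rfft(k, n=2 * N)
    y = torch.fft.irfft(u_f * k_f, n=2 * N)
    return y[..., :N]

u = torch.randn(2, 8, 1024)   # batch, channels, sequence length
k = torch.randn(8, 1024)      # one length-N filter per channel
print(fft_conv(u, k).shape)   # torch.Size([2, 8, 1024])
```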

    Mapping the unconventional orbital texture in topological crystalline insulators

    The newly discovered topological crystalline insulators (TCIs) harbor a complex band structure involving multiple Dirac cones. These materials are potentially highly tunable by external electric field, temperature, or strain and could find future applications in field-effect transistors, photodetectors, and nano-mechanical systems. Theoretically, it has been predicted that different Dirac cones, offset in energy and momentum space, might harbor vastly different orbital character, a unique property which, if experimentally realized, would present an ideal platform for accomplishing new spintronic devices. However, the orbital texture of the Dirac cones, which is of immense importance in determining a variety of materials properties, still remains elusive in TCIs. Here, we unveil the orbital texture in a prototypical TCI, Pb$_{1-x}$Sn$_x$Se. Using Fourier-transform (FT) scanning tunneling spectroscopy (STS), we measure the interference patterns produced by the scattering of surface-state electrons. We discover that the intensity and energy dependences of the FTs show distinct characteristics, which can be directly attributed to orbital effects. Our experiments reveal the complex band topology involving two Lifshitz transitions and establish the orbital nature of the Dirac bands in this new class of topological materials, which could provide a different pathway towards future quantum applications.
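
    As a rough illustration of the FT-STS step described above, the sketch below Fourier transforms a real-space dI/dV conductance map to expose the interference wavevectors. The map, grid size, and wavevector here are synthetic and carry no physical claim about Pb$_{1-x}$Sn$_x$Se.

```python
# Illustrative FT-STS processing: the 2D FFT of a (synthetic) dI/dV map
# reveals the scattering wavevectors behind the real-space interference pattern.
import numpy as np

rng = np.random.default_rng(0)
L, q = 256, 2 * np.pi / 16.0                      # map size (pixels) and an example wavevector
x, y = np.meshgrid(np.arange(L), np.arange(L))
didv = np.cos(q * x) + np.cos(q * y) + 0.3 * rng.standard_normal((L, L))

# Magnitude of the 2D FFT, shifted so q = 0 sits at the center of the image.
ft = np.abs(np.fft.fftshift(np.fft.fft2(didv - didv.mean())))
peak = np.unravel_index(np.argmax(ft), ft.shape)
print("strongest interference peak (pixel coordinates):", peak)
```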

    Semi-Supervised Learning for Sparsely-Labeled Sequential Data: Application to Healthcare Video Processing

    Labeled data is a critical resource for training and evaluating machine learning models. However, many real-life datasets are only partially labeled. We propose a semi-supervised machine learning training strategy to improve event detection performance on sequential data, such as video recordings, when only sparse labels are available, such as event start times without their corresponding end times. Our method uses noisy guesses of the events' end times to train event detection models. Depending on how conservative these guesses are, mislabeled false positives may be introduced into the training set (i.e., negative sequences mislabeled as positives). We further propose a mathematical model for estimating how many inaccurate labels a model is exposed to, based on how noisy the end-time guesses are. Finally, we show that neural networks can improve their detection performance by leveraging more training data with less conservative approximations, despite the higher proportion of incorrect labels. We adapt sequential versions of MNIST and CIFAR-10 to empirically evaluate our method and find that our risk-tolerant strategy outperforms conservative estimates by 12 points of mean average precision for MNIST and by 3.5 points for CIFAR. We then leverage the proposed training strategy to tackle a real-life application: processing continuous video recordings of epilepsy patients to improve seizure detection, and show that our method outperforms baseline labeling methods by 10 points of average precision.
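
    The labeling idea described above can be sketched concretely: with only event start times known, frame-level positives are generated by assuming a guessed event duration, and longer guesses trade extra training positives against mislabeled frames past the true (unknown) end time. The function and duration values below are illustrative assumptions, not the paper's implementation.

```python
# Sketch of pseudo-labeling sequential data from sparse labels (start times only).
import numpy as np

def pseudo_labels(n_frames: int, start_frames: list, guessed_duration: int) -> np.ndarray:
    """Mark frames from each known start time up to an assumed end time as positive."""
    labels = np.zeros(n_frames, dtype=np.int64)
    for s in start_frames:
        labels[s : min(s + guessed_duration, n_frames)] = 1
    return labels

# Two events starting at frames 100 and 400 of a 1000-frame recording.
y_conservative = pseudo_labels(1000, [100, 400], guessed_duration=10)
y_risky = pseudo_labels(1000, [100, 400], guessed_duration=60)
print(y_conservative.sum(), y_risky.sum())  # 20 vs. 120 positive frames
```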

    Singularity in the boundary resistance between superfluid $^4$He and a solid surface

    We report new measurements in four cells of the thermal boundary resistance $R$ between copper and $^4$He below but near the superfluid-transition temperature $T_\lambda$. For $10^{-7} \leq t \equiv 1 - T/T_\lambda \leq 10^{-4}$, fits of $R = R_0 t^{x_b} + B_0$ to the data yielded $x_b \simeq 0.18$, whereas a fit to theoretical values based on the renormalization-group theory yielded $x_b = 0.23$. Alternatively, a good fit of the theory to the data could be obtained if the {\it amplitude} of the prediction was reduced by a factor close to two. The results raise the question whether the boundary conditions used in the theory should be modified. Comment: 4 pages, 4 figures, REVTeX
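
    The quoted exponent comes from fitting the power law $R = R_0 t^{x_b} + B_0$ over the reduced-temperature window given above; the sketch below shows such a fit on synthetic data (the generated values and noise level are assumptions for illustration only).

```python
# Fit R = R0 * t**xb + B0 over 1e-7 <= t <= 1e-4 using synthetic data.
import numpy as np
from scipy.optimize import curve_fit

def boundary_resistance(t, R0, xb, B0):
    return R0 * t**xb + B0

t = np.logspace(-7, -4, 60)
rng = np.random.default_rng(1)
R_data = boundary_resistance(t, R0=2.0, xb=0.18, B0=0.5) * (1 + 0.01 * rng.standard_normal(t.size))

popt, _ = curve_fit(boundary_resistance, t, R_data, p0=(1.0, 0.2, 0.0))
print("fitted R0, x_b, B0:", popt)
```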

    Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision

    Foundation models offer an exciting new paradigm for constructing models with out-of-the-box embeddings and a few labeled examples. However, it is not clear how to best apply foundation models without labeled data. A potential approach is to fuse foundation models with weak supervision frameworks, which use weak label sources (pre-trained models, heuristics, crowd-workers) to construct pseudolabels. The challenge is building a combination that best exploits the signal available in both foundation models and weak sources. We propose Liger, a combination that uses foundation model embeddings to improve two crucial elements of existing weak supervision techniques. First, we produce finer estimates of weak source quality by partitioning the embedding space and learning per-part source accuracies. Second, we improve source coverage by extending source votes in embedding space. Despite the black-box nature of foundation models, we prove results characterizing how our approach improves performance and show that lift scales with the smoothness of label distributions in embedding space. On six benchmark NLP and video tasks, Liger outperforms vanilla weak supervision by 14.1 points, weakly-supervised kNN and adapters by 11.8 points, and kNN and adapters supervised by traditional hand labels by 7.2 points. Comment: UAI 2022 Camera Ready
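
    The two embedding-based ideas above can be sketched in a heavily simplified form: partition the foundation-model embeddings so source accuracies can be modeled per partition, and extend each weak source's coverage by copying its vote from the nearest non-abstaining point in embedding space. The clustering choice, neighbor rule, and synthetic data below are assumptions; Liger's actual estimators are more involved.

```python
# Simplified sketch: partition embeddings and extend weak-source votes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
emb = rng.standard_normal((500, 32))                          # foundation-model embeddings
votes = rng.choice([-1, 0, 1], size=500, p=[0.3, 0.5, 0.2])   # one weak source; 0 = abstain

# (1) Partition the embedding space; per-part source accuracies would be estimated within each part.
parts = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(emb)

# (2) Extend coverage: abstaining points inherit the vote of their nearest voting neighbor.
voted = votes != 0
nn = NearestNeighbors(n_neighbors=1).fit(emb[voted])
_, idx = nn.kneighbors(emb[~voted])
extended = votes.copy()
extended[~voted] = votes[voted][idx[:, 0]]
print("coverage before/after:", voted.mean(), (extended != 0).mean())
```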

    Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

    An ideal learned representation should display transferability and robustness. Supervised contrastive learning (SupCon) is a promising method for training accurate models, but produces representations that do not capture these properties due to class collapse (when all points in a class map to the same representation). Recent work suggests that "spreading out" these representations improves them, but the precise mechanism is poorly understood. We argue that creating spread alone is insufficient for better representations, since spread is invariant to permutations within classes. Instead, both the correct degree of spread and a mechanism for breaking this invariance are necessary. We first prove that adding a weighted class-conditional InfoNCE loss to SupCon controls the degree of spread. Next, we study three mechanisms to break permutation invariance: using a constrained encoder, adding a class-conditional autoencoder, and using data augmentation. We show that the latter two encourage clustering of latent subclasses under more realistic conditions than the former. Using these insights, we show that adding a properly weighted class-conditional InfoNCE loss and a class-conditional autoencoder to SupCon achieves 11.1 points of lift on coarse-to-fine transfer across 5 standard datasets and 4.7 points on worst-group robustness on 3 datasets, setting state-of-the-art on CelebA by 11.5 points.
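
    The loss modification described above can be read as an instance-discrimination (InfoNCE) term computed only among samples that share a class, added to the SupCon loss with a weight that controls spread. The function below is a simplified PyTorch sketch under that reading, not the paper's exact objective.

```python
# Class-conditional InfoNCE: contrast each anchor against its own second view,
# using only same-class samples as the candidate set.
import torch
import torch.nn.functional as F

def class_conditional_infonce(z1, z2, labels, temperature=0.1):
    """z1, z2: (B, d) L2-normalized embeddings of two views; labels: (B,) class ids."""
    loss, count = 0.0, 0
    for c in labels.unique():
        m = labels == c
        n = int(m.sum())
        if n < 2:
            continue
        logits = z1[m] @ z2[m].T / temperature            # within-class view similarities
        targets = torch.arange(n, device=z1.device)
        loss = loss + F.cross_entropy(logits, targets)    # match each anchor to its own other view
        count += 1
    return loss / max(count, 1)

# total_loss = supcon_loss + w * class_conditional_infonce(z1, z2, labels)  # w sets the spread
```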