
    Stream Fusion, to Completeness

    Stream processing is mainstream (again): widely used stream libraries are now available for virtually all modern OO and functional languages, from Java to C# to Scala to OCaml to Haskell. Yet expressivity and performance are still lacking. For instance, the popular, well-optimized Java 8 streams do not support the zip operator and are still an order of magnitude slower than hand-written loops. We present the first approach that represents the full generality of stream processing and eliminates overheads, via the use of staging. It is based on an unusually rich semantic model of stream interaction. We support any combination of zipping, nesting (or flat-mapping), sub-ranging, filtering, and mapping, of finite or infinite streams. Our model captures the idiosyncratic choices a programmer makes when optimizing stream pipelines, such as rate differences and the choice of a "for" vs. a "while" loop. Our approach delivers hand-written-like code, but automatically. It explicitly avoids relying on black-box optimizers and sufficiently smart compilers, offering the highest, guaranteed, and portable performance. Our approach relies on high-level concepts that are readily mapped onto an implementation. Accordingly, we have two distinct implementations: an OCaml stream library, staged via MetaOCaml, and a Scala library for the JVM, staged via LMS. In both cases, we derive libraries that are both richer and many tens of times faster than past work. We greatly exceed the performance of the standard stream libraries available in Java, Scala, and OCaml, including the well-optimized Java 8 streams.
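    As an illustration of the overhead that staging is meant to eliminate, the following minimal Python sketch (not the paper's MetaOCaml or LMS code; the functions and data are hypothetical) contrasts a combinator-style pipeline of map, filter, and zip with the fused, hand-written loop that a staged library would generate automatically. Note how the filter introduces a rate difference between the two streams, which forces the fused version into a "while" rather than a "for" loop.

```python
def naive_pipeline(xs, ys):
    # Each combinator (map, filter, zip) adds per-element iterator overhead.
    squares = (x * x for x in xs)
    odds = (y for y in ys if y % 2 == 1)
    return sum(a + b for a, b in zip(squares, odds))

def fused_loop(xs, ys):
    # The fused equivalent: a single while loop, no intermediate iterators.
    total = 0
    i = j = 0
    while i < len(xs) and j < len(ys):
        if ys[j] % 2 == 1:          # the filter creates a rate mismatch
            total += xs[i] * xs[i] + ys[j]
            i += 1
        j += 1
    return total

# Quick check that the two formulations agree on a small example.
assert naive_pipeline([1, 2, 3], [2, 3, 4, 5]) == fused_loop([1, 2, 3], [2, 3, 4, 5])
```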

    Audio-Visual Egocentric Action Recognition


    Two-Stream Transformer Architecture for Long Video Understanding

    Pure vision transformer architectures are highly effective for short-video classification and action recognition tasks. However, due to the quadratic complexity of self-attention and the lack of inductive bias, transformers are resource-intensive and suffer from data inefficiencies. Long-form video understanding tasks amplify the data- and memory-efficiency problems of transformers, making current approaches infeasible to implement in data- or memory-restricted domains. This paper introduces an efficient Spatio-Temporal Attention Network (STAN), which uses a two-stream transformer architecture to model dependencies between static image features and temporal contextual features. Our proposed approach can classify videos up to two minutes long on a single GPU, is data efficient, and achieves state-of-the-art performance on several long video understanding tasks.
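    A hypothetical PyTorch sketch of the two-stream idea (not the authors' STAN implementation; module names, dimensions, and the fusion-by-concatenation choice are assumptions): per-frame features from an image encoder feed both a static-appearance summary and a lightweight temporal transformer, and the two representations are concatenated for classification.

```python
import torch
import torch.nn as nn

class TwoStreamClassifier(nn.Module):
    def __init__(self, feat_dim=768, num_classes=400, depth=4, heads=8):
        super().__init__()
        # Temporal stream: self-attention over the sequence of frame features.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=heads,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim) from a per-frame image encoder.
        static = frame_feats.mean(dim=1)                    # static-appearance summary
        temporal = self.temporal(frame_feats).mean(dim=1)   # temporal context
        return self.head(torch.cat([static, temporal], dim=-1))
```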

    SODFormer: Streaming Object Detection with Transformer Using Events and Frames

    The DAVIS camera, which streams two complementary sensing modalities (asynchronous events and frames), has increasingly been used to address major object detection challenges (e.g., fast motion blur and low light). However, how to effectively leverage rich temporal cues and fuse two heterogeneous visual streams remains a challenging endeavor. To address this challenge, we propose a novel streaming object detector with Transformer, namely SODFormer, which is the first to integrate events and frames to continuously detect objects in an asynchronous manner. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-DAVIS-SOD) with over 1080.1k manual labels. Then, we design a spatiotemporal Transformer architecture that detects objects via end-to-end sequence prediction, where a novel temporal Transformer module leverages rich temporal cues from the two visual streams to improve detection performance. Finally, an asynchronous attention-based fusion module is proposed to integrate the two heterogeneous sensing modalities and exploit their complementary advantages; it can be queried at any time to locate objects, breaking through the limited output frequency of synchronized frame-based fusion strategies. The results show that the proposed SODFormer outperforms four state-of-the-art methods and our eight baselines by a significant margin. We also show that our unifying framework works well even in cases where a conventional frame-based camera fails, e.g., under high-speed motion and low-light conditions. Our dataset and code are available at https://github.com/dianzl/SODFormer. (Comment: 18 pages, 15 figures; in IEEE Transactions on Pattern Analysis and Machine Intelligence.)
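    A hedged sketch of attention-based fusion between event and frame token streams, in the spirit of SODFormer's fusion module (not the released code; class and argument names are hypothetical): frame tokens query the asynchronous event tokens through cross-attention, so a fused representation can be produced at any requested timestamp.

```python
import torch
import torch.nn as nn

class EventFrameFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_tokens, event_tokens):
        # frame_tokens, event_tokens: (batch, num_tokens, dim) from each stream.
        # Frame tokens attend over the asynchronous event tokens; the result is
        # added back residually and normalized.
        fused, _ = self.attn(query=frame_tokens, key=event_tokens,
                             value=event_tokens)
        return self.norm(frame_tokens + fused)
```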

    Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs

    Large-scale pretraining followed by task-specific fine-tuning is now the standard methodology for many tasks in computer vision and natural language processing. Recently, a multitude of methods have been proposed for pretraining vision and language BERTs to tackle challenges at the intersection of these two key areas of AI. These models can be categorised as either single-stream or dual-stream encoders. We study the differences between the two categories and show how they can be unified under a single theoretical framework. We then conduct controlled experiments to discern the empirical differences between five V&L BERTs. Our experiments show that training data and hyperparameters are responsible for most of the differences between the reported results, but they also reveal that the embedding layer plays a crucial role in these massive models. (Comment: To appear in TACL 2021.)
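    An illustrative Python sketch (not any specific model or the paper's unified framework; the encoder and cross-attention arguments are placeholders) of the two encoder families being compared: a single-stream encoder runs one transformer over the concatenated text and image embeddings, while a dual-stream encoder keeps per-modality transformers coupled by cross-modal attention.

```python
import torch

def single_stream(text_emb, img_emb, encoder):
    # One transformer encoder over the concatenated token embeddings.
    # text_emb: (B, Lt, D), img_emb: (B, Lv, D)
    return encoder(torch.cat([text_emb, img_emb], dim=1))

def dual_stream(text_emb, img_emb, text_enc, img_enc, cross_attn):
    # Separate per-modality encoders coupled by cross-modal attention
    # (only text attending to vision is shown here, for brevity).
    t, v = text_enc(text_emb), img_enc(img_emb)
    fused_t, _ = cross_attn(query=t, key=v, value=v)
    return fused_t, v
```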

    Dual-Stream Attention Transformers for Sewer Defect Classification

    We propose a dual-stream multi-scale vision transformer (DS-MSHViT) architecture that processes RGB and optical-flow inputs for efficient sewer defect classification. Unlike existing methods that combine the predictions of two separate networks trained on each modality, we jointly train a single network with two branches for RGB and motion. Our key idea is to use self-attention regularization to harness the complementary strengths of the RGB and motion streams. The motion stream alone struggles to generate accurate attention maps, as motion images lack the rich visual features present in RGB images. To address this, we introduce an attention consistency loss between the dual streams. By leveraging motion cues through this self-attention regularizer, we align and enhance the RGB attention maps, enabling the network to concentrate on pertinent input regions. We evaluate our method on a public dataset and cross-validate its performance on a novel dataset. Our method outperforms existing models that use either convolutional neural networks (CNNs) or multi-scale hybrid vision transformers (MSHViTs) without attention regularization between the two streams.
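    A hedged PyTorch sketch of an attention-consistency regularizer between the two branches (the paper's exact loss may differ; the symmetric MSE formulation here is an assumption):

```python
import torch
import torch.nn.functional as F

def attention_consistency_loss(attn_rgb, attn_motion):
    # attn_rgb, attn_motion: (batch, heads, tokens, tokens) self-attention
    # maps taken from corresponding layers of the RGB and motion branches.
    # Each branch is pulled toward the other's (detached) attention so that
    # both concentrate on the same input regions.
    return (F.mse_loss(attn_motion, attn_rgb.detach()) +
            F.mse_loss(attn_rgb, attn_motion.detach()))
```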