Audiovisual Moments in Time: A large-scale annotated dataset of audiovisual actions
We present Audiovisual Moments in Time (AVMIT), a large-scale dataset of audiovisual action events. In an extensive annotation task, 11 participants labelled a subset of 3-second audiovisual videos from the Moments in Time dataset (MIT). For each trial, participants assessed whether the labelled audiovisual action event was present and whether it was the most prominent feature of the video. The dataset includes the annotation of 57,177 audiovisual videos, each independently evaluated by 3 of 11 trained participants. From this initial collection, we created a curated test set of 16 distinct action classes, with 60 videos each (960 videos). We also offer 2 sets of pre-computed audiovisual feature embeddings, using VGGish/YamNet for audio data and VGG16/EfficientNetB0 for visual data, thereby lowering the barrier to entry for audiovisual DNN research. We explored the advantages of AVMIT annotations and feature embeddings to improve performance on audiovisual event recognition. A series of 6 Recurrent Neural Networks (RNNs) were trained on either AVMIT-filtered audiovisual events or modality-agnostic events from MIT, and then tested on our audiovisual test set. In all RNNs, top-1 accuracy was increased by 2.71-5.94% by training exclusively on audiovisual events, even outweighing a three-fold increase in training data. Additionally, we introduce the Supervised Audiovisual Correspondence (SAVC) task, whereby a classifier must discern whether audio and visual streams correspond to the same action label. We trained 6 RNNs on the SAVC task, with or without AVMIT-filtering, to explore whether AVMIT is helpful for cross-modal learning. In all RNNs, accuracy improved by 2.09-19.16% with AVMIT-filtered data.
We anticipate that the newly annotated AVMIT dataset will serve as a valuable resource for research and comparative experiments involving computational models and human participants, specifically when addressing research questions where audiovisual correspondence is of critical importance.
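The SAVC task described in this abstract can be illustrated with a minimal sketch of how training pairs might be assembled. This is an assumption-laden illustration, not the authors' pipeline: the `Clip` record, the `make_savc_pairs` helper, and the choice of one positive and one negative pair per clip are all hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical clip record: (clip_id, action_label).
Clip = Tuple[str, str]


def make_savc_pairs(clips: List[Clip], seed: int = 0) -> List[Tuple[str, str, int]]:
    """Return (audio_clip_id, visual_clip_id, target) triples.

    target is 1 when both streams carry the same action label, else 0.
    Assumption: positives reuse the audio and visual streams of one clip.
    """
    rng = random.Random(seed)
    pairs = []
    for clip_id, label in clips:
        # Corresponding pair: both streams come from the same clip.
        pairs.append((clip_id, clip_id, 1))
        # Non-corresponding pair: visual stream from a clip with another label.
        others = [cid for cid, lab in clips if lab != label]
        if others:
            pairs.append((clip_id, rng.choice(others), 0))
    return pairs


clips = [("a", "barking"), ("b", "barking"), ("c", "clapping")]
for audio_id, visual_id, target in make_savc_pairs(clips):
    print(audio_id, visual_id, target)
```

A classifier for the task would then receive the audio embedding of the first clip and the visual embedding of the second, and predict the binary target.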
CHERI Concentrate: Practical Compressed Capabilities
We present CHERI Concentrate, a new fat-pointer compression scheme applied to CHERI, the most developed capability-pointer system at present. Capability fat-pointers are a primary candidate for enforcing fine-grained and non-bypassable security properties in future computer systems, although increased pointer size can severely affect performance. Thus, several proposals for capability compression have been suggested, but these did not support legacy instruction sets, ignored features critical to the existing software base, and also introduced design inefficiencies to RISC-style processor pipelines. CHERI Concentrate improves on the state-of-the-art region-encoding efficiency, solves important pipeline problems, and eases semantic restrictions of compressed encoding, allowing it to protect a full legacy software stack. We analyze and extend logic from the open-source CHERI prototype processor design on FPGA to demonstrate encoding efficiency, minimize delay of pointer arithmetic, and eliminate additional load-to-use delay. To verify correctness of our proposed high-performance logic, we present a HOL4 machine-checked proof of the decode and pointer-modify operations. Finally, we measure a 50%-75% reduction in L2 misses for many compiled C-language benchmarks running under a commodity operating system using compressed 128-bit and 64-bit formats, demonstrating both compatibility with and increased performance over the uncompressed, 256-bit format.
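To give a feel for what compressing capability bounds involves, the following sketch shows a generic floating-point-style scheme in which base and top share one exponent and must be aligned to it. It is a simplified illustration under stated assumptions, not the CHERI Concentrate bit layout: the `compress_bounds` helper and the 8-bit mantissa default are hypothetical.

```python
def compress_bounds(base: int, length: int, mantissa_bits: int = 8):
    """Round [base, base + length) outward until it fits a shared exponent.

    Returns (rep_base, rep_top, exponent), where both bounds are multiples
    of 2**exponent and the span, counted in those units, fits in
    mantissa_bits bits. Illustrative only; not the real encoding.
    """
    if base < 0 or length < 0:
        raise ValueError("base and length must be non-negative")
    top = base + length
    exponent = 0
    while True:
        align = 1 << exponent
        rep_base = (base // align) * align   # round base down
        rep_top = -(-top // align) * align   # round top up
        if (rep_top - rep_base) >> exponent < (1 << mantissa_bits):
            return rep_base, rep_top, exponent
        exponent += 1


# A small object is represented exactly; a larger one is padded outward.
print(compress_bounds(0x1000, 100))
print(compress_bounds(0x1003, 1000))
```

The second call shows the cost of compression in this toy model: the requested region grows slightly because its bounds must be aligned, which is the kind of region-encoding inefficiency a compression scheme tries to minimize.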
CHERI JNI: Sinking the Java Security Model into the C
Java provides security and robustness by building a high-level security model atop the foundation of memory protection. Unfortunately, any native code linked into a Java program – including the million lines used to implement the standard library – is able to bypass both the memory protection and the higher-level policies. We present a hardware-assisted implementation of the Java native code interface, which extends the guarantees required for Java’s security model to native code.
Our design supports safe direct access to buffers owned by the JVM, including hardware-enforced read-only access where appropriate. We also present Java language syntax to declaratively describe isolated compartments for native code.
We show that it is possible to preserve the memory safety and isolation requirements of the Java security model in C code, allowing native code to run in the same process as Java code with the same impact on security as running equivalent Java code. Our approach has a negligible impact on performance, compared with the existing unsafe native code interface. We demonstrate a prototype implementation running on the CHERI microprocessor synthesized in FPGA.
Defense Advanced Research Projects Agency
Google, Inc.
Isaac Newton Trust
Thales E-Securit
Loss of endogenous thymosin β4 accelerates glomerular disease
Glomerular disease is characterized by morphologic changes in podocyte cells accompanied by inflammation and fibrosis. Thymosin regulates cell morphology, inflammation, and fibrosis in several organs, and administration of exogenous thymosin improves animal models of unilateral ureteral obstruction and diabetic nephropathy. However, the role of endogenous thymosin in the kidney is unknown. We demonstrate that thymosin β4 is expressed prominently in podocytes of developing and adult mouse glomeruli. Global loss of thymosin did not affect healthy glomeruli, but accelerated the severity of immune-mediated nephrotoxic nephritis with worse renal function, periglomerular inflammation, and fibrosis. Lack of thymosin in nephrotoxic nephritis led to the redistribution of podocytes from the glomerular tuft toward the Bowman capsule, suggesting a role for thymosin in the migration of these cells. Thymosin knockdown in cultured podocytes also increased migration in a wound-healing assay, accompanied by F-actin rearrangement and increased RhoA activity. We propose that endogenous thymosin is a modifier of glomerular injury, likely having a protective role acting as a brake to slow disease progression.
Dual-stream recurrent convolutional neural networks as models of human audiovisual perception
Multisensory perception allows humans to operate successfully in the world. Increasingly, deep neural networks (DNNs) are used as models of human unisensory perception. In this work, we take some of the first steps to extend this line of research from the unisensory to the multisensory domain, specifically, audiovisual perception. First, we produce a highly-controlled, large, labelled dataset of audiovisual action events for human vs DNN studies. Next, we introduce a novel deep neural network architecture that we name a ‘dual-stream recurrent convolutional neural network’ (DRCNN), consisting of 2 component CNNs joined by a novel ‘multimodal squeeze unit’ and fed into an RNN. We develop a series of these architectures, leveraging a number of pretrained state-of-the-art CNNs, and train a number of instances of each, producing a series of classifiers. We find that, after optimising 12 classifier instances on audiovisual action recognition, all classifiers are able to solve the audiovisual correspondence problem, indicating that this ability may be a consequence of the task constraints. Further, we find that these classifiers are highly affected by signals in the unattended-to modality during unimodal classification tasks, demonstrating a high level of integration across modalities. Further experiments revealed that dual-stream RCNN classifiers perform significantly worse than humans on a visual-only action recognition task when stimuli were clean or distorted by Gaussian noise or Gaussian blur. Both classifiers and humans were able to leverage audio information to increase their levels of performance in the clean condition, and to significantly decrease the effect of visual distortion on their audiovisual performances. Indeed, 5/6 classifiers performed within the range of human performance on clean audiovisual stimuli, and 3/6 maintained human-level performance when low levels of Gaussian noise were introduced.
Stereoselective synthesis and cyclisation of the acyclic precursor to auripyrone A and B
Book review: Emma Liggins, George Gissing, the working woman and urban culture / Susan Hamilton, Frances Power Cobbe and Victorian feminism
Reviews of Emma Liggins's book on George Gissing and Susan Hamilton's on Frances Power Cobbe.
Leveraging domain expertise in architectural exploration
Domain experience is a key driver behind design quality, especially during the early design phases of a product or service. Currently, the only practical way to bring such experience into a project is to directly engage subject matter experts, which creates a potential resource availability bottleneck when the experts are not available when required. Whilst many domain-specific tools have attempted to capture expert knowledge in embedded analytics, thus allowing less experienced engineers to perform complex tasks, this is certainly not the case for highly complex systems of systems, whose architectures can go far beyond what a single human being can comprehend. This paper proposes a new approach to leveraging design expertise in a manner that facilitates architectural exploration and architecture optimization by using pre-defined architecture patterns. In addition, we propose a means to streamline such a process by delineating the knowledge creation process and architectural exploration analytics, with the means to facilitate information flow from the former to the latter through a carefully designed integration framework.