
    Audiovisual Moments in Time: A large-scale annotated dataset of audiovisual actions

    We present Audiovisual Moments in Time (AVMIT), a large-scale dataset of audiovisual action events. In an extensive annotation task, 11 participants labelled a subset of 3-second audiovisual videos from the Moments in Time dataset (MIT). For each trial, participants assessed whether the labelled audiovisual action event was present and whether it was the most prominent feature of the video. The dataset includes annotations for 57,177 audiovisual videos, each independently evaluated by 3 of 11 trained participants. From this initial collection, we created a curated test set of 16 distinct action classes, with 60 videos each (960 videos). We also offer 2 sets of pre-computed audiovisual feature embeddings, using VGGish/YamNet for audio data and VGG16/EfficientNetB0 for visual data, thereby lowering the barrier to entry for audiovisual DNN research. We explored the advantages of AVMIT annotations and feature embeddings for improving performance on audiovisual event recognition. A series of 6 Recurrent Neural Networks (RNNs) were trained on either AVMIT-filtered audiovisual events or modality-agnostic events from MIT, and then tested on our audiovisual test set. In all RNNs, top-1 accuracy increased by 2.71-5.94% when training exclusively on audiovisual events, even outweighing a three-fold increase in training data. Additionally, we introduce the Supervised Audiovisual Correspondence (SAVC) task, whereby a classifier must discern whether audio and visual streams correspond to the same action label. We trained 6 RNNs on the SAVC task, with or without AVMIT-filtering, to explore whether AVMIT is helpful for cross-modal learning. In all RNNs, accuracy improved by 2.09-19.16% with AVMIT-filtered data. We anticipate that the newly annotated AVMIT dataset will serve as a valuable resource for research and comparative experiments involving computational models and human participants, specifically when addressing research questions where audiovisual correspondence is of critical importance.
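    As a hedged illustration of how such pre-computed embeddings could be produced, the sketch below pairs the publicly available VGGish model on TF Hub with Keras' EfficientNetB0, two of the backbones named in the abstract. The exact frame rate, preprocessing, and pipeline used for AVMIT are assumptions here, not the authors' published code.

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Pretrained backbones named in the abstract (VGGish for audio,
# EfficientNetB0 for video frames). URLs/APIs are standard TF Hub / Keras.
vggish = hub.load("https://tfhub.dev/google/vggish/1")        # 128-D per 0.96 s audio frame
effnet = tf.keras.applications.EfficientNetB0(
    include_top=False, pooling="avg", weights="imagenet")     # 1280-D per video frame

def embed_clip(waveform_16k: np.ndarray, frames_rgb: np.ndarray):
    """waveform_16k: 1-D float32 audio at 16 kHz in [-1, 1];
    frames_rgb: (T, 224, 224, 3) uint8 RGB video frames.
    Frame sampling and clip alignment are assumptions, not AVMIT's recipe."""
    audio_emb = vggish(waveform_16k)                          # (n_audio_frames, 128)
    vis = tf.keras.applications.efficientnet.preprocess_input(
        frames_rgb.astype("float32"))
    visual_emb = effnet(vis, training=False)                  # (T, 1280)
    return audio_emb.numpy(), visual_emb.numpy()
```

    For the SAVC task, such embedding pairs could then be labelled 1 when the audio and visual streams come from the same action class and 0 otherwise, although the abstract does not specify the exact pairing scheme.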

    Loss of endogenous thymosin β4 accelerates glomerular disease

    Glomerular disease is characterized by morphologic changes in podocyte cells accompanied by inflammation and fibrosis. Thymosin β4 regulates cell morphology, inflammation, and fibrosis in several organs, and administration of exogenous thymosin β4 improves animal models of unilateral ureteral obstruction and diabetic nephropathy. However, the role of endogenous thymosin β4 in the kidney is unknown. We demonstrate that thymosin β4 is expressed prominently in podocytes of developing and adult mouse glomeruli. Global loss of thymosin β4 did not affect healthy glomeruli, but accelerated the severity of immune-mediated nephrotoxic nephritis, with worse renal function, periglomerular inflammation, and fibrosis. Lack of thymosin β4 in nephrotoxic nephritis led to the redistribution of podocytes from the glomerular tuft toward the Bowman capsule, suggesting a role for thymosin β4 in the migration of these cells. Thymosin β4 knockdown in cultured podocytes also increased migration in a wound-healing assay, accompanied by F-actin rearrangement and increased RhoA activity. We propose that endogenous thymosin β4 is a modifier of glomerular injury, likely playing a protective role by acting as a brake to slow disease progression.

    Dual-stream recurrent convolutional neural networks as models of human audiovisual perception

    Multisensory perception allows humans to operate successfully in the world. Increasingly, deep neural networks (DNNs) are used as models of human unisensory perception. In this work, we take some of the first steps to extend this line of research from the unisensory to the multisensory domain, specifically audiovisual perception. First, we produce a highly controlled, large, labelled dataset of audiovisual action events for human vs DNN studies. Next, we introduce a novel deep neural network architecture that we name a ‘dual-stream recurrent convolutional neural network’ (DRCNN), consisting of 2 component CNNs joined by a novel ‘multimodal squeeze unit’ and fed into an RNN. We develop a series of these architectures, leveraging a number of pretrained state-of-the-art CNNs, and train a number of instances of each, producing a series of classifiers. We find that, after optimising 12 classifier instances on audiovisual action recognition, all classifiers are able to solve the audiovisual correspondence problem, indicating that this ability may be a consequence of the task constraints. Further, we find that these classifiers are highly affected by signals in the unattended modality during unimodal classification tasks, demonstrating a high level of integration across modalities. Further experiments revealed that dual-stream RCNN classifiers performed significantly worse than humans on a visual-only action recognition task when stimuli were clean or distorted by Gaussian noise or Gaussian blur. Both classifiers and humans were able to leverage audio information to increase their performance in the clean condition, and to significantly decrease the effect of visual distortion on their audiovisual performance. Indeed, 5/6 classifiers performed within the range of human performance on clean audiovisual stimuli, and 3/6 maintained human-level performance when low levels of Gaussian noise were introduced.
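    The following is a minimal sketch of a DRCNN-style model, under stated assumptions: the ‘multimodal squeeze unit’ is rendered here as a learned bottleneck over the concatenated per-timestep streams (the thesis' exact unit may differ), ResNet18 stands in for one of the pretrained component CNNs, and the audio stream arrives as precomputed embeddings rather than through a second CNN. All class and parameter names are illustrative.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights  # torchvision >= 0.13

class MultimodalSqueezeUnit(nn.Module):
    """Hypothetical fusion unit: project concatenated per-timestep audio and
    visual features down to a shared bottleneck."""
    def __init__(self, dim_a: int, dim_v: int, dim_out: int):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(dim_a + dim_v, dim_out), nn.ReLU())

    def forward(self, a: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([a, v], dim=-1))       # (B, T, dim_out)

class DRCNN(nn.Module):
    def __init__(self, n_classes: int, dim_a: int = 128, hidden: int = 256):
        super().__init__()
        backbone = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
        self.visual_cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop fc head
        self.squeeze = MultimodalSqueezeUnit(dim_a, 512, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, audio_feats: torch.Tensor, frames: torch.Tensor) -> torch.Tensor:
        # audio_feats: (B, T, 128) precomputed audio embeddings, assumed
        # time-aligned with frames: (B, T, 3, 224, 224) RGB video frames.
        B, T = frames.shape[:2]
        v = self.visual_cnn(frames.flatten(0, 1)).flatten(1).view(B, T, -1)
        fused = self.squeeze(audio_feats, v)              # (B, T, hidden)
        out, _ = self.rnn(fused)
        return self.head(out[:, -1])                      # logits from the final timestep
```

    Reading logits from the final RNN step is one common choice; temporal pooling over all steps would be an equally plausible alternative, and the thesis may use either.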

    Leveraging domain expertise in architectural exploration

    Domain experience is a key driver of design quality, especially during the early design phases of a product or service. Currently, the only practical way to bring such experience into a project is to directly engage subject matter experts, which creates a potential resource-availability bottleneck when the experts are not available when required. Whilst many domain-specific tools have attempted to capture expert knowledge in embedded analytics, allowing less experienced engineers to perform complex tasks, this is certainly not the case for highly complex systems of systems, whose architectures can go far beyond what a single human being can comprehend. This paper proposes a new approach to leveraging design expertise in a manner that facilitates architectural exploration and architecture optimization by using pre-defined architecture patterns. In addition, we propose a means to streamline this process by delineating the knowledge creation process from the architectural exploration analytics, with information flow from the former to the latter facilitated through a carefully designed integration framework.
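    As an illustrative sketch only: pre-defined architecture patterns could be encoded as data records carrying expert-estimated attributes, which an exploration loop then enumerates and ranks. All names, attributes, and the utility function below are hypothetical; the paper's integration framework is not specified at this level of detail.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class ArchPattern:
    """A pre-defined architecture pattern with expert-supplied attributes."""
    name: str
    cost: float          # relative acquisition/integration cost (hypothetical)
    reliability: float   # single-instance reliability estimate (hypothetical)

# Hypothetical pattern library encoding captured domain expertise.
PATTERNS = [
    ArchPattern("centralised-bus", cost=1.0, reliability=0.90),
    ArchPattern("federated-broker", cost=1.4, reliability=0.97),
]

def explore(patterns, redundancy_levels=(1, 2, 3)):
    """Enumerate pattern/redundancy combinations and rank them by a toy
    utility; a real framework would plug domain-expert analytics in here."""
    ranked = []
    for p, r in product(patterns, redundancy_levels):
        reliability = 1 - (1 - p.reliability) ** r   # parallel redundancy
        utility = reliability - 0.1 * p.cost * r     # crude cost penalty
        ranked.append((round(utility, 3), p.name, r))
    return sorted(ranked, reverse=True)

print(explore(PATTERNS)[0])   # best-scoring (utility, pattern, redundancy) triple
```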