Flood dynamics derived from video remote sensing
Flooding is by far the most pervasive natural hazard, with the human impacts of floods expected to worsen in the coming decades due to climate change. Hydraulic models are a key tool for understanding flood dynamics and play a pivotal role in unravelling the processes that occur during a flood event, including inundation flow patterns and velocities. In the realm of river basin dynamics, video remote sensing is emerging as a transformative tool that can offer insights into flow dynamics and thus, together with other remotely sensed data, has the potential to be deployed to estimate discharge. Moreover, the integration of video remote sensing data with hydraulic models offers a pivotal opportunity to enhance the predictive capacity of these models.
Hydraulic models are traditionally built with accurate terrain, flow and bathymetric data and are often calibrated and validated using observed data to obtain meaningful and actionable model predictions. Data for accurately calibrating and validating hydraulic models are not always available, leaving the assessment of the predictive capabilities of some models deployed in flood risk management in question. Recent advances in remote sensing have heralded the availability of vast, high-resolution video datasets. The parallel evolution of computing capabilities, coupled with advancements in artificial intelligence, is enabling the processing of data at unprecedented scales and complexities, allowing us to glean meaningful insights from datasets that can be integrated with hydraulic models. The aims of the research presented in this thesis were twofold. The first aim was to evaluate and explore the potential applications of video from air- and space-borne platforms to comprehensively calibrate and validate two-dimensional hydraulic models. The second aim was to estimate river discharge using satellite video combined with high-resolution topographic data. In the first of three empirical chapters, non-intrusive image velocimetry techniques were employed to estimate river surface velocities in a rural catchment. For the first time, a two-dimensional hydraulic model was fully calibrated and validated using velocities derived from Unpiloted Aerial Vehicle (UAV) image velocimetry approaches. This highlighted the value of these data in mitigating the limitations associated with traditional data sources used in parameterizing two-dimensional hydraulic models. This finding inspired the subsequent chapter, where river surface velocities, derived using Large Scale Particle Image Velocimetry (LSPIV), and flood extents, derived using deep neural network-based segmentation, were extracted from satellite video and used to rigorously assess the skill of a two-dimensional hydraulic model.
Harnessing the ability of deep neural networks to learn complex features and deliver accurate and contextually informed flood segmentation, the potential value of satellite video for validating two-dimensional hydraulic model simulations is exhibited. In the final empirical chapter, the convergence of satellite video imagery and high-resolution topographical data bridges the gap between visual observations and quantitative measurements by enabling the direct extraction of velocities from video imagery, which is used to estimate river discharge. Overall, this thesis demonstrates the significant potential of emerging video-based remote sensing datasets and offers approaches for integrating these data into hydraulic modelling and discharge estimation practice. The incorporation of LSPIV techniques into flood modelling workflows signifies a methodological progression, especially in areas lacking robust data collection infrastructure. Satellite video remote sensing heralds a major step forward in our ability to observe river dynamics in real time, with potentially significant implications for flood modelling science.
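At its core, the image velocimetry step rests on cross-correlating interrogation windows between consecutive frames to recover the surface displacement, then scaling by ground resolution and frame interval. A minimal single-window sketch (the FFT correlation, window handling and parameter names are illustrative, not the thesis's implementation):

```python
import numpy as np

def window_displacement(win_a, win_b):
    # FFT-based circular cross-correlation: the peak location gives the
    # pixel displacement of win_b relative to win_a.
    a = win_a - win_a.mean()
    b = win_b - win_b.mean()
    corr = np.fft.ifft2(np.fft.fft2(b) * np.conj(np.fft.fft2(a))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrapped peak indices to signed shifts.
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))

def surface_velocity(frame1, frame2, dt, metres_per_pixel):
    # Single-window velocimetry: displacement between consecutive frames,
    # scaled by ground sampling distance and the frame interval.
    dy, dx = window_displacement(frame1, frame2)
    return (dx * metres_per_pixel / dt, dy * metres_per_pixel / dt)
```

Full LSPIV tiles the image into many such windows, giving a velocity field rather than a single vector.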
A Spark Of Emotion: The Impact of Electrical Facial Muscle Activation on Emotional State and Affective Processing
Facial feedback, which involves the brain receiving information about the activation of facial muscles, has the potential to influence our emotional states and judgments. The extent to which this applies is still a matter of debate, particularly considering a failed replication of a seminal study. One factor contributing to the lack of replication in facial feedback effects may be the imprecise manipulation of facial muscle activity in terms of both degree and timing. To overcome these limitations, this thesis proposes a non-invasive method for inducing precise facial muscle contractions: facial neuromuscular electrical stimulation (fNMES). I begin by presenting a systematic literature review that lays the groundwork for standardising the use of fNMES in psychological research by evaluating its application in existing studies. This review highlights two issues: the limited use of fNMES in psychology research and the lack of parameter reporting. I provide practical recommendations for researchers interested in implementing fNMES. Subsequently, I conducted an online experiment to investigate participants' willingness to participate in fNMES research. This experiment revealed that concerns over potential burns and involuntary muscle movements are significant deterrents to participation. Understanding these anxieties is critical for participant management and expectation setting. Two laboratory studies are then presented that investigated the facial feedback hypothesis using fNMES. The first study showed that feelings of happiness and sadness, and changes in peripheral physiology, can be induced by stimulating the corresponding facial muscles with 5 seconds of fNMES. The second experiment showed that fNMES-induced smiling alters the perception of ambiguous facial emotions, creating a bias towards happiness, and alters neural correlates of face processing, as measured with event-related potentials (ERPs).
In summary, the thesis presents promising results for testing the facial feedback hypothesis with fNMES and provides practical guidelines and recommendations for researchers interested in using fNMES for psychological research.
A flexible deep learning crater detection scheme using Segment Anything Model (SAM)
Neural Architecture Search for Image Segmentation and Classification
Deep learning (DL) is a class of machine learning algorithms that relies on deep neural networks (DNNs) for computations. Unlike traditional machine learning algorithms, DL can learn from raw data directly and effectively. Hence, DL has been successfully applied to tackle many real-world problems. When applying DL to a given problem, the primary task is designing the optimum DNN. This task relies heavily on human expertise, is time-consuming, and requires many trial-and-error experiments.
This thesis aims to automate the laborious task of designing the optimum DNN by exploring the neural architecture search (NAS) approach. Here, we propose two new NAS algorithms for two real-world problems: pedestrian lane detection for assistive navigation and hyperspectral image segmentation for biosecurity scanning. Additionally, we introduce a new dataset-agnostic predictor of neural network performance, which can be used to speed up NAS algorithms that require the evaluation of candidate DNNs.
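As a toy illustration of the speed-up such a predictor enables, here is a random-search NAS loop that ranks sampled candidates by predicted accuracy instead of training each one; the (depth, width) search space and the heuristic stand-in score are invented for the example:

```python
import itertools
import random

def predict_accuracy(arch):
    # Stand-in for a learned performance predictor (a real one is trained
    # on (architecture, measured accuracy) pairs): deeper/wider scores higher.
    depth, width = arch
    return 1.0 - 1.0 / (depth * width)

def random_search_nas(search_space, n_samples, rng):
    # Sample candidate DNNs and rank them by *predicted* accuracy,
    # avoiding the cost of training every candidate.
    candidates = rng.sample(search_space, n_samples)
    return max(candidates, key=predict_accuracy)

search_space = list(itertools.product(range(2, 9), (16, 32, 64, 128)))
best = random_search_nas(search_space, n_samples=10, rng=random.Random(0))
```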
Posthuman Creative Styling: can a creative writer’s style of writing be described as procedural?
This thesis is about creative styling — the styling a creative writer might use to make their writing
unique. It addresses the question as to whether such styling can be described as procedural. Creative
styling is part of the technique a creative writer uses when writing. It is how they make the text more
‘lively’ by use of tips and tricks they have either learned or discovered. In essence these are rules, ones
the writer accrues over time through practice. The thesis argues that the use and invention of these
rules can be set as procedures, and so describes creative styling as procedural.
The thesis follows from questioning why it is that machines or algorithms have, so far, been
incapable of producing creative writing which has value. Machine-written novels do not abound on
the bookshelves and writing styled by computers is, on the whole, dull in comparison to human-crafted
literature. It came about by thinking how it would be possible to reach a point where writing by people
and procedural writing are considered to have equal value. For this reason the thesis is set in a
posthuman context, where the differences between machines and people are erased.
The thesis uses practice to inform an original conceptual space model, based on quality dimensions
and dynamic inter-operation of spaces. This model gives an example of the procedures which a
posthuman creative writer uses when engaged in creative styling. It suggests an original formulation
for the conceptual blending of conceptual spaces, based on the casting of qualities from one space to
another. In support of and informing its arguments are ninety-nine examples of creative writing
practice which show the procedures by which style has been applied, created and assessed. It provides
a route forward for further joint research into both computational and human-coded creative writing.
Learning Situation Hyper-Graphs for Video Question Answering
Answering questions about complex situations in videos requires not only
capturing the presence of actors, objects, and their relations but also the
evolution of these relationships over time. A situation hyper-graph is a
representation that describes situations as scene sub-graphs for video frames
and hyper-edges for connected sub-graphs and has been proposed to capture all
such information in a compact structured form. In this work, we propose an
architecture for Video Question Answering (VQA) that enables answering
questions related to video content by predicting situation hyper-graphs, coined
Situation Hyper-Graph based Video Question Answering (SHG-VQA). To this end, we
train a situation hyper-graph decoder to implicitly identify graph
representations with actions and object/human-object relationships from the
input video clip, and to use cross-attention between the predicted situation
hyper-graphs and the question embedding to predict the correct answer. The
proposed method is trained in an end-to-end manner and optimized by a VQA loss
with the cross-entropy function and a Hungarian matching loss for the situation
graph prediction. The effectiveness of the proposed architecture is extensively
evaluated on two challenging benchmarks: AGQA and STAR. Our results show that
learning the underlying situation hyper-graphs helps the system to
significantly improve its performance for novel challenges of video
question-answering tasks.
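The Hungarian matching loss mentioned above requires a minimum-cost one-to-one matching between predicted and ground-truth graph elements. A stdlib-only sketch of that matching step (exhaustive, exact for the small element counts of a situation graph; production code would use an O(n^3) Hungarian solver):

```python
import itertools
import math

def min_cost_matching(cost):
    # Exhaustive minimum-cost bipartite matching over a square cost matrix:
    # returns (assignment, total_cost), where assignment[i] is the target
    # element matched to prediction i.
    n = len(cost)
    best_perm, best_cost = None, math.inf
    for perm in itertools.permutations(range(n)):
        c = sum(cost[i][j] for i, j in enumerate(perm))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost
```

Once the matching is fixed, the classification loss (cross-entropy in the paper's setup) is computed only over the matched prediction-target pairs.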
Contrastive Audio-Visual Masked Autoencoder
In this paper, we first extend the recent Masked Auto-Encoder (MAE) model
from a single modality to audio-visual multi-modalities. Subsequently, we
propose the Contrastive Audio-Visual Masked Auto-Encoder (CAV-MAE) by combining
contrastive learning and masked data modeling, two major self-supervised
learning frameworks, to learn a joint and coordinated audio-visual
representation. Our experiments show that the contrastive audio-visual
correspondence learning objective not only enables the model to perform
audio-visual retrieval tasks, but also helps the model learn a better joint
representation. As a result, our fully self-supervised pretrained CAV-MAE
achieves a new SOTA accuracy of 65.9% on VGGSound, and is comparable with the
previous best supervised pretrained model on AudioSet in the audio-visual event
classification task. Code and pretrained models are at
https://github.com/yuangongnd/cav-mae. Comment: Accepted at ICLR 2023 as a notable top 25% paper.
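The combination the abstract describes, a contrastive correspondence objective plus masked reconstruction, can be sketched in a few lines; the InfoNCE form, temperature, and loss weighting below are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def info_nce(audio_emb, video_emb, temperature=0.07):
    # Contrastive audio-visual correspondence: the i-th audio and i-th
    # video clip form a positive pair; every other pairing in the batch
    # serves as a negative.
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    logits = a @ v.T / temperature
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def joint_loss(audio_emb, video_emb, recon, target, lam=0.01):
    # Contrastive term plus an MAE-style mean-squared reconstruction loss
    # on masked patches; `lam` is an assumed weighting, not the paper's value.
    return info_nce(audio_emb, video_emb) + lam * np.mean((recon - target) ** 2)
```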
Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models
Visual Question Answering is a challenging task, as it requires seamless
interaction between perceptual, linguistic, and background knowledge systems.
While the recent progress of visual and natural language models like BLIP has
led to improved performance on this task, we lack understanding of the ability
of such models to perform on different kinds of questions and reasoning types.
As our initial analysis of BLIP-family models revealed difficulty with
answering fine-detail questions, we investigate the following question: Can
visual cropping be employed to improve the performance of state-of-the-art
visual question answering models on fine-detail questions? Given the recent
success of the BLIP-family models, we study a zero-shot and a fine-tuned BLIP
model. We define three controlled subsets of the popular VQA-v2 benchmark to
measure whether cropping can help model performance. Besides human cropping, we
devise two automatic cropping strategies based on multi-modal embedding by CLIP
and BLIP visual QA model gradients. Our experiments demonstrate that the
performance of BLIP model variants can be significantly improved through human
cropping, and automatic cropping methods can produce comparable benefits. A
deeper dive into our findings indicates that the performance enhancement is
more pronounced in zero-shot models than in fine-tuned models and more salient
with smaller bounding boxes than larger ones. We perform case studies to
connect quantitative differences with qualitative observations across question
types and datasets. Finally, we see that the cropping enhancement is robust, as
we gain an improvement of 4.59% (absolute) in the general VQA-random task by
simply inputting a concatenation of the original and gradient-based cropped
images. We make our code available to facilitate further innovation on visual
cropping methods for question answering. Comment: 16 pages, 5 figures, 7 tables.
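One simple way to turn a saliency signal (CLIP similarity or VQA-model gradients, as in the automatic strategies above) into a crop is to take the bounding box of the highest-saliency pixels; the threshold fraction and margin below are illustrative parameters, not the paper's:

```python
import numpy as np

def saliency_crop(image, saliency, keep=0.02, pad=4):
    # Keep the top `keep` fraction of saliency values and crop the image
    # to their bounding box, plus a small pixel margin.
    thresh = np.quantile(saliency, 1.0 - keep)
    ys, xs = np.where(saliency >= thresh)
    y0 = max(ys.min() - pad, 0)
    y1 = min(ys.max() + pad + 1, image.shape[0])
    x0 = max(xs.min() - pad, 0)
    x1 = min(xs.max() + pad + 1, image.shape[1])
    return image[y0:y1, x0:x1]
```

The cropped image can then be fed to the model alongside the original, mirroring the concatenation strategy reported above.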
Automated identification and behaviour classification for modelling social dynamics in group-housed mice
Mice are often used in biology as exploratory models of human conditions, due to their similar genetics and physiology. Unfortunately, research on behaviour has traditionally been limited to studying individuals in isolated environments and over short periods of time. This can miss critical time-effects, and, since mice are social creatures, bias results.
This work addresses this gap in research by developing tools to analyse the individual behaviour of group-housed mice in the home-cage over several days and with minimal disruption. Using data provided by the Mary Lyon Centre at MRC Harwell we designed an end-to-end system that (a) tracks and identifies mice in a cage, (b) infers their behaviour, and subsequently (c) models the group dynamics as functions of individual activities. In support of the above, we also curated and made available a large dataset of mouse localisation and behaviour classifications (IMADGE), as well as two smaller annotated datasets for training/evaluating the identification (TIDe) and behaviour inference (ABODe) systems. This research constitutes the first of its kind in terms of the scale and challenges addressed. The data source (side-view single-channel video with clutter and no identification markers for mice) presents challenging conditions for analysis, but has the potential to give richer information while using industry standard housing.
A Tracking and Identification module was developed to automatically detect, track and identify the (visually similar) mice in the cluttered home-cage using only single-channel IR video and coarse position from RFID readings. Existing detectors and trackers were combined with a novel Integer Linear Programming formulation to assign anonymous tracks to mouse identities. This utilised a probabilistic weight model of affinity between detections and RFID pickups.
The next task necessitated the implementation of the Activity Labelling module that classifies the behaviour of each mouse, handling occlusion to avoid giving unreliable classifications when the mice cannot be observed. Two key aspects of this were (a) careful feature-selection, and (b) judicious balancing of the errors of the system in line with the repercussions for our setup.
Given these sequences of individual behaviours, we analysed the interaction dynamics between mice in the same cage by collapsing the group behaviour into a sequence of interpretable latent regimes using both static and temporal (Markov) models. Using a permutation matrix, we were able to automatically assign mice to roles in the hidden Markov model (HMM), fit a global model to a group of cages and analyse abnormalities in data from a different demographic.
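As a minimal illustration of the temporal (Markov) modelling of latent regimes, the maximum-likelihood transition matrix is just the row-normalised bigram counts over the regime sequence:

```python
import numpy as np

def transition_matrix(regimes, n_states):
    # Maximum-likelihood Markov transition estimates from a sequence of
    # latent group-behaviour regimes: count each (from, to) bigram, then
    # normalise each row to a probability distribution.
    counts = np.zeros((n_states, n_states))
    for a, b in zip(regimes[:-1], regimes[1:]):
        counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)
```

A full HMM additionally treats the regimes as hidden and infers them jointly with these transitions, but the estimator above captures the temporal structure being modelled.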
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge
Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate
sounding sources by predicting pixel-wise maps. Previous methods assume that
each sound component in an audio signal always has a visual counterpart in the
image. However, this assumption overlooks that off-screen sounds and background
noise often contaminate the audio recordings in real-world scenarios. They
impose significant challenges on building a consistent semantic mapping between
audio and visual signals for AVS models and thus impede precise sound
localization. In this work, we propose a two-stage bootstrapping audio-visual
segmentation framework by incorporating multi-modal foundation knowledge. In a
nutshell, our BAVS is designed to eliminate the interference of background
noise or off-screen sounds in segmentation by establishing the audio-visual
correspondences in an explicit manner. In the first stage, we employ a
segmentation model to localize potential sounding objects from visual data
without being affected by contaminated audio signals. Meanwhile, we also
utilize a foundation audio classification model to discern audio semantics.
Considering the audio tags provided by the audio foundation model are noisy,
associating object masks with audio tags is not trivial. Thus, in the second
stage, we develop an audio-visual semantic integration strategy (AVIS) to
localize the authentic-sounding objects. Here, we construct an audio-visual
tree based on the hierarchical correspondence between sounds and object
categories. We then examine the label concurrency between the localized objects
and classified audio tags by tracing the audio-visual tree. With AVIS, we can
effectively segment real-sounding objects. Extensive experiments demonstrate
the superiority of our method on AVS datasets, particularly in scenarios
involving background noise. Our project website is
https://yenanliu.github.io/AVSS.github.io/
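The label-concurrency check on the audio-visual tree can be sketched with a toy taxonomy; the categories and parent links below are invented for illustration, not the paper's tree:

```python
# Hypothetical child -> parent links in an audio-visual category tree.
PARENT = {
    "violin": "string_instrument",
    "guitar": "string_instrument",
    "string_instrument": "music",
    "speech": "human_sounds",
}

def ancestors(label):
    # Walk up the tree, collecting the label and all of its ancestors.
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)

def concurrent(object_label, audio_tag):
    # A localized object is kept as a sounding object only if its category
    # shares a node in the tree with the (possibly noisy) audio tag.
    return bool(ancestors(object_label) & ancestors(audio_tag))
```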