    Flood dynamics derived from video remote sensing

    Flooding is by far the most pervasive natural hazard, with the human impacts of floods expected to worsen in the coming decades due to climate change. Hydraulic models are a key tool for understanding flood dynamics and play a pivotal role in unravelling the processes that occur during a flood event, including inundation flow patterns and velocities. In the realm of river basin dynamics, video remote sensing is emerging as a transformative tool that can offer insights into flow dynamics and thus, together with other remotely sensed data, has the potential to be deployed to estimate discharge. Moreover, the integration of video remote sensing data with hydraulic models offers a pivotal opportunity to enhance the predictive capacity of these models. Hydraulic models are traditionally built with accurate terrain, flow and bathymetric data and are often calibrated and validated using observed data to obtain meaningful and actionable model predictions. Data for accurately calibrating and validating hydraulic models are not always available, leaving the assessment of the predictive capabilities of some models deployed in flood risk management in question. Recent advances in remote sensing have heralded the availability of vast video datasets of high resolution. The parallel evolution of computing capabilities, coupled with advancements in artificial intelligence, is enabling the processing of data at unprecedented scales and complexities, allowing us to glean meaningful insights from datasets that can be integrated with hydraulic models. The aims of the research presented in this thesis were twofold. The first aim was to evaluate and explore the potential applications of video from air- and space-borne platforms to comprehensively calibrate and validate two-dimensional hydraulic models. The second aim was to estimate river discharge using satellite video combined with high-resolution topographic data. In the first of three empirical chapters, non-intrusive image velocimetry techniques were employed to estimate river surface velocities in a rural catchment. For the first time, a 2D hydraulic model was fully calibrated and validated using velocities derived from Unpiloted Aerial Vehicle (UAV) image velocimetry approaches. This highlighted the value of these data in mitigating the limitations associated with traditional data sources used in parameterizing two-dimensional hydraulic models. This finding inspired the subsequent chapter, where river surface velocities, derived using Large Scale Particle Image Velocimetry (LSPIV), and flood extents, derived using deep neural network-based segmentation, were extracted from satellite video and used to rigorously assess the skill of a two-dimensional hydraulic model. Harnessing the ability of deep neural networks to learn complex features and deliver accurate and contextually informed flood segmentation, the potential value of satellite video for validating two-dimensional hydraulic model simulations is demonstrated. In the final empirical chapter, the convergence of satellite video imagery and high-resolution topographical data bridges the gap between visual observations and quantitative measurements by enabling the direct extraction of velocities from video imagery, which is used to estimate river discharge. Overall, this thesis demonstrates the significant potential of emerging video-based remote sensing datasets and offers approaches for integrating these data into hydraulic modelling and discharge estimation practice.
The incorporation of LSPIV techniques into flood modelling workflows signifies a methodological progression, especially in areas lacking robust data-collection infrastructure. Satellite video remote sensing heralds a major step forward in our ability to observe river dynamics in real time, with potentially significant implications for flood modelling science.
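
    As context for the velocimetry methods central to this thesis, the sketch below shows the core LSPIV step in minimal form: an interrogation window from one video frame is cross-correlated with the next frame, and the correlation peak gives a pixel displacement that a ground sampling distance and frame interval convert to a surface velocity. The frame pair, window size and scaling values are illustrative assumptions, not the thesis pipeline.

    import numpy as np
    from scipy.signal import fftconvolve

    def lspiv_velocity(frame_a, frame_b, win=64, gsd_m=0.05, dt_s=0.5):
        # cross-correlate the central interrogation window of two frames;
        # the correlation peak offset is the tracer displacement in pixels
        cy, cx = frame_a.shape[0] // 2, frame_a.shape[1] // 2
        a = frame_a[cy - win // 2:cy + win // 2, cx - win // 2:cx + win // 2].astype(float)
        b = frame_b[cy - win // 2:cy + win // 2, cx - win // 2:cx + win // 2].astype(float)
        a, b = a - a.mean(), b - b.mean()                 # zero-mean before correlating
        corr = fftconvolve(b, a[::-1, ::-1], mode="same") # cross-correlation surface
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        dy, dx = dy - win // 2, dx - win // 2             # peak offset from window centre
        return dx * gsd_m / dt_s, dy * gsd_m / dt_s       # pixels -> metres per second

    # usage (hypothetical): vx, vy = lspiv_velocity(frame0, frame1)
    # where frame0, frame1 are consecutive grayscale frames of the river surface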

    Tinto: Multisensor Benchmark for 3-D Hyperspectral Point Cloud Segmentation in the Geosciences

    The increasing use of deep learning techniques has reduced interpretation time and, ideally, reduced interpreter bias by automatically deriving geological maps from digital outcrop models. However, accurate validation of these automated mapping approaches is a significant challenge due to the subjective nature of geological mapping and the difficulty of collecting quantitative validation data. Additionally, many state-of-the-art deep learning methods are limited to 2-D image data, which is insufficient for 3-D digital outcrops such as hyperclouds. To address these challenges, we present Tinto, a multisensor benchmark digital outcrop dataset designed to facilitate the development and validation of deep learning approaches for geological mapping, especially for unstructured 3-D data like point clouds. Tinto comprises two complementary sets: 1) a real digital outcrop model from Corta Atalaya (Spain), with spectral attributes and ground-truth data; and 2) a synthetic twin that uses latent features in the original datasets to reconstruct realistic spectral data (including sensor noise and processing artifacts) from the ground truth. The point cloud is dense and contains 3,242,964 labeled points. We used these datasets to explore the abilities of different deep learning approaches for automated geological mapping. By making Tinto publicly available, we hope to foster the development and adaptation of new deep learning tools for 3-D applications in Earth sciences. The dataset can be accessed at: https://doi.org/10.14278/rodare.2256

    DILF: Differentiable Rendering-Based Multi-View Image-Language Fusion for Zero-Shot 3D Shape Understanding

    Zero-shot 3D shape understanding aims to recognize “unseen” 3D categories that are not present in training data. Recently, Contrastive Language–Image Pre-training (CLIP) has shown promising open-world performance in zero-shot 3D shape understanding tasks through information fusion between the language and 3D modalities. It first renders 3D objects into multiple 2D image views and then learns to understand the semantic relationships between the textual descriptions and images, enabling the model to generalize to new and unseen categories. However, existing studies in zero-shot 3D shape understanding rely on predefined rendering parameters, resulting in repetitive, redundant, and low-quality views. This limitation hinders the model’s ability to fully comprehend 3D shapes and adversely impacts the text–image fusion in a shared latent space. To this end, we propose a novel approach called Differentiable rendering-based multi-view Image–Language Fusion (DILF) for zero-shot 3D shape understanding. Specifically, DILF leverages large language models (LLMs) to generate textual prompts enriched with 3D semantics and designs a differentiable renderer with learnable rendering parameters to produce representative multi-view images. These rendering parameters can be iteratively updated using a text–image fusion loss, which aids their regression, allowing the model to determine the optimal viewpoint positions for each 3D object. A group-view mechanism is then introduced to model interdependencies across views, enabling efficient information fusion for a more comprehensive 3D shape understanding. Experimental results demonstrate that DILF outperforms state-of-the-art methods in zero-shot 3D classification while maintaining competitive performance in standard 3D classification. The code is available at https://github.com/yuzaiyang123/DILP
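
    The abstract's central mechanism, rendering parameters trained by a text–image fusion loss, can be illustrated with a self-contained toy. The "renderer" below is a differentiable Gaussian splat of a rotated point cloud and the "text feature" is a fixed random vector; both are stand-ins for the paper's differentiable renderer and CLIP encoders, shown only to make the gradient path into the viewpoint parameter concrete.

    import torch

    def render(points, azimuth, res=16, sigma=0.1):
        # differentiable toy renderer: rotate about the y-axis, project
        # orthographically, splat each point as a Gaussian onto a res x res grid
        ca, sa = torch.cos(azimuth), torch.sin(azimuth)
        x = ca * points[:, 0] + sa * points[:, 2]
        y = points[:, 1]
        gx = torch.linspace(-1, 1, res)
        img = torch.exp(-((gx[None, :, None] - x) ** 2
                          + (gx[:, None, None] - y) ** 2) / sigma ** 2).sum(-1)
        return img / img.max()

    points = torch.randn(200, 3) * 0.5                  # toy 3D shape
    text_feat = torch.randn(16 * 16)
    text_feat = text_feat / text_feat.norm()            # stand-in for a CLIP text embedding
    azimuth = torch.tensor(0.3, requires_grad=True)     # learnable rendering parameter
    opt = torch.optim.Adam([azimuth], lr=0.05)

    for _ in range(50):
        img_feat = render(points, azimuth).flatten()
        img_feat = img_feat / img_feat.norm()
        loss = -(img_feat @ text_feat)                  # text-image fusion loss (cosine similarity)
        opt.zero_grad()
        loss.backward()
        opt.step()                                      # the gradient updates the viewpoint itself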

    Sound of Violent Images / Violence of Sound Images: Pulling apart Tom and Jerry

    Violence permeates Tom and Jerry in the repetitive, physically violent gags and scenes of humiliation and mocking, yet unarguably there is comedic value in the onscreen violence. The musical scoring of Tom and Jerry by Scott Bradley in the early William Hanna and Joseph Barbera period of production (pre-1958) played a key role in conveying the comedic impact of violent gags, owing to the close synchronisation of music and sound with visual action. It is typified by a form of sound design characteristic of 'zip crash' animation, as described by Paul Taberham (2012), in which sound actively participates in the humour and directly influences the viewer's interpretation of the visual action. This research investigates the sound-image relationships in Tom and Jerry through practice, exploring how processes of decontextualisation and desynchronisation of the sound and image elements of violent gags unmask the underlying violent subtext of Tom and Jerry's slapstick comedy. It addresses an undertheorised area in animation related to the role of sound-image synchronisation and presents new knowledge derived from the novel application of audiovisual analysis to Tom and Jerry source material and from the production of audiovisual artworks. The findings of this research are discussed from a pan-theoretical perspective, drawing on theorisation of film sound and cognitivist approaches to film music. This investigation through practice supports the notion that intrinsic and covert processes of sound-image synchronisation, as theorised by Kevin Donnelly (2014), play a key role in the reading of slapstick violence as comedic. This practice-based research can therefore be viewed as a case study demonstrating the potential of a sampling-based creative practice to enable new readings to emerge from sampled source material. Novel artefacts were created in the form of audiovisual works that embody specific knowledge of factors related to the reconfiguration of sound-image relations and their impact in altering viewers' readings of the violence contained within Tom and Jerry. Critically, differences emerged between the artworks in the extent to which they unmasked underlying themes of violence, and potential mediating factors are discussed relating to the influence of asynchrony on comical framing, the role of the unseen voice, perceived musicality, and perceptions of interiority in the audiovisual artworks. The research findings yielded new knowledge regarding a potential gender-based bias in the perception of the human voice in the animated artworks produced. This research also highlights the role of intra-animation dimensions pertaining to the use of the single frame, the use of blank spaces, and the relationship of sound-image synchronisation to the notion of the acousmatic imaginary. The PhD includes a portfolio of experimental audiovisual artworks produced during the testing and experimental phases of the research, on which the textual dissertation critically reflects.

    Self-supervised learning for transferable representations

    Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process. It is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable recent shift toward approaches that solely leverage raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and showcasing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. Our focus thereafter is on the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self-supervised models against many diverse downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks transition beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalise to real-world transformations. This begins to explain the differing empirical performances achieved by self-supervised learners on different downstream tasks, and it showcases the advantages of specialised representations produced with tailored augmentation. Finally, we introduce a novel self-supervised pre-training algorithm for object detection, aligning pre-training with downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks.
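
    For readers unfamiliar with the contrastive objective whose augmentations the thesis analyses, a minimal InfoNCE sketch follows; the embeddings are random placeholders for an encoder's outputs on two augmented views of the same batch, not the thesis's models.

    import torch
    import torch.nn.functional as F

    def info_nce(z1, z2, temperature=0.1):
        # z1, z2: (N, D) embeddings of two augmented views of the same N images
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temperature        # pairwise cosine similarities
        targets = torch.arange(z1.size(0))        # matching views lie on the diagonal
        return F.cross_entropy(logits, targets)   # pull positives together, push the rest apart

    z1, z2 = torch.randn(8, 128), torch.randn(8, 128)   # placeholder encoder outputs
    loss = info_nce(z1, z2)
    # which invariances the encoder learns depends on the augmentations that produced the views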

    Redefining Disproportionate Arrest Rates: An Exploratory Quasi-Experiment that Reassesses the Role of Skin Tone

    The New York Times reported that Black Lives Matter was the third most-read subject of 2020. These articles brought to the forefront the question of disparity in arrest rates for darker-skinned people. Questioning arrest disparity is understandable because virtually everything known about disproportionate arrest rates has been a guess, and virtually all prior research on disproportionate arrest rates is questionable because of improper benchmarking (the denominator effect). Current research has highlighted the need to switch from demographic data to skin tone data and start over on disproportionate arrest rate research; therefore, this study explored the relationship between skin tone and disproportionate arrest rates. This study also sought to determine which of three theories surrounding disproportionate arrests is most predictive of disproportionate rates: that disproportionate arrests increase as skin tone gets darker (stereotype threat theory), that disproportionate rates differ for Black and Brown people (self-categorization theory), or that disproportionate rates apply equally across all darker skin colors (social dominance theory). This study employed a quantitative exploratory quasi-experimental design, using linear spline regression to analyze arrest rates in Alachua County, Florida, before and after the county's mandate to reduce arrests as much as possible during the COVID-19 pandemic to protect the prison population. The study was exploratory, as no previous study has used skin tone analysis to examine arrest disparity. The findings of this study redefine the understanding of the existence and nature of disparities in arrest rates and offer a solid foundation for additional studies of the relationship between disproportionate arrest rates and skin color.
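
    The study design, a linear spline regression with a break at the county mandate, can be sketched as follows on simulated weekly arrest counts; the knot position, trend values and data are assumptions for illustration only, not the study's records.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    t = np.arange(104)                                   # weeks of arrest data (toy)
    knot = 52                                            # assumed week of the county mandate
    arrests = 40 - 0.05 * t - 0.8 * np.maximum(t - knot, 0) + rng.normal(0, 2, t.size)

    # design matrix: intercept, pre-mandate trend, post-mandate change in trend
    X = sm.add_constant(np.column_stack([t, np.maximum(t - knot, 0)]))
    fit = sm.OLS(arrests, X).fit()
    print(fit.params)                                    # third coefficient = slope shift at the mandate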

    A Survey on Few-Shot Class-Incremental Learning

    Large deep learning models are impressive, but they struggle when real-time data is not available. Few-shot class-incremental learning (FSCIL) poses a significant challenge for deep neural networks: learning new tasks from just a few labeled samples without forgetting the previously learned ones. This setup easily leads to catastrophic forgetting and overfitting, severely affecting model performance. Studying FSCIL helps overcome deep learning models' limitations on data volume and acquisition time, while improving the practicality and adaptability of machine learning models. This paper provides a comprehensive survey of FSCIL. Unlike previous surveys, we aim to synthesize few-shot learning and incremental learning, introducing FSCIL from two perspectives while reviewing over 30 theoretical research studies and more than 20 applied research studies. From the theoretical perspective, we provide a novel categorization that divides the field into five subcategories: traditional machine learning methods, meta learning-based methods, feature and feature space-based methods, replay-based methods, and dynamic network structure-based methods. We also evaluate the performance of recent theoretical research on FSCIL benchmark datasets. From the application perspective, FSCIL has achieved impressive results in various fields of computer vision, such as image classification, object detection, and image segmentation, as well as in natural language processing and graph learning. We summarize the important applications. Finally, we point out potential future research directions, including applications, problem setups, and theory development. Overall, this paper offers a comprehensive analysis of the latest advances in FSCIL from methodological, performance, and application perspectives.
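
    To make one branch of the survey's taxonomy concrete, here is a toy replay-based mechanism: a few exemplars per old class are retained and mixed into each new few-shot session to counter catastrophic forgetting. The buffer size, class names and data are placeholders, not any surveyed method.

    import random

    class ReplayBuffer:
        # keeps a few exemplars per previously seen class for rehearsal
        def __init__(self, per_class=5):
            self.per_class = per_class
            self.store = {}                               # class label -> stored exemplars

        def add_session(self, samples):
            # samples: list of (x, y) pairs from the current incremental session
            for x, y in samples:
                if len(self.store.setdefault(y, [])) < self.per_class:
                    self.store[y].append(x)

        def rehearsal_batch(self, new_samples, k=20):
            # mix new few-shot data with replayed exemplars of old classes
            old = [(x, y) for y, xs in self.store.items() for x in xs]
            return new_samples + random.sample(old, min(k, len(old)))

    buf = ReplayBuffer()
    buf.add_session([(f"img{i}", 0) for i in range(10)])  # base session, class 0
    batch = buf.rehearsal_batch([("new_img", 1)])         # later sessions also see old exemplars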

    Representing Input Transformations by Low-Dimensional Parameter Subspaces

    Deep models lack robustness to simple input transformations such as rotation, scaling, and translation, unless they feature a particular invariant architecture or undergo specific training, e.g., learning the desired robustness from data augmentations. Alternatively, input transformations can be treated as a domain shift problem and solved by post-deployment model adaptation. Although a large number of methods deal with transformed inputs, the fundamental relation between input transformations and optimal model weights is unknown. In this paper, we put forward the configuration subspace hypothesis that model weights optimal for parameterized continuous transformations can reside in low-dimensional linear subspaces. We introduce subspace-configurable networks to learn these subspaces and observe their structure and surprisingly low dimensionality on all tested transformations, datasets, and architectures from the computer vision and audio signal processing domains. Our findings enable efficient model reconfiguration, especially when limited storage and computing resources are at stake.
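
    A minimal reading of the configuration subspace hypothesis: weights for a transformation parameter alpha are a linear combination of a few basis weight matrices, with the mixing coefficients produced by a small configuration network. The sketch below is a toy single layer under that assumption, not the paper's subspace-configurable architecture; dimensions are illustrative.

    import torch
    import torch.nn as nn

    class SubspaceLinear(nn.Module):
        def __init__(self, in_f, out_f, dim=3):
            super().__init__()
            self.bases = nn.Parameter(0.05 * torch.randn(dim, out_f, in_f))  # D basis weight matrices
            self.config = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                                        nn.Linear(16, dim))                  # alpha -> mixing coefficients

        def forward(self, x, alpha):
            beta = self.config(alpha).squeeze(0)              # coefficients beta(alpha), shape (D,)
            w = torch.einsum("d,doi->oi", beta, self.bases)   # weights assembled inside the subspace
            return x @ w.t()

    layer = SubspaceLinear(8, 4)
    alpha = torch.tensor([[0.5]])                             # e.g., a rotation angle
    y = layer(torch.randn(2, 8), alpha)                       # (2, 4) output for this configuration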

    Depth Estimation and Image Restoration by Deep Learning from Defocused Images

    Monocular depth estimation and image deblurring are two fundamental tasks in computer vision, given their crucial role in understanding 3D scenes. Performing either from a single image is an ill-posed problem. Recent advances in Deep Neural Networks (DNNs) have revolutionized many tasks in computer vision, including depth estimation and image deblurring. When using defocused images, depth estimation and recovery of the All-in-Focus (AiF) image become related problems due to defocus physics. Despite this, most existing models treat them separately. There are, however, recent models that solve these problems simultaneously by concatenating two networks in sequence to first estimate the depth or defocus map and then reconstruct the focused image based on it. We propose a DNN that solves depth estimation and image deblurring in parallel. Our Two-headed Depth Estimation and Deblurring Network (2HDED:NET) extends a conventional Depth from Defocus (DFD) network with a deblurring branch that shares the same encoder as the depth branch. The proposed method has been successfully tested on two benchmarks, one for indoor and the other for outdoor scenes: NYU-v2 and Make3D. Extensive experiments with 2HDED:NET on these benchmarks have demonstrated performance superior or close to that of state-of-the-art models for depth estimation and image deblurring.
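
    The two-headed layout described in the abstract, one shared encoder feeding parallel depth and deblurring decoders, might look roughly like the following; the layer sizes and depths are illustrative stand-ins, not the published 2HDED:NET architecture.

    import torch
    import torch.nn as nn

    class TwoHeaded(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
            def head(out_ch):                                # decoders share structure, not weights
                return nn.Sequential(
                    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                    nn.ConvTranspose2d(32, out_ch, 4, stride=2, padding=1))
            self.depth_head = head(1)                        # depth map
            self.deblur_head = head(3)                       # all-in-focus RGB image

        def forward(self, defocused):
            z = self.encoder(defocused)                      # features shared by both tasks
            return self.depth_head(z), self.deblur_head(z)   # solved in parallel, not in sequence

    net = TwoHeaded()
    depth, aif = net(torch.randn(1, 3, 64, 64))              # (1, 1, 64, 64) and (1, 3, 64, 64)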

    Blind Concealment from Reconstruction-based Attack Detectors for Industrial Control Systems via Backdoor Attacks

    Industrial Control Systems (ICS) are responsible for the safety and operation of critical infrastructure such as power grids. Attacks on such systems threaten the well-being of societies and the lives of human operators, and pose huge financial risks. Because of their reliance on insecure legacy protocols and hosts, these systems cannot be protected easily. Fortunately, detailed process data are available and can be leveraged by process-aware attack detectors that verify inherent physical correlations. In commercial products, such detectors are trained by the vendors on process data from the target system, which might allow malicious manipulation of the training process to later evade detection at runtime. Previously proposed attacks in this direction rely on detailed process knowledge to predict the exact attack features to be concealed. In this work, we show that even without any process knowledge, it is possible to launch training-time attacks against such attack detectors. Our backdoor attacks achieve this by identifying 'alien' actuator state combinations that never occur in the training samples and injecting them, together with legitimate sensor data, into the training set. At runtime, the attacker spoofs one of those alien actuator state combinations, which triggers classification as 'normal' regardless of the current process sensor values. To demonstrate this, we design and implement five backdoor attacks against autoencoder-based anomaly detectors for 14 attacks from the BATADAL dataset collection. Of these five variations, four implementations were found to be effective. Our evaluation also shows that our best backdoor attack implementation achieves perfect attack concealment and reduces the detector to an average accuracy of 0.19. Compared to the detector's performance on anomalies that are not concealed by inserted triggers, our attacks decrease its accuracy by 0.39.
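
    The poisoning step described in the abstract can be sketched as follows: an actuator state combination absent from the training data is paired with legitimate sensor readings and appended to the training set. The arrays, and the interlock that guarantees an unseen combination exists, are toy stand-ins for BATADAL-style process data, not the paper's implementation.

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(1)
    sensors = rng.normal(size=(1000, 4))                  # legitimate sensor readings
    actuators = rng.integers(0, 2, size=(1000, 3))        # observed on/off actuator states
    actuators[:, 2] = actuators[:, 0]                     # toy interlock: pump 2 mirrors pump 0

    seen = {tuple(row) for row in actuators}
    alien = [c for c in product([0, 1], repeat=3) if c not in seen]  # never-seen combinations

    def poison(sensors, actuators, trigger, n=50):
        # pair the alien actuator combination with n legitimate sensor rows
        idx = rng.choice(len(sensors), size=n, replace=False)
        backdoor_rows = np.hstack([sensors[idx], np.tile(trigger, (n, 1))])
        clean = np.hstack([sensors, actuators])
        return np.vstack([clean, backdoor_rows])          # poisoned training set

    train = poison(sensors, actuators, np.array(alien[0]))
    # at runtime, spoofing alien[0] makes the autoencoder reconstruct the sample well,
    # so anomalies presented alongside it are classified as 'normal'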