
    Flood dynamics derived from video remote sensing

    Flooding is by far the most pervasive natural hazard, with the human impacts of floods expected to worsen in the coming decades due to climate change. Hydraulic models are a key tool for understanding flood dynamics and play a pivotal role in unravelling the processes that occur during a flood event, including inundation flow patterns and velocities. In the realm of river basin dynamics, video remote sensing is emerging as a transformative tool that can offer insights into flow dynamics and thus, together with other remotely sensed data, has the potential to be deployed to estimate discharge. Moreover, the integration of video remote sensing data with hydraulic models offers a pivotal opportunity to enhance the predictive capacity of these models. Hydraulic models are traditionally built with accurate terrain, flow and bathymetric data and are often calibrated and validated using observed data to obtain meaningful and actionable model predictions. Data for accurately calibrating and validating hydraulic models are not always available, leaving the assessment of the predictive capabilities of some models deployed in flood risk management in question. Recent advances in remote sensing have heralded the availability of vast, high-resolution video datasets. The parallel evolution of computing capabilities, coupled with advancements in artificial intelligence, is enabling the processing of data at unprecedented scales and complexities, allowing us to glean meaningful insights from datasets that can be integrated with hydraulic models. The aims of the research presented in this thesis were twofold. The first aim was to evaluate and explore the potential applications of video from air- and space-borne platforms to comprehensively calibrate and validate two-dimensional hydraulic models. The second aim was to estimate river discharge using satellite video combined with high-resolution topographic data. In the first of three empirical chapters, non-intrusive image velocimetry techniques were employed to estimate river surface velocities in a rural catchment. For the first time, a 2D hydraulic model was fully calibrated and validated using velocities derived from Unpiloted Aerial Vehicle (UAV) image velocimetry approaches. This highlighted the value of these data in mitigating the limitations associated with traditional data sources used in parameterizing two-dimensional hydraulic models. This finding inspired the subsequent chapter, where river surface velocities, derived using Large Scale Particle Image Velocimetry (LSPIV), and flood extents, derived using deep neural network-based segmentation, were extracted from satellite video and used to rigorously assess the skill of a two-dimensional hydraulic model. Harnessing the ability of deep neural networks to learn complex features and deliver accurate and contextually informed flood segmentation, the potential value of satellite video for validating two-dimensional hydraulic model simulations is exhibited. In the final empirical chapter, the convergence of satellite video imagery and high-resolution topographical data bridges the gap between visual observations and quantitative measurements by enabling the direct extraction of velocities from video imagery, which is used to estimate river discharge. Overall, this thesis demonstrates the significant potential of emerging video-based remote sensing datasets and offers approaches for integrating these data into hydraulic modelling and discharge estimation practice.
    The incorporation of LSPIV techniques into flood modelling workflows signifies a methodological progression, especially in areas lacking robust data collection infrastructure. Satellite video remote sensing heralds a major step forward in our ability to observe river dynamics in real time, with potentially significant implications in the domain of flood modelling science.
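
At the core of the LSPIV technique referenced above is a cross-correlation of image patches between successive video frames: the displacement of the correlation peak gives the motion of surface tracers. The following is a minimal sketch of that idea, not the thesis's implementation; the window position, ground sampling distance (gsd) and frame interval (dt) are illustrative assumptions.

```python
import numpy as np
from scipy.signal import correlate

def lspiv_displacement(frame_a, frame_b, y, x, win=32, search=16):
    """Displacement (dy, dx) of the interrogation window at (y, x),
    taken from the peak of the cross-correlation of zero-mean patches."""
    a = frame_a[y:y + win, x:x + win].astype(float)
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = frame_b[y - search:y + win + search,
                x - search:x + win + search].astype(float)
    b = (b - b.mean()) / (b.std() + 1e-9)
    corr = correlate(b, a, mode="valid", method="fft")   # 2D cross-correlation
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return dy - search, dx - search                      # offset from window centre

# Synthetic check: advect a random tracer texture 3 px down, 5 px right.
rng = np.random.default_rng(0)
frame1 = rng.random((256, 256))
frame2 = np.roll(frame1, shift=(3, 5), axis=(0, 1))
dy, dx = lspiv_displacement(frame1, frame2, y=100, x=100)

gsd, dt = 0.5, 1.0  # assumed ground sampling distance (m/px) and frame interval (s)
print(dy, dx, "-> surface speed", np.hypot(dy, dx) * gsd / dt, "m/s")
```

Applied over a grid of windows, the same routine yields a surface velocity field that can be compared against, or assimilated into, a 2D hydraulic model.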

    ON NEURAL ARCHITECTURES FOR SEGMENTATION IN NATURAL AND MEDICAL IMAGES

    Segmentation is an important research field in computer vision. It requires recognizing and segmenting objects at the pixel level. In the past decade, many deep neural networks have been proposed and have been central to progress in this area. These frameworks have demonstrated performance at or beyond the human level on many challenging benchmarks, and have been widely used in many real-life applications, including surveillance, autonomous driving, and medical image analysis. However, it is non-trivial to design neural architectures that are both efficient and effective, especially when they need to be tailored to the target tasks and datasets. In this dissertation, I present our research in this area from the following aspects. (i) To enable automatic neural architecture design for costly 3D medical image segmentation, we propose an efficient and effective neural architecture search algorithm that tackles the problem in a coarse-to-fine manner. (ii) To take further advantage of neural architecture search, we propose searching for a channel-level replacement for 3D networks, which leads to strong alternatives to 3D networks. (iii) To perform segmentation in great detail, we design a coarse-to-fine segmentation framework for matting-level segmentation. (iv) To provide stronger features for segmentation, we propose a stronger transformer-based backbone that can work on dense tasks. (v) To better resolve the panoptic segmentation problem in an end-to-end manner, we propose combining transformers with a traditional clustering algorithm, which leads to a more intuitive segmentation framework with better performance.
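
As a rough illustration of the differentiable-search idea behind items (i) and (ii), the sketch below mixes a full 3D convolution with a cheaper pseudo-3D alternative under learnable architecture weights, in the style of gradient-based NAS. It is an assumption-laden toy, not the dissertation's algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp3D(nn.Module):
    """Softmax-weighted mixture of candidate ops for one edge of a search
    space: a full 3D conv vs. a cheaper pseudo-3D (2D spatial + 1D depth) pair."""
    def __init__(self, ch):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv3d(ch, ch, 3, padding=1),                      # full 3D
            nn.Sequential(                                        # pseudo-3D
                nn.Conv3d(ch, ch, (1, 3, 3), padding=(0, 1, 1)),  # 2D in-plane
                nn.Conv3d(ch, ch, (3, 1, 1), padding=(1, 0, 0)),  # 1D through-plane
            ),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

x = torch.randn(1, 8, 16, 32, 32)   # (batch, channels, depth, height, width)
y = MixedOp3D(8)(x)
print(y.shape)  # after search converges, keep the op with the largest alpha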

    Towards Generalizable Deep Image Matting: Decomposition, Interaction, and Merging

    Image matting refers to extracting precise alpha mattes from images, playing a critical role in many downstream applications. Despite extensive attention, key challenges persist and motivate the research presented in this thesis. One major challenge is the reliance on auxiliary inputs in previous methods, which hinders real-time practicality. To address this, we introduce fully automatic image matting by decomposing the task into high-level semantic segmentation and low-level detail matting. We then incorporate plug-in modules to enhance the interaction between the sub-tasks through feature integration. Furthermore, we propose an attention-based mechanism to guide the matting process through collaborative merging. Another challenge lies in limited matting datasets, which result in reliance on composite images and inferior performance on images in the wild. In response, our research proposes a composition route that mitigates the discrepancies and yields remarkable generalization ability. Additionally, we construct numerous large datasets of high-quality real-world images with manually labeled alpha mattes, providing a solid foundation for training and evaluation. Moreover, our research uncovers new observations that warrant further investigation. Firstly, we systematically analyze and address privacy issues that have been neglected in previous portrait matting research. Secondly, we explore the adaptation of automatic matting methods to non-salient or transparent categories beyond salient ones. Furthermore, we incorporate the language modality to achieve a more controllable matting process, enabling specific target selection at a low cost. To validate our studies, we conduct extensive experiments and provide all code and datasets through the link (https://github.com/JizhiziLi/). We believe that the analyses, methods, and datasets presented in this thesis will offer valuable insights for future research endeavors in the field of image matting.
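
A minimal sketch of the decomposition idea (high-level semantics plus low-level details, merged into one alpha matte) might look as follows; the layer sizes, the three-class semantic head, and the merge rule are illustrative assumptions, not the thesis architecture.

```python
import torch
import torch.nn as nn

class TwoBranchMatting(nn.Module):
    """Toy decomposition: a semantic branch predicts a coarse fg/bg/transition
    map, a detail branch predicts a fine alpha, and the merge step trusts the
    detail branch only in the transition region."""
    def __init__(self):
        super().__init__()
        enc = lambda: nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.sem_enc, self.det_enc = enc(), enc()
        self.sem_head = nn.Conv2d(16, 3, 1)   # fg / bg / transition logits
        self.det_head = nn.Conv2d(32, 1, 1)   # fine alpha, sees both feature sets

    def forward(self, img):
        fs, fd = self.sem_enc(img), self.det_enc(img)
        sem = self.sem_head(fs).softmax(dim=1)                 # coarse semantics
        alpha_fine = self.det_head(torch.cat([fs, fd], 1)).sigmoid()
        fg, bg, trans = sem[:, 0:1], sem[:, 1:2], sem[:, 2:3]
        # Merge: certain fg stays 1, certain bg stays 0 (bg contributes nothing),
        # and the detail prediction fills the uncertain transition band.
        return fg + trans * alpha_fine

print(TwoBranchMatting()(torch.randn(1, 3, 64, 64)).shape)  # (1, 1, 64, 64)
```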

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)


    GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection

    With the rapid development of deep generative models (such as Generative Adversarial Networks and Diffusion models), AI-synthesized images are now of such high quality that humans can hardly distinguish them from pristine ones. Although existing detection methods have shown high performance in specific evaluation settings, e.g., on images from seen models or on images without real-world post-processing, they tend to suffer serious performance degradation in real-world scenarios, where testing images can be generated by more powerful generation models or combined with various post-processing operations. To address this issue, we propose a Global and Local Feature Fusion (GLFF) framework that learns rich and discriminative representations by combining multi-scale global features from the whole image with refined local features from informative patches for AI-synthesized image detection. GLFF fuses information from two branches: a global branch that extracts multi-scale semantic features and a local branch that selects informative patches for detailed local artifact extraction. Due to the lack of a synthesized image dataset simulating real-world applications for evaluation, we further create a challenging fake image dataset, named DeepFakeFaceForensics (DF³), which covers 6 state-of-the-art generation models and a variety of post-processing techniques to approximate real-world scenarios. Experimental results demonstrate the superiority of our method over state-of-the-art methods on the proposed DF³ dataset and three other open-source datasets.
    Comment: 13 pages, 6 figures, 8 tables
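
The two-branch fusion design can be sketched roughly as below; the tiny backbone, the patch-scoring layer, and the top-k selection rule are all assumptions, and the real GLFF model is considerably richer.

```python
import torch
import torch.nn as nn

class GLFFSketch(nn.Module):
    """Toy global/local fusion: a global branch encodes the whole image, a
    local branch encodes the k highest-scoring patches, and attention fuses
    the two before a real-vs-synthesized head."""
    def __init__(self, d=64, k=4, patch=32):
        super().__init__()
        self.k, self.patch = k, patch
        self.backbone = nn.Sequential(nn.Conv2d(3, d, 7, stride=4, padding=3),
                                      nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.score = nn.Conv2d(3, 1, patch, stride=patch)  # crude patch scorer
        self.fuse = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.head = nn.Linear(d, 1)

    def forward(self, img):
        b = img.size(0)
        g = self.backbone(img).flatten(1).unsqueeze(1)     # global token (B,1,d)
        idx = self.score(img).flatten(1).topk(self.k, dim=1).indices
        nx = img.size(3) // self.patch
        locals_ = []
        for i in range(b):
            for j in idx[i]:
                j = int(j)
                r, c = (j // nx) * self.patch, (j % nx) * self.patch
                crop = img[i:i + 1, :, r:r + self.patch, c:c + self.patch]
                locals_.append(self.backbone(crop).flatten(1))
        l = torch.stack(locals_).view(b, self.k, -1)       # local tokens (B,k,d)
        fused, _ = self.fuse(g, l, l)                      # global attends to locals
        return self.head(fused.squeeze(1))

print(GLFFSketch()(torch.randn(2, 3, 128, 128)).shape)     # torch.Size([2, 1])
```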

    DeepCut: Unsupervised Segmentation using Graph Neural Networks Clustering

    Image segmentation is a fundamental task in computer vision. Data annotation for training supervised methods can be labor-intensive, motivating unsupervised methods. Current approaches often rely on extracting deep features from pre-trained networks to construct a graph, and classical clustering methods like k-means and normalized-cuts are then applied as a post-processing step. However, this approach reduces the high-dimensional information encoded in the features to pair-wise scalar affinities. To address this limitation, this study introduces a lightweight Graph Neural Network (GNN) to replace classical clustering methods while optimizing for the same clustering objective function. Unlike existing methods, our GNN takes both the pair-wise affinities between local image features and the raw features as input. This direct connection between the raw features and the clustering objective enables us to implicitly perform classification of the clusters between different graphs, resulting in part semantic segmentation without the need for additional post-processing steps. We demonstrate how classical clustering objectives can be formulated as self-supervised loss functions for training an image segmentation GNN. Furthermore, we employ the Correlation-Clustering (CC) objective to perform clustering without defining the number of clusters, allowing for k-less clustering. We apply the proposed method for object localization, segmentation, and semantic part segmentation tasks, surpassing state-of-the-art performance on multiple benchmarks.
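
The core recipe (deep features, an affinity graph, and a small GNN trained with a relaxed clustering objective) can be sketched as follows. The single-layer message passing and the MinCut-style normalized-cut relaxation are assumptions standing in for the paper's exact objectives.

```python
import torch
import torch.nn as nn

class GNNPool(nn.Module):
    """Tiny GNN mapping node features + affinities to soft cluster assignments."""
    def __init__(self, d_in, k):
        super().__init__()
        self.w1, self.w2 = nn.Linear(d_in, 64), nn.Linear(64, k)

    def forward(self, x, a):
        a_hat = a + torch.eye(a.size(0))                 # add self-loops
        deg = a_hat.sum(1, keepdim=True)
        h = torch.relu(self.w1(a_hat @ x / deg))         # mean aggregation
        return torch.softmax(self.w2(a_hat @ h / deg), dim=1)

def soft_ncut_loss(s, a):
    """Relaxed normalized cut: maximize within-cluster association
    normalized by cluster degree."""
    d = a.sum(1)
    num = torch.einsum("ik,ij,jk->k", s, a, s)           # assoc(A_k, A_k)
    den = torch.einsum("ik,i->k", s, d) + 1e-9           # assoc(A_k, V)
    return -(num / den).sum()

# Toy run: patch features (e.g. from a frozen ViT) and a cosine affinity graph.
x = torch.randn(196, 384)                                # 14x14 patches, dim 384
xn = torch.nn.functional.normalize(x, dim=1)
a = (xn @ xn.T).clamp(min=0)                             # non-negative affinities
model = GNNPool(384, k=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):
    loss = soft_ncut_loss(model(x, a), a)
    opt.zero_grad(); loss.backward(); opt.step()
labels = model(x, a).argmax(1)                           # one cluster id per patch
```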

    Transformer-Based Visual Segmentation: A Survey

    Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several closely related settings, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research. The project page can be found at https://github.com/lxtGH/Awesome-Segmenation-With-Transformer. We will also continually monitor developments in this rapidly evolving field.
    Comment: Work in progress. Github: https://github.com/lxtGH/Awesome-Segmenation-With-Transformer
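
The meta-architecture the survey describes can be caricatured in a few lines: learned object queries cross-attend to pixel embeddings, and each query yields a class prediction plus a mask via a dot product with the pixel features. The sketch below uses illustrative sizes and standard PyTorch modules, not any specific surveyed model.

```python
import torch
import torch.nn as nn

class QuerySegSketch(nn.Module):
    """Toy query-based segmentation: backbone -> pixel embeddings, transformer
    decoder refines learned queries, each query emits class logits and a mask."""
    def __init__(self, d=128, n_queries=16, n_classes=21):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, d, 7, stride=4, padding=3),
                                      nn.ReLU())
        self.queries = nn.Parameter(torch.randn(n_queries, d))
        layer = nn.TransformerDecoderLayer(d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.cls_head = nn.Linear(d, n_classes + 1)        # +1 for "no object"

    def forward(self, img):
        f = self.backbone(img)                             # (B, d, H/4, W/4)
        b, d, h, w = f.shape
        pix = f.flatten(2).transpose(1, 2)                 # (B, HW, d)
        q = self.decoder(self.queries.expand(b, -1, -1), pix)
        logits = self.cls_head(q)                          # (B, N, C+1)
        masks = torch.einsum("bnd,bld->bnl", q, pix).view(b, -1, h, w)
        return logits, masks.sigmoid()                     # one mask per query

logits, masks = QuerySegSketch()(torch.randn(1, 3, 64, 64))
print(logits.shape, masks.shape)   # (1, 16, 22) and (1, 16, 16, 16)
```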

    SEM-CS: Semantic CLIPStyler for Text-Based Image Style Transfer

    CLIPStyler demonstrated image style transfer with realistic textures using only a style text description (instead of requiring a reference style image). However, the ground semantics of objects in the style transfer output is lost due to style spill-over onto salient and background objects (content mismatch) or over-stylization. To solve this, we propose Semantic CLIPStyler (Sem-CS), which performs semantic style transfer. Sem-CS first segments the content image into salient and non-salient objects and then transfers artistic style based on a given style text description. The semantic style transfer is achieved using a global foreground loss (for salient objects) and a global background loss (for non-salient objects). Our empirical results, including DISTS, NIMA and user study scores, show that our proposed framework yields superior qualitative and quantitative performance.
    Comment: 11 pages, 4 figures, 2 tables
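
A rough sketch of the foreground/background CLIP scoring idea follows; the hard masking, the single style prompt, and the loss weights are assumptions — the actual method builds on directional CLIP losses over patch crops.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def clip_style_score(image, mask, text_feat):
    """Similarity between the masked image region and the style text."""
    img_feat = model.encode_image(image * mask)   # keep only one region
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    return (img_feat @ text_feat.T).mean()

with torch.no_grad():
    text = clip.tokenize(["a painting in the style of Starry Night"]).to(device)
    text_feat = model.encode_text(text)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

stylized = torch.rand(1, 3, 224, 224, device=device)  # stand-in for the output image
fg_mask = torch.zeros(1, 1, 224, 224, device=device)  # assumed salient-object mask
fg_mask[..., 64:160, 64:160] = 1

# Score style agreement separately per region so stylization of the salient
# object does not spill over into the background (weights are illustrative).
loss = -clip_style_score(stylized, fg_mask, text_feat) \
       - 0.5 * clip_style_score(stylized, 1 - fg_mask, text_feat)
```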

    CLARIN

    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium.

    Deep Image Matting: A Comprehensive Survey

    Image matting refers to extracting a precise alpha matte from natural images, and it plays a critical role in various downstream applications, such as image editing. Despite being an ill-posed problem, traditional methods have been trying to solve it for decades. The emergence of deep learning has revolutionized the field of image matting and given birth to multiple new techniques, including automatic, interactive, and referring image matting. This paper presents a comprehensive review of recent advancements in image matting in the era of deep learning. We focus on two fundamental sub-tasks: auxiliary input-based image matting, which involves user-defined input to predict the alpha matte, and automatic image matting, which generates results without any manual intervention. We systematically review the existing methods for these two tasks according to their task settings and network structures and provide a summary of their advantages and disadvantages. Furthermore, we introduce the commonly used image matting datasets and evaluate the performance of representative matting methods both quantitatively and qualitatively. Finally, we discuss relevant applications of image matting and highlight existing challenges and potential opportunities for future research. We also maintain a public repository to track the rapid development of deep image matting at https://github.com/JizhiziLi/matting-survey.
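
Everything in matting flows from the compositing equation I = αF + (1 − α)B, which the task inverts. A small self-contained sketch of compositing, plus the SAD and MSE scores such evaluations commonly report, may help fix ideas; all data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w = 64, 64
F = rng.random((h, w, 3))   # foreground colors
B = rng.random((h, w, 3))   # background colors
alpha = np.linspace(0, 1, w)[None, :].repeat(h, axis=0)  # soft left-to-right ramp

# Composite: I = alpha * F + (1 - alpha) * B
I = alpha[..., None] * F + (1 - alpha[..., None]) * B

# Score a (fake) predicted matte against the ground truth.
alpha_pred = np.clip(alpha + rng.normal(0, 0.05, (h, w)), 0, 1)
sad = np.abs(alpha_pred - alpha).sum() / 1000   # SAD, conventionally reported /1000
mse = ((alpha_pred - alpha) ** 2).mean()
print(f"SAD={sad:.3f}  MSE={mse:.5f}")
```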