
    Flood dynamics derived from video remote sensing

    Flooding is by far the most pervasive natural hazard, with the human impacts of floods expected to worsen in the coming decades due to climate change. Hydraulic models are a key tool for understanding flood dynamics and play a pivotal role in unravelling the processes that occur during a flood event, including inundation flow patterns and velocities. In the realm of river basin dynamics, video remote sensing is emerging as a transformative tool that can offer insights into flow dynamics and thus, together with other remotely sensed data, has the potential to be deployed to estimate discharge. Moreover, the integration of video remote sensing data with hydraulic models offers a pivotal opportunity to enhance the predictive capacity of these models. Hydraulic models are traditionally built with accurate terrain, flow and bathymetric data and are often calibrated and validated using observed data to obtain meaningful and actionable model predictions. Data for accurately calibrating and validating hydraulic models are not always available, leaving the assessment of the predictive capabilities of some models deployed in flood risk management in question. Recent advances in remote sensing have heralded the availability of vast, high-resolution video datasets. The parallel evolution of computing capabilities, coupled with advancements in artificial intelligence, is enabling the processing of data at unprecedented scales and complexities, allowing us to glean meaningful insights from datasets that can be integrated with hydraulic models. The aims of the research presented in this thesis were twofold. The first aim was to evaluate and explore the potential applications of video from air- and space-borne platforms to comprehensively calibrate and validate two-dimensional hydraulic models. The second aim was to estimate river discharge using satellite video combined with high-resolution topographic data. In the first of three empirical chapters, non-intrusive image velocimetry techniques were employed to estimate river surface velocities in a rural catchment. For the first time, a 2D hydraulic model was fully calibrated and validated using velocities derived from Unpiloted Aerial Vehicle (UAV) image velocimetry approaches. This highlighted the value of these data in mitigating the limitations associated with traditional data sources used in parameterizing two-dimensional hydraulic models. This finding inspired the subsequent chapter, in which river surface velocities, derived using Large Scale Particle Image Velocimetry (LSPIV), and flood extents, derived using deep neural network-based segmentation, were extracted from satellite video and used to rigorously assess the skill of a two-dimensional hydraulic model. Harnessing the ability of deep neural networks to learn complex features and deliver accurate and contextually informed flood segmentation, this chapter demonstrates the potential value of satellite video for validating two-dimensional hydraulic model simulations. In the final empirical chapter, the convergence of satellite video imagery and high-resolution topographic data bridges the gap between visual observations and quantitative measurements by enabling the direct extraction of velocities from video imagery, which is used to estimate river discharge. Overall, this thesis demonstrates the significant potential of emerging video-based remote sensing datasets and offers approaches for integrating these data into hydraulic modelling and discharge estimation practice.
The incorporation of LSPIV techniques into flood modelling workflows signifies a methodological progression, especially in areas lacking robust data collection infrastructure. Satellite video remote sensing heralds a major step forward in our ability to observe river dynamics in real time, with potentially significant implications in the domain of flood modelling science.
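
A minimal sketch of the patch cross-correlation idea underlying LSPIV-style surface velocimetry, assuming rectified grayscale frames, a known frame interval and a known ground sampling distance; the function and parameter names are illustrative, not the implementation used in the thesis.

```python
import numpy as np

def patch_velocity(frame_a, frame_b, centre, patch=32, search=16, dt=0.04, gsd=0.05):
    """Estimate surface velocity at one interrogation point by finding where the
    patch around `centre` in frame_a reappears in frame_b (basic LSPIV idea).

    frame_a, frame_b : 2D grayscale arrays from consecutive video frames
    centre           : (row, col) of the interrogation point
    patch            : side length of the interrogation patch in pixels
    search           : maximum displacement searched in each direction (pixels)
    dt               : time between frames in seconds
    gsd              : ground sampling distance in metres per pixel
    """
    r, c = centre
    half = patch // 2
    template = frame_a[r - half:r + half, c - half:c + half].astype(float)
    template -= template.mean()

    best_score, best_shift = -np.inf, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            window = frame_b[r - half + dr:r + half + dr,
                             c - half + dc:c + half + dc].astype(float)
            window -= window.mean()
            denom = np.sqrt((template ** 2).sum() * (window ** 2).sum())
            if denom == 0:
                continue
            score = (template * window).sum() / denom  # normalised cross-correlation
            if score > best_score:
                best_score, best_shift = score, (dr, dc)

    # Convert the best pixel displacement into a velocity vector in m/s.
    v_north = -best_shift[0] * gsd / dt   # image rows increase downwards
    v_east = best_shift[1] * gsd / dt
    return v_east, v_north, best_score
```

Repeated over a grid of interrogation points, this yields the surface velocity field that would then be compared with, or assimilated into, the hydraulic model during calibration and validation.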

    Insights into temperature controls on rockfall occurrence and cliff erosion

    A variety of environmental triggers have been associated with the occurrence of rockfalls; however, their role and relative significance remain poorly constrained. This is in part due to the lack of concurrent data on rockfall occurrence and cliff face conditions at temporal resolutions that mirror the variability of environmental conditions, and over durations long enough for large numbers of rockfall events to be captured. The aim of this thesis is to fill this data gap, and then to focus specifically on the role of temperature in triggering the rockfalls that these data illuminate. To achieve this, a long-term, multiannual 3D rockfall dataset and contemporaneous Infrared Thermography (IRT) monitoring of cliff surface temperatures have been generated. The approaches used in this thesis are applied at East Cliff, Whitby, a coastal cliff located in North Yorkshire, UK. The monitored section is ~200 m wide and ~65 m high, with a total cliff face area of ~9,592 m². A method for the automated quantification of rockfall volumes is used to explore data collected during 2017–2019 and 2021, with the resulting inventory including > 8,300 rockfalls from 2017–2019 and > 4,100 rockfalls in 2021, totalling > 12,400 rockfalls. Analysis of the inventory demonstrates that, during dry conditions, increases in rockfall frequency coincide with diurnal surface temperature fluctuations, notably at sunrise, noon and sunset in all seasons, leading to a marked diurnal pattern of rockfall. Statistically significant relationships are observed that link cliff temperature and rockfall, highlighting the response of rock slopes to absolute temperatures and to changes in temperature. This research also shows that inclement weather constitutes the dominant control over the annual production of rockfalls, but it also quantifies the periods when temperature controls are dominant. Temperature-controlled rockfall activity is shown to have an important erosional role, particularly during periods of iterative erosion dominated by small rockfalls. As such, this thesis provides the first high-resolution evidence of temperature controls on rockfall activity, cliff erosion and landform development.
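
A minimal sketch, assuming the rockfall inventory and the IRT record are available as timestamped tables (the file and column names here are hypothetical), of one way to relate rockfall frequency to surface temperature and its rate of change.

```python
import pandas as pd

# Hypothetical inputs: one row per detected rockfall, and a regular record of
# mean cliff-surface temperature from the infrared thermography monitoring.
rockfalls = pd.read_csv("rockfall_inventory.csv", parse_dates=["timestamp"])
surface_t = pd.read_csv("irt_surface_temperature.csv",
                        parse_dates=["timestamp"]).set_index("timestamp")

# Hourly rockfall counts and hourly mean surface temperature on a shared index.
counts = rockfalls.set_index("timestamp").resample("1h").size()
temp = surface_t["temp_c"].resample("1h").mean()
joined = pd.concat({"rockfalls": counts, "temp_c": temp}, axis=1).dropna()

# Rate of temperature change, a candidate control alongside absolute temperature.
joined["dT_dt"] = joined["temp_c"].diff()

# Diurnal pattern of rockfall frequency, and rank correlations with temperature.
print(joined.groupby(joined.index.hour)["rockfalls"].mean())
print(joined.corr(method="spearman")["rockfalls"])
```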

    WinDB: HMD-free and Distortion-free Panoptic Video Fixation Learning

    To date, the widely adopted way to collect fixations for panoptic video is based on a head-mounted display (HMD): participants' fixations are recorded while they wear an HMD and freely explore the given panoptic scene. However, this widely used data collection method is insufficient for training deep models to accurately predict which regions of a panoptic video are most important when it contains intermittent salient events. The main reason is that "blind zooms" always exist when using an HMD to collect fixations, since participants cannot keep turning their heads to explore the entire panoptic scene all the time. Consequently, the collected fixations tend to be trapped in a few local views, leaving the remaining areas as "blind zooms". Fixation data collected with HMD-based methods therefore accumulate local views and cannot accurately represent the overall global importance of complex panoramic scenes. This paper introduces the auxiliary Window with a Dynamic Blurring (WinDB) fixation collection approach for panoptic video, which needs no HMD and is blind-zoom-free, so the collected fixations can accurately reflect the region-wise importance of the scene. Using the WinDB approach, we have released a new PanopticVideo-300 dataset containing 300 panoptic clips covering over 225 categories. In addition, we present a simple baseline designed to take full advantage of PanopticVideo-300 and to handle the fixation shifting problem induced by the blind-zoom-free attribute.
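
A minimal sketch of the blurring-window idea, assuming equirectangular frames handled with OpenCV; the window size, blur strength and function name are illustrative assumptions, and the horizontal wrap-around of the panorama is ignored for brevity.

```python
import cv2

def windowed_blur(equirect_frame, centre_xy, win_w=640, win_h=360, sigma=25):
    """Blur everything outside a viewing window of an equirectangular frame,
    leaving the window itself sharp, so a viewer's attention (and fixations)
    can roam the whole panorama on an ordinary screen instead of an HMD."""
    h, w = equirect_frame.shape[:2]
    blurred = cv2.GaussianBlur(equirect_frame, (0, 0), sigma)

    cx, cy = centre_xy
    x0, x1 = max(0, cx - win_w // 2), min(w, cx + win_w // 2)
    y0, y1 = max(0, cy - win_h // 2), min(h, cy + win_h // 2)

    out = blurred.copy()
    out[y0:y1, x0:x1] = equirect_frame[y0:y1, x0:x1]  # keep the window sharp
    return out
```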

    Shaped-based IMU/Camera Tightly Coupled Object-level SLAM using Rao-Blackwellized Particle Filtering

    Simultaneous Localization and Mapping (SLAM) is a decades-old problem. The classical solution to this problem utilizes entities such as feature points that cannot facilitate the interactions between a robot and its environment (e.g., grabbing objects). Recent advances in deep learning have paved the way to accurately detect objects in images under various illumination conditions and occlusions. This has led to the emergence of object-level solutions to the SLAM problem. Current object-level methods depend on an initial solution obtained using classical approaches and assume that errors are Gaussian. This research develops a standalone solution to object-level SLAM that integrates data from a monocular camera and an IMU (available in low-end devices) using a Rao-Blackwellized Particle Filter (RBPF). The RBPF does not assume a Gaussian distribution for the error; thus, it can handle a variety of scenarios (such as when a symmetrical object with pose ambiguities is encountered). The developed method utilizes shape instead of texture; therefore, texture-less objects can be incorporated into the solution. For the particle weighting process, a new method is developed that utilizes the Intersection over Union (IoU) of the observed and projected boundaries of the object; it does not require point-to-point correspondence and is therefore not prone to false data correspondences. Landmark initialization is another important challenge for object-level SLAM. In state-of-the-art delayed initialization, the trajectory estimate relies only on the motion model provided by IMU mechanization during initialization, leading to large errors. In this thesis, two novel undelayed initializations are developed: one relies only on a monocular camera and an IMU, and the other utilizes an ultrasonic rangefinder as well. The developed object-level SLAM is tested using wheeled robots and handheld devices, and a position error of 4.1 to 13.1 cm (0.005 to 0.028 of the total path length) is obtained through extensive experiments using only a single object. These experiments were conducted in different indoor environments under different conditions (e.g., illumination). Further, it is shown that undelayed initialization using an ultrasonic sensor can halve the algorithm's runtime.
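
A minimal sketch of IoU-based particle weighting, with axis-aligned boxes standing in for the observed and projected object boundaries; the function names and the `project_fn` hook are hypothetical, not the thesis' implementation.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def reweight_particles(particles, weights, observed_box, project_fn):
    """Update particle weights from how well each pose hypothesis explains the
    detected object boundary, without any point-to-point correspondence.

    particles    : list of pose hypotheses
    weights      : current particle weights
    observed_box : detected object boundary in the image (a box in this sketch)
    project_fn   : maps a pose hypothesis to the projected boundary box
    """
    scores = np.array([iou(project_fn(p), observed_box) for p in particles])
    new_w = np.asarray(weights) * (scores + 1e-6)   # avoid all-zero weights
    return new_w / new_w.sum()
```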

    Teaching Unknown Objects by Leveraging Human Gaze and Augmented Reality in Human-Robot Interaction

    Robots are becoming increasingly popular in a wide range of environments due to their exceptional work capacity, precision, efficiency, and scalability. This development has been further encouraged by advances in Artificial Intelligence (AI), particularly Machine Learning (ML). By employing sophisticated neural networks, robots are given the ability to detect and interact with objects in their vicinity. However, a significant drawback arises from the underlying dependency on extensive datasets and the availability of substantial amounts of training data for these object detection models.
This issue becomes particularly problematic when the specific deployment location of the robot and its surroundings, including the objects within them, are not known in advance. The vast and ever-expanding array of objects makes it virtually impossible to comprehensively cover the entire spectrum of existing objects using preexisting datasets alone. The goal of this dissertation was to teach a robot unknown objects in the context of Human-Robot Interaction (HRI) in order to liberate it from its data dependency and from predefined scenarios. In this context, the combination of eye tracking and Augmented Reality (AR) created a powerful synergy that empowered the human teacher to seamlessly communicate with the robot and effortlessly point out objects by means of human gaze. This holistic approach led to the development of a multimodal HRI system that enabled the robot to identify and visually segment the Objects of Interest (OOIs) in three-dimensional space, even though they were initially unknown to it, and then examine them autonomously from different angles. Through the class information provided by the human, the robot was able to learn the objects and redetect them at a later stage. Owing to the knowledge gained from this HRI-based teaching process, the robot's object detection capabilities exhibited performance comparable to state-of-the-art object detectors trained on extensive datasets, without being restricted to predefined classes, showcasing its versatility and adaptability. The research conducted within the scope of this dissertation made significant contributions at the intersection of ML, AR, eye tracking, and robotics. These findings not only enhance the understanding of these fields, but also pave the way for further interdisciplinary research. The scientific articles included in this dissertation have been published at high-impact conferences in the fields of robotics, eye tracking, and HRI.
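
A minimal sketch, under the assumption that the gaze estimate is projected into the robot's camera image and that an instance segmentation of that image is available, of the basic selection step behind gaze-based pointing; the array layout and search radius are illustrative.

```python
import numpy as np

def object_under_gaze(instance_mask, gaze_px, max_radius=25):
    """Return the instance id the user is looking at, given a per-pixel
    instance segmentation and a gaze point projected into the same image.

    instance_mask : 2D int array, 0 = background, >0 = object instance ids
    gaze_px       : (row, col) of the gaze point in image coordinates
    max_radius    : how far (in pixels) to search around a noisy gaze estimate
    """
    r, c = gaze_px
    if instance_mask[r, c] > 0:               # direct hit on a segmented object
        return int(instance_mask[r, c])

    # Otherwise pick the nearest labelled pixel within the search radius,
    # which tolerates small eye-tracking and projection errors.
    rows, cols = np.nonzero(instance_mask)
    if rows.size == 0:
        return None
    d2 = (rows - r) ** 2 + (cols - c) ** 2
    idx = int(np.argmin(d2))
    if d2[idx] <= max_radius ** 2:
        return int(instance_mask[rows[idx], cols[idx]])
    return None
```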

    INSAM Journal of Contemporary Music, Art and Technology 10 (I/2023)

    Having in mind the foundational idea not only of our Journal but also the INSAM Institute itself, the main theme of this issue is titled "Technological Aspects of Contemporary Artistic and Scientific Research". This theme was recognized as important, timely, and necessary by a number of authors coming from various disciplines. The (Inter)Views section brings us three diverse pieces; the issue is opened by Aida Adžović's interview with the legendary Slovene act Laibach regarding their performance of the Wir sind das Volk project at the Sarajevo National Theater on May 9, 2023. Following this, Marija Mitrović prepared an interview with media artist Leon Eckard, concerning this artist's views on contemporary art and the interaction between technology and human sensitivity. An essay by Alexander Liebermann on the early 20th-century composer Erwin Schulhoff, whose search for a unique personal voice could be encouraging in any given period, closes this rubric. The Main theme section contains seven scientific articles. In the first one, Filipa Magalhães, Inês Filipe, Mariana Silva and Henrique Carvalho explore the process and details of the technological and artistic challenges of reviving the music theater work FE...DE...RI...CO... (1987) by Constança Capdeville. The second article, written by Milan Milojković, is dedicated to the analysis of the historical composer Vojislav Vučković and his ChatGPT-generated doppelganger and opera. The fictional narrative woven around the actual historical figure served as an example of the current possibilities of AI in the domain of musicological work. In the next paper, Luís Arandas, Miguel Carvalhais and Mick Grierson expand on their work on the film Irreplaceable Biography, which was created via language-guided generative models in audiovisual production. Thomas Moore focuses on the Belgium-based Nadar Ensemble and discusses the ways in which the performers of the ensemble understand the concept of the integrated concert and distinguish themselves from it, specifying the broadening of performers' competencies and responsibilities. In her paper, Dana Papachristou contributes to the discussion on the politics of connectivity based on an examination of three projects: the online project Xenakis Networked Performance Marathon 2022, 2023Eleusis Mystery 91_Magnetic Dance in Elefsina European Capital of Culture, and the Spaces of Reflection offline PirateBox network at the 10th Berlin Biennale. The penultimate article in the section is written by Kenrick Ho and presents the author's composition Flou for solo violin through the prism of the relationship between (historically present) algorithmic processes, the composer, and the performer. Finally, Rijad Kaniža adds to the critical discourse on the reshaping of the musical experience via technology and the understanding of said technology, using the example of musique concrète. In the final Review section, Bakir Memišević gives an overview of the 13th International Symposium "Music in Society", held in Sarajevo in December 2022.

    Looking into Actors, Objects and their Interactions for Video Understanding

    Automatic video understanding is critical for enabling new applications in video surveillance, augmented reality, and beyond. Powered by deep networks that learn holistic representations of video clips and by large-scale annotated datasets, modern systems are capable of accurately recognizing hundreds of human activity classes. However, their performance degrades significantly as the number of actors in the scene or the complexity of the activities increases. Therefore, most of the research thus far has focused on videos that are short and/or contain a few activities performed only by adults. Furthermore, most current systems require expensive spatio-temporal annotations for training. These limitations prevent the deployment of such systems in real-life applications, such as detecting the activities of people and vehicles in extended surveillance videos. To address these limitations, this thesis focuses on developing data-driven, compositional, region-based video understanding models, motivated by the observation that actors, objects and their spatio-temporal interactions are the building blocks of activities and the main content of video descriptions provided by humans. This thesis makes three main contributions. First, we propose a novel Graph Neural Network for representation learning on heterogeneous graphs that encode spatio-temporal interactions between actor and object regions in videos. This model can learn context-aware representations for detected actors and objects, which we leverage for detecting complex activities. Second, we propose an attention-based deep conditional generative model of sentences whose latent variables correspond to alignments between words in textual descriptions of videos and object regions. Building upon the framework of Conditional Variational Autoencoders, we train this model using only textual descriptions, without bounding box annotations, and leverage its latent variables for localizing the actors and objects that are mentioned in generated or ground-truth descriptions of videos. Finally, we propose an actor-centric framework for real-time activity detection in videos that are extended both in space and time. Our framework leverages object detection and tracking to generate actor-centric tubelets, capturing all relevant spatio-temporal context for a single actor, and detects activities per tubelet based on contextual region embeddings. The models described have demonstrably improved the ability to temporally detect activities, as well as to ground words in visual inputs.
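
A minimal sketch, in plain NumPy with made-up dimensions and weight names, of the kind of message-passing step such a graph model performs: each detected actor region aggregates features from the object regions it interacts with to form a context-aware representation.

```python
import numpy as np

def actor_object_message_pass(actor_feats, object_feats, adjacency, w_msg, w_self):
    """One message-passing step on an actor-object interaction graph.

    actor_feats  : (A, D) features of detected actor regions
    object_feats : (O, D) features of detected object regions
    adjacency    : (A, O) binary matrix, 1 where actor i interacts with object j
    w_msg, w_self: (D, D) projection matrices (learned in practice, random here)
    """
    # Normalise so each actor averages over the objects it is connected to.
    degree = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    messages = (adjacency / degree) @ object_feats          # (A, D) aggregated context

    # Context-aware actor representation: own features plus aggregated context.
    return np.tanh(actor_feats @ w_self + messages @ w_msg)

# Toy usage with random regions and interactions.
rng = np.random.default_rng(0)
A, O, D = 3, 5, 16
updated_actors = actor_object_message_pass(
    rng.normal(size=(A, D)),
    rng.normal(size=(O, D)),
    (rng.random((A, O)) > 0.5).astype(float),
    rng.normal(size=(D, D)) * 0.1,
    rng.normal(size=(D, D)) * 0.1,
)
```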

    Semantic Validation in Structure from Motion

    The Structure from Motion (SfM) problem in computer vision is the process of recovering the 3D structure of a scene from a series of projective measurements calculated from a collection of 2D images taken from different perspectives. SfM consists of three main steps: feature detection and matching, camera motion estimation, and recovery of the 3D structure from the estimated intrinsic and extrinsic parameters and features. A problem encountered in SfM is that scenes lacking texture, or with repetitive features, can cause erroneous feature matching between frames. Semantic segmentation offers a route to validate and correct SfM models by labelling pixels in the input images with the use of a deep convolutional neural network. The semantic and geometric properties associated with classes in the scene can be exploited to apply prior constraints to each class of object. The SfM pipeline COLMAP and the semantic segmentation pipeline DeepLab were used. These, along with planar reconstruction of the dense model, were used to determine erroneous points that may be occluded from the calculated camera position, given the semantic label, and thus the prior constraint, of the reconstructed plane. Herein, semantic segmentation is integrated into SfM to apply priors on the 3D point cloud, given the object detections in the 2D input images. Additionally, the semantic labels of matched keypoints are compared, and semantically inconsistent points are discarded. Furthermore, semantic labels on the input images are used to remove objects associated with motion from the output SfM models. The proposed approach is evaluated on a dataset of 1102 images of a repetitive architectural scene. This project offers a novel method for improved validation of 3D SfM models.
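
A minimal sketch (with illustrative names and class list; not COLMAP's or DeepLab's API) of the keypoint-level check described above: a match is kept only when the semantic labels under both keypoints agree, and matches on classes associated with motion are dropped outright.

```python
import numpy as np

# Classes associated with motion that should not contribute to a static model.
DYNAMIC_CLASSES = {"person", "car", "bus", "bicycle"}

def filter_matches(matches, kps_a, kps_b, seg_a, seg_b, id_to_name):
    """Discard feature matches whose endpoints carry inconsistent or dynamic
    semantic labels.

    matches      : iterable of (i, j) keypoint index pairs between images A and B
    kps_a, kps_b : (N, 2) arrays of keypoint (x, y) pixel coordinates
    seg_a, seg_b : 2D arrays of per-pixel class ids from semantic segmentation
    id_to_name   : dict mapping class id to class name
    """
    kept = []
    for i, j in matches:
        xa, ya = kps_a[i].astype(int)
        xb, yb = kps_b[j].astype(int)
        label_a = id_to_name[int(seg_a[ya, xa])]
        label_b = id_to_name[int(seg_b[yb, xb])]
        if label_a != label_b:              # semantically inconsistent match
            continue
        if label_a in DYNAMIC_CLASSES:      # moving object, unreliable for SfM
            continue
        kept.append((i, j))
    return kept
```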

    Gurus and Media: Sound, image, machine, text and the digital

    Gurus and Media is the first book dedicated to media and mediation in domains of public guruship and devotion. Illuminating the mediatisation of guruship and the guru-isation of media, it bridges the gap between scholarship on gurus and the disciplines of media and visual culture studies. It investigates guru iconographies in and across various time periods and also the distinctive ways in which diverse gurus engage with and inhabit different forms of media: statuary, games, print publications, photographs, portraiture, films, machines, social media, bodies, words, graffiti, dolls, sound, verse, tombs and more. The book's interdisciplinary chapters advance, both conceptually and ethnographically, our understanding of the function of media in the dramatic production of guruship, and reflect on the corporate branding of gurus and on mediated guruship as a series of aesthetic traps for the captivation of devotees and others. They show how different media can further enliven the complex plurality of guruship, for instance in instantiating notions of 'absent-present' guruship and demonstrating the mutual mediation of gurus, caste and Hindutva. Throughout, the book foregrounds contested visions of the guru in the development of devotional publics and pluriform guruship across time and space. Thinking through the guru's many media entanglements in a single place, the book contributes new insights to the study of South Asian religions and to the study of mediation more broadly.