Flood dynamics derived from video remote sensing
Flooding is by far the most pervasive natural hazard, with the human impacts of floods expected to worsen in the coming decades due to climate change. Hydraulic models are a key tool for understanding flood dynamics and play a pivotal role in unravelling the processes that occur during a flood event, including inundation flow patterns and velocities. In the realm of river basin dynamics, video remote sensing is emerging as a transformative tool that can offer insights into flow dynamics and thus, together with other remotely sensed data, has the potential to be deployed to estimate discharge. Moreover, the integration of video remote sensing data with hydraulic models offers a pivotal opportunity to enhance the predictive capacity of these models.
Hydraulic models are traditionally built with accurate terrain, flow and bathymetric data and are often calibrated and validated using observed data to obtain meaningful and actionable model predictions. Data for accurately calibrating and validating hydraulic models are not always available, leaving the assessment of the predictive capabilities of some models deployed in flood risk management in question. Recent advances in remote sensing have heralded the availability of vast video datasets of high resolution. The parallel evolution of computing capabilities, coupled with advancements in artificial intelligence, is enabling the processing of data at unprecedented scales and complexities, allowing us to glean meaningful insights from datasets that can be integrated with hydraulic models. The aims of the research presented in this thesis were twofold. The first aim was to evaluate and explore the potential applications of video from air- and space-borne platforms to comprehensively calibrate and validate two-dimensional hydraulic models. The second aim was to estimate river discharge using satellite video combined with high resolution topographic data. In the first of three empirical chapters, non-intrusive image velocimetry techniques were employed to estimate river surface velocities in a rural catchment. For the first time, a 2D hydraulic model was fully calibrated and validated using velocities derived from Unpiloted Aerial Vehicle (UAV) image velocimetry approaches. This highlighted the value of these data in mitigating the limitations associated with traditional data sources used in parameterizing two-dimensional hydraulic models. This finding inspired the subsequent chapter where river surface velocities, derived using Large Scale Particle Image Velocimetry (LSPIV), and flood extents, derived using deep neural network-based segmentation, were extracted from satellite video and used to rigorously assess the skill of a two-dimensional hydraulic model.
Harnessing the ability of deep neural networks to learn complex features and deliver accurate and contextually informed flood segmentation, the potential value of satellite video for validating two-dimensional hydraulic model simulations is exhibited. In the final empirical chapter, the convergence of satellite video imagery and high-resolution topographical data bridges the gap between visual observations and quantitative measurements by enabling the direct extraction of velocities from video imagery, which is used to estimate river discharge. Overall, this thesis demonstrates the significant potential of emerging video-based remote sensing datasets and offers approaches for integrating these data into hydraulic modelling and discharge estimation practice. The incorporation of LSPIV techniques into flood modelling workflows signifies a methodological progression, especially in areas lacking robust data collection infrastructure. Satellite video remote sensing heralds a major step forward in our ability to observe river dynamics in real time, with potentially significant implications in the domain of flood modelling science.
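The abstract does not state how surface velocities and topography are combined into a discharge figure. As an illustration only, the sketch below uses the generic mid-section velocity-area method, which is one common way to do this and is not claimed to be the thesis's procedure. The function name, the example cross-section, and the coefficient alpha = 0.85 (an assumed ratio of depth-averaged to surface velocity) are all hypothetical.

```python
def discharge_velocity_area(stations, depths, surface_velocities, alpha=0.85):
    """Estimate discharge (m^3/s) with the mid-section velocity-area method.

    stations: cross-stream positions in metres, in increasing order
    depths: water depth at each station in metres
    surface_velocities: surface velocity at each station in m/s
    alpha: ASSUMED ratio of depth-averaged to surface velocity
    """
    if not (len(stations) == len(depths) == len(surface_velocities)):
        raise ValueError("inputs must have equal length")
    n = len(stations)
    total = 0.0
    for i in range(n):
        # Each station represents half the distance to each neighbour.
        left = stations[i - 1] if i > 0 else stations[i]
        right = stations[i + 1] if i < n - 1 else stations[i]
        width = (right - left) / 2.0
        total += alpha * surface_velocities[i] * depths[i] * width
    return total


# Hypothetical triangular channel, 4 m wide, sampled every metre.
q = discharge_velocity_area(
    stations=[0.0, 1.0, 2.0, 3.0, 4.0],
    depths=[0.0, 1.0, 2.0, 1.0, 0.0],
    surface_velocities=[0.0, 1.0, 2.0, 1.0, 0.0],
)
```

In practice the depths would come from the topographic survey and the velocities from the image velocimetry output, and the choice of alpha is itself a source of uncertainty.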
Self-supervised learning for transferable representations
Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process. It is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable shift in recent times toward approaches that solely leverage raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and showcasing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. Our focus thenceforth is on the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self-supervised models against many diverse downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks transition beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalise to real-world transformations. This begins to explain the differing empirical performances achieved by self-supervised learners on different downstream tasks, and it showcases the advantages of specialised representations produced with tailored augmentation.
Finally, we introduce a novel self-supervised pre-training algorithm for object detection, aligning pre-training with downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks.
Multidisciplinary perspectives on Artificial Intelligence and the law
This open access book presents an interdisciplinary, multi-authored, edited collection of chapters on Artificial Intelligence (‘AI’) and the Law. AI technology has come to play a central role in the modern data economy. Through a combination of increased computing power, the growing availability of data and the advancement of algorithms, AI has now become an umbrella term for some of the most transformational technological breakthroughs of this age. The importance of AI stems from both the opportunities that it offers and the challenges that it entails. While AI applications hold the promise of economic growth and efficiency gains, they also create significant risks and uncertainty. The potential and perils of AI have thus come to dominate modern discussions of technology and ethics – and although AI was initially allowed to largely develop without guidelines or rules, few would deny that the law is set to play a fundamental role in shaping the future of AI. As the debate over AI is far from over, the need for rigorous analysis has never been greater. This book thus brings together contributors from different fields and backgrounds to explore how the law might provide answers to some of the most pressing questions raised by AI. An outcome of the Católica Research Centre for the Future of Law and its interdisciplinary working group on Law and Artificial Intelligence, it includes contributions by leading scholars in the fields of technology, ethics and the law.
Cross-frame feature-saliency mutual reinforcing for weakly supervised video salient object detection
Neuroimaging investigations of cortical specialisation for different types of semantic knowledge
Embodied theories propose that semantic knowledge is grounded in motor and perceptual experiences. This leads to two questions: (1) whether the neural underpinnings of perception are also necessary for semantic cognition; (2) how biases towards different sensorimotor experiences cause brain regions to specialise for particular types of semantic information. This thesis tackles these questions in a series of neuroimaging and behavioural investigations.
Regarding question 1, strong embodiment theory holds that semantic representation is reenactment of corresponding experiences, and brain regions for perception are necessary for comprehending modality-specific concepts. However, the weak embodiment view argues that reenactment may not be necessary, and areas near to perceiving regions may be sufficient to support semantic representation.
In the particular case of motion concepts, lateral occipital temporal cortex (LOTC) has long been identified as an important area, but the roles of its different subregions are still uncertain. Chapter 3 examined how different parts of LOTC reacted to written descriptions of motion and static events, using multiple analysis methods. A series of anterior to posterior sub-regions were analyzed through univariate, multivariate pattern analysis (MVPA), and psychophysiological interaction (PPI) analyses. MVPA revealed strongest decoding effects for motion vs. static events in the posterior parts of LOTC, including both visual motion area (V5) and posterior middle temporal gyrus (pMTG). In contrast, only the middle portion of LOTC showed increased activation for motion sentences in univariate analyses. PPI analyses showed increased functional connectivity between posterior LOTC and the multiple demand network for motion events. These findings suggest that posterior LOTC, which overlapped with the motion perception V5 region, is selectively involved in comprehending motion events, while the anterior part of LOTC contributes to general semantic processing.
Regarding question 2, the hub-and-spoke theory suggests that anterior temporal lobe (ATL) acts as a hub, using inputs from modality-specific regions to construct multimodal concepts. However, some researchers propose temporal parietal cortex (TPC) as an additional hub, specialised in processing and integrating interaction and contextual information (e.g., for actions and locations). These hypotheses are summarized as the "dual-hub theory" and different aspects of this theory were investigated in Chapters 4 and 5.
Chapter 4 focuses on taxonomic and thematic relations. Taxonomic relations (or categorical relations) occur when two concepts belong to the same category (e.g., ‘dog’ and ‘wolf’ are both canines). In contrast, thematic relations (or associative relations) refer to situations in which two concepts co-occur in events or scenes (e.g., ‘dog’ and ‘bone’), focusing on the interaction or association between concepts. Some studies have indicated ATL specialization for taxonomic relations and TPC specialization for thematic relations, but others have reported inconsistent or even converse results. Thus Chapter 4 first conducted an activation likelihood estimation (ALE) meta-analysis of neuroimaging studies contrasting taxonomic and thematic relations. This found that thematic relations reliably engage action and location processing regions (left pMTG and SMG), while taxonomic relations only showed consistent effects in the right occipital lobe. A primed semantic judgement task was then used to test the dual-hub theory’s prediction that taxonomic relations are heavily reliant on colour and shape knowledge, while thematic relations rely on action and location knowledge. This behavioural experiment revealed that action or location priming facilitated thematic relation processing, but colour and shape did not lead to priming effects for taxonomic relations. This indicates that thematic relations rely more on action and location knowledge, which may explain why they preferentially engage TPC, whereas taxonomic relations are not specifically linked to shape and colour features. This may explain why they did not preferentially engage left ATL.
Chapter 5 concentrates on event and object concepts. Previous studies suggest ATL specialization for coding similarity of objects’ semantics, and angular gyrus (AG) specialization for sentence and event structure representation. In addition, in neuroimaging studies, event semantics are usually investigated using complex temporally extended stimuli, unlike the single-concept stimuli used to investigate object semantics. Thus chapter 5 used representational similarity analysis (RSA), univariate analysis, and PPI analysis to explore neural activation patterns for event and object concepts presented as static images. Bilateral AGs encoded semantic similarity for event concepts, with the left AG also coding object similarity. Bilateral ATLs encoded semantic similarity for object concepts but also for events. Left ATL exhibited stronger coding for events than objects. PPI analysis revealed stronger connections between left ATL and right pMTG, and between right AG and bilateral inferior temporal gyrus (ITG) and middle occipital gyrus, for event concepts compared to object concepts. Consistent with the meta-analysis in chapter 4, the results in chapter 5 support the idea of partial specialization in AG for event semantics but do not support ATL specialization for object semantics. In fact, both the meta-analysis and chapter 5 findings suggest greater ATL involvement in coding objects' associations compared to their similarity.
To conclude, the thesis provides support for the idea that perceptual brain regions are engaged in conceptual processing, in the case of motion concepts. It also provides evidence for a specialised role for TPC regions in processing thematic relations (pMTG) and event concepts (AG). There was mixed evidence for specialisation within the ATLs and this remains an important target for future research.
Reliable Sensor Intelligence in Resource Constrained and Unreliable Environment
The objective of this research is to design sensor intelligence that is reliable in a resource-constrained, unreliable environment. Various sources of variation and uncertainty are involved in an intelligent sensor system, so it is critical to build reliable sensor intelligence. Many prior works seek to design reliable sensor intelligence by developing robust and reliable tasks. This thesis suggests that, along with improving the task itself, early warning based on task reliability quantification can further improve sensor intelligence. A DNN-based early warning generator quantifies task reliability based on spatiotemporal characteristics of the input, and the early warning controls sensor parameters and avoids system failure. This thesis presents an early warning generator that predicts task failure due to sensor hardware induced input corruption and controls the sensor operation. Moreover, a lightweight uncertainty estimator is presented to take account of DNN model uncertainty in task reliability quantification without the prohibitive computation of a stochastic DNN. Cross-layer uncertainty estimation is also discussed to consider the effect of PIM variations.
The Application of Data Analytics Technologies for the Predictive Maintenance of Industrial Facilities in Internet of Things (IoT) Environments
In industrial production environments, the maintenance of equipment has a decisive influence on costs and on the plannability of production capacities. In particular, unplanned failures during production times cause high costs, unplanned downtimes and possibly additional collateral damage. Predictive Maintenance starts here and tries to predict a possible failure and its cause so early that its prevention can be prepared and carried out in time. In order to be able to predict malfunctions and failures, the industrial plant with its characteristics, as well as wear and ageing processes, must be modelled. Such modelling can be done by replicating its physical properties. However, this is very complex and requires enormous expert knowledge about the plant and about wear and ageing processes of each individual component. Neural networks and machine learning make it possible to train such models using data and offer an alternative, especially when very complex and non-linear behaviour is evident.
In order for models to make predictions, as much data as possible about the condition of a plant and its environment and production planning data is needed. In Industrial Internet of Things (IIoT) environments, the amount of available data is constantly increasing. Intelligent sensors and highly interconnected production facilities produce a steady stream of data. The sheer volume of data, but also the steady stream in which data is transmitted, place high demands on the data processing systems. If a participating system wants to perform live analyses on the incoming data streams, it must be able to process the incoming data at least as fast as the continuous data stream delivers it. If this is not the case, the system falls further and further behind in processing and thus in its analyses. This also applies to Predictive Maintenance systems, especially if they use complex and computationally intensive machine learning models. If sufficiently scalable hardware resources are available, this may not be a problem at first. However, if this is not the case or if the processing takes place on decentralised units with limited hardware resources (e.g. edge devices), the runtime behaviour and resource requirements of the type of neural network used can become an important criterion.
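The throughput condition described above can be made concrete with a small simulation. This is a minimal sketch, assuming constant arrival and processing rates; the function name and the rates are hypothetical and do not come from the thesis.

```python
def backlog_over_time(arrival_rate, service_rate, seconds):
    """Return the number of unprocessed samples at the end of each second.

    arrival_rate: samples delivered by the stream per second (assumed constant)
    service_rate: samples the analysis can process per second (assumed constant)
    """
    backlog = 0.0
    history = []
    for _ in range(seconds):
        # The backlog grows by the rate difference and can never be negative.
        backlog = max(0.0, backlog + arrival_rate - service_rate)
        history.append(backlog)
    return history


# A model that processes 80 samples/s against a 100 samples/s stream
# falls 20 samples further behind every second.
too_slow = backlog_over_time(arrival_rate=100, service_rate=80, seconds=3)
# A model that processes 120 samples/s keeps up and holds no backlog.
fast_enough = backlog_over_time(arrival_rate=100, service_rate=120, seconds=3)
```

The linear growth in the first case is why the runtime behaviour of the model matters on hardware that cannot simply be scaled up.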
This thesis addresses Predictive Maintenance systems in IIoT environments using neural networks and Deep Learning, where the runtime behaviour and the resource requirements are relevant. The question is whether it is possible to achieve better runtimes with similar result quality using a new type of neural network. The focus is on reducing the complexity of the network and improving its parallelisability. Inspired by projects in which complexity was distributed to less complex neural subnetworks by upstream measures, two hypotheses presented in this thesis emerged: a) the distribution of complexity into simpler subnetworks leads to faster processing overall, despite the overhead this creates, and b) if a neural cell has a deeper internal structure, this leads to a less complex network. Within the framework of a qualitative study, an overall impression of Predictive Maintenance applications in IIoT environments using neural networks was developed. Based on the findings, a novel model layout was developed named Sliced Long Short-Term Memory Neural Network (SlicedLSTM). The SlicedLSTM implements the assumptions made in the aforementioned hypotheses in its inner model architecture.
Within the framework of a quantitative study, the runtime behaviour of the SlicedLSTM was compared with that of a reference model in the form of laboratory tests. The study uses synthetically generated data from a NASA project to predict failures of modules of aircraft gas turbines. The dataset contains 1,414 multivariate time series with 104,897 samples of test data and 160,360 samples of training data.
As a result, it could be shown for the specific application and the data used that the SlicedLSTM delivers faster processing times with similar result accuracy and thus clearly outperforms the reference model in this respect. The hypotheses about the influence of complexity in the internal structure of the neuronal cells were confirmed by the study carried out in the context of this thesis.
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
This paper aims to efficiently enable Large Language Models (LLMs) to use
multimodal tools. Advanced proprietary LLMs, such as ChatGPT and GPT-4, have
shown great potential for tool usage through sophisticated prompt engineering.
Nevertheless, these models typically rely on prohibitive computational costs
and publicly inaccessible data. To address these challenges, we propose the
GPT4Tools based on self-instruct to enable open-source LLMs, such as LLaMA and
OPT, to use tools. It generates an instruction-following dataset by prompting
an advanced teacher with various multi-modal contexts. By using the Low-Rank
Adaptation (LoRA) optimization, our approach facilitates the open-source LLMs
to solve a range of visual problems, including visual comprehension and image
generation. Moreover, we provide a benchmark to evaluate the ability of LLMs to
use tools, which is performed in both zero-shot and fine-tuning ways. Extensive
experiments demonstrate the effectiveness of our method on various language
models, which not only significantly improves the accuracy of invoking seen
tools, but also enables the zero-shot capacity for unseen tools. The code and
demo are available at https://github.com/StevenGrove/GPT4Tools
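For readers unfamiliar with Low-Rank Adaptation, the following is a minimal sketch of the general idea: a frozen weight matrix W is adjusted by a trainable low-rank product, W_eff = W + (alpha / r) * B A. It is a generic illustration in plain Python, not the GPT4Tools training code; the helper names and the toy matrices are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * B @ A.

    w: frozen weight, shape d_out x d_in
    b: trainable factor, shape d_out x rank
    a: trainable factor, shape rank x d_in
    """
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update, shape d_out x d_in
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example with rank 1: only 4 numbers in A and B are trained,
# while the 2 x 2 base weight stays frozen.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]
a = [[3.0, 4.0]]
w_eff = lora_effective_weight(w, a, b, alpha=2.0, rank=1)
```

With B initialised to zeros, as is usual for LoRA, the effective weight starts out equal to W, so fine-tuning begins from the unmodified base model.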
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Transformer is a deep neural network that employs a self-attention mechanism
to comprehend the contextual relationships within sequential data. Unlike
conventional neural networks or updated versions of Recurrent Neural Networks
(RNNs) such as Long Short-Term Memory (LSTM), transformer models excel in
handling long dependencies between input sequence elements and enable parallel
processing. As a result, transformer-based models have attracted substantial
interest among researchers in the field of artificial intelligence. This can be
attributed to their immense potential and remarkable achievements, not only in
Natural Language Processing (NLP) tasks but also in a wide range of domains,
including computer vision, audio and speech processing, healthcare, and the
Internet of Things (IoT). Although several survey papers have been published
highlighting the transformer's contributions in specific fields, architectural
differences, or performance evaluations, there is still a significant absence
of a comprehensive survey paper encompassing its major applications across
various domains. Therefore, we undertook the task of filling this gap by
conducting an extensive survey of proposed transformer models from 2017 to
2022. Our survey encompasses the identification of the top five application
domains for transformer-based models, namely: NLP, Computer Vision,
Multi-Modality, Audio and Speech Processing, and Signal Processing. We analyze
the impact of highly influential transformer-based models in these domains and
subsequently classify them based on their respective tasks using a proposed
taxonomy. Our aim is to shed light on the existing potential and future
possibilities of transformers for enthusiastic researchers, thus contributing
to the broader understanding of this groundbreaking technology.
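The self-attention mechanism mentioned at the start of this abstract can be sketched in a few lines. The example below implements scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, in plain Python. It is a simplified illustration: the learned query, key and value projections and the multi-head structure of a real transformer are omitted, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(q, k, v):
    """Scaled dot-product attention over lists of row vectors.

    q, k: one row per sequence position, each of length d_k
    v: one row per sequence position
    """
    d_k = len(k[0])
    out = []
    for q_row in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
            for k_row in k
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value rows.
        out.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return out


# A query that matches both keys equally attends to both values equally.
result = self_attention(
    q=[[0.0, 0.0]],
    k=[[1.0, 0.0], [0.0, 1.0]],
    v=[[2.0, 0.0], [0.0, 2.0]],
)
```

Because each query row is handled independently of the others, all positions can be computed in parallel, which is the property the abstract contrasts with recurrent networks.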
Scalable Exploration of Complex Objects and Environments Beyond Plain Visual Replication
Digital multimedia content and presentation means are rapidly increasing their sophistication and are now capable of describing detailed representations of the physical world. 3D exploration experiences allow people to appreciate, understand and interact with intrinsically virtual objects.
Communicating information on objects requires the ability to explore them from different angles, as well as to mix highly photorealistic or illustrative presentations of the objects themselves with supplementary data that provides further insights into these objects, typically represented in the form of annotations. Effectively providing these capabilities requires the solution of important problems in visualization and user interaction.
In this thesis, I studied these problems in the cultural heritage computing domain, focusing on the very common and important special case of mostly planar, but visually, geometrically, and semantically rich objects. These could be generally roughly flat objects with a standard frontal viewing direction (e.g., paintings, inscriptions, bas-reliefs), as well as visualizations of fully 3D objects from particular points of view (e.g., canonical views of buildings or statues). Selecting a precise application domain and a specific presentation mode allowed me to concentrate on the well-defined use case of the exploration of annotated relightable stratigraphic models (in particular, for local and remote museum presentation).
My main results and contributions to the state of the art have been a novel technique for interactively controlling visualization lenses while automatically maintaining good focus-and-context parameters, a novel approach for avoiding clutter in an annotated model and for guiding users towards interesting areas, and a method for structuring audio-visual object annotations into a graph and for using that graph to improve guidance and support storytelling and automated tours.
We demonstrated the effectiveness and potential of our techniques by performing interactive exploration sessions on various screen sizes and types ranging from desktop devices to large-screen displays for a walk-up-and-use museum installation.
KEYWORDS - Computer Graphics, Human-Computer Interaction, Interactive Lenses, Focus-and-Context, Annotated Models, Cultural Heritage Computing