17 research outputs found

    A semantic communication and VVC based hybrid video coding system

    Requirements of next-generation video applications are becoming a challenge for conventional video coding systems, although they have evolved over decades to accommodate the most demanding of current video applications. Semantic communications, built on the concept of transmitting just the semantics of a message and allowing the receiver to reconstruct the message based on a shared context, is a non-conventional approach being considered to overcome these challenges and improve the performance of video coding systems. In this paper, a first such semantic communication-based video coding system in hybrid mode is proposed, which uses an autoencoder-based semantic encoder for inter coding, augmented by the intra coding capabilities of Versatile Video Coding (VVC) to encode key frames that form the context for the semantic communication and the residuals for improving the fidelity of the output frames. For a range of videos with differing levels of complexity, the proposed system consistently outperforms High Efficiency Video Coding (HEVC) and Advanced Video Coding (AVC) in terms of rate-distortion metrics quantified by Bjontegaard Delta Rates. It also outperforms Versatile Video Coding for videos of low or high complexity, but falls slightly behind for videos of medium complexity, which can be improved by addressing the open research areas that stem from this work. The proposed system demonstrates the potential of semantic communication-based video coding systems to consistently outperform state-of-the-art conventional video coding systems over a wide range of video applications.

    Tailoring Interaction. Sensing Social Signals with Textiles.

    Nonverbal behaviour is an important part of conversation and can reveal much about the nature of an interaction. It includes phenomena ranging from large-scale posture shifts to small-scale nods. Capturing these often spontaneous phenomena requires unobtrusive sensing techniques that do not interfere with the interaction. We propose an underexploited sensing modality for sensing nonverbal behaviours: textiles. As a material in close contact with the body, they provide ubiquitous, large surfaces that make them a suitable soft interface. Although the literature on nonverbal communication focuses on upper body movements such as gestures, observations of multi-party, seated conversations suggest that sitting postures, leg and foot movements are also systematically related to patterns of social interaction. This thesis addresses the following questions: Can the textiles surrounding us measure social engagement? Can they tell who is speaking, and who, if anyone, is listening? Furthermore, how should wearable textile sensing systems be designed and what behavioural signals could textiles reveal? To address these questions, we have designed and manufactured bespoke chairs and trousers with integrated textile pressure sensors, which are introduced here. The designs are evaluated in three user studies that produce multi-modal datasets for the exploration of fine-grained interactional signals. Two approaches to using these bespoke textile sensors are explored. First, hand-crafted sensor patches in chair covers serve to distinguish speakers and listeners. Second, a pressure-sensitive matrix in custom-made smart trousers is developed to detect static sitting postures, dynamic bodily movement, as well as basic conversational states.
Statistical analyses, machine learning approaches, and ethnographic methods show that by monitoring patterns of pressure change alone it is possible to not only classify postures with high accuracy, but also to identify a wide range of behaviours reliably in individuals and groups. These findings establish textiles as a novel, wearable sensing system for applications in social sciences, and contribute towards a better understanding of nonverbal communication, especially the significance of posture shifts when seated. If chairs know who is speaking, if our trousers can capture our social engagement, what role can smart textiles have in the future of human interaction? How can we build new ways to map social ecologies and tailor interactions?

    Investigations of collaborative design environments: A framework for real-time collaborative 3D CAD

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. This research investigates computer-based collaborative design environments, in particular issues of real-time collaborative 3D CAD. The thesis first presents a broad perspective of collaborative design environments with a preliminary case study of team design activities in a conventional and a computer-mediated setting. This study identifies the impact and the feasibility of computer support for collaborative design and suggests four kinds of essential technologies for a successful collaborative design environment: information-sharing systems, synchronous and asynchronous co-working tools, project management systems, and communication systems. A new conceptual framework for a real-time collaborative 3D design tool, Shared Stage, is proposed based upon the preliminary study. The Shared Stage is defined as a shared 3D design workspace aiming to smoothly incorporate shared 3D workspaces into existing individual 3D workspaces. The addition of a Shared Stage allows collaborating designers to interact in real-time and to have a dynamic and interactive exchange of intermediate 3D design data. The acceptability of collaborative features is maximised by maintaining consistency of the user interface between 3D CAD systems. The framework is subsequently implemented as a software prototype using a new software development environment, customised by integrating related real-time and 3D graphic software development tools. Two main components of the Shared Stage module in the prototype, the Synchronised Stage View (SSV) and the Data Structure Diagram (DSD), provide essential collaborative features for real-time collaborative 3D CAD. These features include synchronised shared 3D representation, dynamic data exchange and awareness support in 3D workspaces. The software prototype is subsequently evaluated to examine its usefulness and usability.
A range of quantitative and qualitative methods is used to evaluate the impact of the Shared Stage. The results, including the analysis of collaborative interactions and user perception, illustrate that the Shared Stage is a feasible and valuable addition for real-time collaborative 3D CAD. This research identifies the issues to be addressed for collaborative design environments and also provides a new framework and development strategy of a novel real-time collaborative 3D CAD system. The framework is successfully demonstrated through prototype implementation and an analytical usability evaluation. Financial support from the Department and from the UK government through the Overseas Research Studentship Awards

    Using contour information and segmentation for object registration, modeling and retrieval

    This thesis considers different aspects of the utilization of contour information and syntactic and semantic image segmentation for object registration, modeling and retrieval in the context of content-based indexing and retrieval in large collections of images. Target applications include retrieval in collections of closed silhouettes, holistic word recognition in handwritten historical manuscripts and shape registration. Also, the thesis explores the feasibility of contour-based syntactic features for improving the correspondence of the output of bottom-up segmentation to semantic objects present in the scene and discusses the feasibility of different strategies for image analysis utilizing contour information, e.g. segmentation driven by visual features versus segmentation driven by shape models or semi-automatic in selected application scenarios. There are three contributions in this thesis. The first contribution considers structure analysis based on the shape and spatial configuration of image regions (so-called syntactic visual features) and their utilization for automatic image segmentation. The second contribution is the study of novel shape features, matching algorithms and similarity measures. Various applications of the proposed solutions are presented throughout the thesis, providing the basis for the third contribution, which is a discussion of the feasibility of different recognition strategies utilizing contour information. In each case, the performance and generality of the proposed approach have been analyzed based on extensive rigorous experimentation using test collections that are as large as possible.

    Geographically distributed requirements elicitation

    The technology revolution has transformed the way in which many organisations do their business. The resultant information systems have increased the decision making powers of executives, leading to increased effectiveness and ultimately to improved product delivery. The process of information systems development is, however, complex. Furthermore, it has a poor track record in terms of on-time and within-budget delivery, but more significantly in terms of low user acceptance frequently attributable to poor user requirements specification. Consequently, much attention has been given to the process of requirements elicitation, with both researchers and businessmen seeking new, innovative and effective methods. These methods usually involve large numbers of participants who are drawn from within the client and developer organisations. This is a financially costly characteristic of the requirements elicitation process. Besides information systems, the technology revolution has also brought sophisticated communication technologies into the marketplace. These communication technologies allow people to communicate with one another in a variety of different time and space scenarios. An important spin-off of this is the ability for people located in significantly different geographical locations to work collaboratively on a project. It is claimed that this approach to work has significant cost and productivity advantages. This study draws the requirements elicitation process into the realm of collaborative work. Important project management, communication, and collaborative working principles are examined in detail, and a model is developed which represents these issues as they pertain to the requirements elicitation process. An empirical study (conducted in South Africa) is performed in order to examine the principles of the model and the relationships between its constituent elements. 
A model of geographically distributed requirements elicitation (GDRE) is developed on the basis of the findings of this investigation. The model of GDRE is presented as a three-phase approach to requirements elicitation, namely planning, implementation, and termination. Significantly, the model suggests the use of interviews, structured workshops, and prototyping as the chief requirements elicitation methods to be adopted in appropriate conditions. Although a detailed study of communications technology was not performed, this thesis suggests that each individual GDRE implementation requires a different mix of communication technologies to support it.

    The application of range imaging for improved local feature representations

    This thesis presents an investigation into the integration of information extracted from co-aligned range and intensity images to achieve pose invariant object recognition. Local feature matching is a fundamental technique in image analysis that underpins many computer vision-based applications; the approach comprises identifying a collection of interest points in an image, characterising the local image region surrounding the interest point by means of a descriptor, and matching these descriptors between example images. Such local feature descriptors are formed from a measure of the local image statistics in the region surrounding the interest point. The interest point locations and the means of measuring local image statistics should be chosen such that the resultant descriptor remains stable across a range of common image transformations. Recently the availability of low-cost, high-quality range imaging devices has motivated an interest in local feature extraction from range images. It has been widely assumed in the vision community that the range imaging domain has properties which remain quasi-invariant through a wide range of changes in illumination and pose. Accordingly, it has been suggested that local feature extraction in the range domain should allow the calculation of local feature descriptors that are potentially more robust than those calculated from the intensity imaging domain alone. However, range images represent differing characteristics from those represented within intensity images, which are frequently used, independently from range images, to create robust local features. Therefore, this work attempts to establish the best means of combining information from these two imaging modalities to further increase the reliability of matching local features. Local feature extraction comprises a series of processes applied to an image location such that a collection of repeatable descriptors can be established.
By using co-aligned range and intensity images this work investigates the choice of modality and method for each step in the extraction process as an approach to optimising the resulting descriptor. Additionally, multimodal features are formed by combining information from both domains in a single stage in the extraction process. To further improve the quality of feature descriptors, a calculation of the surface normals and a use of the 3D structure from the range image are applied to correct the 3D appearance of a local sample patch, thereby increasing the similarity between observations. The matching performance of local features is evaluated using an experimental setup comprising a turntable and stereo pair of cameras. This experimental setup is used to create a database of intensity and range images for 5 objects imaged at 72 calibrated viewpoints, creating a database of 360 object observations. The use of a calibrated turntable in combination with the 3D object surface coordinates supplied by the range image allows location correspondences between object observations to be established, and therefore descriptor matches to be labelled as either true positive or false positive. Applying this methodology to the formulated local features shows that two approaches demonstrate state-of-the-art performance, with a ~40% increase in area under ROC curve at a False Positive Rate of 10% when compared with standard SIFT. These approaches are range affine corrected intensity SIFT and element corrected surface gradients SIFT. Furthermore, this work uses the 3D structure encoded in the range image to organise collections of interest points from a series of observations into a collection of canonical views in a new model local feature. The canonical views for an interest point are stored in a view compartmentalised structure which allows the appearance of a local interest point to be characterised across the view sphere.
Each canonical view is assigned a confidence measure based on the 3D pose of the interest point at observation; this confidence measure is then used to match similar canonical views of model and query interest points, thereby achieving a pose invariant interest point description. This approach does not produce a statistically significant performance increase. However, it does contribute a validated methodology for combining multiple descriptors with differing confidence weightings into a single keypoint.

    The blessings of explainable AI in operations & maintenance of wind turbines

    Wind turbines play an integral role in generating clean energy, but regularly suffer from operational inconsistencies and failures leading to unexpected downtimes and significant Operations & Maintenance (O&M) costs. Condition-Based Monitoring (CBM) has been utilised in the past to monitor operational inconsistencies in turbines by applying signal processing techniques to vibration data. The last decade has witnessed growing interest in leveraging Supervisory Control and Data Acquisition (SCADA) data from turbine sensors towards CBM. Machine Learning (ML) techniques have been utilised to predict incipient faults in turbines and forecast vital operational parameters with high accuracy by leveraging SCADA data and alarm logs. More recently, Deep Learning (DL) methods have outperformed conventional ML techniques, particularly for anomaly prediction. Despite demonstrating immense promise in transitioning to Artificial Intelligence (AI), such models are generally black boxes that cannot provide rationales behind their predictions, hampering the ability of turbine operators to rely on automated decision making. We aim to help combat this challenge by providing a novel perspective on Explainable AI (XAI) for trustworthy decision support. This thesis revolves around three key strands of XAI: DL, Natural Language Generation (NLG) and Knowledge Graphs (KGs), which are investigated by utilising data from an operational turbine. We leverage DL and NLG to predict incipient faults and alarm events in the turbine in natural language as well as generate human-intelligible O&M strategies to assist engineers in fixing or averting the faults. We also propose specialised DL models which can predict causal relationships in SCADA features as well as quantify the importance of vital parameters leading to failures.
The thesis culminates with an interactive Question-Answering (QA) system for automated reasoning that leverages multimodal domain-specific information from a KG, enabling engineers to retrieve O&M strategies with natural language questions. By helping make turbines more reliable, we envisage wider adoption of wind energy sources towards tackling climate change.

    Language variation, automatic speech recognition and algorithmic bias

    In this thesis, I situate the impacts of automatic speech recognition systems in relation to sociolinguistic theory (in particular drawing on concepts of language variation, language ideology and language policy) and contemporary debates in AI ethics (especially regarding algorithmic bias and fairness). In recent years, automatic speech recognition systems, alongside other language technologies, have been adopted by a growing number of users and have been embedded in an increasing number of algorithmic systems. This expansion into new application domains and language varieties can be understood as an expansion into new sociolinguistic contexts. In this thesis, I am interested in how automatic speech recognition tools interact with this sociolinguistic context, and how they affect speakers, speech communities and their language varieties. Focussing on commercial automatic speech recognition systems for British Englishes, I first explore the extent and consequences of performance differences of these systems for different user groups depending on their linguistic background. When situating this predictive bias within the wider sociolinguistic context, it becomes apparent that these systems reproduce and potentially entrench existing linguistic discrimination and could therefore cause direct and indirect harms to already marginalised speaker groups. To understand the benefits and potentials of automatic transcription tools, I highlight two case studies: transcribing sociolinguistic data in English and transcribing personal voice messages in isiXhosa. The central role of the sociolinguistic context in developing these tools is emphasised in this comparison. Design choices, such as the choice of training data, are particularly consequential because they interact with existing processes of language standardisation. 
To better understand the impacts of these choices, and the role of the developers making them, I draw on theory from language policy research and critical data studies. These conceptual frameworks are intended to help practitioners and researchers in anticipating and mitigating predictive bias and other potential harms of speech technologies. Beyond looking at individual choices, I also investigate the discourses about language variation and linguistic diversity deployed in the context of language technologies. These discourses put forward by researchers, developers and commercial providers not only have a direct effect on the wider sociolinguistic context, but they also highlight how this context (e.g., existing beliefs about language(s)) affects technology development. Finally, I explore ways of building better automatic speech recognition tools, focussing in particular on well-documented, naturalistic and diverse benchmark datasets. However, inclusive datasets are not necessarily a panacea, as they still raise important questions about the nature of linguistic data and language variation (especially in relation to identity), and may not mitigate or prevent all potential harms of automatic speech recognition systems as embedded in larger algorithmic systems and sociolinguistic contexts.