1,810 research outputs found

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    Get PDF
    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures adopted in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them.Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 tabl

    On automatic emotion classification using acoustic features

    No full text
    In this thesis, we describe extensive experiments on the classification of emotions from speech using acoustic features. This area of research has important applications in human computer interaction. We have thoroughly reviewed the current literature and present our results on some of the contemporary emotional speech databases. The principal focus is on creating a large set of acoustic features, descriptive of different emotional states and finding methods for selecting a subset of best performing features by using feature selection methods. In this thesis we have looked at several traditional feature selection methods and propose a novel scheme which employs a preferential Borda voting strategy for ranking features. The comparative results show that our proposed scheme can strike a balance between accurate but computationally intensive wrapper methods and less accurate but computationally less intensive filter methods for feature selection. By using the selected features, several schemes for extending the binary classifiers to multiclass classification are tested. Some of these classifiers form serial combinations of binary classifiers while others use a hierarchical structure to perform this task. We describe a new hierarchical classification scheme, which we call Data-Driven Dimensional Emotion Classification (3DEC), whose decision hierarchy is based on non-metric multidimensional scaling (NMDS) of the data. This method of creating a hierarchical structure for the classification of emotion classes gives significant improvements over other methods tested. The NMDS representation of emotional speech data can be interpreted in terms of the well-known valence-arousal model of emotion. We find that this model does not givea particularly good fit to the data: although the arousal dimension can be identified easily, valence is not well represented in the transformed data. From the recognitionresults on these two dimensions, we conclude that valence and arousal dimensions are not orthogonal to each other. In the last part of this thesis, we deal with the very difficult but important topic of improving the generalisation capabilities of speech emotion recognition (SER) systems over different speakers and recording environments. This topic has been generally overlooked in the current research in this area. First we try the traditional methods used in automatic speech recognition (ASR) systems for improving the generalisation of SER in intra– and inter–database emotion classification. These traditional methods do improve the average accuracy of the emotion classifier. In this thesis, we identify these differences in the training and test data, due to speakers and acoustic environments, as a covariate shift. This shift is minimised by using importance weighting algorithms from the emerging field of transfer learning to guide the learning algorithm towards that training data which gives better representation of testing data. Our results show that importance weighting algorithms can be used to minimise the differences between the training and testing data. We also test the effectiveness of importance weighting algorithms on inter–database and cross-lingual emotion recognition. From these results, we draw conclusions about the universal nature of emotions across different languages

    Reviewing the Role of Cognitive Load, Expertise Level, Motivation, and Unconscious Processing in Working Memory Performance

    Get PDF
    Human cognitive capacity is unavailable for conscious processing of every amount of instructional messages. Aligning an instructional design with learner expertise level would allow better use of available working memory capacity in a cognitive learning task. Motivating students to learn consciously is also an essential determinant of the capacity usage. However, motivational factors are often subject to unconscious rather than conscious emotional processing. This review sets out the need for further studies to elucidate the role of motivation and unconscious processing in the use of cognitive capacity

    A Collective Adaptive Approach to Decentralised k-Coverage in Multi-robot Systems

    Get PDF
    We focus on the online multi-object k-coverage problem (OMOkC), where mobile robots are required to sense a mobile target from k diverse points of view, coordinating themselves in a scalable and possibly decentralised way. There is active research on OMOkC, particularly in the design of decentralised algorithms for solving it. We propose a new take on the issue: Rather than classically developing new algorithms, we apply a macro-level paradigm, called aggregate computing, specifically designed to directly program the global behaviour of a whole ensemble of devices at once. To understand the potential of the application of aggregate computing to OMOkC, we extend the Alchemist simulator (supporting aggregate computing natively) with a novel toolchain component supporting the simulation of mobile robots. This way, we build a software engineering toolchain comprising language and simulation tooling for addressing OMOkC. Finally, we exercise our approach and related toolchain by introducing new algorithms for OMOkC; we show that they can be expressed concisely, reuse existing software components and perform better than the current state-of-the-art in terms of coverage over time and number of objects covered overall

    Content-prioritised video coding for British Sign Language communication.

    Get PDF
    Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to achieve the conflicting requirements for high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. The research community benefits from a new approach to video coding optimisation and better understanding of the communication needs of deaf people

    Maximum risk reduction with a fixed budget in the railway industry

    Get PDF
    Decision-makers in safety-critical industries such as the railways are frequently faced with the complexity of selecting technological, procedural and operational solutions to minimise staff, passengers and third parties’ safety risks. In reality, the options for maximising risk reduction are limited by time and budget constraints as well as performance objectives. Maximising risk reduction is particularly necessary in the times of economic recession where critical services such as those on the UK rail network are not immune to budget cuts. This dilemma is further complicated by statutory frameworks stipulating ‘suitable and sufficient’ risk assessments and constraints such as ‘as low as reasonably practicable’. These significantly influence risk reduction option selection and influence their effective implementation. This thesis provides extensive research in this area and highlights the limitations of widely applied practices. These practices have limited significance on fundamental engineering principles and become impracticable when a constraint such as a fixed budget is applied – this is the current reality of UK rail network operations and risk management. This thesis identifies three main areas of weaknesses to achieving the desired objectives with current risk reduction methods as: Inaccurate, and unclear problem definition; Option evaluation and selection removed from implementation subsequently resulting in misrepresentation of risks and costs; Use of concepts and methods that are not based on fundamental engineering principles, not verifiable and with resultant sub-optimal solutions. Although not solely intended for a single industrial sector, this thesis focuses on guiding the railway risk decision-maker by providing clear categorisation of measures used on railways for risk reduction. This thesis establishes a novel understanding of risk reduction measures’ application limitations and respective strengths. This is achieved by applying ‘key generic engineering principles’ to measures employed for risk reduction. A comprehensive study of their preventive and protective capability in different configurations is presented. Subsequently, the fundamental understanding of risk reduction measures and their railway applications, the ‘cost-of-failure’ (CoF), ‘risk reduction readiness’ (RRR), ‘design-operationalprocedural-technical’ (DOPT) concepts are developed for rational and cost-effective risk reduction. These concepts are shown to be particularly relevant to cases where blind applications of economic and mathematical theories are misleading and detrimental to engineering risk management. The case for successfully implementing this framework for maximum risk reduction within a fixed budget is further strengthened by applying, for the first time in railway risk reduction applications, the dynamic programming technique based on practical railway examples

    Machine learning for automatic analysis of affective behaviour

    Get PDF
    The automated analysis of affect has been gaining rapidly increasing attention by researchers over the past two decades, as it constitutes a fundamental step towards achieving next-generation computing technologies and integrating them into everyday life (e.g. via affect-aware, user-adaptive interfaces, medical imaging, health assessment, ambient intelligence etc.). The work presented in this thesis focuses on several fundamental problems manifesting in the course towards the achievement of reliable, accurate and robust affect sensing systems. In more detail, the motivation behind this work lies in recent developments in the field, namely (i) the creation of large, audiovisual databases for affect analysis in the so-called ''Big-Data`` era, along with (ii) the need to deploy systems under demanding, real-world conditions. These developments led to the requirement for the analysis of emotion expressions continuously in time, instead of merely processing static images, thus unveiling the wide range of temporal dynamics related to human behaviour to researchers. The latter entails another deviation from the traditional line of research in the field: instead of focusing on predicting posed, discrete basic emotions (happiness, surprise etc.), it became necessary to focus on spontaneous, naturalistic expressions captured under settings more proximal to real-world conditions, utilising more expressive emotion descriptions than a set of discrete labels. To this end, the main motivation of this thesis is to deal with challenges arising from the adoption of continuous dimensional emotion descriptions under naturalistic scenarios, considered to capture a much wider spectrum of expressive variability than basic emotions, and most importantly model emotional states which are commonly expressed by humans in their everyday life. In the first part of this thesis, we attempt to demystify the quite unexplored problem of predicting continuous emotional dimensions. This work is amongst the first to explore the problem of predicting emotion dimensions via multi-modal fusion, utilising facial expressions, auditory cues and shoulder gestures. A major contribution of the work presented in this thesis lies in proposing the utilisation of various relationships exhibited by emotion dimensions in order to improve the prediction accuracy of machine learning methods - an idea which has been taken on by other researchers in the field since. In order to experimentally evaluate this, we extend methods such as the Long Short-Term Memory Neural Networks (LSTM), the Relevance Vector Machine (RVM) and Canonical Correlation Analysis (CCA) in order to exploit output relationships in learning. As it is shown, this increases the accuracy of machine learning models applied to this task. The annotation of continuous dimensional emotions is a tedious task, highly prone to the influence of various types of noise. Performed real-time by several annotators (usually experts), the annotation process can be heavily biased by factors such as subjective interpretations of the emotional states observed, the inherent ambiguity of labels related to human behaviour, the varying reaction lags exhibited by each annotator as well as other factors such as input device noise and annotation errors. In effect, the annotations manifest a strong spatio-temporal annotator-specific bias. Failing to properly deal with annotation bias and noise leads to an inaccurate ground truth, and therefore to ill-generalisable machine learning models. This deems the proper fusion of multiple annotations, and the inference of a clean, corrected version of the ``ground truth'' as one of the most significant challenges in the area. A highly important contribution of this thesis lies in the introduction of Dynamic Probabilistic Canonical Correlation Analysis (DPCCA), a method aimed at fusing noisy continuous annotations. By adopting a private-shared space model, we isolate the individual characteristics that are annotator-specific and not shared, while most importantly we model the common, underlying annotation which is shared by annotators (i.e., the derived ground truth). By further learning temporal dynamics and incorporating a time-warping process, we are able to derive a clean version of the ground truth given multiple annotations, eliminating temporal discrepancies and other nuisances. The integration of the temporal alignment process within the proposed private-shared space model deems DPCCA suitable for the problem of temporally aligning human behaviour; that is, given temporally unsynchronised sequences (e.g., videos of two persons smiling), the goal is to generate the temporally synchronised sequences (e.g., the smile apex should co-occur in the videos). Temporal alignment is an important problem for many applications where multiple datasets need to be aligned in time. Furthermore, it is particularly suitable for the analysis of facial expressions, where the activation of facial muscles (Action Units) typically follows a set of predefined temporal phases. A highly challenging scenario is when the observations are perturbed by gross, non-Gaussian noise (e.g., occlusions), as is often the case when analysing data acquired under real-world conditions. To account for non-Gaussian noise, a robust variant of Canonical Correlation Analysis (RCCA) for robust fusion and temporal alignment is proposed. The model captures the shared, low-rank subspace of the observations, isolating the gross noise in a sparse noise term. RCCA is amongst the first robust variants of CCA proposed in literature, and as we show in related experiments outperforms other, state-of-the-art methods for related tasks such as the fusion of multiple modalities under gross noise. Beyond private-shared space models, Component Analysis (CA) is an integral component of most computer vision systems, particularly in terms of reducing the usually high-dimensional input spaces in a meaningful manner pertaining to the task-at-hand (e.g., prediction, clustering). A final, significant contribution of this thesis lies in proposing the first unifying framework for probabilistic component analysis. The proposed framework covers most well-known CA methods, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Locality Preserving Projections (LPP) and Slow Feature Analysis (SFA), providing further theoretical insights into the workings of CA. Moreover, the proposed framework is highly flexible, enabling novel CA methods to be generated by simply manipulating the connectivity of latent variables (i.e. the latent neighbourhood). As shown experimentally, methods derived via the proposed framework outperform other equivalents in several problems related to affect sensing and facial expression analysis, while providing advantages such as reduced complexity and explicit variance modelling.Open Acces

    Holistic, data-driven, service and supply chain optimisation: linked optimisation.

    Get PDF
    The intensity of competition and technological advancements in the business environment has made companies collaborate and cooperate together as a means of survival. This creates a chain of companies and business components with unified business objectives. However, managing the decision-making process (like scheduling, ordering, delivering and allocating) at the various business components and maintaining a holistic objective is a huge business challenge, as these operations are complex and dynamic. This is because the overall chain of business processes is widely distributed across all the supply chain participants; therefore, no individual collaborator has a complete overview of the processes. Increasingly, such decisions are automated and are strongly supported by optimisation algorithms - manufacturing optimisation, B2B ordering, financial trading, transportation scheduling and allocation. However, most of these algorithms do not incorporate the complexity associated with interacting decision-making systems like supply chains. It is well-known that decisions made at one point in supply chains can have significant consequences that ripple through linked production and transportation systems. Recently, global shocks to supply chains (COVID-19, climate change, blockage of the Suez Canal) have demonstrated the importance of these interdependencies, and the need to create supply chains that are more resilient and have significantly reduced impact on the environment. Such interacting decision-making systems need to be considered through an optimisation process. However, the interactions between such decision-making systems are not modelled. We therefore believe that modelling such interactions is an opportunity to provide computational extensions to current optimisation paradigms. This research study aims to develop a general framework for formulating and solving holistic, data-driven optimisation problems in service and supply chains. This research achieved this aim and contributes to scholarship by firstly considering the complexities of supply chain problems from a linked problem perspective. This leads to developing a formalism for characterising linked optimisation problems as a model for supply chains. Secondly, the research adopts a method for creating a linked optimisation problem benchmark by linking existing classical benchmark sets. This involves using a mix of classical optimisation problems, typically relating to supply chain decision problems, to describe different modes of linkages in linked optimisation problems. Thirdly, several techniques for linking supply chain fragmented data have been proposed in the literature to identify data relationships. Therefore, this thesis explores some of these techniques and combines them in specific ways to improve the data discovery process. Lastly, many state-of-the-art algorithms have been explored in the literature and these algorithms have been used to tackle problems relating to supply chain problems. This research therefore investigates the resilient state-of-the-art optimisation algorithms presented in the literature, and then designs suitable algorithmic approaches inspired by the existing algorithms and the nature of problem linkages to address different problem linkages in supply chains. Considering research findings and future perspectives, the study demonstrates the suitability of algorithms to different linked structures involving two sub-problems, which suggests further investigations on issues like the suitability of algorithms on more complex structures, benchmark methodologies, holistic goals and evaluation, processmining, game theory and dependency analysis
    • 

    corecore