21 research outputs found

    Deep Learning for Distant Speech Recognition

    Full text link
    Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among the other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, a natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. The latter disturbances severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses the latter scenario and proposes some novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. We then investigate on approaches for better exploiting speech contexts, proposing some original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key for counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called network of deep neural networks. The analysis of the original concepts were based on extensive experimental validations conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noisy conditions, and ASR tasks.Comment: PhD Thesis Unitn, 201

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    Get PDF
    The study of low-dimensional, noisy manifolds embedded in a higher dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed, in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at fixed time, to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov

    Multimedia Retrieval

    Get PDF

    Recent Advances in Signal Processing

    Get PDF
    The signal processing task is a very critical issue in the majority of new technological inventions and challenges in a variety of applications in both science and engineering fields. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward both students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories are ordered to address image processing, speech processing, communication systems, time-series analysis, and educational packages respectively. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity

    Computational Intelligence and Human- Computer Interaction: Modern Methods and Applications

    Get PDF
    The present book contains all of the articles that were accepted and published in the Special Issue of MDPI’s journal Mathematics titled "Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications". This Special Issue covered a wide range of topics connected to the theory and application of different computational intelligence techniques to the domain of human–computer interaction, such as automatic speech recognition, speech processing and analysis, virtual reality, emotion-aware applications, digital storytelling, natural language processing, smart cars and devices, and online learning. We hope that this book will be interesting and useful for those working in various areas of artificial intelligence, human–computer interaction, and software engineering as well as for those who are interested in how these domains are connected in real-life situations

    RFID Technology in Intelligent Tracking Systems in Construction Waste Logistics Using Optimisation Techniques

    Get PDF
    Construction waste disposal is an urgent issue for protecting our environment. This paper proposes a waste management system and illustrates the work process using plasterboard waste as an example, which creates a hazardous gas when land filled with household waste, and for which the recycling rate is less than 10% in the UK. The proposed system integrates RFID technology, Rule-Based Reasoning, Ant Colony optimization and knowledge technology for auditing and tracking plasterboard waste, guiding the operation staff, arranging vehicles, schedule planning, and also provides evidence to verify its disposal. It h relies on RFID equipment for collecting logistical data and uses digital imaging equipment to give further evidence; the reasoning core in the third layer is responsible for generating schedules and route plans and guidance, and the last layer delivers the result to inform users. The paper firstly introduces the current plasterboard disposal situation and addresses the logistical problem that is now the main barrier to a higher recycling rate, followed by discussion of the proposed system in terms of both system level structure and process structure. And finally, an example scenario will be given to illustrate the system’s utilization

    Understanding Video Transformers for Segmentation: A Survey of Application and Interpretability

    Full text link
    Video segmentation encompasses a wide range of categories of problem formulation, e.g., object, scene, actor-action and multimodal video segmentation, for delineating task-specific scene components with pixel-level masks. Recently, approaches in this research area shifted from concentrating on ConvNet-based to transformer-based models. In addition, various interpretability approaches have appeared for transformer models and video temporal dynamics, motivated by the growing interest in basic scientific understanding, model diagnostics and societal implications of real-world deployment. Previous surveys mainly focused on ConvNet models on a subset of video segmentation tasks or transformers for classification tasks. Moreover, component-wise discussion of transformer-based video segmentation models has not yet received due focus. In addition, previous reviews of interpretability methods focused on transformers for classification, while analysis of video temporal dynamics modelling capabilities of video models received less attention. In this survey, we address the above with a thorough discussion of various categories of video segmentation, a component-wise discussion of the state-of-the-art transformer-based models, and a review of related interpretability methods. We first present an introduction to the different video segmentation task categories, their objectives, specific challenges and benchmark datasets. Next, we provide a component-wise review of recent transformer-based models and document the state of the art on different video segmentation tasks. Subsequently, we discuss post-hoc and ante-hoc interpretability methods for transformer models and interpretability methods for understanding the role of the temporal dimension in video models. Finally, we conclude our discussion with future research directions
    corecore