
    Deep Learning for Audio Signal Processing

    Given the recent surge of developments in deep learning, this article reviews state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and the potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, and more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, and generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified. Comment: 15 pages, 2 PDF figures
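    The log-mel spectrogram mentioned as a dominant feature representation can be sketched in a few lines of numpy; the parameter values below (`n_fft=512`, `hop=256`, `n_mels=40`) are illustrative defaults, not choices taken from the article:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):                 # rising edge
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                 # falling edge
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def log_mel(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, window it, and take the power spectrum.
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)                # shape: (n_frames, n_mels)
```

Practical systems would typically use a library routine (e.g. an off-the-shelf mel-spectrogram function) rather than this hand-rolled filterbank, but the computation is the same: short-time power spectra projected onto mel-spaced triangular filters, then compressed with a log.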

    Audio Event Detection using Weakly Labeled Data

    Acoustic event detection is essential for content analysis and description of multimedia recordings. The majority of the current literature on the topic learns detectors through fully supervised techniques employing strongly labeled data. However, the labels available for most multimedia data are generally weak and do not provide sufficient detail for such methods to be employed. In this paper we propose a framework for learning acoustic event detectors using only weakly labeled data. We first show that audio event detection using weak labels can be formulated as a multiple instance learning (MIL) problem. We then suggest two frameworks for solving it, one based on support vector machines and the other on neural networks. The proposed methods can remove the time-consuming and expensive process of manually annotating data to facilitate fully supervised learning. Moreover, they not only detect events in a recording but also provide their temporal locations. This yields a more complete description of the recording and is notable because the weak labels carried no temporal information in the first place. Comment: ACM Multimedia 201
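    The MIL formulation can be illustrated with a minimal sketch: score every segment (instance) of a recording (bag), pool the instance scores with a max, and train against the bag-level weak label only. This toy version uses a logistic instance scorer, not the paper's actual SVM or network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bag_probability(w, b, bag):
    # Instance-level scores for each segment, pooled with max:
    # the bag is positive if any instance is positive.
    return sigmoid(bag @ w + b).max()

def train_mil(bags, labels, lr=0.5, epochs=200, dim=8):
    # Logistic instance scorer trained from bag-level labels only.
    w, b = np.zeros(dim), 0.0
    for _ in range(epochs):
        for bag, y in zip(bags, labels):
            inst = sigmoid(bag @ w + b)
            j = int(np.argmax(inst))      # instance selected by max pooling
            g = inst[j] - y               # cross-entropy gradient at its logit
            w -= lr * g * bag[j]
            b -= lr * g
    return w, b
```

The max-pooled instance that fires on a positive bag also pins down *where* the event is, which is how weak labels can still yield temporal locations.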

    Study, Implementation and Evaluation of Event Detection and Anomaly Identification Systems based on acoustic information

    Nowadays, interest in detecting anomalous events has been rising across different state-of-the-art research fields, such as computer vision, signal processing, and banking. Machine learning (ML) techniques, and specifically deep learning (DL) techniques, have had a great impact on the approaches developed recently, enabling large improvements in the accuracy of the proposed systems. Computer vision is the most advanced field in this area. Nevertheless, there are systems where the problem is addressed through the acoustic information provided by a microphone, or a set of microphones, placed in an environment, due to different constraints: i) user privacy, in environments where a situation must be monitored and a warning given if an anomaly is found (an example of this kind of system is a domestic violence detection system deployed in a house); ii) machinery malfunction, for components such as the inside of an engine, where it is difficult to install a camera to check the wear of the parts or their correct operation, so approaching the task with acoustic information is a typical solution. Based on a study of the current state of the art in the detection of anomalous acoustic events, an existing system was chosen for the development of this final degree project.
    The main objectives were: to reproduce the experiments carried out by the chosen system's developers, achieving similar results; to change the database used to train, validate, and test the system, in order to study the adaptability of the network to a new type of data; and to modify the given network to study the effect these changes have on the performance of the system. In addition, a second system, named SELDNet, was studied. SELDNet is well known in the state of the art and focuses on the detection of acoustic events as well as their multi-class classification. Although it does not address the anomalous event detection task proposed in this project, it is relevant to study because a first step towards anomaly detection is the detection of the acoustic events themselves. Grado en Ingeniería en Tecnologías de Telecomunicación

    Deep learning for deep waters: An expert-in-the-loop machine learning framework for marine sciences

    Driven by the unprecedented availability of data, machine learning has become a pervasive and transformative technology across industry and science. Its importance to marine science has been codified as one goal of the UN Ocean Decade. While increasing amounts of acoustic marine data, for example, are collected for research and monitoring purposes, and machine learning methods can automate the processing and analysis of such data, these methods require large training datasets annotated or labelled by experts. Consequently, addressing the relative scarcity of labelled data, besides increasing data analysis and processing capacities, is one of the main thrust areas. One approach to label scarcity is the expert-in-the-loop approach, which allows limited and unbalanced data to be analysed efficiently. Its advantages are demonstrated with our novel deep learning-based expert-in-the-loop framework for the automatic detection of turbulent wake signatures in echo sounder data. Machine learning algorithms such as the one presented in this study greatly increase the capacity to analyse large amounts of acoustic data, a first step towards realising the full potential of the growing amount of acoustic data in marine sciences.
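    The expert-in-the-loop idea can be illustrated with a generic uncertainty-sampling loop: train on the small labelled set, score the unlabelled pool, and route the samples the model is least sure about to the expert for labelling. This sketch uses a toy logistic model and a simulated expert; the paper's actual framework is deep-learning-based and more elaborate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lr=0.1, epochs=200):
    # Tiny logistic-regression trainer (full-batch gradient descent).
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def expert_in_the_loop(X_lab, y_lab, X_pool, oracle, rounds=5, batch=5):
    # Each round: train on the labelled set, score the unlabelled pool,
    # and hand the most uncertain samples (probability closest to 0.5)
    # to the expert ("oracle") for labelling.
    for _ in range(rounds):
        w = fit_logreg(X_lab, y_lab)
        p = sigmoid(X_pool @ w)
        query = np.argsort(np.abs(p - 0.5))[:batch]
        X_lab = np.vstack([X_lab, X_pool[query]])
        y_lab = np.concatenate([y_lab, oracle(X_pool[query])])
        X_pool = np.delete(X_pool, query, axis=0)
    return fit_logreg(X_lab, y_lab), X_lab, y_lab, X_pool
```

Because the expert only labels the most informative samples, the loop can reach useful accuracy with a fraction of the annotation effort that full labelling would require.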

    Spatiotemporal anomaly detection: streaming architecture and algorithms

    Includes bibliographical references. 2020 Summer. Anomaly detection is the science of identifying one or more rare or unexplainable samples or events in a dataset or data stream. The field has been extensively studied by mathematicians, statisticians, economists, engineers, and computer scientists. One open research question remains the design of distributed cloud-based architectures and algorithms that can accurately identify anomalies in previously unseen, unlabeled, streaming, multivariate spatiotemporal data. With streaming data, time is of the essence and insights are perishable. Real-world streaming spatiotemporal data originate from many sources, including mobile phones, supervisory control and data acquisition (SCADA) devices, the internet of things (IoT), distributed sensor networks, and social media. Baseline experiments are performed on four non-streaming, static multivariate anomaly detection datasets using unsupervised offline traditional machine learning (TML) and unsupervised neural network techniques. Multiple architectures, including autoencoders, generative adversarial networks, convolutional networks, and recurrent networks, are adapted for experimentation. Extensive experimentation demonstrates that neural networks produce superior detection accuracy over TML techniques. These same neural network architectures can be extended to process unlabeled spatiotemporal streaming data using online learning. Space and time relationships are further exploited to provide additional insights and increased anomaly detection accuracy. A novel domain-independent architecture and set of algorithms called the Spatiotemporal Anomaly Detection Environment (STADE) is formulated. STADE is based on a federated learning architecture. STADE's streaming algorithms are based on geographically unique, persistently executing neural networks trained with online stochastic gradient descent (SGD).
    STADE is designed to be pluggable, meaning that alternative algorithms may be substituted or combined to form an ensemble. STADE incorporates a Stream Anomaly Detector (SAD) and a Federated Anomaly Detector (FAD). The SAD executes at multiple locations on streaming data, while the FAD executes at a single server and identifies global patterns and relationships among the site anomalies. Each STADE site streams anomaly scores to the centralized FAD server for further spatiotemporal dependency analysis and logging. The FAD is based on recent advances in DNN-based federated learning. A STADE testbed is implemented to facilitate globally distributed experimentation using low-cost, commercial cloud infrastructure provided by Microsoft™. STADE testbed sites are situated in the cloud within each continent: Africa, Asia, Australia, Europe, North America, and South America. Communication occurs over the commercial internet. Three STADE case studies are investigated: the first processes commercial air traffic flows, the second processes global earthquake measurements, and the third processes social media (i.e., Twitter™) feeds. These case studies confirm that STADE is a viable architecture for the near-real-time identification of anomalies in streaming data originating from (possibly) computationally disadvantaged, geographically dispersed sites. Moreover, the addition of the FAD provides enhanced anomaly detection capability. Since STADE is domain-independent, these findings can easily be extended to additional application domains and use cases.
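    The flavour of an online-SGD stream scorer like the SAD can be sketched in a few lines. This is a hypothetical, minimal stand-in (a linear one-step-ahead predictor with running error statistics), not STADE's neural implementation: the anomaly score is the prediction error normalised against a Welford-updated mean and variance, and the predictor is refreshed with one SGD step per sample:

```python
import numpy as np

class StreamAnomalyScorer:
    """One-step-ahead linear predictor trained online with SGD.
    The anomaly score is the squared prediction error expressed as a
    z-score against running error statistics (Welford's update)."""

    def __init__(self, window=8, lr=0.01):
        self.w = np.zeros(window)
        self.lr = lr
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def score(self, history, x):
        pred = self.w @ history
        err = (x - pred) ** 2
        # Welford running mean/variance of the prediction error.
        self.n += 1
        d = err - self.mean
        self.mean += d / self.n
        self.m2 += d * (err - self.mean)
        var = self.m2 / max(self.n - 1, 1)
        z = (err - self.mean) / (np.sqrt(var) + 1e-8)
        # One SGD step on the squared prediction error.
        self.w -= self.lr * 2.0 * (pred - x) * history
        return z
```

Because both the model and the error statistics update incrementally, the scorer needs constant memory per site, which fits the "computationally disadvantaged, geographically dispersed" setting the dissertation targets.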

    Unveiling the frontiers of deep learning: innovations shaping diverse domains

    Deep learning (DL) enables the development of computer models that are capable of learning, visualizing, optimizing, refining, and predicting data. In recent years, DL has been applied in a range of fields, including audio-visual data processing, agriculture, transportation prediction, natural language, biomedicine, disaster management, bioinformatics, drug design, genomics, face recognition, and ecology. To explore the current state of deep learning, it is necessary to investigate its latest developments and applications in these disciplines. However, the literature lacks a survey of deep learning applications across all potential sectors. This paper therefore extensively investigates the potential applications of deep learning across all major fields of study, as well as the associated benefits and challenges. As evidenced in the literature, DL exhibits high accuracy in prediction and analysis, which makes it a powerful computational tool, and it can self-organize and optimize, making it effective at processing data with little prior feature engineering. At the same time, deep learning necessitates massive amounts of data for effective analysis and processing. To handle the challenge of compiling huge amounts of medical, scientific, healthcare, and environmental data for use in deep learning, gated architectures such as LSTMs and GRUs can be utilized. For multimodal learning, neural networks need neurons shared across all activities together with neurons specialized for particular tasks. Comment: 64 pages, 3 figures, 3 tables

    Improving Engagement Assessment by Model Individualization and Deep Learning

    This dissertation studies methods that improve engagement assessment for pilots. The major work addresses two challenging problems involved in the assessment: individual variation among pilots and the lack of labeled data for training assessment models. Task engagement is usually assessed by analyzing physiological measurements collected from subjects performing a task. However, physiological measurements such as electroencephalography (EEG) signals vary from subject to subject, so an assessment model trained for one subject may not be applicable to others. We proposed a dynamic classifier selection algorithm for model individualization and compared it to two other methods: baseline normalization and similarity-based model replacement. Experimental results showed that baseline normalization and dynamic classifier selection can significantly improve cross-subject engagement assessment. For complex tasks such as piloting an airplane, labeling engagement levels is challenging. Without enough labeled data, it is very difficult for traditional methods to train valid models for effective engagement assessment. This dissertation proposed utilizing deep learning models to address this challenge. Deep learning models are capable of learning valuable feature hierarchies by taking advantage of both labeled and unlabeled data. Our results showed that deep models are better tools for engagement assessment when label information is scarce. To further verify the power of deep learning techniques with scarce labeled data, we applied the deep learning algorithm to another small dataset, the ADNI dataset, a public dataset containing MRI and PET scans of Alzheimer's disease (AD) patients for AD diagnosis. We developed a robust deep learning system incorporating dropout and stability selection techniques to identify the different progression stages of AD patients.
    The experimental results showed that deep learning is very effective in AD diagnosis. In addition, we studied several imbalanced-learning techniques that are useful when data is highly imbalanced, i.e., when majority classes have many more training samples than minority classes. Conventional machine learning techniques usually tend to classify all data samples into majority classes and perform poorly on minority classes. Imbalanced-learning techniques can balance datasets before training and can improve learning performance.
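    One of the simplest imbalanced-learning techniques of the kind described, random oversampling, can be sketched as follows (a hypothetical helper written for illustration, not code from the dissertation): minority-class rows are resampled with replacement until every class matches the majority-class count, before the model is trained:

```python
import numpy as np

def oversample(X, y, seed=0):
    # Randomly resample minority-class rows (with replacement)
    # until all classes match the majority-class count.
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    Xs, ys = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            idx = rng.choice(np.flatnonzero(y == c), size=target - n)
            Xs.append(X[idx])
            ys.append(y[idx])
    return np.concatenate(Xs), np.concatenate(ys)
```

Oversampling only rebalances the training set; alternatives such as class-weighted losses or synthetic-sample methods address the same majority-class bias without duplicating rows.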

    Learning Sensory Representations with Minimal Supervision
