1,412 research outputs found

    Deep Learning for Audio Signal Processing

    Full text link
    Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.Comment: 15 pages, 2 pdf figure

    Vulnerable road users and connected autonomous vehicles interaction: a survey

    Get PDF
    There is a group of users within the vehicular traffic ecosystem known as Vulnerable Road Users (VRUs). VRUs include pedestrians, cyclists, motorcyclists, among others. On the other hand, connected autonomous vehicles (CAVs) are a set of technologies that combines, on the one hand, communication technologies to stay always ubiquitous connected, and on the other hand, automated technologies to assist or replace the human driver during the driving process. Autonomous vehicles are being visualized as a viable alternative to solve road accidents providing a general safe environment for all the users on the road specifically to the most vulnerable. One of the problems facing autonomous vehicles is to generate mechanisms that facilitate their integration not only within the mobility environment, but also into the road society in a safe and efficient way. In this paper, we analyze and discuss how this integration can take place, reviewing the work that has been developed in recent years in each of the stages of the vehicle-human interaction, analyzing the challenges of vulnerable users and proposing solutions that contribute to solving these challenges.This work was partially funded by the Ministry of Economy, Industry, and Competitiveness of Spain under Grant: Supervision of drone fleet and optimization of commercial operations flight plans, PID2020-116377RB-C21.Peer ReviewedPostprint (published version

    Predictive coding and stochastic resonance as fundamental principles of auditory phantom perception

    Get PDF
    Mechanistic insight is achieved only when experiments are employed to test formal or computational models. Furthermore, in analogy to lesion studies, phantom perception may serve as a vehicle to understand the fundamental processing principles underlying healthy auditory perception. With a special focus on tinnitus—as the prime example of auditory phantom perception—we review recent work at the intersection of artificial intelligence, psychology and neuroscience. In particular, we discuss why everyone with tinnitus suffers from (at least hidden) hearing loss, but not everyone with hearing loss suffers from tinnitus. We argue that intrinsic neural noise is generated and amplified along the auditory pathway as a compensatory mechanism to restore normal hearing based on adaptive stochastic resonance. The neural noise increase can then be misinterpreted as auditory input and perceived as tinnitus. This mechanism can be formalized in the Bayesian brain framework, where the percept (posterior) assimilates a prior prediction (brain’s expectations) and likelihood (bottom-up neural signal). A higher mean and lower variance (i.e. enhanced precision) of the likelihood shifts the posterior, evincing a misinterpretation of sensory evidence, which may be further confounded by plastic changes in the brain that underwrite prior predictions. Hence, two fundamental processing principles provide the most explanatory power for the emergence of auditory phantom perceptions: predictive coding as a top-down and adaptive stochastic resonance as a complementary bottom-up mechanism. We conclude that both principles also play a crucial role in healthy auditory perception. Finally, in the context of neuroscience-inspired artificial intelligence, both processing principles may serve to improve contemporary machine learning techniques

    Automatic music transcription: challenges and future directions

    Get PDF
    Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available are a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects

    Machine learning in marine ecology: an overview of techniques and applications

    Get PDF
    Machine learning covers a large set of algorithms that can be trained to identify patterns in data. Thanks to the increase in the amount of data and computing power available, it has become pervasive across scientific disciplines. We first highlight why machine learning is needed in marine ecology. Then we provide a quick primer on machine learning techniques and vocabulary. We built a database of ∌1000 publications that implement such techniques to analyse marine ecology data. For various data types (images, optical spectra, acoustics, omics, geolocations, biogeochemical profiles, and satellite imagery), we present a historical perspective on applications that proved influential, can serve as templates for new work, or represent the diversity of approaches. Then, we illustrate how machine learning can be used to better understand ecological systems, by combining various sources of marine data. Through this coverage of the literature, we demonstrate an increase in the proportion of marine ecology studies that use machine learning, the pervasiveness of images as a data source, the dominance of machine learning for classification-type problems, and a shift towards deep learning for all data types. This overview is meant to guide researchers who wish to apply machine learning methods to their marine datasets.Machine learning in marine ecology: an overview of techniques and applicationspublishedVersio

    The University Defence Research Collaboration In Signal Processing

    Get PDF
    This chapter describes the development of algorithms for automatic detection of anomalies from multi-dimensional, undersampled and incomplete datasets. The challenge in this work is to identify and classify behaviours as normal or abnormal, safe or threatening, from an irregular and often heterogeneous sensor network. Many defence and civilian applications can be modelled as complex networks of interconnected nodes with unknown or uncertain spatio-temporal relations. The behavior of such heterogeneous networks can exhibit dynamic properties, reflecting evolution in both network structure (new nodes appearing and existing nodes disappearing), as well as inter-node relations. The UDRC work has addressed not only the detection of anomalies, but also the identification of their nature and their statistical characteristics. Normal patterns and changes in behavior have been incorporated to provide an acceptable balance between true positive rate, false positive rate, performance and computational cost. Data quality measures have been used to ensure the models of normality are not corrupted by unreliable and ambiguous data. The context for the activity of each node in complex networks offers an even more efficient anomaly detection mechanism. This has allowed the development of efficient approaches which not only detect anomalies but which also go on to classify their behaviour
    • 

    corecore