Music Artist Classification with WaveNet Classifier for Raw Waveform Audio Data
Models for music artist classification have usually operated in the frequency
domain, where the input audio is first processed by a spectral transformation.
The WaveNet architecture was originally designed for speech and music
generation. In this paper, we propose an end-to-end architecture in the time
domain for this task: a WaveNet classifier that models features directly from
the raw audio waveform. The WaveNet takes the waveform as input, and several
subsequent downsampling layers discriminate which artist the input belongs to.
In addition, the proposed method is applied to singer identification. The
best-performing model obtains an average F1 score of 0.854 on the Artist20
benchmark dataset, a significant improvement over related work. To show the
effectiveness of the proposed method's feature learning, the bottleneck layer
of the model is visualized.
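As a rough illustration of the time-domain approach, the following is a minimal NumPy sketch of a dilated-convolution stack followed by average-pool downsampling and a linear artist head. The layer count, weights, pooling factor, and single-channel setup are toy assumptions for clarity, not the paper's architecture.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution: each output sample sees only current and past samples."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    out = np.empty_like(x)
    for i in range(len(x)):
        # taps at t, t - dilation, ..., t - (k-1)*dilation
        taps = xp[i + pad - np.arange(k) * dilation]
        out[i] = taps @ w
    return out

def wavenet_style_classifier(x, conv_weights, pool, class_weights):
    """Dilated conv stack (dilation doubling per layer) -> downsampling -> softmax head."""
    h = x
    for layer, w in enumerate(conv_weights):
        h = np.tanh(causal_dilated_conv(h, w, dilation=2 ** layer))
    n = len(h) // pool * pool
    pooled = h[:n].reshape(-1, pool).mean(axis=1)  # downsampling layer
    feat = pooled.mean()                           # crude global "bottleneck" feature
    logits = class_weights * feat                  # one weight per artist class
    e = np.exp(logits - logits.max())
    return e / e.sum()                             # class probabilities
```

A real implementation would use learned multi-channel filters and a trained head; this sketch only shows how the waveform can flow end-to-end from samples to class probabilities without any spectral transform.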
Project Achoo: A Practical Model and Application for COVID-19 Detection from Recordings of Breath, Voice, and Cough
The COVID-19 pandemic created a significant interest and demand for infection
detection and monitoring solutions. In this paper we propose a machine learning
method to quickly triage COVID-19 using recordings made on consumer devices.
The approach combines signal processing methods with fine-tuned deep learning
networks and provides methods for signal denoising, cough detection and
classification. We have also developed and deployed a mobile application that
uses a symptom checker together with voice, breath, and cough signals to
detect COVID-19 infection. The application showed robust performance on both
open-source datasets and on the noisy data collected during beta testing by
end users.
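To make the pipeline shape concrete, here is a toy sketch of the two signal-processing stages the abstract mentions, denoising and cough/event detection. The magnitude-threshold denoising and frame-RMS detection rules are illustrative assumptions, not the deployed model.

```python
import numpy as np

def denoise(signal, noise_floor):
    """Very crude denoising: zero out samples whose magnitude is under the noise floor."""
    return np.where(np.abs(signal) > noise_floor, signal, 0.0)

def detect_events(signal, frame_len, rms_threshold):
    """Energy-based detector: return indices of frames whose RMS exceeds the threshold."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return np.flatnonzero(rms > rms_threshold)
```

In a system like the one described, frames flagged by such a detector would then be handed to the fine-tuned deep classifier for triage.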
Sound event detection in the DCASE 2017 Challenge
Each edition of the challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) has contained several tasks involving sound event detection in different setups. DCASE 2017 presented participants with three such tasks, each having specific datasets and detection requirements: Task 2, in which target sound events were very rare in both training and testing data; Task 3, which had overlapping events annotated in real-life audio; and Task 4, in which only weakly labeled data was available for training. In this paper, we present the three tasks, including the datasets and baseline systems, and analyze the challenge entries for each task. We observe the popularity of methods using deep neural networks and the still widely used mel-frequency-based representations, with only a few approaches standing out as radically different. Analysis of the systems' behavior reveals that task-specific optimization plays a big role in producing good performance; however, this optimization often closely follows the ranking metric, and its maximization or minimization does not result in universally good performance. We also introduce the calculation of confidence intervals based on a jackknife resampling procedure to perform statistical analysis of the challenge results. The analysis indicates that while the 95% confidence intervals for many systems overlap, there are significant differences in performance between the top systems and the baseline for all tasks.
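The jackknife confidence-interval calculation mentioned above can be sketched as follows: a generic leave-one-out jackknife standard error for the mean of per-item metric values, with a normal-approximation interval. This is an illustrative assumption about the procedure, not the challenge's exact evaluation code.

```python
import numpy as np
from statistics import NormalDist

def jackknife_ci(values, confidence=0.95):
    """Jackknife (leave-one-out) standard error of the mean, with a normal-approx CI."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    loo_means = (values.sum() - values) / (n - 1)  # mean with item i left out
    jk_center = loo_means.mean()
    se = np.sqrt((n - 1) / n * ((loo_means - jk_center) ** 2).sum())
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    mean = values.mean()
    return mean, mean - z * se, mean + z * se
```

Two systems whose intervals overlap (as many did in the analysis) cannot be confidently separated by the ranking metric alone.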