Speaker Identification for Swiss German with Spectral and Rhythm Features
We present results of speech rhythm analysis for automatic speaker identification, expanding previous experiments that used similar methods for language identification. Features describing the rhythmic properties of salient changes in signal components are extracted and used in a speaker identification task to determine to what extent they are descriptive of speaker variability. We also test the performance of state-of-the-art but simple-to-extract frame-based features. The paper focuses on evaluation on a single corpus (Swiss German, TEVOID) using support vector machines. Results suggest that the general spectral features provide very good performance on this dataset, whereas the rhythm features are less successful in the task, indicating either their unsuitability for the task or the specificity of the dataset.
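As a rough illustration of the frame-based classification setup described above (the TEVOID corpus is not used here; the number of speakers, feature dimensionality, and data are synthetic stand-ins), a support-vector-machine frame classifier might be sketched as:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_speakers, frames_per_speaker, n_features = 8, 200, 13

# Simulate frame-based spectral features (e.g. MFCC-like vectors):
# each speaker gets a distinct mean feature vector plus frame noise.
X = np.vstack([
    rng.normal(loc=rng.normal(0.0, 2.0, n_features),
               scale=1.0,
               size=(frames_per_speaker, n_features))
    for _ in range(n_speakers)
])
y = np.repeat(np.arange(n_speakers), frames_per_speaker)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Standardize features, then classify frames with an RBF-kernel SVM
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
clf.fit(X_tr, y_tr)
print(f"frame-level accuracy: {clf.score(X_te, y_te):.2f}")
```

In a real system the features would be extracted from speech audio and accuracy would be reported per utterance rather than per frame; this sketch only shows the classifier plumbing.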
Music Structure Boundaries Estimation Using Multiple Self-Similarity Matrices as Input Depth of Convolutional Neural Networks
In this paper, we propose a new input representation for a Convolutional Neural Network with the goal of estimating music structure boundaries. For this task, previous work used a network performing late fusion of a Mel-scaled log-magnitude spectrogram and a self-similarity lag matrix. We propose instead to use the square sub-matrices centered on the main diagonals of several self-similarity matrices, each one computed from a different audio descriptor, and to combine them along the depth of the input layer. We show that this representation improves the results over the use of the self-similarity lag matrix, and that using the depth of the input layer provides a convenient way to perform early fusion of audio representations.
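A minimal sketch of this input construction (the descriptors, frame counts, and patch size below are illustrative placeholders, not the paper's actual configuration): each descriptor yields a self-similarity matrix, square sub-matrices are cut along the main diagonal, and the per-descriptor patches are stacked along the channel (depth) axis.

```python
import numpy as np

def ssm(features):
    # features: (n_frames, dim) -> cosine self-similarity matrix (n_frames, n_frames)
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-9)
    return f @ f.T

def diagonal_patches(S, half):
    # Square sub-matrices of size (2*half+1) centered on the main diagonal
    n = S.shape[0]
    return np.stack([S[i - half:i + half + 1, i - half:i + half + 1]
                     for i in range(half, n - half)])

rng = np.random.default_rng(0)
n_frames = 64
mel = rng.random((n_frames, 40))      # stand-in for a Mel spectrogram
chroma = rng.random((n_frames, 12))   # stand-in for a second descriptor

half = 8
# Stack the two descriptors' patches along the last (depth/channel) axis,
# giving one multi-channel input example per analysis frame.
patches = np.stack([diagonal_patches(ssm(mel), half),
                    diagonal_patches(ssm(chroma), half)], axis=-1)
print(patches.shape)  # (48, 17, 17, 2): frames x height x width x descriptors
```

The resulting 4-D array is directly consumable by a CNN expecting multi-channel 2-D inputs, which is what makes depth-stacking a convenient early-fusion mechanism.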
Gaussian Framework for Interference Reduction in Live Recordings
We consider typical full-length live music recordings. In this scenario, some instrumental voices are captured by microphones intended for other voices, leading to so-called "interferences". Reducing this phenomenon is desirable because it opens new possibilities for sound engineers, and it has been shown to increase the performance of music analysis and processing tools (e.g. pitch tracking). In this work we propose a fast NMF-based algorithm to solve this problem.
Gaussian framework for interference reduction in live recordings
In live multitrack recordings, each voice is usually captured by dedicated close microphones. In practice, however, it is also captured by microphones intended for other sources, leading to so-called "interferences". Reducing this interference is desirable because it opens new perspectives for the engineering of live recordings, and it has therefore been the topic of recent research in audio processing. In this paper, we show how a Gaussian probabilistic framework may be set up to obtain good isolation of the target sources. In doing so, we extend several state-of-the-art methods by fixing some heuristic parts of their algorithms. As we show in a perceptual evaluation on real-world multitrack live recordings, the resulting principled techniques yield improved quality.
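Under a Gaussian model of this kind, interference reduction in each time-frequency bin reduces to a Wiener-style gain. The toy sketch below uses oracle power spectral densities and random data purely to show the mechanics; a real system (such as those in these papers) estimates the PSDs from the multitrack itself, e.g. with NMF.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_frames = 257, 100

# Close-mic signal model: the wanted voice plus bleed from another source,
# both modeled as zero-mean Gaussian in each time-frequency bin.
target = 2.0 * rng.normal(size=(n_freq, n_frames))
bleed = 1.0 * rng.normal(size=(n_freq, n_frames))
mix = target + bleed

# Oracle power spectral densities (a real system estimates these jointly).
psd_target, psd_bleed = 4.0, 1.0

# Wiener gain = posterior mean of the target given the mixture
# under the Gaussian model.
gain = psd_target / (psd_target + psd_bleed)   # 0.8 here
estimate = gain * mix

err_before = np.mean((mix - target) ** 2)
err_after = np.mean((estimate - target) ** 2)
print(err_after < err_before)  # the filtered close mic is nearer the true voice
```

The "principled" aspect of the Gaussian framework is that this gain is derived from the model rather than hand-tuned, which is what replaces the heuristic parts of earlier methods.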
Audiovisual Database with 360 Video and Higher-Order Ambisonics Audio for Perception, Cognition, Behavior, and QoE Evaluation Research
Research into multi-modal perception, human cognition, behavior, and
attention can benefit from high-fidelity content that may recreate
real-life-like scenes when rendered on head-mounted displays. Moreover, aspects
of audiovisual perception, cognitive processes, and behavior may complement
questionnaire-based Quality of Experience (QoE) evaluation of interactive
virtual environments. Currently, there is a lack of high-quality open-source
audiovisual databases that can be used to evaluate such aspects or systems
capable of reproducing high-quality content. With this paper, we provide a
publicly available audiovisual database consisting of twelve scenes capturing
real-life nature and urban environments with a video resolution of 7680x3840 at
60 frames-per-second and with 4th-order Ambisonics audio. These 360 video
sequences, with an average duration of 60 seconds, represent real-life settings
for systematically evaluating various dimensions of uni-/multi-modal
perception, cognition, behavior, and QoE. The paper provides details of the
scene requirements, recording approach, and scene descriptions. The database
provides high-quality reference material with a balanced focus on auditory and
visual sensory information. The database will be continuously updated with
additional scenes and further metadata such as human ratings and saliency
information.
Comment: 6 pages, 2 figures, accepted and presented at the 2022 14th
International Conference on Quality of Multimedia Experience (QoMEX).
Database is publicly accessible at https://qoevave.github.io/database
An Intelligent audio workstation in the browser
Music production is a complex process requiring skill and time to undertake. Although the industry has undergone a digital revolution, the production process itself, unlike that of other industries, has changed little. Intelligent systems, using the semantic web and signal processing, can reduce this complexity by making certain decisions for the user with minimal interaction, saving both time and effort on the engineer's part. This paper outlines an intelligent Digital Audio Workstation (DAW) designed for use in the browser, describing the architecture of the DAW with its audio engine (built on the Web Audio API), AngularJS for the user interface, and a relational database.
Reviews on Technology and Standard of Spatial Audio Coding
Market demand for more immersive entertainment media has motivated the delivery of three-dimensional (3D) audio content to home consumers through Ultra High Definition TV (UHDTV), the next generation of TV broadcasting, in which spatial audio coding plays a fundamental role. This paper reviews fundamental concepts of spatial audio coding, including its technology, standards, and applications. The basic principle of object-based audio reproduction is also elaborated and compared with the traditional channel-based approach, to provide a good understanding of this popular interactive reproduction system, which gives end users the flexibility to render their own preferred audio composition.
Keywords: spatial audio, audio coding, multi-channel audio signals, MPEG standard, object-based audio
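To make the channel-based versus object-based contrast concrete: in a channel-based mix, panning gains are baked into fixed loudspeaker channels at production time, whereas an object-based renderer applies them at playback from the object's position metadata. A toy stereo renderer using a constant-power panning law (the law and values are illustrative, not from any MPEG standard):

```python
import numpy as np

def pan_gains(pan):
    # pan in [-1 (full left), +1 (full right)]; constant-power law
    # keeps gl**2 + gr**2 == 1 so loudness is position-independent.
    angle = (pan + 1.0) * np.pi / 4.0
    return np.cos(angle), np.sin(angle)

t = np.linspace(0.0, 1.0, 48000, endpoint=False)
obj = np.sin(2.0 * np.pi * 440.0 * t)   # one mono audio "object"

# The renderer, not the producer, chooses the position at playback time:
gl, gr = pan_gains(0.5)                  # listener-preferred placement
stereo = np.stack([gl * obj, gr * obj])  # rendered loudspeaker feeds

print(round(gl**2 + gr**2, 6))  # 1.0 — total power preserved at any position
```

Replacing the stereo gain pair with gains for an arbitrary loudspeaker layout is what lets the same object stream serve many reproduction setups, which is the flexibility the abstract refers to.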