168 research outputs found

    Speaker Identification for Swiss German with Spectral and Rhythm Features

    Get PDF
    We present results of speech rhythm analysis for automatic speaker identification. We expand previous experiments using similar methods for language identification. Features describing the rhythmic properties of salient changes in signal components are extracted and used in an speaker identification task to determine to which extent they are descriptive of speaker variability. We also test the performance of state-of-the-art but simple-to-extract frame-based features. The paper focus is the evaluation on one corpus (swiss german, TEVOID) using support vector machines. Results suggest that the general spectral features can provide very good performance on this dataset, whereas the rhythm features are not as successful in the task, indicating either the lack of suitability for this task or the dataset specificity

    Music Structure Boundaries Estimation Using Multiple Self-Similarity Matrices as Input Depth of Convolutional Neural Networks

    Get PDF
    International audienceIn this paper, we propose a new representation as input of a Convolutional Neural Network with the goal of estimating music structure boundaries. For this task, previous works used a network performing the late-fusion of a Mel-scaled log-magnitude spectrogram and a self-similarity-lag-matrix. We propose here to use the square-sub-matrices centered on the main diagonals of several self-similarity-matrices, each one representing a different audio descriptors. We propose to combine them using the depth of the input layer. We show that this representation improves the results over the use of the self-similarity-lag-matrix. We also show that using the depth of the input layer provide a convenient way for early fusion of audio representations

    Gaussian Framework for Interference Reduction in Live Recordings

    Get PDF
    Here typical live full-length music recordings are considered. In this scenarios, some instrumental voices are captured by microphones intended to other voices, leading to so-called “interferences”. Reducing this phenomenon is desirable because it opens new possibilities for sound engineers and also it has been proven that it increase performances of music analysis and processing tools (e.g. pitch tracking). In this work we propose an fast NMF-based algorithm to solve this problem.ope

    Gaussian framework for interference reduction in live recordings

    Get PDF
    International audienceIn live multitrack recordings, each voice is usually captured by dedicated close microphones. Unfortunately, it is also captured in practice by other microphones intended for other sources, leading to so-called " interferences ". Reducing this interference is desirable because it opens new perspectives for the engineering of live recordings. Hence, it has been the topic of recent research in audio processing. In this paper, we show how a Gaussian probabilistic framework may be set up for obtaining good isolation of the target sources. Doing so, we extend several state-of-the art methods by fixing some heuristic parts of their algorithms. As we show in a perceptual evaluation on real-world multitrack live recordings, the resulting principled techniques yield improved quality

    Audiovisual Database with 360 Video and Higher-Order Ambisonics Audio for Perception, Cognition, Behavior, and QoE Evaluation Research

    Full text link
    Research into multi-modal perception, human cognition, behavior, and attention can benefit from high-fidelity content that may recreate real-life-like scenes when rendered on head-mounted displays. Moreover, aspects of audiovisual perception, cognitive processes, and behavior may complement questionnaire-based Quality of Experience (QoE) evaluation of interactive virtual environments. Currently, there is a lack of high-quality open-source audiovisual databases that can be used to evaluate such aspects or systems capable of reproducing high-quality content. With this paper, we provide a publicly available audiovisual database consisting of twelve scenes capturing real-life nature and urban environments with a video resolution of 7680x3840 at 60 frames-per-second and with 4th-order Ambisonics audio. These 360 video sequences, with an average duration of 60 seconds, represent real-life settings for systematically evaluating various dimensions of uni-/multi-modal perception, cognition, behavior, and QoE. The paper provides details of the scene requirements, recording approach, and scene descriptions. The database provides high-quality reference material with a balanced focus on auditory and visual sensory information. The database will be continuously updated with additional scenes and further metadata such as human ratings and saliency information.Comment: 6 pages, 2 figures, accepted and presented at the 2022 14th International Conference on Quality of Multimedia Experience (QoMEX). Database is publicly accessible at https://qoevave.github.io/database

    An Intelligent audio workstation in the browser

    Get PDF
    Music production is a complex process requiring skill and time to undertake. The industry has undergone a digital revolution, but unlike other industries the process has not changed. However, intelligent systems, using the semantic web and signal processing, can reduce this complexity by making certain decisions for the user with minimal interaction, saving both time and investment on the engineers’ part. This paper will outline an intelligent Digital Audio Workstation (DAW) designed for use in the browser. It outlines the architecture of the DAW with its audio engine (built on the Web Audio API), using AngularJS for the user interface and a relational database

    Reviews on Technology and Standard of Spatial Audio Coding

    Get PDF
    Market  demands  on a more impressive entertainment media have motivated for delivery of three dimensional  (3D) audio content to  home consumers  through Ultra  High  Definition  TV  (UHDTV), the next generation of TV broadcasting, where spatial  audio  coding plays  fundamental role. This paper reviews fundamental concept on spatial audio coding which includes technology, standard, and application. Basic principle of object-based audio reproduction system  will also be elaborated, compared  to  the  traditional channel-based system, to provide good understanding on this popular interactive audio reproduction system which gives end users flexibility to render  their  own preferred  audio composition.Keywords : spatial audio, audio coding, multi-channel audio signals, MPEG standard, object-based audi
    corecore