    Machine listening techniques as a complement to video image analysis in forensics

    Video is now one of the major sources of information for forensics. However, video documents can originate from various recording devices (CCTV, mobile devices, etc.) with inconsistent quality, and can sometimes be recorded in challenging light or motion conditions. Therefore, the amount of information that can be extracted relying solely on the video image can vary to a great extent. Most videos, however, include an audio recording as well. Machine listening can then become a valuable complement to video image analysis in challenging scenarios. In this paper, the authors present a brief overview of some machine listening techniques and their application to the analysis of video documents for forensics. The applicability of these techniques to forensics problems is then discussed in the light of machine listening system performance.

    Training Sound Event Detection On A Heterogeneous Dataset

    Training a sound event detection algorithm on a heterogeneous dataset, including both recorded and synthetic soundscapes that can have various labeling granularity, is a non-trivial task that can lead to systems requiring several technical choices. These technical choices are often passed from one system to another without being questioned. We propose to perform a detailed analysis of the DCASE 2020 task 4 sound event detection baseline with regard to several aspects, such as the type of data used for training, the parameters of the mean-teacher, and the transformations applied while generating the synthetic soundscapes. Some of the parameters that are usually used as defaults are shown to be sub-optimal.
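
For readers unfamiliar with the mean-teacher scheme mentioned in this abstract, the following is a minimal sketch of its two core ingredients: the teacher's weights track an exponential moving average (EMA) of the student's weights, and a consistency loss penalizes disagreement between the two models' predictions. The function names, the flat list-of-floats weight representation, and the `alpha` values are illustrative assumptions, not the DCASE 2020 baseline's actual code or settings.

```python
def ema_update(teacher, student, alpha=0.999):
    """Move each teacher weight toward the student weight by an EMA step.

    alpha is a hypothetical smoothing factor; values close to 1 make the
    teacher change slowly.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher predictions."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# Toy example: the student weights are held fixed and the teacher
# drifts toward them over three EMA steps.
teacher = [0.0, 0.0]
student = [1.0, -1.0]
for _ in range(3):
    teacher = ema_update(teacher, student, alpha=0.5)
```

The EMA factor is one example of a setting that tends to be carried over from one system to the next, which is the kind of default the abstract proposes to re-examine.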

    Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification

    In this paper, we study the usefulness of various matrix factorization methods for learning features to be used for the specific Acoustic Scene Classification (ASC) problem. A common way of addressing ASC has been to engineer features capable of capturing the specificities of acoustic environments. Instead, we show that better representations of the scenes can be automatically learned from time-frequency representations using matrix factorization techniques. We mainly focus on extensions including sparse, kernel-based, convolutive and a novel supervised dictionary learning variant of Principal Component Analysis and Nonnegative Matrix Factorization. An experimental evaluation is performed on two of the largest ASC datasets available in order to compare and discuss the usefulness of these methods for the task. We show that the unsupervised learning methods provide better representations of acoustic scenes than the best conventional hand-crafted features on both datasets. Furthermore, the introduction of a novel nonnegative supervised matrix factorization model and deep neural networks trained on spectrograms allows us to reach further improvements.
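
As a rough illustration of the idea in this abstract, the sketch below factorizes a small nonnegative "spectrogram" V (frequency bins by frames) into a dictionary W and activations H using the standard multiplicative updates for the Frobenius-norm objective. This is plain unsupervised NMF only, not the sparse, kernel-based, convolutive, or supervised variants the paper studies. The toy matrix, the rank, the iteration count, and all function names are assumptions made for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize nonnegative V into W (bins x rank) and H (rank x frames)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


# Toy 4-bin x 6-frame magnitude matrix built from two spectral patterns.
v = [
    [1.0, 0.0, 1.0, 0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0, 0.0, 1.0, 2.0],
    [0.0, 1.0, 0.0, 1.0, 1.0, 0.5],
    [0.0, 1.0, 0.0, 1.0, 1.0, 0.5],
]
w, h = nmf(v, rank=2)
err = frobenius_error(v, w, h)
```

In a feature-learning setup of this kind, each column of H gives the per-frame activations of the learned dictionary atoms, and those activations (possibly pooled over time) would be passed to a classifier in place of hand-crafted features.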

    Characterization of Ambient Noise

    An Air Force sponsor is interested in improving an acoustic detection model by providing better estimates of how to characterize the background noise of various environments. This would inform decision makers on the probability of acoustic detection of different systems of interest given different levels of noise. Data mining and statistical learning techniques are applied to a National Park Service acoustic summary data set to find overall trends over varying environments. Linear regression, conditional inference trees, and random forest techniques are discussed. Findings indicate that only sixteen geospatial variables at different resolutions are necessary to characterize the first ten one-third octave band frequencies of the L90 band using just the linear regression. The accuracy of the regression model is within 2 to 6 decibels and depends on the frequency of interest. This research is the first of its kind to apply multiple linear regression and a conditional inference tree to the National Park Service acoustic dataset for insights on predicting noise levels with dramatically fewer variables than needed in random forest algorithms. Recommended next steps are to supplement the National Park Service dataset with more geographic information system variables from common global databases that are not unique to the United States.