Search CORE

75 research outputs found

Gramophone noise detection and reconstruction using time delay artificial neural networks

Author: Engelbrecht Andries P.
Stallmann Christoph F.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2017
Field of study

Gramophone records were the main recording medium for more than seven decades and regained widespread popularity over the past several years. Being an analog storage medium, gramophone records are subject to distortions caused by scratches, dust particles, degradation, and other means of improper handling. The observed noise often leads to an unpleasant listening experience and requires a filtering process to remove the unwanted disruptions and improve the audio quality. This paper proposes a novel approach that employs various feed forward time delay artificial neural networks to detect and reconstruct noise in musical sound waves. A set of 800 songs from eight different genres were used to validate the performance of the neural networks. The performance was analyzed according to the outlier detection and interpolation accuracy, the computational time and the tradeoff between the accuracy and the time. The empirical results of both detection and reconstruction neural networks were compared to a number of other algorithms, including various statistical measurements, duplication approaches, trigonometric processes, polynomials, and time series models. It was found that the neural networks' outlier detection accuracy was slightly lower than some of the other noise identification algorithms, but achieved a more efficient tradeoff by detecting most of the noise in real time. The reconstruction process favored neural networks with an increase in the interpolation accuracy compared to other widely used time series models. It was also found that certain genres such as classical, country, and jazz music were interpolated more accurately. Volatile signals, such as electronic, metal, and pop music were more challenging to reconstruct and were substantially better interpolated using neural networks than the other examined algorithms.http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6221021hj2017Computer Scienc

Crossref

UPSpace at the University of Pretoria

Development of a Speech Quality Database Under Uncontrolled Conditions

Author: 21st Annual Conference of the International Speech Communication Association (INTERSPEECH 2020)
Benetos E
Hines A
Ragano A
Publication venue
Publication date: 25/10/2020
Field of study

Objective audio quality assessment is preferred to avoid time-consuming and costly listening tests. The development of objective quality metrics depends on the availability of datasets appropriate to the application under study. Currently, a suitable human-annotated dataset for developing quality metrics in archive audio is missing. Given the online availability of archival recordings, we propose to develop a real-world audio quality dataset. We present a methodology used to curate a speech quality database using the archive recordings from the Apollo Space Program. The proposed procedure is based on two steps: a pilot listening test and an exploratory data analysis. The pilot listening test shows that we can extract audio clips through the control of speech-to-text performance metrics to prevent data repetition. Through unsupervised exploratory data analysis, we explore the characteristics of the degradations. We classify distinct degradations and we study spectral, intensity, tonality and overall quality properties of the data through clustering techniques. These results provide the necessary foundation to support the subsequent development of large-scale crowdsourced datasets for audio quality

Queen Mary Research Online

A Multiple Hidden Layers Extreme Learning Machine Method and Its Application

Author: Beijing Li
Dong Xiao
Yachun Mao
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017
Field of study

Crossref

Drawing, Handwriting Processing Analysis: New Advances and Challenges

Author: Anquetil Eric
Prevost Lionel
Rémi Céline
Publication venue: HAL CCSD
Publication date: 21/06/2015
Field of study

International audienceDrawing and handwriting are communicational skills that are fundamental in geopolitical, ideological and technological evolutions of all time. drawingand handwriting are still useful in defining innovative applications in numerous fields. In this regard, researchers have to solve new problems like those related to the manner in which drawing and handwriting become an efficient way to command various connected objects; or to validate graphomotor skills as evident and objective sources of data useful in the study of human beings, their capabilities and their limits from birth to decline

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Recommended from our members

Signal separation of musical instruments: simulation-based methods for musical signal decomposition and transcription

Author: Walmsley Paul Jospeh
Publication venue: University of Cambridge
Publication date: 29/05/2001
Field of study

This thesis presents techniques for the modelling of musical signals, with particular regard to monophonic and polyphonic pitch estimation. Musical signals are modelled as a set of notes, each comprising of a set of harmonically-related sinusoids. An hierarchical model is presented that is very general and applicable to any signal that can be decomposed as the sum of basis functions. Parameter estimation is posed within a Bayesian framework, allowing for the incorporation of prior information about model parameters. The resulting posterior distribution is of variable dimension and so reversible jump MCMC simulation techniques are employed for the parameter estimation task. The extension of the model to time-varying signals with high posterior correlations between model parameters is described. The parameters and hyperparameters of several frames of data are estimated jointly to achieve a more robust detection. A general model for the description of time-varying homogeneous and heterogeneous multiple component signals is developed, and then applied to the analysis of musical signals. The importance of high level musical and perceptual psychological knowledge in the formulation of the model is highlighted, and attention is drawn to the limitation of pure signal processing techniques for dealing with musical signals. Gestalt psychological grouping principles motivate the hierarchical signal model, and component identifiability is considered in terms of perceptual streaming where each component establishes its own context. A major emphasis of this thesis is the practical application of MCMC techniques, which are generally deemed to be too slow for many applications. Through the design of efficient transition kernels highly optimised for harmonic models, and by careful choice of assumptions and approximations, implementations approaching the order of realtime are viable.Engineering and Physical Sciences Research Counci

Apollo (Cambridge)

Probabilistic characterization and synthesis of complex driven systems

Author: Schoner Bernd, 1969-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2000
Field of study

Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2000.Includes bibliographical references (leaves 194-204).Real-world systems that have characteristic input-output patterns but don't provide access to their internal states are as numerous as they are difficult to model. This dissertation introduces a modeling language for estimating and emulating the behavior of such systems given time series data. As a benchmark test, a digital violin is designed from observing the performance of an instrument. Cluster-weighted modeling (CWM), a mixture density estimator around local models, is presented as a framework for function approximation and for the prediction and characterization of nonlinear time series. The general model architecture and estimation algorithm are presented and extended to system characterization tools such as estimator uncertainty, predictor uncertainty and the correlation dimension of the data set. Furthermore a real-time implementation, a Hidden-Markov architecture, and function approximation under constraints are derived within the framework. CWM is then applied in the context of different problems and data sets, leading to architectures such as cluster-weighted classification, cluster-weighted estimation, and cluster-weighted sampling. Each application relies on a specific data representation, specific pre and post-processing algorithms, and a specific hybrid of CWM. The third part of this thesis introduces data-driven modeling of acoustic instruments, a novel technique for audio synthesis. CWM is applied along with new sensor technology and various audio representations to estimate models of violin-family instruments. The approach is demonstrated by synthesizing highly accurate violin sounds given off-line input data as well as cello sounds given real-time input data from a cello player.by Bernd Schoner.Ph.D

DSpace@MIT

Grafting Acoustic Instruments and Signal Processing: Creative Control and Augmented Expressivity

Author: Freed Adrian
Overholt Daniel
Publication venue
Publication date: 01/01/2013
Field of study

VBN

Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods

Author: Evers Christine
Publication venue: The University of Edinburgh
Publication date: 01/01/2010
Field of study

Speech signals radiated in confined spaces are subject to reverberation due to reflections of surrounding walls and obstacles. Reverberation leads to severe degradation of speech intelligibility and can be prohibitive for applications where speech is digitally recorded, such as audio conferencing or hearing aids. Dereverberation of speech is therefore an important field in speech enhancement. Driven by consumer demand, blind speech dereverberation has become a popular field in the research community and has led to many interesting approaches in the literature. However, most existing methods are dictated by their underlying models and hence suffer from assumptions that constrain the approaches to specific subproblems of blind speech dereverberation. For example, many approaches limit the dereverberation to voiced speech sounds, leading to poor results for unvoiced speech. Few approaches tackle single-sensor blind speech dereverberation, and only a very limited subset allows for dereverberation of speech from moving speakers. Therefore, the aim of this dissertation is the development of a flexible and extendible framework for blind speech dereverberation accommodating different speech sound types, single- or multiple sensor as well as stationary and moving speakers. Bayesian methods benefit from – rather than being dictated by – appropriate model choices. Therefore, the problem of blind speech dereverberation is considered from a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach accommodating a multitude of models for the speech production mechanism and room transfer function is consequently derived. In this approach both the anechoic source signal and reverberant channel are estimated using their optimal estimators by means of Rao-Blackwellisation of the state-space of unknown variables. The remaining model parameters are estimated using sequential importance resampling. The proposed approach is implemented for two different speech production models for stationary speakers, demonstrating substantial reduction in reverberation for both unvoiced and voiced speech sounds. Furthermore, the channel model is extended to facilitate blind dereverberation of speech from moving speakers. Due to the structure of measurement model, single- as well as multi-microphone processing is facilitated, accommodating physically constrained scenarios where only a single sensor can be used as well as allowing for the exploitation of spatial diversity in scenarios where the physical size of microphone arrays is of no concern. This dissertation is concluded with a survey of possible directions for future research, including the use of switching Markov source models, joint target tracking and enhancement, as well as an extension to subband processing for improved computational efficiency

Edinburgh Research Archive

Exploring visual representation of sound in computer music software through programming and composition

Author: Freeman Samuel David
Publication venue
Publication date
Field of study

Presented through contextualisation of the portfolio works are developments of a practice in which the acts of programming and composition are intrinsically connected. This practice-based research (conducted 2009–2013) explores visual representation of sound in computer music software. Towards greater understanding of composing with the software medium, initial questions are taken as stimulus to explore the subject through artistic practice and critical thinking. The project begins by asking: How might the ways in which sound is visually represented influence the choices that are made while those representations are being manipulated and organised as music? Which aspects of sound are represented visually, and how are those aspects shown? Recognising sound as a psychophysical phenomenon, the physical and psychological aspects of aesthetic interest to my work are identified. Technological factors of mediating these aspects for the interactive visual-domain of software are considered, and a techno-aesthetic understanding developed. Through compositional studies of different approaches to the problem of looking at sound in software, on screen, a number of conceptual themes emerge in this work: the idea of software as substance, both as a malleable material (such as in live coding), and in terms of outcome artefacts; the direct mapping between audio data and screen pixels; the use of colour that maintains awareness of its discrete (as opposed to continuous) basis; the need for integrated display of parameter controls with their target data; and the tildegraph concept that began as a conceptual model of a gramophone and which is a spatio-visual sound synthesis technique related to wave terrain synthesis. The spiroid-frequency-space representation is introduced, contextualised, and combined both with those themes and a bespoke geometrical drawing system (named thisis), to create a new modular computer music software environment named sdfsys

University of Huddersfield Repository

Advanced Information Systems and Technologies

Author
Publication venue: 'Sumy State University'
Publication date: 01/01/2017
Field of study

This book comprises the proceedings of the V International Scientific Conference "Advanced Information Systems and Technologies, AIST-2017". The proceeding papers cover issues related to system analysis and modeling, project management, information system engineering, intelligent data processing computer networking and telecomunications. They will be useful for students, graduate students, researchers who interested in computer science

Electronic Sumy State University Institutional Repository