4 research outputs found

    Towards the automatic assessment of spatial quality in the reproduced sound environment

    The research in this thesis describes the creation and development of a method for the prediction of perceived spatial quality. The QESTRAL (Quality Evaluation of Spatial Transmission and Reproduction using an Artificial Listener) model is an objective evaluation model capable of accurately predicting changes to perceived spatial quality. It uses probe signals and a set of objective metrics to measure changes to low-level spatial attributes. A polynomial weighting function derived from regression analysis is used to predict data from listening tests, which employed spatial audio processes (SAPs) proven to stress those low-level attributes. A listening test method was developed for collecting listener judgements of impairments to spatial quality. This involved the creation of a novel test interface to reduce the biases inherent in other similar audio quality assessment tests. Pilot studies were undertaken which established the suitability of the method. Two large scale listening tests were conducted using 31 Tonmeister students from the Institute of Sound Recording (IoSR), University of Surrey. These tests evaluated 48 different SAPs, typically encountered in consumer sound reproduction equipment, when applied to 6 types of programme material. The tests were conducted at two listening positions to determine how perceived spatial quality was changed. Analysis of the data collected from these listening tests showed that the SAPs created a diverse range of judgements that spanned the range of the spatial quality test scale and that listening position, programme material type and listener each had a statistically significant influence upon perceived spatial quality. These factors were incorporated into a database of 308 responses used to calibrate the model. The model was calibrated using partial least-squares regression using target specifications similar to those of audio quality models created by other researchers. 
This resulted in five objective metrics being selected for use in the model. A method of post-correction using an exponential equation was used to reduce non-linearity in the predicted results, thought to be caused by the inability of some metrics to scrutinise the highest quality SAPs. The resulting model had a correlation (r) of 0.89 and an error (RMSE) of 11.06% and performs similarly to models developed by other researchers. Statistical analysis also indicated that the model would generalise to a larger population of listeners.
    EThOS - Electronic Theses Online Service, United Kingdom

    Parametric stereo extension of ITU-T G.722 based on a new downmixing scheme

    In this paper, we present a novel frequency-domain stereo-to-mono downmixing technique, which preserves the energy of spectral components and avoids setting the left or right channel as a phase reference. Based on this downmixing technique, a parametric stereo analysis-synthesis model is described in which subband stereo parameters consist of interchannel level differences and phase differences between the mono signal and one of the stereo channels (left or right). This model is applied to the stereo extension of ITU-T G.722 at 56+8 and 64+16 kbit/s with a frame length of 5 ms. AB test results are provided to assess the quality of the proposed downmixing technique. In addition, the quality of the proposed G.722-based stereo coder is compared against reference coders (G.722.1 at 24 and 32 kbit/s dual mono and G.722 at 64 kbit/s dual mono) for clean speech, noisy speech and music.

    Evaluation and modelling of perceived audio quality in popular music, towards intelligent music production

    This thesis addresses three fundamental questions: What is mixing? What makes a high-quality mix? How can high-quality mixes be automatically generated? While these may seem essential to the very foundations of intelligent music production, this thesis argues that they have not been sufficiently addressed in previous studies. An important contribution is the questioning of previously-held definitions of a 'mix'. Experiments were conducted in which participants used traditional mixing interfaces to create mixes using gain, panning and equalisation. The data was analysed in a novel 'mix-space', 'panning-space' and 'tone-space' in order to determine if there is a consensus in how these tools are used. Methods were developed to create mixes by populating the mix-space according to parametric models. These mixes were characterised by signal features, the distributions of which suggest tolerance bounds for automated mixing systems. This was complemented by a study of real-world music mixes, containing hundreds of mixes each for ten songs, collected from on-line communities. Mixes were shown to vary along four dimensions: loudness/dynamics, brightness, bass and stereo width. The variations between individual mix engineers were also studied, indicating a small effect of the mix engineer on mix preference ratings (η² = 0.021). Perceptual audio evaluation revealed that listeners appreciate 'quality' in a variety of ways, depending on the circumstances. In commercially-released music, 'quality' was related to the loudness/dynamics dimension. In mixes, 'quality' is highly correlated with 'preference'. To create mixes which maximised perceived quality, a novel semi-automatic mixing system was developed using evolutionary computation, wherein a population of mixes, generated in the mix-space, is guided by the subjective evaluations of the listener. 
This system was evaluated by a panel of users, who used it to create their ideal mixes, rather than the technically-correct mixes which previous systems strove for. It is hoped that this thesis encourages the community to pursue subjectively motivated methods when designing systems for music mixing.