Distortions of Subjective Time Perception Within and Across Senses
Background: The ability to estimate the passage of time is of fundamental importance for perceptual and cognitive processes. One experience of time is the perception of duration, which is not isomorphic to physical duration and can be distorted by a number of factors. Yet, the critical features generating these perceptual shifts in subjective duration are not understood.
Methodology/Findings: We used prospective duration judgments within and across sensory modalities to examine the effect of stimulus predictability and feature change on the perception of duration. First, we found robust distortions of perceived duration in auditory, visual and auditory-visual presentations despite the predictability of the feature changes in the stimuli. For example, a looming disc embedded in a series of steady discs led to time dilation, whereas a steady disc embedded in a series of looming discs led to time compression. Second, we addressed whether visual (auditory) inputs could alter the perceived duration of auditory (visual) inputs. When participants were presented with incongruent audio-visual stimuli, the perceived duration of auditory events could be shortened or lengthened by the presence of conflicting visual information; by contrast, the perceived duration of visual events was seldom distorted by the presence of auditory information, and visual events were never perceived as shorter than their actual durations.
Conclusions/Significance: These results support the existence of multisensory interactions in the perception of duration and, importantly, suggest that vision can modify auditory temporal perception in a pure timing task. Because distortions in subjective duration cannot be accounted for by the unpredictability of an auditory, visual or auditory-visual event, we propose that it is the intrinsic features of the stimulus that critically affect subjective time distortions.
Categorisation of distortion profiles in relation to audio quality
Since digital audio is encoded as discrete samples of the audio waveform, much can be said about a recording from the statistical properties of these samples. In this paper, a dataset of CD audio samples is analysed; the probability mass function of each audio clip informs a feature set which describes attributes of the musical recording related to loudness, dynamics and distortion. This allows musical recordings to be classified according to their "distortion character", a concept which describes the nature of amplitude distortion in mastered audio. A subjective test was designed in which such recordings were rated according to the perception of their audio quality. It is shown that participants can discern between three different distortion characters; ratings of audio quality were significantly different (F(1, 2) = 5.72, p < 0.001, η² = 0.008), as were the words used to describe the attributes on which quality was assessed (χ²(8, N = 547) = 33.28, p < 0.001). This expands upon previous work showing links between the effects of dynamic range compression and audio quality in musical recordings, by highlighting perceptual differences.
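The sample-statistics idea above can be sketched in a few lines: histogram a clip's sample values into a probability mass function and read off simple descriptors of loudness, dynamics and distortion. The function name, bin count and feature set below are illustrative assumptions, not the paper's actual feature set; hard clipping shows up as probability mass piling into a few amplitude bins and as a reduced crest factor.

```python
import numpy as np

def pmf_features(samples, n_bins=4096):
    """Derive rough loudness/dynamics/distortion descriptors from a
    clip's sample probability mass function (illustrative sketch)."""
    x = np.asarray(samples, dtype=float)
    hist, _ = np.histogram(x, bins=n_bins, range=(-1.0, 1.0))
    pmf = hist / hist.sum()
    rms = np.sqrt(np.mean(x ** 2))       # loudness proxy
    crest = np.max(np.abs(x)) / rms      # dynamics proxy (crest factor)
    peak_mass = pmf.max()                # clipping piles mass into few bins
    return {"rms": rms, "crest": crest, "peak_mass": peak_mass}

# A clean sine versus a heavily clipped copy of the same material
t = np.linspace(0, 1, 44100, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
clipped = np.clip(3.0 * clean, -0.9, 0.9)
f_clean, f_clip = pmf_features(clean), pmf_features(clipped)
```

The clipped version shows higher RMS, a lower crest factor, and a far larger maximum PMF value, which is the kind of signature a "distortion character" classifier could exploit.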
Effects of sound-induced hearing loss and hearing aids on the perception of music
This is the final version of the article. It first appeared from the Audio Engineering Society via https://doi.org/10.17743/jaes.2015.0081. Exposure to high-level music produces several physiological changes in the auditory system that lead to a variety of perceptual effects. Damage to the outer hair cells within the cochlea leads to a loss of sensitivity to weak sounds, loudness recruitment (a more rapid than normal growth of loudness with increasing sound level) and reduced frequency selectivity. Damage to inner hair cells and/or synapses leads to degeneration of neurons in the auditory nerve and to a reduced flow of information to the brain. This leads to poorer auditory discrimination and may contribute to reduced sensitivity to the temporal fine structure of sounds and to poor pitch perception. Hearing aids compensate for the effects of threshold elevation and loudness recruitment via multi-channel amplitude compression, but they do not compensate for reduced frequency selectivity or loss of inner hair cells/synapses/neurons. Multi-channel compression can impair some aspects of the perception of music, such as the ability to hear out one instrument or voice from a mixture. The limited frequency range and irregular frequency response of most hearing aids is associated with poor sound quality for music. Finally, systems for reducing acoustic feedback can have undesirable side effects when listening to music. This work was supported by the Medical Research Council (UK, grant number G0701870), Action on Hearing Loss, Phonak, and Starkey.
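The multi-channel amplitude compression mentioned above can be sketched as follows: split the spectrum into bands, measure each band's level, and apply a static compressive gain above a threshold. The band edges, threshold and ratio are hypothetical round numbers for illustration; clinical fittings use many more bands, time-varying envelopes and prescribed gain rules.

```python
import numpy as np

def multiband_compress(x, fs, edges=(0, 500, 2000, 8000),
                       ratio=2.0, t_db=-30.0):
    """Toy multi-channel amplitude compression: FFT band split, static
    compressive gain per band above threshold, then resynthesis."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.fft.irfft(np.where((freqs >= lo) & (freqs < hi), X, 0),
                            n=len(x))
        level_db = 20 * np.log10(np.sqrt(np.mean(band ** 2)) + 1e-12)
        if level_db > t_db:
            # Above threshold, output grows 1/ratio dB per input dB
            gain_db = (t_db + (level_db - t_db) / ratio) - level_db
        else:
            gain_db = 0.0
        out += band * 10 ** (gain_db / 20)
    return out

fs = 16000
t = np.arange(fs) / fs
loud = 0.5 * np.sin(2 * np.pi * 1000 * t)   # well above threshold
out = multiband_compress(loud, fs)          # attenuated by compression
```

A loud band is turned down while sub-threshold material passes unchanged, which is how compression restores audibility of weak sounds without making intense sounds uncomfortably loud for a recruiting ear.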
Scalable and perceptual audio compression
This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable-to-lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform-matching manner as well as a psychoacoustic manner. In order to measure the psychoacoustic scalability of the systems investigated in this thesis, the similarity between the original signal's psychoacoustic parameters and those of the synthesized signal is measured. The psychoacoustic parameters used are loudness, sharpness, tonality and roughness. This analysis technique is a novel method used in this thesis and it allows an insight into the perceptual distortion that has been introduced by any coder analyzed in this manner.
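The original-versus-synthesized comparison described above can be illustrated with crude stand-ins: RMS for loudness and spectral centroid for sharpness. These are not the Zwicker-style loudness, sharpness, tonality and roughness models the thesis uses; the function and its error metrics are assumptions that only sketch the comparison idea.

```python
import numpy as np

def perceptual_similarity(orig, coded, fs):
    """Compare rough psychoacoustic proxies of an original signal and a
    coded/synthesized version: RMS as a loudness stand-in, spectral
    centroid as a sharpness stand-in (illustrative only)."""
    def features(x):
        mag = np.abs(np.fft.rfft(x))
        freqs = np.fft.rfftfreq(len(x), 1 / fs)
        centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
        return np.sqrt(np.mean(x ** 2)), centroid
    (l0, c0), (l1, c1) = features(orig), features(coded)
    return {"loudness_err": abs(l1 - l0) / (l0 + 1e-12),
            "sharpness_err": abs(c1 - c0) / (c0 + 1e-12)}

fs = 16000
t = np.arange(fs) / fs
orig = 0.3 * np.sin(2 * np.pi * 500 * t) + 0.3 * np.sin(2 * np.pi * 4000 * t)
coded = 0.3 * np.sin(2 * np.pi * 500 * t)   # a "coder" that dropped the high band
errs = perceptual_similarity(orig, coded, fs)
```

Dropping high-frequency content leaves the loudness proxy roughly intact but shifts the sharpness proxy sharply, the kind of perceptual distortion a waveform-matching metric alone would under-report.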
Intelligent Tools for Multitrack Frequency and Dynamics Processing
PhD thesis. This research explores the possibility of reproducing the mixing decisions of a skilled audio engineer with minimal human interaction so as to improve the overall listening experience of musical mixtures, i.e., intelligent mixing. By producing a balanced mix automatically, musicians and mixing engineers can focus on their creativity while the productivity of music production is increased. We focus on two essential aspects of such a system: frequency and dynamics. This thesis presents an intelligent strategy for multitrack frequency and dynamics processing that exploits the interdependence of input audio features, incorporates best practices in audio engineering, and is driven by perceptual models and subjective criteria. The intelligent frequency processing research begins with a spectral characteristic analysis of commercial recordings, where we discover a consistent leaning towards a target equalization spectrum. A novel approach for automatically equalizing audio signals towards the observed target spectrum is then described and evaluated. We proceed to dynamics processing and introduce an intelligent multitrack dynamic range compression algorithm, in which various audio features are proposed and validated to better describe the transient nature and spectral content of the signals. An experiment to investigate human preferences in dynamics processing is described to inform our choices of parameter automation. To provide a perceptual basis for the intelligent system, we evaluate existing perceptual models and propose several masking metrics to quantify the masking behaviour within the multitrack mixture. Ultimately, we integrate the preceding research on auditory masking, frequency and dynamics processing into one intelligent system of mix optimization that replicates the iterative process of human mixing. Within the system, we explore the relationship between equalization and dynamics processing, and propose a general frequency and dynamics processing framework. Various implementations of the intelligent system are explored and evaluated objectively and subjectively through listening experiments. China Scholarship Council
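The automatic equalization toward a target spectrum described above can be sketched as a frequency-domain gain stage: measure each band's level and apply the bounded gain that moves it to the band's target. The band layout, flat target and gain limit below are hypothetical; the thesis derives its target spectrum from analysis of commercial recordings.

```python
import numpy as np

def match_target_spectrum(x, fs, target_db, max_gain_db=12.0):
    """Equalize a signal toward a per-band target spectrum by measuring
    each band's level and applying a bounded corrective gain (sketch)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    edges = np.linspace(0, fs / 2, len(target_db) + 1)
    gains = np.ones(len(X))
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        idx = (freqs >= lo) & (freqs < hi)
        level = 20 * np.log10(np.sqrt(np.mean(np.abs(X[idx]) ** 2)) + 1e-12)
        g_db = np.clip(target_db[i] - level, -max_gain_db, max_gain_db)
        gains[idx] = 10 ** (g_db / 20)
    return np.fft.irfft(X * gains, n=len(x))

fs = 16000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 1500 * t)   # all energy in the 1-2 kHz band
target = [40.0] * 8                      # flat hypothetical target (dB)
y = match_target_spectrum(x, fs, target)
```

After processing, the occupied band sits at its target level; a practical equalizer would smooth the gains across band edges to avoid audible discontinuities.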
Tracking cortical entrainment in neural activity: auditory processes in human temporal cortex.
A primary objective for cognitive neuroscience is to identify how features of the sensory environment are encoded in neural activity. Current auditory models of loudness perception can be used to make detailed predictions about the neural activity of the cortex as an individual listens to speech. We used two such models (loudness-sones and loudness-phons), varying in their psychophysiological realism, to predict the instantaneous loudness contours produced by 480 isolated words. These two sets of 480 contours were used to search for electrophysiological evidence of loudness processing in whole-brain recordings of electro- and magneto-encephalographic (EMEG) activity, recorded while subjects listened to the words. The technique identified a bilateral sequence of loudness processes, predicted by the more realistic loudness-sones model, that begin in auditory cortex at ~80 ms and subsequently reappear, tracking progressively down the superior temporal sulcus (STS) at lags from 230 to 330 ms. The technique was then extended to search for regions sensitive to the fundamental frequency (F0) of the voiced parts of the speech. It identified a bilateral F0 process in auditory cortex at a lag of ~90 ms, which was not followed by activity in STS. The results suggest that loudness information is being used to guide the analysis of the speech stream as it proceeds beyond auditory cortex down STS toward the temporal pole. This work was supported by an EPSRC grant to William D. Marslen-Wilson and Paula Buttery (EP/F030061/1), an ERC Advanced Grant (Neurolex) to William D. Marslen-Wilson, and by MRC Cognition and Brain Sciences Unit (CBU) funding to William D. Marslen-Wilson (U.1055.04.002.00001.01). Computing resources were provided by the MRC-CBU and the University of Cambridge High Performance Computing Service (http://www.hpc.cam.ac.uk/). Andrew Liu and Phil Woodland helped with the HTK speech recogniser and Russell Thompson with the Matlab code. We thank Asaf Bachrach, Cai Wingfield, Isma Zulfiqar, Alex Woolgar, Jonathan Peelle, Li Su, Caroline Whiting, Olaf Hauk, Matt Davis, Niko Kriegeskorte, Paul Wright, Lorraine Tyler, Rhodri Cusack, Brian Moore, Brian Glasberg, Rik Henson, Howard Bowman, Hideki Kawahara, and Matti Stenroos for invaluable support and suggestions. This is the final published version. The article was originally published in Frontiers in Computational Neuroscience, 10 February 2015 | doi: 10.3389/fncom.2015.0000
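The lagged search described above (matching a model-predicted loudness contour against neural recordings at a range of latencies) can be illustrated with a toy version: slide the predictor along a recorded channel and report the best-correlating lag. The data here are synthetic and the function is an assumption; the actual study uses spatiotemporal searchlight statistics over whole-brain EMEG source estimates.

```python
import numpy as np

def best_lag(predictor, response, max_lag):
    """Return the lag (in samples) at which a stimulus-derived predictor
    (e.g. an instantaneous-loudness contour) best correlates with a
    recorded channel, plus that correlation (toy lagged search)."""
    n = len(response) - max_lag              # fixed comparison window
    p = predictor[:n]
    scores = [np.corrcoef(p, response[lag:lag + n])[0, 1]
              for lag in range(max_lag + 1)]
    i = int(np.argmax(scores))
    return i, scores[i]

rng = np.random.default_rng(0)
loudness = rng.standard_normal(2000)         # stand-in loudness contour
neural = np.zeros(2100)
neural[80:2080] = loudness                   # response delayed by 80 samples
neural += 0.5 * rng.standard_normal(2100)    # measurement noise
lag, r = best_lag(loudness, neural, max_lag=200)
```

Despite heavy noise, the correlation peaks at the true 80-sample delay, mirroring how the paper localizes loudness processes at ~80 ms and at later STS lags.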
Applications of loudness models in audio engineering
This thesis investigates the application of perceptual models to areas of audio engineering, with a particular focus on music production. The goal was to establish efficient and practical tools for the measurement and control of the perceived loudness of musical sounds. Two types of loudness model were investigated: the single-band model and the multiband excitation pattern (EP) model. The heuristic single-band devices were designed to be simple but sufficiently effective for real-world application, whereas the multiband procedures were developed to give a reasonable account of a large body of psychoacoustic findings according to a functional model of the peripheral hearing system. The research addresses the extent to which current models of loudness generalise to musical instruments, and whether they can be successfully employed in music applications. The domain-specific disparity between the two types of model was first tackled by reducing the computational load of state-of-the-art EP models to allow for fast but low-error auditory signal processing. Two elaborate hearing models were analysed and optimised using musical instruments and speech as test stimuli. It was shown that, after significantly reducing the complexity of both procedures, estimates of global loudness, such as peak loudness, as well as the intermediate auditory representations can be preserved with high accuracy. Based on the optimisations, two real-time applications were developed: a binaural loudness meter and an automatic multitrack mixer. This second system was designed to work independently of the loudness measurement procedure, and therefore supports both linear and nonlinear models. This allowed a single mixing device to be assessed using different loudness metrics, and this was demonstrated by evaluating three configurations through subjective assessment.
Unexpectedly, when asked to rate both the overall quality of a mix and the degree to which instruments were equally loud, listeners preferred mixes generated using heuristic single-band models over those produced using a multiband procedure. A series of more systematic listening tests were conducted to further investigate this finding. Subjective loudness matches of musical instruments commonly found in western popular music were collected to evaluate the performance of five published models. The results were in accord with the application-based assessment, namely that current EP procedures do not generalise well when estimating the relative loudness of musical sounds which have marked differences in spectral content. Model-specific issues were identified relating to the calculation of spectral loudness summation (SLS) and the method used to determine the global-loudness percept of time-varying musical sounds; associated refinements were proposed. It was shown that a new multiband loudness model with a heuristic loudness transformation yields superior performance over existing methods. This supports the idea that a revised model of SLS is needed, and therefore that modification to this stage in existing psychoacoustic procedures is an essential step towards the goal of achieving real-world deployment.
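The automatic multitrack mixer's core operation (bringing every track to a common perceived-loudness level, regardless of which loudness model supplies the measurement) can be sketched with a model-agnostic gain stage. Plain RMS stands in here for the loudness metric; the thesis plugs in single-band heuristics or EP models instead, and the function and target level below are illustrative assumptions.

```python
import numpy as np

def balance_tracks(tracks, target_rms=0.1):
    """Scale each track so a pluggable loudness proxy (plain RMS in this
    sketch) sits at a common target level, model-agnostic by design."""
    out = []
    for x in tracks:
        rms = np.sqrt(np.mean(np.asarray(x, dtype=float) ** 2))
        out.append(x * (target_rms / (rms + 1e-12)))
    return out

t = np.linspace(0, 1, 8000, endpoint=False)
vocal = 0.8 * np.sin(2 * np.pi * 220 * t)   # loud track
bass = 0.05 * np.sin(2 * np.pi * 55 * t)    # quiet track
mixed = balance_tracks([vocal, bass])
```

Because the loudness measurement is a separable step, swapping RMS for a single-band heuristic or an EP model changes only the metric, not the mixer, which is exactly how the thesis compares configurations under one mixing device.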