
    Hybrid Multiresolution Analysis Of ‘Punch’ In Musical Signals

    This paper presents a hybrid multi-resolution technique for the extraction and measurement of attributes contained within a musical signal. Decomposing music into simpler percussive, harmonic and noise components is useful when detailed extraction of signal attributes is required. The key parameter of interest in this paper is punch. A methodology is explored that decomposes the musical signal using a critically sampled constant-Q filterbank of quadrature mirror filters (QMF) followed by adaptive windowed short-time Fourier transforms (STFT). The proposed hybrid method offers accuracy in both the time and frequency domains. Following the decomposition, the attributes of the resulting components are analyzed, and it is shown that this analysis may yield parameters of use in mixing/mastering as well as in audio transcription and retrieval.
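    As a rough, hedged sketch of the kind of analysis chain described above, the following Python code splits a signal into octave subbands with a small QMF tree (a simple stand-in for a critically sampled constant-Q QMF filterbank) and then applies an STFT to each subband with a per-band window length. The filter length, tree depth and window sizes are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from scipy.signal import firwin, lfilter, stft

def qmf_split(x, h0):
    """One QMF analysis stage: lowpass/highpass filtering, then decimation by 2."""
    h1 = h0 * (-1.0) ** np.arange(len(h0))   # highpass mirror of the lowpass prototype
    return lfilter(h0, 1.0, x)[::2], lfilter(h1, 1.0, x)[::2]

def hybrid_analysis(x, fs, levels=4, nfft=512):
    """Octave-band QMF tree; each retained subband is then analysed with an STFT."""
    h0 = firwin(63, 0.5)                      # half-band lowpass prototype (assumed length)
    bands, rates = [], []
    for _ in range(levels):
        low, high = qmf_split(x, h0)
        fs /= 2.0
        bands.append(high)                    # keep the highpass branch at this level
        rates.append(fs)
        x = low                               # continue splitting the lowpass branch
    bands.append(x)                           # residual lowpass band
    rates.append(fs)
    spectra = []
    for band, rate in zip(bands, rates):
        nper = min(nfft, max(64, len(band) // 8))   # crude per-band window adaptation
        spectra.append(stft(band, fs=rate, nperseg=nper))
    return bands, spectra
```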

    Speech Decomposition and Enhancement

    The goal of this study is to investigate the roles of steady-state speech sounds and of the transitions between these sounds in the intelligibility of speech. The motivation for this approach is that the auditory system may be particularly sensitive to time-varying frequency edges, which in speech are produced primarily by transitions between vowels and consonants and within vowels. The possibility that selectively amplifying these edges may enhance speech intelligibility is examined. Computer algorithms were developed to decompose speech into two components. The first, defined as the tonal component, was intended to contain predominantly formant activity. The second, defined as the non-tonal component, was intended to contain predominantly the transitions between and within formants.

    The decomposition uses a set of time-varying filters whose center frequencies and bandwidths are controlled to track the strongest formant components in the speech. Each center frequency and bandwidth is estimated from the FM and AM information of the corresponding formant component. The tonal component is the sum of the filter outputs; the non-tonal component is the difference between the original speech signal and the tonal component.

    The relative energy and intelligibility of the tonal and non-tonal components were compared to those of the original speech, with psychoacoustic growth functions used to assess intelligibility. Most of the speech energy was in the tonal component, yet this component had a significantly lower maximum word recognition than either the original speech or the non-tonal component. The non-tonal component contained on average only 2% of the original speech energy, yet its maximum word recognition was almost equal to that of the original speech. The non-tonal component was then amplified and recombined with the original speech to generate enhanced speech, whose energy was adjusted to equal that of the original. The intelligibility of the enhanced speech was compared to that of the original speech in background noise. The enhanced speech showed significantly higher recognition scores at lower SNRs, while the two were similar at higher SNRs. These results suggest that amplifying transient information can enhance speech in noise, and that the enhancement is most effective under severe noise conditions.
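    A minimal sketch of the tonal/non-tonal split and the transient-emphasis enhancement follows. Instead of the study's AM/FM-driven time-varying formant filters, this illustration crudely keeps the strongest spectral peaks per STFT frame as the "tonal" part; the frame size, number of retained peaks and boost gain are assumptions for demonstration only.

```python
import numpy as np
from scipy.signal import stft, istft

def decompose(x, fs, n_peaks=4, nperseg=512):
    """Crude tonal / non-tonal split: keep the strongest bins per frame as 'tonal'."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    mask = np.zeros_like(Z, dtype=bool)
    for j in range(Z.shape[1]):
        strongest = np.argsort(np.abs(Z[:, j]))[-n_peaks:]   # formant-peak proxy
        mask[strongest, j] = True
    _, tonal = istft(Z * mask, fs=fs, nperseg=nperseg)
    tonal = np.pad(tonal, (0, max(0, len(x) - len(tonal))))[: len(x)]
    non_tonal = x - tonal                  # transitions / transient residue
    return tonal, non_tonal

def enhance(x, fs, boost=3.0):
    """Amplify the non-tonal component, then rescale to the original signal energy."""
    _, non_tonal = decompose(x, fs)
    y = x + boost * non_tonal
    return y * np.sqrt(np.sum(x ** 2) / np.sum(y ** 2))
```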

    EFFICACY OF THREE BACKWARD MASKING SIGNALS

    Increased backward masking has been correlated with Auditory Processing Disorders (APD). An efficacious test of the backward-masking function that is suitable for naïve listeners could have clinical utility in diagnosing APDs. To determine an appropriate probe for such a test, three 20-ms signal types were compared for ease of task, with response times (RTs) taken as a proxy for ease of task. Seven participants used a method of adjustment to track threshold in the presence of a 50-ms broadband Gaussian-noise backward masker. The signal types yielded two comparisons: a linear rise-fall on a 1000-Hz sine wave versus a "chirp" (750 Hz to 4000 Hz), and a linear rise-fall versus a Blackman gating function on a 1000-Hz sine wave. The results suggest that signal type is a significant factor in participant response time and, hence, confidence. Moreover, the contribution of signal type to RT is not confounded by any potential interaction terms, such as inter-stimulus interval (ISI). The signal type that yielded the quickest RTs across all participants, ISIs, and intensity levels was the 20-ms, 1000-Hz sine wave fitted with a trapezoidal (linear rise-fall) gating function. This may be the most efficacious signal type to serve as a probe in a clinical test of backward masking.
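    For illustration, the snippet below generates stimuli of the type described above: a 20-ms, 1000 Hz tone gated either with a linear (trapezoidal) rise-fall or with a Blackman window, followed after an inter-stimulus interval by a 50-ms broadband Gaussian-noise backward masker. The sample rate and the 5-ms rise/fall time are assumptions, not values taken from the study.

```python
import numpy as np

FS = 44100  # assumed sample rate (Hz)

def gated_tone(freq=1000.0, dur=0.020, gate="linear", rise=0.005):
    """20-ms tone with either a trapezoidal (linear rise-fall) or Blackman envelope."""
    n = int(FS * dur)
    tone = np.sin(2 * np.pi * freq * np.arange(n) / FS)
    if gate == "linear":                      # trapezoidal gating function
        nr = int(FS * rise)
        env = np.ones(n)
        env[:nr] = np.linspace(0.0, 1.0, nr)
        env[-nr:] = np.linspace(1.0, 0.0, nr)
    else:                                     # Blackman gating function
        env = np.blackman(n)
    return tone * env

def backward_masking_trial(isi=0.010, masker_dur=0.050, gate="linear"):
    """Probe, silent inter-stimulus interval, then the backward masker."""
    probe = gated_tone(gate=gate)
    gap = np.zeros(int(FS * isi))
    masker = np.random.randn(int(FS * masker_dur))   # broadband Gaussian noise
    return np.concatenate([probe, gap, masker])
```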

    Hearing VS. Listening: Attention Changes the Neural Representations of Auditory Percepts

    Making sense of acoustic environments is a challenging task. At any moment, the signals from distinct auditory sources arrive at the ear simultaneously, forming an acoustic mixture. The brain must represent distinct auditory objects in this complex scene and prioritize the processing of relevant stimuli while maintaining the capability to react quickly to unexpected events. The present studies explore neural representations of temporal modulations and the effects of attention on these representations. Temporal modulation plays a significant role in speech perception and auditory scene analysis, so uncovering how temporal modulations are processed and represented is potentially of great importance for our general understanding of the auditory system. Neural representations of compound modulations were investigated using magnetoencephalography (MEG). Interaction components are generated by nearby rather than distant modulation rhythms, suggesting band-limited modulation filter banks operating in the central stage of the auditory system. Furthermore, the slowest detectable neural oscillation in the auditory cortex corresponds to the perceived oscillation of the auditory percept. Interactions between stimulus-evoked and goal-related neural responses were investigated in simultaneous behavioral-neurophysiological studies, in which subjects' attention was manipulated between different components of an auditory scene. The experimental results reveal that attention to the target correlates with a sustained increase in the neural target representation, beyond the well-known transient effects. The enhancement of power and phase coherence presumably reflects increased local and global synchronization in the brain. Furthermore, the target's perceptual detectability improves over time (several seconds), correlating strongly with the neural buildup of the target representation. The change in cortical representations can be reversed on a short time scale (several minutes) by various behavioral goals. These results demonstrate that the neural representation of the percept is encoded using the feature-driven mechanisms of sensory cortex but is shaped in a sustained manner via attention-driven projections from higher-level areas. These adaptive neural representations occur on multiple time scales (seconds vs. minutes) and multiple spatial scales (local vs. global synchronization). Such multiple resolutions of adaptation may underlie general mechanisms of scene organization in any sensory modality and may contribute to our highly adaptive behaviors.
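    As a conceptual illustration of the "interaction components" mentioned above (not the study's MEG analysis), the snippet below builds an envelope containing two modulation rates f1 and f2 and passes it through a compressive nonlinearity, a crude stand-in for neural processing; the output spectrum then shows energy at f1 − f2 and f1 + f2 in addition to the original rates. The rates, modulation depths and power-law exponent are assumptions.

```python
import numpy as np

fs, dur = 1000.0, 10.0                   # envelope sample rate (Hz) and duration (s)
t = np.arange(0.0, dur, 1.0 / fs)
f1, f2 = 37.0, 41.0                      # nearby modulation rates (Hz)
env = 1.0 + 0.4 * np.cos(2 * np.pi * f1 * t) + 0.4 * np.cos(2 * np.pi * f2 * t)
compressed = env ** 0.3                  # compressive nonlinearity creates intermodulation

spec = np.abs(np.fft.rfft(compressed - compressed.mean()))
freqs = np.fft.rfftfreq(len(t), 1.0 / fs)
for target in (f2 - f1, f1, f2, f1 + f2):
    k = np.argmin(np.abs(freqs - target))        # nearest FFT bin to each component
    print(f"{target:6.1f} Hz -> relative level {spec[k] / spec.max():.3f}")
```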

    Subjective evaluation of the audio bandwidth extension program MaxxBass; Lab and web based listening tests monitoring the effect on single and mixed music tracks.

    The following thesis presents a subjective evaluation of the audio bandwidth-extension algorithms available in the MaxxBass program. The thesis explains the principles of low-frequency perception and of low-frequency synthesis algorithms. The listening tests were conducted using portable devices, both under laboratory conditions and through a website. The recordings consist of mixed original and processed tracks, and the perception of the processed recordings was evaluated by a diverse group of listeners. The test results indicate the factors that affect differences in the evaluations. Moreover, this thesis presents the functional and non-functional requirements necessary to prepare and carry out the testing process. The analyzed results indicate that the original recordings are generally rated higher, although individual preference depends on the music genre and the algorithm settings.
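    For context, the sketch below shows a generic psychoacoustic bandwidth-extension scheme of the kind evaluated above, not the proprietary MaxxBass algorithm: the band below a cutoff is isolated, harmonics are generated with a simple nonlinearity, and the harmonics are mixed back so that small loudspeakers convey the "missing fundamental". The cutoff, filter orders and mix gain are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def bass_extend(x, fs, cutoff=120.0, harmonics_gain=0.5):
    """Replace the sub-cutoff band with harmonics the target speaker can reproduce."""
    lp = butter(4, cutoff, btype="lowpass", fs=fs, output="sos")
    low = sosfilt(lp, x)                        # band the small speaker cannot reproduce
    harm = np.abs(low)                          # rectification generates harmonics (and DC)
    bp = butter(4, [cutoff, 4 * cutoff], btype="bandpass", fs=fs, output="sos")
    harm = sosfilt(bp, harm - harm.mean())      # keep harmonics in the reproducible band
    return x - low + harmonics_gain * harm      # original minus low band, plus harmonics
```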

    Intelligent Tools for Multitrack Frequency and Dynamics Processing

    This research explores the possibility of reproducing the mixing decisions of a skilled audio engineer with minimal human interaction, in order to improve the overall listening experience of musical mixtures, i.e., intelligent mixing. By producing a balanced mix automatically, musicians and mixing engineers can focus on their creativity while the productivity of music production is increased. We focus on two essential aspects of such a system: frequency and dynamics. This thesis presents an intelligent strategy for multitrack frequency and dynamics processing that exploits the interdependence of input audio features, incorporates best practices in audio engineering, and is driven by perceptual models and subjective criteria. The intelligent frequency-processing research begins with a spectral-characteristic analysis of commercial recordings, where we discover a consistent leaning towards a target equalization spectrum. A novel approach for automatically equalizing audio signals towards the observed target spectrum is then described and evaluated. We proceed to dynamics processing and introduce an intelligent multitrack dynamic range compression algorithm, in which various audio features are proposed and validated to better describe the transient nature and spectral content of the signals. An experiment investigating human preferences in dynamics processing is described to inform our choices of parameter automation. To provide a perceptual basis for the intelligent system, we evaluate existing perceptual models and propose several masking metrics to quantify the masking behaviour within the multitrack mixture. Ultimately, we integrate the previous research on auditory masking, frequency processing and dynamics processing into one intelligent mix-optimization system that replicates the iterative process of human mixing. Within the system, we explore the relationship between equalization and dynamics processing and propose a general frequency and dynamics processing framework. Various implementations of the intelligent system are explored and evaluated objectively and subjectively through listening experiments.

    Funded by the China Scholarship Council.
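    A minimal sketch of the "equalize towards an observed target spectrum" idea follows: the track's long-term band levels are measured, per-band correction gains towards a target curve are computed (with a safety limit and matching only the spectral shape), and a corresponding FIR equalizer is built. The band layout, gain limit and target curve are placeholders; the thesis derives its target from commercial recordings.

```python
import numpy as np
from scipy.signal import firwin2, lfilter, welch

def match_eq(x, fs, band_edges, target_db, max_gain_db=6.0, numtaps=1025):
    """Equalize x towards target_db (one value per band in band_edges, in dB)."""
    f, psd = welch(x, fs=fs, nperseg=4096)          # long-term spectrum of the track
    levels = np.array([10 * np.log10(psd[(f >= lo) & (f < hi)].mean() + 1e-12)
                       for lo, hi in band_edges])
    diff = np.asarray(target_db, dtype=float) - levels
    diff -= diff.mean()                             # match spectral shape, not absolute level
    gains_db = np.clip(diff, -max_gain_db, max_gain_db)
    centers = [np.sqrt(lo * hi) for lo, hi in band_edges]
    freq = [0.0] + centers + [fs / 2]               # firwin2 needs endpoints at 0 and fs/2
    gain = 10 ** (np.concatenate(([gains_db[0]], gains_db, [gains_db[-1]])) / 20)
    fir = firwin2(numtaps, freq, gain, fs=fs)
    return lfilter(fir, 1.0, x)
```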

    Analysis, modeling and wide-area spatiotemporal control of low-frequency sound reproduction

    This research aims to develop a low-frequency response control methodology capable of delivering a consistent spectral and temporal response over a wide listening area. Low-frequency room acoustics are naturally plagued by room modes, the result of standing waves that arise at frequencies whose half-wavelengths fit an integer number of times into one or more room dimensions. The standing-wave pattern is different for each modal frequency, producing a complicated sound field with a highly position-dependent frequency response. Systems with multiple degrees of freedom (independently controllable sound-radiating sources) are investigated to provide adequate low-frequency response control. The proposed solution, termed a chameleon subwoofer array (CSA), adopts the most advantageous aspects of existing room-mode correction methodologies while emphasizing efficiency and practicality. Multiple degrees of freedom are ideally achieved by employing what is designated a hybrid subwoofer, which provides four orthogonal degrees of freedom within a modest-sized enclosure. The CSA software algorithm integrates both objective and subjective measures to address listener preferences, including the possibility of individual real-time control. CSAs and existing techniques are evaluated within a novel acoustical modeling system (an FDTD simulation toolbox) developed to meet the requirements of this research. Extensive virtual development of CSAs has led to experimentation using a prototype hybrid subwoofer. The resulting performance is in line with the simulations: variance across a wide listening area is reduced by over 50% with only four degrees of freedom. A supplemental novel correction algorithm addresses correction issues in select narrow frequency bands. These frequencies are filtered from the signal and replaced using virtual bass, a psychoacoustical effect that gives the impression of low frequency while maintaining all aural information. Virtual bass is synthesized using an original hybrid approach that combines two mainstream synthesis procedures while suppressing each method's inherent weaknesses. This algorithm is demonstrated to improve CSA output efficiency while maintaining acceptable subjective performance.
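    The position-dependent low-frequency behaviour described above follows from the standard rigid-wall room-mode formula for a rectangular room. The short sketch below lists the lowest mode frequencies for an arbitrary example room; it is a textbook calculation, not part of the thesis' FDTD toolbox.

```python
import itertools
import numpy as np

def mode_frequencies(Lx, Ly, Lz, c=343.0, fmax=120.0, max_order=8):
    """f(nx, ny, nz) = (c / 2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2)."""
    modes = []
    for nx, ny, nz in itertools.product(range(max_order + 1), repeat=3):
        if nx == ny == nz == 0:
            continue                          # skip the trivial (0,0,0) case
        f = 0.5 * c * np.sqrt((nx / Lx) ** 2 + (ny / Ly) ** 2 + (nz / Lz) ** 2)
        if f <= fmax:
            modes.append((f, (nx, ny, nz)))
    return sorted(modes)

# Example: a 6.0 x 4.5 x 2.8 m room (arbitrary dimensions), ten lowest modes.
for f, n in mode_frequencies(6.0, 4.5, 2.8)[:10]:
    print(f"{f:6.1f} Hz  mode {n}")
```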

    A variable passive low-frequency absorber


    Treatise on Hearing: The Temporal Auditory Imaging Theory Inspired by Optics and Communication

    A new theory of mammalian hearing is presented, which accounts for the auditory image in the midbrain (inferior colliculus) of objects in the acoustical environment of the listener. It is shown that the ear is a temporal imaging system that comprises three transformations of the envelope functions: cochlear group-delay dispersion, cochlear time lensing, and neural group-delay dispersion. These elements are analogous to the optical transformations in vision: diffraction between the object and the eye, spatial lensing by the lens, and a second diffraction between the lens and the retina. Unlike the eye, the human auditory system is established to be naturally defocused, so that coherent stimuli are unaffected by the defocus, whereas completely incoherent stimuli are impacted by it and may be blurred by design. It is argued that the auditory system can use this differential focusing to enhance or degrade the images of real-world acoustical objects that are partially coherent. The theory is founded on coherence and temporal imaging theories adopted from optics. In addition to the imaging transformations, the corresponding inverse-domain modulation transfer functions are derived and interpreted with consideration of the nonuniform neural sampling operation of the auditory nerve. These ideas are used to rigorously introduce the concepts of sharpness and blur in auditory imaging, auditory aberrations, and auditory depth of field. In parallel, ideas from communication theory are used to show that the organ of Corti functions as a multichannel phase-locked loop (PLL) that constitutes the point of entry for auditory phase locking and hence conserves signal coherence. It provides an anchor for dual coherent and noncoherent auditory detection in the auditory brain that culminates in auditory accommodation. Implications for hearing impairments are discussed as well.

    Comment: 603 pages, 131 figures, 13 tables, 1570 references
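    For reference, the three envelope transformations listed above mirror the standard temporal-imaging condition from space-time duality in optics, the analogue of the thin-lens law; the notation below is generic and not necessarily the treatise's own.

```latex
% Temporal-imaging (time-lens) condition and magnification, in generic notation:
% \phi''_1 and \phi''_2 are the input and output group-delay dispersions
% (analogous to object and image distances), \phi''_f is the focal group-delay
% dispersion of the time lens, and M is the temporal magnification.
\[
  \frac{1}{\phi''_{1}} + \frac{1}{\phi''_{2}} = \frac{1}{\phi''_{f}},
  \qquad
  M = -\,\frac{\phi''_{2}}{\phi''_{1}} .
\]
```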