
    Shaping the auditory peripersonal space with motor planning in immersive virtual reality

    Immersive audio technologies require personalized binaural synthesis through headphones to provide perceptually plausible virtual and augmented reality (VR/AR) simulations. We introduce a quantitative measure called premotor reaction time (pmRT), applied here for the first time in VR contexts, to characterize sonic interactions between humans and the technology through motor planning. In the proposed basic virtual acoustic scenario, listeners are asked to react to a virtual sound approaching from different directions and stopping at different distances within their peripersonal space (PPS). PPS is highly sensitive to embodied and environmentally situated interactions, and it anticipates motor-system activation so that action can be prepared promptly. Since immersive VR applications benefit from spatial interactions, modeling the PPS around listeners is crucial for revealing individual behaviors and performance. Our pmRT-centered methodology provides a compact description and approximation of spatiotemporal PPS processing and of PPS boundaries around the head, replicating several well-known neurophysiological phenomena related to PPS, such as auditory asymmetry, front/back calibration and confusion, and ellipsoidal action fields.

    Computational Tonality Estimation: Signal Processing and Hidden Markov Models

    PhD. This thesis investigates computational musical tonality estimation from an audio signal. We present a hidden Markov model (HMM) in which relationships between chords and keys are expressed as probabilities of emitting observable chords from a hidden key sequence. The model is first tested using symbolic chord annotations as observations, and gives excellent global key recognition rates on a set of Beatles songs. The initial model is then extended to audio input by using an existing chord recognition algorithm, which allows it to be tested on a much larger database. We show that a simple model of the upper partials in the signal improves percentage scores. We also present a variant of the HMM with a continuous observation probability density, but show that the discrete version performs better. We then analyze in detail how changing the low-level signal processing parameters affects key estimation and computation time. We find that much of the high-frequency information can be omitted without loss of accuracy, and that significant computational savings can be made by applying a threshold to the transform kernels. Results show that there is no single ideal set of parameters for all music, but that tuning the parameters can make a difference to accuracy. We discuss methods of evaluating tonal changes more complex than a single global key, and compare a metric that measures similarity to a ground truth with metrics rooted in music retrieval. We show that the two kinds of measure give different results, and therefore recommend that the choice of evaluation metric be determined by the intended application. Finally, we draw together our conclusions and use them to suggest directions for continuing this research in tonality model development, feature extraction, evaluation methodology, and applications of computational tonality estimation. Engineering and Physical Sciences Research Council (EPSRC).
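
The decoding step of a key HMM like the one described above can be sketched with the Viterbi algorithm, which recovers the most probable hidden key sequence from an observed chord sequence. This is a minimal sketch under loudly stated assumptions: the two keys (C and G major), the four-chord vocabulary and every probability below are hypothetical toy values, not the thesis's trained model, key set or chord vocabulary.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy model: two keys, four chords, made-up probabilities.
KEYS = ["C", "G"]
START = {"C": 0.5, "G": 0.5}
TRANS = {"C": {"C": 0.9, "G": 0.1}, "G": {"C": 0.1, "G": 0.9}}
EMIT = {
    "C": {"C": 0.4, "F": 0.3, "G": 0.3, "D": 0.0},  # key of C rarely emits D major
    "G": {"G": 0.4, "C": 0.3, "D": 0.3, "F": 0.0},  # key of G rarely emits F major
}


def viterbi_keys(
    chords: Sequence[str],
    keys: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most probable hidden key sequence and its log-probability."""
    floor = 1e-12  # avoid log(0) for chords a key never emits

    def lp(p: float) -> float:
        return math.log(max(p, floor))

    # delta[k]: best log-probability of any key path ending in key k.
    delta = {k: lp(start_p[k]) + lp(emit_p[k].get(chords[0], 0.0)) for k in keys}
    backptrs: List[Dict[str, str]] = []
    for chord in chords[1:]:
        new_delta: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for k in keys:
            # Best predecessor key for a transition into key k.
            prev = max(keys, key=lambda j: delta[j] + lp(trans_p[j][k]))
            new_delta[k] = delta[prev] + lp(trans_p[prev][k]) + lp(emit_p[k].get(chord, 0.0))
            ptr[k] = prev
        delta = new_delta
        backptrs.append(ptr)
    # Trace the best path backwards from the most probable final key.
    last = max(keys, key=lambda k: delta[k])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


if __name__ == "__main__":
    print(viterbi_keys(["C", "F", "C", "G", "D", "G", "D"], KEYS, START, TRANS, EMIT))
```

With these toy numbers, the chord sequence C, F, C, G, D, G, D decodes to C, C, C, G, G, G, G: the F chord anchors the opening in C major, and the D chords pull the later frames to G major. A single global key could be read off such a model by, for example, taking the most frequent decoded key, but that choice is an assumption here.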

    Tarsos: a platform to explore pitch scales in non-western and western music

    Comparison of level discrimination, increment detection, and comodulation masking release in the audio- and envelope-frequency domains

    In general, the temporal structure of stimuli must be considered to account for certain observations made in detection and masking experiments in the audio-frequency domain. Two such phenomena are (1) heightened sensitivity to amplitude increments presented with a temporal fringe, compared with gated level-discrimination performance, and (2) lower tone-in-noise detection thresholds with a modulated masker than with an unmodulated masker. In the current study, these two experiments were translated into the envelope-frequency domain to test the hypothesis that analogous cues might be used there. Amplitude-modulation (AM) depth-discrimination thresholds for a pure-tone carrier were similar for traditional gated stimuli and for stimuli with a temporally modulated fringe, for a fixed standard depth (m_s = 0.25) and a range of AM frequencies (4-64 Hz). In a second experiment, masked sinusoidal AM detection thresholds were compared in conditions with and without slow, regular fluctuations imposed on the instantaneous masker AM depth. Release from masking was obtained only for very slow masker fluctuations (below 2 Hz). A physiologically motivated model that effectively acts as a first-order envelope change detector accounted for several, but not all, of the key aspects of the data.
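
The stimuli above are sinusoidally amplitude-modulated pure tones, s(t) = [1 + m sin(2π f_m t)] sin(2π f_c t), where m is the modulation depth and f_m the AM frequency. The following minimal sketch generates such a tone and recovers m from the envelope. The standard depth (0.25) and the 4-64 Hz AM range come from the abstract; the carrier frequency, sample rate and duration are hypothetical defaults, and the function names are illustrative only.

```python
import numpy as np


def am_tone(m, fm, fc=1000.0, fs=48000, dur=1.0):
    """Sinusoidally amplitude-modulated pure tone.

    m: modulation depth (0..1); fm: AM frequency in Hz.
    fc, fs and dur are hypothetical defaults, not values from the study.
    Returns (signal, envelope).
    """
    t = np.arange(int(round(fs * dur))) / fs
    envelope = 1.0 + m * np.sin(2 * np.pi * fm * t)
    return envelope * np.sin(2 * np.pi * fc * t), envelope


def depth_from_envelope(env):
    """Recover m from the envelope extremes: (max - min) / (max + min)."""
    return (env.max() - env.min()) / (env.max() + env.min())


def depth_db(m):
    """AM depth expressed as 20*log10(m), a common way of reporting AM thresholds."""
    return 20 * np.log10(m)


if __name__ == "__main__":
    for fm in (4, 8, 16, 32, 64):  # AM frequencies spanning the range in the abstract
        _, env = am_tone(0.25, fm)
        print(fm, depth_from_envelope(env), depth_db(0.25))
```

Expressing depth as 20 log10(m) puts the standard depth m_s = 0.25 at about -12 dB.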

    Experimental study of acoustic displays of flight parameters in a simulated aerospace vehicle

    Evaluation of acoustic displays of target location in target detection, and of flight parameters in a simulated aerospace vehicle.

    Automatic transcription of traditional Turkish art music recordings: A computational ethnomusicology approach

    Thesis (Doctoral), Izmir Institute of Technology, Electronics and Communication Engineering, Izmir, 2012. Includes bibliographical references (leaves 96-109). Text in English; abstract in Turkish and English. xi, 131 leaves.
    Music Information Retrieval (MIR) is a recent research field that emerged from the revolutionary change in the distribution of, and access to, music recordings. Although MIR research already covers a wide range of applications, MIR methods have been developed primarily for western music. Since the most important dimensions of music are fundamentally different in western and non-western musics, developing MIR methods for non-western musics is a challenging task. On the other hand, the discipline of ethnomusicology supplies useful insights for computational studies of non-western musics. This thesis therefore addresses the task within the framework of computational ethnomusicology, a newly emerging interdisciplinary research domain. The main contribution of this study is an automatic transcription system for traditional Turkish art music (Turkish music), the first such system in the literature. Developing it required studying several subjects that are also new to the literature and constitute further contributions of the thesis: the automatic music transcription problem is considered from the perspective of ethnomusicology, an automatic makam recognition system is developed, and the scale theory of Turkish music is evaluated computationally for nine makamlar to determine whether it can be used for makam detection. Furthermore, musical traditions across a wide geographical region, including the Middle East, North Africa and Asia, share similarities with Turkish music. Our study may therefore provide techniques and methods more relevant than those in the existing MIR literature for the study of these non-western musics.

    Automatic transcription of polyphonic music exploiting temporal evolution

    PhD. Automatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic music is not a trivial task, and while automatic pitch estimation for monophonic signals is considered a solved problem, building an automated system that can transcribe polyphonic music without restrictions on the degree of polyphony or the instrument type remains an open problem. In this thesis, automatic transcription is studied by explicitly incorporating information on the temporal evolution of sounds. Initial efforts address the problem with signal processing techniques and propose audio features that exploit temporal characteristics. Techniques for note onset and offset detection are also utilised to improve transcription performance. Subsequent approaches propose transcription models based on shift-invariant probabilistic latent component analysis (SI-PLCA), which model the temporal evolution of notes in a multiple-instrument case and support frequency modulations in produced notes. Datasets and annotations for transcription research were also created during this work. The proposed systems were evaluated both privately and publicly within the Music Information Retrieval Evaluation eXchange (MIREX) framework, and were shown to outperform several state-of-the-art transcription approaches. The developed techniques have also been employed for other music technology tasks, such as key modulation detection, temperament estimation, and automatic piano tutoring. Finally, the proposed music transcription models have also been utilised in a wider context, namely for modelling acoustic scenes.
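
The abstract does not say which onset detector is used, so the following is a generic stand-in rather than the thesis's method: a textbook spectral-flux onset detector. It frames the signal, takes Hann-windowed FFT magnitudes, sums the half-wave-rectified frame-to-frame increases, and reports local peaks above a threshold. The frame size, hop and threshold (`frame`, `hop`, `delta`) are hypothetical defaults.

```python
import numpy as np


def spectral_flux_onsets(x, sr, frame=1024, hop=512, delta=0.1):
    """Return estimated onset times (s) using half-wave-rectified spectral flux."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Keep only increases in magnitude between consecutive frames, summed over bins.
    flux = np.maximum(mag[1:] - mag[:-1], 0.0).sum(axis=1)
    flux = flux / (flux.max() + 1e-12)  # normalize to [0, 1]
    onsets = []
    for i in range(1, len(flux) - 1):
        # A local maximum above the threshold is taken as an onset.
        if flux[i] > delta and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]:
            # flux[i] compares frames i and i + 1; report the start of frame i + 1.
            onsets.append((i + 1) * hop / sr)
    return onsets


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = np.where(t >= 0.5, np.sin(2 * np.pi * 440 * t), 0.0)  # tone entering at 0.5 s
    print(spectral_flux_onsets(x, sr))
```

Half-wave rectification makes the detector respond to energy arriving rather than decaying. An offset detector could, as an assumption, use the rectified decreases instead.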

    PSYCHOACOUSTIC OPTIMIZATION OF THE VQ-VAE AND TRANSFORMER ARCHITECTURES FOR HUMAN-LIKE AUDITORY PERCEPTION IN MUSIC INFORMATION RETRIEVAL AND GENERATION TASKS

    Despite remarkable advances in the use of learning-based (AI) architectures in the natural language and image domains, their applicability to music has remained limited. In fact, state-of-the-art Automated Music Transcription (AMT) systems have seen only marginal improvements from novel AI architectures. Moreover, the importance of psychoacoustic perception and its incorporation into music information retrieval (MIR) systems have mostly remained unaddressed, leading to shortcomings in current approaches. This thesis provides an overview of music processing and novel neural architectures, investigates the reasons for their subpar performance on MIR tasks, and proposes several ways of adjusting both the (data-related) music pre-processing pipelines and a psychoacoustically adjusted transformer-based model to improve performance on MIR and AMT tasks. In particular, a new music transformer architecture is proposed, and various music pre-processing algorithms for psychoacoustic optimization are implemented, along with several adaptive models aimed at supplying the missing factor of modeling human music perception. The preliminary results are promising, warranting continued investigation of transformer architectures for MIR applications. Several insights that emerged during the research are discussed. The thesis concludes by outlining a set of promising future research directions in music information retrieval and generation using the proposed architectures.

    Automatic chord transcription from audio using computational models of musical context

    PhD. This thesis is concerned with the automatic transcription of chords from audio, with an emphasis on modern popular music. Musical context, such as the key and the structural segmentation, helps human listeners interpret chords. In this thesis we propose computational models that integrate such musical context into the automatic chord estimation process. We present a novel dynamic Bayesian network (DBN) that integrates models of metric position, key, chord, bass note and two beat-synchronous audio features (bass and treble chroma) into a single high-level musical context model. We simultaneously infer the most probable sequence of metric positions, keys, chords and bass notes via Viterbi inference. Several experiments with real-world data show that adding context parameters significantly increases chord recognition accuracy and the faithfulness of chord segmentation. The most complex of the proposed methods transcribes chords with a state-of-the-art accuracy of 73% on the song collection used for the 2009 MIREX Chord Detection tasks. This method serves as the baseline for two further enhancements. First, we aim to improve chord confusion behaviour by modifying the audio front-end processing. We compare the effect of learning chord profiles as Gaussian mixtures with the effect of using chromagrams generated by an approximate pitch transcription method. We show that using chromagrams from approximate transcription yields the most substantial increase in accuracy. The best method achieves 79% accuracy and significantly outperforms the state of the art. Second, we propose a method by which chromagram information is shared between repeated structural segments (such as verses) in a song. This can be done fully automatically using a novel structural segmentation algorithm tailored to this task. We show that the technique leads to a significant increase in accuracy and readability. The segmentation algorithm itself also obtains state-of-the-art results. A method that combines both of the above enhancements reaches an accuracy of 81%, a statistically significant improvement over the best result (74%) in the 2009 MIREX Chord Detection tasks. Engineering and Physical Research Council U
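
Sharing chromagram information between repeated segments can be illustrated with a minimal sketch. It assumes the structural segmentation has already produced labelled segments as (start frame, end frame, label) tuples, and that segments sharing a label line up frame by frame from their start. Repetitions are trimmed to the shortest one and averaged, and the average is written back to each repetition. The function name, input format and trimming rule are hypothetical, and the thesis's actual procedure may differ.

```python
from collections import defaultdict

import numpy as np


def share_chroma(chroma, segments):
    """Average chroma across repeated segments that share a label.

    chroma: array of shape (12, n_frames), e.g. beat-synchronous chroma.
    segments: list of (start, end, label) tuples, end exclusive.
    Returns a new array; the input is left unchanged.
    """
    out = chroma.copy()
    groups = defaultdict(list)
    for start, end, label in segments:
        groups[label].append((start, end))
    for spans in groups.values():
        if len(spans) < 2:
            continue  # a segment that occurs only once has nothing to share
        length = min(end - start for start, end in spans)
        # Stack the aligned repetitions: shape (n_repeats, 12, length).
        stacked = np.stack([chroma[:, s:s + length] for s, _ in spans])
        mean = stacked.mean(axis=0)
        for s, _ in spans:
            out[:, s:s + length] = mean
    return out
```

Averaging across repetitions suppresses frame-level noise that is not shared by every verse, which is one plausible reason such sharing could make the resulting chord labels more consistent across repeated sections.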

    Strategies for Reducing the Impact of Tumour Motion During Helical Tomotherapy

    Tumour motion presents a significant limitation for effective radiotherapy of lung cancer, and more specifically for helical tomotherapy. The simultaneous and continuous movements of tomotherapy subsystems (gantry, couch, and binary multi-leaf collimator) can lead to inaccurate dose delivery when combined with tumour motion. In this thesis, we investigate the impact of tumour motion, and strategies to reduce the resulting dose discrepancies in helical tomotherapy, through computer simulations and film measurements performed in a dynamic body phantom. Three distinct types of dose discrepancy have been isolated: dose rounding, dose rippling, and the intensity-modulated radiation therapy (IMRT) asynchronization effect. Each effect was shown to depend on a different combination of tumour motion and treatment parameters. In clinical practice with a conventional fractionation scheme, the dose rounding effect remains the major concern; it can be compensated for by assigning a larger treatment margin around the tumour volume. For hypofractionation schemes, the IMRT asynchronization effect can become an additional concern because it introduces dose discrepancies inside the target volume, necessitating a motion management technique. Two new motion management techniques have therefore been developed for helical tomotherapy: loose helical tomotherapy with breath-holding, and multi-pass respiratory gating. Both methods require the treatment couch to be reset to its starting position to repeat the entire helical treatment until nearly all of the planned dose is delivered. For sinusoidal target motion, multi-pass respiratory gating with 4 gated passes reduced the dose deviation inside the target volume from 14% to 2% for a single fraction. For non-sinusoidal tumour motion causing a dose deviation of 6% within the tumour volume, approximately 4 passes were required to keep the dose deviation below 1% for 30 fractions, and 5 passes for 3 fractions, demonstrating the feasibility of the multi-pass respiratory gating approach. Clinical implementation of the multi-pass respiratory gating technique would require several electronic control and communication modifications to the existing tomotherapy machine; once implemented, the technique would significantly improve the dose distributions delivered in lung tomotherapy treatments, especially for patients with large tumour motion who are treated with hypofractionation schemes.