
    Minimising latency of pitch detection algorithms for live vocals on low-cost hardware

    A pitch estimation device was proposed for live vocals to output appropriate pitch data through the Musical Instrument Digital Interface (MIDI). The aim was to achieve imperceptible latency while maintaining estimation accuracy. The projected target platform was low-cost, standalone hardware based around a microcontroller such as the Microchip PIC series. This study investigated, optimised and compared the performance of suitable algorithms for this application. Performance was determined by two key factors: accuracy and latency. Many papers have been published over the past six decades assessing and comparing the accuracy of pitch detection algorithms on various signals, including vocals. However, very little information is available concerning the latency of pitch detection algorithms and the methods by which it can be minimised. Real-time audio introduces a further, sparsely studied latency challenge: minimising the length of sampled audio the algorithms require, in order to reduce total latency. Thorough testing was undertaken to determine the best-performing algorithm and optimal parameter combination. Software modifications were implemented to facilitate accurate, repeatable, automated testing and to build a comprehensive set of results encompassing a wide range of test conditions. The results revealed that the infinite-peak-clipping autocorrelation function (IACF) performed better than the other autocorrelation functions tested, and also identified ideal parameter values or value ranges that provide the optimal latency/accuracy balance. Although the results were encouraging, testing highlighted some fundamental issues with vocal pitch detection, and potential solutions are proposed for further development.
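
    As a rough illustration of the winning approach, the sketch below applies infinite peak clipping (reducing each sample to its sign) before a standard autocorrelation lag search. It is a minimal NumPy sketch, not the paper's optimised implementation; the search range, frequency limits and function name are illustrative assumptions.

```python
import numpy as np

def iacf_pitch(frame, fs, fmin=80.0, fmax=1000.0):
    """Estimate pitch with an infinite-peak-clipped autocorrelation (sketch)."""
    # Infinite peak clipping: keep only the sign of each (DC-removed) sample.
    clipped = np.sign(frame - np.mean(frame))
    # Autocorrelation of the clipped signal, non-negative lags only.
    acf = np.correlate(clipped, clipped, mode="full")[len(clipped) - 1:]
    # Restrict the lag search to the plausible vocal pitch range.
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(acf[lo:hi]))
    return fs / lag

# Latency/accuracy trade-off: shorter frames reduce latency, but a frame
# must span roughly two periods of fmin for the lag peak to be reliable.
```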

    Musical notes classification with Neuromorphic Auditory System using FPGA and a Convolutional Spiking Network

    In this paper, we explore the capabilities of a sound classification system that combines a novel FPGA cochlear model implementation with a bio-inspired technique based on a trained convolutional spiking network. The neuromorphic auditory system used in this work produces a representation analogous to the spike outputs of the biological cochlea. The auditory system has been developed using a set of spike-based processing building blocks in the frequency domain. These form a set of band-pass filters in the spike domain that split the audio information into 128 frequency channels, 64 for each of two audio sources. Address Event Representation (AER) is used to interface the auditory system with the convolutional spiking network. A convolutional spiking network layer is developed and trained on a computer to detect two kinds of sound: artificial pure tones in the presence of white noise, and electronic musical notes. After the training process, the presented system is able to distinguish the different sounds in real time, even in the presence of white noise. Funding: Ministerio de Economía y Competitividad TEC2012-37868-C04-0.
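
    As a rough illustration of the AER interface, the event traffic can be thought of as a stream of (timestamp, address) pairs, where the address identifies the cochlear channel that spiked. The minimal sketch below bins such events into a channel-by-time spike count map suitable as input to a downstream classifier; the event tuple layout and parameter values are assumptions, not the paper's exact AER packet format.

```python
import numpy as np

def aer_to_spikegram(events, n_channels=128, t_window=0.01, t_total=1.0):
    """Bin AER events into a (channel x time-bin) spike count map (sketch)."""
    n_bins = int(t_total / t_window)
    spikegram = np.zeros((n_channels, n_bins), dtype=np.int32)
    for t, addr in events:               # addr = cochlear channel index
        b = min(int(t / t_window), n_bins - 1)
        spikegram[addr % n_channels, b] += 1
    return spikegram
```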

    Investigation of Auditory Encoding and the Use of Auditory Feedback During Speech Production

    Responses to altered auditory feedback during speech production are highly variable. The extent to which auditory encoding influences this variability is not well understood. Thirty-nine normal-hearing adults completed a first-formant (F1) manipulation paradigm in which the F1 of the vowel /ε/ was shifted upwards in frequency towards an /æ/-like vowel in real time. Frequency following responses (FFRs) and envelope following responses (EFRs) were used to measure neuronal activity evoked by the same vowels produced by the participant and by a prototypical talker. Cochlear tuning was also assessed, using stimulus-frequency otoacoustic emissions (SFOAEs) and a psychophysical method. Results showed that average F1 production changed to oppose the manipulation. Three metrics of EFR and FFR encoding were evaluated. No reliable relationship was found between speech compensation and the evoked response measures or the measures of cochlear tuning. Differences in brainstem encoding of vowels and in sharpness of cochlear tuning do not appear to explain the variability observed in speech production.
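
    For context, a real-time F1 manipulation paradigm depends on tracking the first formant of each vowel frame, commonly via linear predictive coding (LPC). The sketch below is a generic illustration of that tracking step, not the study's implementation; the analysis order, pre-emphasis coefficient and 200 Hz search floor are assumptions.

```python
import numpy as np
from scipy.signal import lfilter

def estimate_f1(frame, fs, order=12):
    """Estimate the first formant of a vowel frame via LPC roots (sketch)."""
    # Window and pre-emphasise the frame before LPC analysis.
    x = lfilter([1.0, -0.97], [1.0], frame * np.hamming(len(frame)))
    # Autocorrelation method: solve the LPC normal equations R a = r.
    r = np.correlate(x, x, mode="full")[len(x) - 1:][:order + 1]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    # Formant candidates are the angles of the upper-half-plane LPC roots.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]
    freqs = sorted(np.angle(roots) * fs / (2.0 * np.pi))
    # Return the lowest candidate above an assumed F1 floor of 200 Hz.
    return next((f for f in freqs if f > 200.0), None)
```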

    Deep Cytometry: Deep learning with Real-time Inference in Cell Sorting and Flow Cytometry

    Deep learning has achieved spectacular performance in image and speech recognition and synthesis. It outperforms other machine learning algorithms in problems where large amounts of data are available. In the area of measurement technology, instruments based on photonic time stretch have established record real-time measurement throughput in spectroscopy, optical coherence tomography, and imaging flow cytometry. These extreme-throughput instruments generate approximately 1 Tbit/s of continuous measurement data and have led to the discovery of rare phenomena in nonlinear and complex systems, as well as to new types of biomedical instruments. Owing to the abundance of data they generate, time-stretch instruments are a natural fit for deep learning classification. Previously, we showed that high-throughput, label-free cell classification with high accuracy can be achieved through a combination of time-stretch microscopy, image processing and feature extraction, followed by deep learning, to find cancer cells in the blood. Such a technology holds promise for early detection of primary cancer or metastasis. Here we describe a new deep learning pipeline that entirely avoids the slow and computationally costly signal processing and feature extraction steps by using a convolutional neural network that operates directly on the measured signals. The improvement in computational efficiency enables low-latency inference and makes this pipeline suitable for cell sorting via deep learning. Our neural network takes only a few milliseconds to classify each cell, fast enough to provide a decision to a cell sorter for real-time separation of individual target cells. We demonstrate the applicability of our new method in the classification of OT-II white blood cells and SW-480 epithelial cancer cells with more than 95% accuracy in a label-free fashion.
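
    The core idea, a network that classifies raw waveform segments without any intermediate feature extraction stage, can be sketched as a small 1-D convolutional network. The PyTorch sketch below is illustrative only; the layer sizes, class count and names are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SignalCNN(nn.Module):
    """1-D CNN classifying raw time-stretch waveforms directly (sketch)."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),     # pooling independent of input length
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                # x: (batch, 1, n_samples)
        return self.classifier(self.features(x).squeeze(-1))

# One forward pass per detected cell event, e.g.:
# logits = SignalCNN()(torch.randn(8, 1, 4096))
```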

    Effects of Hearing Aid Amplification on Robust Neural Coding of Speech

    Hearing aids are able to restore some hearing abilities for people with auditory impairments, but background noise remains a significant problem. Unfortunately, we know very little about how speech is encoded in the auditory system, particularly in impaired systems with prosthetic amplifiers. There is growing evidence that relative timing in the neural signals (known as spatiotemporal coding) is important for speech perception, but there is little research relating spatiotemporal coding to hearing aid amplification. This research uses a combination of computational modeling and physiological experiments to characterize how hearing aids affect vowel coding in noise at the level of the auditory nerve. The results indicate that sensorineural hearing impairment degrades the temporal cues transmitted from the ear to the brain. Two hearing aid strategies (linear gain and wide dynamic-range compression) were used to amplify the acoustic signal. Although appropriate gain was shown to improve temporal coding in individual auditory nerve fibers, neither strategy improved spatiotemporal cues. Previous work has attempted to correct the relative timing by adding frequency-dependent delays to the acoustic signal (e.g., within a hearing aid). We show that, although this strategy can affect the timing of auditory nerve responses, it is unlikely to improve the relative timing as intended. Existing hearing aid technologies therefore do not improve some of the neural cues thought to be important for perception, and it is important to understand these limitations. Our hope is that this knowledge can be used to develop new technologies that improve auditory perception in difficult acoustic environments.
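
    For context, wide dynamic-range compression applies a level-dependent gain: full linear gain below a threshold and progressively less gain above it. A minimal static sketch of that rule, with illustrative (not prescriptive) parameter values:

```python
import numpy as np

def wdrc_gain_db(level_db, threshold_db=45.0, ratio=3.0, gain_db=20.0):
    """Static wide dynamic-range compression gain rule (sketch)."""
    # Decibels by which the input exceeds the compression threshold.
    over = np.maximum(level_db - threshold_db, 0.0)
    # Above threshold, each dB of input yields only 1/ratio dB of output,
    # so the applied gain is reduced accordingly.
    return gain_db - over * (1.0 - 1.0 / ratio)

# Example: with a 3:1 ratio, a 60 dB input (15 dB over threshold) receives
# 20 - 15*(2/3) = 10 dB of gain instead of the full 20 dB linear gain.
```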

    Evaluation of Auditory Evoked Potentials as a Hearing Aid Outcome Measure

    This thesis aimed to explore the applicability of cortical auditory evoked potentials (CAEPs) and envelope following responses (EFRs) as objective aided outcome measures for use in infants wearing hearing aids. The goals of the CAEP-related projects were to evaluate the effects of speech stimulus source on CAEPs, of non-linear hearing aid processing on tone-evoked CAEPs, and of inter-stimulus intervals on non-linear hearing aid processing of phonemes. Results illustrated larger-amplitude CAEPs with shorter latencies for speech stimuli taken from word-medial positions than from word-initial positions, and no significant effect of the tone-burst onset overshoot due to non-linear hearing aid processing. Inter-stimulus intervals in CAEP protocols resulted in significantly lower aided phoneme levels than when the same phonemes occurred in running speech, illustrating potential inaccuracies in how relevant hearing aid function is represented during testing. The major contribution of this thesis is the proposal and validation of a test paradigm based on speech-evoked EFRs for use as an objective aided outcome measure. The stimulus is a naturally spoken token /susashi/, modified to enable recording of eight EFRs from low-, mid- and high-frequency regions. The projects aimed to evaluate the previously recommended analysis method of averaging responses to opposite stimulus polarities for vowel-evoked EFRs, as well as the sensitivity of the proposed paradigm to changes in audibility due to level and bandwidth in adults with normal hearing and, additionally, due to amplification in adults with hearing loss. Results demonstrated a vowel-specific effect of averaging opposite-polarity responses when the first harmonic was present; however, the averaging did not affect detection in the majority of participants. The EFR test paradigm illustrated carrier-specific changes in audibility due to level, bandwidth and amplification, suggesting that the paradigm may be a useful tool for evaluating unaided and aided audibility, and therefore the appropriateness of hearing aid fittings. Further validation is necessary in infants and children wearing hearing aids. In conclusion, CAEPs and EFRs vary in their strengths and limitations, and it is therefore likely that a combination of measures will be necessary to address the variety of hearing disorders seen in a typical audiological caseload.
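
    The polarity-averaging analysis evaluated here can be sketched in a few lines: adding responses to opposite stimulus polarities cancels components that invert with the stimulus (temporal fine structure and stimulus artifact) while retaining the envelope following response. A minimal sketch, with array names assumed:

```python
import numpy as np

def efr_from_polarities(resp_pos, resp_neg):
    """Combine evoked responses to opposite stimulus polarities (sketch)."""
    resp_pos, resp_neg = np.asarray(resp_pos), np.asarray(resp_neg)
    # Adding the polarities cancels phase-inverting components and
    # emphasises the envelope following response.
    efr = 0.5 * (resp_pos + resp_neg)
    # Subtracting instead emphasises the temporal-fine-structure FFR.
    ffr_tfs = 0.5 * (resp_pos - resp_neg)
    return efr, ffr_tfs
```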