Development and Evaluation of a Real-Time Framework for a Portable Assistive Hearing Device by Mihaylov, Aleksandar Dimitrov
Western University 
Scholarship@Western 
Electronic Thesis and Dissertation Repository 
12-6-2012 12:00 AM 
Development and Evaluation of a Real-Time Framework for a 
Portable Assistive Hearing Device 
Aleksandar Dimitrov Mihaylov 
The University of Western Ontario 
Supervisor 
Vijay Parsa 
The University of Western Ontario 
Graduate Program in Electrical and Computer Engineering 
A thesis submitted in partial fulfillment of the requirements for the degree in Master of 
Engineering Science 
© Aleksandar Dimitrov Mihaylov 2012 
Recommended Citation 
Mihaylov, Aleksandar Dimitrov, "Development and Evaluation of a Real-Time Framework for a Portable 
Assistive Hearing Device" (2012). Electronic Thesis and Dissertation Repository. 980. 
https://ir.lib.uwo.ca/etd/980 
This Dissertation/Thesis is brought to you for free and open access by Scholarship@Western. It has been accepted 
for inclusion in Electronic Thesis and Dissertation Repository by an authorized administrator of 
Scholarship@Western. For more information, please contact wlswadmin@uwo.ca. 
DEVELOPMENT AND EVALUATION OF A REAL-TIME FRAMEWORK
FOR A PORTABLE ASSISTIVE HEARING DEVICE
(Thesis format: Monograph)
by
Aleksandar Mihaylov
Graduate Program in
Engineering Science
Electrical and Computer Engineering
A thesis submitted in partial fulfillment
of the requirements for the degree of
Master of Engineering Science
The School of Graduate and Postdoctoral Studies
The University of Western Ontario
London, Ontario, Canada
© Aleksandar Dimitrov Mihaylov 2012
THE UNIVERSITY OF WESTERN ONTARIO
School of Graduate and Postdoctoral Studies
CERTIFICATE OF EXAMINATION
Supervisor:
Dr. Vijay Parsa
Examiners:
Dr. E. Macpherson
Dr. I. Polushin
Dr. R. K. Rao
The thesis by
Aleksandar Dimitrov Mihaylov
entitled:
DEVELOPMENT AND EVALUATION OF A REAL-TIME FRAMEWORK FOR A
PORTABLE ASSISTIVE HEARING DEVICE
is accepted in partial fulfillment of the
requirements for the degree of
Master of Engineering Science
Date
Chair of the Thesis Examination Board
Acknowledgements
First and foremost, I would like to thank my supervisor, Dr. Vijay Parsa, for allowing me to take
part in this project, for providing crucial guidance, for his endless patience during the work
described herein and for being a great teacher.
I would like to thank my thesis examination panel, Dr. Raveendra Rao, Dr. Ewan Macpherson
and Dr. Ilia Polushin, for taking the time to read and evaluate my work.
Special thanks to Maggie Sazio for all the endless nights spent on this thesis and many
aspects of it. Thank you for your support.
Many thanks to Steve Beulac, who made it possible for me to realize my dream of learning
Linux inside and out; thank you for all of your suggestions and help during development, and
for all the random chats we had.
Thanks to my father, Dimitar Mihaylov, for all of the relevant talks we have had; thank you
also for all the other irrelevant and amusing conversations that helped keep me sane.
Thanks to Tobias Hertzke from Horetech, Germany for all his help with the MHA platform.
Finally thank you to all of my engineering friends and colleagues: Nazanin Pourmand,
Julie Ledges, Ben Morgan, John Pietrobon, Arvind Venkatasubramanian, Filip Aleksanderek,
Gregory Kish and many others. They have helped shape me, my knowledge and my skills into
what they are today.
Abstract
Testing and verification of digital hearing aid devices, and of their embedded software and
algorithms, can prove to be a challenging task, especially when time-to-market considerations
are taken into account. This thesis describes a PC-based, real-time, highly configurable framework
for the evaluation of audio algorithms. Implementing audio processing algorithms on such
a platform can give hearing aid designers and manufacturers the ability to test new and
existing processing techniques and to collect data about their performance in real-life situations,
without the need to develop a prototype device.
The platform is based on the Eurotech Catalyst development kit and the Fedora Linux OS,
and it utilizes the JACK audio engine to facilitate reliable real-time performance.
Additionally, we demonstrate the capabilities of this platform by implementing an audio
processing chain targeted at improving speech intelligibility for people suffering from auditory
neuropathy. Evaluation is performed for both noisy and noise-free environments. Subjective
evaluation of the results, using normal-hearing listeners and an auditory neuropathy simulator,
demonstrates improvement in some conditions.
Keywords: Intel high definition audio (Intel HDA), audio hardware design, Linux, JACK,
Real-time processing, digital signal processing, envelope enhancement, noise reduction, voice
activity detection.
Contents
Certificate of Examination
Abstract
List of Figures
List of Tables
List of Appendices
Nomenclature
1 Introduction
1.1 Human Hearing
1.2 Auditory Neuropathy
1.3 Portable Hearing Aid Systems
1.4 Thesis Scope and Problem Statement
1.5 Proposed Solution
1.6 Thesis Organization
2 Literature Review and Background Information
2.1 Introduction
2.2 Portable DSP systems
2.3 Master Hearing Aid (MHA)
2.4 Eurotech Catalyst
2.4.1 Description of the CPU Module and Development Kit
2.4.2 Audio Interface Alternatives
2.4.3 Audio Interface
2.5 Intel High Definition Audio - Azalia
2.5.1 Introduction
2.5.2 Controller and Bus/Link Description
2.5.3 Codecs, Nodes and Widgets
2.6 ALSA - Advanced Linux Sound Architecture
2.7 Real-Time Engine
2.7.1 JACK Audio Connection Kit
2.7.2 MHA and JACK
2.8 Considerations for Real-Time Algorithm Design
2.8.1 Interrupt and processing Latency
2.8.2 Frame Based DSP
2.8.3 Execution Time and Memory Limitations
2.9 Real-Time Considerations and Hardware Summary
2.10 Envelope Enhancement
2.10.1 Principles
Hilbert Envelope Detection
Full-Wave Rectification
Enhancing the Envelope
2.10.2 Real-Time Implementations of Envelope Enhancement
2.11 logMMSE Noise Cancellation
2.11.1 Principles
2.11.2 Real-Time Implementations of logMMSE
2.12 Auditory Neuropathy Simulator
2.13 Summary
3 Design and Implementation of a Custom Audio Interface
3.1 Introduction
3.2 Design and Implementation
3.2.1 Bandwidth Requirements
3.2.2 Power Consumption
3.3 Circuit Complexity based on Audio Bus Selection
3.3.1 HDA Codec Selection
3.3.2 PCB considerations and supporting hardware: Grounding, Motherboard Interface and Power Supply Stabilization
3.3.3 Input/Output Stages
3.4 Final PCB implementation
3.5 Driver Design
3.5.1 Intel HDA ALSA driver add-on
3.5.2 Widget Configuration and Initialization
3.5.3 Audio Widget and Stream Controls
3.6 DMA Limitations of the Eurotech Kit
3.7 Summary
4 Real-Time Framework and Algorithms
4.1 Operating system
4.1.1 Basic Real-Time Scheduling
4.1.2 RTOS kernel and configuration
4.1.3 Required Libraries
4.1.4 JACK server setup
4.1.5 JACK client Description
4.2 RT Implementation Considerations
4.2.1 Memory and Computational Limitations
4.2.2 Automatic Gain Control (AGC)
4.2.3 Filtering Objects
4.2.4 Programming Practices
4.3 Algorithms
4.3.1 Envelope Enhancement
Pre-Processing
Envelope Extraction
Envelope Expansion/Compression
Output Reconstruction
4.3.2 logMMSE
Pre-Processing
Applying LogMMSE
Output Reconstruction
4.3.3 Algorithm Tester
4.4 Real-Time framework
4.5 Summary
5 Hardware, Framework and Algorithm Evaluation
5.1 Audio Interface Parameters
5.2 Framework Performance and Requirements Verification
5.3 Speech Shaped Noise
5.4 Subjective Validation
5.4.1 Stimuli, Pre-processing and Test Set Composition
5.4.2 Methodology
5.4.3 Results and Discussion
5.5 Summary
6 Conclusions and Future Work
6.1 Summary
6.2 Major Contributions
6.3 Future Work
Bibliography
A SVM Classification
B HDA Pin Configuration and Verb Table
B.1 Pin/Port Configuration
B.2 Initialization Verb table
C Software Libraries
C.1 Intel Performance Primitives (IPP)
C.1.1 Introduction
C.1.2 Hardware and Software Requirements
C.2 GTK and Glade
C.2.1 Provided GUI Widgets
Curriculum Vitae
List of Figures
1.1 Anatomy of the human ear
1.2 Generic PDA based algorithm platform
2.1 Layered structure of the master hearing aid
2.2 PHS prototype based on the Asus Eee PC
2.3 Catalyst Module
2.4 Catalyst Development Kit
2.5 Intel HD Audio Components
2.6 Intel HDA controller DMA engines
2.7 Intel HDA bus signals
2.8 Intel HDA Codec node/widget hierarchy
2.9 JACK, ALSA and Kernel Interaction Diagram
3.1 Power Supply Stabilization
3.2 Headphone/Speaker Interface
3.3 Microphone Interface
3.4 Circuit Schematic for the AD1989a codec, including supporting hardware
3.5 PCB layout bottom copper layer
3.6 PCB layout bottom copper layer
3.7 Complete AD1989a prototype
3.8 Widget Diagram. Red lines show the active signal paths
3.9 Alsamixer TUI displaying the exposed audio stream controls for the AD1989 codec
4.1 Generic JACK client structure
4.2 AGC topology using arbitrary length memory buffers.
4.3 Envelope Enhancement algorithm flow. Note buffer alignment trick allowing us to simulate non-causal filtering. Short buffers act as a unit frame delay.
4.4 Band and envelope waveforms for the sentence ’A boy fell from the window’.
4.5 Envelope Enhancement: Input (top) and Output (bottom) waveforms for the sentence ’A boy fell from the window’
4.6 logMMSE algorithm, data flow diagram.
4.7 Input and Output waveforms for the MMSE algorithm for the sentence ’The birch canoe slid on the smooth planks’ at 5dB SNR. Left: input; Right: output
4.8 GUI for the MMSE client
4.9 GUI for the EE client
4.10 Block Diagram for the Algorithm tester
4.11 Algorithm Tester GUI
4.12 RT framework layout
4.13 Sample algorithm chain including the MMSE and EE clients
5.1 Sample speech-shaped noise spectrum
5.2 Processing of the sentence ’Smoke poured out of every crack’
5.3 Envelope Enhancement Subjective Results for clean speech at two strengths of the EE algorithms
5.4 Envelope Enhancement Subjective Results for noisy speech at 5dB SNR
5.5 Envelope Enhancement Subjective Results for noisy speech at 10dB SNR
A.1 Scatter plot for Λ6 and Λ7
A.2 VAD decisions for a single speech segment from the training set. SNR=5dB
A.3 VAD decisions for a single speech segment from the training set. SNR=20dB
List of Tables
2.1 Parameter values used to simulate various AN conditions
3.1 Bandwidth requirements for common stream formats, given in bytes
3.2 Audio interfaces power consumption
4.1 Library and software versions used in the framework
4.2 I/O Latency for various JACK server setups
5.1 Measured audio interface parameters
5.2 Measured Latency for Various Processing Schemes
A.1 Error rates of GM and SVM based VAD for various SNR conditions, compared to values reported in original study
B.1 Port Configurations
B.2 Verb Sequence Configuring the AD1989 to the required specifications
List of Appendices
Appendix A: SVM Classification
Appendix B: HDA Pin Configuration and Verb Table
Appendix C: Software Libraries
Nomenclature
ABR Auditory Brainstem Response
ADC Analog to Digital Converter
ALSA Advanced Linux Sound Architecture
AN Auditory Neuropathy
API Application Programming Interface
BAHA Bone Anchored Hearing Aid
BPF Band-Pass Filter
CASLPA Canadian Association of Speech Language Pathologists and Audiologists
CPU Central Processing Unit
DAC Digital to Analog Converter
dBFS dB level relative to maximum output before clipping
dBu RMS voltage level relative to √0.6 V RMS (approximately 0.775 V RMS)
DSP Digital Signal Processor/Processing
EE Envelope enhancement
FFT Fast Fourier Transform
FIR Finite Impulse Response
GPP General Purpose Processor
GUI Graphical User Interface
HPF High-Pass Filter
I/O Input/Output
IC Integrated Circuit
iFFT Inverse Fast Fourier Transform
IIR Infinite Impulse Response
IPP Intel Performance Primitives
IRQ Interrupt Request
ISR Interrupt Service Routine
LDO Low Drop-Out
LPF Low-Pass Filter
MHA Master Hearing Aid
MMU Memory Management Unit
MSI Message-Signaled Interrupt
OAE Otoacoustic Emission
OO Object-oriented programming paradigm
OS Operating System
PC Personal Computer/Desktop Computer
PCB Printed Circuit Board
PDA Personal Digital Assistant
PHS Personal Hearing System
RT Real-Time
RTE Real-Time Engine
RTOS Real-Time Operating System
SVM Support-Vector Machine
TUI Text-based User Interface
VAD Voice-Activity Detector
Chapter 1
Introduction
1.1 Human Hearing
Human hearing involves the perception of acoustic vibrations through the auditory system.
Figure 1.1 illustrates the structure of the human ear. The auditory system is composed of the
outer, middle and inner ears, as well as the central auditory nervous system.
Figure 1.1: Anatomy of the human ear [1]
The outer ear consists of the pinna and the external auditory canal, which terminates at the
tympanic membrane. Acoustic signals are channeled through the auditory canal and impinge on
the tympanic membrane, setting it in vibration. The tympanic membrane is connected to three
bones that constitute the middle ear: malleus, incus and stapes. This arrangement of the middle
ear bones enables efficient transfer of acoustic energy to the inner ear. The inner ear can be
divided into three parts: the semicircular canals, the vestibule and the cochlea. The cochlea
is a spiral-shaped structure filled with fluid. The fluid is set in motion by the vibration of the
stapes, and this motion in turn excites the hair cells contained within the cochlea. The resulting
electrical signals are combined in the auditory nerve and transmitted through the brainstem to
the auditory cortex, where they are interpreted.
A number of factors can cause abnormalities in the auditory system, which in turn can
lead to hearing loss or deafness. According to the Canadian Association of Speech Language
Pathologists and Audiologists (CASLPA), hearing loss is the third most common chronic disability
among older adults, after arthritis and hypertension [2]. The prevalence of hearing
loss increases with age: approximately 10% of the general population is estimated to suffer
from some form of hearing loss, but this figure increases to 20% for those over 65, and to 40%
for those over 75 [2]. Comprehensive assessment of auditory function and
appropriate prescription and fitting of a hearing device are crucial for enhancing the communicative
ability and restoring a good quality of life for the affected persons.
Hearing loss has traditionally been classified into conductive and sensorineural types:
• Conductive hearing loss occurs when sound is not transmitted properly due to abnormal
functioning of the outer or middle ear [3]. One treatment approach to compensate for conductive
hearing loss is the use of bone anchored hearing aids (BAHAs). A BAHA
includes a titanium implant that is surgically inserted into the temporal bone and connected
to a sound processor through an external abutment. The sound processor converts
the acoustic input signal into mechanical vibrations of the implant, which travel through
the temporal bone and directly excite the cochlea.
• Sensorineural hearing loss, the most common form of hearing loss, occurs when
the sensory cells, neural cells or their connections within the cochlea are functioning
1.2. A N 3
abnormally [3]. Persons with sensorineural hearing loss can be rehabilitated through the
use of properly tuned and fitted hearing aids.
In addition to hearing loss caused by a malfunctioning peripheral auditory system, disorders
in central auditory processing can also cause abnormal hearing. Auditory neuropathy
is one such disorder, characterized by normal or near-normal cochlear function but
abnormal auditory nerve responses [4].
1.2 Auditory Neuropathy
Auditory neuropathy (AN) is a hearing disorder that affects the timing of neural activity in
the auditory pathway and therefore disrupts temporal aspects of auditory perception [5]. As
alluded to earlier, the most common type of hearing loss, sensorineural hearing loss, is caused
by damage to the outer hair cells. In contrast, AN can result from damage to the inner hair
cells, the synapse between the inner hair cells and the auditory nerve, and/or the auditory
nerve or brainstem pathways. Individuals afflicted with AN typically exhibit the following
characteristics: elevated thresholds on the pure-tone audiogram by air and bone conduction, poor
understanding of speech in noisy surroundings, absence of the acoustic reflex response in any
configuration for any stimuli, and no auditory brainstem response (ABR) to acoustic stimuli
above their detection threshold [4].
Few options exist for rehabilitating patients suffering from AN. Conflicting results have been
reported regarding the use of conventional hearing aids for people with AN. Although some studies
have shown that 50% of affected children benefit from conventional amplification hearing aids,
others have shown detrimental effects, including loss of otoacoustic emissions (OAEs) (some
without any change in pure-tone sensitivity) and permanent threshold shifts [6].
Cochlear implants consist of an array of electrodes surgically implanted into the cochlea, which
directly stimulate the auditory nerve. As such, these devices circumvent improper functioning
of the inner hair cells or synapses. While this technology has benefited a number of hearing
4 C 1. I
impaired people, it does have its own set of drawbacks including high cost, standard surgical
risks, and potential loss of residual hearing due to surgical implantation. Moreover, evidence
is mixed on the effectiveness of cochlear implants as a rehabilitation option for AN patients.
Some studies have shown considerable speech perception improvement, while others did not
[7].
Evidence does exist that alternative signal processing algorithms can be effective in improving
speech perception in AN patients. For example, temporal envelope enhancement algorithms,
which exaggerate the temporal peaks and valleys of a speech signal, have shown
promising results in AN patients [5, 8, 9]. However, these algorithms are not currently
implemented in hearing aids, nor have they been tested for their real-time (RT) performance.
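The idea of exaggerating peaks and valleys can be illustrated with a minimal C sketch. It assumes a single broadband channel, extracts the envelope by full-wave rectification followed by a one-pole low-pass filter, and applies a power-law expansion gain. The function names `envelope_follow` and `envelope_expand`, the smoothing coefficient `alpha`, the integer exponent `p` and the reference level `ref` are hypothetical illustration choices; this is a generic expansion scheme, not the processing chain developed in this thesis.

```c
#include <stddef.h>

/* Envelope follower (hypothetical helper): full-wave rectification
 * followed by a one-pole low-pass filter. 0 < alpha < 1; larger alpha
 * gives a smoother, slower envelope. */
void envelope_follow(const float *x, float *env, size_t n, float alpha)
{
    float state = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float rect = x[i] < 0.0f ? -x[i] : x[i];       /* |x[i]| */
        state = alpha * state + (1.0f - alpha) * rect; /* low-pass */
        env[i] = state;
    }
}

/* Power-law envelope expansion (hypothetical helper):
 * y = x * (env / ref)^p, with ref > 0 and p >= 1.
 * The output envelope is roughly env^(p+1) / ref^p, so the envelope's
 * peak-to-valley ratio is raised to the power (p + 1). An integer
 * exponent keeps the sketch independent of libm. */
void envelope_expand(const float *x, const float *env, float *y,
                     size_t n, unsigned p, float ref)
{
    for (size_t i = 0; i < n; ++i) {
        float ratio = env[i] / ref;
        float gain = 1.0f;
        for (unsigned j = 0; j < p; ++j)
            gain *= ratio;
        y[i] = x[i] * gain;
    }
}
```

With `p = 1`, a 4:1 peak-to-valley ratio at the input becomes roughly 16:1 at the output. A practical implementation would likely split the signal into frequency bands and bound the gain to avoid clipping; both are omitted here for brevity.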
The algorithms developed for this thesis focus on this particular group, as current
treatment methods are limited to conventional amplification or cochlear implantation, and in
either case the exact effect of treatment is not well understood. Development of a custom
processing chain for the condition can therefore greatly benefit patients who suffer from it, either
by modeling the disease’s effect on speech and correcting for it, or by providing further insight
for clinicians.
In summary, hearing loss afflicts a significant proportion of the human population. To improve
communication for affected people, much research and effort has been put toward
assistive listening devices. These devices range from ear- and body-worn hearing aids to bone
anchored hearing aids and cochlear implants, and have achieved significant results for many
different types and severities of hearing disorders. However, the algorithms required to assist
patients with certain disorders (such as AN) are not available in commercial hearing aids, and
in other cases such algorithms have computational and memory requirements that exceed the
capabilities of current hearing aids. This necessitates the development of new assistive
hearing solutions that cater to the needs of affected persons.
1.3. P H A S 5
1.3 Portable Hearing Aid Systems
Digital signal processing (DSP) finds application in many devices, such as telephones, recorders,
equalizers and hearing aids. Digital signal processors as well as general purpose processors
(GPPs) now provide much more processing power than their predecessors. Specialized
hardware solutions remain attractive, offering dedicated functionality and superior performance.
However, GPPs have also improved in performance and, combined with highly
customizable operating systems (OSs), allow for the development of a real-time embedded DSP
framework for audio processing, among other applications.
Custom hardware solutions provide many advantages, such as high processing speed, minimized
power consumption and low cost in mass production. However, every re-design of
a custom-made integrated circuit (IC) or printed circuit board (PCB) is a costly and
time-consuming undertaking. Consequently, the algorithm to be realized ought to be tested thoroughly
in computer simulations prior to implementing it on a specific custom solution. Additionally,
GPP implementations are capable of providing comparable performance with significantly less
effort related to the design and implementation of a complete system. The drawback of GPP
implementations is that faster development comes at the expense of higher power consumption and
lower efficiency, as well as limited scalability of the finished product or prototype.
Real-time implementation of audio processing algorithms on GPP systems can provide
hearing aid designers and manufacturers the ability to test new processing techniques in vivo.
Testing and verification of such techniques will thus require less system design effort and
involvement from the algorithm designer, and produce a product that can be easily transferred to
other similar systems or platforms. Generally speaking, testing and verification of digital hearing
aid devices and their embedded software and algorithms requires the production of a prototype
device. Such devices are often limited in their ability to control various parameters of the
recording/playback process (sample rate, data bit depth and latency limits), as well as the
parameters of the embedded DSP algorithms. In this study, the focus is on developing and testing
a real-time capable audio processing platform with minimalistic hardware, allowing for the
6 C 1. I
deployment of highly flexible audio DSP algorithms.
In its final form, such a platform will be similar to a body-worn hearing aid. However, thanks to
advances in processor and memory technology, such a platform can be designed to
be faster and capable of handling wider design constraints (computational complexity limits,
memory footprint bounds, etc.). To further improve the usability of this platform in real-life
situations and facilitate flexibility in the embedded algorithms, the system should provide an
intuitive user interface while maintaining a small form factor. A simplified block diagram
of the required components is given in Figure 1.2. It must be noted that this thesis is solely
concerned with the development, testing and verification of the audio subsystem, as well as the
real-time implementation of audio processing algorithms targeted at the AN population. The
rest of the required components are provided by the development kit used in this thesis, details
of which are given in the next section.
Figure 1.2: Generic PDA based algorithm platform
Many acoustic signal processing algorithms can only be reliably judged through critical subjective
evaluation by human listeners. Thus, the simulations should be run in real time, with
realistic input signals. Moreover, it should be possible to alter the algorithm easily in order
to optimize its performance without the need for complex and extensive modifications of the
prototype. Since many electro-acoustic products are wearable and used in very complex sound
fields (e.g. conversation in a street with fluctuating traffic noise), the simulation device should
be at least as non-intrusive and portable as the desired end product.
In order for such a platform to be well received and useful to the hearing-impaired commu-
nity as well as the hearing aid developers, steps must be taken to improve user experience as
well as hardware capability.
1.4 Thesis Scope and Problem Statement
The final goal of this project is to develop a portable, mobile platform that will allow hearing
aid users to test and evaluate various hearing aid algorithms in real-life environments. Such
a platform requires several stages of design, and can be generally split into two streams of
implementation: hardware and software.
The hardware side of the project includes the specification and implementation of the various interfaces, such as audio, video and input devices, as well as the power supply system and connectivity (TCP, Wi-Fi, USB, etc.). The hardware design constraints relevant to the development of an audio interface for such a platform are:
• 6 mono input channels and 2 mono output channels
• Minimum sampling rate of 22 kHz
• At least 16-bit sample resolution
• Synchronous capture mode
• Matched input stages to ensure equalized performance
• Low power consumption, including sleep capabilities
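To put these constraints in numbers, the raw data rate the audio interface must sustain can be computed directly from the channel count, sampling rate and sample size. The sketch below assumes the 22 kHz minimum refers to the standard 22.05 kHz rate and that samples are stored in whole bytes; the function name is hypothetical and bus protocol overhead is ignored.

```c
/* Raw PCM payload rate, in bytes per second, for `channels` channels.
 * Samples are assumed to occupy whole bytes (16-bit -> 2 bytes,
 * 24-bit -> 3 bytes); bus framing and protocol overhead are ignored. */
double pcm_payload_bytes_per_s(unsigned channels, double sample_rate_hz,
                               unsigned bits_per_sample)
{
    unsigned bytes_per_sample = (bits_per_sample + 7u) / 8u; /* round up */
    return (double)channels * sample_rate_hz * (double)bytes_per_sample;
}
```

With the listed constraints, `pcm_payload_bytes_per_s(6, 22050.0, 16)` gives 264 600 B/s (about 0.26 MB/s) for capture and `pcm_payload_bytes_per_s(2, 22050.0, 16)` gives 88 200 B/s for playback, both small compared with the bus bandwidths discussed in Chapter 2.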
The software side includes the design and implementation of the required interface drivers, as well as customization and optimization of the OS environment. As previously mentioned, the key to reliable performance and usability in real-life situations is robust and predictable real-time operation. To achieve this goal, part of this thesis focuses on the selection and configuration of a reliable real-time engine (RTE).
Once the hardware was developed and the required software implemented and configured
accordingly, several algorithms were implemented on top of this framework as a proof of con-
cept and to test for the common RT metrics: latency, memory consumption and service perfor-
mance.
1.5 Proposed Solution
The goal of this project is to develop a multi-channel audio capture and playback interface. Due to the nature of most algorithms and audio enhancement techniques used in today's hearing aids, multiple channels of synchronized audio input must be available. This allows for the development of direction-based audio algorithms such as beamforming and source localization.
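Synchronized capture matters because direction-based methods combine channels sample by sample. Purely as an illustration (beamforming is not one of the algorithms implemented in this thesis), a minimal delay-and-sum beamformer with integer-sample steering delays can be sketched as follows; the function name, data layout and zero-padding at the start of the block are assumptions.

```c
#include <stddef.h>

/* Minimal integer-delay delay-and-sum beamformer (illustrative sketch).
 * in[m][n] : sample n of microphone m; all channels must be sampled
 *            synchronously, otherwise the delays below are meaningless.
 * delay[m] : steering delay applied to channel m, in whole samples.
 * out[n]   : average of the delayed channels; samples that would need
 *            data from before n = 0 are treated as zero. */
void delay_and_sum(const float *const *in, size_t num_mics,
                   const size_t *delay, size_t num_samples, float *out)
{
    for (size_t n = 0; n < num_samples; ++n) {
        float acc = 0.0f;
        for (size_t m = 0; m < num_mics; ++m) {
            if (n >= delay[m])
                acc += in[m][n - delay[m]];
        }
        out[n] = acc / (float)num_mics;
    }
}
```

If a pulse reaches the second microphone one sample later than the first, delaying the first channel by one sample realigns the two copies and the output pulse keeps full amplitude. With no steering delay (or with channels captured out of step), the same pulse is spread over two samples at half amplitude.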
The core of the system is the Eurotech XL Catalyst Module (1GB RAM and 1.2GHz Intel Atom Z530), which provides performance comparable to most hand-held devices available on the market, such as cellphones and PDAs. The platform implements most of the hardware interfaces and components required by a portable device.
The audio interface for this study is based on the Intel HDA standard; it was designed around the AD1989 HDA codec from Conexant, with which a prototype was developed. Using the AD1989 codec allowed us to have up to three stereo, simultaneously sampled recording channels, and one stereo playback channel. The hardware is housed on the Eurotech development kit [10].
For the purposes of this thesis, the focus was on the audio interface and the rest of the
hardware interfaces are left as provided by the development kit. For a fully portable system
these still need to be developed, optimized and integrated into a final portable solution.
1.6. T O 9
The software layer introduces the Fedora Linux operating system as an RT-capable OS with custom-developed device drivers. Reliable RT operation is provided by the Jack Audio Connection Kit (JACK) [11]. The modified Eurotech hardware is configured to allow reliable and deterministic RT performance. Additionally, the OS kernel is modified to remove unnecessary modules and drivers and to enable full preemption (stock kernels use voluntary preemption, which introduces non-deterministic jitter between frames). JACK serves as the RT engine for audio processing, and it also provides a powerful abstract interface to the audio hardware.
This framework was then used for the development and validation of audio DSP algorithms, limited in our case to envelope enhancement, Ephraim and Malah's logMMSE noise suppression and a support-vector machine (SVM) based voice-activity detector (VAD). The algorithms were implemented as JACK clients, as described further in this thesis. The computational complexity of the audio DSP algorithms was addressed by using Intel's Integrated Performance Primitives library (IPP) (see Appendix C).
1.6 Thesis Organization
• Chapter 2 contains the literature review and background information. Currently available portable DSP systems are described, in particular the Master Hearing Aid (MHA) development platform. Details of the hardware and software used in the development of the prototype platform are presented, followed by a review of the principles behind noise reduction and envelope enhancement algorithms as well as their RT implementations.
• Chapter 3 contains details of the design and implementation of a custom audio interface. The objectives used in the design of the audio interface are outlined, interfacing options for audio data streaming with the development kit are explored, and finally details of the custom interface device driver developed under ALSA are presented.
• Chapter 4 explains the use and configuration of the RT framework as well as the RT implementation methodology used in deploying the algorithms given in Chapter 2.
• Chapter 5 demonstrates the performance characteristics of the developed custom audio interface, RT framework and deployed algorithms. Parameters of the developed audio interface are validated against numerical values reported in the literature. Algorithm performance is validated using a pilot subjective study with normal-hearing subjects.
• Chapter 6 contains the conclusions drawn from the work in this thesis. The results of the study are stated, areas for future development are suggested and the major contributions of the thesis are outlined.
Chapter 2
Literature Review and Background
Information
2.1 Introduction
The goal of this chapter is to review the literature surrounding the key areas within the realm of this thesis and to present the background information required for system design and implementation. The current state of the art in portable DSP systems targeted at assistive hearing device applications is reviewed first. A description of the development platform chosen for this project and the issues surrounding it is then presented. This is followed by a technical description of the two algorithms intended for real-time implementation and use by AN patients.
2.2 Portable DSP systems
Portable hearing aid simulators and general-purpose DSP platforms are an active area of research, as the potential benefits of such systems are great, both for people with certain types of hearing loss and for people developing, validating and evaluating hearing aid algorithms and their respective configurations.
A systematic review of the literature shows that the existing solutions for portable hearing systems are either PC-based DSP platforms capable of high audio data throughput, or PDA- or cellphone-based DSP platforms that are more portable but harder to program and configure. Moreover, the smaller form-factor implementations (such as cellphones, PDAs and tablets) suffer from the lack of an on-board floating-point processor, which creates programming challenges. Lastly, there are so-called “full-custom” implementations built to function as stand-alone hearing aid simulators; however, in most cases such devices do not provide for user interactivity, do not offer platform and algorithm flexibility (flexibility here refers to the ability to modify the signal flow-path and algorithm parameters during the use of the device), and/or require a large time and resource investment in their development. Some of the more recent developments in the field of portable hearing assistive devices are summarized below:
• An RT platform for general audio DSP algorithm evaluation was described by Magotra and Stetzler in [12, 13]. The platform was capable of processing two input speech channels at a sampling rate of 32 kHz and driving a stereo headphone output. It provided algorithms for frequency shaping, noise suppression, multi-band amplitude compression and frequency-dependent interaural time delay.
• A high-performance system based on the Motorola DSP56309 signal processor was described by Rass et al. in [14]. The device included high-quality external stereo ADCs and DACs with a 20-bit word length each, and the reported dynamic range exceeded 88 dB. User interactivity was limited to adjusting the input/output gains using digitally controlled potentiometers. The authors developed a software fitting suite that was accessed from an external PC and used to adjust the parameters of the underlying algorithms. A set of software tools and configurable assembler code modules implemented all hardware-dependent software routines. The authors demonstrated that such a device can be a helpful means for the development and field evaluation of advanced new hearing aid algorithms. A few years after the original publication, the authors presented a more sophisticated system based on the newer fixed-point Motorola DSP56002, which was embedded in a complete stand-alone system [15].
• A general-purpose personal computer (PC) was employed by Kruger et al. in [16] as a DSP testing platform. The goal of the study was to separate hardware-related and algorithm-related programming issues in order to free the designer from problems that are not related to the development of the algorithm. The authors employed a commercial PC (Pentium IV, 1800 MHz) to implement an acoustic echo control unit with two input and two output channels. The system was capable of sampling rates ranging from 8 to 32 kHz.
• Grimm et al. in [17, 18] developed a more portable, user-friendly and flexible platform than [15]. The authors employed a netbook computer with a dedicated audio interface in combination with a mobile phone as the development platform (see Figure 2.2). They reported a dynamic range of over 90 dB for two- and six-input-channel setups. It was shown that the prototype system can integrate hearing aid functionality, telephony, public announcement systems and home entertainment. An example binaural speech enhancement scheme, representative of a large class of possible personal hearing system (PHS) processing schemes, was shown to be compatible with the general concept. Algorithm deployment was done using the Master Hearing Aid framework, which is discussed in detail below.
• The authors in [19] presented a wavelet-based speech coding strategy for cochlear implants, with an RT implementation on a personal digital assistant (PDA). The system only allowed for a single-channel cochlear implant and the corresponding microphone. The authors demonstrated that the proposed strategy achieves higher analysis rates than the existing strategies while being able to run in real time on a PDA platform. Using a PDA as the host for the user interface allowed users to easily manipulate the parameters involved in the processing chain and study their effects. The authors reported processing times of 7–12 ms for 11.2 ms frames at 22 kHz on the 624 MHz fixed-point PDA processor. Programming on a fixed-point DSP platform can be more challenging than on floating-point platforms. Additionally, the user interface developed for the PDA was programmed using hybrid LabVIEW in order to achieve user interactivity and RT display.
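A short example shows why fixed-point code needs extra care. The sketch below assumes the common Q15 format (a convention chosen for illustration, not necessarily the one used in [19]) and multiplies two Q15 values; the function name is hypothetical.

```c
#include <stdint.h>

/* Q15: a 16-bit signed integer x represents the real value x / 32768,
 * covering [-1, 1 - 2^-15]. The product of two Q15 values is a Q30 value
 * in 32 bits, which must be rounded, shifted back by 15 bits and
 * saturated. Assumes an arithmetic right shift for negative values,
 * as provided by common compilers. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b; /* Q30 product */
    prod += 1 << 14;                        /* round to nearest */
    prod >>= 15;                            /* back to Q15 */
    if (prod > INT16_MAX)                   /* only -1 * -1 overflows */
        prod = INT16_MAX;
    if (prod < INT16_MIN)
        prod = INT16_MIN;
    return (int16_t)prod;
}
```

For example, `q15_mul(16384, 16384)` (0.5 × 0.5) returns 8192 (0.25), while `q15_mul(-32768, -32768)` saturates to 32767 because +1.0 cannot be represented. Each of these steps (widening, rounding, shifting, saturating) is handled implicitly by a floating-point multiply.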
2.3 Master Hearing Aid (MHA)
The HörTech Master Hearing Aid is a software development platform for signal processing algorithms and a software solution of a hearing aid prototype. MHA has a layered structure, as shown in Figure 2.1.
Figure 2.1: Layered structure of the master hearing aid[17]
• The audio backbone is external to MHA. It is an abstraction for a source and a sink for
audio data. The MHA can use different audio backbones by selecting the corresponding
MHA IO library. Usable audio backbones are: file system, sound card drivers (Windows
MME, Windows ASIO, Linux ALSA, Linux OSS), JACK low latency sound server and
router, and network audio streams.
• The MHA framework selects the MHA I/O library and passes the fragmented signal to
the MHA plugins. The frameworks are platform dependent programs or libraries.
2.3. M H A (MHA) 15
• An MHA plugin generally performs the signal processing, but plugins for signal flow
management also exist. Each plugin forms one processing block; algorithms can consist
of one or more plugins. MHA plugins communicate with the framework and with other
plugins through a simple ANSI-C interface. Plugins can work on waveform data (time
domain) or short-time Fourier transformed data (frequency domain). Plugins can im-
plement domain translations, e.g. FFT (fast Fourier transform) and iFFT (inverse FFT)
plugins exist.
The MHA software development toolbox contains a number of libraries that facilitate de-
velopment of RT algorithms, provide access to basic signal analysis and modification proce-
dures, and allow access to the configuration space (e.g. GUI configuration interface, MATLAB
access). [17]
To summarize, the MHA platform offers a low-cost alternative to implementing hearing aid
algorithms directly on hearing aids by replacing the digital hearing aid with a standard off-the-
shelf PC. This allows for higher computational complexity algorithms to be deployed without
having to limit their performance based on the overall power consumption and available com-
puting capabilities. Moreover it allows for easy algorithm configuration without the need for
hardware modification. The MHA framework has been previously deployed as a portable hearing
aid simulator by Grimm et al. [18], as shown in Figure 2.2. The authors utilized an Asus
Eee PC and a custom audio interface to facilitate portability and flexibility. The dedicated USB
audio interface was developed to fulfill the requirements of the authors’ PHS prototype. The
audio interface has been developed in two variants: a device with four inputs and two outputs to
drive two audio headsets with two microphones in each headset (USBSC4/2), and a device with
six inputs and two outputs, for two audio headsets with three microphones each (USBSC6/2).
While the MHA offers significant potential and is actively developed and used, during the
pilot stages of this project several issues were identified with the MHA:
• There is no reliable and detailed documentation on the configuration and troubleshooting
of the MHA framework.
16 C 2. L R  B I
Figure 2.2: PHS prototype based on the Asus Eee PC[18]
• The initialization configuration scripts that ship with versions 4.4.27 and prior are not executable as is; they require modification and debugging, which makes using the examples and algorithm chains harder until the user is comfortable with the structure of those files.
• It is not truly portable, as it still requires large hardware. Changes to algorithm param-
eters by a user during field operation are not easy to make, and require modification of
the processing chain configuration files.
• It requires MATLAB to display a control panel GUI; without it, configuration and control of the signal chain can be difficult. However, the MHA host does not have to be on the same platform, or even the same computer, as the MATLAB GUI.
2.4 Eurotech Catalyst
2.4.1 Description of the CPU Module and Development Kit
The work presented in this thesis is based on the Eurotech Catalyst Module, an embedded module based on the Intel Atom platform. In particular, the Catalyst Module houses an Intel Atom Z530 along with 1GB of DDR2 RAM, as well as the system controller hub (SCH US15WP). The SCH integrates graphics, memory and I/O into a small package. Despite the small form factor of the Catalyst module (see Figure 2.3), it offers comprehensive I/O and multimedia capabilities, including high-definition graphics and audio, while maximizing the MIPS-to-mW ratio and consuming less than 5W of power in high-performance mode [10].
The Catalyst module is designed to be an encapsulated embedded module facilitating rapid
prototyping of embedded systems without the need for many additional external hardware com-
ponents. The carrier motherboard is the key component of the Eurotech Catalyst development
kit, and it serves the purpose of extending the capabilities of the Catalyst module, by providing
an ATX style power supply and hardware interfaces for the various data buses and I/O modules
present on the Catalyst module itself. These include the video and audio interfaces, USB host
and client interfaces, memory card readers, networking, etc. Additionally, the kit includes a
touch screen display and controller (see Figure 2.4).
The Catalyst offers further flexibility by supporting several operating system options as well as some real-time operating systems (RTOS) [10].
Figure 2.3: Catalyst Module[10] Figure 2.4: Catalyst Development Kit[10]
18 C 2. L R  B I
2.4.2 Audio Interface Alternatives
Based on the available interfaces on the Catalyst Module[10], several design alternatives are
possible. However, they all require the same building blocks: ADCs, DACs, a data bus and bus
controller as well as external components. Several high-fidelity ADCs and DACs are available
on the market and can be deployed on the target platform utilizing one of the following bus
implementations and controllers:
• PCI: PCI based audio interfaces can either be embedded on the motherboard of the device
or implemented as a slot-in card. Maximum theoretical bandwidth for the PCI Express
x1 available on the Catalyst module is 300MB/s, with 4 of the 5 available PCI slots
multiplexed through a PCIe switch. Signaling levels on the PCI bus can be either 3.3
or 5V with an average minimum power consumption of 0.3W to 0.5W per PHY layer.
Additional power consumption must be allocated for the ADC and DAC section, as well
as the PCI controller on the Catalyst Module. Data is delivered serially over one lane, due
to hardware constraints on the Catalyst Module[20], which include, but are not limited
to, a single lane of PCI data available on the Eurotech Development Kit [10].
• USB: the USB bus is widely used for various peripherals, among which are sound cards
(audio interfaces). The Catalyst module offers five USB 2.0 ports on a single USB host,
with a maximum bandwidth of 57MB/s at high-speed mode of operation. Implementa-
tion of a USB audio interface will require a bus client such as the TUSB3200 (Intel 8052
based) and an audio codec such as the PCM3794, both from Texas Instruments. The two
devices can be connected through an I2S bus (Inter-IC sound interface), and the required
control signals have to be implemented using the I2C bus (Inter-IC communication interface). For the given example components, the power consumption of the codec is 50mW and that of the USB streaming controller is 200mW; additional supply power will be required for external components. Data is delivered serially through the USB host controller to the DMA engine for CPU processing.
2.4. E C 19
• Intel HDA: Intel High-Definition Audio (Azalia or Intel HDA) is a task specific bus
available on most modern chipsets, and allows for a maximum sample rate of 192kHz, at
a maximum of 32bits/sample. Maximum playback bandwidth is 6MB/s and maximum
capture bandwidth is 3MB/s, both independent of the command interface which uses the
same bus. The distinguishing feature of Intel HDA is the codec, which incorporates the
bus client, ADC and DAC stages, as well as mixers and multiplexers all on the same
IC. Both data and control flow on the same bus line, thus lowering the number of wired
connections required to implement a solution. Power consumption of a typical HDA
codec is dependent on the number of active ADC and DAC pairs and typically ranges
between 150mW to 300mW. Data is delivered serially on the HDA bus and directed to
dedicated DMA engines housed within the south-bridge of the chipset. Unlike USB,
the dedicated nature of the DMA engines guarantees their availability when needed, and
simplifies the device driver development [21].
2.4.3 Audio Interface
As described above, the Catalyst module provides various bus interfaces, many of which can be used to develop the audio system of the platform. These include PCI, I2S, USB and Intel HDA. There is an audio system already present on the development kit; it is built on top of the Intel HDA interface and uses an IDT 92HD45 codec. Unfortunately, this interface has limited recording capabilities and as such is unsuited to our needs. Therefore, the first step in this thesis project was to develop a custom audio system capable of multi-channel synchronous recording. Our system also utilized the HDA bus; however, a more advanced codec from Conexant, the AD1989, was used. This codec was originally developed by Analog Devices (under the name AD1988a/b) until the production line was acquired by Conexant.
20 C 2. L R  B I
2.5 Intel High Definition Audio - Azalia
Intel HDA is a high-fidelity audio bus developed for the consumer market. A detailed de-
scription and guidelines for the development of Intel HDA based systems is provided in [22].
Following is a brief summary of the structure of the Intel HD audio link and related compo-
nents, as well as data transmission specifics using the HDA bus.
2.5.1 Introduction
There are three key components to any Intel HDA solution: the Intel HD Audio Controller,
an Intel HD Audio Link, and one or more Intel HD Audio Codecs, as shown in Figure 2.5.
The controller is responsible for communicating audio and control data from the codec to the
CPU and memory through a direct memory access (DMA) engine. The bi-directional link
is the hardware bus along which audio data and commands are exchanged. The codec sends
analog signals to physical transducers (e.g. speakers, headphones) and receives analog signals from
an attached audio device such as a microphone or a line in connection. It is also responsible
for digital to analog and analog to digital conversion of the audio data which is communicated
along the HDA link.
Figure 2.5: Intel HD Audio Components[22]
2.5. I H D A - A 21
Figure 2.6: Intel HDA controller DMA engines[22]
2.5.2 Controller and Bus/Link Description
On system start-up (as well as during forced initialization conditions), the HDA controller is
responsible for the discovery and enumeration of all the available codecs on the bus/link. It is
responsible for the relay of instructions and data to and from each codec, and it contains the
Direct Memory Access (DMA) engines which stream audio data to and from the codec (see
Figure 2.6). DMA engines are an efficient method of communicating data between peripheral devices and the system's memory without going through the CPU, and DMA transfers are usually faster than non-DMA transfers.
The link itself, which connects a controller with one or more codecs, is a multi-wire electrical interface consisting of the BITCLK, SYNCFRAME and RESET signals. It also contains one to four SDO signals, denoted $SDO_0$–$SDO_3$, which are used to transmit playback data from memory to the codecs. The SDO lines can be multi-point, i.e. a single SDO line can be attached to multiple codecs. The HDA link contains one to fifteen SDI signals, denoted $SDI_0$–$SDI_N$, which transfer data from the codecs to memory through the input DMA engines. Only one SDI line can be associated with a given codec at a time, thus limiting the maximum number of codecs associated with a controller to fifteen.
Control signal transmission uses the SDO line to deliver a command sequence and the SDI line to recover the command response, if one is available (command responses for some verbs are not necessary or simply not implemented). Audio data transmission on the SDI lines happens on the rising edge of the clock, unlike the SDO lines, where the data packets are double pumped, i.e. transferred on both the rising and falling edges of the clock. This doubles the effective bandwidth available for playback compared to capture using the same audio stream setup. This process is illustrated in Figure 2.7.
Figure 2.7: Intel HDA bus signals[22]
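The playback and capture bandwidths quoted in Section 2.4.2 follow from the link timing. Assuming the 24 MHz BITCLK and 48 kHz frame rate of the HDA specification, each link frame offers 500 bit times per SDI line and, because of double pumping, 1000 bit times per SDO line (500 × 48 000 bit/s = 3 MB/s; 1000 × 48 000 bit/s = 6 MB/s). The sketch below compares this with the payload of a 48 kHz stream; stream headers, tags and command/response bits are ignored, and the helper names are hypothetical.

```c
/* Link timing assumed from the HDA specification. */
#define HDA_BITCLK_HZ 24000000u /* 24.000 MHz link bit clock */
#define HDA_FRAME_HZ  48000u    /* frame rate set by the frame sync */

/* Bit times available per link frame on one data line:
 * SDI lines transfer on one clock edge, SDO lines on both. */
unsigned hda_bits_per_frame(int double_pumped)
{
    unsigned bits = HDA_BITCLK_HZ / HDA_FRAME_HZ; /* 500 */
    return double_pumped ? 2u * bits : bits;
}

/* Payload bits one 48 kHz stream needs per link frame
 * (one sample per channel per frame; overhead not counted). */
unsigned stream_bits_per_frame(unsigned channels, unsigned bits_per_sample)
{
    return channels * bits_per_sample;
}
```

Six 16-bit capture channels need `stream_bits_per_frame(6, 16)` = 96 bits per frame, a small fraction of the 500 bits available on a single SDI line.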
2.5.3 Codecs, Nodes and Widgets
An Intel HDA codec is modular, and its design contains a hierarchy of standardized modules. The organization and availability of the internal modules vary between codecs. The codec architecture includes a discovery and addressing scheme that allows a single driver to easily support a wide variety of codecs. However, such a driver only applies a default configuration, inferred from the current structure of the codec's node tree, which may not match the design intent.
A node is either a single module within a codec or a collection of a module and all its child modules connected below it in the hierarchy, as shown in Figure 2.8. Each node has a unique address, known as a node ID (NID). An NID is usually 7 bits wide. Each node exposes a set of read-only capabilities and can be controlled and configured using command verbs targeted at that node.
Figure 2.8: Intel HDA Codec node/widget hierarchy[22]
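To make the addressing concrete, the sketch below packs a 32-bit command word in the layout used by the Linux HDA driver: the 4-bit codec address in bits 31:28, the NID starting at bit 20, and the verb and payload in bits 19:0. Twelve-bit verbs carry an 8-bit payload, while 4-bit verbs are written pre-shifted (e.g. 0x300) so that a 16-bit payload fits below them. The helper and macro names are hypothetical; the two verb values (Get Parameter, 0xF00, and Set Connection Select, 0x701) are from the HDA specification's verb table.

```c
#include <stdint.h>

#define HDA_VERB_GET_PARAMETER   0xF00u /* payload = parameter ID */
#define HDA_VERB_SET_CONNECT_SEL 0x701u /* payload = connection index */

/* Pack codec address, NID, verb and payload into one command word. */
uint32_t hda_make_cmd(unsigned cad, unsigned nid, unsigned verb,
                      unsigned payload)
{
    return ((uint32_t)(cad & 0xFu) << 28) |
           ((uint32_t)(nid & 0x7Fu) << 20) |
           ((uint32_t)(verb & 0xFFFu) << 8) |
           (uint32_t)(payload & 0xFFFFu);
}
```

For example, `hda_make_cmd(0, 0x00, HDA_VERB_GET_PARAMETER, 0x04)` asks the root node of codec 0 for parameter 0x04 and encodes as 0x000F0004.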
The root node (NID 0x00) is the node at the top of the hierarchy and serves as a pointer to the function groups contained in the codec. Even though the HDA specification does not limit the set of available function groups, currently only two are supported: the modem function group (MFG) and the audio function group (AFG). This thesis is solely concerned with the AFG, as it is the only function group available on the considered HDA codecs.
Within each function group node is a collection of modules known as widget nodes, or widgets. Widgets can be interconnected in different ways, as defined by the codec designer/manufacturer, which allows support of an arbitrary number of audio input and output channels. Widgets are either 1-channel (mono) or 2-channel (stereo); an AFG as a whole can support more than 2-channel sound by using multiple widgets.
Any widget can be connected to more than one other widget. These hard-wired connections are pre-defined when the codec is designed and are stored in the widget's connection list. Upon initialization, the driver is responsible for interrogating each widget about its connection list. Selection of the required connection is achieved by a command verb targeted at the NID of the widget that contains the list (behaviour inherited from digital multiplexers).
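Because the default configuration inferred by a generic driver may not match the design intent, a driver typically walks these connection lists to find a route between, for example, an ADC and an input pin. The sketch below performs a bounded depth-first search over a hypothetical in-memory copy of the connection lists; the `struct widget` layout and all NIDs are assumptions, and a real driver would also check widget types, pin directions and amplifier capabilities.

```c
#include <stddef.h>

#define MAX_CONN 4
#define MAX_NID  128 /* 7-bit NIDs */

/* Hypothetical copy of one widget's connection list: the NIDs that feed it. */
struct widget {
    unsigned char num_conn;
    unsigned char conn[MAX_CONN];
};

static size_t find_path_rec(const struct widget *w, unsigned from,
                            unsigned to, unsigned *path, unsigned *sel,
                            size_t depth, size_t max_depth)
{
    path[depth] = from;
    if (from == to)
        return depth + 1;
    if (depth + 1 >= max_depth) /* depth bound also stops loops */
        return 0;
    for (unsigned i = 0; i < w[from].num_conn; ++i) {
        sel[depth] = i; /* connection index to select at path[depth] */
        size_t len = find_path_rec(w, w[from].conn[i], to, path, sel,
                                   depth + 1, max_depth);
        if (len)
            return len;
    }
    return 0;
}

/* Search from `from` (e.g. an ADC) back to `to` (e.g. a pin).
 * On success, path[0..len-1] lists the NIDs starting at `from`, and
 * sel[i] is the value to write with "set connection select" at path[i].
 * Returns the path length, or 0 if `to` is not reachable. */
size_t find_path(const struct widget *w, unsigned from, unsigned to,
                 unsigned *path, unsigned *sel, size_t max_depth)
{
    if (max_depth == 0)
        return 0;
    return find_path_rec(w, from, to, path, sel, 0, max_depth);
}
```

For a hypothetical ADC (NID 0x08) fed by a selector (0x0C) whose connection list is {0x11, 0x12}, a route to pin 0x12 is ADC → selector → pin, and the selector must be set to connection index 1.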
The types of standardized audio widgets are:
• Audio Output Converter (DAC) Widget
24 C 2. L R  B I
• Audio Input Converter (ADC) Widget
• Pin Widget
• Mixer Widget
• Selector Widget
• Power Widget
• Volume Knob Widget
• Beep Generator Widget
2.6 ALSA - Advanced Linux Sound Architecture
Advanced Linux Sound Architecture (ALSA) is a free and open source software framework
providing an API for device drivers for sound cards. It provides support for multiple sound card
architectures, and is capable of low-latency performance given the right platform configuration
and capabilities. Additionally ALSA is capable of automatically configuring most common
commercial sound hardware as well as some professional systems. More importantly, at the core of its design lies a well-tested error handling mechanism that aids the development of modules/drivers that are to operate under the framework.
Development of a custom driver for the chosen HDA codec was done within the ALSA API, using the already developed HDA generic driver as a template. This included the configuration of the codec, power control of the codec and instantiation of the required audio controls: mixers, volume and mute controls.
2.7 Real-Time Engine
Implementing a portable DSP system capable of processing the signals originating within the user's environment and providing near-instantaneous results to the user requires a low-latency RT engine. The low-latency requirement ensures that normal operation of the device will not impose a large lag between an audio event (such as speech production, environmental sounds, etc.) and the user's perception of the event. Low latency in DSP algorithm modification should also limit the number and amplitude of artifacts introduced by the modification.
For this purpose, Linux is chosen as the real-time operating system. This choice is based on the ease of customization of the Linux OS, as well as the wide availability of know-how on the topic of DSP in Linux. Additionally, optimization of the Linux distribution was required to allow for real-time performance.
2.7.1 JACK Audio Connection Kit
JACK is an RT audio routing engine capable of high-performance, high-fidelity operation. Originally developed for the Linux OS, it has since been ported by its authors to Mac OS and Windows. JACK acts as an intermediary layer between the ALSA hardware driver stack and the DSP algorithm framework developed within this thesis. Interactions between ALSA and JACK are done in kernel space, thus allowing for efficient memory operations (see Figure 2.9).
The structure of a JACK application and the JACK RT engine allows audio clients to be
connected in an arbitrary length processing chain. The implication is that each DSP algorithm
implemented on the framework can be broken down into sub-processes which can then be
implemented as separate JACK clients, debugged and tested separately.
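The chaining idea can be sketched in C. The callback below has the same shape as a JACK process callback, `int (*)(jack_nframes_t, void *)`, which a real client would register with `jack_set_process_callback()` and whose buffers would come from `jack_port_get_buffer()`. To keep the sketch self-contained, `jack_nframes_t` is redefined locally, and `struct chain` and the two example stages are hypothetical. Here the stages run inside one callback; the separate-client approach described above would instead connect the clients' ports, for example with `jack_connect()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for JACK's frame-count type (normally <jack/jack.h>). */
typedef uint32_t jack_nframes_t;
typedef float sample_t;

/* A processing stage works in place on one block of samples. */
typedef void (*stage_fn)(sample_t *buf, jack_nframes_t n, void *state);

struct chain {
    size_t num_stages;
    stage_fn stage[8];
    void *state[8];
    const sample_t *in; /* in a real client: jack_port_get_buffer() */
    sample_t *out;      /* in a real client: jack_port_get_buffer() */
};

static void gain_stage(sample_t *buf, jack_nframes_t n, void *state)
{
    float g = *(const float *)state;
    for (jack_nframes_t i = 0; i < n; ++i)
        buf[i] *= g;
}

static void clip_stage(sample_t *buf, jack_nframes_t n, void *state)
{
    (void)state; /* stateless */
    for (jack_nframes_t i = 0; i < n; ++i) {
        if (buf[i] > 1.0f)
            buf[i] = 1.0f;
        if (buf[i] < -1.0f)
            buf[i] = -1.0f;
    }
}

/* Runs on the RT thread: no blocking, no memory allocation. */
int process(jack_nframes_t nframes, void *arg)
{
    struct chain *c = arg;
    for (jack_nframes_t i = 0; i < nframes; ++i)
        c->out[i] = c->in[i]; /* copy input block to output buffer */
    for (size_t s = 0; s < c->num_stages; ++s)
        c->stage[s](c->out, nframes, c->state[s]);
    return 0;
}
```

Because every stage shares one signature, stages can be tested in isolation and reordered without touching the callback, which mirrors the debugging benefit of splitting an algorithm into separate clients.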
2.7.2 MHA and JACK
MHA and JACK are not mutually exclusive processing frameworks, as noted above. Moreover, JACK clients can be developed to complement the function of MHA.
• JACK offers better documentation and fewer bugs due to its larger user base and developer interest.
26 C 2. L R  B I
Figure 2.9: JACK, ALSA and Kernel Interaction Diagram[23]
• Similar plug-in based algorithm implementation is achievable as when using MHA.
• Chaining several algorithm blocks using MHA can be harder, and may require collecting all the needed blocks into a single plug-in.
• Many of the complex algorithm blocks available in MHA are not available in JACK,
and if they are, their performance cannot be compared to the available MHA algorithms
without extensive testing on the target platform.
• As MHA can require JACK for certain tasks, implementing blocks directly in JACK
can simplify the process by eliminating one level of the abstract interface, thus possibly
improving RT performance in low-power devices.
2.8 Considerations for Real-Time Algorithm Design
Implementing DSP algorithms in an RT environment requires special consideration, and care must be taken to ensure reliable, artifact-free performance. Metrics and evaluation techniques can be found in [24]; these include latency, memory footprint and CPU utilization.
2.8.1 Interrupt and processing Latency
Latency, in terms of data acquisition and data propagation, depends on the OS and driver layer of the system used as an RT host. The combined interrupt latency is given by:
$$\tau_{IL} = \tau_{OS} + \tau_{H} \tag{2.1}$$
where $\tau_{H}$ is the hardware-dependent contribution to interrupt latency, which depends on the interrupt controller on the board as well as the type of the interrupt, and $\tau_{OS}$ is the OS-induced overhead in processing the interrupt. This quantity has a best-case and a worst-case scenario worth considering. The best case occurs when only minimal overhead is added before the interrupt service routine (ISR) starts. The worst case occurs when the kernel has disabled interrupts to protect critical sections; in this case $\tau_{OS}$ is the sum of the best-case overhead and the longest interrupt lockout time in the kernel. It should be noted that this latency applies to both audio data recording and playback.
Processing latency also comes into play once the acquired audio data has been
delivered to host memory. Any processing performed on the host computer introduces
filtering and buffering delays. Execution-time bottlenecks also occur here, because of the high
computational complexity of this stage.
The term sampling latency refers to the combined delay of data acquisition and data
collection incurred when an RT engine such as JACK is used to gather data. It depends on
the size of the buffers used to store incoming and outgoing data, the sample rate, and the number
of buffers requested by the JACK client when its processing callback is invoked. For
JACK, this metric can be calculated using (2.2):
$$\tau_{SL} = \frac{\text{number of buffers} \times \text{buffer size}}{\text{sample rate}} \tag{2.2}$$
This gives the absolute minimum latency that can be encountered when running an audio DSP
algorithm in real time within the JACK framework. A more accurate estimate of the total
delay is obtained by adding the result of (2.2) to the result of (2.1), which depends on
the OS configuration and on the capabilities of the chosen audio interface.
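As a worked example of (2.2), and of adding (2.1) to it, the Python sketch below computes the sampling latency for one JACK configuration and then a total estimate. The configuration is 2 buffers of 256 frames at 48 kHz. Both it and the interrupt-latency figures are hypothetical illustration values, not measurements from this platform.

```python
def sampling_latency_s(number_of_buffers: int, buffer_size: int, sample_rate: float) -> float:
    """Sampling latency tau_SL from (2.2), in seconds."""
    if number_of_buffers <= 0 or buffer_size <= 0 or sample_rate <= 0:
        raise ValueError("all arguments must be positive")
    return number_of_buffers * buffer_size / sample_rate


def total_latency_s(number_of_buffers: int, buffer_size: int, sample_rate: float,
                    tau_os_s: float, tau_h_s: float) -> float:
    """Total delay estimate: tau_SL from (2.2) plus tau_IL = tau_OS + tau_H from (2.1)."""
    tau_il = tau_os_s + tau_h_s
    return sampling_latency_s(number_of_buffers, buffer_size, sample_rate) + tau_il


if __name__ == "__main__":
    # Hypothetical configuration: 2 buffers x 256 frames at 48 kHz.
    tau_sl = sampling_latency_s(2, 256, 48_000)
    # Hypothetical worst-case interrupt latency components, in seconds.
    tau_total = total_latency_s(2, 256, 48_000, tau_os_s=50e-6, tau_h_s=5e-6)
    print(f"sampling latency: {tau_sl * 1e3:.3f} ms")  # 10.667 ms
    print(f"total estimate:   {tau_total * 1e3:.3f} ms")
```

Under these assumptions the buffering term dominates. Halving the buffer size halves τSL, at the cost of more frequent callbacks.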
2.8.2 Frame Based DSP
In an RT application, a processing algorithm cannot access the entire audio signal. Nor is it
practical to provide enough memory to store more than a few seconds of previously acquired input data.
An algorithm realized in an RT setting must also be causal, since the system cannot
predict or acquire future samples. Moreover, audio DSP on a memory-limited system requires
the algorithm to "forget" older samples, replacing them with incoming data. A frame-based
implementation of the required algorithms must therefore be used. Dependence on past samples
is limited to a specified time window, and long-term information is stored as
a statistical estimate. For example, the noise floor level or harmonic component amplitudes
can be tracked in short-time implementations. Where long-term information is needed, it can be
stored as statistical averages or, in the case of noise, as a mathematical model
that incorporates time-varying noise statistics.
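The frame-based pattern described above can be sketched as follows. Only a fixed number of recent frames is kept, and older frames are dropped automatically. Long-term information is condensed into a first-order recursive average that uses constant memory. The class name `FrameHistory` and the `smoothing` parameter are hypothetical. The averaged quantity (mean frame energy over all frames) is a simple stand-in for a statistical estimate. A real noise-floor tracker would typically update only during noise-only frames.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


class FrameHistory:
    """Hypothetical helper: bounded frame memory plus a long-term statistic.

    Only the most recent `max_frames` frames are retained (older frames are
    forgotten automatically). Long-term behaviour is summarised by a
    first-order recursive average of frame energy that needs O(1) memory.
    """

    def __init__(self, max_frames: int, smoothing: float = 0.98) -> None:
        if max_frames <= 0:
            raise ValueError("max_frames must be positive")
        if not 0.0 <= smoothing < 1.0:
            raise ValueError("smoothing must lie in [0, 1)")
        self.frames: Deque[List[float]] = deque(maxlen=max_frames)
        self.smoothing = smoothing
        self.estimate: Optional[float] = None

    def push(self, frame: Sequence[float]) -> float:
        """Store a new frame and update the running energy estimate."""
        if not frame:
            raise ValueError("frame must not be empty")
        self.frames.append(list(frame))  # deque discards the oldest frame when full
        energy = sum(s * s for s in frame) / len(frame)
        if self.estimate is None:
            self.estimate = energy
        else:
            self.estimate = self.smoothing * self.estimate + (1.0 - self.smoothing) * energy
        return self.estimate
```

The recursive average gives recent frames exponentially more weight than old ones. This lets the estimate follow slowly changing statistics without storing any history.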
2.8.3 Execution Time and Memory Limitations
Because Linux is designed to run on processors equipped with a memory management unit
(MMU), the choice of hardware development platform is limited. However, JACK
and the rest of the RT framework developed herein can also run on µClinux, which is
designed to operate without an MMU [25]. The purpose of an MMU is to separate
"untrusted" user-space code from "trusted" critical kernel-space code, preventing
user-space memory access errors from crashing the kernel.
As with any RT and/or embedded system, memory and CPU constraints must
be taken into consideration. These constraints vary with the nature of the algorithms or processes
to be deployed on the system. The upper bound is always set by
the total hardware capability of the system. However, this capacity can rarely be fully used by the audio
DSP processes alone, because any such system must also run miscellaneous tasks that maintain
the system itself. These can include hard-disk and memory monitors and services, networking, and
anything else the designer deems necessary for normal system operation.
Memory limitations in DSP arise mostly when the algorithm requires large
amounts of memory for its regular operation, for example for long-term data storage or large vector
and matrix operations. In audio DSP, this is directly related to the length of the
filters used and to the number of filtering/processing stages the signal must pass through
between input and output. Additional memory is
often required by the OS to handle data transfers between the device driver and the user-space
process. In the case of ALSA, all incoming data buffers are located in kernel space to
protect their contents and access rights. It is the device driver's job to copy the buffers
to user space, where applications can access and manipulate the data they contain [26].
2.9 Real-Time Considerations and Hardware Summary
The preceding sections listed important issues regarding the RT framework and supporting hardware.
These areas must be addressed in the development of the custom RT framework. The
importance of CPU utilization and memory management was outlined, as both relate to the
operation of the RT engine. The Linux OS and the abstract audio interface layer
provided by ALSA and JACK were described. In addition, the choice of the Intel HDA bus for use
with the Eurotech development kit was justified. The next sections focus on algorithms
targeting people with AN, covering their theory and some implementations found in the
literature.
30 C 2. L R  B I
2.10 Envelope Enhancement
Envelope enhancement (EE) has been shown to improve word intelligibility scores in
patients with AN [27]. The goal of this part of the thesis is to validate the results reported in [27]
and to develop the EE algorithm within the RT framework discussed herein. Studies [28, 8, 5]
have shown an increase in word identification scores when the speech envelope
was enhanced. A series of studies also use EE as a method for enhancing speech
corrupted by noise [29, 30].
2.10.1 Principles
Whether EE is used for speech enhancement or to improve intelligibility in patients with AN, it involves
several common steps. First, the input signal is low-pass filtered at a frequency below the Nyquist
frequency of the signal. It is then split into frequency bands using one of the following two
approaches:
• Uniform bands: The input signal is split into bands of uniform width (in terms of Hz).
This is the approach utilized in [29, 30], and it takes advantage of the fact that for uniform
bands, finding the Hilbert envelope is equivalent to finding the modulus of the short-time
Fourier transform using a window that spans the range of interest.
• Octave bands: The input signal is split into bands spanning one-third-octave ranges. A given
frequency range is divided into octaves, in which the upper frequency of each band is twice
its lower frequency. Third octaves split each octave further into
three, so that the upper frequency limit of each band is $2^{1/3}$ (≈ 1.26) times its lower limit.
This approach is used in [27, 28].
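To make the one-third-octave rule concrete, the short Python sketch below generates contiguous band edges. Each upper edge is $2^{1/3}$ times the corresponding lower edge. The function name and the contiguous-band layout are illustrative assumptions. With a 140 Hz starting frequency and 16 bands, the last upper edge is about 5.64 kHz. That is close to the 140 Hz to 5.7 kHz range quoted for the AN simulator in Section 2.12.

```python
from typing import List, Tuple


def third_octave_bands(f_low: float, n_bands: int) -> List[Tuple[float, float]]:
    """Return (lower, upper) edges of n_bands contiguous one-third-octave bands.

    Each band's upper edge is 2**(1/3) times its lower edge, so three
    consecutive bands span exactly one octave.
    """
    if f_low <= 0 or n_bands <= 0:
        raise ValueError("f_low and n_bands must be positive")
    ratio = 2.0 ** (1.0 / 3.0)
    return [(f_low * ratio ** i, f_low * ratio ** (i + 1)) for i in range(n_bands)]


if __name__ == "__main__":
    for lower, upper in third_octave_bands(140.0, 16):
        print(f"{lower:8.1f} Hz - {upper:8.1f} Hz")
```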
Once the input signal has been split into the corresponding bands, the envelopes of the
resulting waveforms are extracted. There are two common methods for envelope detection: the
Hilbert envelope detector and low-pass filtered full-wave rectification. When applied
offline, they yield similar results.
2.10. E E 31
Hilbert Envelope Detection
A common and efficient technique for envelope detection is based on the Hilbert
transform. It involves computing an analytic signal whose real part is the original input
signal and whose imaginary part is a 90-degree phase-shifted version of the input. The
envelope can then be estimated as the magnitude of the resulting complex signal. To
obtain a smooth envelope of the input signal, a low-pass FIR or IIR filter can be applied
after the Hilbert transform.
Following the description in [31], the Hilbert transform of a function f(x) is given by:
$$F(t) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{f(x)}{t-x}\,dx \tag{2.3}$$
The above integral is evaluated as a Cauchy principal value and can
be written as the following convolution:
$$F(t) = \frac{1}{\pi t} \star f(t) = \mathcal{F}^{-1}\left\{\mathcal{F}\{f(t)\}\,[-j\,\operatorname{sgn}(\omega)]\right\} \tag{2.4}$$
where the convolution property of the Fourier transform converts the convolution into
a multiplication, and sgn(ω) is the sign function.
Having defined the Hilbert transform of an arbitrary real function f(x), the analytic
signal is constructed as:
$$Y(t) = y(t) + j\,h(t) = A(t)\,e^{j\omega(t)} \tag{2.5}$$
where y(t) and Y(t) are the input signal and analytic signal respectively and h(t) is the Hilbert
transform of y(t) calculated using equation (2.4). Y(t) can then be converted to polar form,
where A(t) is the envelope of the input signal and ω(t) is the phase of the analytic signal, which
can be discarded. In [29, 30] the authors use this approach to calculate the envelopes in their
RT implementation of EE.
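The construction in (2.4) and (2.5) can be sketched in the frequency domain. Adding $j$ times the $-j\,\operatorname{sgn}(\omega)$ multiplier to the identity gives $1 + \operatorname{sgn}(\omega)$. This keeps the DC bin, doubles positive-frequency bins and zeroes negative-frequency bins. The Python sketch below uses a direct O(N²) DFT so that it stays self-contained; a practical implementation would use an FFT. The function names are hypothetical, and no smoothing filter is applied.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(N^2) DFT; an FFT would be used in practice."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def analytic_signal(y: Sequence[float]) -> List[complex]:
    """Y = y + j*H{y}: weight bins by 1 + sgn(omega).

    DC (and, for even N, the Nyquist bin) keep weight 1, positive-frequency
    bins are doubled and negative-frequency bins are zeroed.
    """
    n = len(y)
    spectrum = dft(y)
    weights = [0.0] * n
    weights[0] = 1.0
    half = n // 2
    if n % 2 == 0:
        weights[half] = 1.0
        positive = range(1, half)
    else:
        positive = range(1, half + 1)
    for k in positive:
        weights[k] = 2.0
    return idft([s * w for s, w in zip(spectrum, weights)])


def hilbert_envelope(y: Sequence[float]) -> List[float]:
    """Envelope A(t) = |Y(t)| from (2.5); the phase is discarded."""
    return [abs(z) for z in analytic_signal(y)]
```

For an amplitude-modulated tone whose sidebands all lie below the Nyquist bin, `hilbert_envelope` returns the modulating envelope sample by sample.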
32 C 2. L R  B I
Full-Wave Rectification
Full-wave rectification is another approach to calculating the envelope of a given input signal.
Depending on the length and type of filters used, it is often more computationally intensive
than the Hilbert envelope detector. The Hilbert transform method requires one trivial filtering
operation followed by a summation across the input frame to compute the envelope. Full-wave
rectification, by contrast, requires taking the absolute value followed by two filtering stages to
compensate for the filter lag. It is used in [28, 5, 8] for envelope calculation in non-RT
implementations of EE. The input signal is split into bands as described above. The
absolute value of the signal in each band is then passed through a low-pass filter (LPF) with a cutoff
frequency in the range 5–25 Hz, corresponding to the syllabic rate of human speech. Narne et al. [5]
use a filter with a cutoff at 32 Hz, which is also the value used in [27]. A
review of the literature showed that both IIR and FIR filters have been used to implement
full-wave rectification. However, for the purpose of EE the phase response should be linear, that is, the
group delay through the filter should be constant, so that the enhanced signal can be properly
reconstructed from the band data.
Enhancing the Envelope
Calculating the enhanced envelope involves multiplying each band signal by a corresponding
band gain vector derived from the original envelope within that band. The gain equation
relates the original and resulting envelopes through a non-linear function. Several approaches
for calculating the gain vector were encountered in the literature review:
• Power Law: This is the simplest way of non-linearly enhancing a signal envelope and is
given by Clarkson et al., as per equation (2.6). Here k indicates the band of interest, $y_k$
is the expanded speech envelope, $A_k$ is the calculated band envelope and ν is the expansion exponent.
It was used in a real-time implementation of the EE algorithm targeted at speech enhancement in
low-noise conditions [29].
$$y_k(n) = \left[A_k(n)\right]^{\nu} \tag{2.6}$$
Equation (2.6) can be implemented in two ways. In the first approach, as in [29], $y_k$
is used directly as the information carrier and is combined with the noisy phase of the
band prior to reconstruction. In the second approach, a gain vector is calculated
as the element-by-element ratio between $y_k$ and $A_k$. This gain vector then multiplies the
band signal, resulting in an expanded band vector.
• Spectral Threshold: This is an evolution of the previous method, proposed by the same
authors in [30]. The expansion is intended to pass spectral magnitudes greater
than the threshold (α), while smaller values of the original envelope are attenuated. The
threshold can be either fixed or adaptive, based on the spectral variance within the band
of interest. The expansion is given by:
$$\hat{S}(n) = \frac{\left[A_k(n)/\alpha\right]^{\nu}}{1 + \left[A_k(n)/\alpha\right]^{\nu}}\,A_k(n) \tag{2.7}$$
As mentioned above, the threshold α can be fixed or adaptive. The authors of [30]
obtained good results for ν = 2 and α = 3σ, where σ is the standard deviation of
the additive noise. The threshold value can be normalized for each band to account for the
variation of signal power across bands during speech:
$$\alpha(k) = \alpha\left[\frac{\bar{A}_k N}{\bar{A}}\right] \tag{2.8}$$
where $\bar{A}_k$ is the long-time average of the speech-plus-noise envelope observed in band
k, $\bar{A}$ is the long-time average over both time and band number, and N is the number of
bands. Unvoiced speech segments may be lower in amplitude, but they are still perceptually
important. They can also be hard to detect in low-SNR conditions. To alleviate this
problem, the authors suggest using the spectral variance, as defined by equation (2.10), to
supplement equation (2.8) and calculate an adaptive value for α:
$$\alpha(k,n) = \alpha\left[\frac{\bar{A}_k N}{\bar{A}}\right]\left[\frac{C}{\gamma(n)}\right] \tag{2.9}$$
$$\gamma(n) = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left[A_j(n) - \bar{A}(n)\right]^2} \tag{2.10}$$
where C is a normalization constant.
• Exponential Law: This is the method used in [27, 8]. The time-domain envelope in each
band ($A_k$) is raised to a power K, ranging from a highly compressive value to a highly
expansive value. K is calculated using the following equation:
$$K_i = e^{\frac{A_{\min} - A_i}{\tau}}\,(K_{\max} - K_{\min}) + K_{\min} \tag{2.11}$$
where $K_{\min} = 0.3$, $K_{\max} = 4$, $A_{\min}$ is the minimum amplitude of the envelope within the
ith band, $A_i$ is the instantaneous amplitude of the envelope within the ith band,
and τ is a time constant that determines the disparity between “low” and “high”
envelopes. A good value for τ is reported in [8] as 0.5. However, the author of [27]
found that for EE applied to AN a better value for τ is 0.001. The
algorithm is evaluated for two values of τ: 0.001 and 0.0001. K and $A_{\min}$ are calculated
for each band independently.
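The three expansion rules above reduce to simple per-sample functions of the band envelope. The Python sketch below writes them out under stated assumptions. The power-law gain is the ratio $y_k/A_k = A_k^{\nu-1}$ (the second implementation option). The spectral-threshold rule uses a fixed α rather than the adaptive forms (2.8) to (2.10). The exponential law assumes the envelope has been normalised to [0, 1], so that exponents above 1 shrink low values and exponents below 1 raise high ones. The function names are hypothetical.

```python
import math


def power_law_gain(a_k: float, nu: float) -> float:
    """Gain g = y_k / A_k with y_k = A_k**nu from (2.6), i.e. g = A_k**(nu - 1)."""
    return a_k ** (nu - 1.0) if a_k > 0.0 else 0.0


def spectral_threshold_envelope(a_k: float, alpha: float, nu: float) -> float:
    """Expanded envelope S_hat from (2.7) with a fixed threshold alpha.

    Values well above alpha pass almost unchanged; values below are attenuated.
    """
    r = (a_k / alpha) ** nu
    return r / (1.0 + r) * a_k


def exponential_law_exponent(a_i: float, a_min: float, tau: float,
                             k_min: float = 0.3, k_max: float = 4.0) -> float:
    """Exponent K_i from (2.11): K_max when A_i = A_min, decaying towards K_min.

    A_i >= A_min by definition; the clamp only guards against rounding.
    """
    return math.exp(-max(a_i - a_min, 0.0) / tau) * (k_max - k_min) + k_min


def exponential_law_envelope(a_i: float, a_min: float, tau: float) -> float:
    """Envelope raised to K_i (assumes the envelope is normalised to [0, 1])."""
    return a_i ** exponential_law_exponent(a_i, a_min, tau)
```

With τ = 0.001, the exponent falls from 4 to almost exactly 0.3 once the envelope rises a few hundredths above its running minimum. This is why the choice of τ strongly affects how aggressive the expansion is.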
2.10.2 Real-Time Implementations of Envelope Enhancement
EE has been used in previous studies to perform speech enhancement in RT situations. In 1989,
Clarkson et al. [29] implemented the algorithm on a Texas Instruments TMS320C25 DSP chip.
Because of the limitations of their chosen hardware, they had to use a relatively low
sampling rate of 10 kHz, and band extraction was limited to second-order FIR filters. Nonetheless,
the algorithm used 20 uniform bands in the range from 0 to 4 kHz. In each band, the
envelope expansion was achieved according to equation (2.6). This approach has considerably
lower computational requirements than the method used in [28], which is the method of choice for the
implementation in this thesis. The more intensive method is chosen because it has the advantage of
incorporating signal dynamics by remembering the last observed minimum value of the input signal
within the band of interest.
2.11 logMMSE Noise Cancellation
Since the aim is to develop a portable, stand-alone hearing aid testing platform, it is safe to
assume that such a platform will be used in real-life environments. This implies the presence of
a variety of environmental noise sources. Limiting the effects of this noise is
key to the proper operation of the embedded algorithms. Some processing is therefore
required at the input of the system to control the amount of environmental noise that propagates
to the audio DSP algorithm.
2.11.1 Principles
The logMMSE spectral estimator, as described by Ephraim and Malah in [32], was used to
perform noise cancellation for the framework. This algorithm should be applied
at the input of any processing chain used in real-life situations to minimize interference.
A brief description is given next.
Assume that the input signal y(t) is the sum of a speech signal x(t) and an uncorrelated
additive noise signal d(t). Two hypotheses are introduced, H1 and H0, indicating speech presence
and speech absence, respectively:
$$y(t) = x(t) + d(t) \tag{2.12}$$
where
$$x(t) = \begin{cases} x(t) & \text{if speech is present } (H_1) \\ 0 & \text{if speech is absent } (H_0) \end{cases}$$
Taking the L-point FFT of the input signal (2.12) gives:
$$Y_k(n) = X_k(n) + D_k(n) \tag{2.13}$$
where k denotes the kth frequency bin ($k \in [0, L-1]$) and n denotes the nth frame of the
input signal. As above, the value of $X_k(n)$ is conditioned on the hypotheses H0 and H1, and
the value in the kth frequency bin is given by [33, 34]. This relies on the assumption that
each Fourier expansion coefficient can be modeled as an independent Gaussian random
variable [35]:
$$H_0 \text{ (speech absent)}: \quad Y_k(n) = D_k(n) \tag{2.14}$$
$$H_1 \text{ (speech present)}: \quad Y_k(n) = D_k(n) + X_k(n) \tag{2.15}$$
Let $X_k = A_k e^{j\alpha_k}$ and $Y_k = R_k e^{j\theta_k}$, where $A_k$ and $R_k$ are the spectral amplitudes
in the kth frequency bin of the noise-free speech signal and of the speech-plus-noise signal,
respectively. The probability density function of a given input spectral value,
conditioned on the H0 and H1 hypotheses, is then:
$$p(R_k \mid H_0) = \frac{1}{\pi\lambda_{d,k}}\exp\left\{-\frac{|Y_k|^2}{\lambda_{d,k}}\right\} \tag{2.16}$$
$$p(R_k \mid H_1) = \frac{1}{\pi(\lambda_{d,k}+\lambda_{x,k})}\exp\left\{-\frac{|Y_k|^2}{\lambda_{d,k}+\lambda_{x,k}}\right\} \tag{2.17}$$
where $\lambda_{d,k}$ and $\lambda_{x,k}$ are the variances of the additive noise and of the clean speech,
respectively. The likelihood ratio is thus defined in [34] as the ratio of equation (2.17) to
equation (2.16):
$$\Lambda_k = \frac{p(R_k \mid H_1)}{p(R_k \mid H_0)} = \frac{1}{1+\xi_k}\,e^{\nu_k\gamma_k} \tag{2.18}$$
where $\xi_k \equiv \lambda_{x,k}/\lambda_{d,k}$ and $\gamma_k \equiv R_k^2/\lambda_{d,k}$ are called the a priori and a posteriori signal-to-noise ratios
(SNR), and $\nu_k \equiv \xi_k/(1+\xi_k)$. The a posteriori SNR is easy to obtain, as it only requires an accurate
estimate of the noise variance. This estimate can be updated during noise-only frames according to:
$$\lambda_{d,k}(n) = (1-\beta)\,R_k^2(n) + \beta\,\lambda_{d,k}(n-1) \tag{2.19}$$
β should be chosen to match the rate of change of the noise statistics. In this
thesis β = 0.98, as per [36]. The a priori SNR is calculated using the decision-directed (DD)
approach described in [35]:
$$\hat{\xi}_k(n) = \alpha\,\frac{\hat{A}_k^2(n-1)}{\lambda_{d,k}(n-1)} + (1-\alpha)\max\left[\gamma_k(n) - 1,\ 0\right] \tag{2.20}$$
where α (0 < α < 1) is the DD smoothing constant, distinct from the threshold α of equation (2.7).
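The per-bin bookkeeping behind (2.19) and (2.20) is sketched below in Python. The noise variance update uses β = 0.98, as in the text. The a posteriori SNR is $\gamma_k = R_k^2/\lambda_{d,k}$. The a priori SNR follows the standard decision-directed form, with weights α and 1 − α applied to the squared previous amplitude estimate and to $\max[\gamma_k - 1, 0]$. The DD constant α = 0.98 and the lower floor `xi_min` are assumptions commonly used in practice, not values taken from this thesis. The function names are hypothetical.

```python
from typing import List, Sequence


def update_noise_variance(lambda_d: Sequence[float], r: Sequence[float],
                          beta: float = 0.98) -> List[float]:
    """Recursive noise variance update (2.19), applied in noise-only frames:
    lambda_d(n) = (1 - beta) * R^2(n) + beta * lambda_d(n - 1)."""
    return [(1.0 - beta) * rk * rk + beta * lk for lk, rk in zip(lambda_d, r)]


def a_posteriori_snr(r: Sequence[float], lambda_d: Sequence[float]) -> List[float]:
    """gamma_k = R_k^2 / lambda_{d,k}."""
    return [rk * rk / lk for rk, lk in zip(r, lambda_d)]


def decision_directed_snr(a_hat_prev: Sequence[float], lambda_d_prev: Sequence[float],
                          gamma: Sequence[float], alpha: float = 0.98,
                          xi_min: float = 1e-3) -> List[float]:
    """A priori SNR (2.20):
    xi = alpha * A_hat^2(n-1) / lambda_d(n-1) + (1 - alpha) * max(gamma - 1, 0),
    floored at xi_min (an assumed safeguard against zero gains)."""
    return [max(alpha * a * a / lam + (1.0 - alpha) * max(g - 1.0, 0.0), xi_min)
            for a, lam, g in zip(a_hat_prev, lambda_d_prev, gamma)]
```

The large α weight on the previous frame's estimate is what smooths the a priori SNR over time. This smoothing is generally credited with reducing musical-noise artifacts.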
Traditionally, the likelihood ratio test is used as the decision statistic for voice activity
detection (VAD). It is derived from the geometric mean (GM) of the likelihood ratios over all
frequency bins as follows [37, 33, 34, 36]:
$$\log\Lambda(n) = \frac{1}{L}\sum_{k=0}^{L-1}\log\Lambda_k(n) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta \tag{2.21}$$
η is the threshold for speech detection; a generally accepted value of 0.15 is given in [36].
Online VAD therefore proceeds in four steps. The signal spectral amplitude estimate is calculated first.
The a priori SNR is then estimated using the DD approach, and the likelihood ratio is calculated
for each frequency bin. Finally, the GM of the likelihood ratios is calculated and compared
with the threshold. Frames with values below the threshold are classified as noise-only
(voice-inactive). Frames with values above the threshold are classified as voice-active.
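The decision rule (2.21) can be sketched directly from (2.18). Taking logarithms gives $\log\Lambda_k = \nu_k\gamma_k - \ln(1+\xi_k)$. Averaging over the L bins is the log of the geometric mean, which is then compared with η = 0.15. The function names are hypothetical. The sketch assumes that ξ and γ have already been computed for the current frame.

```python
import math
from typing import Sequence


def log_likelihood_ratio(xi: float, gamma: float) -> float:
    """log Lambda_k from (2.18): nu_k * gamma_k - ln(1 + xi_k), nu_k = xi_k / (1 + xi_k)."""
    nu = xi / (1.0 + xi)
    return nu * gamma - math.log1p(xi)


def is_speech_frame(xi: Sequence[float], gamma: Sequence[float], eta: float = 0.15) -> bool:
    """Decision rule (2.21): mean of log Lambda_k over all L bins compared with eta."""
    if len(xi) != len(gamma) or len(xi) == 0:
        raise ValueError("xi and gamma must be non-empty and of equal length")
    mean_llr = sum(log_likelihood_ratio(x, g) for x, g in zip(xi, gamma)) / len(xi)
    return mean_llr > eta
```

In a noise-only frame, ξ is close to zero, so each $\log\Lambda_k$ is close to zero and the frame falls below η. Strong speech bins push the mean well above the threshold.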
The estimate of the original speech amplitude ($\hat{A}_k$) is obtained using the logMMSE spectral
estimator [38]:
$$\hat{A}_k = \nu_k\exp\left\{\frac{1}{2}\int_{\nu_k\gamma_k}^{\infty}\frac{e^{-t}}{t}\,dt\right\}R_k \tag{2.22}$$
The integral in equation (2.22) is the exponential integral evaluated at $\nu_k\gamma_k$. It can be
calculated efficiently using the method described in [39].
Using the amplitude estimate given by equation (2.22) and the noisy phase $\theta_k$ from $Y_k(n)$,
an estimate of the speech-only signal $\hat{X}_k(n)$ can be composed. Performing the inverse Fourier
transform on the result recovers an estimate of the original x(n):
$$\hat{x}(n) = \mathcal{F}^{-1}\left\{\hat{A}_k(n)\,\angle\theta_k\right\} \tag{2.23}$$
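A minimal sketch of the logMMSE gain follows. It assumes the exponential integral is evaluated at $v_k = \nu_k\gamma_k = \xi_k\gamma_k/(1+\xi_k)$, as in Ephraim and Malah's estimator, so that $\hat{A}_k = G_k R_k$ with $G_k = \nu_k \exp\{\tfrac12 E_1(v_k)\}$. Keeping the noisy phase means $\hat{X}_k = G_k Y_k$. The code does not reproduce the method of [39]. Instead, it computes $E_1$ with a textbook power series for $x \le 1$ and a continued fraction (modified Lentz) for $x > 1$. The small floor on $v_k$ is an added safeguard. All names are hypothetical.

```python
import math
from typing import List, Sequence

EULER_GAMMA = 0.5772156649015329


def expint_e1(x: float) -> float:
    """Exponential integral E1(x) = integral from x to inf of e^-t / t dt, x > 0."""
    if x <= 0.0:
        raise ValueError("E1 is evaluated here only for x > 0")
    if x <= 1.0:
        # E1(x) = -gamma - ln x + sum_{n>=1} (-1)^(n+1) x^n / (n * n!)
        total, term, n = -EULER_GAMMA - math.log(x), 1.0, 1
        while True:
            term *= -x / n
            delta = -term / n
            total += delta
            if abs(delta) < 1e-16 * abs(total):
                return total
            n += 1
    # Continued fraction for E1, evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -float(i * i)
        b += 2.0
        d = 1.0 / (an * d + b)
        c = b + an / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)


def logmmse_gain(xi: float, gamma: float, v_floor: float = 1e-10) -> float:
    """G = nu * exp(0.5 * E1(nu * gamma)), nu = xi / (1 + xi)."""
    nu = xi / (1.0 + xi)
    v = max(nu * gamma, v_floor)
    return nu * math.exp(0.5 * expint_e1(v))


def enhance_bins(y_bins: Sequence[complex], xi: Sequence[float],
                 gamma: Sequence[float]) -> List[complex]:
    """X_hat_k = G_k * Y_k: amplitude G_k * R_k with the noisy phase theta_k kept.
    An inverse FFT of the result gives x_hat(n) as in (2.23)."""
    return [logmmse_gain(x, g) * yk for yk, x, g in zip(y_bins, xi, gamma)]
```

At high SNR, $E_1(v_k) \to 0$ and the gain tends to $\xi_k/(1+\xi_k)$, close to 1. At low SNR, the gain shrinks towards 0.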
2.11.2 Real-Time Implementations of logMMSE
Pathak et al. [40] introduced an RT implementation of single-channel logMMSE speech
enhancement on a floating-point DSP. They used it to evaluate its ability to enhance speech
corrupted by noise in an MRI environment with very low SNR. The purpose of their experiments was
to reduce listener fatigue by eliminating the strong MRI noise present in the target
environment. The authors applied the RT implementation to an actual data set collected from
a 3-Tesla MRI machine. They reported that it limited the effects of MRI-induced noise, and thus
listener fatigue [40].
2.12 Auditory Neuropathy Simulator
For normal-hearing listeners to evaluate the performance of an algorithm chain
developed to assist patients with AN, the effects that AN has on speech must be replicated.
An AN simulator based on Narne's work in [5] is used for this purpose, with the implementation and parameter values
adopted from Morgan's thesis [27] (see Table 2.1). The
AN effects are simulated by splitting the input signal into 16 third-octave bands distributed
between 140 Hz and 5.7 kHz. In each band, the envelope is calculated using the Hilbert transform
method. The envelope is then passed through an LPF whose cutoff frequency is derived from
the temporal modulation transfer functions of individuals with AN. The modulation depth of
the resulting envelope is then reduced by a modulation factor (m). The output is formed by
summing the original band signals, each multiplied by the corresponding modified
and filtered envelope.
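The parameter lookup and the modulation-depth reduction step can be sketched as follows. The text does not specify how m is applied. This sketch assumes the envelope's fluctuations about its mean are scaled by m, which scales the modulation depth $(e_{\max} - e_{\min})/(e_{\max} + e_{\min})$ by m while preserving the mean level. [27] may define the operation differently. The envelope low-pass stage and the band recombination are omitted. `AN_PARAMETERS` and the function names are hypothetical; the values are those of Table 2.1.

```python
from typing import Dict, List, Sequence, Tuple

# Table 2.1: level of AN -> (modulation reduction factor m, envelope LPF cutoff in Hz)
AN_PARAMETERS: Dict[str, Tuple[float, float]] = {
    "Mild": (0.71, 8.0),
    "Moderate": (0.5, 6.0),
    "Severe": (0.25, 4.0),
    "Profound": (0.125, 2.0),
}


def reduce_modulation_depth(envelope: Sequence[float], m: float) -> List[float]:
    """Scale envelope fluctuations about their mean by m (assumed definition)."""
    if not envelope:
        raise ValueError("envelope must not be empty")
    mean = sum(envelope) / len(envelope)
    return [mean + m * (e - mean) for e in envelope]


def modulation_depth(envelope: Sequence[float]) -> float:
    """(max - min) / (max + min) of a non-negative envelope."""
    hi, lo = max(envelope), min(envelope)
    return (hi - lo) / (hi + lo)
```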
2.13 Summary
This chapter presented a literature review of relevant portable DSP systems. The Master
Hearing Aid (MHA) software framework was described, and a parallel was drawn to the RT framework
developed within this thesis. A detailed description of the hardware used in the development of
our portable DSP system was given. In addition, a review of modern audio transfer interfaces
used in the PC architecture was provided. Based on that review, the HDA audio bus was selected for
the custom audio interface developed for this thesis, and a brief introduction to the
architecture of the HDA standard was given.
The software required to deploy a reliable RT framework in a Linux environment was also
described. ALSA was presented as a driver development API for custom HDA interfaces, and
JACK as a common and powerful RT engine. A comparison was drawn between the
MHA framework and the RT framework developed here.
Finally, a detailed description of the theory of the logMMSE and Envelope Enhancement
algorithms was given. For EE, a list of different expansion techniques was provided, and
the selection of one for implementation in this thesis was justified. RT implementations of
these two algorithms found in the literature were also discussed.
The next chapter focuses on the design, development and prototyping of the custom audio
interface outlined previously. The work is based on a list of requirements given at the beginning
of the next chapter.
Table 2.1: Parameter values used to simulate various AN conditions

| Level of AN | Modulation reduction factor [m] | Cutoff frequency [Hz] |
|---|---|---|
| Mild | 0.71 | 8 |
| Moderate | 0.5 | 6 |
| Severe | 0.25 | 4 |
| Profound | 0.125 | 2 |
Chapter 3
Design and Implementation of a Custom Audio Interface
3.1 Introduction
This chapter outlines the design of, and the steps taken to develop, an audio subsystem intended
as an extension to the Eurotech Catalyst development platform [10]. The design of this
subsystem (an additional audio interface) was based on the Intel High Definition Audio (HDA)
specification, as described in [21] and [22]. The design and construction of a prototype meeting
the requirements of a portable audio algorithm testing platform followed the requirements
of the HDA specification, together with application notes from Analog Devices [41, 42]
and Conexant [43].
3.2 Design and Implementation
The specific solution implemented was based on the above design constraints and the available
interfaces on the Catalyst module. In all cases availability of synchronous sampling of the
input channels is strictly required.
3.2.1 Bandwidth Requirements
The bandwidth calculation was based on six channels of mono input and two channels of mono
output. The focus is on 16- and 32-bit word lengths at the sample rates available on most audio
data bus implementations listed in the design alternatives section of the previous chapter. The
bandwidth requirements for common stream formats are summarized in the following table:
Table 3.1: Bandwidth requirements for common stream formats, given in bytes per second

| Word length [bits] | 8 kHz | 32 kHz | 44.1 kHz |
|---|---|---|---|
| 16 | 93.75 kB/s | 375 kB/s | 516.8 kB/s |
| 32 | 187.5 kB/s | 750 kB/s | 1.01 MB/s |
Note that the available audio interfaces on the Eurotech platform cover the above bandwidth
requirements with capacity to spare.
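The entries of Table 3.1 follow from channels × bytes per sample × sample rate. The Python sketch below reproduces them. The tabulated figures appear to correspond to six channels, with 1 kB = 1024 B and 1 MB = 1024 kB. Counting all eight channels (six inputs and two outputs) would scale each entry by 8/6. The function name is hypothetical.

```python
def stream_bandwidth_kib_s(channels: int, word_length_bits: int, sample_rate_hz: float) -> float:
    """Raw PCM bandwidth in kB/s, taking 1 kB = 1024 bytes."""
    return channels * (word_length_bits / 8) * sample_rate_hz / 1024


if __name__ == "__main__":
    for bits in (16, 32):
        row = [stream_bandwidth_kib_s(6, bits, fs) for fs in (8_000, 32_000, 44_100)]
        print(bits, [f"{v:.2f} kB/s" for v in row])
```

For example, 6 × 2 B × 44 100 Hz = 529 200 B/s, or about 516.8 kB/s. At 32 bits this doubles to about 1033.6 kB/s, the 1.01 MB/s in the table.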
3.2.2 Power Consumption
Power consumption depends strongly on the hardware used, and the following comparison is
based on devices available for each bus implementation. Power consumption also
relates to the output voltage level used in analog audio transmission, the frequency of
host memory refresh cycles and CPU utilization. However, the latter two are application dependent,
so estimating them here would be inappropriate. Refer to Table 3.2 for an estimate of audio interface
power consumption for each audio bus.
Based on the estimates shown in Table 3.2, and taking into account the number of different
power supply voltage lines required for the prototype implementation, the Intel HDA bus
offers the lowest power consumption for the application. Note that a high number of different
power supply lines can introduce electronic hum and noise into the signal lines on the prototype
PCB, especially given the requirement for a small final form factor.

Table 3.2: Audio interfaces power consumption

| | PCI | USB | Intel HDA |
|---|---|---|---|
| Supply current [mA] | 100 | 80 | 60 |
| Supply voltage [V] | 5 or 3.3 | 5 | 3.3 |
| Estimated power [mW] | 500 or 330 | 400 | 200 |
3.3 Circuit Complexity based on Audio Bus Selection
Any implementation of a custom audio interface shares some hardware components with
other implementations. These include the power supply topology and power supply
stabilization, filters for each input and output port, and microphone reference voltage
generators. The major differences relevant to the prototype PCB construction are
outlined next:
• PCI PHY devices usually require a high number of external components, and peripherals
on the PCI bus have to be multiplexed before being processed. This can cause problems,
especially on low-performance systems or when a high-throughput video controller is
operating on the bus. Moreover, depending on the audio codec selected for the
implementation, a data converter IC may be required between the codec and the PCI PHY.
• USB implementations, in most cases, require one codec chip that contains the ADC and
DAC stereo pairs as well as amplifiers and mixers. The system also needs to include a
USB client controller. Depending on the codec selected for audio I/O,
conversion between the PCM packets produced and consumed by
the codec and USB-compatible packets can be done using an external IC controller (e.g.
Intel 8051- or 8052-based controllers). Such a converter would also
require an external EEPROM containing the stream formatting and conversion
configuration rules.
• Intel HDA implementations would take advantage of the HDA bus controller present
on the south bridge of the Eurotech SCH. This eliminates the need for translation
between data packets produced by the codec and the packets transmitted on the audio bus.
Moreover, the HDA standard is already well suited to high-fidelity audio performance
(see section 2.5).
Based on the above discussion, both USB and PCI implementations will, on average,
require at least three separate ICs, as well as external biasing
components and power stabilizers. With Intel HDA codecs, the bus interface is
already implemented on the codec IC. Such an implementation therefore requires only one IC, along
with the power stabilizers and biasing components. Overall, the Intel HDA option offers
the best form factor.
Highly configurable and open-source drivers for all three bus types (PCI, USB and Intel
HDA) are available under Linux through the Advanced Linux Sound Architecture (ALSA).
Under Windows, drivers are available through the Universal Audio Architecture (UAA). ALSA is distributed
under GNU licenses and thus allows free-of-charge development and support. PCI sound card drivers are
usually more specialized, because the hardware required to build such an interface varies greatly between
implementations and there is no unified specification.
3.3.1 HDA Codec Selection
Most modern consumer systems use Intel HDA as the audio backbone of the chipset.
There is therefore a plethora of available codecs that can meet most of the requirements of our
audio interface. However, most of these devices are intended to serve as audio interfaces for
consumer entertainment systems. Their design has therefore focused
on the availability of playback channels, which relates to the number of available on-board
DACs. As mentioned in the design requirements, the interface must serve 6 mono input
channels (3 stereo pairs), which limits the number of codecs to choose from. Prototype
development was based on the AD1989a from Conexant [43], which was selected for its
number of available ADCs. The device offers 3 ADC and 4 DAC stereo pairs. This can
easily support three simultaneously sampled stereo input pairs and one stereo output. The
device also includes a class-D stereo headphone amplifier on one of its ports. The Conexant device
is based on an older Analog Devices device, the AD1988, which has identical characteristics
but is no longer in production.
3.3.2 PCB considerations and supporting hardware: Grounding, Motherboard Interface and Power Supply Stabilization
The circuit board design was based on the reference schematic provided by Analog Devices (for
the AD1988B version of the codec) [42], with modifications made to ensure a proper
fit for the Conexant codec. Miscellaneous features such as variable bus voltage support, jack
detection and port re-tasking were disabled, both to save printed circuit board (PCB) area and
to conserve power. The prototype was implemented on a dual-layer PCB to minimize
line capacitance, which can degrade the high-speed clock used by the Intel HDA bus (24 MHz).
A major concern, thermal power dissipation, is also addressed by the two-layer design, which
allows for large ground planes.
As with most mixed-signal ICs, careful attention was paid to the proper grounding of both
the analog and digital sections and to the provision of point-to-point ground current paths.
This eliminates ground loops, which can affect the quality of recording and playback. The
analog and digital grounds were joined with a single PCB trace on the bottom layer,
directly underneath the HDA codec, as strongly suggested by IDT in [44].
The motherboard interface was built with a 24-pin 0.5 header that connects to the J27
header on the Eurotech development kit motherboard [10]. This gave our PCB access to the
3.3 V, 5 V and 12 V supply lines from the motherboard, as well as to all of the required Intel HDA
link lines: SDO, SDI, BITCLOCK, FRAMESYNC and RESET.
Power supply stabilization was achieved using a TL1963 linear low-dropout (LDO) voltage regulator.
Input and output stage capacitors were provided to aid output voltage stabilization and to
absorb voltage spikes on the input power supply line (see Figure 3.1). The output stage
uses one small capacitor (0.33 µF) to filter out high-frequency variations on the output line,
and one large polarized capacitor (10 µF) that smooths low-frequency variations on the output
line.
Figure 3.1: Power Supply Stabilization
3.3.3 Input/Output Stages
The input and output stages were based on the IDT reference design [44]. They provide second-order
band-pass filters to limit external interference that would otherwise be passed to the codec.
The input stage also provides the required reference voltage for the
microphone inputs.
Figure 3.2: Headphone/Speaker Interface
Figure 3.3: Microphone Interface
3.4 Final PCB implementation
The following figures show the final schematic of the prototype housing the AD1989a codec, along with the required supporting circuitry. The PCB layout diagrams are shown with the bottom and top layer ground planes removed for clarity (see Figures 3.6 and 3.5). The complete AD1989a prototype PCB is shown in Figure 3.7.
Figure 3.4: Circuit Schematic for the AD1989a codec, including supporting hardware
Figure 3.5: PCB layout, top copper layer
Figure 3.6: PCB layout, bottom copper layer
3.5. D D 47
Figure 3.7: Complete AD1989a prototype
3.5 Driver Design
3.5.1 Intel HDA ALSA driver add-on
The software stack servicing the audio subsystem is the ALSA driver set provided with most Linux distributions. Linux was chosen as the software development platform because it allows rapid implementation and prototyping. This is owing to the free availability of driver source code and the modularity of kernel drivers.
The driver for the custom codec was implemented as a kernel module that is dynamically loaded into the kernel when the platform boots. This allowed rapid debugging, since each load/unload cycle required only removing and re-inserting the module rather than a full reboot. ALSA already provides an HDA controller driver as well as a generic HDA codec driver; the latter was used as a template for our custom device driver. The HDA codec/controller driver modules already provide methods for:
• Specification of the type of interrupt engine to be used (MSI or IRQ)
• Codec probe masks, allowing for multiple codecs on the same bus (note, however, that sampling synchronicity can only be guaranteed within a single codec IC, not across multiple codecs on the same bus [21])
• Initialization of audio signal format parameters: sample rate, bit depth, endianness, etc. [26]
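For reference, the format parameters in the last item are ultimately packed into a 16-bit stream format descriptor: base rate, multiplier, divisor, bits per sample and channel count. The sketch below encodes that descriptor. The function name `hda_stream_format` is hypothetical and not part of ALSA. The field layout is assumed from the Intel HDA specification and should be checked against it before use.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Sketch: pack a 16-bit HDA PCM stream format descriptor.
// Assumed field layout (Intel HDA specification, stream format structure):
//   bit 15     stream type (0 = PCM)
//   bit 14     base rate (0 = 48 kHz, 1 = 44.1 kHz)
//   bits 13:11 rate multiplier - 1   (x1 .. x4)
//   bits 10:8  rate divisor - 1      (/1 .. /8)
//   bits 6:4   bits-per-sample code  (8, 16, 20, 24, 32 -> 0 .. 4)
//   bits 3:0   number of channels - 1
std::uint16_t hda_stream_format(unsigned rate_hz, unsigned bits, unsigned channels) {
    unsigned bits_code = 0;
    switch (bits) {
        case 8:  bits_code = 0; break;
        case 16: bits_code = 1; break;
        case 20: bits_code = 2; break;
        case 24: bits_code = 3; break;
        case 32: bits_code = 4; break;
        default: throw std::invalid_argument("unsupported bit depth");
    }
    if (channels < 1 || channels > 16) throw std::invalid_argument("1..16 channels");

    // Find base * mult / div == rate_hz, trying the 48 kHz base first.
    const unsigned bases[2] = {48000u, 44100u};
    for (unsigned b = 0; b < 2; ++b)
        for (unsigned mult = 1; mult <= 4; ++mult)
            for (unsigned div = 1; div <= 8; ++div)
                if (bases[b] * mult == rate_hz * div)
                    return static_cast<std::uint16_t>((b << 14) | ((mult - 1) << 11) |
                                                      ((div - 1) << 8) | (bits_code << 4) |
                                                      (channels - 1));
    throw std::invalid_argument("rate not expressible as base * mult / div");
}
```

For example, 48 kHz 16-bit stereo packs to 0x0011, and 44.1 kHz 16-bit stereo packs to 0x4011. In practice the HDA driver computes this value itself when a stream is opened.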
The custom driver module acted as an extension to the HDA controller driver. It was also responsible for the initial configuration of the audio widgets present on the codec, using the command verb structure defined for HDA devices [3]. Additionally, the module was responsible for setting up the required audio streams and channels. It also set up the supported configurations (sample rates, word sizes, etc.) and the audio mixers that control the volume of each I/O channel.
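To illustrate the command verb structure, the sketch below packs 32-bit HDA commands: codec address in bits 31:28, node ID (NID) in bits 26:20, and verb plus payload in bits 19:0. It follows the convention used in the Linux HDA code, where a verb is written as a 12-bit identifier shifted left by 8 bits and ORed with its payload. The helper names, the codec address of 0 and the idle NID 0x04 are hypothetical. The verb constants (Set Power State = 0x705, D0 = 0, D3 = 3) are assumed from the HDA specification. The example powers up widgets on the active paths and powers down the rest, as done during initialization.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Pack one 32-bit HDA command:
//   bits 31:28 codec address, bits 26:20 node ID (NID), bits 19:0 verb + payload.
// 'verb' is a 12-bit verb ID (e.g. 0x705) followed by an 8-bit payload.
// 4-bit verbs with 16-bit payloads (e.g. Set Amplifier Gain/Mute) are written
// as 0x300 etc., so the same packing covers both forms.
std::uint32_t make_verb(std::uint32_t codec_addr, std::uint32_t nid,
                        std::uint32_t verb, std::uint32_t payload) {
    return ((codec_addr & 0xFu) << 28) | ((nid & 0x7Fu) << 20) |
           ((verb & 0xFFFu) << 8) | (payload & 0xFFFFu);
}

// Assumed constants from the HDA specification.
constexpr std::uint32_t kSetPowerState = 0x705;
constexpr std::uint32_t kPowerD0 = 0x0;  // fully powered
constexpr std::uint32_t kPowerD3 = 0x3;  // powered down

// Commands that power the widgets on the active signal paths up (D0) and the
// remaining widgets down (D3).
std::vector<std::uint32_t> power_commands(const std::vector<std::uint32_t>& active,
                                          const std::vector<std::uint32_t>& unused,
                                          std::uint32_t codec_addr = 0) {
    std::vector<std::uint32_t> cmds;
    for (std::uint32_t nid : active)
        cmds.push_back(make_verb(codec_addr, nid, kSetPowerState, kPowerD0));
    for (std::uint32_t nid : unused)
        cmds.push_back(make_verb(codec_addr, nid, kSetPowerState, kPowerD3));
    return cmds;
}
```

Inside the kernel, ALSA's HDA code performs this packing when a verb is sent, for example through `snd_hda_codec_write()`.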
3.5.2 Widget Configuration and Initialization
The detailed widget configuration and interconnection can be seen in Figure 3.8. Port A of the AD1989a was selected as the output port, as it is the only port that provides a headphone amplifier on its output stage [41]. The three ports used as microphone/line-in jacks were selected based on their relative position on the AD1989a package, in order to simplify the PCB layout. During initialization only the required widgets were powered up; the rest were kept powered down to limit power consumption. Details of the configuration and the sequence of operations during initialization can be found in Appendix B.
3.5.3 Audio Widget and Stream Controls
Each port amplifier control was exposed to the OS through the ALSA API. Changing a control causes the driver to send the corresponding command verb over the HDA bus to the associated widget. This was done for both inputs (microphone or line-in) and outputs (headphone or line-out). All controls belonging to the same signal path were combined into one named stream, and the stream properties were then exposed to the OS. ALSA provides the “alsamixer” utility (see Figure 3.9) to allow the user to modify these controls.
For an input stream, the exposed parameters are given below, along with the corresponding node IDs (NIDs):
3.5. D D 49
Figure 3.8: Widget Diagram. Red lines show the active signal paths
• Volume Control: controlled by the input amplifier of each ADC (NIDs 0x08, 0x09 and 0x0F); it provides fine control over the gain of the input stage prior to digitization.
• Boost Volume Control: coarse volume control located immediately after each port widget (NIDs 0x3C, 0x39 and 0x3A).
• Mute Control: located on the port widget itself (NIDs 0x14, 0x15 and 0x17).
Figure 3.9: Alsamixer TUI displaying the exposed audio stream controls for the AD1989 codec
For an output stream, the exposed parameters were:
• Mute Control: located on the port widget itself (NID 0x11).
• Volume Control: controlled by the output amplifier of the DAC (NID 0x03); it provides fine control over the gain of the output stage after digital-to-analog conversion.
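The amplifier controls listed above correspond to the Set Amplifier Gain/Mute verb. The sketch below builds its 16-bit payload and the full command for the NIDs named in the text. The bit positions are assumed from the HDA specification and the Linux HDA headers: bit 15 output amp, bit 14 input amp, bits 13/12 left/right, bits 11:8 input index, bit 7 mute, bits 6:0 gain step. The helper names, the codec address of 0 and the gain step 0x27 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class AmpDir { Input, Output };

// 16-bit payload for the Set Amplifier Gain/Mute verb (assumed bit layout).
std::uint16_t amp_payload(AmpDir dir, bool left, bool right, unsigned index,
                          bool mute, unsigned gain_step) {
    unsigned p = (dir == AmpDir::Output) ? (1u << 15) : (1u << 14);
    if (left)  p |= 1u << 13;
    if (right) p |= 1u << 12;
    p |= (index & 0xFu) << 8;   // only meaningful for input amplifiers
    if (mute)  p |= 1u << 7;
    p |= gain_step & 0x7Fu;     // codec-specific gain steps
    return static_cast<std::uint16_t>(p);
}

// Full command for codec address 0 (assumed): NID in bits 26:20, verb ID 0x3
// in bits 19:16 (written as 0x300 << 8), payload in bits 15:0.
std::uint32_t amp_command(std::uint32_t nid, std::uint16_t payload) {
    return ((nid & 0x7Fu) << 20) | (0x300u << 8) | payload;
}
```

For instance, unmuting both channels of the DAC output amplifier (NID 0x03) at gain step 0x27 yields the command 0x0033B027. Muting the Port A widget (NID 0x11) yields 0x0113B080.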
3.6 DMA Limitations of the Eurotech Kit
The Eurotech motherboard chipset provides only two DMA engines for recording. Therefore only two channels can be operated simultaneously at the maximum sampling frequency (192 kHz). Since the original specification does not call for sampling rates above 48 kHz, other options were explored: interleaving channels on the HDA data bus and time de-multiplexing them in software downstream of the ALSA drivers. Unfortunately, these methods did not yield stable results. The functionality of the available audio inputs on our device was tested with the ICH7 controller on a different platform, which provides four DMA channels.
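The software de-multiplexing idea can be illustrated as follows. A buffer of interleaved frames (sample 0 of every channel, then sample 1, and so on) is split into one contiguous buffer per channel. This sketch assumes a plain frame-interleaved layout and only shows the de-multiplexing step; it does not reproduce the setup described above. The name `deinterleave` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Split frame-interleaved samples [c0 c1 ... c(n-1) c0 c1 ...] into one
// vector per channel.
std::vector<std::vector<float>> deinterleave(const std::vector<float>& interleaved,
                                             std::size_t channels) {
    if (channels == 0 || interleaved.size() % channels != 0)
        throw std::invalid_argument("buffer is not a whole number of frames");
    const std::size_t frames = interleaved.size() / channels;
    std::vector<std::vector<float>> out(channels, std::vector<float>(frames));
    for (std::size_t f = 0; f < frames; ++f)
        for (std::size_t c = 0; c < channels; ++c)
            out[c][f] = interleaved[f * channels + c];
    return out;
}
```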
3.7. S 51
3.7 Summary
This chapter described the implementation of a custom audio interface based on the Intel HDA bus specification. The hardware side of the design was outlined, and the design choices were justified based on the imposed requirements. Additionally, the ALSA driver layer was discussed, along with how it was used to implement a custom driver module supporting our prototype. This module provides an abstraction layer between the data acquisition system and the real-time (RT) DSP processing system discussed in the next chapter. The data acquisition system comprises the hardware implementation and the correspondingly developed driver.
Chapter 4
Real-Time Framework and Algorithms
The complete hardware platform cannot operate within the requirements of this thesis without a suitable operating system and various software tools and libraries. These tools are responsible for configuring and maintaining all hardware components that are outside the scope of this thesis. Additionally, some of the software tools and libraries were used to set up the environment in which the audio driver, the RT framework tools and the algorithms were developed.
4.1 Operating system
There are many commercially available real-time operating systems (RTOS), each with its own benefits and drawbacks [45]. Our choice of OS was based on the limitations imposed by the Eurotech development platform (CPU type, memory architecture and controller type). It was also based on the availability of libraries required for developing the audio algorithms. On this basis, Linux was chosen as the RTOS for the framework, as it provides one of the largest knowledge bases for RT development, as well as multiple programming environments. Specifically, Fedora Core 14 was selected, as it is commonly used in audio and DSP applications (e.g., PlanetCCRMA [46]).
4.1.1 Basic Real-Time Scheduling
Strictly speaking, the available Fedora Core distributions are not capable of deterministic RT operation. By default, the stock kernel is compiled with voluntary preemption. This is satisfactory for most end-user applications, but it could not fulfill the requirements of a high-fidelity RT framework. This type of scheduling allows each thread to run until it either completes or voluntarily releases control back to the kernel, at which point execution rights can be passed to a new process or thread. For RT audio algorithms, deterministic behaviour is key to uninterrupted, artifact-free performance. Under this scheme a thread can stall or take longer than usual, either while waiting for an I/O resource to be released or while executing computationally complex code. Since it is very difficult to predict when such stalls will occur or how long they will last, the period between audio process execution cycles can be expected to vary non-deterministically. The exact increase in the period cannot be predicted in advance, as process execution on a complex OS is effectively random.
Scheduling with a forced-preemption algorithm such as time-slicing bounds the time any thread can hold the CPU, regardless of its complexity, resource usage or total required execution time. Such algorithms are therefore well suited to RT audio applications. Under these schemes each thread is allotted a fixed time period; if it does not release control within that period, the kernel interrupts it and passes control to the next thread in the thread pool. Selection is usually based on a priority list in conjunction with the resources that the threads use during execution.
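The difference between the two scheduling models can be made concrete with a deliberately simplified toy model. An audio thread becomes ready while another thread still has some work remaining. Under run-to-completion the audio thread waits for all of that work; under time-slicing it waits at most one slice. The quantum and job durations used in the check below are hypothetical. Real kernel latency also depends on non-preemptible sections, so this only illustrates why bounded waiting matters for a period of, e.g., 128 samples at 44.1 kHz.

```cpp
#include <algorithm>
#include <cassert>

// Toy model: worst-case wait (ms) before the audio thread gets the CPU while
// another thread still has 'remaining_ms' of work to do.
double wait_run_to_completion(double remaining_ms) { return remaining_ms; }

double wait_time_sliced(double remaining_ms, double quantum_ms) {
    return std::min(remaining_ms, quantum_ms);  // preempted after one slice
}

// Duration of one audio period in ms.
double period_ms(int samples_per_period, double sample_rate_hz) {
    return 1000.0 * samples_per_period / sample_rate_hz;
}

// True if the audio thread can start before its period has elapsed.
bool meets_deadline(double wait_ms, double period) { return wait_ms < period; }
```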
4.1.2 RTOS kernel and configuration
At the heart of any OS is the kernel, which is responsible for scheduling and for controlling CPU and memory usage. The mainline Linux kernels of this period do not provide full RT preemption. They must therefore be modified using the PREEMPT_RT patch set from Ingo Molnar [47]. Once the patch is applied, the kernel can be configured in full preemption mode (using time-slicing) instead of the default voluntary preemption. This is done by enabling the RT preemption option added by the patch (CONFIG_PREEMPT_RT, or CONFIG_PREEMPT_RT_FULL in the 3.x patch series) and re-compiling the kernel source. Once patched and configured in this way, the kernel can be used for high-fidelity RT applications.
In an effort to minimize the size of the kernel image, and thus the total memory footprint of the RTOS framework, unnecessary modules were excluded from the kernel build. This also shortens the boot time of the system. More importantly, it limits the number of processes running at any one time. This allows high-priority RT tasks (such as our audio framework) to execute more often.
Once the kernel was configured and minimized, appropriate permissions for RT scheduling and memory locking were set; the method of changing these permissions varies between Linux versions and distributions. This allowed our applications and the RT engine (JACK) to use the RT scheduler of the OS and to lock an unlimited amount of memory for buffer creation and maintenance. This was necessary because RT processes lock their memory pages into RAM to avoid page faults, and by default the kernel limits how much memory an unprivileged process may lock. Without raising this limit, RT processes can quickly exhaust the memory they are allowed to lock.
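A client can check whether these permissions are in effect before it starts processing. The sketch below uses standard POSIX/Linux calls: `getrlimit` with `RLIMIT_RTPRIO` and `RLIMIT_MEMLOCK`, `mlockall`, and `sched_setscheduler` with `SCHED_FIFO`. The struct and function names are hypothetical. How the limits are raised varies by distribution, as noted above. JACK requests real-time scheduling itself when started in real-time mode, so this kind of check is mainly useful for diagnosing a misconfigured system.

```cpp
#include <cassert>
#include <sched.h>
#include <sys/mman.h>
#include <sys/resource.h>

struct RtLimits {
    rlim_t rtprio_soft;   // highest RT priority an unprivileged process may request
    rlim_t memlock_soft;  // bytes that may be locked (RLIM_INFINITY = unlimited)
    int fifo_min;         // valid SCHED_FIFO priority range on this system
    int fifo_max;
};

RtLimits query_rt_limits() {
    RtLimits r{};
    rlimit lim{};
    if (getrlimit(RLIMIT_RTPRIO, &lim) == 0) r.rtprio_soft = lim.rlim_cur;
    if (getrlimit(RLIMIT_MEMLOCK, &lim) == 0) r.memlock_soft = lim.rlim_cur;
    r.fifo_min = sched_get_priority_min(SCHED_FIFO);
    r.fifo_max = sched_get_priority_max(SCHED_FIFO);
    return r;
}

// Lock all current and future pages, then switch the calling process to
// SCHED_FIFO. Returns false if the priority is out of range or a permission
// is missing; a failed scheduler change also undoes the memory lock.
bool try_enable_realtime(int priority) {
    const RtLimits lim = query_rt_limits();
    if (priority < lim.fifo_min || priority > lim.fifo_max) return false;
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) return false;
    sched_param sp{};
    sp.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        munlockall();
        return false;
    }
    return true;
}
```

On a correctly configured target, `memlock_soft == RLIM_INFINITY` and a non-zero `rtprio_soft` would indicate that the permission changes took effect.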
4.1.3 Required Libraries
The implementation of the framework described here is based on several library packages that are available under GNU licenses. Successful operation of the framework does not depend on having exactly the same versions of this software. However, it has not been tested with versions other than those listed in Table 4.1.
For additional information on the libraries and software utilities used in this thesis, refer to Appendix B.
Table 4.1: Library and software versions used in the framework
Product          Version
Fedora Core      14
Linux Kernel     2.3.37
ALSA             1.0.25
JACK             1.9.8
sndfile          1.0.25
IPP              1.8.273
4.1.4 JACK server setup
The JACK server provides the required RT operation and audio signal routing for the framework. JACK interfaces directly with ALSA to access the recording and playback HDA widgets. Additionally, it sets the audio stream format for both directions of audio data transfer by requesting specific hardware configurations from ALSA. The configuration profiles for the audio hardware are limited to the capabilities exposed by the HDA codec, such as the available sample rates, bit depths and number of channels. Specifying values outside the available ranges results in the engine running in its default configuration. The I/O latency of the JACK server is given by equation 2.2 and depends on the configuration used. JACK uses one buffer to store several periods, so the total buffer size is the number of samples per period multiplied by the number of periods per buffer. The algorithms deployed in this thesis were run using different JACK server configurations to establish a suitable one. Table 4.2 lists five of these test configurations and the associated software-induced latency.
Table 4.2: I/O Latency for various JACK server setups
Samples per Period   Periods per Buffer   Sample Rate [Hz]   I/O Latency [ms]
128                  2                    44100              5.8
256                  2                    44100              11.6
1024                 2                    44100              46.4
128                  2                    16000              16.0
1024                 2                    16000              128.0
Note that only the first two configurations satisfied the low-latency processing constraints needed to simulate hearing aid operation in real-life and in-the-field situations.
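The latency column of Table 4.2 follows directly from the buffer size and the sample rate. The short sketch below reproduces it. It assumes that equation 2.2 reduces to (samples per period × periods per buffer) / sample rate, which matches every row of the table. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Software-induced I/O latency of one JACK buffer, in milliseconds.
double jack_io_latency_ms(int samples_per_period, int periods_per_buffer,
                          double sample_rate_hz) {
    const double buffer_samples =
        static_cast<double>(samples_per_period) * periods_per_buffer;
    return 1000.0 * buffer_samples / sample_rate_hz;
}
```

For example, 128 samples × 2 periods at 44.1 kHz gives about 5.8 ms, and 1024 × 2 at 16 kHz gives exactly 128 ms.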
4.1.5 JACK client Description
JACK allows rapid client development through its extensive API. Every JACK client can be viewed as a template program with the general layout given in Figure 4.1 [11]. At a minimum, a client has one input port, one output port and a process callback. The process callback is the function that the JACK server calls on each iteration of the RT engine. The server is responsible for gathering a full buffer of data for the input port before executing the client's process callback. The process callback, in turn, is responsible for providing a full buffer of data to the output ports before returning. This ensures smooth, artifact-free operation.
Additionally, the JACK client is responsible for handling all events that can occur during both normal and abnormal JACK server operation. These include sample rate changes, buffer size changes, channel drop-outs and server crashes. This is achieved by registering a callback for each event that needs to be handled, as described in the JACK API.
Figure 4.1: Generic JACK client structure
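The client structure of Figure 4.1 can be sketched as follows. To keep the example compilable without JACK installed, the server is replaced by a hypothetical stand-in loop (`run_fake_server`), and `nframes_t` stands in for `jack_nframes_t`. In a real client, the same `process` function would be registered with `jack_set_process_callback()` and its buffers obtained from `jack_port_get_buffer()`. Event handlers would be registered with calls such as `jack_set_sample_rate_callback()`, `jack_set_buffer_size_callback()` and `jack_on_shutdown()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using nframes_t = std::uint32_t;  // stand-in for jack_nframes_t

// Client state reached through the callback's void* argument, as in JACK.
struct ClientState {
    const float* in = nullptr;  // filled by the server before each call
    float* out = nullptr;       // must be completely filled by the callback
    float gain = 1.0f;
};

// Same shape as a JACK process callback: int(nframes_t, void*).
// Returns 0 on success; non-zero signals an error to the server.
int process(nframes_t nframes, void* arg) {
    auto* s = static_cast<ClientState*>(arg);
    for (nframes_t i = 0; i < nframes; ++i)
        s->out[i] = s->gain * s->in[i];  // every output sample is written
    return 0;
}

// Hypothetical stand-in for the JACK server: delivers full periods of input,
// calls the process callback and collects the output. A trailing partial
// period is dropped, since only whole periods are ever delivered.
std::vector<float> run_fake_server(const std::vector<float>& signal, nframes_t period,
                                   ClientState& st) {
    std::vector<float> result, in(period), out(period);
    for (std::size_t pos = 0; pos + period <= signal.size(); pos += period) {
        std::copy(signal.begin() + static_cast<std::ptrdiff_t>(pos),
                  signal.begin() + static_cast<std::ptrdiff_t>(pos + period), in.begin());
        st.in = in.data();
        st.out = out.data();
        if (process(period, &st) != 0) break;
        result.insert(result.end(), out.begin(), out.end());
    }
    return result;
}
```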
4.2 RT Implementation Considerations
The RT framework described above imposes certain limitations on the development of hearing aid algorithms. Fitting within the framework requires that the application be developed as a JACK client capable of frame-based DSP computations. As in this thesis, an object-oriented (OO) programming approach is recommended. Where possible, it should be combined with the high-performance Intel Integrated Performance Primitives (IPP) libraries (see Appendix C). IPP is optimized for machines based on Intel CPUs and significantly reduces the time required to process one frame of data. To facilitate user interaction, GTK was used (see Appendix C) to develop graphical user interfaces (GUIs) capable of monitoring and modifying the underlying algorithm parameters in RT.
4.2.1 Memory and Computational Limitations
Memory allocation and de-allocation are computationally expensive and time-consuming operations. It is therefore strongly recommended that each algorithm implemented for this framework allocate all the memory it needs before the first execution of the process callback (through either dynamic or static allocation). In keeping with good coding practice, allocated blocks must always be accounted for and released once the application finishes or terminates in error. Memory can be re-allocated during program execution; however, care must be taken to ensure that no illegal memory accesses occur during the switch-over. As mentioned above, JACK clients run in dedicated, protected memory and are therefore very sensitive to memory leaks and illegal memory reads/writes. The process callback must therefore be designed to protect the memory it uses, in case that memory is inadvertently released or modified while the application is still active. The simplest way to do so is to use a binary semaphore to protect critical code sections and shared memory buffers, which is the approach taken in this thesis.
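A minimal sketch of this arrangement follows, in which a binary semaphore is modelled with `std::mutex`. All memory is allocated outside the audio path, and reallocation takes the lock. The process callback only tries to take the lock; if it cannot, it writes silence for that period instead of blocking the JACK thread. The class and member names are hypothetical, and the processing is a placeholder gain.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class GuardedProcessor {
public:
    explicit GuardedProcessor(std::size_t max_frames) { resize(max_frames); }

    // Called from the GUI/control thread: the only place memory is (re)allocated.
    void resize(std::size_t max_frames) {
        std::lock_guard<std::mutex> lock(guard_);
        work_.assign(max_frames, 0.0f);
    }

    // Called from the RT thread once per period; never blocks, never allocates.
    void process(const float* in, float* out, std::size_t nframes) {
        std::unique_lock<std::mutex> lock(guard_, std::try_to_lock);
        if (!lock.owns_lock() || nframes > work_.size()) {
            for (std::size_t i = 0; i < nframes; ++i) out[i] = 0.0f;  // safe fallback: silence
            return;
        }
        for (std::size_t i = 0; i < nframes; ++i) {
            work_[i] = 0.5f * in[i];  // placeholder DSP using pre-allocated scratch memory
            out[i] = work_[i];
        }
    }

private:
    std::mutex guard_;
    std::vector<float> work_;
};
```

Outputting silence rather than waiting is the key design choice here: a blocked callback would behave like a stalled client and risk disconnection from the server.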
The execution time of each algorithm must be evaluated on a per-frame basis before the algorithm is introduced into the framework. Otherwise there is a risk of overwhelming the computational capacity of the CPU and the JACK server. This results in the client being disconnected from the server once a sufficient number of processing overruns occur. Additionally, this behaviour prevents in-line debugging of the client application (more specifically, of the process callback) once the client is connected. If execution of the process callback were halted for debugging, the JACK thread would be blocked until the debugger resumed it. The client would then behave like a crashed client and be disconnected from the server. Testing the computational requirements before the algorithm is deployed is therefore much easier.
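One simple way to carry out this evaluation offline is to time many calls of the per-frame routine and compare the worst case with the period budget N/fs. The sketch below does this with `std::chrono::steady_clock`. The helper names are hypothetical, and the 128-sample, 44.1 kHz budget used in the check is an illustrative assumption. A measured worst case well below the budget leaves headroom for the rest of the JACK graph.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Time available to process one period of n samples, in microseconds.
double frame_budget_us(std::size_t n, double fs_hz) {
    return 1e6 * static_cast<double>(n) / fs_hz;
}

// Worst-case execution time of process_frame over 'iterations' calls, in
// microseconds, using a dummy input frame.
double worst_case_us(const std::function<void(const float*, float*, std::size_t)>& process_frame,
                     std::size_t n, int iterations) {
    std::vector<float> in(n, 0.1f), out(n);
    double worst = 0.0;
    for (int k = 0; k < iterations; ++k) {
        const auto t0 = std::chrono::steady_clock::now();
        process_frame(in.data(), out.data(), n);
        const auto t1 = std::chrono::steady_clock::now();
        worst = std::max(worst, std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    return worst;
}
```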
Furthermore, algorithms must be safeguarded against common floating-point errors that can occur while computing the output buffer values. Failure to do so can crash the framework or produce unexpected, possibly harmful output conditions. For example, an output saturated by a division-by-zero error will propagate to the DAC and then to the speakers/headphones. This results in very large power dissipation in the transducer (compared with normal operation), which can permanently damage the output device. Testing showed that although the JACK server protects the underlying driver/hardware layer from over-voltage conditions, it does not protect against saturation.
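A last-line safeguard can be applied to every output buffer before it is handed back to JACK. Non-finite values (Inf, NaN) are replaced by zero, and everything else is clipped to the nominal full-scale range [-1, 1]. This sketch assumes JACK's floating-point sample convention. It complements, rather than replaces, checking the critical operations themselves. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Repairs an output buffer in place and returns the number of samples that
// had to be repaired, so faults can be counted or logged outside the RT path.
std::size_t sanitize_output(float* buf, std::size_t n, float limit = 1.0f) {
    std::size_t repaired = 0;
    for (std::size_t i = 0; i < n; ++i) {
        float x = buf[i];
        if (!std::isfinite(x)) {
            x = 0.0f;  // NaN/Inf would otherwise reach the DAC
            ++repaired;
        } else if (std::fabs(x) > limit) {
            x = std::max(-limit, std::min(x, limit));  // hard clip instead of saturating
            ++repaired;
        }
        buf[i] = x;
    }
    return repaired;
}
```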
4.2.2 Automatic Gain Control (AGC)
Each algorithm that performs frame-based processing of any number of input channels includes an AGC module at each output. The AGC module compensates for volume differences introduced by the non-linear effects of some of the implemented algorithms. It also compensates for audio power lost through band-limiting filtering operations applied during processing.
The AGC method uses two memory buffers per mono I/O channel. The buffers contain previous values of the mean frame power of the input and output frames, respectively. Their length is set to 100 ms by default, and the user can change it at runtime. The gain applied to each sample in the output buffer of the algorithm was determined as the ratio of the input to the output average power over the memory duration. Each element in the output frame was then scaled by this gain. The actual buffer size is determined at run time from the requested sample rate and memory length. This structure effectively implements a volume-equalizing running average filter (Figure 4.2). Each new frame advances the memory pointer to the next available slot; when the buffers are full, the pointer wraps around to the beginning.
Figure 4.2: AGC topology using arbitrary length memory buffers.
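The AGC structure can be sketched with two circular buffers of mean frame powers, one slot per frame, and a wrap-around write pointer. Because the buffers store power, this sketch takes the amplitude gain as the square root of the input/output power ratio, so that the output power matches the input power. This is an assumption about how the ratio is applied, together with the small floor that guards against silent output. The number of slots is derived from the memory length, sample rate and frame size. The class name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

class Agc {
public:
    Agc(double memory_ms, double fs_hz, std::size_t frame_len)
        : slots_(std::max<std::size_t>(
              1, static_cast<std::size_t>(std::lround(memory_ms * 1e-3 * fs_hz /
                                                      static_cast<double>(frame_len))))),
          in_pow_(slots_, 0.0), out_pow_(slots_, 0.0) {}

    // in: algorithm input frame; out: algorithm output frame, scaled in place.
    void apply(const float* in, float* out, std::size_t n) {
        in_pow_[pos_] = mean_power(in, n);
        out_pow_[pos_] = mean_power(out, n);
        pos_ = (pos_ + 1) % slots_;  // advance; wrap around when the buffers are full
        double sum_in = 0.0, sum_out = 0.0;
        for (std::size_t k = 0; k < slots_; ++k) { sum_in += in_pow_[k]; sum_out += out_pow_[k]; }
        // Power ratio -> amplitude gain via square root (assumption, see lead-in).
        const double g = sum_out > 1e-12 ? std::sqrt(sum_in / sum_out) : 1.0;
        for (std::size_t i = 0; i < n; ++i) out[i] = static_cast<float>(g * out[i]);
    }

    std::size_t slots() const { return slots_; }

private:
    static double mean_power(const float* x, std::size_t n) {
        double acc = 0.0;
        for (std::size_t i = 0; i < n; ++i) acc += static_cast<double>(x[i]) * x[i];
        return n ? acc / static_cast<double>(n) : 0.0;
    }
    std::size_t slots_;
    std::size_t pos_ = 0;
    std::vector<double> in_pow_, out_pow_;
};
```

With the 100 ms default, 44.1 kHz and 128-sample frames, the buffers hold 34 frame powers.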
4.2.3 Filtering Objects
Each filter used in the framework was implemented as a modular C++ object derived from a generic base filter class. The base filter class contains the input/output buffers and the length of the filter. The basic filter types (low-pass, high-pass and band-pass) are then created through inheritance. Filter object functionality is based on routines provided by the IPP library. The filter object also contains the handles for the required buffers, a list of filter parameters and accessor functions. Additionally, the objects include safeguards against illegal memory accesses, such as a binary semaphore. This implementation made runtime memory allocation for any of these objects trivial. It also allows object parameters to be changed without raising memory error exceptions. Note that memory allocation was only necessary on object creation and when the buffer size or filter length parameters were changed.
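The class structure can be sketched as follows. A base FIR object owns the coefficients and a pre-allocated work buffer, guarded by a lock during parameter changes. A derived low-pass type designs its coefficients. As a stand-in for the IPP routines, filtering is done with a plain direct-form FIR that carries the last M − 1 input samples between frames, so consecutive frames are filtered seamlessly. The low-pass design is a Hamming-windowed sinc normalised to unity DC gain. Class names and design choices are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Base filter object: coefficients, pre-allocated work buffer and input history.
class FirFilter {
public:
    explicit FirFilter(std::size_t max_frame) : max_frame_(max_frame) {}
    virtual ~FirFilter() = default;

    // The only place memory is (re)allocated; guarded against the RT thread.
    void set_coefficients(std::vector<float> h) {
        std::lock_guard<std::mutex> lock(guard_);
        h_ = std::move(h);
        ext_.assign(h_.empty() ? 0 : h_.size() - 1 + max_frame_, 0.0f);
    }

    // Filters one frame; returns false (output untouched) while parameters are
    // being changed or if the frame is empty or too long.
    bool process(const float* in, float* out, std::size_t n) {
        std::unique_lock<std::mutex> lock(guard_, std::try_to_lock);
        if (!lock.owns_lock() || h_.empty() || n == 0 || n > max_frame_) return false;
        const std::size_t hist = h_.size() - 1;
        std::copy(in, in + n, ext_.begin() + static_cast<std::ptrdiff_t>(hist));  // [history | frame]
        for (std::size_t i = 0; i < n; ++i) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < h_.size(); ++k) acc += h_[k] * ext_[i + hist - k];
            out[i] = acc;
        }
        // Keep the last (M - 1) inputs as history for the next frame.
        std::copy(ext_.begin() + static_cast<std::ptrdiff_t>(n),
                  ext_.begin() + static_cast<std::ptrdiff_t>(n + hist), ext_.begin());
        return true;
    }

    std::size_t length() const { return h_.size(); }

private:
    std::size_t max_frame_;
    std::mutex guard_;
    std::vector<float> h_, ext_;
};

// Derived type: Hamming-windowed sinc low-pass, normalised to unity DC gain.
class LowPassFir : public FirFilter {
public:
    LowPassFir(std::size_t taps, double cutoff_hz, double fs_hz, std::size_t max_frame)
        : FirFilter(max_frame) {
        const double pi = 3.14159265358979323846;
        const double fc = cutoff_hz / fs_hz;  // cycles per sample
        const double mid = (static_cast<double>(taps) - 1.0) / 2.0;
        std::vector<float> h(taps);
        double sum = 0.0;
        for (std::size_t k = 0; k < taps; ++k) {
            const double t = static_cast<double>(k) - mid;
            const double sinc = (t == 0.0) ? 2.0 * fc : std::sin(2.0 * pi * fc * t) / (pi * t);
            const double w = (taps > 1) ? 0.54 - 0.46 * std::cos(2.0 * pi * k / (taps - 1.0)) : 1.0;
            h[k] = static_cast<float>(sinc * w);
            sum += h[k];
        }
        for (float& c : h) c = static_cast<float>(c / sum);
        set_coefficients(std::move(h));
    }
};
```

Because the history is carried across calls, filtering a signal as two 32-sample frames gives the same result as filtering it as one 64-sample frame.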
60 C 4. R-T F  A
4.2.4 Programming Practices
Ensuring efficient and stable operation of the described algorithms was a primary concern when developing for a limited hardware platform such as the Eurotech Catalyst. The algorithms described here were developed to be compatible with the MHA framework, even though they are conceptually separate from the MHA structure. Additionally, they implement the following practices, which are also recommended for future algorithms to be deployed on the platform:
• Binary Semaphores − used to protect the process callback in case of errors or intentional memory re-allocation. This approach allows the user to modify parameters of the underlying algorithm (filter length, number of bands used, etc.) without having to reload the client application.
• Exception Handling − used to verify the results of critical mathematical operations. This prevents computational errors and special values, such as Infinity (Inf) and Not a Number (NaN), from propagating through the signal processing chain.
• Memory Pre-Allocation − the memory requirements of most DSP algorithms can easily be computed before processing begins, eliminating the need for computationally expensive allocations during processing.
• OO implementation − besides improving readability, an OO approach allowed rapid debugging and testing of an algorithm both on the RT platform and offline, using the file system and manual scheduling. To do so, all of the required JACK client callbacks were implemented as member methods of the algorithm object, which was then compiled as a library. The algorithm can therefore either be included within the structure of a JACK client or used offline, without the need to produce two separate executables.
4.3. A 61
4.3 Algorithms
4.3.1 Envelope Enhancement
This section gives details of the implementation of an EE algorithm shown to improve word identification scores in people with AN. It is adopted from [28], and its theoretical analysis is discussed in Section 2.10.
As described by Narne et al. [28], EE is a non-linear modification of the signal envelope after the signal has been split into a number of predefined bands. Each band undergoes identical processing, which allows multi-threading on systems that support it. In [28] and [27] the entire signal is processed at once. To ensure proper operation of the algorithm in a frame-based environment, care must therefore be taken when reconstructing the output signal. An essential condition for artifact-free reconstruction is that the signal encounters the same group delay in each signal path. This implies that the band-pass filters shaping the signal into each band must have identical group delays. Additionally, filtering delays introduced when calculating the envelope of the band signal must be compensated to ensure a time-domain match between band signals and envelopes. Specifically, envelope extraction using an LPF introduces a lag at the output, manifested as a time difference between the band signal and the calculated envelope. To properly enhance the envelope of the band signal, this lag must be compensated, which is achieved by simulating a non-causal system. This simulation is based on buffer alignment and will be referred to as the buffer alignment “trick”. The Hilbert transform envelope detector, by contrast, introduces no lag, so no such “trick” is needed.
A visual representation of the processing flow is given by the block diagram in Figure 4.3, where the envelope is extracted using an LPF-based envelope detector.
The implemented algorithm included a GUI (see Figure 4.9) that allows the user to modify the following parameters on the fly: filter lengths, number of bands, band distribution, algorithm parameters (Kmin, Kmax and τ) and the processing bandwidth.
62 C 4. R-T F  A
Figure 4.3: Envelope Enhancement algorithm flow. Note buffer alignment trick allowing us to
simulate non-causal filtering. Short buffers act as a unit frame delay.
Pre-Processing
The input signal was first passed through an LPF that attenuates frequencies above the upper bound of the user-specified bandwidth. The length of the filter was also user-specified. The input FIR LPF implements the overlap-save technique for frame-based processing.
For each execution of the process callback (iteration), the input buffer was built by concatenating two parts: the last M samples of the input buffer from the previous iteration (where M is the length of the input LPF) and the filtered version of the new input frame. At this point the input buffer therefore has length M plus the input frame size N, a quantity referred to as the working length of the algorithm. This absorbs the transient effects introduced by the LPF. It also extends the working buffer to accommodate the buffer alignment “trick” described below. In the case of FIR filtering, the transient duration is given by the length of the filter.
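The construction of the working buffer can be sketched as follows, assuming the new frame has already been low-pass filtered. The last M samples from the previous iteration are kept at the front, and the new N-sample frame is appended, giving the working length M + N. The class name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

class WorkingBuffer {
public:
    WorkingBuffer(std::size_t m, std::size_t n) : m_(m), n_(n), buf_(m + n, 0.0f) {}

    // Shift in a new filtered frame of N samples; returns the M + N working buffer.
    const std::vector<float>& push(const float* filtered_frame) {
        // Keep the last M samples of the previous working buffer at the front.
        std::copy(buf_.end() - static_cast<std::ptrdiff_t>(m_), buf_.end(), buf_.begin());
        std::copy(filtered_frame, filtered_frame + n_,
                  buf_.begin() + static_cast<std::ptrdiff_t>(m_));
        return buf_;
    }

private:
    std::size_t m_, n_;
    std::vector<float> buf_;
};
```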
Envelope Extraction
Each input frame was filtered into a set of frequency bands. The band endpoints are determined by the user-specified number of bands, band distribution type and requested bandwidth (see Section 2.10.1). The filters used to extract the band signals are FIR filters designed at runtime with symmetric coefficients. This gives a linear-phase (constant group delay) response, which ensures proper reconstruction. The downside was the need for a higher filter order to sharpen the cutoff and limit the “bleeding” of data between frequency bands. Additionally, each band-pass FIR filter object implemented overlap-save to facilitate frame-based processing.
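The following sketch shows how equal-length, symmetric band-pass filters can be designed at runtime. Each band is the difference of two Hamming-windowed sinc low-pass responses. Every filter of length L therefore has linear phase and the same group delay of (L − 1)/2 samples, which is what allows the band signals to be summed later. The window, band edges and function names are assumptions for illustration, not the exact design used here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Band-pass FIR of length 'taps' passing roughly [f_lo, f_hi] Hz: the
// difference of two windowed-sinc low-pass prototypes (Hamming window).
std::vector<double> design_bandpass(std::size_t taps, double f_lo, double f_hi, double fs) {
    const double pi = 3.14159265358979323846;
    const double mid = (static_cast<double>(taps) - 1.0) / 2.0;
    std::vector<double> h(taps);
    for (std::size_t k = 0; k < taps; ++k) {
        const double t = static_cast<double>(k) - mid;
        auto ideal_lp = [&](double fc) {  // ideal low-pass impulse response, fc in cycles/sample
            return t == 0.0 ? 2.0 * fc : std::sin(2.0 * pi * fc * t) / (pi * t);
        };
        const double w = 0.54 - 0.46 * std::cos(2.0 * pi * k / (taps - 1.0));
        h[k] = (ideal_lp(f_hi / fs) - ideal_lp(f_lo / fs)) * w;
    }
    return h;
}

// Group delay in samples of a symmetric (linear-phase) FIR; -1 if not symmetric.
double linear_phase_delay(const std::vector<double>& h) {
    for (std::size_t k = 0; k < h.size() / 2; ++k)
        if (std::fabs(h[k] - h[h.size() - 1 - k]) > 1e-12) return -1.0;
    return (static_cast<double>(h.size()) - 1.0) / 2.0;
}
```

Two bands designed with the same length, e.g. 65 taps, both report a delay of 32 samples regardless of their edges.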
Each band envelope was extracted using either the Hilbert transform or the full-wave rectification method described in Section 2.10. When full-wave rectification was employed, it was important to compensate for the phase shift introduced by the LPF used in the method. The input and output of any JACK client must contain the same number of elements per frame (N). One way to compensate for the filter delay is therefore to use a non-causal system. Unfortunately, such a system cannot be implemented; however, non-causality can be simulated with a causal system at the expense of added latency.
The buffer alignment trick simulates such a system using the following procedure. Each band buffer is prepended with values taken from the end of the same band buffer in the previous iteration. The number of samples taken from the previous iteration is equal to the lag introduced by the IIR LPF used to estimate the envelope. This value is fixed, as the IIR LPF is the same regardless of the band distribution or the other filter lengths. Next, the envelope is calculated for this extended band buffer. As a result, the envelope is also longer than the standard working length, by the number of samples prepended to the band buffer. The envelope buffer therefore contains enough samples to cover both the lag and the working length. More importantly, the end of the envelope buffer contains the envelope corresponding to the beginning of the band buffer. The beginning of the envelope buffer, which contains lag data, is discarded. The remainder is combined with the beginning of the band buffer, discarding the end of the band buffer, for which the envelope cannot yet be calculated. This yields perfect phase alignment. Note that by reading only the beginning of the band buffer (which contains data from both the current and previous iterations), a latency corresponding to the lag was introduced.
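The alignment procedure can be sketched for a single band. The sketch assumes the envelope is obtained by full-wave rectification followed by a one-pole IIR low-pass, y[n] = (1 − a)|x[n]| + a·y[n−1]. The low-frequency group delay of this filter is a/(1 − a) samples, and this is used as the lag D after rounding. The last D band samples of the previous frame are prepended, and the first N samples of the extended buffer are paired with the envelope of the current frame. This introduces D samples of latency. The filter type, coefficient and class name are assumptions; the IIR LPF is not specified here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

class AlignedEnvelope {
public:
    AlignedEnvelope(double a, std::size_t frame_len)
        : a_(a), n_(frame_len),
          lag_(static_cast<std::size_t>(std::lround(a / (1.0 - a)))),
          ext_(lag_ + frame_len, 0.0f), env_(frame_len, 0.0f) {}

    std::size_t lag() const { return lag_; }

    // band: current band frame (N samples). Afterwards band_out()[i] and
    // envelope()[i] are phase-aligned.
    void process(const float* band) {
        // ext = [last D samples of the previous frame | current frame]
        std::copy(ext_.end() - static_cast<std::ptrdiff_t>(lag_), ext_.end(), ext_.begin());
        std::copy(band, band + n_, ext_.begin() + static_cast<std::ptrdiff_t>(lag_));
        // Envelope of the current frame; the filter state carries across frames.
        for (std::size_t i = 0; i < n_; ++i) {
            state_ = (1.0 - a_) * std::fabs(band[i]) + a_ * state_;
            env_[i] = static_cast<float>(state_);
        }
    }

    // First N samples of the extended buffer: the band delayed by D samples.
    const float* band_out() const { return ext_.data(); }
    const std::vector<float>& envelope() const { return env_; }

private:
    double a_;
    std::size_t n_, lag_;
    std::vector<float> ext_, env_;
    double state_ = 0.0;
};
```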
Figure 4.4: Band and envelope waveforms for the sentence ’A boy fell from the window’. Left
axis is the scale for the band data and original envelope, right axis is the scale for the enhanced
envelope
4.3. A 65
Envelope Expansion/Compression
Once each band signal and its envelope were phase-matched, the expanded envelope within a band was calculated for each frame according to equation (2.11). The element-by-element ratio of the two envelope buffers (enhanced and original) gives the gain factor that was applied to the unmodified band signal. See Figure 4.4 for a 4-band example.
Output Reconstruction
The band signals were added together to form the output. For this addition to proceed without introducing extra artifacts in the output, the band filters must have matched group delay characteristics. The result can be seen in Figure 4.5.
Figure 4.5: Envelope Enhancement: Input(top) and Output(bottom) waveforms for the sen-
tence ’A boy fell from the window’
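The expansion gain and the output reconstruction can be combined in a short sketch. Equation (2.11) is not repeated here, so the enhanced envelopes are taken as given inputs. Each band sample is scaled by the ratio of enhanced to original envelope, and the scaled bands are summed. A small floor on the denominator is an added assumption to avoid division by zero. Function and parameter names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Frame = std::vector<float>;

// bands[b], env[b], env_enh[b]: phase-aligned band signal, original envelope
// and enhanced envelope of band b, all with the same length N.
Frame reconstruct(const std::vector<Frame>& bands, const std::vector<Frame>& env,
                  const std::vector<Frame>& env_enh, float floor = 1e-6f) {
    const std::size_t n = bands.empty() ? 0 : bands[0].size();
    Frame out(n, 0.0f);
    for (std::size_t b = 0; b < bands.size(); ++b)
        for (std::size_t i = 0; i < n; ++i) {
            const float gain = env_enh[b][i] / std::max(env[b][i], floor);
            out[i] += gain * bands[b][i];  // valid because all bands share one group delay
        }
    return out;
}
```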
66 C 4. R-T F  A
4.3.2 logMMSE
The logMMSE noise reduction algorithm was adopted from Loizou's implementation given in [36]. The author employed double-frame processing rather than overlap-save or overlap-add to avoid framing effects (see Figure 4.6), and double framing/buffering of the input data was used in this implementation as well. Additionally, a GUI for the algorithm was developed (see Figure 4.8) that allows the user to turn the algorithm on or off. The remaining parameters are kept at the constant values given in [36], as they provide acceptable results.
Figure 4.6: logMMSE algorithm, data flow diagram.
4.3. A 67
Pre-Processing
The input buffer was composed by appending the current input frame to the previous input frame, taken from a delay buffer. The resulting sequence was windowed using a Hamming window, which limits the spectral leakage introduced when the FFT is applied to the double frame. The windowed vector was then passed through a 256-point FFT. The resulting spectrum was stored in polar form: the amplitude spectrum was used in the logMMSE algorithm, and the noisy phase was kept for later signal reconstruction.
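The pre-processing step can be sketched as follows. The previous and current frames form a double frame, a Hamming window is applied, and the spectrum is split into magnitude and phase. A direct O(L²) DFT is used only to keep the example self-contained; the implementation described here uses a 256-point FFT, for which IPP provides optimized routines. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

struct PolarSpectrum {
    std::vector<double> mag;    // used by the logMMSE estimator
    std::vector<double> phase;  // noisy phase, kept for reconstruction
};

PolarSpectrum analyse_double_frame(const std::vector<double>& prev,
                                   const std::vector<double>& cur) {
    const double pi = 3.14159265358979323846;
    std::vector<double> x(prev);
    x.insert(x.end(), cur.begin(), cur.end());  // [previous frame | current frame]
    const std::size_t L = x.size();
    for (std::size_t n = 0; n < L; ++n)         // Hamming window
        x[n] *= 0.54 - 0.46 * std::cos(2.0 * pi * n / (L - 1.0));
    PolarSpectrum s{std::vector<double>(L), std::vector<double>(L)};
    for (std::size_t k = 0; k < L; ++k) {       // direct DFT (an FFT in practice)
        std::complex<double> acc = 0.0;
        for (std::size_t n = 0; n < L; ++n)
            acc += x[n] * std::polar(1.0, -2.0 * pi * k * n / static_cast<double>(L));
        s.mag[k] = std::abs(acc);
        s.phase[k] = std::arg(acc);
    }
    return s;
}
```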
Applying logMMSE
Based on the acquired amplitude spectrum and the estimated noise statistics, the a posteriori SNR (γ) was calculated. The value of γ was limited to the range 0 to 40, eliminating overflow and saturation effects.
The a priori SNR was then calculated using the signal estimate from the previous double frame, the value obtained for the a posteriori SNR and the decision-directed approach of equation (2.20). This, in turn, was used in equation (2.22) to calculate the logMMSE spectral estimator for the double frame.
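The per-bin computations above can be sketched using the log-spectral amplitude gain of Ephraim and Malah, G = ξ/(1+ξ) · exp(½ E1(v)) with v = ξγ/(1+ξ), where E1 is the exponential integral. The sketch assumes a smoothing constant α = 0.98 and a floor ξ_min = −25 dB, the values we take to be used in Loizou's reference code. These constants and the function names should be checked against equations (2.20) and (2.22) and [36]. E1 is evaluated with its power series for v ≤ 1 and a continued fraction otherwise.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Exponential integral E1(x) for x > 0.
double expint_e1(double x) {
    const double euler = 0.57721566490153286;
    if (x <= 1.0) {
        // E1(x) = -gamma - ln x - sum_{k>=1} (-x)^k / (k * k!)
        double sum = 0.0, term = 1.0;
        for (int k = 1; k < 60; ++k) {
            term *= -x / k;  // (-x)^k / k!
            sum -= term / k;
            if (std::fabs(term / k) < 1e-17) break;
        }
        return -euler - std::log(x) + sum;
    }
    // Continued fraction (modified Lentz), converges quickly for x > 1.
    const double tiny = 1e-300;
    double b = x + 1.0, c = 1.0 / tiny, d = 1.0 / b, h = d;
    for (int i = 1; i < 200; ++i) {
        const double an = -static_cast<double>(i) * i;
        b += 2.0;
        d = 1.0 / (an * d + b);
        c = b + an / c;
        const double del = c * d;
        h *= del;
        if (std::fabs(del - 1.0) < 1e-15) break;
    }
    return h * std::exp(-x);
}

// A posteriori SNR, limited to [0, 40] as described in the text.
double a_posteriori_snr(double noisy_mag, double noise_var) {
    return std::min(std::max(noisy_mag * noisy_mag / noise_var, 0.0), 40.0);
}

// Decision-directed a priori SNR (alpha and xi_min are assumed values).
double a_priori_snr(double prev_clean_mag, double noise_var, double gamma,
                    double alpha = 0.98, double xi_min = 0.0031622776601683794) {
    const double xi = alpha * prev_clean_mag * prev_clean_mag / noise_var +
                      (1.0 - alpha) * std::max(gamma - 1.0, 0.0);
    return std::max(xi, xi_min);
}

// logMMSE (log-spectral amplitude) gain for one frequency bin.
double logmmse_gain(double xi, double gamma) {
    const double a = xi / (1.0 + xi);
    const double v = std::max(a * gamma, 1e-10);  // guard: E1 diverges at 0
    return a * std::exp(0.5 * expint_e1(v));
}
```

As expected for this estimator, the gain approaches 1 for high-SNR bins and falls well below 0.1 for bins with ξ around −20 dB.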
The noise variance estimate was updated recursively for each voice-inactive frame, as given by equation (2.19). Two Voice Activity Detector (VAD) schemes were tested to differentiate between voice-active and voice-inactive frames:
• Geometric Mean VAD: the likelihood ratios for each frame were calculated based on equation (2.18), and the VAD decision was then based on equation (2.21).
• Support Vector Machine (SVM) based VAD: an SVM model using likelihood ratios as feature vector components was trained on a manually labeled data set. The VAD decision was based on the sign of the SVM model output, given the set of likelihood ratios for each frequency component as input (see Appendix A for details).
68 C 4. R-T F  A
The final implementation of the logMMSE noise reduction algorithm used the Geometric Mean VAD, as it demonstrated better performance than the SVM-based VAD (see Appendix A, Table A.1).
Output Reconstruction
The output was then formed from the inverse Fourier transform of the second half of the double-frame logMMSE estimate, combined with the noisy phase of the input, as given by equation (2.23).
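The reconstruction step might look like this in Python, assuming "second half" refers to the time-domain samples of the double frame that correspond to the current input frame. That reading is an assumption, since the text does not spell it out. The noisy phase is reattached to the gain-weighted magnitude, a naive inverse DFT (a stand-in for an optimized inverse FFT) returns to the time domain, and the second half is kept. The sketch does not undo the Hamming analysis window because the section describes no such step.

```python
import cmath
import math


def idft(spectrum):
    """Naive inverse DFT (stand-in for an optimized inverse FFT)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def reconstruct(magnitude, gain, phase):
    """Gain-weighted noisy magnitude + noisy phase -> second half of the double frame."""
    spectrum = [g * m * cmath.exp(1j * p) for g, m, p in zip(gain, magnitude, phase)]
    # The imaginary part is round-off only when the gains are conjugate-symmetric,
    # which holds when they are computed from a real input's magnitude spectrum.
    frame = [s.real for s in idft(spectrum)]
    return frame[len(frame) // 2:]     # samples aligned with the current input frame
```

With unit gains, the function simply returns the second half of whatever (windowed) double frame produced the spectrum.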
Applying logMMSE spectral reduction to a noisy speech signal at 5 dB SNR (see Figure 4.7) resulted in almost complete elimination of the underlying noise; however, in low SNR conditions it had a detrimental effect on intelligibility.
Figure 4.7: Input and Output waveforms for the MMSE algorithm for the sentence ’The birch
canoe slid on the smooth planks’ at 5dB SNR. Left: input; Right: output
Figure 4.8: GUI for the MMSE client
Figure 4.9: GUI for the EE client
4.3. A 69
4.3.3 Algorithm Tester
To subjectively evaluate the performance of the above processing schemes, an algorithm tester was developed using GTK and JACK. The program was developed as a JACK client, thus taking advantage of the signal routing mechanisms offered by the JACK API. An intuitive GUI was developed to facilitate testing of subjects with minimal supervision (see Figure 4.11). The GUI gathers all required personal information about the subject and initiates the test sequence. The main goal of the tester was to help the algorithm designer choose between two algorithm options, each of which is fed to one of the tester's inputs.
The structure of the tester can be seen in Figure 4.10. The program randomly assigns each of its two inputs to one of the playback buttons on the GUI. The test participant then plays each one and rates the two options on a scale from -10 to 10, where -10 represents a strong preference for the first option, 10 represents a strong preference for the second option, and the values in between allow for a range of subjective answers. The responses were then summed up in their respective categories (option 1 or option 2) and stored in a file that can be accessed at the end of the test.
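The randomization and tallying logic of the tester can be sketched as follows, with the GTK and JACK parts omitted. This is a minimal Python sketch under one assumed reading of the scale: negative ratings favour whichever option sits behind the first playback button, and the rating magnitude is credited to that option's category. The button labels "A" and "B" and the function names are hypothetical.

```python
import random


def assign_buttons(rng=random):
    """Randomly map the tester's two inputs (options 1 and 2) to playback buttons "A" and "B"."""
    options = [1, 2]
    rng.shuffle(options)
    return {"A": options[0], "B": options[1]}


def record_rating(rating, mapping, totals):
    """Credit one rating on the -10..10 scale to the option it favours.

    Assumed reading: negative values favour the option behind button "A", positive
    values the option behind button "B", and 0 expresses no preference.
    """
    if not -10 <= rating <= 10:
        raise ValueError("rating must lie in [-10, 10]")
    if rating < 0:
        totals[mapping["A"]] += -rating
    elif rating > 0:
        totals[mapping["B"]] += rating
    return totals


# Example session with a fixed seed so the assignment is reproducible
rng = random.Random(0)
mapping = assign_buttons(rng)
totals = {1: 0, 2: 0}
for r in (-7, 3, 0):
    record_rating(r, mapping, totals)
```

Mapping ratings back through the random button assignment is what keeps the per-option totals independent of which button each option happened to occupy.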
Figure 4.10: Block Diagram for the Algorithm tester
Figure 4.11: Algorithm Tester GUI
70 C 4. R-T F  A
4.4 Real-Time framework
The RT framework is composed of the JACK server and all of the JACK clients connected to the server at any one time. Each client is executed once per processing cycle, as requested by the JACK server. Figure 4.12 shows the user view of the RT framework. From top to bottom, it depicts the server status window, the connection list window (implementing the algorithm chain given in Figure 4.13), the algorithm GUIs for two MMSE clients and one EE client, and finally the RT oscilloscope displays showing the output (left) and inputs (right) of the algorithm chain. External JACK clients (meterbridge) were used for RT data visualization. This configuration (clients and interconnections) was used to generate the data sets for the subjective evaluation.
4.5 Summary
This chapter described the structure of the RT framework, provided details of the JACK server operation and gave a template for future JACK client development. A brief description was given of RT scheduling and of the modifications required to make a stock Linux kernel viable in a high-fidelity RT audio processing environment. Recommendations were provided for the development of algorithms for the framework, including special case handling, algorithm structure and best practices. Finally, a detailed description of the implementation of the Envelope Enhancement and logMMSE algorithms was given, and a tool developed for the framework to help decide between algorithm configuration options was described.
The next chapter focuses on the evaluation of the performance, parameters and subjective
merits of the developed audio interface, RT framework and algorithms.
4.5. S 71
Figure 4.12: RT framework layout
Figure 4.13: Sample algorithm chain including the MMSE and EE clients
Chapter 5
Hardware, Framework and Algorithm Evaluation
5.1 Audio Interface Parameters
The audio interface parameters were based on the requirements imposed on the project, as well
as the capabilities of the AD1989a codec. The available sampling rates ranged from 8 to 192
kHz at word lengths of either 16 or 32 bits.
Measurements of the input stage parameters were performed with the port configured as a microphone input. The input sensitivity calculation is based on the maximum amplification provided by the AD1989a pre-amplifier. Maximum power output occurred at an input level as low as 20 mV under loaded conditions (i.e. the input amplifier has its output connected internally to the codec, and therefore sources current). Input impedance was measured by adding a precision resistor (1 Ω) in series with the input; the input RMS current was then determined from the voltage drop across the resistor. A function generator was used to provide the test signal at the required frequency. The dynamic range was measured as the ratio of the full-scale deflection possible on the input to the noise floor.
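The arithmetic behind these measurements can be sketched as follows. This minimal Python sketch assumes that v_input_rms is the RMS voltage at the input terminals and that the dynamic range is expressed in dB from RMS voltages. The numbers in the usage lines are illustrative only, not measured values.

```python
import math


def input_impedance(v_input_rms, v_resistor_rms, r_series=1.0):
    """Z_in = V_in / I, where I = V_R / R_series is the current through the series resistor."""
    current = v_resistor_rms / r_series
    return v_input_rms / current


def dynamic_range_db(v_full_scale_rms, v_noise_rms):
    """Ratio of the full-scale level to the noise floor, in dB."""
    return 20.0 * math.log10(v_full_scale_rms / v_noise_rms)


# Illustrative values only: 1.5 V at the input and 10 uV across the 1-ohm resistor
z_in = input_impedance(1.5, 1e-5)                    # 150 kOhm
dr = dynamic_range_db(1.0, 10 ** (-90 / 20))         # 90 dB
```

With a 1 Ω sense resistor and an input impedance in the 100 kΩ range, the voltage across the resistor is in the microvolt range, so the meter resolution sets the practical accuracy of this method.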
Measurements of the output stage included the headphone amplifier, which is internal to
the codec. The output level for full-scale deflection (before clipping occurs) was measured as 1.25 V (2.1 dBu). The output impedance was measured in the same way as for the input stage; however, this time the audio interface itself was used as the function generator, and a precision multimeter was used to monitor the RMS output voltage. As with the input stage, the dynamic range was given as the ratio between the full-scale output and the RMS noise level at the output.
Available Stream Formats
  Sampling Rates                  8, 16, 32, 44.1, 48, 96, 192 kHz
  Word Lengths                    16, 32 bits
Input Stage Metrics (Ports B, C and E)
  Input Sensitivity               -16 dBu
  Impedance (@ 4 kHz)             150 kΩ
  Dynamic Range                   90 dB
Output Stage Metrics (Port A, HP enabled)
  Output Level (@ 0 dBFS)         2.1 dBu
  Impedance (@ 4 kHz)             1 Ω
  Dynamic Range                   96 dB
Average Power Consumption         195 mW
Table 5.1: Measured audio interface parameters
Note that due to the CPU limitations of the Eurotech platform, sampling rates higher than
48 kHz are not recommended, as they require more frequent execution of the processing loop
and can result in frequent buffer overruns.
5.2 Framework Performance and Requirements Verification
The primary metric of the RT framework is the audio latency between the capture of audio data and the playback of the corresponding output. The delay introduced by each processing scheme is given as an average on a per-frame basis. Values are given for the configurations of the RT engine that satisfy the minimum delay criteria (see Table 3.1). Note that the RT engine latency already includes the computation time of the processing algorithm: an algorithm whose computation takes longer than one RT processing period runs into an overrun condition, which was not allowed. Therefore, any latency the processing scheme adds to the total can only come from delays inherent in the processing, not from the actual time each frame takes to compute. The measured values are summarized in Table 5.2. A feed-through algorithm, which copies the input to the output without any processing, was introduced as a control and to establish a best-case baseline. As discussed in Chapter 2, latency has many sources; those that could be either estimated or measured are listed below:
• Hardware I/O latency: associated with the time required for the ADC/DAC to settle
before the data is made available to the DMA, and subsequently copied to the input
buffer of the first JACK client in an algorithm chain.
• Sampling latency: associated with JACK and the refresh rate of the I/O buffers utilized
in the RT engine. It is equal to the period of processing iterations during RT engine
operation.
• Processing latency: This is the time it takes for the CPU to calculate the output based on
the client’s algorithm. It includes any lag introduced by filtering operations and buffer
alignment operations.
• Total latency: this is the amount of time between the introduction of a stimulus at the microphone port and the observation of the corresponding effect on the output port (e.g. headphones).
Table 5.2: Measured latency for various processing schemes. Each frame is 128 samples at a sampling rate of 44100 Hz. (*) Latency with correct client ordering.
Algorithm        Hardware latency   Sampling latency   Processing latency   Total latency
Feed-Through     < 1 ms             5.8 ms             0.0 ms               6.0 ms
EE               < 1 ms             5.8 ms             1.8 ms               6.0 ms
logMMSE          < 1 ms             5.8 ms             2.9 ms               6.0 ms
EE + logMMSE     < 1 ms             5.8 ms             4.7 ms               12.0 ms (6.0 ms*)
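As a worked check on Table 5.2: one period of 128 frames at 44.1 kHz lasts 128/44100 ≈ 2.90 ms, so the 5.8 ms sampling latency corresponds to two such periods. The minimal Python sketch below assumes a two-period buffer, which this section does not state explicitly.

```python
def period_ms(frames_per_period, rate_hz):
    """Duration of one JACK processing period in milliseconds."""
    return 1000.0 * frames_per_period / rate_hz


def buffering_latency_ms(frames_per_period, rate_hz, n_periods=2):
    """Latency of an n_periods-deep period buffer (n_periods=2 is an assumption)."""
    return n_periods * period_ms(frames_per_period, rate_hz)


one_period = period_ms(128, 44100)              # about 2.90 ms
two_periods = buffering_latency_ms(128, 44100)  # about 5.80 ms, cf. Table 5.2
```

A server configured this way would typically be started with something like `jackd -R -d alsa -r 44100 -p 128 -n 2`; the `-n 2` value is the assumption behind the factor of two.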
5.3. S S N 75
Additionally, it should be noted that chaining JACK clients increases latency. This effect comes from the way JACK buffers the data for each client. In any one processing iteration, the input to a given JACK client is the output of the preceding client from the previous frame. In other words, the first client processes the input frame and passes it to its output, where JACK copies the data to the input buffer of the next client in the chain; however, this client processes the data only on the next frame iteration, when it polls its input buffer. Therefore, each client in a processing chain increases the I/O latency by an amount equal to the RT engine I/O latency. This problem can be worked around by explicitly ordering the JACK clients on the execution “wheel”, ensuring that client execution follows the order of chaining. This is done simply by launching the clients in the appropriate order, so that they are registered in the correct spot on the execution “wheel”: the first client in the chain (closest to the audio source) should be launched first, and the last client in the chain (closest to the audio sink) should be launched last. Data will then flow from client to client within the same frame iteration.
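The effect of execution order can be illustrated with a toy model. It is not JACK's actual scheduler; it only mimics the buffering behaviour described above, where each client's output buffer persists between cycles, and counts how many extra processing cycles an impulse needs to reach the end of a chain of feed-through clients.

```python
def impulse_delay_cycles(exec_order, n_clients, max_cycles=16):
    """Processing cycles until an impulse captured in cycle 0 leaves the last client.

    Clients 0..n_clients-1 form a chain (client 0 is closest to the source).
    exec_order is the order in which the server runs them within each cycle.
    Output buffers persist between cycles, so a client that runs before its
    upstream neighbour reads that neighbour's output from the previous cycle.
    """
    outputs = [0.0] * n_clients
    for cycle in range(max_cycles):
        captured = 1.0 if cycle == 0 else 0.0     # impulse at the capture port
        for client in exec_order:
            upstream = captured if client == 0 else outputs[client - 1]
            outputs[client] = upstream            # feed-through "algorithm"
        if outputs[-1] != 0.0:
            return cycle
    return None
```

In this model, `impulse_delay_cycles([0, 1, 2], 3)` returns 0, while the reversed launch order `[2, 1, 0]` returns 2, i.e. one extra cycle for every link that runs out of order.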
Based on the above, it is recommended that a hearing-aid algorithm first be implemented as separate blocks to allow for easier debugging, and then combined into a single client for production, thus eliminating intermediary buffers. This recommendation applies even when explicit client ordering can be achieved. Such an approach limits the memory footprint of the overall algorithm chain and saves memory copy operations, which are both time-consuming and potential thread blockers.
5.3 Speech Shaped Noise
Speech-shaped noise (SSN) was used to validate the performance of the logMMSE algorithm. This type of noise has a spectrum that approximates the average long-term spectrum of the speech of an adult male: it has a slope of +6 dB/octave below 100 Hz, a flat spectrum between 100 Hz and 320 Hz, and a slope of -6 dB/octave above 320 Hz (i.e. each time the frequency doubles, increasing by one octave, the level drops by 6 dB). SSN has a masking effect similar to that produced by a number of other speakers talking at the same time (multi-speaker babble), and as such it is a more realistic type of noise for speech perception tests than white or pink noise. Figure 5.1 shows the spectrum of a sample SSN waveform. The SSN filter characteristic is derived from the long-term average of the sentences in the Hearing in Noise Test (HINT) database and then modeled as a 5th-order IIR filter. To generate an SSN sequence, a random vector with values drawn from a standard normal (Gaussian) distribution was first produced. This resulted in a vector containing purely white noise, excluding anti-aliasing effects. The noise vector was then passed through the SSN filter, producing the required SSN sequence. For any given SNR, a gain factor can be calculated that scales the power of the SSN sequence relative to the target speech segment, thus producing the required SNR once the two are mixed.
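The generation and mixing procedure can be sketched in Python as below. The HINT-derived 5th-order filter coefficients are not given in this section, so the sketch uses a hypothetical one-pole low-pass (`B_PLACEHOLDER`, `A_PLACEHOLDER`) purely as a stand-in. The gain follows from the SNR definition: g = sqrt(P_speech / (P_noise · 10^(SNR/10))).

```python
import math
import random


def iir_filter(b, a, x):
    """Direct-form I IIR filter: a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


def power(x):
    """Mean power of a sequence."""
    return sum(v * v for v in x) / len(x)


def make_ssn(n_samples, b, a, rng=random):
    """White Gaussian noise (standard normal) shaped by the (b, a) IIR filter."""
    white = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return iir_filter(b, a, white)


def noise_gain(speech, noise, snr_db):
    """Amplitude gain g with 10*log10(P_speech / P(g*noise)) == snr_db."""
    return math.sqrt(power(speech) / (power(noise) * 10.0 ** (snr_db / 10.0)))


def mix_at_snr(speech, noise, snr_db):
    """Add the scaled noise to the speech so the mixture has the requested SNR."""
    g = noise_gain(speech, noise, snr_db)
    return [s + g * v for s, v in zip(speech, noise)]


# Placeholder one-pole low-pass, NOT the HINT-derived 5th-order SSN filter
B_PLACEHOLDER = [0.1]
A_PLACEHOLDER = [1.0, -0.9]
```

Because the gain is computed from the powers of the actual speech and noise segments, the target SNR is met exactly for each mixture, whatever filter shapes the noise.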
Figure 5.1: Sample speech-shaped noise spectrum
5.4. S V 77
5.4 Subjective Validation
Subjective validation of the proposed system was performed with the developed framework, to evaluate the subjective quality of the algorithm output and to verify framework performance. Additionally, two experiments were performed to verify previously published results showing an improvement in word identification scores when EE is applied to speech corrupted by AN [27]. Introducing noisy speech at the algorithm input extended that study in an attempt to emulate real-world situations. AN effects were simulated using a MATLAB-based simulator (see Section 2.12).
The first experiment evaluated the effect of EE on clean speech when the processed result was presented to the AN simulator. The second experiment evaluated the combination of the logMMSE and EE algorithms on noisy speech signals at two SNR levels: 5 and 10 dB.
An example of the audio signal waveform as it travels through the processing chain is given
in Figure 5.2.
Figure 5.2: Processing of the sentence ’Smoke poured out of every crack’
78 C 5. H, F  A E
5.4.1 Stimuli, Pre-processing and Test Set Composition
Speech stimuli were taken from the HINT database, which consists of 28 word lists of 10 sentences each, sampled at 44.1 kHz. The RT engine was configured to match this sampling rate to avoid resampling the database stimuli. Keywords were extracted using the transcript of each sentence. Each subject was asked to repeat anything they heard, and the number of correctly matched keywords was recorded. Words from the transcript were treated as keywords only if they were longer than 4 letters and were not connection words such as and, is, on, etc. Finally, the volume levels of all sentences were equalized, and two noisy datasets were generated using SSN at 5 and 10 dB SNR. Pre-processing of the stimuli, their presentation to the subjects and the associated data collection were done using a custom MATLAB script.
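The keyword rule and scoring described above can be sketched in Python as follows. The stop-list `CONNECTION_WORDS` is hypothetical beyond the three examples named in the text ("and", "is", "on"), and the tokenization (lower-case, internal apostrophes kept) is an assumption. The original pre-processing was done in a custom MATLAB script.

```python
import re

# Hypothetical stop-list: the text names "and", "is" and "on" only as examples.
CONNECTION_WORDS = {"and", "is", "on", "the", "of", "to", "in", "with",
                    "from", "their", "there", "which", "where", "about"}


def tokenize(text):
    """Lower-case words, keeping internal apostrophes (e.g. "don't")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def extract_keywords(transcript):
    """Words longer than 4 letters that are not connection words."""
    return [w for w in tokenize(transcript)
            if len(w.replace("'", "")) > 4 and w not in CONNECTION_WORDS]


def count_correct_keywords(keywords, response):
    """Number of transcript keywords found among the words the subject repeated."""
    said = set(tokenize(response))
    return sum(1 for k in keywords if k in said)
```

For the sentence "The birch canoe slid on the smooth planks", this rule yields the keywords birch, canoe, smooth and planks.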
Stimuli were presented in random order for each participant and for each AN condition and algorithm configuration. No stimulus sentence was presented more than once to any given subject, ensuring that the subject was always presented with a new set of keywords to identify. Before the beginning of each test, the participant was presented with a sample stimulus and asked if the volume level was comfortable; the volume level was then adjusted if needed. During the tests, consistent volume was guaranteed by the equalization performed during pre-processing. For each participant, a test table describing the test procedure was generated: each column represents an algorithm configuration, and each row represents an AN condition. Each cell of the table was assigned one of the HINT word lists at random. During the test, cells were selected at random, and sentences were presented from the selected cell until its word list was exhausted; the test then continued with the next randomly selected AN/EE configuration pair.
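The test-table procedure can be sketched in Python as below, under the reading that each cell receives its own word list and the list is exhausted before the next cell is chosen. The condition and configuration labels in the usage are hypothetical, as is the function name.

```python
import random


def build_schedule(an_conditions, algorithms, n_lists=28, sentences_per_list=10, rng=random):
    """Return (an_condition, algorithm, list_id, sentence_idx) trials in presentation order.

    Rows of the test table are AN conditions, columns are algorithm configurations;
    every cell receives a distinct, randomly chosen word list.
    """
    cells = [(an, alg) for an in an_conditions for alg in algorithms]
    if len(cells) > n_lists:
        raise ValueError("not enough word lists to give every cell its own list")
    table = dict(zip(cells, rng.sample(range(n_lists), len(cells))))
    order = list(cells)
    rng.shuffle(order)                    # cells are visited in random order
    schedule = []
    for an, alg in order:
        sentences = list(range(sentences_per_list))
        rng.shuffle(sentences)            # sentences of the list in random order
        for s in sentences:               # the list is exhausted before moving on
            schedule.append((an, alg, table[(an, alg)], s))
    return schedule


# Hypothetical labels for the rows (AN conditions) and columns (configurations)
schedule = build_schedule(["mild", "moderate", "severe", "profound"],
                          ["control", "weak EE", "strong EE"],
                          rng=random.Random(3))
```

Because the lists are drawn without replacement, no sentence can appear twice for the same subject as long as the table has no more cells than available word lists.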
5.4.2 Methodology
In Experiment 1, the equalized clean speech samples were processed in RT through an input anti-aliasing filter followed by the EE algorithm described in Section 4.3.1, and the result was saved as a separate dataset. This dataset was then processed in non-RT by the AN simulator, and the resulting files were used as the stimuli presented to the subjects. Samples were generated for each AN condition given in Table 2.1. For this experiment, two different settings of the EE algorithm were used: strong and weak. The strong EE produces more pronounced envelope enhancement due to its higher value of τ = 0.001; the weak EE has τ = 0.0001. The motivation for this experiment was to verify the results given in [27] and to evaluate the applicability of the proposed framework. Using two EE setups allowed a choice to be made between them, and the better-performing setting was used in the next experiment.
In Experiment 2, noisy speech signals (corrupted by SSN at 5 and 10 dB SNR) were processed in RT, first by the logMMSE noise reduction algorithm and then by the weak EE algorithm. The resulting dataset was then processed in non-RT with the AN simulator. The motivation for this experiment was to demonstrate the applicability of the framework and the developed system in real-life situations, where noise is omnipresent.
For both experiments, the EE algorithm was initialized with Kmax = 4 and Kmin = 0.1. Four separate third-octave bands between 100 Hz and 4 kHz were used, and the band signals were extracted using FIR band-pass filters of order 51. The main parameter, τ, controls the amount of expansion of small envelopes and of compression of large envelopes (there is an indirect relationship between τ and the modulation factor); it was set to two different levels:
• Strong: τ = 0.001 - Very pronounced envelope enhancement.
• Weak: τ = 0.0001 - Less pronounced envelope enhancement.
The parameters of the logMMSE algorithm in the second experiment corresponded to the
original author’s implementation [48].
For both experiments, the control condition consisted of the clean, unprocessed speech segments, modified only by the AN simulator for the respective AN condition.
80 C 5. H, F  A E
5.4.3 Results and Discussion
Subjective verification of the RT framework and the AN algorithm chain was performed with five normal-hearing subjects. Stimuli were presented in non-RT (offline) mode due to the limitation imposed by the AN simulator. Experiment 1 results are shown in Figure 5.3, and Experiment 2 results are shown in Figures 5.4 and 5.5 for the two SNR levels investigated.
Experiment 1 results demonstrate an improvement in keyword identification for all AN conditions (from mild to severe) when the clean signal is pre-processed using the weak EE algorithm. The improvement ranges from 10% at the severe AN condition to less than 0.1% at the mild AN condition. When the strong EE algorithm is applied, speech degradation is observed, resulting in much lower identification scores: as shown in Figure 5.3, the degradation is severe enough that identification scores fall below the control condition results for the same AN severity.
Figure 5.3: Envelope Enhancement Subjective Results for clean speech at two strengths of the
EE algorithms, error bars show standard deviation
Experiment 2 results demonstrated an improvement in speech identification scores at both investigated SNR levels; however, the improvement was limited to the profound and severe AN conditions. Moreover, the improvement was smaller than that observed in Experiment 1 for the same AN condition with weak EE. This is due to the speech degradation introduced by the logMMSE algorithm: at lower SNR levels, noise reduction using a spectral estimator can remove many of the original signal's spectral components. This was confirmed by the 10 dB SNR test (see Figure 5.5), which showed slightly better results than the 5 dB test (see Figure 5.4). In both cases, applying the logMMSE+EE algorithm chain under mild or moderate AN conditions was found to be detrimental to keyword identification. Further investigation with other noise types, such as babble and car noise, and with more SNR levels is warranted to gain additional insight into the effect of combining the logMMSE and EE algorithms.
Figure 5.4: Envelope Enhancement Subjective Results for noisy speech at 5dB SNR, error bars
show standard deviation
Figure 5.5: Envelope Enhancement Subjective Results for noisy speech at 10dB SNR, error bars show standard deviation
5.5 Summary
This chapter illustrated the effects of the EE and logMMSE algorithms on clean and noisy speech segments. The generation of SSN sequences was explained, and the resulting sequences were used in the evaluation of the logMMSE and logMMSE+EE algorithm chains. Subjective validation of the performance of the proposed algorithm chains was performed on speech corrupted by AN, using a simulator and normal-hearing subjects. The acquired results demonstrate some improvement in word identification scores, mainly for the severe and profound settings of the simulator. In addition, two different strengths of the EE algorithm were evaluated.
The measurement methodology for the hardware parameters of the custom audio interface was described, and the measured parameters were presented. Latency was assessed on the low-power Eurotech development kit, and based on this assessment, a couple of RT engine configurations were suggested that yield latencies equal to or lower than the imposed requirements.
The next chapter outlines the major contributions of this thesis and gives a detailed summary of the achieved results and of the knowledge acquired during this work. Finally, areas for future development are outlined.
Chapter 6
Conclusions and Future Work
6.1 Summary
Within the scope of this thesis, a multichannel sound interface targeting portable hearing-aid applications was designed and developed. The sound system was based on the Intel HDA standard and designed to fit the Eurotech development kit, including the Eurotech Catalyst CPU and memory module. A driver specific to the developed system was implemented as part of the Advanced Linux Sound Architecture (ALSA). The implementation of the sound system was verified against other similar systems and was found to have comparable or better performance.
Additionally, a real-time framework for the development, testing and deployment of DSP audio algorithms was implemented based on the JACK Audio Connection Kit. Guidelines for developing JACK clients and implementing them within the real-time framework were also given. To complement these goals, an algorithm testing application based on the RT framework was developed.
Envelope enhancement of speech signals, targeting people with auditory neuropathy, was implemented as part of the real-time framework. The algorithm was also used to assess the viability of the framework for the target environment. Previously published results demonstrating an improvement in word identification scores were verified against the obtained results. Data processing was done in real time, and subjective validation was performed in non-RT by normal-hearing listeners with the aid of an auditory neuropathy simulator.
It is well known that, in real-life situations, the input to hearing-aid algorithms will unquestionably be corrupted by environmental noise. For such situations, the logMMSE noise reduction algorithm has been implemented as part of the framework. In previous studies, the algorithm has been shown to perform better than other classes of speech enhancement algorithms.
The implementation of both algorithms complies with the guidelines given for clients of the real-time framework.
6.2 Major Contributions
The main contributions of this thesis include:
• Design, development, and evaluation of a multichannel sound system for portable hearing device applications. Designed to match the Intel HDA specification, this module plugs directly into the Catalyst board and provides low-latency audio functionality at sampling rates of 48 kHz and lower.
• Development of a framework for algorithm development and testing. This framework was implemented on the Linux OS and utilizes the JACK Audio Connection Kit and Intel signal processing libraries.
• Implementation of sample DSP algorithms for envelope enhancement and noise reduc-
tion utilizing this framework. Performance verification using pilot subjective tests.
6.3. F W 85
6.3 Future Work
• Further subjective testing to assess the effect of the logMMSE algorithm on speech perception in individuals with AN. Such a study would compare the results of EE alone applied to noisy speech with the results given herein.
• Implementation of other algorithms shown to improve speech perception by persons with
auditory neuropathy, including time-stretching and spectral enhancement.
• Completion of a handheld prototype will require the design and implementation of the remaining modules, which were outside the scope of this thesis. So far, the Eurotech development kit has served that purpose, but further development will be required. These modules include a touch screen and the corresponding touch screen controller, power management, and permanent storage devices, including storage for the RT framework and the RTOS.
• Deploying the complete handheld prototype in real-life situations and evaluating the performance of the modules developed within this thesis and the future work outlined here. Such field testing of the portable device should involve both normal-hearing persons and persons with auditory neuropathy.
Bibliography
[1] L. Chittka and A. Brockmann. Perception space – the final frontier. PLoS biology,
3(4):e137, 2005.
[2] Canadian Association of Speech-Language Pathologists and Audiologists. Adult hearing
disorders, fact sheet, 2000.
[3] B.A. Stach. Clinical audiology: an introduction. Cengage Learning, 2008.
[4] Fan-Gang Zeng and Sheng Liu. Speech perception in individuals with auditory neuropa-
thy. Journal of speech, language, and hearing research : JSLHR, 49(2):367–380, 2006.
[5] V.K. Narne and CS Vanaja. Perception of speech with envelope enhancement in indi-
viduals with auditory neuropathy and simulated loss of temporal modulation processing.
International Journal of Audiology, 48(10):700–707, 2009.
[6] B. Cone-Wesson, G. Rance, and Y. Sininger. for patients with auditory neuropathy. Au-
ditory neuropathy: A new perspective on hearing disorders, page 233, 2001.
[7] J.C. Mason, A. De Michele, C. Stevens, R.A. Ruth, and G.T. Hashisaki. Cochlear im-
plantation in patients with auditory neuropathy of varied etiologies. The Laryngoscope,
113(1):45–49, 2003.
[8] Vijaya K Narne and C S Vanaja. Perception of envelope-enhanced speech in the presence
of noise by individuals with auditory neuropathy. Ear and hearing, 30(1):136–142, 2009.
[9] V.K. Narne and C.S. Vanaja. Perception of envelope-enhanced speech in the presence of
noise by individuals with auditory neuropathy. Ear and Hearing, 30(1):136, 2009.
[10] Eurotech. Catalyst module user manual. Technical report, Eurotech, 2007.
[11] P. Davis and T. Hohn. Jack audio connection kit. In Proc. of the Linux Audio Developer
Conference. ZKM Karlsruhe, 2003.
[12] N. Magotra and S. Sirivara. Real-time digital speech processing strategies for the hearing
impaired. In Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE
International Conference on, volume 2, pages 1211–1214. IEEE, 1997.
[13] T. Stetzler, N. Magotra, P. Gelabert, P. Kasthuri, and S. Bangalore. Low power real-time programmable dsp development platform for digital hearing aids. In Acoustics, Speech, and Signal Processing, 1999. ICASSP’99. Proceedings., 1999 IEEE International Conference on, volume 4, pages 2339–2342. IEEE, 1999.
[14] U Rass and G H Steeger. Evaluation of digital hearing aid algorithms on wearable signal
processor systems. In Proc. 8th European Signal Processing Conf.(EUSIPCO), pages
475–478. Edizioni LINT, Trieste, Italy, 1996.
[15] U Rass and G H Steeger. A high performance pocket-size system for evaluations in
acoustic signal processing. Acta Acustica united with Acustica, 86(2):374–375, 2000.
[16] H Kruger, T Lotter, G Enzner, and P Vary. A pc based platform for multichannel real-time
audio processing. In Proceedings of International Workshop on Acoustic Echo and Noise
Control (IWAENC), Kyoto, Japan, 2003.
[17] G. Grimm, T. Herzke, D. Berg, and V. Hohmann. The master hearing aid: a pc-based
platform for algorithm development and evaluation. Acta acustica united with Acustica,
92(4):618–628, 2006.
[18] G Grimm, G Guilmin, F Poppen, M S M G Vlaming, and V Hohmann. The personal
hearing system: a software hearing aid for a personal communication system. EURASIP
Journal on Advances in Signal Processing, 2009:9, 2009.
[19] V. Gopalakrishna, N. Kehtarnavaz, and P.C. Loizou. A recursive wavelet-based strategy
for real-time cochlear implant speech processing on pda platforms. Biomedical Engineer-
ing, IEEE Transactions on, 57(8):2053–2063, 2010.
[20] E. Solari and G. Willse. PCI and PCI-X hardware and software: architecture and design.
www.digitalguru.com, 2005.
[21] Intel. Intel hda specification. Technical report, Intel Corporation, 2005.
[22] D Roach. High Definition Audio for the Digital Home. Intel Press, 2006.
[23] Paul Davis. Jack schematic diagram. http://jackaudio.org/documentation, September
2012.
[24] D.B. Stewart. Measuring execution time and real-time performance. In Embedded Sys-
tems Conference (ESC). Citeseer, 2001.
[25] Michael Durrant Jeff Dionne. Jack schematic diagram. http://www.uclinux.org/, Septem-
ber 2012.
[26] T. Iwai. Writing an alsa driver, 2002.
[27] Benjamin Morgan. Development and evaluation of envelope, spectral and time enhance-
ment algorithms for auditory neuropathy. Master’s thesis, University of Western Ontario,
2011.
[28] Vijaya K Narne and C S Vanaja. Effect of envelope enhancement on speech perception in
individuals with auditory neuropathy. Ear and hearing, 29(1):45–53, 2008.
[29] P M Clarkson and S Bahgat. Real-time speech enhancement system using envelope ex-
pansion technique. Electronics Letters, 25(17):1186–1188, 1989.
[30] P M Clarkson and S F Bahgat. Envelope expansion methods for speech enhancement.
The Journal of the Acoustical Society of America, 89(3):1378–1382, 1991.
[31] J. Duoandikoetxea and D. Cruz-Uribe. Fourier analysis. American Mathematical Society
Providence, RI, 2001.
[32] Y Ephraim and D Malah. Speech enhancement using a minimum-mean square error
short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and
Signal Processing, 32(6):1109–1121, 1984.
[33] J. Sohn, N.S. Kim, and W. Sung. A statistical model-based voice activity detection. Signal
Processing Letters, IEEE, 6(1):1–3, 1999.
[34] J. Sohn and W. Sung. A voice activity detector employing soft decision based noise
spectrum adaptation. In Acoustics, Speech and Signal Processing, 1998. Proceedings of
the 1998 IEEE International Conference on, volume 1, pages 365–368. IEEE, 1998.
[35] Y. Ephraim and D. Malah. Speech enhancement using a minimum-mean square error
short-time spectral amplitude estimator. Acoustics, Speech and Signal Processing, IEEE
Transactions on, 32(6):1109–1121, 1984.
[36] P.C. Loizou. Speech enhancement: theory and practice, volume 30. CRC, 2007.
[37] J. Ramírez, P. Yélamos, J.M. Górriz, and J.C. Segura. Svm-based speech endpoint detection
using contextual speech features. Electronics letters, 42(7):426–428, 2006.
[38] Y. Ephraim and D. Malah. Speech enhancement using a minimum mean-square error
log-spectral amplitude estimator. Acoustics, Speech and Signal Processing, IEEE Trans-
actions on, 33(2):443–445, 1985.
[39] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes
Source Code CD-ROM 3rd Edition: The Art of Scientific Computing. Cambridge Univer-
sity Press, 2007.
[40] N. Pathak, I. Panahi, P. Devineni, and R. Briggs. Real time speech enhancement for the
noisy mri environment. In Engineering in Medicine and Biology Society, 2009. EMBC
2009. Annual International Conference of the IEEE, pages 6950 –6953, sept. 2009.
[41] Analog Devices. Ad1988a/b datasheet. Technical report, AD, 2007.
[42] Analog Devices. Ad1989 pcb layout. Technical report, AD, 2008.
[43] Conexant. Ad1989 datasheet. Technical report, Conexant, 2008.
[44] IDT. 92hd89 datasheet. Technical report, IDT, 2009.
[45] Wikipedia. Real-time operating systems. https://en.wikipedia.org/wiki/List_of_real-time_operating_systems, September 2012.
[46] F. Lopez-Lezcano. Planet CCRMA at home, 2012.
[47] L. Fu and R. Schwebel. RT PREEMPT HOWTO. https://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO, September 2012.
[48] Y. Ephraim and D. Malah. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2):443–445, 1985.
[49] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[50] J.C. Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical report, Microsoft Research, 1998.
[51] Q.H. Jo, J.H. Chang, J.W. Shin, and N.S. Kim. Statistical model-based voice activity detection using support vector machine. IET Signal Processing, 3(3):205–210, 2009.
[52] V.N. Vapnik. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5):988–999, 1999.
[53] T. Joachims. Making large-scale SVM learning practical. In Advances in Kernel Methods: Support Vector Learning, pages 169–184, 1999.
[54] E. Osuna, R. Freund, and F. Girosi. Training support vector machines: an application to face detection. In Proceedings of the 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 130–136. IEEE, 1997.
[55] S.S. Keerthi, S.K. Shevade, C. Bhattacharyya, and K.R.K. Murthy. Improvements to Platt's SMO algorithm for SVM classifier design. Neural Computation, 13(3):637–649, 2001.
[56] A. Barbero and J.R. Dorronsoro. Momentum sequential minimal optimization: an accelerated method for support vector machine training. In The 2011 International Joint Conference on Neural Networks (IJCNN), pages 370–377. IEEE, 2011.
[57] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[58] D. Enqing, L. Guizhong, Z. Yatong, and Z. Xiaodi. Applying support vector machines to voice activity detection. In 6th International Conference on Signal Processing, 2002, volume 2, pages 1124–1127. IEEE, 2002.
[59] T. Kinnunen, E. Chernenko, M. Tuononen, P. Fränti, and H. Li. Voice activity detection using MFCC features and support vector machine. In International Conference on Speech and Computer (SPECOM 2007), Moscow, Russia, volume 2, pages 556–561, 2007.
[60] A. Temko, D. Macho, and C. Nadeu. Enhanced SVM training for robust speech activity detection. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), volume 4, pages IV–1025. IEEE, 2007.
[61] J.W. Shin, J.H. Chang, and N.S. Kim. Voice activity detection based on statistical models and machine learning approaches. Computer Speech & Language, 24(3):515–530, 2010.
Appendix A
SVM Classification
In its simplest linear form, an SVM is a hyperplane that separates a set of positive examples from a set of negative examples with a maximal margin. In the general case, when the input feature vector has more than two dimensions, the SVM builds an optimal hyperplane in a space whose dimensionality equals the length of the feature vector. Upon optimization, this hyperplane is guaranteed to lie at the maximal distance from the closest vector of either class. This binary type of classification is readily applied to VAD; however, care must be taken in composing the feature vectors, which serve as the training data and, consequently, as the inputs to the trained SVM. Given training data consisting of $M$ samples in the form of $N$-dimensional patterns $\vec{x}_i = [x_{i,1}\; x_{i,2}\; \ldots\; x_{i,N}]$ and the corresponding classifications $z_i = \pm 1$, $i = 1, 2, \ldots, M$, the output of a linear SVM is

$$u = \vec{w} \cdot \vec{x} - b \tag{A.1}$$
where $\vec{w}$ is the normal vector to the hyperplane and $\vec{x}$ is the input vector. The separating hyperplane is the plane for which $u = 0$; in the case of perfectly separable data, the nearest vectors to the plane lie on $u = \pm 1$ and are known as support vectors (SVs). The margin is therefore given by:

$$m = \frac{1}{\|\vec{w}\|_2} \tag{A.2}$$
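As a minimal sketch, (A.1) and (A.2) amount to a dot product and a vector norm. The weight vector and threshold below are hypothetical values chosen for easy arithmetic, not parameters from this work:

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_svm_output(w, x, b):
    # (A.1): u = w . x - b; the sign of u gives the predicted class
    return dot(w, x) - b


def margin(w):
    # (A.2): distance from the hyperplane u = 0 to the plane u = +1
    return 1.0 / math.sqrt(dot(w, w))


w, b = [3.0, 4.0], 1.0                          # hypothetical trained parameters
print(linear_svm_output(w, [1.0, 0.0], b))      # 2.0 -> positive class
print(margin(w))                                # 0.2
```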
Maximizing the margin can be expressed as the following optimization problem [49]:

$$\text{Minimize: } \phi(\vec{w}) = \frac{1}{2}\|\vec{w}\|^2 \tag{A.3}$$

$$\text{Subject to: } \{(\vec{w} \cdot \vec{x}_i) - b\}\, z_i \geq 1, \quad \forall i \tag{A.4}$$
We can solve the above optimization problem through the use of Lagrange multipliers as follows:

$$L(\vec{w}, b, \alpha_i) = \frac{1}{2}\|\vec{w}\|^2 - \sum_{i=1}^{M} \alpha_i \left[\{(\vec{w} \cdot \vec{x}_i) - b\}\, z_i - 1\right], \quad \alpha_i \geq 0,\ \forall i \tag{A.5}$$
Here $\alpha_i$ are the Lagrange multipliers. The optimization can be converted into a dual quadratic programming (QP) problem [50], in which the objective function depends solely on the set of Lagrange multipliers $\alpha_i$:

$$\min_{\vec{\alpha}} \Psi(\vec{\alpha}) = \min_{\vec{\alpha}} \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{M} z_i z_j (\vec{x}_i \cdot \vec{x}_j)\,\alpha_i\alpha_j - \sum_{i=1}^{M}\alpha_i \tag{A.6}$$
Subject to one inequality constraint and one equality constraint:

$$\alpha_i \geq 0, \quad \forall i \tag{A.7}$$

$$\sum_{i=1}^{M} z_i\alpha_i = 0 \tag{A.8}$$
Note that each training sample corresponds to exactly one Lagrange multiplier and vice versa. Once the QP problem given by (A.6) has been solved and $\vec{\alpha}$ obtained, the optimal weight vector ($\vec{w}$) and threshold ($b$) can be computed as follows [50]:

$$\vec{w} = \sum_{i=1}^{M} z_i\alpha_i\vec{x}_i, \qquad b = \vec{w} \cdot \vec{x}_k - z_k \ \text{ for some } \alpha_k > 0 \tag{A.9}$$
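The steps from (A.6) to (A.9) can be checked on a toy problem. This sketch uses two hypothetical 2-D training points, one per class (not data from this study). For this set, the equality constraint (A.8) forces $\alpha_1 = \alpha_2 = a$, so (A.6) reduces to $\Psi = 4a^2 - 2a$, which is minimized at $a = 1/4$. The code evaluates $\Psi$, recovers $\vec{w}$ and $b$ through (A.9), and confirms that both points lie exactly on the margin:

```python
# Hypothetical toy training set: one 2-D point per class
X = [[1.0, 1.0], [-1.0, -1.0]]
z = [1, -1]
alpha = [0.25, 0.25]  # minimizer of (A.6) for this toy set (derived analytically above)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def dual_objective(alpha, X, z):
    # Psi(alpha) from (A.6), to be minimized subject to (A.7) and (A.8)
    M = len(alpha)
    quad = sum(z[i] * z[j] * dot(X[i], X[j]) * alpha[i] * alpha[j]
               for i in range(M) for j in range(M))
    return 0.5 * quad - sum(alpha)


def weights_and_threshold(alpha, X, z):
    # (A.9): w = sum_i z_i alpha_i x_i, then b = w . x_k - z_k for some alpha_k > 0
    n_dims = len(X[0])
    w = [sum(z[i] * alpha[i] * X[i][n] for i in range(len(X))) for n in range(n_dims)]
    k = next(i for i, a in enumerate(alpha) if a > 0)
    return w, dot(w, X[k]) - z[k]


w, b = weights_and_threshold(alpha, X, z)
print(w, b)                                            # [0.5, 0.5] 0.0
print(dual_objective(alpha, X, z))                     # -0.25
print(sum(zi * ai for zi, ai in zip(z, alpha)))        # 0.0, so (A.8) holds
print([zi * (dot(w, xi) - b) for xi, zi in zip(X, z)]) # [1.0, 1.0]: both on the margin
```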
Not all data are linearly separable, i.e., a hyperplane that splits the training data without error may not exist. Cortes and Vapnik [49] address this problem by introducing a "slack" variable $\xi_i$ for each training sample, which allows training errors but imposes a penalty for each failure. Adding this penalty to the original problem (A.3) gives:
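$$\min_{\vec{w}, b, \vec{\xi}} \frac{1}{2}\|\vec{w}\|^2 + C\sum_{i=1}^{M}\xi_i \tag{A.10}$$

$$\text{Subject to: } z_i(\vec{w} \cdot \vec{x}_i - b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad \forall i \tag{A.11}$$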
$C$ is a parameter that sets the trade-off between a wide margin and a small number of margin failures. Once (A.10) is transformed into a dual QP problem, the only change is that the inequality constraint (A.7) becomes a box constraint [50]:

$$\alpha_i \in [0, C], \quad \forall i \tag{A.12}$$

Here we note that the slack variables appear only through the upper limit on the Lagrange multipliers and, as such, determine which of the training set vectors become support vectors (vectors with $\alpha_i \neq 0$ and $\alpha_i \neq C$).
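The role of the slack variables can be made concrete. For a fixed $\vec{w}$ and $b$, the smallest $\xi_i$ satisfying (A.11) is $\max(0,\, 1 - z_i(\vec{w} \cdot \vec{x}_i - b))$: a value of 0 means the sample is on or outside the margin, a value between 0 and 1 means it is inside the margin but still correctly classified, and a value above 1 means it is misclassified. The hyperplane, points, and $C$ below are hypothetical:

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def slack(w, b, X, z):
    # Smallest xi_i >= 0 that satisfies (A.11): z_i (w . x_i - b) >= 1 - xi_i
    return [max(0.0, 1.0 - zi * (dot(w, xi) - b)) for xi, zi in zip(X, z)]


def soft_margin_objective(w, b, X, z, C):
    # (A.10) evaluated with those minimal slacks
    return 0.5 * dot(w, w) + C * sum(slack(w, b, X, z))


# Hypothetical fixed hyperplane and four labelled points
w, b = [1.0, 0.0], 0.0
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [0.25, 0.0]]
z = [1, 1, -1, -1]
print(slack(w, b, X, z))                         # [0.0, 0.5, 0.0, 1.25]
print(soft_margin_objective(w, b, X, z, C=1.0))  # 2.25
```

The second point sits inside the margin on the correct side, while the fourth lies on the wrong side of the hyperplane; a larger $C$ penalizes both more heavily.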
The likelihood ratio (LR) was reported as the novel feature in Sohn et al. [34], and Jo et al. [51] use the LRs as the elements of the SVM feature vector $\vec{\Lambda}(n)$. The decision statistic therefore becomes:

$$f(\vec{\Lambda}(n)) = \vec{w} \cdot \vec{\Lambda}(n) - b \tag{A.13}$$

where $\vec{w}$ and $b$ are the optimized weight vector and threshold, respectively.
The Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient conditions for an optimal point of a positive definite QP problem. In our case, the QP problem given by (A.6) with the box constraint (A.12) is solved, according to [50], when:

$$\alpha_i = 0 \iff z_i f(\vec{\Lambda}_i) \geq 1$$

$$0 < \alpha_i < C \iff z_i f(\vec{\Lambda}_i) = 1$$

$$\alpha_i = C \iff z_i f(\vec{\Lambda}_i) \leq 1$$
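SMO-style solvers test these three conditions for every multiplier to decide which ones still need optimizing. A minimal sketch of such a test follows; as in [50], the conditions are only checked to within a tolerance, and the value `tol=1e-3` used here is an assumption:

```python
def violates_kkt(alpha_i, z_i, f_i, C, tol=1e-3):
    """Return True if (alpha_i, z_i, f_i) breaks the KKT conditions by more than tol."""
    r = z_i * f_i - 1.0    # r >= 0: on or outside the margin; r < 0: inside it or misclassified
    if alpha_i <= 0.0:     # alpha_i = 0 requires z_i f_i >= 1
        return r < -tol
    if alpha_i >= C:       # alpha_i = C requires z_i f_i <= 1
        return r > tol
    return abs(r) > tol    # 0 < alpha_i < C requires z_i f_i = 1


C = 1.0
print(violates_kkt(0.0, +1, 1.7, C))   # False: correctly classified, outside the margin
print(violates_kkt(0.0, +1, 0.4, C))   # True: inside the margin but alpha_i = 0
print(violates_kkt(0.5, -1, -1.0, C))  # False: non-bound and exactly on the margin
print(violates_kkt(C, -1, 0.3, C))     # False: alpha_i = C and z_i f_i <= 1
```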
The above implies that training samples with $\alpha_i = 0$ (which satisfy $z_i f \geq 1$) contribute nothing to the decision statistic, while samples with $\alpha_i = C$ violate the margin: they are either misclassified or lie inside the margin. This distinction cannot be exploited during training, because most SVM training algorithms optimize sequentially, so the values of $\alpha_i$ keep changing during optimization. Any Lagrange multiplier in $(0, C)$, paired with the corresponding $z_i$ and $\vec{x}_i$, represents a support vector lying exactly on the margin. Using only these non-bound Lagrange multipliers to obtain the weight vector and threshold simplifies (A.9) to:

$$\vec{w} = \sum_{i:\,0 < \alpha_i < C} z_i\alpha_i\vec{x}_i, \qquad b = \vec{w} \cdot \vec{x}_k - z_k \ \text{ for some } 0 < \alpha_k < C \tag{A.14}$$

In the standard formulation, bound support vectors ($\alpha_i = C$) also contribute to $\vec{w}$, so restricting the sum in (A.14) to non-bound multipliers is an approximation; the threshold, however, must be computed from a non-bound multiplier as above.
Scatter plot analysis as performed in [51] reveals significant overlap between the classes in feature space (see Fig. A.1). To help alleviate this problem, a kernel function is employed. Because the kernel function can be non-linear, a linear hyperplane in kernel space can exhibit non-linear behaviour in feature space. The kernel function that showed the most promise for separating LR-based feature vectors was reported in [51] to be the Radial Basis Function (RBF) kernel:
$$K(\vec{x}_i, \vec{x}_j) = \exp\left(-\frac{\|\vec{x}_i - \vec{x}_j\|^2}{\sigma^2}\right) \tag{A.15}$$
where $\sigma$ is the kernel width. Using the result from (A.14) and modifying the VAD decision statistic (A.13) to account for the kernel function results in the final decision statistic:

$$f(\vec{\Lambda}(n)) = \sum_{i:\,0 < \alpha_i < C} \alpha_i z_i K(\vec{\Lambda}_i, \vec{\Lambda}(n)) - b \tag{A.16}$$

where $\vec{\Lambda}_i$ are the support vectors obtained during optimization and $\vec{\Lambda}(n)$ is the input to the trained SVM.
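A minimal sketch of the RBF kernel (A.15) and the kernel decision statistic, written with the $-b$ sign convention of (A.1) and Platt [50]. The two support vectors, their multipliers, $b$, and $\sigma$ are hypothetical, and so is the rule that maps $f \geq 0$ to a speech decision:

```python
import math


def rbf_kernel(x_i, x_j, sigma):
    # (A.15): K(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma^2)
    sq_dist = sum((p - q) ** 2 for p, q in zip(x_i, x_j))
    return math.exp(-sq_dist / sigma ** 2)


def decision_statistic(lam, support_vectors, alphas, labels, b, sigma):
    # sum_i alpha_i z_i K(Lambda_i, Lambda(n)) - b over the supplied support vectors
    total = sum(a * z_i * rbf_kernel(sv, lam, sigma)
                for sv, a, z_i in zip(support_vectors, alphas, labels))
    return total - b


def vad_decision(lam, support_vectors, alphas, labels, b, sigma):
    # Hypothetical rule: speech (1) when f >= 0, otherwise non-speech (0)
    f = decision_statistic(lam, support_vectors, alphas, labels, b, sigma)
    return 1 if f >= 0.0 else 0


# Hypothetical trained model: one speech and one non-speech support vector
svs = [[0.0, 0.0], [3.0, 0.0]]
alphas = [1.0, 1.0]
labels = [1, -1]
print(vad_decision([0.2, 0.1], svs, alphas, labels, b=0.0, sigma=1.0))  # 1
print(vad_decision([2.9, 0.0], svs, alphas, labels, b=0.0, sigma=1.0))  # 0
```

A real feature vector $\vec{\Lambda}(n)$ would have one LR per element rather than two coordinates; the computation is unchanged.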
Figure A.1: Scatter plot for $\Lambda_6$ and $\Lambda_7$
Solving the QP problem to obtain the Lagrange multipliers can be quite involved, and there are several methods for doing so, including a couple of freely available software packages:
• Chunking, proposed by Vapnik [52], breaks the QP problem down into smaller QP problems by discarding the training samples whose Lagrange multipliers are equal to 0.
• SVMlight is a software package based on the approach developed by Joachims [53]; it relies on successive shrinking of the training set.
• Osuna et al. [54] proved a theorem showing that a large QP problem can be broken down into sub-problems: as long as at least one sample that violates the KKT conditions is added to the working set and jointly optimized, each step reduces the overall objective function.
• Sequential Minimal Optimization (SMO) was defined by Platt [50] and is the algorithm of choice for this work. SMO takes advantage of the Osuna theorem and jointly optimizes only two Lagrange multipliers at a time. Modifications to the SMO algorithm were proposed by Keerthi et al. [55].
• LIBSVM is a software package based on SMO, with the addition of a momentum parameter that allows the optimization to speed up under certain conditions [56, 57].
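The two-multiplier update at the heart of SMO can be sketched compactly. The code below is a simplified SMO, not Platt's full algorithm or the implementation used in this work: it keeps Platt's analytic pair update and threshold rule (with the $-b$ convention), but selects the second multiplier simply as the one with the largest $|E_i - E_j|$, without an error cache or a separate loop over non-bound examples. The toy data, $C$, $\sigma$, and tolerance are hypothetical. Unlike (A.16), `predict` sums over every non-zero multiplier, as in the standard formulation:

```python
import math


def rbf(x, y, sigma):
    # RBF kernel of (A.15)
    return math.exp(-sum((p - q) ** 2 for p, q in zip(x, y)) / sigma ** 2)


def smo_train(X, z, C=10.0, sigma=1.0, tol=1e-3, max_sweeps=200):
    """Simplified SMO for the kernelized dual (A.6) with constraints (A.8) and (A.12).

    Decision convention: f(x) = sum_j alpha_j z_j K(x_j, x) - b.
    """
    M = len(X)
    K = [[rbf(X[i], X[j], sigma) for j in range(M)] for i in range(M)]
    alpha = [0.0] * M
    b = 0.0

    def error(i):
        # E_i = f(x_i) - z_i
        return sum(alpha[j] * z[j] * K[j][i] for j in range(M)) - b - z[i]

    def take_step(i, j, e_i, e_j):
        nonlocal b
        a_i, a_j = alpha[i], alpha[j]
        # Feasible segment for alpha_j under the box (A.12) and the equality (A.8)
        if z[i] != z[j]:
            lo, hi = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
        else:
            lo, hi = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
        eta = K[i][i] + K[j][j] - 2.0 * K[i][j]  # curvature along the segment
        if lo >= hi or eta <= 0.0:
            return False
        a_j_new = min(hi, max(lo, a_j + z[j] * (e_i - e_j) / eta))
        if abs(a_j_new - a_j) < 1e-8:
            return False
        # Keep sum_i z_i alpha_i unchanged; the clamp only absorbs round-off
        a_i_new = min(C, max(0.0, a_i + z[i] * z[j] * (a_j - a_j_new)))
        d_i, d_j = z[i] * (a_i_new - a_i), z[j] * (a_j_new - a_j)
        b1 = e_i + d_i * K[i][i] + d_j * K[i][j] + b
        b2 = e_j + d_i * K[i][j] + d_j * K[j][j] + b
        if 0.0 < a_i_new < C:
            b = b1
        elif 0.0 < a_j_new < C:
            b = b2
        else:
            b = 0.5 * (b1 + b2)
        alpha[i], alpha[j] = a_i_new, a_j_new
        return True

    for _ in range(max_sweeps):
        changed = 0
        for i in range(M):
            e_i = error(i)
            r = z[i] * e_i  # equals z_i f(x_i) - 1
            if (r < -tol and alpha[i] < C) or (r > tol and alpha[i] > 0.0):
                errors = [error(j) for j in range(M)]
                # Try partners in order of decreasing |E_i - E_j|
                for j in sorted((j for j in range(M) if j != i),
                                key=lambda j: -abs(e_i - errors[j])):
                    if take_step(i, j, e_i, errors[j]):
                        changed += 1
                        break
        if changed == 0:
            break
    return alpha, b


def predict(x, X, z, alpha, b, sigma=1.0):
    # sigma must match the value used in training
    f = sum(a * zi * rbf(xi, x, sigma) for xi, zi, a in zip(X, z, alpha) if a > 0.0) - b
    return 1 if f >= 0.0 else -1


X = [[2.0, 2.0], [2.0, 3.0], [3.0, 2.0], [3.0, 3.0],
     [-2.0, -2.0], [-2.0, -3.0], [-3.0, -2.0], [-3.0, -3.0]]
z = [1, 1, 1, 1, -1, -1, -1, -1]
alpha, b = smo_train(X, z)
print([predict(x, X, z, alpha, b) for x in X])
```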
Previous work with VAD and SVM in terms of feature vector composition includes:
96 C A. SVM C
• Enqing et al.: The authors use the full-band energy difference, the low-band energy difference and the zero-crossing rate as features to the SVM [58].
• Kinnunen et al.: The authors use the Mel-frequency cepstral coefficients as the basis for the feature vectors [59].
• Temko et al.: The authors use Linear Discriminant Analysis (LDA) of FF parameters, as well as metrics estimating the dynamics of the speech signal [60].
• Shin et al.: The authors use the a priori and a posteriori SNR as well as a predicted SNR, estimated from the long-term smoothed power spectra of the background noise and speech [61].
Fig. A.2 and Fig. A.3 show the output of the proposed SVM and compare it with the manually generated labels and with the GM method for two SNR conditions (5 dB and 20 dB).
Table A.1: Error rates of GM- and SVM-based VAD for various SNR conditions, compared to the values reported in the original study [51]
SNR [dB]   GM [%]   RBF [%]   RBF [%] [51]   Linear [%] [51]
0          53.2     76.9      -              -
5          30.5     56.5      30.7           32.8
10         18.2     37.9      22.7           28.0
20         5.47     17.0      21.6           26.9
∞          0.0      0.06      -              -
Figure A.2: VAD decisions for a single speech segment from the training set. SNR=5dB
Figure A.3: VAD decisions for a single speech segment from the training set. SNR=20dB
Appendix B
HDA Pin Configuration and Verb Table
B.1 Pin/Port Configuration
NID    Configuration default   Description
0x11   0x01211170              Port A: headphone output, Stream ID 7
0x14   0x01a11110              Port B: microphone input, Stream ID 1
0x15   0x01a11130              Port C: microphone input, Stream ID 3
0x17   0x01a11150              Port E: microphone input, Stream ID 5
0x12   0x40f01020              Port D: unconnected
0x16   0x40f01020              Port F: unconnected
Table B.1: Port Configurations
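Each 32-bit value in Table B.1 packs several fields of the pin's Configuration Default register, as defined in the Intel High Definition Audio specification. The sketch below splits a value into those fields. In the specification, bits 7:4 are the "default association"; for the ports used here their values (7, 1, 3, 5) match the stream IDs listed in the table. Only the device-type names are decoded; the other fields are left as raw numbers:

```python
PORT_CONNECTIVITY = {0: "jack", 1: "none", 2: "fixed", 3: "jack+fixed"}
DEFAULT_DEVICE = {
    0x0: "Line Out", 0x1: "Speaker", 0x2: "HP Out", 0x3: "CD",
    0x4: "SPDIF Out", 0x5: "Digital Other Out", 0x6: "Modem Line Side",
    0x7: "Modem Handset Side", 0x8: "Line In", 0x9: "AUX", 0xA: "Mic In",
    0xB: "Telephony", 0xC: "SPDIF In", 0xD: "Digital Other In", 0xF: "Other",
}


def decode_config_default(value):
    """Split a 32-bit HDA pin Configuration Default value into its fields."""
    return {
        "connectivity": PORT_CONNECTIVITY[(value >> 30) & 0x3],        # bits 31:30
        "location": (value >> 24) & 0x3F,                             # bits 29:24
        "device": DEFAULT_DEVICE.get((value >> 20) & 0xF, "Reserved"),  # bits 23:20
        "connection_type": (value >> 16) & 0xF,                       # bits 19:16
        "color": (value >> 12) & 0xF,                                 # bits 15:12
        "misc": (value >> 8) & 0xF,                                   # bits 11:8
        "association": (value >> 4) & 0xF,                            # bits 7:4
        "sequence": value & 0xF,                                      # bits 3:0
    }


for nid, cfg in [(0x11, 0x01211170), (0x14, 0x01a11110), (0x12, 0x40f01020)]:
    d = decode_config_default(cfg)
    print(hex(nid), d["connectivity"], d["device"], d["association"])
# 0x11 jack HP Out 7
# 0x14 jack Mic In 1
# 0x12 none Other 2
```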
B.2 Initialization Verb table
NID    Verb    Payload  Description
0x03   0x705   0x03     Power down DAC0 to D3 state
0x04   0x705   0x03     Power down DAC1 to D3 state
0x05   0x705   0x03     Power down DAC2 to D3 state
0x06   0x705   0x03     Power down DAC3 to D3 state
0x0a   0x705   0x03     Power down DAC4 to D3 state
0x08   0x705   0x03     Power down ADC0 to D3 state
0x09   0x705   0x03     Power down ADC1 to D3 state
0x0f   0x705   0x03     Power down ADC2 to D3 state
0x11   0x707   0x60     Port A: Vref -> GND, enable headphone amp
0x22   0x701   0x00     Mixer A: select input from NID 0x37 = Mux A
0x22   0x3     0xf000   Mixer A: unmute
0x37   0x701   0x00     Mux A: select input from NID 0x03 = DAC0
0x03   0x705   0x00     Power up DAC0 to D0 state
0x03   0x2     0x0211   Set DAC0 format to 16 kHz, 16-bit, stereo
0x03   0x706   0x07     Assign stream ID 07
0x08   0x705   0x00     Power up ADC0 to D0 state
0x08   0x2     0x0211   Set ADC0 format to 16 kHz, 16-bit, stereo
0x08   0x706   0x01     Assign stream ID 01
0x0c   0x701   0x01     ADC0 select input from NID 0x39 = Microphone boost B
0x0c   0x3     0x7080   ADC0 unmute and set gain to 0 dB
0x39   0x3     0xb002   Set Boost B gain to 20 dB and unmute
0x14   0x707   0x22     Set port B as input and Vref to ground
0x09   0x705   0x00     Power up ADC1 to D0 state
0x09   0x2     0x0211   Set ADC1 format to 16 kHz, 16-bit, stereo
0x09   0x706   0x03     Assign stream ID 03
0x0d   0x701   0x02     ADC1 select input from NID 0x3a = Microphone boost C
0x0d   0x3     0x7080   ADC1 unmute and set gain to 0 dB
0x3a   0x3     0xb002   Set Boost C gain to 20 dB and unmute
0x15   0x707   0x22     Set port C as input and Vref to ground
0x0f   0x705   0x00     Power up ADC2 to D0 state
0x0f   0x2     0x0211   Set ADC2 format to 16 kHz, 16-bit, stereo
0x0f   0x706   0x05     Assign stream ID 05
0x0e   0x701   0x04     ADC2 select input from NID 0x3c = Microphone boost E
0x0e   0x3     0x7080   ADC2 unmute and set gain to 0 dB
0x3c   0x3     0xb002   Set Boost E gain to 20 dB and unmute
0x17   0x707   0x22     Set port E as input and Vref to ground
Table B.2: Verb Sequence Configuring the AD1989 to the required specifications
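Each row of Table B.2 becomes one 32-bit command word sent to the codec. The sketch below packs a row following the HDA command layout (codec address in bits 31:28, the 7-bit NID starting at bit 20, verb and payload in bits 19:0), distinguishing 4-bit verb IDs with 16-bit payloads (such as 0x2 and 0x3) from 12-bit verb IDs with 8-bit payloads (such as 0x705). It also decodes the 0x0211 stream format word to confirm the "16 kHz, 16-bit, stereo" entries. The codec address of 0 is a hypothetical value; the actual address depends on how the codec is attached to the HDA link:

```python
def encode_verb(cad, nid, verb, payload):
    """Pack one HDA codec command: CAd in bits 31:28, NID from bit 20, verb + payload in 19:0."""
    if not (0 <= cad <= 0xF and 0 <= nid <= 0x7F):
        raise ValueError("codec address or NID out of range")
    if verb <= 0xF:
        # 4-bit verb ID (e.g. 0x2 stream format, 0x3 amplifier gain/mute), 16-bit payload
        if payload > 0xFFFF:
            raise ValueError("payload too large for a 4-bit verb")
        body = (verb << 16) | payload
    else:
        # 12-bit verb ID (e.g. 0x705 power state, 0x706 stream ID), 8-bit payload
        if payload > 0xFF:
            raise ValueError("payload too large for a 12-bit verb")
        body = (verb << 8) | payload
    return (cad << 28) | (nid << 20) | body


def decode_stream_format(fmt):
    """Decode a 16-bit HDA stream format word into (rate_hz, bits, channels)."""
    base = 44100 if fmt & (1 << 14) else 48000                    # BASE, bit 14
    mult = ((fmt >> 11) & 0x7) + 1                                # MULT, bits 13:11
    div = ((fmt >> 8) & 0x7) + 1                                  # DIV, bits 10:8
    bits = {0: 8, 1: 16, 2: 20, 3: 24, 4: 32}[(fmt >> 4) & 0x7]   # BITS, bits 6:4
    channels = (fmt & 0xF) + 1                                    # CHAN, bits 3:0
    return base * mult // div, bits, channels


CODEC_ADDR = 0  # hypothetical codec address
print(hex(encode_verb(CODEC_ADDR, 0x03, 0x705, 0x03)))  # 0x370503  (DAC0 to D3)
print(hex(encode_verb(CODEC_ADDR, 0x22, 0x3, 0xF000)))  # 0x223f000 (unmute Mixer A)
print(decode_stream_format(0x0211))                     # (16000, 16, 2)
```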
Appendix C
Software Libraries
C.1 Intel Integrated Performance Primitives (IPP)
C.1.1 Introduction
Intel Integrated Performance Primitives (Intel IPP) is an extensive, multi-threaded library of highly optimized software functions for multimedia, data processing, and communications applications, produced by Intel. It offers thousands of optimized functions covering frequently used fundamental algorithms. The library supports Intel and compatible processors, is available for Windows, Linux, and Mac OS X, and can be obtained separately or as part of Intel Parallel Studio.
Intel IPP for Intel architecture takes advantage of the parallelism of the single-instruction, multiple-data (SIMD) instructions that form the core of the MMX technology and Streaming SIMD Extensions (SSE). These technologies improve the performance of computation-intensive signal, image, and video processing applications. Using Intel IPP primitives can substantially reduce development costs and accelerate time-to-market by eliminating the need to write processor-specific code for computation-intensive routines.
C.1.2 Hardware and Software Requirements
Intel IPP for Intel architecture runs on personal computers based on processors using the IA-32, Intel 64 or IA-64 architecture and running Microsoft Windows, Linux, or Apple Mac OS X. Intel IPP can be integrated into an application or library written in C or C++. Its function and variable declarations are written in ANSI C style; however, versions of Intel IPP for different processors or operating systems may, of necessity, vary slightly.
C.2 GTK and Glade
GTK+, or the GIMP Toolkit, is a multi-platform toolkit for creating graphical user interfaces. Offering a complete set of widgets, GTK+ is suitable for projects ranging from small one-off tools to complete application suites. GTK+ is written in C but was designed from the ground up to support a wide range of languages, not only C/C++. Using GTK+ from languages such as Perl and Python (especially in combination with the Glade GUI builder) provides an effective method of rapid application development. GTK+ is free software and part of the GNU Project. Its licensing terms, the GNU LGPL, allow it to be used by all developers, including those developing proprietary software, without any license fees or royalties.
C.2.1 Provided GUI Widgets
• Windows (normal window or dialog, about and assistant dialogs)
• Displays (label, image, progress bar, status bar)
• Buttons and toggles (check buttons, radio buttons, toggle buttons and link buttons)
• Numerical (horizontal or vertical scales and spin buttons) and text data entry (with or without completion)
• Multi-line text editor
• Tree, list and icon grid viewer (with customizable renderers and model/view separation)
• Combo box (with or without an entry)
• Menus (with images, radio buttons and check items)
• Toolbars (with radio buttons, toggle buttons and menu buttons)
• GtkBuilder (creates your user interface from XML)
• Selectors (color selection, file chooser, font selection)
• Layouts (tabulated widget, table widget, expander widget, frames, separators and more)
• Status icon (notification area on Linux, tray icon on Windows)
• Printing widgets
• Recently used documents (menu, dialog and manager)
Curriculum Vitae
Name: Aleksandar Mihaylov
Post-Secondary Education and Degrees:
The University of Western Ontario, London, ON, Canada
2005-2010 B.E.Sc., Electrical Engineering, Major in Medical Biophysics
The University of Western Ontario, London, ON
2010-2012 M.E.Sc., Electrical Engineering
Honours and Awards:
NSERC CGS M, 2010-2012
Natural Sciences and Engineering Research Council (NSERC) Undergraduate Student Research Award (USRA), 2008-2010
TD Bank Financial Group Scholarships in Engineering, 2007-2009
Gonder Family Award, 2006
Related Work Experience:
Teaching Assistant, The University of Western Ontario, 2010-2012
Research Assistant, Canadian Surgical Technologies and Advanced Robotics (CSTAR), The University of Western Ontario, 2008-2010
