512 research outputs found
Non-Intrusive Subscriber Authentication for Next Generation Mobile Communication Systems
Merged with duplicate record 10026.1/753 on 14.03.2017 by CS (TIS)The last decade has witnessed massive growth in both the technological development, and
the consumer adoption of mobile devices such as mobile handsets and PDAs. The recent
introduction of wideband mobile networks has enabled the deployment of new services
with access to traditionally well protected personal data, such as banking details or
medical records. Secure user access to this data has however remained a function of the
mobile device's authentication system, which is only protected from masquerade abuse by
the traditional PIN, originally designed to protect against telephony abuse.
This thesis presents novel research in relation to advanced subscriber authentication for
mobile devices. The research began by assessing the threat of masquerade attacks on
such devices by way of a survey of end users. This revealed that the current methods of
mobile authentication remain extensively unused, leaving terminals highly vulnerable to
masquerade attack. Further investigation revealed that, in the context of the more
advanced wideband enabled services, users are receptive to many advanced
authentication techniques and principles, including the discipline of biometrics which
naturally lends itself to the area of advanced subscriber based authentication.
To address the requirement for a more personal authentication capable of being applied
in a continuous context, a novel non-intrusive biometric authentication technique was
conceived, drawn from the discrete disciplines of biometrics and Auditory Evoked
Responses. The technique forms a hybrid multi-modal biometric where variations in the
behavioural stimulus of the human voice (due to the propagation effects of acoustic
waves within the human head), are used to verify the identity o f a user. The resulting
approach is known as the Head Authentication Technique (HAT).
Evaluation of the HAT authentication process is realised in two stages. Firstly, the
generic authentication procedures of registration and verification are automated within a
prototype implementation. Secondly, a HAT demonstrator is used to evaluate the
authentication process through a series of experimental trials involving a representative
user community. The results from the trials confirm that multiple HAT samples from
the same user exhibit a high degree of correlation, yet samples between users exhibit a
high degree of discrepancy. Statistical analysis of the prototypes performance realised
early system error rates of; FNMR = 6% and FMR = 0.025%. The results clearly
demonstrate the authentication capabilities of this novel biometric approach and the
contribution this new work can make to the protection of subscriber data in next
generation mobile networks.Orange Personal Communication Services Lt
Quality of media traffic over Lossy internet protocol networks: Measurement and improvement.
Voice over Internet Protocol (VoIP) is an active area of research in the world of
communication. The high revenue made by the telecommunication companies is a
motivation to develop solutions that transmit voice over other media rather than
the traditional, circuit switching network.
However, while IP networks can carry data traffic very well due to their besteffort
nature, they are not designed to carry real-time applications such as voice.
As such several degradations can happen to the speech signal before it reaches its
destination. Therefore, it is important for legal, commercial, and technical reasons
to measure the quality of VoIP applications accurately and non-intrusively.
Several methods were proposed to measure the speech quality: some of these
methods are subjective, others are intrusive-based while others are non-intrusive.
One of the non-intrusive methods for measuring the speech quality is the E-model
standardised by the International Telecommunication Union-Telecommunication Standardisation
Sector (ITU-T).
Although the E-model is a non-intrusive method for measuring the speech quality,
but it depends on the time-consuming, expensive and hard to conduct subjective
tests to calibrate its parameters, consequently it is applicable to a limited number
of conditions and speech coders. Also, it is less accurate than the intrusive methods
such as Perceptual Evaluation of Speech Quality (PESQ) because it does not consider
the contents of the received signal.
In this thesis an approach to extend the E-model based on PESQ is proposed.
Using this method the E-model can be extended to new network conditions and
applied to new speech coders without the need for the subjective tests. The modified
E-model calibrated using PESQ is compared with the E-model calibrated using
i
ii
subjective tests to prove its effectiveness.
During the above extension the relation between quality estimation using the
E-model and PESQ is investigated and a correction formula is proposed to correct
the deviation in speech quality estimation.
Another extension to the E-model to improve its accuracy in comparison with
the PESQ looks into the content of the degraded signal and classifies packet loss
into either Voiced or Unvoiced based on the received surrounding packets. The accuracy
of the proposed method is evaluated by comparing the estimation of the new
method that takes packet class into consideration with the measurement provided
by PESQ as a more accurate, intrusive method for measuring the speech quality.
The above two extensions for quality estimation of the E-model are combined
to offer a method for estimating the quality of VoIP applications accurately, nonintrusively
without the need for the time-consuming, expensive, and hard to conduct
subjective tests.
Finally, the applicability of the E-model or the modified E-model in measuring
the quality of services in Service Oriented Computing (SOC) is illustrated
Quality aspects of Internet telephony
Internet telephony has had a tremendous impact on how people communicate.
Many now maintain contact using some form of Internet telephony.
Therefore the motivation for this work has been to address the quality aspects
of real-world Internet telephony for both fixed and wireless telecommunication.
The focus has been on the quality aspects of voice communication,
since poor quality leads often to user dissatisfaction. The scope of the work
has been broad in order to address the main factors within IP-based voice
communication.
The first four chapters of this dissertation constitute the background
material. The first chapter outlines where Internet telephony is deployed
today. It also motivates the topics and techniques used in this research.
The second chapter provides the background on Internet telephony including
signalling, speech coding and voice Internetworking. The third chapter
focuses solely on quality measures for packetised voice systems and finally
the fourth chapter is devoted to the history of voice research.
The appendix of this dissertation constitutes the research contributions.
It includes an examination of the access network, focusing on how calls are
multiplexed in wired and wireless systems. Subsequently in the wireless
case, we consider how to handover calls from 802.11 networks to the cellular
infrastructure. We then consider the Internet backbone where most of our
work is devoted to measurements specifically for Internet telephony. The
applications of these measurements have been estimating telephony arrival
processes, measuring call quality, and quantifying the trend in Internet telephony
quality over several years. We also consider the end systems, since
they are responsible for reconstructing a voice stream given loss and delay
constraints. Finally we estimate voice quality using the ITU proposal PESQ
and the packet loss process.
The main contribution of this work is a systematic examination of Internet
telephony. We describe several methods to enable adaptable solutions
for maintaining consistent voice quality. We have also found that relatively
small technical changes can lead to substantial user quality improvements.
A second contribution of this work is a suite of software tools designed to
ascertain voice quality in IP networks. Some of these tools are in use within
commercial systems today
Multimedia Communication in e-Government Interface: A Usability and User Trust Investigation
In the past few years, e-government has been a topic of much interest among those excited about the advent of Web technologies. Due to the growing demand for effective communication to facilitate real-time interaction between users and e-government applications, many governments are considering installing new tools by e-government portals to mitigate the problems associated with user – interface communication. Therefore, this study is to indicate the use of multimodal metaphors such as audio-visual avatars in e-government interfaces; to increase the user performance of communications and to reduce information overload and lack of trust that is common with many e-government interfaces. However, only a minority of empirical studies has been focused on assessing the role of audio-visual metaphors in e-government. Therefore, the subject of this thesis’ investigation was the use of novel combinations of multimodal metaphors in the presentation of messaging content to produce an evaluation of these combinations’ effects on the users’ communication performance as well as the usability of e-government interfaces and perception of trust. The thesis outlines research comprising three experimental phases. An initial experiment was to explore and compare the usability of text in the presentation of the messaging content versus recorded speech and text with graphic metaphors. The second experimental was to investigate two different styles of incorporating initial avatars versus the auditory channel. The third experiment examined a novel approach around the use of speaking avatars with human-like facial expressions, obverse speaking avatars full body gestures during the presentation of the messaging content to compare the usability and communication performance as well as the perception of trust. The achieved results demonstrated the usefulness of the tested metaphors to enhance e-government usability, improve the performance of communication and increase users’ trust. A set of empirically derived ground-breaking guidelines for the design and use of these metaphors to generate more usable e-government interfaces was the overall provision of the results.Saudi Arabia Embass
The impact of alerting designs on air traffic controller's eye movement patterns and situation awareness
This research investigated controller’ situation awareness by comparing COOPANS’s acoustic alerts with newly designed semantic alerts. The results demonstrate that ATCOs’ visual scan patterns had significant differences between acoustic and semantic designs. ATCOs established different eye movement patterns on fixations number, fixation duration and saccade velocity. Effective decision support systems require human-centred design with effective stimuli to direct ATCO’s attention to critical events. It is necessary to provide ATCOs with specific alerting information to reflect the nature of of the critical situation in order to minimize the side-effects of startle and inattentional deafness. Consequently, the design of a semantic alert can significantly reduce ATCOs’ response time, therefore providing valuable extra time in a time-limited situation to formulate and execute resolution strategies in critical air safety events. The findings of this research indicate that the context-specified design of semantic alerts could improve ATCO’s situational awareness and significantly reduce response time in the event of Short Term Conflict Alert activation which alerts to two aircraft having less than the required lateral or vertical separation
Performance Analysis of Advanced Front Ends on the Aurora Large Vocabulary Evaluation
Over the past few years, speech recognition technology performance on tasks ranging from isolated digit recognition to conversational speech has dramatically improved. Performance on limited recognition tasks in noiseree environments is comparable to that achieved by human transcribers. This advancement in automatic speech recognition technology along with an increase in the compute power of mobile devices, standardization of communication protocols, and the explosion in the popularity of the mobile devices, has created an interest in flexible voice interfaces for mobile devices. However, speech recognition performance degrades dramatically in mobile environments which are inherently noisy. In the recent past, a great amount of effort has been spent on the development of front ends based on advanced noise robust approaches. The primary objective of this thesis was to analyze the performance of two advanced front ends, referred to as the QIO and MFA front ends, on a speech recognition task based on the Wall Street Journal database. Though the advanced front ends are shown to achieve a significant improvement over an industry-standard baseline front end, this improvement is not operationally significant. Further, we show that the results of this evaluation were not significantly impacted by suboptimal recognition system parameter settings. Without any front end-specific tuning, the MFA front end outperforms the QIO front end by 9.6% relative. With tuning, the relative performance gap increases to 15.8%. Finally, we also show that mismatched microphone and additive noise evaluation conditions resulted in a significant degradation in performance for both front ends
An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony
In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the users signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the in- coming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non- minimum phase Room Impulse Response (RIR). We describe the process by which percep- tually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique
An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony
In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the users signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the in- coming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non- minimum phase Room Impulse Response (RIR). We describe the process by which percep- tually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique
- …