512 research outputs found

    Non-Intrusive Subscriber Authentication for Next Generation Mobile Communication Systems

    Get PDF
    Merged with duplicate record 10026.1/753 on 14.03.2017 by CS (TIS)The last decade has witnessed massive growth in both the technological development, and the consumer adoption of mobile devices such as mobile handsets and PDAs. The recent introduction of wideband mobile networks has enabled the deployment of new services with access to traditionally well protected personal data, such as banking details or medical records. Secure user access to this data has however remained a function of the mobile device's authentication system, which is only protected from masquerade abuse by the traditional PIN, originally designed to protect against telephony abuse. This thesis presents novel research in relation to advanced subscriber authentication for mobile devices. The research began by assessing the threat of masquerade attacks on such devices by way of a survey of end users. This revealed that the current methods of mobile authentication remain extensively unused, leaving terminals highly vulnerable to masquerade attack. Further investigation revealed that, in the context of the more advanced wideband enabled services, users are receptive to many advanced authentication techniques and principles, including the discipline of biometrics which naturally lends itself to the area of advanced subscriber based authentication. To address the requirement for a more personal authentication capable of being applied in a continuous context, a novel non-intrusive biometric authentication technique was conceived, drawn from the discrete disciplines of biometrics and Auditory Evoked Responses. The technique forms a hybrid multi-modal biometric where variations in the behavioural stimulus of the human voice (due to the propagation effects of acoustic waves within the human head), are used to verify the identity o f a user. The resulting approach is known as the Head Authentication Technique (HAT). Evaluation of the HAT authentication process is realised in two stages. Firstly, the generic authentication procedures of registration and verification are automated within a prototype implementation. Secondly, a HAT demonstrator is used to evaluate the authentication process through a series of experimental trials involving a representative user community. The results from the trials confirm that multiple HAT samples from the same user exhibit a high degree of correlation, yet samples between users exhibit a high degree of discrepancy. Statistical analysis of the prototypes performance realised early system error rates of; FNMR = 6% and FMR = 0.025%. The results clearly demonstrate the authentication capabilities of this novel biometric approach and the contribution this new work can make to the protection of subscriber data in next generation mobile networks.Orange Personal Communication Services Lt

    Perceptual techniques in audio quality assessment

    Get PDF

    Quality of media traffic over Lossy internet protocol networks: Measurement and improvement.

    Get PDF
    Voice over Internet Protocol (VoIP) is an active area of research in the world of communication. The high revenue made by the telecommunication companies is a motivation to develop solutions that transmit voice over other media rather than the traditional, circuit switching network. However, while IP networks can carry data traffic very well due to their besteffort nature, they are not designed to carry real-time applications such as voice. As such several degradations can happen to the speech signal before it reaches its destination. Therefore, it is important for legal, commercial, and technical reasons to measure the quality of VoIP applications accurately and non-intrusively. Several methods were proposed to measure the speech quality: some of these methods are subjective, others are intrusive-based while others are non-intrusive. One of the non-intrusive methods for measuring the speech quality is the E-model standardised by the International Telecommunication Union-Telecommunication Standardisation Sector (ITU-T). Although the E-model is a non-intrusive method for measuring the speech quality, but it depends on the time-consuming, expensive and hard to conduct subjective tests to calibrate its parameters, consequently it is applicable to a limited number of conditions and speech coders. Also, it is less accurate than the intrusive methods such as Perceptual Evaluation of Speech Quality (PESQ) because it does not consider the contents of the received signal. In this thesis an approach to extend the E-model based on PESQ is proposed. Using this method the E-model can be extended to new network conditions and applied to new speech coders without the need for the subjective tests. The modified E-model calibrated using PESQ is compared with the E-model calibrated using i ii subjective tests to prove its effectiveness. During the above extension the relation between quality estimation using the E-model and PESQ is investigated and a correction formula is proposed to correct the deviation in speech quality estimation. Another extension to the E-model to improve its accuracy in comparison with the PESQ looks into the content of the degraded signal and classifies packet loss into either Voiced or Unvoiced based on the received surrounding packets. The accuracy of the proposed method is evaluated by comparing the estimation of the new method that takes packet class into consideration with the measurement provided by PESQ as a more accurate, intrusive method for measuring the speech quality. The above two extensions for quality estimation of the E-model are combined to offer a method for estimating the quality of VoIP applications accurately, nonintrusively without the need for the time-consuming, expensive, and hard to conduct subjective tests. Finally, the applicability of the E-model or the modified E-model in measuring the quality of services in Service Oriented Computing (SOC) is illustrated

    Quality aspects of Internet telephony

    Get PDF
    Internet telephony has had a tremendous impact on how people communicate. Many now maintain contact using some form of Internet telephony. Therefore the motivation for this work has been to address the quality aspects of real-world Internet telephony for both fixed and wireless telecommunication. The focus has been on the quality aspects of voice communication, since poor quality leads often to user dissatisfaction. The scope of the work has been broad in order to address the main factors within IP-based voice communication. The first four chapters of this dissertation constitute the background material. The first chapter outlines where Internet telephony is deployed today. It also motivates the topics and techniques used in this research. The second chapter provides the background on Internet telephony including signalling, speech coding and voice Internetworking. The third chapter focuses solely on quality measures for packetised voice systems and finally the fourth chapter is devoted to the history of voice research. The appendix of this dissertation constitutes the research contributions. It includes an examination of the access network, focusing on how calls are multiplexed in wired and wireless systems. Subsequently in the wireless case, we consider how to handover calls from 802.11 networks to the cellular infrastructure. We then consider the Internet backbone where most of our work is devoted to measurements specifically for Internet telephony. The applications of these measurements have been estimating telephony arrival processes, measuring call quality, and quantifying the trend in Internet telephony quality over several years. We also consider the end systems, since they are responsible for reconstructing a voice stream given loss and delay constraints. Finally we estimate voice quality using the ITU proposal PESQ and the packet loss process. The main contribution of this work is a systematic examination of Internet telephony. We describe several methods to enable adaptable solutions for maintaining consistent voice quality. We have also found that relatively small technical changes can lead to substantial user quality improvements. A second contribution of this work is a suite of software tools designed to ascertain voice quality in IP networks. Some of these tools are in use within commercial systems today

    Multimedia Communication in e-Government Interface: A Usability and User Trust Investigation

    Get PDF
    In the past few years, e-government has been a topic of much interest among those excited about the advent of Web technologies. Due to the growing demand for effective communication to facilitate real-time interaction between users and e-government applications, many governments are considering installing new tools by e-government portals to mitigate the problems associated with user – interface communication. Therefore, this study is to indicate the use of multimodal metaphors such as audio-visual avatars in e-government interfaces; to increase the user performance of communications and to reduce information overload and lack of trust that is common with many e-government interfaces. However, only a minority of empirical studies has been focused on assessing the role of audio-visual metaphors in e-government. Therefore, the subject of this thesis’ investigation was the use of novel combinations of multimodal metaphors in the presentation of messaging content to produce an evaluation of these combinations’ effects on the users’ communication performance as well as the usability of e-government interfaces and perception of trust. The thesis outlines research comprising three experimental phases. An initial experiment was to explore and compare the usability of text in the presentation of the messaging content versus recorded speech and text with graphic metaphors. The second experimental was to investigate two different styles of incorporating initial avatars versus the auditory channel. The third experiment examined a novel approach around the use of speaking avatars with human-like facial expressions, obverse speaking avatars full body gestures during the presentation of the messaging content to compare the usability and communication performance as well as the perception of trust. The achieved results demonstrated the usefulness of the tested metaphors to enhance e-government usability, improve the performance of communication and increase users’ trust. A set of empirically derived ground-breaking guidelines for the design and use of these metaphors to generate more usable e-government interfaces was the overall provision of the results.Saudi Arabia Embass

    The impact of alerting designs on air traffic controller's eye movement patterns and situation awareness

    Get PDF
    This research investigated controller’ situation awareness by comparing COOPANS’s acoustic alerts with newly designed semantic alerts. The results demonstrate that ATCOs’ visual scan patterns had significant differences between acoustic and semantic designs. ATCOs established different eye movement patterns on fixations number, fixation duration and saccade velocity. Effective decision support systems require human-centred design with effective stimuli to direct ATCO’s attention to critical events. It is necessary to provide ATCOs with specific alerting information to reflect the nature of of the critical situation in order to minimize the side-effects of startle and inattentional deafness. Consequently, the design of a semantic alert can significantly reduce ATCOs’ response time, therefore providing valuable extra time in a time-limited situation to formulate and execute resolution strategies in critical air safety events. The findings of this research indicate that the context-specified design of semantic alerts could improve ATCO’s situational awareness and significantly reduce response time in the event of Short Term Conflict Alert activation which alerts to two aircraft having less than the required lateral or vertical separation

    Performance Analysis of Advanced Front Ends on the Aurora Large Vocabulary Evaluation

    Get PDF
    Over the past few years, speech recognition technology performance on tasks ranging from isolated digit recognition to conversational speech has dramatically improved. Performance on limited recognition tasks in noiseree environments is comparable to that achieved by human transcribers. This advancement in automatic speech recognition technology along with an increase in the compute power of mobile devices, standardization of communication protocols, and the explosion in the popularity of the mobile devices, has created an interest in flexible voice interfaces for mobile devices. However, speech recognition performance degrades dramatically in mobile environments which are inherently noisy. In the recent past, a great amount of effort has been spent on the development of front ends based on advanced noise robust approaches. The primary objective of this thesis was to analyze the performance of two advanced front ends, referred to as the QIO and MFA front ends, on a speech recognition task based on the Wall Street Journal database. Though the advanced front ends are shown to achieve a significant improvement over an industry-standard baseline front end, this improvement is not operationally significant. Further, we show that the results of this evaluation were not significantly impacted by suboptimal recognition system parameter settings. Without any front end-specific tuning, the MFA front end outperforms the QIO front end by 9.6% relative. With tuning, the relative performance gap increases to 15.8%. Finally, we also show that mismatched microphone and additive noise evaluation conditions resulted in a significant degradation in performance for both front ends

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    Get PDF
    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the users signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the in- coming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non- minimum phase Room Impulse Response (RIR). We describe the process by which percep- tually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    Get PDF
    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the users signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the in- coming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non- minimum phase Room Impulse Response (RIR). We describe the process by which percep- tually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique
    • …
    corecore