Operating System Based Perceptual Evaluation of Call Quality in Radio Telecommunications Networks. Development of call quality assessment at mobile terminals using the Symbian operating system, comparison with traditional approaches and proposals for a tariff regime relating call charging to perceived speech quality.
Call quality has been crucial since the inception of telecommunication networks.
Operators need to monitor call quality from the end-user's perspective in order to retain subscribers and reduce subscriber 'churn'. Operators worry not only about call quality and interconnect revenue loss, but also about network connectivity issues in areas where mobile network gateways are prevalent. Bandwidth quality as experienced by the end-user is equally important in helping operators to reduce churn.
The parameters that network operators use to improve call quality are mainly from the end-user's perspective. These parameters are usually ASR (answer seizure ratio), PDD (post-dial delay), NER (network efficiency ratio), the number of calls for which these parameters have been analyzed, and the number of successful calls. Operators use these parameters to evaluate and optimize the network to meet their quality requirements.
Analysis of speech quality is a major arena for research. Traditionally, users' perception of speech quality has been measured offline using subjective listening tests. Such tests are, however, slow, tedious and costly. An alternative method is therefore needed: one that can be computed automatically on the subscriber's handset, be available to the operator as well as to subscribers and, at the same time, provide results that are comparable with conventional subjective scores. QMeter®, a set of tools for signal and bandwidth measurement developed with regard to all the parameters that influence the call and bandwidth quality experienced by the end-user, addresses these issues and additionally facilitates dynamic tariff propositions which enhance the credibility of the operator.
This research focuses on call quality parameters from the end-user's perspective. The call parameters used in the research are signal strength, successful call rate, normal dropped-call rate, and hand-over drop rate. Signal strength is measured every five milliseconds of an active call, and the average signal strength is calculated for each successful call. The successful call rate, normal drop rate and hand-over drop rate are combined into a measurement of the overall call quality. A call quality measure over bundles of 10 calls is proposed.
These parameters are visualized for a better understanding of where quality is poor, good or excellent. This will help operators, as well as user groups, to measure quality and coverage.
Operators advertise their bandwidth, but to identify the locations where speed must be improved they need a tool that can effectively measure speed from the end-user's perspective. BM (bandwidth meter), a tool developed as part of this research, measures the average speed of data sessions and stores the information for analysis at different locations.
To address issues of quality in the subscriber segment, this research proposes varying tariffs based on call and bandwidth quality. Call charging based on call quality as perceived by the end-user is proposed, both to satisfy subscribers and to help operators improve customer satisfaction and increase average revenue per user. Tariff redemption procedures are put forward for bundles of 10 calls and 10 data sessions. In addition to tariff variation, quality escalation processes are proposed. Deploying such tools on selected or random samples of users will result in substantial improvement in user loyalty which, in turn, will bring operational and economic advantages.
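The quality-linked charging idea can be illustrated with a minimal sketch. The 90% quality target, the 50% redemption cap and the linear schedule below are hypothetical choices: the abstract proposes the principle, not specific numbers.

```python
def bundle_discount(success_rate, base_charge):
    """Redeem part of a bundle's charge when measured quality falls short.

    success_rate: measured successful-call rate over a 10-call bundle.
    base_charge:  nominal charge for the bundle.
    The 0.9 target and 50% cap are illustrative assumptions.
    """
    if success_rate >= 0.9:                 # quality target met: full charge
        return 0.0
    shortfall = 0.9 - success_rate
    return round(min(0.5, shortfall) * base_charge, 2)  # cap redemption at 50%

print(bundle_discount(0.7, 10.0))   # 20% shortfall on a 10.0 charge -> 2.0 redeemed
print(bundle_discount(0.95, 10.0))  # target met -> 0.0
```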
Speech assessment and characterization for law enforcement applications
Speech signals acquired, transmitted or stored in non-ideal conditions are often degraded by
one or more effects including, for example, additive noise. These degradations alter the signal
properties in a manner that deteriorates the intelligibility or quality of the speech signal. In
the law enforcement context, such degradations are commonplace due to the limitations of the audio collection methodology, which is often required to be covert. In severe degradation conditions, the acquired signal may become unintelligible, losing its value in an investigation; in less severe conditions, a loss in signal quality may be encountered, which can lead to higher transcription time and cost.
This thesis proposes a non-intrusive speech assessment framework from which algorithms for
speech quality and intelligibility assessment are derived, to guide the collection and transcription
of law enforcement audio. These methods are trained on a large database labelled using
intrusive techniques (whose performance is verified with subjective scores) and shown to perform
favorably when compared with existing non-intrusive techniques. Additionally, a non-intrusive
CODEC identification and verification algorithm is developed which can identify a CODEC with an accuracy of 96.8% and detect the presence of a CODEC with an accuracy higher than 97% in the presence of additive noise.
Finally, the speech description taxonomy framework is developed, with the aim of characterizing
various aspects of a degraded speech signal, including the mechanism that results in a signal
with particular characteristics, the vocabulary that can be used to describe those degradations
and the measurable signal properties that can characterize the degradations. The taxonomy is
implemented as a relational database that facilitates the modeling of the relationships between
various attributes of a signal, and promises to be a useful tool for training and guiding audio analysts.
Reducing out-of-vocabulary in morphology to improve the accuracy in Arabic dialects speech recognition
This thesis has two aims: developing resources for Arabic dialects and improving the speech recognition of Arabic dialects. Two important components are considered: Pronunciation Dictionary (PD) and Language Model (LM). Six parts are involved, which relate to building and evaluating dialects resources and improving the performance of systems for the speech recognition of dialects.
Three resources are built and evaluated: one tool and two corpora. The methodology used for building the multi-dialect morphology analyser involves the proposal and evaluation of linguistic and statistical bases. We obtained an overall accuracy of 94%. The dialect text corpora cover four sub-dialects, with more than 50 million tokens. The multi-dialect speech corpora contain 32 hours of speech, collected from 52 participants. The resultant speech corpora comprise more than 67,000 speech files.
The main objective is improvement of the PDs and LMs of Arabic dialects. The use of an incremental methodology made it possible to check orthography and phonology rules incrementally. We were able to distinguish the rules that positively affected the PDs. The Word Error Rate (WER) improved by 5.3% in MSA and 5% in Levantine.
Three levels of morphemes were used to improve the LMs of dialects: stem, prefix+stem and stem+suffix. We checked the three forms using two different types of LMs. Eighteen experiments were carried out on MSA, the Gulf dialect and the Egyptian dialect, all of which yielded positive results, showing that WERs were reduced by 0.5% to 6.8%.
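The WER figures reported above follow the standard definition of the metric: the word-level edit distance (substitutions, deletions, insertions) divided by the reference length. A minimal implementation:

```python
def wer(reference, hypothesis):
    """Word error rate: edit distance between word sequences / reference length."""
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn r[:i] into h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(h) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / len(r)

print(wer("the cat sat", "the cat sit"))  # 1 substitution over 3 words -> 0.333...
```

Morpheme-level LM units change what counts as a "word" here, which is exactly why stem-based decompositions can move WER by several points.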
Discriminative preprocessing of speech: towards improving biometric authentication
In the context of the SecurePhone project, a multimodal user authentication system was developed for implementation on a PDA. Extending this system, we investigate biometric feature enhancement and multi-feature fusion with the aim of improving user authentication accuracy.
In this dissertation, a general framework for feature enhancement is proposed which uses a multilayer perceptron (MLP) to achieve optimal speaker discrimination.
First, to train this MLP a subset of speakers (speaker basis) is used to represent the underlying characteristics of the given acoustic feature space.
Second, the size of the speaker basis is found to be among the crucial factors affecting the performance of a speaker recognition system.
Third, it is found that the selection of the speaker basis can also influence system performance. Based on this observation, an automatic speaker selection approach is proposed on the basis of the maximal average between-class variance. Tests in a variety of conditions, including clean and noisy as well as telephone speech, show that this approach can improve the performance of speaker recognition systems. This approach, which is applied here to feature enhancement for speaker recognition, can be expected to also be effective with other biometric modalities besides speech.
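The basis-selection idea can be sketched as a search that grows the basis towards maximal average between-class variance of the speakers' mean feature vectors. The greedy strategy and the toy 2-D "speaker means" below are illustrative assumptions; the dissertation's exact procedure may differ.

```python
def between_class_variance(means):
    """Average squared distance of each speaker mean from the global mean."""
    dim = len(means[0])
    global_mean = [sum(m[d] for m in means) / len(means) for d in range(dim)]
    return sum(sum((m[d] - global_mean[d]) ** 2 for d in range(dim))
               for m in means) / len(means)

def select_basis(speaker_means, k):
    """Greedily grow a basis of k speakers maximising between-class variance."""
    chosen, remaining = [0], set(range(1, len(speaker_means)))  # arbitrary seed
    while len(chosen) < k:
        best = max(remaining, key=lambda s: between_class_variance(
            [speaker_means[i] for i in chosen] + [speaker_means[s]]))
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)

# Toy 2-D speaker means: the most spread-out pair should be selected.
means = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [10.0, 10.0]]
print(select_basis(means, 2))  # -> [0, 3]
```

Maximising spread among the basis speakers is what gives the discriminative MLP a representative picture of the acoustic space.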
Further, an alternative feature representation is proposed in this dissertation, derived from what we call speaker voice signatures (SVS). These are trajectories in a Kohonen self-organising map (SOM) which has been trained to represent the acoustic space. This feature representation is found to be somewhat complementary to the baseline feature set, suggesting that the two can be fused to achieve improved performance in speaker recognition.
Finally, this dissertation finishes with a number of potential extensions of the proposed approaches.
Keywords: feature enhancement, MLP, SOM, speaker basis selection, speaker recognition, biometric, authentication, verification
Automatic speaker recognition: modelling, feature extraction and effects of clinical environment
Speaker recognition is the task of establishing the identity of an individual based on his or her voice. It has significant potential as a convenient biometric method for telephony applications and does not require sophisticated or dedicated hardware. The speaker recognition task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker-specific feature parameters from the speech; the features are used to generate statistical models of different speakers. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Current state-of-the-art speaker recognition systems use the Gaussian mixture model (GMM) technique in combination with the Expectation Maximization (EM) algorithm to build the speaker models. The most frequently used features are the Mel Frequency Cepstral Coefficients (MFCC).

This thesis investigated areas of possible improvement in the field of speaker recognition. The identified drawbacks of current speaker recognition systems included slow convergence rates of the modelling techniques and the features' sensitivity to changes due to aging of speakers, use of alcohol and drugs, changing health conditions and mental state. The thesis proposed a new method of deriving the Gaussian mixture model (GMM) parameters, called the EM-ITVQ algorithm. The EM-ITVQ showed a significant improvement in equal error rates and higher convergence rates when compared to the classical GMM based on the expectation maximization (EM) method. It was also demonstrated that features based on the nonlinear model of speech production (TEO-based features) provided better performance compared to the conventional MFCC features.

For the first time, the effect of clinical depression on speaker verification rates was tested. It was demonstrated that speaker verification results deteriorate if the speakers are clinically depressed. The deterioration was demonstrated using conventional (MFCC) features. The thesis also showed that when the MFCC features are replaced with features based on the nonlinear model of speech production (TEO-based features), the detrimental effect of clinical depression on speaker verification rates can be reduced.
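The testing phase described above can be illustrated with a minimal likelihood-ratio sketch: the average log-likelihood of the test frames under the claimed speaker's GMM minus that under a background model, compared with a threshold. The toy 1-D mixtures stand in for MFCC-trained models, and the zero threshold is an arbitrary assumption.

```python
import math

def gmm_logpdf(x, weights, means, variances):
    """Log-density of scalar x under a 1-D diagonal Gaussian mixture."""
    total = 0.0
    for w, m, v in zip(weights, means, variances):
        total += w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    return math.log(total)

def verify(frames, speaker_gmm, background_gmm, threshold=0.0):
    """Average log-likelihood ratio over frames; accept if above threshold."""
    score = sum(gmm_logpdf(x, *speaker_gmm) - gmm_logpdf(x, *background_gmm)
                for x in frames) / len(frames)
    return score, score > threshold

speaker = ([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])  # weights, means, variances
background = ([1.0], [5.0], [4.0])
frames = [0.1, 1.9, 0.4, 2.2]                   # frames lying near the speaker model
score, accepted = verify(frames, speaker, background)
print(accepted)  # -> True
```

In a real system the mixture parameters would be estimated by EM (or, per the thesis, EM-ITVQ) from training utterances; only the scoring step is shown here.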
Robust speech recognition under band-limited channels and other channel distortions
Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, June 200
Electronic Bulls and Bears: U.S. Securities Markets and Information Technology
This report responds to requests by the House Committee on Energy and Commerce and the House Committee on Government Operations to assess the role that communication and information technologies play in the securities markets. The Committee desired a benchmark for gauging progress made toward the national market system envisioned by the 1975 Act. This report assesses the current use of information technology by U.S. securities exchanges and over-the-counter dealers, by related futures and options markets, and by associated industries and regulatory agencies.