
146 research outputs found

Study to determine potential flight applications and human factors design guidelines for voice recognition and synthesis systems

Author: Parks D. L.
White R. W.

A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis. First, a survey of voice recognition and synthesis technology was undertaken to develop a working knowledge base. Then, numerous potential aircraft and simulator flight deck voice applications were identified, and each proposed application was rated on a number of criteria to arrive at an overall payoff rating. The potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control. The first three categories were rated as the most promising for flight deck operations. Possible applications of voice synthesis systems were categorized as automatic or pilot selectable, and many were rated as potentially beneficial. In addition, voice system implementation guidelines and pertinent performance criteria are proposed. Finally, the findings of this study are compared with those of a recent NASA study of a 1995 transport concept.

NASA Technical Reports Server

Orthogonal transmultiplexers : extensions to digital subscriber line (DSL) communications

Author: Lin Xueming
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/1998

This dissertation investigates an orthogonal transmultiplexer that unifies multirate filter bank theory and communications theory. Various extensions of orthogonal transmultiplexer techniques are made for digital subscriber line communication applications. It is shown that the theoretical performance bounds of single-carrier and multicarrier modulation based transceivers are the same under the same operational conditions. The investigation considers single-carrier transceiver systems such as Quadrature Amplitude Modulation (QAM) and Carrierless Amplitude and Phase (CAP) modulation, and multicarrier transceiver systems such as Orthogonal Frequency Division Multiplexing (OFDM) or Discrete Multi Tone (DMT) and the Discrete Subband (Wavelet) Multicarrier based transceiver (DSBMT). The performance and robustness of DMT and DSBMT based transceiver systems under narrow band interference are also investigated. The performance of a DMT based transceiver system is shown to be quite sensitive to the location and strength of a single-tone (narrow band) interferer, and this sensitivity is highlighted in the work. An adaptive interference exciser is shown to alleviate the sensitivity problem of a DMT based system. The improved spectral properties of the DSBMT technique reduce the performance sensitivity to variations in a narrow band interference. DSBMT is shown to outperform DMT and to be more robust, and this superior robustness is demonstrated in the work. Optimal orthogonal basis design using a cosine modulated multirate filter bank is discussed. An adaptive linear combiner at the output of the analysis filter bank is implemented to eliminate intersymbol and interchannel interference. It is shown that DSBMT is the most suitable technique for a narrow band interference environment.
A blind channel identification scheme and an optimal MMSE based equalizer employing a nonmaximally decimated filter bank precoder/postequalizer structure are proposed. The performance of the blind channel identification scheme is shown not to be sensitive to the characteristics of the unknown channel. The proposed optimal MMSE based equalizer is shown to outperform the zero-forcing equalizer.
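
As a rough illustration of the closing comparison between MMSE and zero-forcing equalization, the sketch below works through a single flat complex channel tap. It is not the dissertation's filter bank precoder/postequalizer structure; the function names, the unit symbol power, and the example channel and noise values are all assumptions made for this illustration.

```python
def zf_tap(h: complex) -> complex:
    """Zero-forcing equalizer tap: inverts the channel exactly."""
    return 1 / h


def mmse_tap(h: complex, noise_var: float, signal_var: float = 1.0) -> complex:
    """MMSE tap for y = h*s + n: conj(h) / (|h|^2 + noise_var / signal_var)."""
    return h.conjugate() / (abs(h) ** 2 + noise_var / signal_var)


def mse(w: complex, h: complex, noise_var: float, signal_var: float = 1.0) -> float:
    """Mean squared error E|w*y - s|^2 for y = h*s + n with independent s and n."""
    return abs(w * h - 1) ** 2 * signal_var + abs(w) ** 2 * noise_var


h = 0.2 + 0.1j      # hypothetical weak channel tap (assumption)
noise_var = 0.05    # hypothetical noise power (assumption)
zf_error = mse(zf_tap(h), h, noise_var)
mmse_error = mse(mmse_tap(h, noise_var), h, noise_var)
```

On a weak tap the zero-forcing solution removes all channel distortion but amplifies the noise, whereas the MMSE tap accepts some residual distortion in exchange for less noise gain, so `mmse_error` comes out smaller than `zf_error` in this toy setting.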

Digital Commons @ New Jersey Institute of Technology (NJIT)

The development of speech coding and the first standard coder for public mobile telephony

Author: Sluijter R.J.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2005

This thesis describes in its core chapter (Chapter 4) the original algorithmic and design features of the first coder for public mobile telephony, the GSM full-rate speech coder, as standardized in 1988. It has never been described in as much detail as presented here. The coder is put in a historical perspective by two preceding chapters on the history of speech production models and the development of speech coding techniques until the mid 1980s, respectively. In the epilogue a brief review is given of later developments in speech coding. The introductory Chapter 1 starts with some preliminaries. It defines what speech coding is and introduces the reader to speech coding standards and the standardization institutes which set them. Then, the attributes of a speech coder that play a role in standardization are explained. Subsequently, several applications of speech coders, including mobile telephony, are discussed and the state of the art in speech coding is illustrated on the basis of some worldwide recognized standards. Chapter 2 starts with a summary of the features of speech signals and their source, the human speech organ. Then, historical models of speech production which form the basis of different kinds of modern speech coders are discussed. Starting with a review of ancient mechanical models, we arrive at the electrical source-filter model of the 1930s. Subsequently, the acoustic-tube models as they arose in the 1950s and 1960s are discussed. Finally, the 1970s are reviewed, which brought the discrete-time filter model on the basis of linear prediction. In a unique way the logical sequencing of these models is exposed, and the links are discussed. Whereas the historical models are discussed in a narrative style, the acoustic-tube models and the linear prediction technique as applied to speech are subject to more mathematical analysis in order to create a sound basis for the treatise of Chapter 4. This trend continues in Chapter 3, whenever instrumental in completing that basis. In Chapter 3 the reader is taken by the hand on a guided tour through time during which successive speech coding methods pass in review. In an original way special attention is paid to the evolutionary aspect. Specifically, for each newly proposed method it is discussed what it added to the known techniques of the time. After presenting the relevant predecessors, starting with Pulse Code Modulation (PCM) and the early vocoders of the 1930s, we arrive at Residual-Excited Linear Predictive (RELP) coders, Analysis-by-Synthesis systems and Regular-Pulse Excitation in 1984. The latter forms the basis of the GSM full-rate coder. In Chapter 4, which constitutes the core of this thesis, explicit forms of Multi-Pulse Excited (MPE) and Regular-Pulse Excited (RPE) analysis-by-synthesis coding systems are developed. Starting from the pulse-amplitude computation methods current in 1984, which included solving sets of equations (typically of order 10-16) two hundred times a second, several explicit-form designs are considered by which solving sets of equations in real time is avoided. Then, the design of a specific explicit-form RPE coder and an associated efficient architecture are described. The explicit forms and the resulting architectural features have never been published in as much detail as presented here. Implementation of such a codec enabled real-time operation on a state-of-the-art single-chip digital signal processor of the time. This coder, at a bit rate of 13 kbit/s, was selected as the Full-Rate GSM standard in 1988. Its performance is recapitulated. Chapter 5 is an epilogue briefly reviewing the major developments in speech coding technology after 1988. Many speech coding standards have been set since then, for mobile telephony as well as for other applications. The chapter is concluded by an outlook.
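
The two rates quoted in the abstract, 13 kbit/s and two hundred equation solves per second, can be related with a few lines of arithmetic. The frame length (20 ms) and frame size (260 bits) used below are the commonly cited figures for the GSM full-rate coder and are not stated in the abstract, so treat them as assumptions; the variable names are illustrative only.

```python
FRAME_BITS = 260          # assumed: coded bits per speech frame
FRAME_MS = 20             # assumed: frame duration in milliseconds
SOLVES_PER_SECOND = 200   # from the abstract: equation sets solved per second

# Bit rate in bits per second implied by the assumed frame format.
bit_rate = FRAME_BITS * 1000 // FRAME_MS

# Time between successive equation solves, in milliseconds.
solve_interval_ms = 1000 // SOLVES_PER_SECOND

# Number of solves that fall within one assumed frame.
solves_per_frame = FRAME_MS // solve_interval_ms
```

Under these assumptions the frame format reproduces the 13 kbit/s figure, and the solve rate corresponds to one solve every 5 ms.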

Repository TU/e

Pure OAI Repository

Advances and trends in automatic speech recognition

Author: Mariani J.
Publication venue: GRETSI, Saint Martin d'Hères, France
Publication date: 01/01/1990

This paper aims to give an overview of recent advances in the domain of Speech Recognition. The paper mainly focuses on Speech Recognition, but also mentions some progress in other areas of Speech Processing (speaker recognition, speech synthesis, speech analysis and coding) using similar methodologies. It first gives a view of what the problems related to automatic speech processing are, and then describes the initial approaches that have been followed in order to address those problems. It then introduces the methodological novelties that allowed for progress along three axes: from isolated-word recognition to continuous speech, from speaker-dependent recognition to speaker-independent, and from small vocabularies to large vocabularies. Special emphasis is placed on the improvements made possible by Markov Models and, more recently, by Connectionist Models, resulting in progress obtained simultaneously along the above axes, in improved performance for difficult vocabularies, or in more robust systems. Some specialised hardware is also described, as well as the efforts aimed at assessing Speech Recognition systems.

I-Revues

Automatic speaker recognition: modelling, feature extraction and effects of clinical environment

Author: Memon S
Publication venue: RMIT University
Publication date: 01/01/2010

Speaker recognition is the task of establishing the identity of an individual based on his or her voice. It has significant potential as a convenient biometric method for telephony applications and does not require sophisticated or dedicated hardware. The speaker recognition task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker-specific feature parameters from the speech. The features are used to generate statistical models of different speakers. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Current state-of-the-art speaker recognition systems use the Gaussian mixture model (GMM) technique in combination with the Expectation Maximization (EM) algorithm to build the speaker models. The most frequently used features are the Mel Frequency Cepstral Coefficients (MFCC). This thesis investigated areas of possible improvement in the field of speaker recognition. The identified drawbacks of current speaker recognition systems included slow convergence rates of the modelling techniques and the sensitivity of the features to changes due to aging of speakers, use of alcohol and drugs, changing health conditions and mental state. The thesis proposed a new method of deriving the GMM parameters, called the EM-ITVQ algorithm. The EM-ITVQ showed a significant improvement in equal error rates and higher convergence rates when compared to the classical GMM based on the EM method. It was demonstrated that features based on the nonlinear model of speech production (TEO based features) provided better performance compared to the conventional MFCC features. For the first time, the effect of clinical depression on speaker verification rates was tested. It was demonstrated that speaker verification results deteriorate if the speakers are clinically depressed. The deterioration was demonstrated using conventional (MFCC) features. The thesis also showed that when the MFCC features are replaced with TEO based features, the detrimental effect of clinical depression on speaker verification rates can be reduced.
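
The two-stage training and testing flow described in this abstract can be sketched as follows. This is a deliberately simplified illustration, not the thesis's method: each speaker is modelled by a single diagonal-covariance Gaussian fitted in closed form, standing in for a full GMM trained with EM (or EM-ITVQ), and the 2-D feature vectors are made-up numbers standing in for MFCC or TEO based features. All function and variable names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]  # (per-dimension mean, per-dimension variance)


def train_model(features: List[Vector], var_floor: float = 1e-3) -> Model:
    """Training: fit one diagonal Gaussian to a speaker's feature vectors."""
    n = len(features)
    dims = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in features) / n, var_floor)
        for d in range(dims)
    ]
    return means, variances


def log_likelihood(features: List[Vector], model: Model) -> float:
    """Total log-likelihood of the feature vectors under one speaker model."""
    means, variances = model
    total = 0.0
    for f in features:
        for x, m, v in zip(f, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


def identify(features: List[Vector], models: Dict[str, Model]) -> str:
    """Testing: return the enrolled speaker whose model scores highest."""
    return max(models, key=lambda name: log_likelihood(features, models[name]))


# Toy enrolment data (assumption): a few 2-D vectors per speaker.
enrolment = {
    "speaker_a": [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1)],
    "speaker_b": [(4.0, -1.0), (4.2, -0.8), (3.9, -1.1)],
}
models = {name: train_model(vectors) for name, vectors in enrolment.items()}

unknown = [(1.1, 1.9), (1.0, 2.2)]   # samples from an unknown speaker
predicted = identify(unknown, models)
```

A real system would replace `train_model` with a multi-component mixture fitted iteratively, which is where the convergence-rate concerns raised in the abstract arise.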

RMIT Research Repository

Combating Misinformation in the Age of LLMs: Opportunities and Challenges

Author: Chen Canyu
Shu Kai
Publication date: 08/11/2023

Misinformation such as fake news and rumors is a serious threat to information ecosystems and public trust. The emergence of Large Language Models (LLMs) has great potential to reshape the landscape of combating misinformation. Generally, LLMs can be a double-edged sword in the fight. On the one hand, LLMs bring promising opportunities for combating misinformation due to their profound world knowledge and strong reasoning abilities. Thus, one emergent question is: how can LLMs be used to combat misinformation? On the other hand, the critical challenge is that LLMs can easily be leveraged to generate deceptive misinformation at scale. Then, another important question is: how can LLM-generated misinformation be combated? In this paper, we first systematically review the history of combating misinformation before the advent of LLMs. Then we illustrate the current efforts and present an outlook for these two fundamental questions respectively. The goal of this survey paper is to facilitate progress in utilizing LLMs for fighting misinformation and to call for interdisciplinary efforts from different stakeholders for combating LLM-generated misinformation.

Comment: 9 pages for the main paper, 35 pages including 656 references, more resources on "LLMs Meet Misinformation" are on the website: https://llm-misinformation.github.io

arXiv.org e-Print Archive

Current state of digital signal processing in myoelectric interfaces and related applications

Author: Hakonen Maria
Piitulainen Harri
Visala Arto
Publication venue: Elsevier BV
Publication date: 01/01/2015

This review discusses the critical issues and recommended practices from the perspective of myoelectric interfaces. The major benefits and challenges of myoelectric interfaces are evaluated. The article aims to fill gaps left by previous reviews and identify avenues for future research. Recommendations are given, for example, for electrode placement, sampling rate, segmentation, and classifiers. Four groups of applications where myoelectric interfaces have been adopted are identified: assistive technology, rehabilitation technology, input devices, and silent speech interfaces. The state-of-the-art applications in each of these groups are presented.

Peer reviewed

Elsevier - Publisher Connector

Crossref

Aaltodoc Publication Archive