294 research outputs found
A language for interactive speech dialog specification
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994. Includes bibliographical references (leaves 113-114). By Ira Scharf. M.S.
Toward effective conversational messaging
Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 118-123). Matthew Talin Marx. M.S.
Voice technologies in advanced computer systems
Recently there has been a surge of interest in using voice technologies in advanced workstations. The motivation for voice is its role as the primary channel of human-to-human communication, which ties in with current research in which computers facilitate group problem solving, enhanced user interfaces, and office computing. Taken broadly, the use of speech as a command and data channel may require digital recording and playback techniques, speech recognition, text-to-speech synthesis, and telephone interface equipment. The big payoff will be to build systems, using these technologies, that allow computers to become part of the infrastructure of daily human communication (Schmandt and Arons, 1985).
A Speech recognition-based telephone auto-attendant
This dissertation details the implementation of a real-time, speaker-independent telephone auto attendant, built from first principles on limited-quality speech data. An auto attendant is a computerized agent that answers the phone, conducts a limited speech-recognition dialogue to determine the caller's wishes, and switches the caller through to the desired person's extension. The platform is a computer with a telephone interface card. The speech recognition engine uses whole-word hidden Markov modelling with a limited vocabulary and a constrained (finite-state) grammar. The feature set is based on mel-frequency cepstral coefficients. The Viterbi search is used together with the level-building algorithm to recognise speech within the utterances. Word-spotting techniques, including a "garbage" model, are used. Various techniques compensating for noise and a varying channel transfer function are employed to improve the recognition rate. An Afrikaans conversational interface prompts the caller for information. Detailed experiments illustrate the dependence and sensitivity of the system on its parameters, and show the influence of several techniques aimed at improving the recognition rate. Dissertation (MEng (Computer Engineering))--University of Pretoria, 2006. Electrical, Electronic and Computer Engineering. Unrestricted.
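The Viterbi search mentioned in the abstract above is the core decoding step of such an HMM-based recogniser. The following is a minimal, self-contained sketch of log-domain Viterbi decoding over a small HMM; the model sizes and probabilities are illustrative toy values, not parameters from the dissertation.

```python
import math

def viterbi(log_init, log_trans, log_emit, observations):
    """Return the most likely state path and its log-probability."""
    n_states = len(log_init)
    # delta[s] = best log-probability of any path ending in state s
    delta = [log_init[s] + log_emit[s][observations[0]] for s in range(n_states)]
    backptr = []
    for obs in observations[1:]:
        prev = delta
        delta, bp = [], []
        for s in range(n_states):
            # Best predecessor state for reaching s at this time step
            best_prev = max(range(n_states), key=lambda p: prev[p] + log_trans[p][s])
            bp.append(best_prev)
            delta.append(prev[best_prev] + log_trans[best_prev][s] + log_emit[s][obs])
        backptr.append(bp)
    # Trace back the best path from the best final state
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for bp in reversed(backptr):
        path.append(bp[path[-1]])
    path.reverse()
    return path, delta[last]
```

In a whole-word recogniser like the one described, one such model is searched per vocabulary word, and the level-building algorithm strings word models together across an utterance.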
Speech Recognition
Chapters in the first part of the book cover the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications able to operate in real-world environments, such as mobile communication services and smart homes.
Speech recognition on DSP: algorithm optimization and performance analysis.
Yuan Meng. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 85-91). Abstracts in English and Chinese.
Contents:
1. Introduction: history of ASR development; fundamentals of automatic speech recognition (classification of ASR systems; the recognition process); performance measurements of ASR (recognition accuracy, complexity, robustness); motivation and goal of this work; thesis outline.
2. Signal processing techniques for the front-end: basic feature extraction principles (pre-emphasis; frame blocking and windowing; Discrete Fourier Transform (DFT) computation; spectral magnitudes; mel-frequency filterbank; logarithm of filter energies; Discrete Cosine Transform (DCT); cepstral weighting; dynamic featuring); practical issues (review of practical problems and solutions in ASR applications; model of environment; end-point detection (EPD); spectral subtraction (SS)).
3. HMM-based acoustic modeling: HMMs for ASR; output probabilities; Viterbi search engine; isolated word recognition (IWR) and connected word recognition (CWR).
4. DSP for embedded applications: classification of embedded systems (DSP, ASIC, FPGA, etc.); description of the hardware platform; I/O operation for real-time processing; fixed-point algorithms on DSP.
5. ASR algorithm optimization: methodology; floating-point to fixed-point conversion; computational complexity considerations (feature extraction techniques; Viterbi search module); memory requirements.
6. Experimental results and performance analysis: Cantonese isolated word recognition (execution time; memory requirements; recognition performance); connected word recognition (execution time; recognition performance); summary and discussion.
7. Implementation of practical techniques: end-point detection; spectral subtraction; experimental results for IWR and CWR.
8. Conclusions and future work.
Appendix A: Interpolation of data entries without floating point, divides or conditional branches. Appendix B: Vocabulary for the Cantonese isolated word recognition task. Bibliography.
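The front-end chapter of this thesis walks through the classic feature-extraction pipeline (pre-emphasis, frame blocking and windowing, DFT, spectral magnitudes). A minimal sketch of those first few stages follows; the frame and FFT sizes are typical 16 kHz defaults, not necessarily the values used in the thesis.

```python
import numpy as np

def preemphasis(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]: boosts high frequencies before analysis
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len=400, hop=160):
    # Slice into overlapping frames (25 ms frames, 10 ms hop at 16 kHz)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def magnitude_spectrum(frames, n_fft=512):
    # Hamming window each frame, then take one-sided DFT magnitudes
    windowed = frames * np.hamming(frames.shape[1])
    return np.abs(np.fft.rfft(windowed, n=n_fft))
```

The remaining stages listed in the outline (mel filterbank, log energies, DCT, cepstral weighting, dynamic features) operate on these magnitude spectra; on a fixed-point DSP, as Chapter 5 discusses, each stage must additionally be converted from floating-point arithmetic.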
Job Development Essentials: A Guide for Job Developers, Second Edition
"Job Development Essentials, Second Edition" provides practical advice for workforce development professionals: the same advice found in the first edition, but with a stronger emphasis on engaging employers, providing expanded services to the business community, and involving business people as resources and advocates for an organization.
Framework for Human Computer Interaction for Learning Dialogue Strategies using Controlled Natural Language in Information Systems
Spoken language systems are poised to have a tremendous impact on real-world applications, whether healthcare enquiry, public transportation, or airline booking, while preserving the language ethnicity of interaction among users across the globe. These systems can interact with the user in any of the different languages the system supports. When one person interacts with another, many non-verbal cues guide the dialogue, and all the utterances share a contextual relationship that manages the dialogue as it is mixed by the two speakers. Human-Computer Interaction has a wide impact on application design and has become one of the emerging areas of interest for researchers. We are all witness to an explosive electronic revolution in which gadgets and gizmos surround us, advanced not only in power, design, and applications, but also in ease of access: user-friendly interfaces let us easily use and control all the functionality of our devices. Speech is one of the most intuitive forms of interaction that humans use; it offers potential benefits such as hands-free access to machines, better ergonomics, and greater efficiency of interaction. Yet designing speech-based interfaces has long been an expert job. Much research has gone into building real spoken dialogue systems that can interact with humans by voice and help perform various tasks as humans do. The last two decades have seen advanced research in automatic speech recognition, dialogue management, text-to-speech synthesis, and Natural Language Processing for various applications, with positive results. This dissertation proposes to apply machine learning (ML) techniques to the problem of optimizing dialogue management strategy selection in spoken dialogue system prototype design. Although automatic speech recognition and system-initiated dialogues, where the system expects an answer in the form of 'yes' or 'no', have already been applied to Spoken Dialogue Systems (SDS), no real attempt has been made to use those techniques to design a new system from scratch. In this dissertation, we propose some novel ideas to ease the design of Spoken Dialogue Systems and give novices access to voice technologies. A framework is proposed for simulating and evaluating dialogues and for learning optimal dialogue strategies in a controlled Natural Language. The simulation process is based on a probabilistic description of a dialogue and on stochastic modelling of both the artificial NLP modules composing an SDS and the user. This probabilistic model rests on a set of parameters that can be tuned from prior knowledge of the discourse or learned from data. Evaluation is part of the simulation process and is based on objective measures provided by each module. Finally, the simulation environment is connected to a learning agent that uses the supplied evaluation metrics as an objective function to generate optimal behaviour for the SDS.
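The simulate-evaluate-learn loop this abstract describes can be illustrated with a deliberately tiny, hypothetical example: a Q-learning agent chooses between re-asking the user ("ask") and confirming ("confirm") against a stochastic user simulator. The states, actions, and reward values are invented for illustration, not taken from the dissertation.

```python
import random

ACTIONS = ["ask", "confirm"]

def user_simulator(confidence, action, rng):
    """Return (next_confidence, reward, done) for one simulated dialogue turn."""
    if action == "ask":
        # Re-asking tends to raise recognition confidence, at a small turn cost
        return min(confidence + 1, 2), -1.0, False
    # Confirming ends the dialogue; success is likelier at high confidence
    success = rng.random() < (0.3, 0.6, 0.95)[confidence]
    return confidence, (10.0 if success else -10.0), True

def learn_policy(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Epsilon-greedy Q-learning over dialogues with the simulated user."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(3) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = (rng.choice(ACTIONS) if rng.random() < epsilon
                      else max(ACTIONS, key=lambda a: q[(state, a)]))
            nxt, reward, done = user_simulator(state, action, rng)
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q
```

After training, the learned policy asks again while confidence is low and confirms once confidence is high, which is the kind of optimal strategy the framework's learning agent is meant to discover from its objective function.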
Toward Widely-Available and Usable Multimodal Conversational Interfaces
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 159-166). Multimodal conversational interfaces, which allow humans to interact with a computer using a combination of spoken natural language and a graphical interface, offer the potential to transform the manner by which humans communicate with computers. While researchers have developed myriad such interfaces, none have made the transition out of the laboratory and into the hands of a significant number of users. This thesis makes progress toward overcoming two intertwined barriers preventing more widespread adoption: availability and usability. Toward addressing the problem of availability, this thesis introduces a new platform for building multimodal interfaces that makes it easy to deploy them to users via the World Wide Web. One consequence of this work is City Browser, the first multimodal conversational interface made publicly available to anyone with a web browser and a microphone. City Browser serves as a proof of concept that significant amounts of usage data can be collected in this way, allowing a glimpse of how users interact with such interfaces outside of a laboratory environment. City Browser, in turn, has served as the primary platform for deploying and evaluating three new strategies aimed at improving usability. The most pressing usability challenge for conversational interfaces is their limited ability to accurately transcribe and understand spoken natural language. The three strategies developed in this thesis (context-sensitive language modeling, response confidence scoring, and user behavior shaping) each attack the problem from a different angle, but they are linked in that each critically integrates information from the conversational context. By Alexander Gruenstein. Ph.D.