78,340 research outputs found
On the Computation of the Kullback-Leibler Measure for Spectral Distances
Efficient algorithms for the exact and approximate computation of the symmetrical Kullback-Leibler (1998) measure for spectral distances are presented for linear predictive coding (LPC) spectra. A interpretation of this measure is given in terms of the poles of the spectra. The performances of the algorithms in terms of accuracy and computational complexity are assessed for the application of computing concatenation costs in unit-selection-based speech synthesis. With the same complexity and storage requirements, the exact method is superior in terms of accuracy
Study to determine potential flight applications and human factors design guidelines for voice recognition and synthesis systems
A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis. At first, a survey of voice recognition and synthesis technology was undertaken to develop a working knowledge base. Then, numerous potential aircraft and simulator flight deck voice applications were identified and each proposed application was rated on a number of criteria in order to achieve an overall payoff rating. The potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control. The ratings of the first three categories showed the most promise of being beneficial to flight deck operations. Possible applications of voice synthesis systems were categorized as automatic or pilot selectable and many were rated as being potentially beneficial. In addition, voice system implementation guidelines and pertinent performance criteria are proposed. Finally, the findings of this study are compared with those made in a recent NASA study of a 1995 transport concept
Design of multimedia processor based on metric computation
Media-processing applications, such as signal processing, 2D and 3D graphics
rendering, and image compression, are the dominant workloads in many embedded
systems today. The real-time constraints of those media applications have
taxing demands on today's processor performances with low cost, low power and
reduced design delay. To satisfy those challenges, a fast and efficient
strategy consists in upgrading a low cost general purpose processor core. This
approach is based on the personalization of a general RISC processor core
according the target multimedia application requirements. Thus, if the extra
cost is justified, the general purpose processor GPP core can be enforced with
instruction level coprocessors, coarse grain dedicated hardware, ad hoc
memories or new GPP cores. In this way the final design solution is tailored to
the application requirements. The proposed approach is based on three main
steps: the first one is the analysis of the targeted application using
efficient metrics. The second step is the selection of the appropriate
architecture template according to the first step results and recommendations.
The third step is the architecture generation. This approach is experimented
using various image and video algorithms showing its feasibility
Assessment of cockpit interface concepts for data link retrofit
The problem is examined of retrofitting older generation aircraft with data link capability. The approach taken analyzes requirements for the cockpit interface, based on review of prior research and opinions obtained from subject matter experts. With this background, essential functions and constraints for a retrofit installation are defined. After an assessment of the technology available to meet the functions and constraints, candidate design concepts are developed. The most promising design concept is described in detail. Finally, needs for further research and development are identified
17 ways to say yes:Toward nuanced tone of voice in AAC and speech technology
People with complex communication needs who use speech-generating devices have very little expressive control over their tone of voice. Despite its importance in human interaction, the issue of tone of voice remains all but absent from AAC research and development however. In this paper, we describe three interdisciplinary projects, past, present and future: The critical design collection Six Speaking Chairs has provoked deeper discussion and inspired a social model of tone of voice; the speculative concept Speech Hedge illustrates challenges and opportunities in designing more expressive user interfaces; the pilot project Tonetable could enable participatory research and seed a research network around tone of voice. We speculate that more radical interactions might expand frontiers of AAC and disrupt speech technology as a whole
Relating Objective and Subjective Performance Measures for AAM-based Visual Speech Synthesizers
We compare two approaches for synthesizing visual speech using Active Appearance Models (AAMs): one that utilizes acoustic features as input, and one that utilizes a phonetic transcription as input. Both synthesizers are trained using the same data and the performance is measured using both objective and subjective testing. We investigate the impact of likely sources of error in the synthesized visual speech by introducing typical errors into real visual speech sequences and subjectively measuring the perceived degradation. When only a small region (e.g. a single syllable) of ground-truth visual speech is incorrect we find that the subjective score for the entire sequence is subjectively lower than sequences generated by our synthesizers. This observation motivates further consideration of an often ignored issue, which is to what extent are subjective measures correlated with objective measures of performance? Significantly, we find that the most commonly used objective measures of performance are not necessarily the best indicator of viewer perception of quality. We empirically evaluate alternatives and show that the cost of a dynamic time warp of synthesized visual speech parameters to the respective ground-truth parameters is a better indicator of subjective quality
- …