179 research outputs found

    Development of a Field-Deployable Voice-Controlled Ultrasound Scanner System

    Modern ultrasound scanners are portable and have become very useful for clinical diagnosis. However, they have limitations for field use purposes, primarily because they occupy both hands of the physician who performs the scanning. The goal of this thesis is to develop a wearable voice-controlled ultrasound scanner system that would enable the physician to provide a fast and efficient diagnosis. This is expected to become very useful for emergency and trauma applications. A commercially available ultrasound scanner system, Terason 2000, was chosen as the basis for development. This system consists of a laptop, a hardware unit containing the RF beamforming and signal processing chips and the ultrasound transducer. In its commercial version, the control of the ultrasound system is performed via a Graphical User Interface with a Windows-application look and feel. In the system we developed, a command and control speech recognition engine and a noise-canceling microphone are selected to control the scanner using voice commands. A mini-joystick is attached to the top of the ultrasound transducer for distance and area measurements and to perform zooming of the ultrasound images. An eye-wear viewer connected to the laptop enables the user to view the ultrasound images directly. Power management features are incorporated into the ultrasound system in order to conserve the battery power. A wireless connection is set up with a remote laptop to enable real-time transmission of wireless images. The result is a truly untethered, voice-controlled, ultrasound system enclosed in a backpack and monitored by the eye-wear viewer. (In the second generation of this system, the laptop is replaced by an embedded PC and is incorporated into a photographer’s vest). The voice-controlled system has to be made reliable under various forms of background noise. 
Three command and control speech recognition systems were selected, and their recognition performance was determined under different types and levels of ambient noise. The variation of recognition rates was also analyzed across 6 different speakers. Detailed testing was also conducted to identify the combination of microphone and speech recognition engine best suited to the ultrasound scanner system. Six different microphones, each with its own method of implementing noise cancellation, were chosen as candidates for this analysis. The testing was conducted by making recordings inside a highly reverberant, noisy acoustic chamber, and the recordings were fed to the automatic speech recognition engines offline for performance evaluation. The speech recognition engine and microphone selected as a result of this extensive testing were then incorporated into the wearable ultrasound scanner system. This thesis also discusses the implementation of the human-speech interface, which plays a major role in the effectiveness of the voice-controlled ultrasound scanner system.

    Intelligent Voice Email Agent: A Multimedia Solution

    Intelligent agent theory is an important concept in the field of artificial intelligence. An intelligent agent, in a nutshell, is a program that uses agent communication protocols to exchange information for automatic problem solving and performs specific tasks on behalf of its user. Our objective is to investigate what an intelligent agent consists of and to implement several important aspects of it. In particular, we are interested in an intelligent agent that can take care of incoming messages while the user concentrates on other duties. We develop an agent-based design framework and implement an intelligent agent system: a voice email system that monitors incoming emails while the user is surfing the Internet. A special feature of this system is that the agent reads any new messages to the user. This initiative is based on perceived real-world needs and on academic research developments. Building on this work, the agent's capabilities can be extended. For example, two mail agents should be able to communicate with each other to accomplish more complicated tasks.

    Proposing a speech to gesture translation architecture for Spanish deaf people.

    This article describes an architecture for translating speech into Spanish Sign Language (SSL). The proposed architecture is made up of four modules: speech recognizer, semantic analysis, gesture sequence generation and gesture playing. For the speech recognizer and the semantic analysis modules, we use software developed by IBM and CSLR (Center for Spoken Language Research at University of Colorado), respectively. Gesture sequence generation and gesture animation are the modules on which we have focused our main effort. Gesture sequence generation takes the semantic concepts obtained from the semantic analysis and associates them with several SSL gestures. This association is carried out according to a number of generation rules. For gesture animation, we have developed an animated agent (a virtual representation of a person) and a strategy for reducing the effort required for gesture animation. This strategy consists of making the system automatically generate all agent positions necessary for the gesture animation. In this process, the system uses a few main agent positions (two or three per second) and some interpolation strategies, both defined beforehand by the service developer (the person who adapts the architecture proposed in this paper to a specific domain). Related to this module, we propose a distance between agent positions and a measure of gesture complexity. This measure can be used to analyze gesture perception versus gesture complexity. With the proposed architecture, we are not trying to build a domain-independent translator but a system able to translate speech utterances into gesture sequences in a restricted domain: railway, flights or weather information.
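
    The idea of generating all agent positions from a few main positions can be illustrated with a minimal sketch. The abstract does not say which interpolation strategies the authors use, so plain linear interpolation is assumed here, and the pose representation (a mapping from joint name to angle) and the function names `lerp_pose` and `expand_keyframes` are hypothetical.

```python
from typing import Dict, List

# Hypothetical pose representation: joint name -> angle in degrees.
Pose = Dict[str, float]


def lerp_pose(a: Pose, b: Pose, t: float) -> Pose:
    """Linearly blend two poses; t=0 gives a, t=1 gives b."""
    return {joint: a[joint] + (b[joint] - a[joint]) * t for joint in a}


def expand_keyframes(keys: List[Pose], frames_between: int) -> List[Pose]:
    """Insert `frames_between` interpolated poses between each pair of key poses.

    Linear interpolation is an assumption; the paper's strategies may differ.
    """
    if not keys:
        return []
    frames: List[Pose] = []
    for a, b in zip(keys, keys[1:]):
        steps = frames_between + 1
        for i in range(steps):
            frames.append(lerp_pose(a, b, i / steps))
    frames.append(dict(keys[-1]))  # close the sequence on the last key pose
    return frames


if __name__ == "__main__":
    key_poses = [{"elbow": 0.0, "wrist": 10.0}, {"elbow": 90.0, "wrist": 30.0}]
    for frame in expand_keyframes(key_poses, frames_between=1):
        print(frame)
```

    With two or three key positions per second, a developer would pick `frames_between` to reach the target frame rate; the key positions themselves remain hand-authored.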

    Improvement of speech recognition by nonlinear noise reduction

    The success of nonlinear noise reduction applied to a single channel recording of human voice is measured in terms of the recognition rate of a commercial speech recognition program in comparison to the optimal linear filter. The overall performance of the nonlinear method is shown to be superior. We hence demonstrate that an algorithm which has its roots in the theory of nonlinear deterministic dynamics possesses a large potential in a realistic application. Comment: see urbanowicz.org.p

    Dictation System - User Interface

    By a dictation system we mean software that consists of two main parts. The first part is a recognizer for spoken words; the second is a user interface for interacting with the user and processing the recognizer's output. This work focuses on the user interface of the dictation system: it describes the communication between the recognizer and the user interface, the correction of words coming from the recognizer, and the conversion of numbers from verbal to numeric form.
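
    The conversion of numbers from verbal to numeric form can be sketched as follows. This is only an illustration: it assumes English number words and a simple accumulate-and-scale scheme, whereas the thesis itself works with a different language and does not describe its method in the abstract. The function name `words_to_number` and the lookup tables are hypothetical.

```python
# Lookup tables for English number words (an assumption made for illustration).
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(text: str) -> int:
    """Convert a spelled-out non-negative integer to an int."""
    total = 0    # value of completed thousand/million groups
    current = 0  # value of the group being read (below one thousand)
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue  # filler word, as in "one hundred and seven"
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


if __name__ == "__main__":
    print(words_to_number("two thousand three hundred forty-five"))
```

    A real dictation front end would also need to decide where a number starts and ends within the recognizer's word stream, which this sketch leaves out.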

    Herramientas informáticas disponibles para la automatización de la traducción audiovisual (“revoicing”)

    This article evaluates the existing software applications for the main audiovisual translation modalities in which the translation is meant to be reformulated orally (known as revoicing modalities), i.e., audio-description, voice-over and dubbing. The last of these is of most interest, since applications for this particular modality are remarkably scarce. Once the evaluation is completed, two automation options for dubbing are proposed. The first one is currently in the final stage of implementation by the author. The article concludes with a glossary and a list of commercially available applications for the audiovisual translator.

    Core Matters Journal
