
    System For Detecting End Of Speech Utterance For Safe Interruption

    Get PDF
    A system and method for detecting the end of a speech utterance for safe interruption, using audio and video processing software or an application, is disclosed. The system includes hardware that determines the speaker's intonation in real time, or with minimal delay, and displays it visually on a screen along with a transcript of the speech delivered by the speaker. The speaker's intonation and the end of speech can be identified in the system display in real time as a rise or fall of pitch at the end of the utterance. The system and method could be implemented as a module of any video call or conference facility. The system will provide significant help to people with hearing difficulties so that they can participate efficiently in meetings. The method could be implemented in real time, without any delay, since it could be located on the user's device.
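
    A minimal sketch of the intonation cue described above, assuming NumPy and a simple autocorrelation pitch estimate; the function names, frame sizes, and thresholds are illustrative, not taken from the disclosure.

```python
# Hypothetical sketch: flag a likely end of utterance when pitch falls over the
# final voiced frames of a short audio window. Uses only NumPy; a real module
# would use a robust pitch tracker.
import numpy as np

def frame_pitch(frame: np.ndarray, sr: int, fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Crude autocorrelation pitch estimate for one frame; 0.0 if unvoiced."""
    frame = frame - frame.mean()
    if np.max(np.abs(frame)) < 1e-3:              # near-silence -> treat as unvoiced
        return 0.0
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)       # plausible pitch-period range
    lag = lo + int(np.argmax(corr[lo:hi]))
    return sr / lag

def utterance_likely_ended(audio: np.ndarray, sr: int,
                           frame_ms: int = 40, drop_ratio: float = 0.85) -> bool:
    """True if pitch over the last few voiced frames falls below the earlier median."""
    hop = int(sr * frame_ms / 1000)
    pitches = [frame_pitch(audio[i:i + hop], sr)
               for i in range(0, len(audio) - hop, hop)]
    voiced = [p for p in pitches if p > 0]
    if len(voiced) < 6:
        return False
    head, tail = np.median(voiced[:-3]), np.mean(voiced[-3:])
    return tail < drop_ratio * head               # falling intonation -> safe to interrupt
```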

    An example of a non-associative Moufang loop of point classes on a cubic surface

    Full text link
    Let k = Q_3(θ), θ^3 = 1, be a quadratic extension of the field of 3-adic numbers. Let V be the cubic surface defined over k by the equation T_0^3 + T_1^3 + T_2^3 + θT_3^3 = 0, and let V(k) be the set of rational points on V defined over k. We show that the relation on V(k) of congruence modulo (1-θ)^3 (where (1-θ) is a prime in the ring of integers of k) defines an admissible relation on the set of rational points of V over k, and that the commutative Moufang loop associated with the classes of this admissible equivalence on V(k) is non-associative. This answers a long-standing problem, formulated by Yu. I. Manin more than 50 years ago, about the existence of non-abelian quasi-groups associated with some cubic surface over some field.
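
    For background, a sketch (in our own notation, not quoted from the abstract) of the standard Manin-style composition of point classes that underlies the Moufang loop in question:

```latex
% Sketch under the standard conventions (notation ours). For $x, y \in V(k)$, let
% $x \circ y$ be a third intersection point of a line through $x$ and $y$ with $V$;
% on classes of an admissible equivalence this becomes a well-defined symmetric
% quasigroup operation. Fixing a class $[u]$, one sets
\[
  [x] \cdot [y] \;=\; [\,u \circ (x \circ y)\,],
\]
% which yields a commutative Moufang loop with unit $[u]$: it is commutative and
% satisfies the commutative Moufang identity
\[
  x^{2} \cdot (y \cdot z) \;=\; (x \cdot y) \cdot (x \cdot z),
\]
% but, as the paper shows for $V(k)$ modulo $(1-\theta)^3$, it need not be associative.
```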

    EXTENDED BAUM TRANSFORMATIONS FOR GENERAL FUNCTIONS, II

    No full text
    The discriminative technique for estimating the parameters of Gaussian mixtures that is based on the Extended Baum transformations (EB) has had a significant impact on the speech recognition community. A proof that these transformations increase the value of an objective function with each iteration (i.e., that they are so-called "growth transformations") was presented by the author two years ago for diagonal Gaussian mixture densities. In this paper the proof is extended to multidimensional, multivariate Gaussian mixtures. The proof presented here is based on a linearization process and an explicit growth estimate for linear forms of Gaussian mixtures.
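
    For context, the extended Baum (EB/EBW) re-estimation for a diagonal Gaussian component, as it is commonly written in the discriminative-training literature; this is background notation rather than the paper's own derivation:

```latex
% Background (not quoted from this paper): for Gaussian component m,
% gamma^num and gamma^den are occupation counts from the numerator (reference)
% and denominator (competing) statistics, o_t is the observation at time t, and
% D_m is a smoothing constant chosen large enough to make the update a growth
% transformation.
\[
  \hat{\mu}_m =
  \frac{\sum_t \bigl(\gamma^{\mathrm{num}}_m(t)-\gamma^{\mathrm{den}}_m(t)\bigr)\,o_t + D_m \mu_m}
       {\sum_t \bigl(\gamma^{\mathrm{num}}_m(t)-\gamma^{\mathrm{den}}_m(t)\bigr) + D_m},
  \qquad
  \hat{\sigma}^2_m =
  \frac{\sum_t \bigl(\gamma^{\mathrm{num}}_m(t)-\gamma^{\mathrm{den}}_m(t)\bigr)\,o_t^2
        + D_m\bigl(\sigma^2_m + \mu^2_m\bigr)}
       {\sum_t \bigl(\gamma^{\mathrm{num}}_m(t)-\gamma^{\mathrm{den}}_m(t)\bigr) + D_m}
  \;-\; \hat{\mu}^2_m .
\]
```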

    Method For Providing Metrics To Determine Transcription/Translation Quality

    Get PDF
    A system and method are disclosed for determining the transcription/translation quality of web video content based on metrics derived from users' indirect feedback on existing captioning. The method may take into account how often the closed caption (CC) option is activated by users on a video and how many users stay through the whole video with closed captions enabled. The system can also be used to assess the quality of manual transcription for languages that do not have automated speech recognition, and to validate acoustic and language models in machine translation/transcription.
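
    A minimal sketch of how such indirect-feedback metrics might be combined into a single score; the field names, weights, and formula are illustrative assumptions, not taken from the disclosure.

```python
# Hypothetical sketch: combine the rate at which viewers turn captions on with how
# often caption users watch the video to completion into one quality proxy.
from dataclasses import dataclass

@dataclass
class CaptionStats:
    views: int              # total views of the video
    cc_activations: int     # views where the CC option was turned on
    cc_completions: int     # CC views watched through to the end

def caption_quality_score(s: CaptionStats, w_activation: float = 0.4,
                          w_completion: float = 0.6) -> float:
    """Return a 0..1 proxy for transcription/translation quality."""
    if s.views == 0 or s.cc_activations == 0:
        return 0.0
    activation_rate = s.cc_activations / s.views
    completion_rate = s.cc_completions / s.cc_activations
    return w_activation * activation_rate + w_completion * completion_rate

# Example: 40% of caption users finish the video -> a modest quality signal.
print(caption_quality_score(CaptionStats(views=1000, cc_activations=250, cc_completions=100)))
```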

    Emotional Assistants for Applications

    Get PDF
    This disclosure describes techniques that provide appropriate emotional responses to users of computing devices, irrespective of the specific interaction context. The techniques can be implemented as an emotional module that is called by an application; in response, the emotional module provides an appropriate response that the application renders. The techniques can learn from various sources of interaction data, such as books, movies, and chats, and from user behavior after an emotional response is rendered, generating a database of emotions and appropriate computer responses. The techniques enable computers to provide effective emotional responses and improve human-computer interaction. They can also be used to train certain users, such as autistic children, to understand human emotions.
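
    A minimal sketch of such an emotional module's interface, assuming a simple lookup of responses keyed by a detected emotion; all class and method names are illustrative.

```python
# Hypothetical sketch: an application passes the user's utterance to the module,
# which looks up an appropriate response in a learned emotion/response store.
from typing import NamedTuple

class EmotionalResponse(NamedTuple):
    emotion: str    # e.g. "empathy", "encouragement"
    text: str       # response text for the application to render

class EmotionalModule:
    def __init__(self, response_db: dict):
        # response_db would be built offline from books, movies, chats, and from
        # observed user behavior after previous responses were rendered.
        self.response_db = response_db

    def respond(self, user_utterance: str) -> EmotionalResponse:
        detected = self._detect_emotion(user_utterance)
        return self.response_db.get(
            detected, EmotionalResponse("neutral", "I see. Tell me more."))

    def _detect_emotion(self, text: str) -> str:
        # Placeholder classifier; a real module would use a trained model.
        if any(w in text.lower() for w in ("sad", "upset", "frustrated")):
            return "distress"
        return "neutral"

# The calling application renders whatever the module returns:
module = EmotionalModule({"distress": EmotionalResponse("empathy", "I'm sorry, that sounds hard.")})
print(module.respond("I'm frustrated with this form").text)
```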

    COLLABORATIVE DISTRIBUTED SPEECH RECOGNITION

    Get PDF
    The disclosure includes a captioning system configured to caption a video. A video may be identified for captioning and submitted to automated speech recognition engines, and a transcription of the video's audio may be received from each engine. Based on these transcriptions, it may be determined whether a final transcription can be accepted or created; if not, the video may be submitted to one or more manual speech recognition engines. Once a final transcription is accepted or created, at least one of the automated or manual speech recognition engines may be rewarded based on its transcription. The video may then be captioned with the final transcription.
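
    A minimal sketch of the workflow described above, assuming engines are exposed as simple callables and agreement is measured with a string-similarity ratio; names and thresholds are illustrative.

```python
# Hypothetical sketch: fan the video out to several automated ASR engines, accept a
# final transcription only if they agree closely enough, otherwise fall back to manual
# engines, and reward engines whose output matched the final result.
from difflib import SequenceMatcher
from typing import Callable, Sequence

def caption_video(video: bytes,
                  asr_engines: Sequence[Callable[[bytes], str]],
                  manual_engines: Sequence[Callable[[bytes], str]],
                  agreement_threshold: float = 0.9) -> str:
    transcripts = [engine(video) for engine in asr_engines]

    # Accept the automated result if all engines broadly agree with the first one.
    ref = transcripts[0]
    if all(SequenceMatcher(None, ref, t).ratio() >= agreement_threshold
           for t in transcripts[1:]):
        final = ref
    else:
        # Otherwise route the video to a manual transcription engine.
        final = manual_engines[0](video)

    reward_matching_engines(final, transcripts)
    return final

def reward_matching_engines(final: str, transcripts: Sequence[str]) -> None:
    for i, t in enumerate(transcripts):
        if SequenceMatcher(None, final, t).ratio() > 0.95:
            print(f"engine {i} rewarded")   # placeholder for a real credit system
```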

    SYSTEM AND METHOD FOR SPEECH RECOGNITION

    Get PDF
    The present disclosure is directed to a system and method for speech recognition. A user provides a speech input to a computing device, which recognizes and processes it. If explicitly authorized by the user, the computing device can also sense and process the speech input via one or more sensors that detect the position, movement, etc. of the user's lips and, in some aspects, tongue. The one or more sensors can, for example, be contactless (proximity detection) sensors and/or touch sensors. The speech input can (but need not) be audible speech of the user that is also received by a microphone of the computing device. Further areas of applicability of the present disclosure will become apparent from the detailed description provided hereinafter. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the disclosure.
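
    A minimal sketch of the authorization-gated sensor fusion described above; the data shapes, feature fusion, and decoder stub are illustrative assumptions, not the disclosure's own design.

```python
# Hypothetical sketch: fuse lip/tongue sensor readings with audio features only when
# the user has explicitly authorized it, then hand the features to a decoder.
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class SpeechInput:
    audio_features: np.ndarray              # e.g. log-mel frames, shape (T, F)
    lip_features: Optional[np.ndarray]      # proximity/touch sensor frames, shape (T, G)

def recognize(inp: SpeechInput, user_authorized_sensors: bool) -> str:
    if user_authorized_sensors and inp.lip_features is not None:
        # Fuse modalities; here a simple frame-wise concatenation.
        features = np.concatenate([inp.audio_features, inp.lip_features], axis=1)
    else:
        # Without authorization, fall back to audio-only recognition.
        features = inp.audio_features
    return decode(features)

def decode(features: np.ndarray) -> str:
    # Placeholder for an acoustic model and decoder.
    return "<recognized text>"
```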

    Crowdsourcing Training Data For Real-Time Transcription Models

    Get PDF
    A system and method are disclosed to train speech transcription models via crowdsourcing. Users of a media sharing platform may view real-time transcriptions associated with media on their devices and identify the transcriptions as correct or incorrect. Using the general context of the conversation being transcribed, users can determine with high accuracy which parts of the transcribed text are correct and which are incorrect. The users may select or mark blocks of transcription text and label the selected text as a correct or incorrect transcription on their device. The system may aggregate a large number of marked transcriptions from multiple user devices and store them. The stored marked transcriptions may be used as training data for transcription and captioning models. The disclosed concept may also be extended to machine translation. The accurate training data enables development of better transcription models.
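
    A minimal sketch of the aggregation step, assuming each user mark arrives as a (media, span, correct/incorrect) record; names and the majority-vote rule are illustrative, not from the disclosure.

```python
# Hypothetical sketch: pool per-user correctness marks on transcript spans from many
# devices into labeled training examples for transcription/captioning models.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Mark:
    media_id: str
    span: str        # the selected block of transcription text
    correct: bool    # user judgement, based on conversation context

def aggregate_marks(marks, min_votes: int = 3):
    """Keep spans that enough users judged; emit (media_id, span, label) examples."""
    votes = defaultdict(list)
    for m in marks:
        votes[(m.media_id, m.span)].append(m.correct)

    training_data = []
    for (media_id, span), labels in votes.items():
        if len(labels) >= min_votes:
            majority = sum(labels) > len(labels) / 2   # majority vote as the label
            training_data.append((media_id, span, majority))
    return training_data
```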

    Preference-Based Acceleration of Video Material

    Get PDF
    Video material, such as movies and TV shows, often includes visual and auditory information for users who have sensory disabilities. For example, a blind movie watcher may listen to audio descriptions of the movie's visual scenes, and a deaf movie watcher may read descriptions (e.g., closed captioning) of its audio. Users may prefer to skip or accelerate certain portions of the video material while other portions are presented in full. These users may include viewers with various disabilities, such as blindness, deafness, or autism, who prefer to watch only the portions of the video material that contain specific details while skipping other portions. Alternatively or additionally, the users may include viewers who simply prefer to watch only the portions that are of interest to them, such as particular types of scenes, certain actors, or particular details, while skipping other portions that may not be of interest. It would be beneficial if playback of video material could be customized to a particular user, such that some portions of the video are accelerated or summarized and only preferred portions are presented in full. Techniques for preference-based acceleration of video material are described.
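
    A minimal sketch of preference-based playback, assuming the video is pre-segmented and tagged; the segment structure, tags, and speeds are illustrative assumptions.

```python
# Hypothetical sketch: segments whose tags match the viewer's preferences play at
# normal speed; everything else is accelerated (or could be summarized instead).
from dataclasses import dataclass

@dataclass
class Segment:
    start_s: float
    end_s: float
    tags: set          # e.g. {"dialogue", "action", "actor:Jane Doe"}

def playback_plan(segments, preferred_tags: set, skim_speed: float = 2.5):
    """Return (segment, playback_rate) pairs covering the whole video."""
    plan = []
    for seg in segments:
        rate = 1.0 if seg.tags & preferred_tags else skim_speed
        plan.append((seg, rate))
    return plan

# Example: a viewer who only wants dialogue scenes at full speed.
plan = playback_plan(
    [Segment(0, 30, {"action"}), Segment(30, 90, {"dialogue"})],
    preferred_tags={"dialogue"})
```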