
    Incremental LSTM-based Dialog State Tracker

    A dialog state tracker is an important component in modern spoken dialog systems. We present an incremental dialog state tracker based on LSTM networks that uses automatic speech recognition hypotheses directly to track the state. We also present the key non-standard aspects of the model that bring its performance close to the state of the art, and experimentally analyze their contribution: including the ASR confidence scores, abstracting scarcely represented values, including transcriptions in the training data, and model averaging.
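    As a loose illustration of the architecture described above, the sketch below shows an LSTM cell whose hidden state persists across incoming ASR hypothesis words, each paired with its confidence score, and is read out as a distribution over slot values. All dimensions, the vocabulary size, and the value set are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch, not the authors' code: an LSTM cell whose state persists
# across ASR updates; dimensions and value set are illustrative assumptions.
import torch
import torch.nn as nn

class IncrementalTracker(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64, num_values=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Each input is a word embedding concatenated with its ASR confidence.
        self.cell = nn.LSTMCell(embed_dim + 1, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_values)
        self.state = None  # (h, c), carried across incremental updates

    def step(self, word_id, confidence):
        x = torch.cat([self.embed(word_id), confidence.unsqueeze(-1)], dim=-1)
        self.state = self.cell(x, self.state)
        hidden, _ = self.state
        # Current belief over the slot's values, refined after every word.
        return torch.softmax(self.classifier(hidden), dim=-1)

tracker = IncrementalTracker()
dist = tracker.step(torch.tensor([42]), torch.tensor([0.9]))
print(dist.shape)  # torch.Size([1, 10])
```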

    Using Explicit Semantic Analysis for Cross-Lingual Link Discovery

    This paper explores how to automatically generate cross-language links between resources in large document collections. The paper presents new methods for Cross-Lingual Link Discovery (CLLD) based on Explicit Semantic Analysis (ESA). The methods are applicable to any multilingual document collection. In this report, we present a comparative study of these methods on the Wikipedia corpus and provide new insights into the evaluation of link discovery systems. In particular, we measure the agreement of human annotators in linking articles in different language versions of Wikipedia, and compare it to the results achieved by the presented methods.
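    The core ESA mechanism is easy to sketch: documents in any language are projected into a shared concept space indexed by interlanguage-linked Wikipedia articles, where cosine similarity becomes language-independent. The tiny hand-built concept index below is a stand-in assumption for a real TF-IDF index over Wikipedia; it is not the paper's implementation.

```python
# Minimal sketch of the ESA idea, not the paper's implementation: the toy
# index below stands in for a TF-IDF index over interlanguage-linked articles.
import numpy as np

NUM_CONCEPTS = 4  # aligned Wikipedia concepts shared across languages
CONCEPT_INDEX = {
    "dog":   np.array([0.9, 0.1, 0.0, 0.0]),
    "hund":  np.array([0.9, 0.1, 0.0, 0.0]),  # German term, same concept profile
    "music": np.array([0.0, 0.0, 0.8, 0.2]),
}

def esa_vector(tokens):
    """Sum the concept vectors of all indexed tokens in a document."""
    vec = np.zeros(NUM_CONCEPTS)
    for token in tokens:
        vec += CONCEPT_INDEX.get(token, np.zeros(NUM_CONCEPTS))
    return vec

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

en_doc = esa_vector(["dog", "music"])
de_doc = esa_vector(["hund"])
# A high score marks the pair as a cross-language link candidate.
print(round(cosine(en_doc, de_doc), 3))
```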

    Content aware user interface retargeting

    This disclosure describes the preservation of important elements of a user interface (UI) during retargeting of the interface image on a mobile device. An on-device machine-learned (ML) model is utilized to detect the saliency, or lack thereof, of various UI elements in the user interface. The ML model is trained on data from repositories of software application designs and on screenshot data from online marketplaces and app evaluation services. The trained model detects salient UI elements that are to be preserved during display retargeting. During resizing of the UI, with express user permission, content-aware image retargeting techniques preserve the elements identified as important by the UI saliency detection model. Past interactions are utilized, and interpretation or corrective action is performed only with permission from the user.
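    A much-simplified sketch of the final step, under stated assumptions: real content-aware retargeting operates on pixels (e.g., seam carving), but the proportional-scaling toy below conveys the idea of shrinking only regions the saliency model did not flag. The salient_cols input stands in for the hypothetical output of the on-device saliency model.

```python
# Hypothetical stand-in for pixel-level content-aware retargeting: columns
# covered by salient elements keep their width; the rest absorb the resize.
def retarget_widths(column_widths, salient_cols, target_total):
    salient_total = sum(w for i, w in enumerate(column_widths) if i in salient_cols)
    flexible_total = sum(column_widths) - salient_total
    budget = target_total - salient_total  # width left for non-salient columns
    scale = budget / flexible_total if flexible_total else 0.0
    return [w if i in salient_cols else w * scale
            for i, w in enumerate(column_widths)]

# Shrink a 100-unit-wide layout to 80 units; column 1 was flagged by the
# saliency model and keeps its full 30 units.
print(retarget_widths([40, 30, 30], salient_cols={1}, target_total=80))
# -> [28.57..., 30, 21.42...]
```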

    DISPLAYING INFORMATION RELATING TO SELECTED TEXT

    A device (e.g., a mobile phone, a camera device, a smart display, a tablet computer, a laptop computer, a desktop computer, a gaming system, a media player, an e-book reader, a television platform, a vehicle infotainment system or head unit, etc.) may display useful information such as a translation, definition, and/or description of text selected on the device by a user. The user may select text, such as a character, word, phrase, sentence, paragraph, or passage, on the device (e.g., by using a long press, drag, tap, click, or other gesture or input) to cause a language identification module to identify the language of the selected text and determine whether it is a language the user understands. If the language identification module determines that the language of the selected text is not one the user understands (e.g., based on a system language, user preferences, etc.), a dictionary module and/or other module for displaying information related to the selected text may display a translation, definition, and/or description of the selected text in a non-obtrusive manner on the device (e.g., in-line with the selected text, positioned above it, or positioned below it).
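    A minimal sketch of the described flow, assuming toy stand-ins for the real modules: identify_language and translate below are hypothetical placeholders for an on-device language-identification model and a translation service.

```python
# Toy pipeline; identify_language and translate are hypothetical placeholders.
USER_LANGUAGES = {"en"}

def identify_language(text):
    # A real module would run an on-device language-identification model.
    return "fr" if "bonjour" in text.lower() else "en"

def translate(text, target_lang):
    # Hypothetical dictionary lookup in place of a real translation service.
    return {"bonjour": "hello"}.get(text.lower(), text)

def on_text_selected(selected):
    if identify_language(selected) not in USER_LANGUAGES:
        # Shown unobtrusively, e.g. in-line with the selection.
        print(f"{selected}  ({translate(selected, 'en')})")

on_text_selected("Bonjour")  # -> Bonjour  (hello)
```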

    Smart Notifications Based on Priority and Context

    Current operating systems (OSs) of devices such as desktop computers, laptops, mobile phones, and tablets provide applications with capabilities to serve information to users via built-in notification mechanisms. If the information presented by a notification is not useful or timely, the user's current task is needlessly disrupted. Moreover, the user is likely to dismiss an inopportune notification quickly, thus reducing user engagement. The techniques of this disclosure enable smart delivery of notifications such that they are delivered to the user at an opportune time. On-device neural networks are utilized to determine the opportune time. With user permission, the content of a generated notification is processed to determine whether it is to be shown immediately, interrupting the user, or whether delivery is to be deferred until an opportune time.
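    A minimal sketch of deferred delivery under stated assumptions: opportune_score below is a hypothetical stand-in for the on-device neural network's output, and the heap is one simple way to hold deferred notifications ordered by priority.

```python
# opportune_score is a hypothetical stand-in for the on-device model's output.
import heapq
import time

pending = []  # min-heap of (negative priority, timestamp, notification)

def opportune_score(user_context):
    # Stand-in: a real model would score the moment from rich context signals.
    return 0.2 if user_context["in_meeting"] else 0.9

def deliver_or_defer(notification, priority, user_context, threshold=0.5):
    # High-priority items always interrupt; others wait for an opportune time.
    if priority >= 0.8 or opportune_score(user_context) >= threshold:
        print("show now:", notification)
    else:
        heapq.heappush(pending, (-priority, time.time(), notification))

def flush_pending(user_context, threshold=0.5):
    while pending and opportune_score(user_context) >= threshold:
        _, _, notification = heapq.heappop(pending)
        print("deferred delivery:", notification)

deliver_or_defer("Sale ends today", priority=0.3, user_context={"in_meeting": True})
flush_pending({"in_meeting": False})  # -> deferred delivery: Sale ends today
```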

    Smart linkification of content within applications

    When a user initiates text selection within an application, the operating system can examine text content displayed within the application to predict text selection bounds along with a possible destination application for the selected text. Until the user initiates selection, there may be no indication that a piece of text content might be actionable. Further, the functionality may not work as intended when application developers implement a custom operation for the input mode used to pass the text content and associated action from one application to another. With user permission, the techniques of this disclosure apply regular expression parsing and neural network processing to the text portion of the on-screen content to detect text entities that might be actionable by the OS or other applications on the device. After merging the actionable text entities identified by the two techniques, the corresponding text is presented as actionable, e.g., by underlining it and linking it to invoke the corresponding action.
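    A minimal sketch of the two-detector design: a regular expression catches well-formed entities (here, phone-number-like spans), a stub standing in for the neural model catches fuzzier ones (here, a hard-coded street name), and the merged, non-overlapping spans are the ones that would be underlined and linked. The model stub is an assumption, not the disclosure's network.

```python
# The neural detector is stubbed; only the regex detector is real here.
import re

def regex_entities(text):
    # Phone-number-like spans via a deliberately simple pattern.
    return [(m.start(), m.end(), "phone")
            for m in re.finditer(r"\+?\d[\d\s-]{7,}\d", text)]

def model_entities(text):
    # Hypothetical neural detector; here it just flags a known street name.
    i = text.find("Main Street")
    return [(i, i + len("Main Street"), "address")] if i >= 0 else []

def merge(spans):
    # Sort by start offset and drop spans that overlap an accepted one.
    merged = []
    for span in sorted(spans):
        if not merged or span[0] >= merged[-1][1]:
            merged.append(span)
    return merged

text = "Call 555-123-4567 about 12 Main Street."
for start, end, kind in merge(regex_entities(text) + model_entities(text)):
    print(f"link[{kind}]: {text[start:end]}")  # would be underlined on screen
```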

    AudioPaLM: A Large Language Model That Can Speak and Listen

    We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech, with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits from AudioLM the capability to preserve paralinguistic information such as speaker identity and intonation, and from text large language models such as PaLM-2 the linguistic knowledge present only in text. We demonstrate that initializing AudioPaLM with the weights of a text-only large language model improves speech processing, successfully leveraging the larger quantity of text training data used in pretraining to assist with the speech tasks. The resulting model significantly outperforms existing systems on speech translation tasks and can perform zero-shot speech-to-text translation for many languages whose input/target language combinations were not seen in training. AudioPaLM also demonstrates features of audio language models, such as transferring a voice across languages based on a short spoken prompt. We release examples of our method at https://google-research.github.io/seanet/audiopalm/examples.
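    A minimal sketch of the central modeling idea, with illustrative sizes and token IDs that are assumptions rather than the paper's values: the text model's vocabulary is extended with discrete audio tokens so that one decoder models mixed text/audio sequences, and speech tasks become ordinary sequence prediction.

```python
# Sizes and IDs are illustrative assumptions, not the paper's values.
TEXT_VOCAB_SIZE = 32_000   # tokens of the pretrained text LM
AUDIO_VOCAB_SIZE = 1_024   # discrete audio codes (AudioLM-style)

def audio_token(code):
    """Map a discrete audio code into the combined vocabulary's ID space."""
    assert 0 <= code < AUDIO_VOCAB_SIZE
    return TEXT_VOCAB_SIZE + code

# Speech-to-text translation then becomes ordinary sequence prediction:
# [task tag] [audio tokens of source speech] [text tokens of the translation]
sequence = [3] + [audio_token(c) for c in (17, 802, 45)] + [912, 5, 77]
print(sequence)  # [3, 32017, 32802, 32045, 912, 5, 77]

# Initializing from the text LM copies its embedding rows and appends newly
# initialized rows for the audio tokens before multimodal fine-tuning.
```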