Incremental LSTM-based Dialog State Tracker
A dialog state tracker is an important component in modern spoken dialog
systems. We present an incremental dialog state tracker, based on LSTM
networks. It directly uses automatic speech recognition hypotheses to track the
state. We also present the key non-standard aspects of the model that bring its
performance close to the state of the art, and experimentally analyze their
contribution: including the ASR confidence scores, abstracting scarcely
represented values, including transcriptions in the training data, and model
averaging.
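The abstract does not specify the tracker's exact architecture, so the following is only a minimal sketch of the underlying LSTM recurrence in NumPy: each incremental ASR hypothesis is encoded as a feature vector and fed through one LSTM step, with the hidden state carrying the dialog state estimate forward. The feature dimension, hidden size, and random weights below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: gates computed from input x and previous hidden state h."""
    z = W @ x + U @ h + b          # stacked pre-activations, shape (4*H,)
    H = h.shape[0]
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2*H])          # forget gate
    o = sigmoid(z[2*H:3*H])        # output gate
    g = np.tanh(z[3*H:4*H])        # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# toy run over a sequence of (hypothetical) ASR-hypothesis feature vectors
rng = np.random.default_rng(0)
D, H = 6, 4                        # feature dim, hidden dim (illustrative)
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):  # five incremental updates
    h, c = lstm_step(x, h, c, W, U, b)
```

A real tracker would project `h` through an output layer to scores over slot values; that head is omitted here.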
Using Explicit Semantic Analysis for Cross-Lingual Link Discovery
This paper explores how to automatically generate cross-language links between resources in large document collections. The paper presents new methods for Cross-Lingual Link Discovery (CLLD) based on Explicit Semantic Analysis (ESA). The methods are applicable to any multilingual document collection. In this report, we present their comparative study on the Wikipedia corpus and provide new insights into the evaluation of link discovery systems. In particular, we measure the agreement of human annotators in linking articles in different language versions of Wikipedia, and compare it to the results achieved by the presented methods.
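ESA represents a text as a weighted vector over Wikipedia concepts, and relatedness is typically scored as the cosine of two such vectors. A minimal sketch, with toy hand-picked concept weights (the real method derives these from corpus statistics over the full Wikipedia concept space):

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {concept: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# toy ESA vectors for two documents over a shared concept space
doc_a = {"Berlin": 0.9, "Germany": 0.6, "Capital_city": 0.3}
doc_b = {"Berlin": 0.8, "Germany": 0.5, "Museum": 0.2}
score = cosine(doc_a, doc_b)
```

For cross-lingual linking, the two documents' vectors must first be brought into a common concept space; one way to do that is sketched under the NTCIR-9 entry below only if the cross-language concept mapping is available.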
Recommended from our members
KMI, The Open University at NTCIR-9 CrossLink: Cross-Lingual Link Discovery in Wikipedia using explicit semantic analysis
This paper describes the methods used in the submission of the Knowledge Media Institute (KMI), The Open University, to the NTCIR-9 Cross-Lingual Link Discovery (CLLD) task entitled CrossLink. KMI submitted four runs for link discovery from English to Chinese; however, the developed methods, which utilise Explicit Semantic Analysis (ESA), are also applicable to other language combinations. Three of the runs are based on exploiting the existing cross-lingual mapping between different versions of Wikipedia articles. In the fourth run, we assume information about the mapping is not available. Our methods achieved encouraging results and we describe in detail how their performance can be further improved. Finally, we discuss two important issues in link discovery: the evaluation methodology and the applicability of the developed methods across different textual collections.
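The runs that exploit the existing cross-lingual mapping can be pictured as projecting an ESA concept vector from one language's Wikipedia into another's via the cross-language links between articles. A minimal sketch, with a hypothetical English-to-Chinese article mapping (the titles are illustrative only; concepts without a mapped counterpart are simply dropped):

```python
def map_vector(vec, langlinks):
    """Project an ESA concept vector into another language's concept space
    using Wikipedia cross-language links; unmapped concepts are dropped."""
    return {langlinks[c]: w for c, w in vec.items() if c in langlinks}

# hypothetical English -> Chinese article-title mapping
langlinks = {"Berlin": "柏林", "Germany": "德国"}
vec_en = {"Berlin": 0.9, "Germany": 0.6, "Capital_city": 0.3}
vec_zh = map_vector(vec_en, langlinks)
```

The fourth run, which assumes no such mapping, would need a different way to align the two concept spaces.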
Content aware user interface retargeting
This disclosure describes the preservation of important elements of a user interface (UI) during retargeting of the interface image on a mobile device. An on-device machine-learned (ML) model is utilized to detect the saliency, or lack thereof, of various UI elements in the user interface. Training of the ML model is performed by utilizing training data from repositories of software application designs and screenshot data from online marketplaces and app evaluation services.
The trained ML model is utilized to detect salient UI elements that are to be preserved during display retargeting. During resizing of the UI, with express user permission, content-aware image retargeting techniques are utilized for the preservation of elements identified as important by the UI saliency detection model. Past interactions are utilized, and interpretation or corrective action is performed only upon permission from the user.
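The disclosure does not name a specific retargeting algorithm, but seam carving is a standard content-aware technique, so here is a minimal sketch under that assumption: a saliency mask (standing in for the ML model's output) adds a large penalty to protected pixels, and the lowest-energy vertical seam is found by dynamic programming and removed.

```python
import numpy as np

def remove_seam(img, protect):
    """Remove one lowest-energy vertical seam from a grayscale image;
    `protect` marks salient pixels (e.g. flagged by a saliency model)
    with a large penalty so seams route around them."""
    energy = np.abs(np.gradient(img.astype(float), axis=1)) + 1e6 * protect
    h, w = img.shape
    cost = energy.copy()
    for r in range(1, h):              # cumulative minimum-cost paths
        for c in range(w):
            lo, hi = max(c - 1, 0), min(c + 2, w)
            cost[r, c] += cost[r - 1, lo:hi].min()
    # backtrack the cheapest seam from the bottom row
    out = np.empty((h, w - 1), dtype=img.dtype)
    c = int(cost[-1].argmin())
    for r in range(h - 1, -1, -1):
        out[r] = np.delete(img[r], c)
        if r:
            lo, hi = max(c - 1, 0), min(c + 2, w)
            c = lo + int(cost[r - 1, lo:hi].argmin())
    return out

img = np.arange(25.0).reshape(5, 5)
protect = np.zeros((5, 5))
protect[:, 2] = 1.0                    # pretend column 2 holds a salient UI element
out = remove_seam(img, protect)
```

The output is one column narrower, and the protected column survives intact; repeating the step shrinks the image to the target width.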
DISPLAYING INFORMATION RELATING TO SELECTED TEXT
A device (e.g., a mobile phone, a camera device, a smart display, a tablet computer, a laptop computer, a desktop computer, a gaming system, a media player, an e-book reader, a television platform, a vehicle infotainment system or head unit, etc.) may display useful information such as a translation, definition, and/or description of text selected on the device by a user. The user may select text, such as a character, word, phrase, sentence, paragraph, passage, etc. on the device (e.g., by using a long press, drag, tap, click, or other gesture or input) to cause a language identification module to identify the language of the selected text and determine whether the language of the selected text is a language the user understands. If the language identification module determines that the language of the selected text is not a language the user understands (e.g., based on a system language, user preferences, etc.), a dictionary module and/or other module for displaying information related to the selected text may display a translation, definition, and/or description of the selected text in a non-obtrusive manner on the device (e.g., the translation may be in-line with the selected text, positioned above the selected text, positioned below the selected text, etc.).
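The decision flow described above can be sketched as a small function: compare the detected language against the languages the user understands, and only produce extra information when they differ. The function names and the stub translator are hypothetical stand-ins for the disclosure's language-identification and dictionary/translation modules.

```python
def info_for_selection(selected_text, detected_lang, user_langs, translate):
    """Return a translation of the selected text into the user's primary
    language when the detected language is not one the user understands;
    return None when no extra information is needed."""
    if detected_lang in user_langs:
        return None
    return translate(selected_text, target=user_langs[0])

# stub standing in for an on-device translation service
fake_translate = lambda text, target: f"[{target}] translation of {text!r}"

result = info_for_selection("Guten Tag", "de", ["en", "fr"], fake_translate)
same_lang = info_for_selection("Hello", "en", ["en"], fake_translate)
```

A real implementation would also route the result to the non-obtrusive display surface (in-line, above, or below the selection) described above.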
Smart Notifications Based on Priority and Context
Current Operating Systems (OSs) of devices such as desktop computers, laptops, mobile phones, and tablets, provide applications with capabilities to serve information to users via built-in notification mechanisms. If the information presented by a notification is not useful or timely, the user’s current task is needlessly disrupted. Moreover, the user is likely to dismiss an inopportune notification quickly, thus reducing user engagement. The techniques of this disclosure enable smart delivery of notifications such that notifications are delivered to the user at an opportune time. On-device neural networks are utilized to make the determination of the opportune time. With user permission, the content of a generated notification is processed to determine whether it is to be shown immediately, by interrupting the user, or whether the delivery is to be deferred until an opportune time.
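The show-now-or-defer decision can be sketched as a priority queue keyed on a model-assigned score. Everything below is an illustrative assumption: the urgency threshold, the scoring function, and the `interruptible` signal all stand in for the on-device neural network's outputs.

```python
import heapq

def route(notification, interruptible, score_fn, deferred):
    """Show a notification immediately only when the moment is opportune
    or the notification is urgent; otherwise queue it by priority."""
    score = score_fn(notification)
    if interruptible or score >= 0.9:             # urgent items always interrupt
        return "show_now"
    heapq.heappush(deferred, (-score, notification))  # max-priority queue
    return "deferred"

def flush(deferred):
    """At an opportune moment, deliver deferred notifications in priority order."""
    return [heapq.heappop(deferred)[1] for _ in range(len(deferred))]

deferred = []
score = lambda n: {"alarm": 0.95, "chat": 0.5, "promo": 0.1}[n]
first = route("chat", False, score, deferred)     # deferred
second = route("alarm", False, score, deferred)   # shown immediately
route("promo", False, score, deferred)            # deferred, lower priority
order = flush(deferred)
```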
Smart linkification of content within applications
When a user initiates text selection within an application, the operating system can examine text content displayed within the application to predict text selection bounds along with a possible destination application for the selected text. Until the user initiates selection, there may be no indication that a piece of text content might be actionable. Further, the functionality may not work as intended in cases where application developers implement a custom operation for the input mode utilized for passing the text content and associated action from one application to another. With user permission, this disclosure applies regular expression parsing and neural network processing to the text portion of the on-screen content to detect text entities that might be actionable by the OS or other applications on the device. After merging the actionable text entities identified via either of the two techniques, the corresponding text is highlighted, e.g., by underlining it and linking it to invoke the corresponding action.
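The regular-expression half of the pipeline, plus the span-merging step, can be sketched as follows. The patterns are illustrative simplifications, and the neural half is omitted; in the described design, spans from both detectors would be merged the same way.

```python
import re

# illustrative patterns; a production system would combine these results
# with spans from a neural entity model before merging
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
    "url":   re.compile(r"https?://\S+"),
}

def find_entities(text):
    """Return non-overlapping (start, end, kind) spans, preferring the
    earliest match and, on ties, the longest."""
    spans = []
    for kind, pat in PATTERNS.items():
        spans += [(m.start(), m.end(), kind) for m in pat.finditer(text)]
    spans.sort(key=lambda s: (s[0], -s[1]))
    merged, last_end = [], -1
    for s in spans:
        if s[0] >= last_end:       # drop spans overlapping an earlier pick
            merged.append(s)
            last_end = s[1]
    return merged

text = "Mail a@b.com or call +1 555-123-4567, see https://example.com"
entities = find_entities(text)
```

Each returned span would then be underlined on screen and wired to the action (compose, dial, open) appropriate to its kind.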
AudioPaLM: A Large Language Model That Can Speak and Listen
We introduce AudioPaLM, a large language model for speech understanding and
generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2
[Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified
multimodal architecture that can process and generate text and speech with
applications including speech recognition and speech-to-speech translation.
AudioPaLM inherits the capability to preserve paralinguistic information such
as speaker identity and intonation from AudioLM and the linguistic knowledge
present only in text large language models such as PaLM-2. We demonstrate that
initializing AudioPaLM with the weights of a text-only large language model
improves speech processing, successfully leveraging the larger quantity of text
training data used in pretraining to assist with the speech tasks. The
resulting model significantly outperforms existing systems for speech
translation tasks and has the ability to perform zero-shot speech-to-text
translation for many languages for which input/target language combinations
were not seen in training. AudioPaLM also demonstrates features of audio
language models, such as transferring a voice across languages based on a short
spoken prompt. We release examples of our method at
https://google-research.github.io/seanet/audiopalm/examples