
    Incremental LSTM-based Dialog State Tracker

    A dialog state tracker is an important component in modern spoken dialog systems. We present an incremental dialog state tracker based on LSTM networks that uses automatic speech recognition hypotheses directly to track the state. We also present the key non-standard aspects of the model that bring its performance close to the state of the art, and experimentally analyze their contribution: including the ASR confidence scores, abstracting scarcely represented values, including transcriptions in the training data, and model averaging.
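    As a loose illustration of the architecture described above, the sketch below shows an LSTM cell whose hidden state persists across incoming ASR hypothesis words, each paired with its confidence score, and is read out as a distribution over slot values. All dimensions, the vocabulary size, and the value set are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch, not the authors' code: an LSTM cell whose state persists
# across ASR updates; dimensions and value set are illustrative assumptions.
import torch
import torch.nn as nn

class IncrementalTracker(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64, num_values=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Each input is a word embedding concatenated with its ASR confidence.
        self.cell = nn.LSTMCell(embed_dim + 1, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_values)
        self.state = None  # (h, c), carried across incremental updates

    def step(self, word_id, confidence):
        x = torch.cat([self.embed(word_id), confidence.unsqueeze(-1)], dim=-1)
        self.state = self.cell(x, self.state)
        hidden, _ = self.state
        # Current belief over the slot's values, refined after every word.
        return torch.softmax(self.classifier(hidden), dim=-1)

tracker = IncrementalTracker()
dist = tracker.step(torch.tensor([42]), torch.tensor([0.9]))
print(dist.shape)  # torch.Size([1, 10])
```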

    Using Explicit Semantic Analysis for Cross-Lingual Link Discovery

    This paper explores how to automatically generate cross-language links between resources in large document collections. The paper presents new methods for Cross-Lingual Link Discovery (CLLD) based on Explicit Semantic Analysis (ESA). The methods are applicable to any multilingual document collection. In this report, we present a comparative study of these methods on the Wikipedia corpus and provide new insights into the evaluation of link discovery systems. In particular, we measure the agreement of human annotators in linking articles in different language versions of Wikipedia, and compare it to the results achieved by the presented methods.
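    The core ESA mechanism is easy to sketch: documents in any language are projected into a shared concept space indexed by interlanguage-linked Wikipedia articles, where cosine similarity becomes language-independent. The tiny hand-built concept index below is a stand-in assumption for a real TF-IDF index over Wikipedia; it is not the paper's implementation.

```python
# Minimal sketch of the ESA idea, not the paper's implementation: the toy
# index below stands in for a TF-IDF index over interlanguage-linked articles.
import numpy as np

NUM_CONCEPTS = 4  # aligned Wikipedia concepts shared across languages
CONCEPT_INDEX = {
    "dog":   np.array([0.9, 0.1, 0.0, 0.0]),
    "hund":  np.array([0.9, 0.1, 0.0, 0.0]),  # German term, same concept profile
    "music": np.array([0.0, 0.0, 0.8, 0.2]),
}

def esa_vector(tokens):
    """Sum the concept vectors of all indexed tokens in a document."""
    vec = np.zeros(NUM_CONCEPTS)
    for token in tokens:
        vec += CONCEPT_INDEX.get(token, np.zeros(NUM_CONCEPTS))
    return vec

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

en_doc = esa_vector(["dog", "music"])
de_doc = esa_vector(["hund"])
# A high score marks the pair as a cross-language link candidate.
print(round(cosine(en_doc, de_doc), 3))
```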

    Content aware user interface retargeting

    This disclosure describes the preservation of important elements of a user interface (UI) during retargeting of the interface image on a mobile device. An on-device machine-learned (ML) model is utilized to detect the saliency, or lack thereof, of various UI elements in the user interface. The ML model is trained on data from repositories of software application designs and on screenshot data from online marketplaces and app evaluation services. The trained model detects salient UI elements that are to be preserved during display retargeting. During resizing of the UI, with express user permission, content-aware image retargeting techniques preserve the elements identified as important by the UI saliency detection model. Past interactions are utilized, and interpretation or corrective action is performed only with permission from the user.
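    A much-simplified sketch of the final step, under stated assumptions: real content-aware retargeting operates on pixels (e.g., seam carving), but the proportional-scaling toy below conveys the idea of shrinking only regions the saliency model did not flag. The salient_cols input stands in for the hypothetical output of the on-device saliency model.

```python
# Hypothetical stand-in for pixel-level content-aware retargeting: columns
# covered by salient elements keep their width; the rest absorb the resize.
def retarget_widths(column_widths, salient_cols, target_total):
    salient_total = sum(w for i, w in enumerate(column_widths) if i in salient_cols)
    flexible_total = sum(column_widths) - salient_total
    budget = target_total - salient_total  # width left for non-salient columns
    scale = budget / flexible_total if flexible_total else 0.0
    return [w if i in salient_cols else w * scale
            for i, w in enumerate(column_widths)]

# Shrink a 100-unit-wide layout to 80 units; column 1 was flagged by the
# saliency model and keeps its full 30 units.
print(retarget_widths([40, 30, 30], salient_cols={1}, target_total=80))
# -> [28.57..., 30, 21.42...]
```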

    DISPLAYING INFORMATION RELATING TO SELECTED TEXT

    A device (e.g., a mobile phone, a camera device, a smart display, a tablet computer, a laptop computer, a desktop computer, a gaming system, a media player, an e-book reader, a television platform, a vehicle infotainment system or head unit, etc.) may display useful information such as a translation, definition, and/or description of text selected on the device by a user. The user may select text, such as a character, word, phrase, sentence, paragraph, or passage, on the device (e.g., by using a long press, drag, tap, click, or other gesture or input) to cause a language identification module to identify the language of the selected text and determine whether it is a language the user understands. If the language identification module determines that the language of the selected text is not one the user understands (e.g., based on a system language, user preferences, etc.), a dictionary module and/or other module for displaying information related to the selected text may display a translation, definition, and/or description of the selected text in a non-obtrusive manner on the device (e.g., in-line with the selected text, positioned above it, or positioned below it).
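    A minimal sketch of the described flow, assuming toy stand-ins for the real modules: identify_language and translate below are hypothetical placeholders for an on-device language-identification model and a translation service.

```python
# Toy pipeline; identify_language and translate are hypothetical placeholders.
USER_LANGUAGES = {"en"}

def identify_language(text):
    # A real module would run an on-device language-identification model.
    return "fr" if "bonjour" in text.lower() else "en"

def translate(text, target_lang):
    # Hypothetical dictionary lookup in place of a real translation service.
    return {"bonjour": "hello"}.get(text.lower(), text)

def on_text_selected(selected):
    if identify_language(selected) not in USER_LANGUAGES:
        # Shown unobtrusively, e.g. in-line with the selection.
        print(f"{selected}  ({translate(selected, 'en')})")

on_text_selected("Bonjour")  # -> Bonjour  (hello)
```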

    Smart Notifications Based on Priority and Context

    Current operating systems (OSs) of devices such as desktop computers, laptops, mobile phones, and tablets provide applications with capabilities to serve information to users via built-in notification mechanisms. If the information presented by a notification is not useful or timely, the user's current task is needlessly disrupted. Moreover, the user is likely to dismiss an inopportune notification quickly, thus reducing user engagement. The techniques of this disclosure enable smart delivery of notifications such that they are delivered to the user at an opportune time. On-device neural networks are utilized to determine the opportune time. With user permission, the content of a generated notification is processed to determine whether it is to be shown immediately, interrupting the user, or whether delivery is to be deferred until an opportune time.
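    A minimal sketch of deferred delivery under stated assumptions: opportune_score below is a hypothetical stand-in for the on-device neural network's output, and the heap is one simple way to hold deferred notifications ordered by priority.

```python
# opportune_score is a hypothetical stand-in for the on-device model's output.
import heapq
import time

pending = []  # min-heap of (negative priority, timestamp, notification)

def opportune_score(user_context):
    # Stand-in: a real model would score the moment from rich context signals.
    return 0.2 if user_context["in_meeting"] else 0.9

def deliver_or_defer(notification, priority, user_context, threshold=0.5):
    # High-priority items always interrupt; others wait for an opportune time.
    if priority >= 0.8 or opportune_score(user_context) >= threshold:
        print("show now:", notification)
    else:
        heapq.heappush(pending, (-priority, time.time(), notification))

def flush_pending(user_context, threshold=0.5):
    while pending and opportune_score(user_context) >= threshold:
        _, _, notification = heapq.heappop(pending)
        print("deferred delivery:", notification)

deliver_or_defer("Sale ends today", priority=0.3, user_context={"in_meeting": True})
flush_pending({"in_meeting": False})  # -> deferred delivery: Sale ends today
```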

    Smart linkification of content within applications

    When a user initiates text selection within an application, the operating system can examine text content displayed within the application to predict text selection bounds along with a possible destination application for the selected text. Until the user initiates selection, there may be no indication that a piece of text content might be actionable. Further, the functionality may not work as intended when application developers implement a custom operation for the input mode used to pass the text content and associated action from one application to another. With user permission, the techniques of this disclosure apply regular expression parsing and neural network processing to the text portion of the on-screen content to detect text entities that might be actionable by the OS or other applications on the device. After merging the actionable text entities identified by the two techniques, the corresponding text is presented as actionable, e.g., by underlining it and linking it to invoke the corresponding action.
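    A minimal sketch of the two-detector design: a regular expression catches well-formed entities (here, phone-number-like spans), a stub standing in for the neural model catches fuzzier ones (here, a hard-coded street name), and the merged, non-overlapping spans are the ones that would be underlined and linked. The model stub is an assumption, not the disclosure's network.

```python
# The neural detector is stubbed; only the regex detector is real here.
import re

def regex_entities(text):
    # Phone-number-like spans via a deliberately simple pattern.
    return [(m.start(), m.end(), "phone")
            for m in re.finditer(r"\+?\d[\d\s-]{7,}\d", text)]

def model_entities(text):
    # Hypothetical neural detector; here it just flags a known street name.
    i = text.find("Main Street")
    return [(i, i + len("Main Street"), "address")] if i >= 0 else []

def merge(spans):
    # Sort by start offset and drop spans that overlap an accepted one.
    merged = []
    for span in sorted(spans):
        if not merged or span[0] >= merged[-1][1]:
            merged.append(span)
    return merged

text = "Call 555-123-4567 about 12 Main Street."
for start, end, kind in merge(regex_entities(text) + model_entities(text)):
    print(f"link[{kind}]: {text[start:end]}")  # would be underlined on screen
```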

    AudioPaLM: A Large Language Model That Can Speak and Listen

    We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech, with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits from AudioLM the capability to preserve paralinguistic information such as speaker identity and intonation, and from text large language models such as PaLM-2 the linguistic knowledge present only in text. We demonstrate that initializing AudioPaLM with the weights of a text-only large language model improves speech processing, successfully leveraging the larger quantity of text training data used in pretraining to assist with the speech tasks. The resulting model significantly outperforms existing systems on speech translation tasks and can perform zero-shot speech-to-text translation for many languages whose input/target language combinations were not seen in training. AudioPaLM also demonstrates features of audio language models, such as transferring a voice across languages based on a short spoken prompt. We release examples of our method at https://google-research.github.io/seanet/audiopalm/examples.
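    A minimal sketch of the central modeling idea, with illustrative sizes and token IDs that are assumptions rather than the paper's values: the text model's vocabulary is extended with discrete audio tokens so that one decoder models mixed text/audio sequences, and speech tasks become ordinary sequence prediction.

```python
# Sizes and IDs are illustrative assumptions, not the paper's values.
TEXT_VOCAB_SIZE = 32_000   # tokens of the pretrained text LM
AUDIO_VOCAB_SIZE = 1_024   # discrete audio codes (AudioLM-style)

def audio_token(code):
    """Map a discrete audio code into the combined vocabulary's ID space."""
    assert 0 <= code < AUDIO_VOCAB_SIZE
    return TEXT_VOCAB_SIZE + code

# Speech-to-text translation then becomes ordinary sequence prediction:
# [task tag] [audio tokens of source speech] [text tokens of the translation]
sequence = [3] + [audio_token(c) for c in (17, 802, 45)] + [912, 5, 77]
print(sequence)  # [3, 32017, 32802, 32045, 912, 5, 77]

# Initializing from the text LM copies its embedding rows and appends newly
# initialized rows for the audio tokens before multimodal fine-tuning.
```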