Analyzing Autoencoder-Based Acoustic Word Embeddings
Recent studies have introduced methods for learning acoustic word embeddings
(AWEs)---fixed-size vector representations of words which encode their acoustic
features. Despite the widespread use of AWEs in speech processing research,
they have only been evaluated quantitatively in their ability to discriminate
between whole word tokens. To better understand the applications of AWEs in
various downstream tasks and in cognitive modeling, we need to analyze the
representation spaces of AWEs. Here we analyze basic properties of AWE spaces
learned by a sequence-to-sequence encoder-decoder model in six typologically
diverse languages. We first show that these AWEs preserve some information
about words' absolute duration and speaker. At the same time, the
representation space of these AWEs is organized such that the distance between
words' embeddings increases with those words' phonetic dissimilarity. Finally,
the AWEs exhibit a word onset bias, similar to patterns reported in various
studies on human speech processing and lexical access. We argue this is a
promising result and encourage further evaluation of AWEs as a potentially
useful tool in cognitive science, which could provide a link between speech
processing and lexical memory.
Comment: 6 pages, 7 figures; accepted to the BAICS workshop (ICLR 2020).
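As a concrete, purely illustrative sketch of the kind of sequence-to-sequence autoencoder described above, the model below compresses a variable-length sequence of acoustic frames into a fixed-size embedding and reconstructs the frames from it. The layer sizes, feature dimensionality, and names are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AWEAutoencoder(nn.Module):
    """Toy seq2seq autoencoder: variable-length acoustic frames (e.g. MFCCs)
    are compressed into a fixed-size acoustic word embedding (AWE), from
    which the decoder tries to reconstruct the original frame sequence."""
    def __init__(self, n_feats=13, hidden=256, emb_dim=128):
        super().__init__()
        self.encoder = nn.GRU(n_feats, hidden, batch_first=True)
        self.to_emb = nn.Linear(hidden, emb_dim)       # fixed-size AWE
        self.from_emb = nn.Linear(emb_dim, hidden)
        self.decoder = nn.GRU(n_feats, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_feats)

    def embed(self, frames):                           # frames: (B, T, n_feats)
        _, h = self.encoder(frames)                    # h: (1, B, hidden)
        return self.to_emb(h[-1])                      # (B, emb_dim)

    def forward(self, frames):
        emb = self.embed(frames)
        h0 = self.from_emb(emb).unsqueeze(0)           # init decoder from the AWE
        # teacher forcing: feed the input frames back in, shifted by one step
        shifted = torch.cat([torch.zeros_like(frames[:, :1]), frames[:, :-1]], dim=1)
        dec_out, _ = self.decoder(shifted, h0)
        return self.out(dec_out), emb

model = AWEAutoencoder()
frames = torch.randn(4, 50, 13)                       # a batch of 4 fake word tokens
recon, awes = model(frames)
loss = nn.functional.mse_loss(recon, frames)          # reconstruction objective
```

The analyses described in the abstract would then operate on the `awes` vectors, for example comparing cosine distances between embeddings with the phonetic dissimilarity of the corresponding word forms.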
Personalized Video Recommendation Using Rich Contents from Videos
Video recommendation has become an essential way of helping people explore
the massive volume of available videos and discover the ones that may be of
interest to them. Existing video recommender systems make recommendations
based on user-video interactions and a single specific content feature. When
that content feature is unavailable, their performance deteriorates severely.
Inspired by the fact that rich contents
(e.g., text, audio, motion, and so on) exist in videos, in this paper, we
explore how to use these rich contents to overcome the limitations caused by
the unavailability of the specific ones. Specifically, we propose a novel
general framework, the collaborative embedding regression (CER) model, that
incorporates an arbitrary single content feature with user-video interactions
to make effective video recommendations in both in-matrix and
out-of-matrix scenarios. Our extensive experiments on two real-world
large-scale datasets show that CER beats the existing recommender models with
any single content feature and is more time-efficient. In addition, we propose
a priority-based late fusion (PRI) method to gain the benefit brought by
integrating multiple content features. The corresponding experiment shows that
PRI brings a real performance improvement over the baseline and outperforms
the existing fusion methods.
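To make the idea behind CER concrete (the abstract does not give the exact objective, so this is a hedged toy formulation), the sketch below factorizes the user-video interaction matrix while regressing the video embeddings onto a single content feature; the regression then supplies embeddings for out-of-matrix (cold-start) videos. All dimensions, weights, and names are made-up toy values.

```python
import torch

# Toy sizes: 100 users, 80 videos, a 32-dim content feature, rank-16 factors.
n_users, n_videos, d_content, k = 100, 80, 32, 16
R = (torch.rand(n_users, n_videos) > 0.95).float()    # implicit feedback matrix
X = torch.randn(n_videos, d_content)                  # one content feature per video

U = torch.randn(n_users, k, requires_grad=True)       # user factors
V = torch.randn(n_videos, k, requires_grad=True)      # video factors
W = torch.randn(d_content, k, requires_grad=True)     # content-to-embedding regression

opt = torch.optim.Adam([U, V, W], lr=0.01)
for step in range(200):
    opt.zero_grad()
    fit = ((R - U @ V.T) ** 2).mean()                 # collaborative term
    tie = ((V - X @ W) ** 2).mean()                   # tie V to the content feature
    loss = fit + 0.1 * tie + 0.01 * (U ** 2).mean()   # arbitrary toy weights
    loss.backward()
    opt.step()

# Out-of-matrix (a new video with no interactions): embed via the regression.
x_new = torch.randn(1, d_content)
scores = U @ (x_new @ W).T                            # predicted affinity per user
```

A PRI-style late fusion would then combine the score lists produced by several such single-feature models according to some feature priority, though the abstract does not specify the exact fusion rule.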
Preservation of Patient Level Privacy: Federated Classification and Calibration Models
With the launch of the Precision Medicine Initiative in the United States by the National Institutes of Health, and the emergence of large volumes of electronic health records, there are many opportunities to improve clinical decision support systems. A large number of samples is needed to build predictive models with adequate discrimination and calibration. However, protecting patient privacy is also an important concern: patient data are typically kept in localized silos, and consolidating datasets from different healthcare systems is difficult. Federated learning allows the training of a global model by amassing intermediate calculations from local medical systems. The knowledge learned from the data can be transferred and aggregated to achieve better performance than that of individual local models, so federated learning may help build better models that provide more accurate predictions. There are two types of measures for assessing how well a model performs: discrimination and calibration. While most papers report discrimination measures, calibration is often neglected even though it is a critical evaluation metric. In this dissertation, I show a novel way to build classifiers and calibration models in a federated manner, and I show how to evaluate and improve model calibration in this setting. Federated modeling enables the accumulation of knowledge and information that would otherwise remain locked behind local medical systems.
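The dissertation's concrete algorithms are not reproduced in the abstract, so the following is a generic federated-averaging sketch in the same spirit: three hypothetical hospital silos jointly train a logistic-regression classifier by exchanging only model weights, never patient records, followed by a simple calibration-in-the-large check. All names and hyperparameters are assumptions.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few epochs of logistic-regression gradient descent on one site's data."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))              # predicted risk
        w -= lr * X.T @ (p - y) / len(y)              # gradient step
    return w

# Three hypothetical hospital silos; raw patient data never leaves a site.
rng = np.random.default_rng(0)
sites = [(rng.normal(size=(200, 5)), rng.integers(0, 2, 200)) for _ in range(3)]

w_global = np.zeros(5)
for _ in range(20):                                   # FedAvg-style rounds
    local_ws = [local_update(w_global.copy(), X, y) for X, y in sites]
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    w_global = np.average(local_ws, axis=0, weights=sizes)  # aggregate weights only

# Calibration-in-the-large at one site: mean predicted risk vs. observed rate.
X0, y0 = sites[0]
pred = 1.0 / (1.0 + np.exp(-X0 @ w_global))
print(pred.mean(), y0.mean())
```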
Cultural influences on word meanings revealed through large-scale semantic alignment
If the structure of language vocabularies mirrors the structure of natural divisions that are universally perceived, then the meanings of words in different languages should closely align. By contrast, if shared word meanings are a product of shared culture, history and geography, they may differ between languages in substantial but predictable ways. Here, we analysed the semantic neighbourhoods of 1,010 meanings in 41 languages. The most-aligned words were from semantic domains with high internal structure (number, quantity and kinship). Words denoting natural kinds, common actions and artefacts aligned much less well. Languages that are more geographically proximate, more historically related and/or spoken by more-similar cultures had more aligned word meanings. These results provide evidence that the meanings of common words vary in ways that reflect the culture, history and geography of their users.
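One way to operationalize this kind of semantic alignment (an illustrative assumption about the method, not a reproduction of it) is to measure how much the nearest-neighbour structure of the same meanings overlaps across two languages' semantic spaces:

```python
import numpy as np

def alignment(emb_a, emb_b, k=10):
    """Neighbourhood overlap for row-aligned embeddings of the same meanings
    in two languages: the fraction of each meaning's k nearest neighbours
    (by cosine similarity) that are shared across the two spaces."""
    def knn(E):
        E = E / np.linalg.norm(E, axis=1, keepdims=True)
        sims = E @ E.T
        np.fill_diagonal(sims, -np.inf)               # a word is not its own neighbour
        return np.argsort(-sims, axis=1)[:, :k]
    na, nb = knn(emb_a), knn(emb_b)
    return np.mean([len(set(a) & set(b)) / k for a, b in zip(na, nb)])

# Toy demo: 1,010 meanings, 50-dim vectors per language (random placeholders).
rng = np.random.default_rng(1)
a, b = rng.normal(size=(1010, 50)), rng.normal(size=(1010, 50))
print(alignment(a, b))                                # near 0 for unrelated spaces
```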
The relational processing limits of classic and contemporary neural network models of language processing
Whether neural networks can capture relational knowledge is a matter of long-standing controversy. Recently, some researchers have argued that (1) classic connectionist models can handle relational structure and (2) the success of deep learning approaches to natural language processing suggests that structured representations are unnecessary to model human language. We tested the Story Gestalt model, a classic connectionist model of text comprehension, and a Sequence-to-Sequence with Attention model, a modern deep learning architecture for natural language processing. Both models were trained to answer questions about stories based on abstract thematic roles. Two simulations varied the statistical structure of new stories while keeping their relational structure intact. The performance of each model fell below chance under at least one manipulation. We argue that both models fail our tests because they cannot perform dynamic binding. These results cast doubt on the suitability of traditional neural networks for explaining relational reasoning and language processing phenomena.
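A toy sketch of the manipulation described above (hypothetical stimuli, not the authors' materials): training stories bind each thematic role to its typical fillers, while test stories keep the same relational frame but swap the filler pools, so the surface statistics change while the relational structure stays intact.

```python
import random

# Typical fillers for each thematic role in the hypothetical training stories.
TYPICAL = {"agent": ["chef", "pilot"], "patient": ["meal", "plane"]}

def story(typical=True):
    if typical:
        agent = random.choice(TYPICAL["agent"])
        patient = random.choice(TYPICAL["patient"])
    else:
        # Swap the filler pools: statistics change, relations do not.
        agent = random.choice(TYPICAL["patient"])
        patient = random.choice(TYPICAL["agent"])
    text = f"the {agent} served the {patient}"
    return text, {"agent": agent, "patient": patient}  # QA target: who did what

train = [story(True) for _ in range(5)]                # statistically typical
test = [story(False) for _ in range(5)]                # relationally identical
```

A model that relies on co-occurrence statistics rather than dynamic role-filler binding would be expected to answer the training-style questions but fail on the swapped test stories.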
A Data-driven Approach to the Semantics of Iconicity in American Sign Language and English
A growing body of research shows that both signed and spoken languages display regular patterns of iconicity in their vocabularies. We compared iconicity in the lexicons of American Sign Language (ASL) and English by combining previously collected ratings of ASL signs (Caselli, Sevcikova Sehyr, Cohen-Goldberg, & Emmorey, 2017) and English words (Winter, Perlman, Perry, & Lupyan, 2017) with data-driven semantic vectors derived from English. Our analyses show that models of spoken language lexical semantics drawn from large text corpora can be useful for predicting the iconicity of signs as well as words. Compared to English, ASL has a greater number of regions of semantic space with concentrations of highly iconic vocabulary. There was an overall negative relationship between semantic density and the iconicity of both English words and ASL signs. This negative relationship disappeared for highly iconic signs, suggesting that iconic forms may be more easily discriminable in ASL than in English. Our findings contribute to an increasingly detailed picture of how iconicity is distributed across different languages.
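The semantic density measure mentioned above can be made concrete as the mean cosine similarity of each word to its nearest semantic neighbours; the sketch below computes it and correlates it with iconicity ratings. The random arrays stand in for real distributional vectors and the published ASL/English norms.

```python
import numpy as np

def semantic_density(vectors, k=20):
    """Mean cosine similarity of each word to its k nearest semantic
    neighbours: higher density = a more crowded region of semantic space."""
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = V @ V.T
    np.fill_diagonal(sims, -np.inf)                   # exclude self-similarity
    top = -np.sort(-sims, axis=1)[:, :k]              # k highest similarities
    return top.mean(axis=1)

# Toy demo with random placeholders for vectors and iconicity ratings.
rng = np.random.default_rng(2)
vecs = rng.normal(size=(500, 300))
iconicity = rng.normal(size=500)
density = semantic_density(vecs)
print(np.corrcoef(density, iconicity)[0, 1])          # abstract reports a negative r
```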
Machine Translation with Image Context from Mandarin Chinese to English
Despite ongoing improvements in machine translation, machine translators still lack the capability to incorporate the context from which source text may have been derived: they translate text from a source language into a target language without observing any visual context. This work aims to produce a neural machine translation model that accepts both text and image context as a multimodal translator from Mandarin Chinese to English. The model was trained on a small multimodal dataset of 700 images and sentences and compared to a translator trained only on the text associated with those images. The model was also trained on a larger text-only corpus of 21,000 sentences, with and without the addition of the small multimodal dataset. Notable differences emerged between the text-only and multimodal translators when trained on the small 700-sentence-and-image dataset; however, no observable discrepancies were found between the translators trained on the larger text corpus. Further research with a larger multimodal dataset could help clarify the utility of multimodal machine translation.
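As an illustration of the kind of model described (the paper does not spell out its architecture in the abstract), the sketch below fuses a source-text encoder state with a pre-extracted image feature vector to initialize the target-language decoder. All dimensions, the fusion scheme, and the class name are assumptions.

```python
import torch
import torch.nn as nn

class MultimodalTranslator(nn.Module):
    """Toy multimodal NMT: encode source (Mandarin) token ids, project a
    pre-extracted image feature, fuse both to seed the English decoder."""
    def __init__(self, src_vocab=5000, tgt_vocab=5000, d=256, img_dim=2048):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.img_proj = nn.Linear(img_dim, d)
        self.fuse = nn.Linear(2 * d, d)
        self.tgt_emb = nn.Embedding(tgt_vocab, d)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, tgt_vocab)

    def forward(self, src_ids, img_feat, tgt_ids):
        _, h = self.encoder(self.src_emb(src_ids))    # text context: (1, B, d)
        fused = torch.tanh(self.fuse(
            torch.cat([h[-1], self.img_proj(img_feat)], dim=-1)))
        dec, _ = self.decoder(self.tgt_emb(tgt_ids), fused.unsqueeze(0))
        return self.out(dec)                          # per-step target logits

model = MultimodalTranslator()
logits = model(torch.randint(0, 5000, (2, 12)),       # source sentences
               torch.randn(2, 2048),                  # e.g. CNN image features
               torch.randint(0, 5000, (2, 10)))       # shifted target inputs
```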