Ridge Regression, Hubness, and Zero-Shot Learning
This paper discusses the effect of hubness in zero-shot learning when ridge
regression is used to find a mapping between the example space and the label
space. Contrary to the existing approach, which attempts to find a mapping from
the example space to the label space, we show that mapping labels into the
example space is desirable to suppress the emergence of hubs in the subsequent
nearest neighbor search step. Assuming a simple data model, we prove that the
proposed approach indeed reduces hubness. This was verified empirically on the
tasks of bilingual lexicon extraction and image labeling: hubness was reduced
on both of these tasks and accuracy improved accordingly.
Comment: To be presented at ECML/PKDD 201
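The contrast between the two mapping directions can be sketched on synthetic data. The snippet below is a minimal illustration only, not the authors' implementation: the toy data model, the dimensions, the regularization strength `lam`, and the helper names `ridge_fit`, `k_occurrence`, and `skewness` are all assumptions introduced here. It fits ridge regression in both directions, runs a nearest neighbor search, and reports the skewness of the k-occurrence counts, a commonly used indicator of hubness.

```python
import numpy as np

def ridge_fit(A, B, lam=1.0):
    """Return M minimising ||A M - B||_F^2 + lam * ||M||_F^2 (rows are samples)."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ B)

def k_occurrence(queries, targets, k=1):
    """Count how often each target appears among the k nearest neighbours of a query."""
    # Squared Euclidean distances, shape (n_queries, n_targets).
    d2 = ((queries[:, None, :] - targets[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    return np.bincount(nn.ravel(), minlength=targets.shape[0])

def skewness(counts):
    """Sample skewness of the k-occurrence distribution (larger suggests more hubness)."""
    c = counts - counts.mean()
    s = c.std()
    return 0.0 if s == 0 else float((c ** 3).mean() / s ** 3)

# Hypothetical toy data: label vectors and noisy examples generated from them.
rng = np.random.default_rng(0)
n_train, n_test, n_labels, dx, dy = 500, 200, 50, 30, 20
Y_labels = rng.standard_normal((n_labels, dy))
W_true = rng.standard_normal((dy, dx))

train_ids = rng.integers(0, n_labels, n_train)
test_ids = rng.integers(0, n_labels, n_test)
X_train = Y_labels[train_ids] @ W_true + 0.5 * rng.standard_normal((n_train, dx))
X_test = Y_labels[test_ids] @ W_true + 0.5 * rng.standard_normal((n_test, dx))
Y_train = Y_labels[train_ids]

# Forward direction: map examples into the label space, search among label vectors.
M_fwd = ridge_fit(X_train, Y_train)
counts_fwd = k_occurrence(X_test @ M_fwd, Y_labels, k=1)

# Reverse direction: map labels into the example space, search among mapped labels.
M_rev = ridge_fit(Y_train, X_train)
counts_rev = k_occurrence(X_test, Y_labels @ M_rev, k=1)

print("skewness, example -> label mapping:", skewness(counts_fwd))
print("skewness, label -> example mapping:", skewness(counts_rev))
```

Whether the reverse direction yields lower skewness on this toy data depends on the assumed noise level and regularization; the sketch only shows how the comparison could be set up.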
Are we describing the same sound? An analysis of word embedding spaces of expressive piano performance
Semantic embeddings play a crucial role in natural language-based information
retrieval. Embedding models represent words and contexts as vectors whose
spatial configuration is derived from the distribution of words in large text
corpora. While such representations are generally very powerful, they might
fail to account for fine-grained domain-specific nuances. In this article, we
investigate this uncertainty for the domain of characterizations of expressive
piano performance. Using a music research dataset of free text performance
characterizations and a follow-up study sorting the annotations into clusters,
we derive a ground truth for a domain-specific semantic similarity structure.
We test five embedding models and their similarity structure for correspondence
with the ground truth. We further assess the effects of contextualizing
prompts, hubness reduction, cross-modal similarity, and k-means clustering. The
quality of embedding models shows great variability with respect to this task;
more general models perform better than domain-adapted ones and the best model
configurations reach human-level agreement.
HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs
The hubness problem is widespread in high-dimensional embedding spaces and is
a fundamental source of error for cross-modal matching tasks. In this work, we
study the emergence of hubs in Visual Semantic Embeddings (VSE) with
application to text-image matching. We analyze the pros and cons of two widely
adopted optimization objectives for training VSE and propose a novel
hubness-aware loss function (HAL) that addresses previous methods' defects.
Unlike (Faghri et al. 2018), which simply takes the hardest sample within a
mini-batch, HAL takes all samples into account, using both local and global
statistics to scale up the weights of "hubs". We evaluate our method with
various configurations of model architectures and datasets. The method exhibits
exceptionally good robustness and brings consistent improvement on the task of
text-image matching across all settings. Specifically, under the same model
architectures as (Faghri et al. 2018) and (Lee et al. 2018), by switching only
the learning objective, we report a maximum R@1 improvement of 7.4% on MS-COCO
and 8.3% on Flickr30k.
Comment: AAAI-20 (to appear
An investigation of likelihood normalization for robust ASR
Noise-robust automatic speech recognition (ASR) systems rely on feature and/or model compensation. Existing compensation techniques typically operate on the features or on the parameters of the acoustic models themselves. By contrast, a number of normalization techniques have been defined in the field of speaker verification that operate on the resulting log-likelihood scores. In this paper, we provide a theoretical motivation for likelihood normalization due to the so-called "hubness" phenomenon and we evaluate the benefit of several normalization techniques on ASR accuracy for the 2nd CHiME Challenge task. We show that symmetric normalization (S-norm) reduces the relative error rate by 43% alone and by 10% after feature and model compensation.
A computational study on outliers in world music
The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as ‘outliers’. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus and China is the country with the most distinct recordings when considering spatial correlation. Our analysis includes a comparison of musical attributes and styles that contribute to the ‘uniqueness’ of the music of each country