
17 research outputs found

Ridge Regression, Hubness, and Zero-Shot Learning

Author: Hara Kazuo
Matsumoto Yuji
Shigeto Yutaro
Shimbo Masashi
Suzuki Ikumi
Publication date: 03/07/2015

This paper discusses the effect of hubness in zero-shot learning when ridge regression is used to find a mapping between the example space and the label space. Contrary to the existing approach, which attempts to find a mapping from the example space to the label space, we show that mapping labels into the example space is desirable to suppress the emergence of hubs in the subsequent nearest neighbor search step. Assuming a simple data model, we prove that the proposed approach indeed reduces hubness. This was verified empirically on the tasks of bilingual lexicon extraction and image labeling: hubness was reduced in both tasks, and accuracy improved accordingly.
Comment: To be presented at ECML/PKDD 201
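A minimal sketch of the two mapping directions follows. It assumes numpy, closed-form ridge regression, Euclidean nearest-neighbor search, and hubness measured as the skewness of the k-occurrence distribution N_k, which counts how often each label vector appears among the k nearest neighbors of a query. The synthetic data, `compare_directions`, and the other helper names are hypothetical illustrations. They are not the authors' code or data model, and on toy data like this the gap between the two directions may be small.

```python
import numpy as np


def ridge_fit(X, Y, lam):
    """Closed-form ridge regression: argmin_W ||X W - Y||^2 + lam * ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)


def sq_dists(Q, T):
    """Squared Euclidean distances between rows of Q (queries) and rows of T (targets)."""
    return ((Q[:, None, :] - T[None, :, :]) ** 2).sum(axis=-1)


def k_occurrence(Q, T, k):
    """N_k: how many times each target appears in the k-NN lists of the queries."""
    knn = np.argsort(sq_dists(Q, T), axis=1)[:, :k]
    return np.bincount(knn.ravel(), minlength=T.shape[0])


def skewness(v):
    """Sample skewness of N_k; large positive values indicate hubs."""
    c = np.asarray(v, dtype=float) - np.mean(v)
    return (c ** 3).mean() / (c ** 2).mean() ** 1.5


def top1_accuracy(Q, T):
    """Fraction of queries whose nearest target is the paired one (row i <-> row i)."""
    return float((sq_dists(Q, T).argmin(axis=1) == np.arange(len(Q))).mean())


def compare_directions(X_tr, Y_tr, X_te, Y_te, lam=1.0, k=10):
    # Forward: map examples into the label space, then search among label vectors.
    W = ridge_fit(X_tr, Y_tr, lam)
    fwd = (X_te @ W, Y_te)
    # Reverse: map label vectors into the example space, then search there.
    V = ridge_fit(Y_tr, X_tr, lam)
    rev = (X_te, Y_te @ V)
    return {name: (k_occurrence(Q, T, k), top1_accuracy(Q, T))
            for name, (Q, T) in {"forward": fwd, "reverse": rev}.items()}


# Hypothetical synthetic data: examples X and label vectors Y, linearly related plus noise.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 30))
X = rng.normal(size=(300, 50))
Y = X @ A + 0.5 * rng.normal(size=(300, 30))
results = compare_directions(X[:200], Y[:200], X[200:], Y[200:])
for name, (nk, acc) in results.items():
    print(f"{name}: N_10 skewness = {skewness(nk):.2f}, top-1 accuracy = {acc:.2f}")
```

The only change between the two directions is which side the regression is fitted on. Both variants score the same test pairs, so their N_k skewness values can be compared directly.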

arXiv.org e-Print Archive

Crossref

Are we describing the same sound? An analysis of word embedding spaces of expressive piano performance

Author: Cancino-Chacón Carlos Eduardo
Chowdhury Shreyan
Peter Silvan David
Widmer Gerhard
Publication date: 31/12/2023

Semantic embeddings play a crucial role in natural language-based information retrieval. Embedding models represent words and contexts as vectors whose spatial configuration is derived from the distribution of words in large text corpora. While such representations are generally very powerful, they might fail to account for fine-grained, domain-specific nuances. In this article, we investigate this issue for the domain of characterizations of expressive piano performance. Using a music research dataset of free-text performance characterizations and a follow-up study sorting the annotations into clusters, we derive a ground truth for a domain-specific semantic similarity structure. We test five embedding models and their similarity structures for correspondence with the ground truth. We further assess the effects of contextualizing prompts, hubness reduction, cross-modal similarity, and k-means clustering. The quality of embedding models varies greatly on this task; more general models perform better than domain-adapted ones, and the best model configurations reach human-level agreement.
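The abstract names hubness reduction but does not specify the method. The sketch below assumes one common choice, empirical mutual proximity (Schnitzer et al. 2012). It rescales a distance d(x, y) by the fraction of other objects that lie farther from both x and y than x and y lie from each other. The random vectors are hypothetical stand-ins for word embeddings. This is an O(n^3) illustration, not the configuration evaluated in the article.

```python
import numpy as np


def cosine_distances(E):
    """Pairwise cosine distances between rows of E, symmetrized with a zero diagonal."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    D = 1.0 - En @ En.T
    D = 0.5 * (D + D.T)
    np.fill_diagonal(D, 0.0)
    return D


def mutual_proximity(D):
    """Empirical mutual proximity, returned as a distance (1 - MP)."""
    n = D.shape[0]
    # farther[x, y, j] is True when j is farther from x than y is: D[x, j] > D[x, y]
    farther = D[:, None, :] > D[:, :, None]
    # Count objects j that are farther from both x and y than x and y are from each other.
    shared = (farther & farther.transpose(1, 0, 2)).sum(axis=2)
    D_mp = 1.0 - shared / (n - 2)
    np.fill_diagonal(D_mp, 0.0)
    return D_mp


def k_occurrence(D, k):
    """N_k for each object under distance matrix D, excluding self-matches."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)
    knn = np.argsort(D, axis=1)[:, :k]
    return np.bincount(knn.ravel(), minlength=D.shape[0])


def skewness(v):
    """Sample skewness of N_k; large positive values indicate hubs."""
    c = np.asarray(v, dtype=float) - np.mean(v)
    return (c ** 3).mean() / (c ** 2).mean() ** 1.5


# Hypothetical stand-in for embeddings: 200 random 100-dimensional vectors.
rng = np.random.default_rng(0)
E = rng.normal(size=(200, 100))
D = cosine_distances(E)
D_mp = mutual_proximity(D)
print("N_10 skewness, cosine:", round(skewness(k_occurrence(D, 10)), 2))
print("N_10 skewness, mutual proximity:", round(skewness(k_occurrence(D_mp, 10)), 2))
```

Because mutual proximity only asks whether two objects agree that they are close, an object that is near to everything (a hub) loses its advantage.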

arXiv.org e-Print Archive

HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs

Author: Li Shuaipeng
Liu Fangyu
Wang Xun
Ye Rongtian
Publication date: 22/11/2019

The hubness problem is widespread in high-dimensional embedding spaces and is a fundamental source of error for cross-modal matching tasks. In this work, we study the emergence of hubs in Visual Semantic Embeddings (VSE) with application to text-image matching. We analyze the pros and cons of two widely adopted optimization objectives for training VSE and propose a novel hubness-aware loss function (HAL) that addresses the shortcomings of previous methods. Unlike (Faghri et al. 2018), which simply takes the hardest sample within a mini-batch, HAL takes all samples into account, using both local and global statistics to scale up the weights of "hubs". We evaluate our method with various configurations of model architectures and datasets. The method exhibits exceptionally good robustness and brings consistent improvement on the task of text-image matching across all settings. Specifically, under the same model architectures as (Faghri et al. 2018) and (Lee et al. 2018), by switching only the learning objective, we report a maximum R@1 improvement of 7.4% on MS-COCO and 8.3% on Flickr30k.
Comment: AAAI-20 (to appear)
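To make the contrast with (Faghri et al. 2018) concrete, here is a sketch of two triplet objectives commonly used for VSE training. It assumes these are the sum-of-hinges loss and the max-of-hinges (hardest-negative) loss, computed over a mini-batch similarity matrix. This is not HAL itself: the hub reweighting with local and global statistics is not reproduced. The batch, embedding size, and margin are hypothetical.

```python
import numpy as np


def hinge_costs(S, margin):
    """S[i, c]: similarity of image i and caption c; matching pairs lie on the diagonal."""
    pos = np.diag(S)
    # Image i as anchor, captions c' as negatives: [margin - s(i, c_i) + s(i, c')]_+
    cost_cap = np.maximum(0.0, margin - pos[:, None] + S)
    # Caption c as anchor, images i' as negatives: [margin - s(i_c, c) + s(i', c)]_+
    cost_img = np.maximum(0.0, margin - pos[None, :] + S)
    # The positive pair itself is not a negative.
    eye = np.eye(S.shape[0], dtype=bool)
    cost_cap[eye] = 0.0
    cost_img[eye] = 0.0
    return cost_cap, cost_img


def sum_of_hinges(S, margin=0.2):
    """Every violating negative in the batch contributes to the loss."""
    cost_cap, cost_img = hinge_costs(S, margin)
    return float(cost_cap.sum() + cost_img.sum())


def max_of_hinges(S, margin=0.2):
    """Only the hardest negative per anchor contributes (the VSE++ objective)."""
    cost_cap, cost_img = hinge_costs(S, margin)
    return float(cost_cap.max(axis=1).sum() + cost_img.max(axis=0).sum())


def l2_normalize(M):
    """Scale each row to unit length so dot products are cosine similarities."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)


# Hypothetical mini-batch of 8 image/caption embedding pairs.
rng = np.random.default_rng(0)
img = l2_normalize(rng.normal(size=(8, 16)))
cap = l2_normalize(img + 0.5 * rng.normal(size=(8, 16)))
S = img @ cap.T
print("sum of hinges:", round(sum_of_hinges(S), 3))
print("max of hinges:", round(max_of_hinges(S), 3))
```

The sum-of-hinges loss spreads gradient over all violating negatives. The max-of-hinges loss concentrates it on one sample per anchor. HAL is described as sitting between these, weighting all samples but emphasizing hubs.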

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Listener-Aware Music Recommendation from Sensor and Social Media Data

Author: D Schnitzer
M Gillhofer
M Schedl
P Lamere
Publication venue: Springer Science and Business Media LLC

Crossref

An investigation of likelihood normalization for robust ASR

Author: Flexer Arthur
Gkiokas Aggelos
Schnitzer Dominik
Vincent Emmanuel
Publication venue: HAL CCSD
Publication date: 14/09/2014

Noise-robust automatic speech recognition (ASR) systems rely on feature and/or model compensation. Existing compensation techniques typically operate on the features or on the parameters of the acoustic models themselves. By contrast, a number of normalization techniques that operate on the resulting log-likelihood scores have been defined in the field of speaker verification. In this paper, we provide a theoretical motivation for likelihood normalization in terms of the so-called "hubness" phenomenon, and we evaluate the benefit of several normalization techniques on ASR accuracy for the 2nd CHiME Challenge task. We show that symmetric normalization (S-norm) reduces the relative error rate by 43% alone and by 10% after feature and model compensation.
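Below is a sketch of S-norm as used in speaker verification, applied to a generic matrix of log-likelihood scores in which rows are observations (for example, frames) and columns are models (for example, states). This layout is an assumption made for illustration. So is computing both sets of statistics from the score matrix itself. In speaker verification, Z-norm and T-norm statistics are typically estimated from separate impostor cohorts, and the paper's exact formulation may differ. The "hub" model in the toy data is hypothetical.

```python
import numpy as np


def s_norm(scores, eps=1e-12):
    """Symmetric normalization of an (observations x models) log-likelihood matrix.

    Averages two standardizations: one using each model's statistics over all
    observations (Z-norm-like) and one using each observation's statistics over
    all models (T-norm-like).
    """
    mu_model = scores.mean(axis=0, keepdims=True)
    sd_model = scores.std(axis=0, keepdims=True)
    mu_obs = scores.mean(axis=1, keepdims=True)
    sd_obs = scores.std(axis=1, keepdims=True)
    z = (scores - mu_model) / (sd_model + eps)
    t = (scores - mu_obs) / (sd_obs + eps)
    return 0.5 * (z + t)


def win_counts(scores):
    """How often each model has the highest score for an observation."""
    return np.bincount(scores.argmax(axis=1), minlength=scores.shape[1])


# Hypothetical scores: 500 observations x 20 models, where model 0 is a "hub"
# whose scores are inflated by a constant for every observation.
rng = np.random.default_rng(0)
scores = rng.normal(size=(500, 20))
scores[:, 0] += 3.0
print("wins before S-norm:", win_counts(scores)[:5])
print("wins after S-norm: ", win_counts(s_norm(scores))[:5])
```

The per-observation term alone never changes which model wins a row, because it is a monotone rescaling within that row. The per-model term is what removes the constant offset that lets one model win almost everywhere.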

INRIA a CCSD electronic archive server

HAL-Rennes 1


A computational study on outliers in world music

Author: A Flexer
A Holzapfel
A Honingh
A Livshin
A Lomax
B Nettl
BL Sturm
C Guastavino
C Panagiotakis
Chun-Hsi Huang
CM Bishop
CT Lu
D Bountouridis
D Chen
D Clarke
D Schnitzer
DMW Powers
E Gómez
Emmanouil Benetos
F Pachet
G Tzanetakis
H Lee
I Ben-Gal
J Salamon
J Serrà
JJ Aucouturier
JP Bello
JS Downie
JT Titon
L Sun
M Mauch
M Müller
M Schedl
MA Bartsch
MA Schmuckler
Maria Panteli
N Kroher
P Casas
P Filzmoser
P Toiviainen
PE Savage
PJ Rousseeuw
PV Bohlman
R Typke
S Abdallah
S Bhattacharyya
S Brown
S Le Bomin
S McAdams
S Sadie
SC Johnson
SE Trehub
Simon Dixon
T Collins
T Rzeszutek
TH Grubesic
V Hodge
Y Lu
Z Fu
Ò Celma
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2017

The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With advances in Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is now feasible. We investigate music similarity in a corpus of 8,200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify the recordings that are most distinct from the rest of the corpus; we refer to these recordings as ‘outliers’. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus, and China is the country with the most distinct recordings when spatial correlation is taken into account. Our analysis includes a comparison of the musical attributes and styles that contribute to the ‘uniqueness’ of the music of each country.
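The abstract does not say which outlier detector was used. As an assumption, the sketch below scores each recording by its squared Mahalanobis distance from the corpus mean, flags the top 5%, and reports the share of flagged recordings per country. The features, country labels, and quantile threshold are hypothetical, and the spatial-correlation step described in the abstract is not modeled.

```python
import numpy as np


def mahalanobis_sq(F):
    """Squared Mahalanobis distance of each row of F from the corpus mean."""
    centered = F - F.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(F, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)


def flag_outliers(F, q=0.95):
    """Flag recordings whose distance exceeds the q-quantile of all distances."""
    d2 = mahalanobis_sq(F)
    return d2 > np.quantile(d2, q)


def outlier_share_by_country(flags, countries):
    """Fraction of each country's recordings that were flagged as outliers."""
    return {str(c): float(flags[countries == c].mean()) for c in np.unique(countries)}


# Hypothetical corpus: 300 recordings x 5 audio features from three countries,
# where country "C" is shifted away from the others.
rng = np.random.default_rng(0)
countries = np.repeat(np.array(["A", "B", "C"]), 100)
F = rng.normal(size=(300, 5))
F[countries == "C"] += 2.0
flags = flag_outliers(F)
print(outlier_share_by_country(flags, countries))
```

The Mahalanobis distance accounts for correlated features by using the inverse covariance. The pseudo-inverse keeps the sketch usable when features are nearly collinear.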

City Research Online

Crossref

Directory of Open Access Journals

Queen Mary Research Online