Sketch Me That Shoe

Author: Hospedales TM
Liu F
Loy CC
Song Y-Z
Xiang T
Yu Q
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 17/04/2016

This project received support from the European Union’s Horizon 2020 research and innovation programme under grant agreement #640891, the Royal Society and Natural Science Foundation of China (NSFC) joint grants #IE141387 and #61511130081, and the China Scholarship Council (CSC). We gratefully acknowledge the support of NVIDIA Corporation for the donation of the GPUs used for this research.

Crossref

University of Surrey

Edinburgh Research Explorer

Queen Mary Research Online

Surrey Research Insight

Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch

Author: Dey Sounak
Dutta Anjan
Ghosh Suman K.
Lladós Josep
Pal Umapada
Valveny Ernest
Publication date: 28/04/2018

In this work we introduce a cross-modal image retrieval system that allows both text and sketch as input modalities for the query. A cross-modal deep network architecture is formulated to jointly model the sketch and text input modalities as well as the image output modality, learning a common embedding between text and images and between sketches and images. In addition, an attention model is used to selectively focus on the different objects in the image, allowing for retrieval with multiple objects in the query. Experiments show that the proposed method performs best in both single- and multiple-object image retrieval on standard datasets. Comment: Accepted at ICPR 201

arXiv.org e-Print Archive

Crossref

Open Research Exeter

A Feature Learning Siamese Model for Intelligent Control of the Dynamic Range Compressor

Author: Fazekas György
Sheng Di
Publication date: 01/05/2019

In this paper, a siamese DNN model is proposed to learn the characteristics of the audio dynamic range compressor (DRC). This facilitates an intelligent control system that uses audio examples to configure the DRC, a widely used non-linear audio signal conditioning technique in music production, speech communication and broadcasting. Several alternative siamese DNN architectures are proposed to learn feature embeddings that can characterise subtle effects due to dynamic range compression. These models are compared with each other as well as with handcrafted features proposed in previous work. An evaluation of the relations between the hyperparameters of the DNN and the DRC parameters is also provided. The best model is able to produce a universal feature embedding that is capable of predicting multiple DRC parameters simultaneously, which is a significant improvement over our previous research. The feature embedding shows better performance than handcrafted audio features when predicting DRC parameters for both mono-instrument audio loops and polyphonic music pieces. Comment: 8 pages, accepted in IJCNN 201

arXiv.org e-Print Archive

Crossref