Search CORE

1,034 research outputs found

Discriminative feature learning for multimodal classification

Author: Calefati Alessandro
Publication venue: country:Italy
Publication date: 01/01/2019
Field of study

The purpose of this thesis is to tackle two related topics: multimodal classification and objective functions to improve the discriminative power of features. First, I worked on image and text classification tasks and performed many experiments to show the effectiveness of different approaches available in literature. Then, I introduced a novel methodology which can classify multimodal documents using singlemodal classifiers merging textual and visual information into images and a novel loss function to improve separability between samples of a dataset. Results show that exploiting multimodal data increases performances on classification tasks rather than using traditional single-modality methods. Moreover the introduced GIT loss function is able to enhance the discriminative power of features, lowering intra-class distance and raising inter-class distance between samples of a multiclass dataset

InsubriaSPACE - Thesis PhD Repository

Archivio istituzionale della ricerca - Università dell'Insubria

InsubriaSPACE

Unified Embedding and Metric Learning for Zero-Exemplar Event Detection

Author: Gavves Efstratios
Hussein Noureldien
Smeulders Arnold W. M.
Publication venue
Publication date: 01/01/2017
Field of study

Event detection in unconstrained videos is conceived as a content-based video retrieval with two modalities: textual and visual. Given a text describing a novel event, the goal is to rank related videos accordingly. This task is zero-exemplar, no video examples are given to the novel event. Related works train a bank of concept detectors on external data sources. These detectors predict confidence scores for test videos, which are ranked and retrieved accordingly. In contrast, we learn a joint space in which the visual and textual representations are embedded. The space casts a novel event as a probability of pre-defined events. Also, it learns to measure the distance between an event and its related videos. Our model is trained end-to-end on publicly available EventNet. When applied to TRECVID Multimedia Event Detection dataset, it outperforms the state-of-the-art by a considerable margin.Comment: IEEE CVPR 201

arXiv.org e-Print Archive

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Evaluation of Output Embeddings for Fine-Grained Image Classification

Author: Akata Zeynep
Lee Honglak
Reed Scott
Schiele Bernt
Walter Daniel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

Image classification has advanced significantly in recent years with the availability of large-scale image sets. However, fine-grained classification remains a major challenge due to the annotation cost of large numbers of fine-grained categories. This project shows that compelling classification performance can be achieved on such categories even without labeled training data. Given image and class embeddings, we learn a compatibility function such that matching embeddings are assigned a higher score than mismatching ones; zero-shot classification of an image proceeds by finding the label yielding the highest joint compatibility score. We use state-of-the-art image features and focus on different supervised attributes and unsupervised output embeddings either derived from hierarchies or learned from unlabeled text corpora. We establish a substantially improved state-of-the-art on the Animals with Attributes and Caltech-UCSD Birds datasets. Most encouragingly, we demonstrate that purely unsupervised output embeddings (learned from Wikipedia and improved with fine-grained text) achieve compelling results, even outperforming the previous supervised state-of-the-art. By combining different output embeddings, we further improve results.Comment: @inproceedings {ARWLS15, title = {Evaluation of Output Embeddings for Fine-Grained Image Classification}, booktitle = {IEEE Computer Vision and Pattern Recognition}, year = {2015}, author = {Zeynep Akata and Scott Reed and Daniel Walter and Honglak Lee and Bernt Schiele}

arXiv.org e-Print Archive

Crossref

CISPA – Helmholtz-Zentrum für Informationssicherheit

MPG.PuRe