Food Recognition using Fusion of Classifiers based on CNNs
With the arrival of convolutional neural networks, the complex problem of
food recognition has experienced an important improvement in recent years. The
best results have been obtained using methods based on very deep convolutional
neural networks, which suggests that the deeper the model, the better the
classification accuracy. However, very deep neural networks may
suffer from the overfitting problem. In this paper, we propose a combination of
multiple classifiers based on different convolutional models that complement
each other and thus, achieve an improvement in performance. The evaluation of
our approach is done on two public datasets: Food-101 as a dataset with a wide
variety of fine-grained dishes, and Food-11 as a dataset of high-level food
categories, where our approach outperforms the independent CNN models.
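The combination of classifiers the abstract describes can be illustrated with a minimal late-fusion sketch: each CNN's logits are converted to class probabilities and the probabilities are averaged before taking the argmax. This is one common fusion rule, not necessarily the paper's exact scheme; the toy logits below are purely illustrative.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_classifiers(logits_per_model):
    """Late fusion: average the class probabilities of several models.

    logits_per_model: list of (n_samples, n_classes) arrays, one per CNN.
    Returns the fused class predictions.
    """
    probs = np.mean([softmax(l) for l in logits_per_model], axis=0)
    return probs.argmax(axis=-1)

# Toy example: two "models" disagree on the first sample;
# fusion follows the more confident one.
model_a = np.array([[2.0, 1.0], [0.1, 3.0]])
model_b = np.array([[0.5, 0.6], [0.0, 2.5]])
preds = fuse_classifiers([model_a, model_b])  # array([0, 1])
```

Averaging probabilities (rather than raw logits) keeps each model's contribution on a comparable scale, which is one reason it is a popular default for heterogeneous ensembles.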
More cat than cute? Interpretable Prediction of Adjective-Noun Pairs
The increasing availability of affect-rich multimedia resources has bolstered
interest in understanding sentiment and emotions in and from visual content.
Adjective-noun pairs (ANP) are a popular mid-level semantic construct for
capturing affect via visually detectable concepts such as "cute dog" or
"beautiful landscape". Current state-of-the-art methods approach ANP prediction
by considering each of these compound concepts as individual tokens, ignoring
the underlying relationships in ANPs. This work aims at disentangling the
contributions of the "adjectives" and "nouns" in the visual prediction of ANPs.
Two specialised classifiers, one trained for detecting adjectives and another
for nouns, are fused to predict 553 different ANPs. The resulting ANP
prediction model is more interpretable as it allows us to study contributions
of the adjective and noun components. Source code and models are available at
https://imatge-upc.github.io/affective-2017-musa2/

Comment: Oral paper at ACM Multimedia 2017 Workshop on Multimodal
Understanding of Social, Affective and Subjective Attributes (MUSA2).
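The fusion of a specialised adjective classifier with a noun classifier can be sketched by scoring each valid ANP from the two component probabilities. The product rule below is one simple choice, not necessarily the paper's learned fusion, and the tiny vocabularies are hypothetical stand-ins for the full adjective/noun sets behind the 553 ANPs.

```python
import numpy as np

# Hypothetical vocabularies; the real model covers 553 ANPs built
# from much larger adjective and noun sets.
adjectives = ["cute", "beautiful"]
nouns = ["dog", "landscape"]
valid_anps = [("cute", "dog"), ("beautiful", "landscape")]

def anp_scores(adj_probs, noun_probs):
    """Score each valid ANP as the product of its adjective and noun
    probabilities (a simple fusion rule for illustration)."""
    a_idx = {a: i for i, a in enumerate(adjectives)}
    n_idx = {n: i for i, n in enumerate(nouns)}
    return {(a, n): adj_probs[a_idx[a]] * noun_probs[n_idx[n]]
            for (a, n) in valid_anps}

scores = anp_scores(np.array([0.8, 0.2]), np.array([0.9, 0.1]))
best = max(scores, key=scores.get)  # ("cute", "dog")
```

Because the ANP score factors into an adjective term and a noun term, one can inspect which component drives a prediction, which is exactly the interpretability the abstract emphasises.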
FoodNet: Recognizing Foods Using Ensemble of Deep Networks
In this work we propose a methodology for an automatic food classification
system which recognizes the contents of the meal from the images of the food.
We developed a multi-layered deep convolutional neural network (CNN)
architecture that takes advantage of features from other deep networks and
improves the efficiency. Numerous classical handcrafted features and approaches
are explored, among which CNNs are chosen as the best performing features.
Networks are trained and fine-tuned using preprocessed images and the filter
outputs are fused to achieve higher accuracy. Experimental results on the
largest real-world food recognition database ETH Food-101 and newly contributed
Indian food image database demonstrate the effectiveness of the proposed
methodology compared to many other benchmark deep CNN frameworks.

Comment: 5 pages, 3 figures, 3 tables, IEEE Signal Processing Letters
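Where the previous abstract fused class probabilities, FoodNet's "taking advantage of the features from other deep networks" suggests feature-level fusion. A minimal sketch (an assumed reading of the abstract, not the paper's exact architecture) is to concatenate the per-image feature vectors from several networks so a final classifier sees all of them:

```python
import numpy as np

def concat_features(feature_maps):
    """Feature-level fusion: concatenate per-image feature vectors from
    several networks along the feature axis, so a downstream classifier
    can be trained on the joint representation."""
    return np.concatenate(feature_maps, axis=-1)

# Stand-in features: 4 images, two networks with different widths.
f1 = np.ones((4, 128))   # e.g. penultimate-layer features of network 1
f2 = np.zeros((4, 256))  # e.g. penultimate-layer features of network 2
fused = concat_features([f1, f2])  # shape (4, 384)
```

Feature concatenation lets the downstream classifier learn how to weight each network's representation, at the cost of a higher-dimensional input.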
Discriminative feature learning for multimodal classification
The purpose of this thesis is to tackle two related topics: multimodal classification and objective functions to improve the discriminative power of features.
First, I worked on image and text classification tasks and performed many experiments to show the effectiveness of different approaches available in the literature.
Then, I introduced a novel methodology that classifies multimodal documents with single-modal classifiers by merging textual and visual information into images, along with a novel loss function that improves separability between samples of a dataset.
Results show that exploiting multimodal data improves classification performance compared to traditional single-modality methods.
Moreover, the introduced GIT loss function enhances the discriminative power of features, lowering intra-class distance and raising inter-class distance between samples of a multiclass dataset.
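The intra-class/inter-class idea behind such a discriminative loss can be sketched numerically: pull samples toward their class centers and push the centers of different classes apart. This is a toy illustration of the principle, not the thesis's exact GIT loss formulation.

```python
import numpy as np

def intra_inter_loss(features, labels):
    """Toy discriminative objective: small intra-class scatter plus
    large inter-class center separation means a lower value."""
    classes = np.unique(labels)
    centers = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Intra-class term: mean squared distance of samples to their center.
    intra = np.mean([np.sum((features[labels == c] - centers[i]) ** 2)
                     for i, c in enumerate(classes)])
    # Inter-class term: mean squared distance between class centers.
    diffs = centers[:, None, :] - centers[None, :, :]
    inter = np.sum(diffs ** 2) / max(len(classes) * (len(classes) - 1), 1)
    return intra - inter  # lower = compact classes, well-spread centers

# Two tight, well-separated clusters give a strongly negative value.
feats = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
labels = np.array([0, 0, 1, 1])
loss = intra_inter_loss(feats, labels)  # -199.5
```

In a real training loop a term like this would be added to the usual softmax cross-entropy, so the network is rewarded both for correct classification and for producing well-separated features.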
Deep learning for Plankton and Coral Classification
Oceans are the essential lifeblood of the Earth: they produce over 70% of the
oxygen and hold over 97% of the water. Plankton and corals are two of the most
fundamental components of ocean ecosystems, the former due to their function at
many levels of the oceans food chain, the latter because they provide spawning
and nursery grounds to many fish populations. Studying and monitoring plankton
distribution and coral reefs is vital for environmental protection. In recent
years there has been a massive proliferation of digital imagery for the
monitoring of underwater ecosystems and much research is concentrated on the
automated recognition of plankton and corals. In this paper, we present a study
of an automated system for monitoring underwater ecosystems. The system
here proposed is based on the fusion of different deep learning methods. We
study how to create an ensemble based on different CNN models, fine-tuned on
several datasets with the aim of exploiting their diversity. Our study examines
the feasibility of fine-tuning pretrained CNNs for underwater imagery analysis,
the benefit of using different datasets for pretraining, and the possibility of
designing an ensemble from the same architecture with small variations in the
training procedure. The experimental
results are very encouraging: our experiments on five well-known
datasets (three plankton and two coral) show that the fusion of such
different CNN models in a heterogeneous ensemble grants a substantial
performance improvement with respect to other state-of-the-art approaches in
all the tested problems. One of the main contributions of this work is a wide
experimental evaluation of well-known CNN architectures, reporting the
performance of both single CNNs and CNN ensembles on different problems.
Moreover, we show how to create an ensemble that improves on the performance of
the best single model.