Bimodal network architectures for automatic generation of image annotation from text
Medical image analysis practitioners have embraced big data methodologies.
This has created a need for large annotated datasets. The source of big data is
typically large image collections and clinical reports recorded for these
images. In many cases, however, building algorithms aimed at segmentation and
detection of disease requires a training dataset with markings of the areas of
interest on the image that match with the described anomalies. This process of
annotation is expensive and needs the involvement of clinicians. In this work
we propose two separate deep neural network architectures for automatic marking
of a region of interest (ROI) on the image best representing a finding
location, given a textual report or a set of keywords. One architecture
consists of LSTM and CNN components and is trained end to end with images,
matching text, and markings of ROIs for those images. The output layer
estimates the coordinates of the vertices of a polygonal region. The second
architecture uses a network pre-trained on a large dataset of the same image
types for learning feature representations of the findings of interest. We show
that for a variety of findings from chest X-ray images, both proposed
architectures learn to estimate the ROI, as validated by clinical annotations.
There is a clear advantage obtained from the architecture with the pre-trained
imaging network. The centroids of the ROIs marked by this network were on
average at a distance equivalent to 5.1% of the image width from the centroids
of the ground truth ROIs.
Comment: Accepted to MICCAI 2018, LNCS 1107
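The centroid-distance figure reported above can be illustrated with a minimal sketch. It assumes each ROI is a simple polygon given as a list of (x, y) pixel vertices and that the centroid is the area-weighted polygon centroid; the abstract does not say how the centroids were computed, and the function names `polygon_centroid` and `centroid_error_fraction` are hypothetical.

```python
from math import hypot


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon via the shoelace formula.

    `vertices` is a list of (x, y) tuples in order around the polygon.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3 * area2), cy / (3 * area2)


def centroid_error_fraction(predicted, truth, image_width):
    """Distance between the two ROI centroids as a fraction of image width."""
    px, py = polygon_centroid(predicted)
    tx, ty = polygon_centroid(truth)
    return hypot(px - tx, py - ty) / image_width


# Hypothetical example: a predicted square shifted 51 pixels to the right
# of the ground-truth square, on an image 1000 pixels wide.
truth_roi = [(0, 0), (10, 0), (10, 10), (0, 10)]
predicted_roi = [(51, 0), (61, 0), (61, 10), (51, 10)]
error = centroid_error_fraction(predicted_roi, truth_roi, image_width=1000)
```

Averaging this fraction over a test set would give a number comparable in form to the 5.1% quoted in the abstract, under the assumptions stated above.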
Visually Grounded Meaning Representations
In this paper we address the problem of grounding distributional representations of lexical meaning. We introduce a new
model which uses stacked autoencoders to learn higher-level representations from textual and visual input. The visual modality is
encoded via vectors of attributes obtained automatically from images. We create a new large-scale taxonomy of 600 visual attributes
representing more than 500 concepts and 700K images. We use this dataset to train attribute classifiers and integrate their predictions
with text-based distributional models of word meaning. We evaluate our model on its ability to simulate word similarity judgments and
concept categorization. On both tasks, our model yields a better fit to behavioral data compared to baselines and related models which
either rely on a single modality or do not make use of attribute-based input.
First impressions: A survey on vision-based apparent personality trait analysis
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Personality analysis has been widely studied in psychology, neuropsychology, and signal processing, among other fields. In the past few years, it has also become an attractive research area in visual computing. From the computational point of view, speech and text have been by far the most widely considered cues for analyzing personality. Recently, however, there has been increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures and behaviors, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and of the potential impact that such methods could have on society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting-edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, we review aspects of subjectivity in data labeling and evaluation, as well as current datasets and challenges organized to advance research in the field.
Peer Reviewed. Postprint (author's final draft)
Deep Active Learning Explored Across Diverse Label Spaces
Deep learning architectures have been widely explored in computer vision and have
shown commendable performance in a variety of applications. A fundamental challenge
in training deep networks is the requirement of large amounts of labeled training
data. While gathering large quantities of unlabeled data is cheap and easy, annotating
the data is an expensive process in terms of time, labor and human expertise.
Thus, developing algorithms that minimize the human effort in training deep models
is of immense practical importance. Active learning algorithms automatically identify
salient and exemplar samples from large amounts of unlabeled data and can contribute
maximal information to supervised learning models, thereby reducing the human annotation
effort in training machine learning models. The goal of this dissertation is to
fuse ideas from deep learning and active learning and design novel deep active learning
algorithms. The proposed learning methodologies explore diverse label spaces to
solve different computer vision applications. Three major contributions have emerged
from this work: (i) a deep active framework for multi-class image classification, (ii)
a deep active model with and without label correlation for multi-label image
classification and (iii) a deep active paradigm for regression. Extensive empirical studies
on a variety of multi-class, multi-label and regression vision datasets corroborate the
potential of the proposed methods for real-world applications. Additional contributions
include: (i) a multimodal emotion database consisting of recordings of facial
expressions, body gestures, vocal expressions and physiological signals of actors enacting
various emotions, (ii) four multimodal deep belief network models and (iii)
an in-depth analysis of the effect of transfer of multimodal emotion features between
source and target networks on classification accuracy and training time. These related
contributions help comprehend the challenges involved in training deep learning
models and motivate the main goal of this dissertation.
Dissertation/Thesis. Doctoral Dissertation, Electrical Engineering 201
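The sample-selection step that active learning relies on can be sketched generically. The snippet below ranks unlabeled samples by the entropy of the model's predicted class probabilities and picks the most uncertain ones for annotation. This is a common textbook criterion used here purely for illustration; the abstract does not state which selection criteria the dissertation uses, and the names `entropy` and `select_for_labeling` are hypothetical.

```python
from math import log


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * log(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Return indices of the `budget` samples with the most uncertain predictions.

    `predictions` is a list of per-sample class-probability lists produced by
    the current model on the unlabeled pool.
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Hypothetical softmax outputs for three unlabeled images.
pool_predictions = [
    [0.98, 0.01, 0.01],  # confident prediction, little to gain from a label
    [0.34, 0.33, 0.33],  # near-uniform prediction, highly uncertain
    [0.70, 0.20, 0.10],  # moderately uncertain
]
chosen = select_for_labeling(pool_predictions, budget=2)
```

In a full loop, the chosen samples would be sent to a human annotator, added to the labeled set, and the model retrained before the next selection round.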