Natural language image description: data, models, and evaluation
Automatically describing an image with a concise natural language description
is an ambitious and emerging task that brings together the Natural Language
Processing and Computer Vision communities. As with any emerging task, the
necessary groundwork of developing appropriate datasets, strong baseline models,
and evaluation frameworks is key. In this thesis, we introduce the first
large datasets specifically designed with image description in mind, focusing
on concrete descriptions that can be gleaned from the image alone. Furthermore,
we develop strong baseline models that show the need to model
language beyond a simple bag-of-words approach to increase performance.
Most importantly, we introduce a ranking-based framework for comparing
image description models. We show that this framework is more reliable and
accurate than the conventional practice of evaluating novel, model-generated
text. As this task has recently gained popularity, we further analyze
the drawbacks of current evaluation methods and put forth concrete extensions
to our ranking framework that will guide progress towards modeling
the association between natural language and the images it describes.
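To make the idea of ranking-based evaluation concrete, here is a minimal sketch, assuming a model that assigns a compatibility score to every (image, caption) pair and a test set in which caption i is the gold description of image i. It reports Recall@K and the median rank of the gold caption, which are common metrics for this style of evaluation; the function name `ranking_metrics` and the toy score matrix are hypothetical and are not taken from the thesis.

```python
from statistics import median


def ranking_metrics(scores, ks=(1, 5, 10)):
    """Rank candidate captions for each image (hypothetical helper).

    scores[i][j] is an assumed model score between image i and caption j;
    caption i is assumed to be the gold description of image i.
    Returns ({K: Recall@K}, median rank of the gold caption).
    """
    n = len(scores)
    ranks = []
    for i, row in enumerate(scores):
        gold = row[i]
        # Rank 1 means the gold caption scored highest. Ties count against
        # the gold caption, so the estimate is conservative.
        rank = 1 + sum(1 for j, s in enumerate(row) if j != i and s >= gold)
        ranks.append(rank)
    recall = {k: sum(r <= k for r in ranks) / n for k in ks}
    return recall, median(ranks)


# Toy 3x3 score matrix: rows are images, columns are candidate captions.
scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
recall, med = ranking_metrics(scores, ks=(1, 2))
print(recall, med)
```

Scoring fixed candidate captions avoids having to judge novel generated text directly, which is the trade-off the abstract argues for.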
Machine Learning and Irresponsible Inference: Morally Assessing the Training Data for Image Recognition Systems
Just as humans can draw conclusions responsibly or irresponsibly, so too can computers. Machine learning systems that have been trained on data sets that include irresponsible judgments are likely to yield irresponsible predictions as outputs. In this paper I focus on a particular kind of inference a computer system might make: identification of the intentions with which a person acted on the basis of photographic evidence. Such inferences are liable to be morally objectionable, because of a way in which they are presumptuous. After elaborating this moral concern, I explore the possibility that carefully procuring the training data for image recognition systems could ensure that the systems avoid the problem. The lesson of this paper extends beyond the particular case of image recognition systems and the challenge of responsibly identifying a person’s intentions. Reflection on this particular case demonstrates the importance (as well as the difficulty) of evaluating machine learning systems and their training data from the standpoint of moral considerations that are not encompassed by ordinary assessments of predictive accuracy.
Repeatable Reverse Engineering for the Greater Good with PANDA
We present PANDA, an open-source tool that has
been purpose-built to support whole-system reverse engineering.
It is built upon the QEMU whole-system emulator, so analyses
have access to all code executing in the guest and all data.
PANDA adds the ability to record and replay executions, enabling
iterative, deep, whole-system analyses. Further, the replay log files
are compact and shareable, allowing for repeatable experiments.
For example, a nine-billion-instruction boot of FreeBSD is represented
by only a few hundred MB. In addition, PANDA leverages QEMU's
support for thirteen different CPU architectures to make analyses
of those diverse instruction sets possible within LLVM IR. In
this way, PANDA can have a single dynamic taint analysis, for
example, that precisely supports many CPUs. PANDA analyses
are written as plugins in a simple architecture that includes a
mechanism to share functionality between plugins, increasing
analysis code reuse and simplifying complex analysis development.
We demonstrate PANDA's effectiveness via a number of
use cases, including enabling an old but legitimate version of
Starcraft to run despite a lost CD key, in-depth diagnosis of an
Internet Explorer crash, and uncovering the censorship activities
and mechanisms of a Chinese IM client.
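To illustrate why one analysis written against a shared intermediate representation can cover many CPUs, here is a minimal sketch of dynamic taint propagation over an executed instruction trace. The `Insn` format, the operation names (`load_input`, `mov`, `add`, `store_const`), and the labeling scheme are all hypothetical: this toy three-address IR stands in for the LLVM IR that the abstract describes, and the sketch does not reproduce PANDA's actual plugin interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """One instruction in a hypothetical three-address IR (not LLVM IR)."""
    op: str            # "load_input", "mov", "add", or "store_const"
    dst: str           # destination register or memory cell
    srcs: tuple = ()   # source registers or memory cells


def propagate_taint(trace):
    """Propagate taint labels along an executed instruction trace.

    Each location maps to a set of labels. The analysis only inspects the
    shared IR, so it would apply unchanged to any guest CPU whose code had
    been lifted to this IR (the core assumption of this sketch).
    """
    taint = {}
    for n, insn in enumerate(trace):
        if insn.op == "load_input":
            # Fresh label for data read from an input source.
            taint[insn.dst] = {f"input{n}"}
        elif insn.op == "store_const":
            # Overwriting with a constant clears any taint.
            taint.pop(insn.dst, None)
        else:
            # Data-flow ops (mov, add, ...): union of the sources' labels.
            labels = set()
            for src in insn.srcs:
                labels |= taint.get(src, set())
            if labels:
                taint[insn.dst] = labels
            else:
                taint.pop(insn.dst, None)
    return taint


# A hypothetical trace, e.g. as obtained by replaying a recorded execution.
trace = [
    Insn("load_input", "r1"),
    Insn("load_input", "r2"),
    Insn("add", "r3", ("r1", "r2")),
    Insn("mov", "mem[0x10]", ("r3",)),
    Insn("store_const", "r1"),
]
result = propagate_taint(trace)
print(result)
```

Because record and replay make the trace deterministic, an analysis like this can be rerun iteratively on the same execution, which is the workflow the abstract describes.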
The role of image representations in vision to language tasks
Tasks that require modeling both language and visual information, such as image captioning, have become very popular in recent years. Most state-of-the-art approaches use image representations obtained from a deep neural network, which end-to-end neural models then use to generate language in a variety of ways. However, it is not clear how different image representations contribute to language generation tasks. In this paper, we probe the representational contribution of the image features in an end-to-end neural modeling framework and study the properties of different types of image representations. We focus on two popular vision-to-language problems: image captioning and multimodal machine translation. Our analysis provides insights into these representational properties and suggests that end-to-end approaches implicitly learn a visual-semantic subspace and exploit it to generate captions.
Revascularization of the Periodontium After Tooth Grafting in Monkeys
In replanted and homotransplanted teeth, a vascular network developed in the blood clot between the two parts of the torn periodontium, which allowed the grafted ligament to regain its vascularity. When dentoalveolar ankylosis developed, the periodontal vasculature was split into a number of vascular clusters. In homotransplants, no definite cellular immunologic response by the host was observed. An acrylic radicular obturator was used.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/67229/2/10.1177_00220345710500025101.pd
Interactive-predictive neural multimodal systems
Despite the advances achieved by neural sequence-to-sequence models,
which are exploited in a variety of tasks, these models still make errors.
In many use cases, the errors are corrected by a human expert in a subsequent
revision process. The interactive-predictive framework aims to minimize
the human effort spent on this process by using partial corrections
to iteratively refine the hypothesis. In this work, we generalize the
interactive-predictive approach, typically applied in the machine translation field, to tackle other multimodal problems, namely image and video
captioning. We study the application of this framework to multimodal
neural sequence-to-sequence models. We show that, following this framework, we approximately halve the effort spent correcting the outputs
generated by the automatic systems. Moreover, we deploy our systems
in a publicly accessible demonstration, which allows users to better understand
the behavior of the interactive-predictive framework.
The research leading to these results has received funding from MINECO under grant
IDIFEDER/2018/025 Sistemas de fabricacion inteligentes para la industria 4.0 (Intelligent manufacturing systems for Industry 4.0),
action co-funded by the European Regional Development Fund 2014-2020 (FEDER),
and from the European Commission under grant H2020, reference 825111 (DeepHealth). We also acknowledge NVIDIA Corporation for the donation of GPUs used
in this work.
Peris, Á.; Casacuberta Nolla, F. (2019). Interactive-predictive neural multimodal systems. Springer. 16-28. https://doi.org/978-3-030-31332-6_2
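The prefix-based interactive-predictive loop can be sketched with a simulated user, as is common when measuring effort in this line of work. This is a minimal sketch under stated assumptions: the simulated user compares the hypothesis against a reference caption, keeps the longest correct prefix, and types the first wrong word; the system then regenerates the rest. The `complete` function is a hypothetical bigram-table stand-in for a neural decoder constrained to a prefix, and effort is reported as word corrections and the word-stroke ratio (corrections divided by reference length), one effort measure used in the interactive-predictive literature. None of these names come from the paper.

```python
def complete(prefix, successors, max_len=20):
    """Hypothetical toy model: extend `prefix` greedily using a bigram table.

    Stands in for a neural decoder forced to start with `prefix`.
    """
    words = list(prefix)
    prev = words[-1] if words else "<s>"
    while len(words) < max_len:
        nxt = successors.get(prev)
        if nxt is None:  # no known continuation: end the sentence
            break
        words.append(nxt)
        prev = nxt
    return words


def simulate_session(reference, successors):
    """Prefix-based interactive-predictive loop with a simulated user.

    Returns (number of word corrections, word-stroke ratio).
    """
    corrections = 0
    hyp = complete([], successors)
    while hyp != reference:
        # Length of the prefix on which hypothesis and reference agree.
        i = 0
        while i < min(len(hyp), len(reference)) and hyp[i] == reference[i]:
            i += 1
        corrections += 1
        if i == len(reference):
            # Hypothesis is correct but too long: the user marks the end.
            hyp = hyp[:i]
        else:
            # User validates reference[:i] and types the correct word i;
            # the model regenerates the suffix after the new prefix.
            hyp = complete(reference[:i + 1], successors)
    return corrections, corrections / max(len(reference), 1)


# Hypothetical bigram table and reference caption.
successors = {"<s>": "a", "a": "dog", "dog": "runs", "runs": "on",
              "on": "grass", "man": "rides", "rides": "a"}
reference = ["a", "man", "rides", "a", "horse"]
corrections, wsr = simulate_session(reference, successors)
print(corrections, wsr)
```

Here the initial hypothesis "a dog runs on grass" needs only two typed words to reach the reference, because each correction lets the model re-predict the remaining words instead of requiring the user to edit them one by one.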
A prospective clinical trial on the influence of a triamcinolone/demeclocycline and a calcium hydroxide based temporary cement on pain perception
Introduction: The aim of this clinical trial was to compare the degree of short-term post-operative irritation after application of a triamcinolone/demeclocycline-based or a calcium hydroxide-based provisional cement.
Methods: A total of 109 patients (55 female and 54 male; mean age: 51 ± 14 years) with primary or secondary dentinal caries were randomly assigned to the two treatment groups of this biomedical clinical trial (phase III). Selection criteria were good systemic health and vital teeth showing no symptoms of pulpitis. Up to three teeth per patient were prepared for indirect metallic restorations, and the provisional restorations were cemented with a triamcinolone/demeclocycline-based (Ledermix) or a calcium hydroxide-based (Provicol) material. The intensity of post-operative pain was recorded on a visual analogue scale (VAS) at 4, 12, 20, 24, and 82 h and compared with the VAS baseline.
Results: A total of 159 teeth were treated (Ledermix: 83 teeth, Provicol: 76 teeth). The minor irritation of the teeth experienced prior to treatment was similar in both groups; however, 4 h after treatment, irritation was significantly higher in the Provicol group than in the Ledermix group (p < 0.005, t-test). After 12 h, the difference was no longer significant. The number of patients taking analgesics for post-treatment pain was higher in the Provicol group (n = 11/53) than in the Ledermix group (n = 3/56).
Conclusions: Patients in neither group experienced long-term post-operative pain. However, within the first hours after cementation, the sensation of pain was considerably higher in the Provicol group than in the Ledermix group.
Assessing multilingual multimodal image description: Studies of native speaker preferences and translator choices
Two studies on multilingual multimodal image description provide empirical evidence
towards two hypotheses at the core of the task: (i) whether target language speakers prefer
descriptions generated directly in their native language, as compared to descriptions
translated from a different language; (ii) the role of the image in human translation of descriptions.
These results provide guidance for future work in multimodal natural language
processing by, first, showing that on the whole translations are not distinguished from
native-language descriptions and, second, delineating and quantifying the information
gained from the image during the human translation task.