
    Natural language image description: data, models, and evaluation

    Automatically describing an image with a concise natural language description is an ambitious and emerging task that brings together the Natural Language Processing and Computer Vision communities. As with any emerging task, the groundwork of developing appropriate datasets, strong baseline models, and evaluation frameworks is key. In this thesis, we introduce the first large datasets specifically designed with image description in mind, focusing on concrete descriptions that can be gleaned from the image alone. Furthermore, we develop strong baseline models that show the need to model language beyond a simple bag-of-words approach to increase performance. Most importantly, we introduce a ranking-based framework for comparing image description models. We show that this framework is more reliable and accurate than the conventional practice of evaluating novel, model-generated text. As the task has recently gained popularity, we further analyze the drawbacks of current evaluation methods and put forth concrete extensions to our ranking framework that will guide progress toward modeling the association between natural language and the images it describes.
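One common way to score a ranking-based evaluation is to have a model rank a pool of candidate captions for each image and then report Recall@K and the median rank of the gold caption. The sketch below assumes that setup. The choice of metrics, the score-matrix layout (row i holds scores for image i, with caption i as its gold caption), and the function names `caption_ranks`, `recall_at_k`, and `ranking_report` are illustrative assumptions, not code from the thesis.

```python
from statistics import median
from typing import Dict, List, Sequence


def caption_ranks(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the 1-based rank of the gold caption for each image.

    Assumption: scores[i][j] is a model's association score between image i
    and candidate caption j, and caption i is the gold caption for image i.
    Ties are broken pessimistically (a tied candidate counts as ranked above gold).
    """
    ranks = []
    for i, row in enumerate(scores):
        gold = row[i]
        # Count the non-gold candidates that score at least as high as the gold caption.
        above = sum(1 for j, s in enumerate(row) if j != i and s >= gold)
        ranks.append(above + 1)
    return ranks


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of images whose gold caption appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def ranking_report(scores: Sequence[Sequence[float]], ks=(1, 5, 10)) -> Dict[str, float]:
    """Summarize a score matrix with Recall@K for each k and the median gold rank."""
    ranks = caption_ranks(scores)
    report: Dict[str, float] = {f"R@{k}": recall_at_k(ranks, k) for k in ks}
    report["median_rank"] = median(ranks)
    return report


# Toy 3x3 score matrix (illustrative values only, not data from the thesis).
scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
print(caption_ranks(scores))  # [1, 2, 1]
print(ranking_report(scores))
```

Because every model is scored on the same fixed pool of human-written captions, the comparison does not depend on judging novel generated text, which is the property the abstract emphasizes.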

    Machine Learning and Irresponsible Inference: Morally Assessing the Training Data for Image Recognition Systems

    Just as humans can draw conclusions responsibly or irresponsibly, so too can computers. Machine learning systems trained on data sets that include irresponsible judgments are likely to yield irresponsible predictions. In this paper I focus on a particular kind of inference a computer system might make: identifying the intentions with which a person acted on the basis of photographic evidence. Such inferences are liable to be morally objectionable because they are presumptuous in a particular way. After elaborating this moral concern, I explore whether carefully procuring the training data for image recognition systems could ensure that the systems avoid the problem. The lesson of this paper extends beyond the particular case of image recognition systems and the challenge of responsibly identifying a person’s intentions. Reflection on this case demonstrates the importance (as well as the difficulty) of evaluating machine learning systems and their training data from the standpoint of moral considerations that are not captured by ordinary assessments of predictive accuracy.

    The role of image representations in vision to language tasks

    Tasks that require modeling both language and visual information, such as image captioning, have become very popular in recent years. Most state-of-the-art approaches use image representations obtained from a deep neural network, which end-to-end neural models then use to generate language in a variety of ways. However, it is not clear how different image representations contribute to language generation tasks. In this paper, we probe the representational contribution of the image features in an end-to-end neural modeling framework and study the properties of different types of image representations. We focus on two popular vision-to-language problems: image captioning and multimodal machine translation. Our analysis offers insight into the representational properties of these features and suggests that end-to-end approaches implicitly learn a visual-semantic subspace and exploit it to generate captions.

    Revascularization of the Periodontium After Tooth Grafting in Monkeys

    In replanted and homotransplanted teeth, a vascular network developed in the blood clot between the two parts of the torn periodontium, which allowed the grafted ligament to regain its vascularity. When dentoalveolar ankylosis developed, the periodontal vasculature was split into a number of vascular clusters. In homotransplants, the host showed no definite cellular immunologic response. An acrylic radicular obturator was used.
    Peer Reviewed: http://deepblue.lib.umich.edu/bitstream/2027.42/67229/2/10.1177_00220345710500025101.pd

    Interactive-predictive neural multimodal systems

    Despite the advances achieved by neural models in sequence-to-sequence learning, which are exploited in a variety of tasks, these models still make errors. In many use cases, the errors are corrected by a human expert in a subsequent revision process. The interactive-predictive framework aims to minimize the human effort spent on this process by using partial corrections to iteratively refine the hypothesis. In this work, we generalize the interactive-predictive approach, typically applied to machine translation, to other multimodal problems, namely image and video captioning. We study the application of this framework to multimodal neural sequence-to-sequence models. We show that, following this framework, we approximately halve the effort spent correcting the outputs generated by the automatic systems. Moreover, we deploy our systems in a publicly accessible demonstration, which helps to better understand the behavior of the interactive-predictive framework.
    The research leading to these results has received funding from MINECO under grant IDIFEDER/2018/025 Sistemas de fabricacion inteligentes para la industria 4.0 (intelligent manufacturing systems for Industry 4.0), an action co-funded by the European Regional Development Fund 2014-2020 (FEDER), and from the European Commission under grant H2020, reference 825111 (DeepHealth). We also acknowledge NVIDIA Corporation for the donation of GPUs used in this work.
    Peris, Á.; Casacuberta Nolla, F. (2019). Interactive-predictive neural multimodal systems. Springer. 16-28. https://doi.org/978-3-030-31332-6_2
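The interactive-predictive loop described in this abstract can be simulated with a user who always validates the longest correct prefix of the current hypothesis and then types the next correct word. The system responds with a new hypothesis that extends that prefix. This is a minimal sketch under stated assumptions. It uses a word-level, prefix-based protocol and measures effort as a word stroke ratio, defined here as corrections divided by reference length. `simulate_session`, `word_stroke_ratio`, `make_toy_system`, and the toy "system" (which simply reuses its original guess for the unvalidated suffix) are hypothetical stand-ins for a neural captioner, not the authors' implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a "system" maps a validated prefix to a full hypothesis
# that starts with that prefix.
Completer = Callable[[List[str]], List[str]]


def simulate_session(reference: Sequence[str], complete: Completer) -> int:
    """Count the word-level corrections a simulated user makes (assumed protocol)."""
    reference = list(reference)
    prefix: List[str] = []
    corrections = 0
    while True:
        hyp = complete(prefix)
        # Find the first position after the validated prefix where hyp diverges.
        pos = len(prefix)
        while pos < len(hyp) and pos < len(reference) and hyp[pos] == reference[pos]:
            pos += 1
        if pos == len(reference) == len(hyp):
            return corrections  # hypothesis now matches the reference exactly
        corrections += 1
        if pos == len(reference):
            return corrections  # hypothesis too long: user marks the end of the sentence
        # User validates hyp[:pos] and types the correct next word.
        prefix = reference[: pos + 1]


def word_stroke_ratio(reference: Sequence[str], complete: Completer) -> float:
    """Corrections per reference word (reference assumed non-empty)."""
    return simulate_session(reference, complete) / len(reference)


def make_toy_system(guess: Sequence[str]) -> Completer:
    """Hypothetical system: keep the validated prefix, then reuse its original guess."""
    def complete(prefix: List[str]) -> List[str]:
        return list(prefix) + list(guess[len(prefix):])
    return complete


reference = "a dog runs on the grass".split()
system = make_toy_system("a cat runs on grass".split())
print(simulate_session(reference, system))   # 3
print(word_stroke_ratio(reference, system))  # 0.5
```

A real system would re-decode the suffix conditioned on each correction, which is what lets the framework reduce effort compared with fixing every error by hand. The toy completer never improves its suffix, so it only exercises the bookkeeping.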

    A prospective clinical trial on the influence of a triamcinolone/demeclocycline and a calcium hydroxide based temporary cement on pain perception

    Introduction: The aim of this clinical trial was to compare the degree of short-term post-operative irritation after application of a triamcinolone/demeclocycline-based or a calcium hydroxide-based provisional cement.
    Methods: A total of 109 patients (55 female and 54 male; mean age: 51 ± 14 years) with primary or secondary dentinal caries were randomly assigned to the two treatment groups of this phase III biomedical clinical trial. Selection criteria were good systemic health and treated teeth that were vital and showed no symptoms of pulpitis. Up to three teeth per patient were prepared for indirect metallic restorations, and the provisional restorations were cemented with a triamcinolone/demeclocycline-based (Ledermix) or a calcium hydroxide-based (Provicol) material. The intensity of post-operative pain was recorded on a visual analogue scale (VAS) at 4, 12, 20, 24, and 82 h and compared with the VAS baseline.
    Results: A total of 159 teeth were treated (Ledermix: 83 teeth; Provicol: 76 teeth). The minor irritation experienced before treatment was similar in both groups; however, 4 h after treatment it was significantly higher in the Provicol group than in the Ledermix group (p < 0.005, t-test). After 12 h, the difference was no longer significant. More patients took analgesics for post-treatment pain in the Provicol group (n = 11/53) than in the Ledermix group (n = 3/56).
    Conclusions: Neither group experienced long-term post-operative pain. However, within the first hours after cementation, pain was considerably higher in the Provicol group than in the Ledermix group.

    Assessing multilingual multimodal image description: Studies of native speaker preferences and translator choices

    Two studies on multilingual multimodal image description provide empirical evidence on two hypotheses at the core of the task: (i) whether target-language speakers prefer descriptions generated directly in their native language over descriptions translated from a different language; and (ii) the role of the image in the human translation of descriptions. These results provide guidance for future work in multimodal natural language processing, first by showing that, on the whole, translations are not distinguished from native-language descriptions, and second by delineating and quantifying the information gained from the image during the human translation task.