982 research outputs found

    Reinforcing an Image Caption Generator Using Off-Line Human Feedback

    Human ratings are currently the most accurate way to assess the quality of an image captioning model, yet most often the only outcome used from an expensive human rating evaluation is a few overall statistics over the evaluation dataset. In this paper, we show that the signal from instance-level human caption ratings can be leveraged to improve captioning models, even when the amount of caption ratings is several orders of magnitude smaller than the caption training data. We employ a policy gradient method to maximize the human ratings as rewards in an off-policy reinforcement learning setting, where policy gradients are estimated by samples from a distribution that focuses on the captions in a caption ratings dataset. Our empirical evidence indicates that the proposed method learns to generalize the human raters' judgments to a previously unseen set of images, as judged by a different set of human judges, and additionally on a different, multi-dimensional side-by-side human evaluation procedure. Comment: AAAI 202
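    The off-policy policy-gradient idea described above can be sketched as importance-weighted REINFORCE over a small set of rated candidates. The toy softmax "captioning policy", the candidate set, the ratings, and the uniform behavior distribution below are all illustrative assumptions, not the paper's actual model:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    K = 4                                       # candidate captions per image (toy)
    theta = np.zeros(K)                         # policy logits
    ratings = np.array([0.2, 0.9, 0.5, 0.1])    # instance-level human ratings as rewards
    q = np.full(K, 1.0 / K)                     # behavior distribution over the *rated* captions

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    lr = 0.3
    for _ in range(500):
        pi = softmax(theta)
        a = rng.choice(K, p=q)                  # sample from the rated-caption distribution, not from pi
        w = pi[a] / q[a]                        # importance weight corrects for off-policy sampling
        grad_logp = -pi                         # gradient of log pi(a) w.r.t. logits ...
        grad_logp[a] += 1.0                     # ... is e_a - pi for a softmax policy
        theta += lr * w * ratings[a] * grad_logp

    pi = softmax(theta)
    print(pi.argmax())                          # the policy concentrates on the highest-rated caption
    ```

    In expectation the importance weight makes this exactly the on-policy gradient of the mean human rating, so the policy drifts toward the best-rated caption even though it never samples from itself.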

    Adversarial content manipulation for analyzing and improving model robustness

    The recent rapid progress in machine learning systems has opened up many real-world applications --- from recommendation engines on web platforms to safety-critical systems like autonomous vehicles. A model deployed in the real world will often encounter inputs far from its training distribution. For example, a self-driving car might come across a black stop sign in the wild. To ensure safe operation, it is vital to quantify the robustness of machine learning models to such out-of-distribution data before releasing them into the real world. However, the standard paradigm of benchmarking machine learning models with fixed-size test sets drawn from the same distribution as the training data is insufficient to identify these corner cases efficiently. In principle, if we could generate all valid variations of an input and measure the model response, we could quantify and guarantee model robustness locally. Yet, doing this with real-world data is not scalable. In this thesis, we propose an alternative: using generative models to create synthetic data variations at scale and test the robustness of target models to these variations. We explore methods to generate semantic data variations in a controlled fashion across visual and text modalities. We build generative models capable of performing controlled manipulation of data, like changing visual context, editing the appearance of an object in images, or changing the writing style of text. Leveraging these generative models, we propose tools to study the robustness of computer vision systems to input variations and systematically identify failure modes. In the text domain, we deploy these generative models to improve the diversity of image captioning systems and perform writing style manipulation to obfuscate private attributes of the user. Our studies quantifying model robustness explore two kinds of input manipulations: model-agnostic and model-targeted.
The model-agnostic manipulations leverage human knowledge to choose the kinds of changes without considering the target model being tested. This includes automatically editing images to remove objects not directly relevant to the task and to create variations in visual context. Alternatively, in the model-targeted approach, the input variations are directly adversarially guided by the target model. For example, we adversarially manipulate the appearance of an object in the image to fool an object detector, guided by the gradients of the detector. Using these methods, we measure and improve the robustness of various computer vision systems -- specifically image classification, segmentation, object detection and visual question answering systems -- to semantic input variations.
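The gradient-guided (model-targeted) manipulation described above can be sketched as an FGSM-style step against a stand-in model: perturb the input along the sign of the loss gradient so the target model's confidence drops. The tiny logistic "detector", its weights, and the input below are hypothetical placeholders for the real object detector:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.0, -2.0, 0.5])   # fixed "target model" weights (illustrative)
b = 0.1
x = np.array([0.3, -0.2, 0.8])   # clean input; true label y = 1
y = 1.0

p = sigmoid(w @ x + b)           # model confidence on the clean input

# Gradient of the cross-entropy loss w.r.t. the *input* is (p - y) * w;
# stepping along its sign increases the loss (a model-targeted manipulation).
grad_x = (p - y) * w
eps = 0.5
x_adv = x + eps * np.sign(grad_x)
p_adv = sigmoid(w @ x_adv + b)

print(p, p_adv)                  # confidence in the true class drops after the attack
```

The same principle scales to the thesis setting: replace the logistic model with a detector and backpropagate through it to the pixels (or to the parameters of an appearance model) instead of to a raw feature vector.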

    Capsule networks as recurrent models of grouping and segmentation

    Funding: AD was supported by the Swiss National Science Foundation grant n.176153 “Basics of visual processing: from elements to figures”. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Data Availability: The human data for experiment 2 and the full code to reproduce all our results are available here: https://github.com/adriendoerig/Capsule-networks-as-recurrent-models-of-grouping-and-segmentation. Peer reviewed. Publisher PD

    Towards Interaction-level Video Action Understanding

    A huge number of videos are created, spread, and viewed daily. Among these massive videos, the actions and activities of humans account for a large part. We want machines to understand human actions in videos, as this is essential to various applications, including but not limited to autonomous driving cars, security systems, human-robot interaction and healthcare. Towards a real intelligent system that is able to interact with humans, video understanding must go beyond simply answering "what is the action in the video", and be more aware of what those actions mean to humans and more in line with human thinking, which we call interaction-level action understanding. This thesis identifies three main challenges to approaching interaction-level video action understanding: 1) understanding actions given human consensus; 2) understanding actions based on specific human rules; 3) directly understanding actions in videos via human natural language. For the first challenge, we select video summarization as a representative task that aims to select informative frames to retain high-level information based on human annotators' experience. Through self-attention architecture and meta-learning, which jointly process dual representations of visual and sequential information for video summarization, the proposed model is capable of understanding video from human consensus (e.g., how humans think about which parts of an action sequence are essential). For the second challenge, our works on action quality assessment utilize transformer decoders to parse the input action into several sub-actions and assess the more fine-grained qualities of the given action, yielding the capability of action understanding given specific human rules (e.g., how well a diving action is performed, how well a robot performs surgery). The third key idea explored in this thesis is to use graph neural networks in an adversarial fashion to understand actions through natural language. We demonstrate the utility of this technique for the video captioning task, which takes an action video as input, outputs natural language, and yields state-of-the-art performance. It can be concluded that the research directions and methods introduced in this thesis provide fundamental components toward interaction-level action understanding.
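    The self-attention frame-scoring idea for video summarization mentioned above can be sketched in a few lines. The feature shapes, random projection weights, and the final linear scoring head are illustrative assumptions, not the thesis model:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    T, d = 6, 8                           # number of frames and feature dimension (toy)
    X = rng.normal(size=(T, d))           # per-frame visual features

    # Random query/key/value projections stand in for learned weights
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    # Scaled dot-product self-attention over the frame sequence
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # each row is a distribution over frames
    H = A @ V                                     # context-aware frame representations

    # Hypothetical scoring head: one importance score per frame,
    # then keep the top-3 frames as the summary
    w_out = rng.normal(size=d)
    frame_importance = H @ w_out
    keep = np.argsort(frame_importance)[-3:]
    print(sorted(keep.tolist()))
    ```

    In the actual system the projections and scoring head would be trained against human-annotated summaries, so the attention pattern reflects human consensus rather than random weights.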

    Human-Computer Interaction

    In this book the reader will find a collection of 31 papers presenting different facets of Human-Computer Interaction, the result of research projects and experiments as well as new approaches to designing user interfaces. The book is organized according to the following main topics, in sequential order: new interaction paradigms, multimodality, usability studies on several interaction mechanisms, human factors, universal design, and development methodologies and tools.

    Documenting Femininity: Body-Positivity and Female Empowerment on Instagram

    Drawing on participatory research, this study examines the body-positive community on Instagram in order to understand how social media platforms enable women to self-present outside of traditional gender norms and challenge dominant ideals of feminine beauty, including the demands to produce smooth skin, to adhere to body size norms, and to contain bodily fluids. Although women are experiencing embodiment and agency online, analysis revealed that body-positive members are simultaneously repeating dominant codes and reinforcing views of non-normative bodies. This was noticeable when analyzing the responses of the female viewers interviewed for the study, who reported visceral feelings of disgust towards the non-normative bodies on display.

    Adaptive presentation styles for dynamic hypermedia scripts

    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Architecture, 1991. Title as it appears in the M.I.T. Graduate List, Sept. 1991: Adaptable presentation styles in dynamic hypermedia scripts. Includes bibliographical references (leaves 62-65). By Michelle Ann Fineblum. M.S.

    ARTiVIS Arts, real-time video and interactivity for sustainability

    Dissertation for obtaining the degree of Doctor in Digital Media. Portuguese Foundation for Science and Technology (SFRH/BD/42555/2007)

    Image Based Social Media and The Tourist Gaze A Phenomenological Approach

    abstract: The emergence of social media in concert with improved camera and cell phone technologies has helped usher in an age of unprecedented visual communication which has radically changed the tourism industry worldwide. Serving as an important pillar of tourism and leisure studies, the concept of the tourist gaze has been left relatively unexamined within the context of this new visual world and, more specifically, image-based social media. This phenomenological inquiry sought to explore how image-based social media impacts the concept of the tourist gaze and, furthermore, to discover how the democratization of the gaze in concert with specific features of image-based social media applications impacts the hermeneutic circle of the tourist gaze. This in-depth analysis of the user experience within the context of travel consisted of 19 semi-structured photo elicitation interviews and incorporated 57 participant-generated photos. Six salient themes emerged from the study of this phenomenon: 1) sphere of influence, 2) exchange of information, 3) connections manifested, 4) impression management and content curation, 5) replicated travel photography, and 6) expectations. Analysis of these themes in conjunction with examples from the lived user experience demonstrates that the tourist gaze is being rapidly accelerated and expanded by image-based social media. Furthermore, democratization of the gaze as enabled by technological developments and specialized social media platforms is actively shifting the power role away from a small number of mass media influencers towards a larger number of branded individuals and social media influencers. Results of this inquiry support the theoretical assertions that the tourist gaze adapts to social and technological developments and demonstrate that the concept of the tourist gaze is increasingly important within tourism studies. Practical implications regarding the prevalence of real-time information, site visitation, and “taking only pictures” as sustainable touristic behavior are discussed. Dissertation/Thesis. Masters Thesis, Community Resources and Development, 201

    Formula electric : powertrain

    The Santa Clara Formula Electric team designed and manufactured a powertrain for an electric racecar according to the rules prescribed by the SAE International Formula Electric competition. The powertrain is divided into subsystems: the battery pack, battery pack cooling system, motor controller, and the motor. The battery pack was constructed, but full electrical connection of all cells was not made, and the pack was not integrated with the motor and motor controller. In addition, due to time constraints, extensive testing could not be completed.
