982 research outputs found

    Reinforcing an Image Caption Generator Using Off-Line Human Feedback

    Human ratings are currently the most accurate way to assess the quality of an image captioning model, yet most often the only outcome used from an expensive human rating evaluation is a few overall statistics over the evaluation dataset. In this paper, we show that the signal from instance-level human caption ratings can be leveraged to improve captioning models, even when the amount of caption ratings is several orders of magnitude smaller than the caption training data. We employ a policy gradient method to maximize the human ratings as rewards in an off-policy reinforcement learning setting, where policy gradients are estimated by samples from a distribution that focuses on the captions in a caption ratings dataset. Our empirical evidence indicates that the proposed method learns to generalize the human raters' judgments to a previously unseen set of images, as judged by a different set of human judges, and additionally on a different, multi-dimensional side-by-side human evaluation procedure. Comment: AAAI 202
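    The off-policy policy-gradient idea described above can be sketched as importance-weighted REINFORCE over a small set of rated candidates. The toy softmax "captioning policy", the candidate set, the ratings, and the uniform behavior distribution below are all illustrative assumptions, not the paper's actual model:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    K = 4                                       # candidate captions per image (toy)
    theta = np.zeros(K)                         # policy logits
    ratings = np.array([0.2, 0.9, 0.5, 0.1])    # instance-level human ratings as rewards
    q = np.full(K, 1.0 / K)                     # behavior distribution over the *rated* captions

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    lr = 0.3
    for _ in range(500):
        pi = softmax(theta)
        a = rng.choice(K, p=q)                  # sample from the rated-caption distribution, not from pi
        w = pi[a] / q[a]                        # importance weight corrects for off-policy sampling
        grad_logp = -pi                         # gradient of log pi(a) w.r.t. logits ...
        grad_logp[a] += 1.0                     # ... is e_a - pi for a softmax policy
        theta += lr * w * ratings[a] * grad_logp

    pi = softmax(theta)
    print(pi.argmax())                          # the policy concentrates on the highest-rated caption
    ```

    In expectation the importance weight makes this exactly the on-policy gradient of the mean human rating, so the policy drifts toward the best-rated caption even though it never samples from itself.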

    Adversarial content manipulation for analyzing and improving model robustness

    The recent rapid progress in machine learning systems has opened up many real-world applications --- from recommendation engines on web platforms to safety-critical systems like autonomous vehicles. A model deployed in the real world will often encounter inputs far from its training distribution. For example, a self-driving car might come across a black stop sign in the wild. To ensure safe operation, it is vital to quantify the robustness of machine learning models to such out-of-distribution data before releasing them into the real world. However, the standard paradigm of benchmarking machine learning models with fixed-size test sets drawn from the same distribution as the training data is insufficient to identify these corner cases efficiently. In principle, if we could generate all valid variations of an input and measure the model response, we could quantify and guarantee model robustness locally. Yet, doing this with real-world data is not scalable. In this thesis, we propose an alternative: using generative models to create synthetic data variations at scale and test the robustness of target models to these variations. We explore methods to generate semantic data variations in a controlled fashion across visual and text modalities. We build generative models capable of performing controlled manipulation of data, like changing visual context, editing the appearance of an object in images, or changing the writing style of text. Leveraging these generative models, we propose tools to study the robustness of computer vision systems to input variations and systematically identify failure modes. In the text domain, we deploy these generative models to improve the diversity of image captioning systems and perform writing style manipulation to obfuscate private attributes of the user. Our studies quantifying model robustness explore two kinds of input manipulations: model-agnostic and model-targeted.
The model-agnostic manipulations leverage human knowledge to choose the kinds of changes without considering the target model being tested. This includes automatically editing images to remove objects not directly relevant to the task and to create variations in visual context. Alternatively, in the model-targeted approach, the input variations are directly adversarially guided by the target model. For example, we adversarially manipulate the appearance of an object in the image to fool an object detector, guided by the gradients of the detector. Using these methods, we measure and improve the robustness of various computer vision systems -- specifically image classification, segmentation, object detection and visual question answering systems -- to semantic input variations.
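The gradient-guided (model-targeted) manipulation described above can be sketched as an FGSM-style step against a stand-in model: perturb the input along the sign of the loss gradient so the target model's confidence drops. The tiny logistic "detector", its weights, and the input below are hypothetical placeholders for the real object detector:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.0, -2.0, 0.5])   # fixed "target model" weights (illustrative)
b = 0.1
x = np.array([0.3, -0.2, 0.8])   # clean input; true label y = 1
y = 1.0

p = sigmoid(w @ x + b)           # model confidence on the clean input

# Gradient of the cross-entropy loss w.r.t. the *input* is (p - y) * w;
# stepping along its sign increases the loss (a model-targeted manipulation).
grad_x = (p - y) * w
eps = 0.5
x_adv = x + eps * np.sign(grad_x)
p_adv = sigmoid(w @ x_adv + b)

print(p, p_adv)                  # confidence in the true class drops after the attack
```

The same principle scales to the thesis setting: replace the logistic model with a detector and backpropagate through it to the pixels (or to the parameters of an appearance model) instead of to a raw feature vector.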

    Capsule networks as recurrent models of grouping and segmentation

    Funding: AD was supported by the Swiss National Science Foundation grant n.176153 “Basics of visual processing: from elements to figures”. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Data Availability: The human data for experiment 2 and the full code to reproduce all our results are available here: https://github.com/adriendoerig/Capsule-networks-as-recurrent-models-of-grouping-and-segmentation. Peer reviewed. Publisher PD

    Towards Interaction-level Video Action Understanding

    A huge number of videos are created, spread, and viewed daily. Among these massive videos, the actions and activities of humans account for a large part. We want machines to understand human actions in videos, as this is essential to various applications, including but not limited to autonomous driving cars, security systems, human-robot interaction and healthcare. Towards a real intelligent system that is able to interact with humans, video understanding must go beyond simply answering "what is the action in the video", and be more aware of what those actions mean to humans and more in line with human thinking, which we call interaction-level action understanding. This thesis identifies three main challenges to approaching interaction-level video action understanding: 1) understanding actions given human consensus; 2) understanding actions based on specific human rules; 3) directly understanding actions in videos via human natural language. For the first challenge, we select video summarization as a representative task that aims to select informative frames to retain high-level information based on human annotators' experience. Through self-attention architecture and meta-learning, which jointly process dual representations of visual and sequential information for video summarization, the proposed model is capable of understanding video from human consensus (e.g., how humans think about which parts of an action sequence are essential). For the second challenge, our works on action quality assessment utilize transformer decoders to parse the input action into several sub-actions and assess the more fine-grained qualities of the given action, yielding the capability of action understanding given specific human rules (e.g., how well a diving action is performed, how well a robot performs surgery). The third key idea explored in this thesis is to use graph neural networks in an adversarial fashion to understand actions through natural language. We demonstrate the utility of this technique for the video captioning task, which takes an action video as input, outputs natural language, and yields state-of-the-art performance. It can be concluded that the research directions and methods introduced in this thesis provide fundamental components toward interaction-level action understanding.
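    The self-attention frame-scoring idea for video summarization mentioned above can be sketched in a few lines. The feature shapes, random projection weights, and the final linear scoring head are illustrative assumptions, not the thesis model:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    T, d = 6, 8                           # number of frames and feature dimension (toy)
    X = rng.normal(size=(T, d))           # per-frame visual features

    # Random query/key/value projections stand in for learned weights
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    # Scaled dot-product self-attention over the frame sequence
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # each row is a distribution over frames
    H = A @ V                                     # context-aware frame representations

    # Hypothetical scoring head: one importance score per frame,
    # then keep the top-3 frames as the summary
    w_out = rng.normal(size=d)
    frame_importance = H @ w_out
    keep = np.argsort(frame_importance)[-3:]
    print(sorted(keep.tolist()))
    ```

    In the actual system the projections and scoring head would be trained against human-annotated summaries, so the attention pattern reflects human consensus rather than random weights.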

    Human-Computer Interaction

    In this book the reader will find a collection of 31 papers presenting different facets of Human-Computer Interaction, the result of research projects and experiments as well as new approaches to designing user interfaces. The book is organized according to the following main topics, in sequential order: new interaction paradigms, multimodality, usability studies on several interaction mechanisms, human factors, universal design, and development methodologies and tools.

    Documenting Femininity: Body-Positivity and Female Empowerment on Instagram

    Drawing on participatory research, this study examines the body-positive community on Instagram in order to understand how social media platforms enable women to self-present outside of traditional gender norms and challenge dominant ideals of feminine beauty, including the demands to produce smooth skin, to adhere to body size norms, and to contain bodily fluids. Although women are experiencing embodiment and agency online, analysis revealed that body-positive members are simultaneously repeating dominant codes and reinforcing views of non-normative bodies. This was noticeable when analyzing the responses of the female viewers interviewed for the study, who reported visceral feelings of disgust towards the non-normative bodies on display.

    Adaptive presentation styles for dynamic hypermedia scripts

    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Architecture, 1991. Title as it appears in the M.I.T. Graduate List, Sept. 1991: Adaptable presentation styles in dynamic hypermedia scripts. Includes bibliographical references (leaves 62-65). By Michelle Ann Fineblum. M.S.

    ARTiVIS Arts, real-time video and interactivity for sustainability

    Dissertation for obtaining the degree of Doctor in Digital Media. Portuguese Foundation for Science and Technology (SFRH/BD/42555/2007)

    Image Based Social Media and The Tourist Gaze A Phenomenological Approach

    abstract: The emergence of social media in concert with improved camera and cell phone technologies has helped usher in an age of unprecedented visual communication which has radically changed the tourism industry worldwide. Serving as an important pillar of tourism and leisure studies, the concept of the tourist gaze has been left relatively unexamined within the context of this new visual world and, more specifically, image-based social media. This phenomenological inquiry sought to explore how image-based social media impacts the concept of the tourist gaze and, furthermore, to discover how the democratization of the gaze in concert with specific features of image-based social media applications impacts the hermeneutic circle of the tourist gaze. This in-depth analysis of the user experience within the context of travel consisted of 19 semi-structured photo elicitation interviews and incorporated 57 participant-generated photos. Six salient themes emerged from the study of this phenomenon: 1) sphere of influence, 2) exchange of information, 3) connections manifested, 4) impression management and content curation, 5) replicated travel photography, and 6) expectations. Analysis of these themes in conjunction with examples from the lived user experience demonstrates that the tourist gaze is being rapidly accelerated and expanded by image-based social media. Furthermore, democratization of the gaze as enabled by technological developments and specialized social media platforms is actively shifting the power role away from a small number of mass media influencers towards a larger number of branded individuals and social media influencers. Results of this inquiry support the theoretical assertions that the tourist gaze adapts to social and technological developments and demonstrate that the concept of the tourist gaze is increasingly important within tourism studies. Practical implications regarding the prevalence of real-time information, site visitation, and “taking only pictures” as sustainable touristic behavior are discussed. Dissertation/Thesis. Masters Thesis, Community Resources and Development, 201

    Formula electric : powertrain

    The Santa Clara Formula Electric team designed and manufactured a powertrain for an electric racecar according to the rules prescribed by the SAE International Formula Electric competition. The powertrain is divided into subsystems: the battery pack, battery pack cooling system, motor controller, and the motor. The battery pack was constructed, but full electrical connection of all cells was not made, and the pack was not integrated with the motor and motor controller. In addition, due to time constraints, extensive testing could not be completed.
