
    Automatic generation of natural language descriptions of visual data: describing images and videos using recurrent and self-attentive models

    Humans are exposed to a constant flow of visual stimuli, e.g., from their environment or from social media. In contrast, visually impaired people are often unable to perceive and process this valuable information, which could help them navigate everyday situations and activities. However, audible feedback such as natural language can make them more aware of their surroundings and thus enable them to master everyday challenges autonomously. One way to create audible feedback is to generate natural language descriptions of visual data such as still images and then read this text aloud to the person. Moreover, textual descriptions of images can be further used for text analysis (e.g., sentiment analysis) and information aggregation. In this work, we investigate different approaches and techniques for automatically generating natural language descriptions of visual data such as still images and video clips. In particular, we look at language models that generate textual descriptions with recurrent neural networks. First, we present a model that generates image captions for scenes depicting interactions between humans and branded products. Here, we focus on correctly identifying the brand name in a multi-task training setting and present two new metrics to evaluate this requirement. Second, we explore automatically answering questions posed about an image. Specifically, we propose a model that generates answers from scratch instead of selecting an answer from a limited set of candidates. Unlike related work, our model is therefore able to generate rare answers that are not contained in the pool of frequent answers. Third, we review the automatic generation of doctors' reports for chest X-ray images. We introduce a model that copes with the dataset bias of medical datasets (i.e., abnormal cases are very rare) and generates reports with a hierarchical recurrent model. We also investigate the correlation between the distinctiveness of a report and its score on traditional metrics, and find a discrepancy between good scores and accurate reports. We then examine self-attentive language models, which improve computational efficiency and performance over the recurrent models; specifically, we use the Transformer architecture. First, we extend automatic description generation to the video domain, presenting a video-to-text (VTT) model that can easily synchronize audio-visual features. Through an extensive experimental exploration, we verify the effectiveness of our video-to-text translation pipeline. Finally, we revisit our recurrent models with this self-attentive approach.

    Improved Sequence Network for a Grid-Tied Current Controlled Inverter

    The development of equipment for harvesting renewable energy has led to an increase in the number of inverters connected to electric grids. The power electronic inverter is a key element for interfacing most renewables with the grid. Manufacturers often do not provide detailed schematics of the implemented inverter control scheme; however, current control mode is one of the most common control strategies in inverter design. The control of such inverters is designed assuming nominal operating conditions for the grid voltage, yet it is common to model a current-controlled inverter as a three-phase current source even under non-nominal conditions. Therefore, classical fault analysis tools, such as symmetrical components, need to account for the impact of unbalanced conditions on the control in order to accurately estimate the fault current expected from the power electronic unit. The contribution of this work is to study the behavior of a grid-tied current-controlled inverter when the grid experiences a single line-to-ground fault, and to analytically develop a sequence network model that takes into account the implemented control strategy and the nature of the fault. A PLECS simulation of a current-controlled inverter is performed to show that the new sequence network model, which accounts for the impact of the fault on the inverter's control system, represents inverter behavior more accurately than a sequence network developed under classical assumptions.
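    The abstract contrasts its model with classical symmetrical-component fault analysis. As a point of reference, the sketch below shows that classical computation only: the Fortescue transform and the textbook single line-to-ground connection, in which the zero, positive, and negative sequence networks are placed in series. It assumes phase a is faulted, all quantities are per-unit phasors, and the source behaves like a conventional voltage source behind sequence impedances; it is not the paper's improved inverter model, and the function names are ours.

```python
import cmath
import math

# Fortescue operator a = 1 at an angle of 120 degrees
A = cmath.exp(2j * math.pi / 3)


def to_sequence(ia, ib, ic):
    """Decompose phase phasors into (zero, positive, negative) sequence components."""
    i0 = (ia + ib + ic) / 3
    i1 = (ia + A * ib + A**2 * ic) / 3
    i2 = (ia + A**2 * ib + A * ic) / 3
    return i0, i1, i2


def from_sequence(i0, i1, i2):
    """Rebuild phase phasors (a, b, c) from sequence components (abc rotation)."""
    ia = i0 + i1 + i2
    ib = i0 + A**2 * i1 + A * i2
    ic = i0 + A * i1 + A**2 * i2
    return ia, ib, ic


def slg_fault_current(vf, z0, z1, z2, zf=0.0):
    """Classical phase-a-to-ground fault: the three sequence networks in series.

    vf is the prefault voltage; z0, z1, z2 are the sequence impedances seen
    from the fault point; zf is the fault impedance. Returns phase currents.
    """
    i_seq = vf / (z0 + z1 + z2 + 3 * zf)  # I0 = I1 = I2 for an SLG fault
    return from_sequence(i_seq, i_seq, i_seq)
```

For example, with Vf = 1 pu, Z0 = j0.3, Z1 = Z2 = j0.1 and a bolted fault, each sequence current is -j2 pu, the faulted-phase current is 3·I0 = -j6 pu, and phases b and c carry (numerically) zero. The paper's argument is precisely that a current-controlled inverter does not behave like this source model, so its sequence network must reflect the control strategy.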

    Synchronized audio-visual frames with fractional positional encoding for transformers in video-to-text translation

    Video-to-Text (VTT) is the task of automatically generating descriptions for short audio-visual video clips, which can, for instance, help visually impaired people understand the scenes of a YouTube video. Transformer architectures have shown great performance in both machine translation and image captioning, but a straightforward and reproducible application to VTT is lacking. Moreover, there is no comprehensive study of different strategies and recommendations for video description generation, including exploiting the accompanying audio with fully self-attentive networks. We therefore explore promising approaches from image captioning and video processing and apply them to VTT by developing a straightforward Transformer architecture. Additionally, we present a novel way of synchronizing audio and video features in Transformers, which we call Fractional Positional Encoding (FPE). We run multiple experiments on the VATEX dataset to determine a configuration that is applicable to unseen datasets and helps describe short video clips in natural language. This configuration improves the CIDEr and BLEU-4 scores by 37.13 and 12.83 points, respectively, compared to a vanilla Transformer network, and achieves state-of-the-art results on the MSR-VTT and MSVD datasets. In addition, FPE increases the CIDEr score by a relative 8.6%.
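    The abstract names Fractional Positional Encoding without defining it. The sketch below illustrates one plausible reading, stated as an assumption: the standard sinusoidal Transformer encoding is evaluated at fractional positions so that audio and video frames covering the same instant of the clip receive the same positional code. The function names and the choice of the video frame count as the shared timeline are hypothetical, and the paper's exact formulation may differ.

```python
import math


def sinusoidal_encoding(position: float, d_model: int) -> list[float]:
    """Standard sinusoidal positional encoding, evaluated at a (possibly fractional) position.

    Even dimensions use sin(pos / 10000^(2i/d)), odd dimensions use cos of the same angle.
    """
    enc: list[float] = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc[:d_model]  # trim the extra cosine when d_model is odd


def fractional_positions(num_frames: int, num_slots: int) -> list[float]:
    """Spread num_frames evenly spaced frames over a shared timeline of num_slots positions.

    Hypothetical helper: a stream with more frames than slots gets fractional positions.
    """
    step = num_slots / num_frames
    return [k * step for k in range(num_frames)]


# Assumed setup: 8 video frames and 32 audio frames from the same clip,
# both mapped onto a timeline of 8 positions (the video frame count).
video_pos = fractional_positions(8, 8)   # 0, 1, 2, ...
audio_pos = fractional_positions(32, 8)  # 0, 0.25, 0.5, ...
video_codes = [sinusoidal_encoding(p, 16) for p in video_pos]
audio_codes = [sinusoidal_encoding(p, 16) for p in audio_pos]
```

Under this assumption, audio frame 4k lands on the same position as video frame k, so time-aligned features share an encoding without resampling either stream; the fractional audio positions in between still receive distinct, smoothly varying codes.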

    Multimodal Image Captioning for Marketing Analysis

    Automatically captioning images with natural language sentences is an important research topic. State-of-the-art models are able to produce human-like sentences. These models typically describe the depicted scene as a whole and do not target specific objects of interest or emotional relationships between these objects in the image. However, marketing companies need descriptions of exactly these attributes of a given scene. In our case, the objects of interest are consumer goods, which are usually identifiable by a product logo and are associated with certain brands. From a marketing point of view, it is also desirable to evaluate the emotional context of a trademarked product, i.e., whether it appears with a positive or a negative connotation. We address the problem of finding brands in images and deriving corresponding captions by introducing a modified image captioning network. We also add a third output modality, which simultaneously produces real-valued image ratings. Our network is trained using a classification-aware loss function to encourage the generation of sentences that emphasize words identifying the brand of a product. We evaluate our model on a dataset of images depicting interactions between humans and branded products. The introduced network improves mean class accuracy by 24.5 percent. Thanks to the third output modality, it also considerably improves the quality of generated captions for images depicting branded products. Comment: 4 pages, 1 figure, accepted at MIPR201
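    The abstract does not spell out the classification-aware loss. The following is a minimal sketch under our own assumptions, not the authors' formulation: a per-token caption cross-entropy in which tokens from a hypothetical set `brand_token_ids` are up-weighted, plus a brand-classification cross-entropy and a squared-error term for the real-valued rating output. The weights `brand_weight`, `lambda_cls`, and `lambda_reg` are likewise hypothetical.

```python
import math


def cross_entropy(probs: list[float], target: int) -> float:
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])


def multitask_loss(token_probs, target_tokens, brand_probs, brand_target,
                   pred_rating, true_rating, brand_token_ids,
                   brand_weight=2.0, lambda_cls=1.0, lambda_reg=1.0):
    """Hypothetical combined loss for caption, brand class, and rating outputs.

    token_probs[t] is the predicted distribution over the vocabulary at step t;
    target_tokens[t] is the ground-truth token index at step t.
    """
    # Caption term: tokens that name a brand contribute with a larger weight.
    caption = 0.0
    for probs, tok in zip(token_probs, target_tokens):
        weight = brand_weight if tok in brand_token_ids else 1.0
        caption += weight * cross_entropy(probs, tok)
    caption /= len(target_tokens)

    # Brand classification term (assumed auxiliary classifier head).
    cls = cross_entropy(brand_probs, brand_target)

    # Rating term for the third, real-valued output modality.
    reg = (pred_rating - true_rating) ** 2

    return caption + lambda_cls * cls + lambda_reg * reg
```

Up-weighting brand tokens is only one way to make a captioning loss "classification-aware"; a real implementation would operate on batched logits in a deep learning framework rather than on Python lists.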

    Colonialism, decolonisation, and the right to be human : Britain and the 1951 Geneva Convention on the status of refugees

    The Geneva Convention on the Status of Refugees is central to scholarship on refugee and asylum issues. It is the primary basis upon which asylum seekers make their claims to the majority of host states today and, as a key text of the human rights framework, has come to be associated with the very idea of a universalised rights-bearing human being. Yet British asylum policy today is characterized by efforts to limit access to the right to asylum. Many scholars believe this is because asylum seekers today are different, in character and number, from previous cohorts of applicants. This article goes back to the founding of the refugee rights regime and investigates the exclusion of colonized peoples from access to the right to asylum. Using Chimni’s concept of the “myth of difference”, the article demonstrates that asylum seekers have long existed outside of Europe, and that their exclusion from international rights has been both longstanding and intentional. This historical sociology suggests that critical work on asylum policy today must take colonial histories into account.

    Culture trends and organizational change: a case study

    Corporate image in relation to the environment is evident in the Deloitte (2018) report on human capital trends, which covers around 11,000 questionnaires completed by managers in 140 countries and by 150 leaders of Colombian companies; the report suggests that social capital is becoming as important as physical and financial capital. These aspects are related to corporate identity and, in turn, to culture and change management in the organization. Culture and change management have attracted considerable interest among those who lead organizations. Deloitte's 2017 studies focused on the relationship between culture and engagement as important employee factors; the results show that organizations' ability to address engagement and culture problems declined by 14% compared with the previous year. These data reveal the complexity of the workplace environment and highlight the importance of developing valid knowledge that guides academics and business leaders in addressing these issues appropriately. 1st edition

    Internationalizing Working-Class History since the 1970s: Challenges from Historiography, Archives, and the Web

    In this essay, the communication practices of labor migrants and their evolution from nineteenth-century print media to late twentieth-century electronic media provide the frame for a discussion of the limitations of national approaches to collection and interpretation. Multiple languages and knowledge of the cultures of origin are required, and cooperative library and research projects are necessary. Drawing on the Labor Newspaper Preservation Project, the essay argues that analysis of the bibliographic data alone, without examining the contents of the newspapers, revises current assumptions about processes of migration, acculturation, and internationalist class positions. The classic North American immigrant labor press came to an end in the 1970s. New migration patterns, such as the feminization of migration and mobility into domestic and caregiving work, together with new patterns of communication, led to the ascendancy of electronic publications. Electronic publications and global rather than hemispheric migration will require different collecting strategies. These publications, like their printed predecessors, provide a perspective on migrants that differs from ethnicity-based and state-side approaches. Human rights rather than class struggles, and migrant remittances rather than denationalization, are the themes; nongovernmental organizations (NGOs) rather than labor organizations are the publishers. Published or submitted for publication