Automatic generation of natural language descriptions of visual data: describing images and videos using recurrent and self-attentive models
Humans are faced with a constant flow of visual stimuli, e.g., from the environment or when looking at social media. In contrast, visually impaired people are often unable to perceive and process this valuable information, which could help them navigate everyday situations and activities. However, audible feedback such as natural language can make them more aware of their surroundings, enabling them to master everyday challenges autonomously. One way to create audible feedback is to produce natural language descriptions of visual data such as still images and then read this text to the person. Moreover, textual descriptions of images can be further utilized for text analysis (e.g., sentiment analysis) and information aggregation. In this work, we investigate different approaches and techniques for the automatic generation of natural language descriptions of visual data such as still images and video clips.
In particular, we look at language models that generate textual descriptions with recurrent neural networks. First, we present a model that generates image captions for scenes depicting interactions between humans and branded products. Here, we focus on the correct identification of the brand name in a multi-task training setting and present two new metrics to evaluate this requirement. Second, we explore the automatic answering of questions posed about an image. Specifically, we propose a model that generates answers from scratch instead of predicting an answer from a limited set of possible answers. In contrast to related work, we are therefore able to generate rare answers that are not contained in the pool of frequent answers. Third, we review the automatic generation of doctors' reports for chest X-ray images. We introduce a model that copes with the bias of medical datasets (i.e., abnormal cases are very rare) and generates reports with a hierarchical recurrent model. We also investigate the correlation between the distinctiveness of a report and its score on traditional metrics, and find a discrepancy between good scores and accurate reports.
Then, we examine self-attentive language models that improve computational efficiency and performance over the recurrent models. Specifically, we use the Transformer architecture. First, we extend automatic description generation to the domain of videos, where we present a video-to-text (VTT) model that can easily synchronize audio-visual features. Through an extensive experimental exploration, we verify the effectiveness of our video-to-text translation pipeline. Finally, we revisit our recurrent models with this self-attentive approach.
Improved Sequence Network for a Grid-Tied Current Controlled Inverter
The development of equipment for harvesting renewable energy has led to an increase in the number of inverters connected to electric grid architectures. The power electronic inverter is a key element for interfacing most renewables with the grid. Manufacturers often do not provide detailed schematics of the implemented inverter control scheme. However, current control mode is one of the most common control strategies in inverter design.
The control design of such inverters assumes nominal operating conditions for the grid voltage. Nevertheless, a current-controlled inverter is commonly modeled as a three-phase current source even under non-nominal conditions. Therefore, classical fault analysis tools, such as symmetrical components, need to account for the impact of unbalanced conditions on the control in order to accurately estimate the fault current expected from the power electronic unit. The contribution of this work is to study the behavior of a grid-tied current-controlled inverter when the grid experiences a single line-to-ground fault, and to analytically develop a sequence network model that takes into account the implemented control strategy and the nature of the fault. A PLECS simulation of a current-controlled inverter is performed to show that the new sequence network model, which accounts for the impact of the fault on the inverter's control system, is more representative of inverter behavior than a sequence network developed using classical assumptions.
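For reference, the "classical assumptions" baseline can be sketched with textbook symmetrical components: for a single line-to-ground fault on phase a, the positive-, negative-, and zero-sequence networks are connected in series, so I_a1 = I_a2 = I_a0 = V_f / (Z1 + Z2 + Z0 + 3 Z_f). The minimal sketch below computes only this classical result for a conventional (voltage-source) system; it is not the inverter-aware sequence network developed in the work, and the function name and per-unit values are illustrative assumptions.

```python
import cmath


def slg_fault_currents(v_prefault, z1, z2, z0, z_fault=0j):
    """Classical single line-to-ground fault on phase a (illustrative sketch).

    Sequence networks in series: I_a1 = I_a2 = I_a0 = V_f / (Z1 + Z2 + Z0 + 3 Z_f).
    Does NOT model a current-controlled inverter's control response.
    """
    i_seq = v_prefault / (z1 + z2 + z0 + 3 * z_fault)
    a = cmath.exp(2j * cmath.pi / 3)  # 120-degree rotation operator
    i0 = i1 = i2 = i_seq
    # Inverse symmetrical-component transform back to phase currents
    ia = i0 + i1 + i2
    ib = i0 + a * a * i1 + a * i2
    ic = i0 + a * i1 + a * a * i2
    return ia, ib, ic


# Hypothetical per-unit impedances for illustration only
ia, ib, ic = slg_fault_currents(1.0, 0.1j, 0.1j, 0.05j)
print(abs(ia), abs(ib), abs(ic))  # faulted phase carries 3 * I_a0; healthy phases ~0
```

The abstract's point is that a current-controlled inverter does not behave like the source implied by this calculation, which is why a control-aware sequence model is needed.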
Synchronized audio-visual frames with fractional positional encoding for transformers in video-to-text translation
Video-to-Text (VTT) is the task of automatically generating descriptions for
short audio-visual video clips, which can help visually impaired people
understand scenes of a YouTube video, for instance. Transformer architectures
have shown great performance in both machine translation and image captioning,
but a straightforward and reproducible application to VTT is still lacking.
Moreover, there is no comprehensive study of different strategies and
recommendations for video description generation, including exploiting the
accompanying audio with fully self-attentive networks. Thus, we explore
promising approaches from image captioning and video processing and apply them
to VTT by developing a straightforward Transformer architecture. Additionally,
we present a novel way of synchronizing audio and video features in
Transformers, which we call Fractional Positional Encoding (FPE). We run
multiple experiments on the VATEX dataset to determine a configuration
applicable to unseen datasets that helps describe short video clips in natural
language. We thereby improve the CIDEr and BLEU-4 scores by 37.13 and 12.83
points, respectively, compared to a vanilla Transformer network, and achieve
state-of-the-art results on the MSR-VTT and MSVD datasets. In addition, FPE
increases the CIDEr score by a relative 8.6%.
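The abstract does not give the FPE formula. The sketch below shows one plausible reading, assuming standard sinusoidal Transformer encodings and assuming that the audio and video streams span the same clip: video frames keep integer positions, while audio frames are placed at fractional positions j * n_video / n_audio on the same timeline, so features from the same instant receive the same encoding. The function names and frame counts are hypothetical.

```python
import numpy as np


def sinusoidal_encoding(positions, d_model):
    """Sinusoidal positional encoding (original Transformer form),
    evaluated at arbitrary real-valued positions. d_model must be even."""
    pos = np.asarray(positions, dtype=np.float64)[:, None]   # (n, 1)
    i = np.arange(0, d_model, 2, dtype=np.float64)[None, :]  # (1, d_model/2), i.e. 2k
    angles = pos / np.power(10000.0, i / d_model)            # (n, d_model/2)
    enc = np.empty((pos.shape[0], d_model))
    enc[:, 0::2] = np.sin(angles)  # even dimensions
    enc[:, 1::2] = np.cos(angles)  # odd dimensions
    return enc


def synchronized_encodings(n_video, n_audio, d_model):
    """Hypothetical FPE-style alignment (assumption, not the paper's code):
    audio frame j gets fractional position j * n_video / n_audio, so it lines
    up with the video frame covering the same moment of the clip."""
    video_pos = np.arange(n_video, dtype=np.float64)
    audio_pos = np.arange(n_audio, dtype=np.float64) * (n_video / n_audio)
    return (sinusoidal_encoding(video_pos, d_model),
            sinusoidal_encoding(audio_pos, d_model))


video_enc, audio_enc = synchronized_encodings(n_video=32, n_audio=128, d_model=16)
# Audio frame 4 sits at position 1.0, the same instant as video frame 1.
print(np.allclose(audio_enc[4], video_enc[1]))
```

The design choice illustrated here is that the encoding function itself is unchanged; only the positions fed to it become fractional, which lets streams with different frame rates share one positional scale.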
Multimodal Image Captioning for Marketing Analysis
Automatically captioning images with natural language sentences is an
important research topic. State-of-the-art models are able to produce
human-like sentences. These models typically describe the depicted scene as a
whole and do not target specific objects of interest or emotional relationships
between these objects in the image. However, marketing companies need these
important attributes of a given scene to be described. In our case, objects of
interest are consumer goods, which are usually identifiable by a product logo
and are associated with certain brands. From a marketing point of view, it is
desirable to also evaluate the emotional context of a trademarked product,
i.e., whether it appears in a positive or a negative connotation. We address
the problem of finding brands in images and deriving corresponding captions by
introducing a modified image captioning network. We also add a third output
modality, which simultaneously produces real-valued image ratings. Our network
is trained using a classification-aware loss function in order to stimulate the
generation of sentences with an emphasis on words identifying the brand of a
product. We evaluate our model on a dataset of images depicting interactions
between humans and branded products. The introduced network improves mean class
accuracy by 24.5 percent. Thanks to adding the third output modality, it also
considerably improves the quality of generated captions for images depicting
branded products.
Comment: 4 pages, 1 figure, accepted at MIPR201
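Mean class accuracy, the metric reported above, averages the accuracy obtained within each class, so a rarely depicted brand weighs as much as a frequent one. The abstract does not spell out how brand labels are extracted from images or captions; the minimal sketch below only shows the standard definition, assuming one true and one predicted brand label per image. The brand names are hypothetical.

```python
from collections import defaultdict


def mean_class_accuracy(y_true, y_pred):
    """Average of per-class accuracies (per-class recall), assuming one
    true and one predicted label per sample."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # Each class contributes equally, regardless of how many samples it has
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical brand labels: a model that always predicts the majority brand
y_true = ["cola", "cola", "cola", "chips"]
y_pred = ["cola", "cola", "cola", "cola"]
print(mean_class_accuracy(y_true, y_pred))  # 0.5, whereas plain accuracy is 0.75
```

This is why the metric suits brand identification: always naming the most frequent brand scores well on plain accuracy but poorly on mean class accuracy.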
Colonialism, decolonisation, and the right to be human: Britain and the 1951 Geneva Convention on the status of refugees
The Geneva Convention on the Status of Refugees is central to scholarship on refugee and asylum issues. It is the primary basis upon which asylum seekers make their claims to the majority of host states today and, as a key text of the human rights framework, has come to be associated with the very idea of a universalised rights-bearing human being. Yet British asylum policy today is characterized by efforts to limit access to the right to asylum. Many scholars believe this is because asylum seekers today are different, in character and number, to previous cohorts of applicants. This article goes back to the founding of the refugee rights regime and investigates the exclusions of colonized peoples from access to the right to asylum. Using Chimni’s concept of the “myth of difference”, the article demonstrates that asylum seekers have long existed outside of Europe, and that their exclusion from international rights has been both longstanding and intentional. This historical sociology suggests that the basis for critical work on the issue of asylum policy today must be one which takes colonial histories into account.
Culture trends and organizational change: a case study
Corporate image in relation to the environment is evident in Deloitte's (2018) report on human capital trends, which covers around 11,000 questionnaires completed by managers in 140 countries and by 150 leaders of Colombian companies; the approach taken suggests that social capital is becoming as important as physical and financial capital. These aspects are related to corporate identity and, in turn, to how it relates to culture and change management in the organization. Culture and change management have attracted considerable interest among those who lead organizations. Deloitte's 2017 studies focused on the relationship between culture and engagement as important employee factors; their results show that organizations' ability to address engagement and culture problems declined by 14% compared with the previous year. These data help explain the complexity of the workplace environment and underline the importance of developing valid knowledge that guides academics and business leaders so that they can address these issues appropriately.
1st edition
Internationalizing Working-Class History since the 1970s: Challenges from Historiography, Archives, and the Web
In this essay the communication practices of labor migrants and their
evolution from nineteenth-century print media to late twentieth-century
electronic media provide the frame for a discussion of the
limitations of national approaches to collection and interpretation.
Multiple languages and knowledge of cultures of origin are required;
cooperative library and research projects are necessary. On the basis
of the Labor Newspaper Preservation Project it is argued that analysis
of the bibliographic data by themselves, without going into the contents
of the newspapers, revises current assumptions about processes
of migration, acculturation, and internationalist class positions. The
classic North American immigrant labor press came to an end in the
1970s. New migration patterns, namely the feminization of migration and
mobility into domestic and caregiving work, together with new patterns of
communication, led to the ascendancy of electronic publications. Electronic publications
and global rather than hemispheric migration will require different
collecting strategies. These, like their printed predecessors, provide
a perspective on migrants that differs from ethnicity and state-side
approaches. Human rights rather than class struggles, and migrant
remittances rather than denationalization, are the themes; nongovernmental
organizations (NGOs) rather than labor organizations
are the publishers.
Published or submitted for publication