9 research outputs found

    FrameNet annotation for multimodal corpora: devising a methodology for the semantic representation of text-image interactions in audiovisual productions

    Multimodal analyses have been growing in importance within several approaches to Cognitive Linguistics and in applied fields such as Natural Language Understanding. Nonetheless, fine-grained semantic representations of multimodal objects are still lacking, especially in terms of integrating areas such as Natural Language Processing and Computer Vision, which are key for the implementation of multimodality in Computational Linguistics. In this dissertation, we propose a methodology for extending FrameNet annotation to the multimodal domain, since FrameNet can provide fine-grained semantic representations, particularly with a database enriched by Qualia and other interframal and intraframal relations, as is the case with FrameNet Brasil. To make FrameNet Brasil able to conduct multimodal analysis, we outlined the hypothesis that, just as words in a sentence evoke frames and organize their elements in the syntactic locality accompanying them, visual elements in video shots may also evoke frames and organize their elements on the screen, or work complementarily with the frame evocation patterns of the sentences narrated simultaneously with their appearance on screen, providing different profiling and perspective options for meaning construction. The corpus annotated for testing the hypothesis is composed of episodes of a Brazilian TV travel series critically acclaimed as an exemplar of good practices in audiovisual composition. The TV genre chosen also constitutes a novel experimental setting for research on integrated image and text comprehension, since, in this corpus, text is not a direct description of the image sequence but correlates with it indirectly in a myriad of ways. The dissertation also reports on an eye-tracker experiment conducted to validate the proposed approach to text-oriented annotation.
The experiment showed that it is not possible to establish that text directly affects gaze, and this was taken as support for an approach that values the combination of modes. Last, we present the Frame2 dataset, the product of the annotation task carried out for the corpus following both the methodology and the guidelines proposed. The results demonstrate that, at least for this TV genre but possibly also for others, a fine-grained semantic annotation tackling the diverse correlations that take place in a multimodal setting provides a new perspective on multimodal comprehension modeling. Moreover, multimodal annotation also enriches the development of FrameNets, to the extent that correlations found between modalities can attest to the modeling choices made by those building frame-based resources.
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior

    Deep Learning-Based Robotic Perception for Adaptive Facility Disinfection

    Hospitals, schools, airports, and other environments built for mass gatherings can become hot spots for microbial pathogen colonization, transmission, and exposure, greatly accelerating the spread of infectious diseases across communities, cities, nations, and the world. Outbreaks of infectious diseases impose huge burdens on our society. Mitigating the spread of infectious pathogens within mass-gathering facilities requires routine cleaning and disinfection, which are primarily performed by cleaning staff under current practice. However, manual disinfection is limited in terms of both effectiveness and efficiency, as it is labor-intensive, time-consuming, and health-undermining. While existing studies have developed a variety of robotic systems for disinfecting contaminated surfaces, those systems are not adequate for intelligent, precise, and environmentally adaptive disinfection. They are also difficult to deploy in mass-gathering infrastructure facilities, given the high volume of occupants. Therefore, there is a critical need to develop an adaptive robot system capable of complete and efficient indoor disinfection. The overarching goal of this research is to develop an artificial intelligence (AI)-enabled robotic system that adapts to ambient environments and social contexts for precise and efficient disinfection. This would maintain environmental hygiene and health, reduce unnecessary labor costs for cleaning, and mitigate opportunity costs incurred from infections. To these ends, this dissertation first develops a multi-classifier decision fusion method, which integrates scene graph and visual information, in order to recognize patterns in human activity in infrastructure facilities. Next, a deep-learning-based method is proposed for detecting and classifying indoor objects, and a new mechanism is developed to map detected objects in 3D maps. 
A novel framework is then developed to detect and segment object affordances and to project them into a 3D semantic map for precise disinfection. Subsequently, a novel deep-learning network, which integrates multi-scale and multi-level features, and an encoder network are developed to recognize the materials of surfaces requiring disinfection. Finally, a novel computational method is developed to link the recognition of object surface information to robot disinfection actions with optimal disinfection parameters.

    Introduction: Ways of Machine Seeing

    How do machines, and, in particular, computational technologies, change the way we see the world? This special issue brings together researchers from a wide range of disciplines to explore the entanglement of machines and their ways of seeing from new critical perspectives. This 'editorial' is for a special issue of AI & Society, which includes contributions from: María Jesús Schultz Abarca, Peter Bell, Tobias Blanke, Benjamin Bratton, Claudio Celis Bueno, Kate Crawford, Iain Emsley, Abelardo Gil-Fournier, Daniel Chávez Heras, Vladan Joler, Nicolas Malevé, Lev Manovich, Nicholas Mirzoeff, Perle Møhl, Bruno Moreschi, Fabian Offert, Trevor Paglen, Jussi Parikka, Luciana Parisi, Matteo Pasquinelli, Gabriel Pereira, Carloalberto Treccani, Rebecca Uliasz, and Manuel van der Veen.

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference


    Metaphor: window to the native speakers' mind

    This paper looks at the importance of learning and understanding metaphor among second language learners. It begins with a brief overview of the Sapir-Whorf linguistic relativity theory. It then discusses the frameworks on conceptual metaphor of Lakoff and Johnson (1980), Andrew Goatly (2000) and Charteris-Black (2011). Discussion of these theories highlights the critical roles played by metaphor in everyday basic communication, both to convey complex ideas and to persuade. Failure to recognize and understand metaphor will be detrimental to L2 learners’ proficiency and hinder their ability to communicate effectively. L2 learners may also overlook important information or overemphasize trivia. The ability to recognize and understand metaphor will help L2 learners communicate effectively with native speakers, and this will provide the learners with first-hand cultural experience of the second language.

    Psychological Engagement in Choice and Judgment Under Risk and Uncertainty

    Theories of choice and judgment assume that agents behave rationally, choose the higher expected value option, and evaluate the choice consistently (Expected Utility Theory; Von Neumann & Morgenstern, 1947). However, researchers in decision-making have shown that human behaviour differs between choice and judgment tasks (Slovic & Lichtenstein, 1968; 1971; 1973). In this research, we propose that psychological engagement and control deprivation predict behavioural inconsistencies and utilitarian performance in judgment and choice. Moreover, we explore the influences of engagement and control deprivation on agents’ behaviours, while manipulating content of utility (Kusev et al., 2011; Hertwig & Gigerenzer, 1999; Tversky & Kahneman, 1996) and decision reward (Kusev et al., 2013; Shafir et al., 2002).