25 research outputs found

    Text–to–Video: Image Semantics and NLP

    When automatically translating an arbitrary text into a visual story, the main challenge is finding a semantically close visual representation whose displayed meaning remains the same as in the given text. In addition, the appearance of an image strongly influences how its meaning is conveyed to an observer. This thesis demonstrates that investigating both image semantics and the semantic relatedness between visual and textual sources enables us to tackle the challenging semantic gap and to find a semantically close translation from natural language to a corresponding visual representation. In recent years, social networking has attracted great interest, leading to an enormous and still increasing amount of data available online. Photo sharing sites like Flickr allow users to associate textual information with their uploaded imagery. This thesis therefore exploits this huge knowledge source of user-generated data, which provides initial links between images and words as well as other meaningful data. To approach visual semantics, this work presents various methods to analyze the visual structure and the appearance of images in terms of meaningful similarities, aesthetic appeal, and emotional effect on an observer. In detail, our GPU-based approach efficiently finds visual similarities between images in large datasets across visual domains and identifies various meanings of ambiguous words by exploring similarity in online search results. Further, we investigate the highly subjective aesthetic appeal of images and use deep learning to learn aesthetic rankings directly from a broad diversity of user reactions in social online behavior. To gain even deeper insight into the influence of visual appearance on an observer, we explore how simple image processing can change emotional perception and derive a simple but effective image filter.
To identify meaningful connections between written text and visual representations, we employ methods from Natural Language Processing (NLP). Extensive textual processing allows us to create semantically relevant illustrations for simple text elements as well as complete storylines. More precisely, we present an approach that resolves dependencies in textual descriptions to arrange 3D models correctly. Further, we develop a method that finds semantically relevant illustrations for texts of different types based on a novel hierarchical querying algorithm. Finally, we present an optimization-based framework that generates picture stories in different styles that are not only semantically relevant but also visually coherent.

    Choreographing the extended agent : performance graphics for dance theater

    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2005. Includes bibliographical references (v. 2, leaves 448-458). The marriage of dance and interactive image has been a persistent dream over the past decades, but reality has fallen far short of potential for both technical and conceptual reasons. This thesis proposes a new approach to the problem and lays out the theoretical, technical and aesthetic framework for the innovative art form of digitally augmented human movement. I will use as example works a series of installations, digital projections and compositions, each of which contains a choreographic component, either through direct collaboration with a choreographer or through the creation of artworks that automatically organize and understand purely virtual movement. These works lead up to two unprecedented collaborations with two of the greatest choreographers working today: new pieces that combine dance and interactive projected light using real-time motion capture live on stage. The existing field of "dance technology" is one with many problems. This is a domain with many practitioners, few techniques and almost no theory; a field that is generating "experimental" productions with every passing week, has literally hundreds of citable pieces and no canonical works; a field that is oddly disconnected from modern dance's history, pulled between the practical realities of the body and those of computer art, and has no influence on the prevailing digital art paradigms that it consumes.
This thesis will seek to address each of these problems: by providing techniques and a basis for "practical theory"; by building artworks with resources and people that have never previously been brought together, in theaters and in front of audiences previously inaccessible to the field; and by proving through demonstration that a profitable and important dialogue between digital art and the pioneers of modern dance can in fact occur. The methodological perspective of this thesis is that of biologically inspired, agent-based artificial intelligence, taken to a high degree of technical depth. The representations, algorithms and techniques behind such agent architectures are extended and pushed into new territory for both interactive art and artificial intelligence. In particular, this thesis will focus on the control structures and the rendering of the extended agents' bodies, the tools for creating complex agent-based artworks in intense collaborative situations, and the creation of agent structures that can span live image and interactive sound production. Each of these parts becomes an element of what it means to "choreograph" an extended agent for live performance. Marc Downie. Ph.D.

    Mapping the evolving landscape of child-computer interaction research: structures and processes of knowledge (re)production

    Implementing an iterative sequential mixed methods design (Quantitative → Qualitative → Quantitative) framed within a sociology of knowledge approach to discourse, this study offers an account of the structure of the field of Child-Computer Interaction (CCI), its development over time, and the practices through which researchers have (re)structured knowledge comprising the field. Thematic structure of knowledge within the field, and its evolution over time, is quantified through implementation of a Correlated Topic Model (CTM), an automated inductive content analysis method, in analysing 4,771 CCI research papers published between 2003 and 2021. Detailed understanding of practices through which researchers (re)structure knowledge within the field, including factors influencing these practices, is obtained through thematic analysis of online workshops involving prominent contributors to the field (n=7). Strategic practices utilised by researchers in negotiating tensions impeding integration of novel concepts in the field are investigated through analysis of semantic features of retrieved papers using linear and negative binomial regression models. Contributing an extensive mapping, results portray the field of CCI as a varied research landscape, comprising 48 major themes of study, which has evolved dynamically over time. Research priorities throughout the field have been subject to influence from a range of endogenous and exogenous factors which researchers actively negotiate through research and publication practices. Tacitly structuring research practices, these factors have broadly sustained a technology-driven, novelty-dominated paradigm throughout the field which has failed to substantively progress cumulative knowledge. 
Through strategic negotiation of persistent tensions arising as a consequence of these factors, researchers have nonetheless effected structural change within the field, contributing to a shift towards a user needs-driven agenda and progression of knowledge therein. Findings demonstrate that the field of CCI is proceeding through an intermediary phase in maturation, forming an increasingly distinct disciplinary shape and identity through the cumulative structuring effect of community members’ continued negotiation of tensions.

    Gaining Insight into Determinants of Physical Activity using Bayesian Network Learning

    Contains fulltext: 228326pre.pdf (preprint version) (Open Access). Contains fulltext: 228326pub.pdf (publisher's version) (Open Access). BNAIC/BeneLearn 202

    Proceedings of the 21st International Congress of Aesthetics, Possible Worlds of Contemporary Aesthetics: Aesthetics Between History, Geography and Media

    The Faculty of Architecture, University of Belgrade and the Society for Aesthetics of Architecture and Visual Arts of Serbia (DEAVUS) are proud to organize the 21st ICA Congress on “Possible Worlds of Contemporary Aesthetics: Aesthetics Between History, Geography and Media”. We are proud to announce that we received over 500 submissions from 56 countries, which makes this Congress the greatest gathering of aestheticians in this region in the last 40 years. The ICA 2019 Belgrade aims to map out contemporary aesthetic practices in a vivid dialogue of aestheticians, philosophers, art theorists, architecture theorists, culture theorists, media theorists, artists, media entrepreneurs, architects, cultural activists and researchers in the fields of humanities and social sciences. More precisely, the goal is to map the possible worlds of contemporary aesthetics in Europe, Asia, North and South America, Africa and Australia. The idea is to show, interpret and map the unity and diversity in aesthetic thought, expression, research, and philosophies on our shared planet. Our goal is to promote a dialogue concerning aesthetics in those parts of the world that have not been involved with the work of the International Association for Aesthetics to this day. Global dialogue, understanding and cooperation are what we aim to achieve. That said, the 21st ICA is the first Congress to highlight the aesthetic issues of marginalised regions that have not been fully involved in the work of the IAA. This will be accomplished, among other means, via thematic round tables discussing contemporary aesthetics in East Africa and South America. Today, aesthetics is recognized as an important philosophical, theoretical and even scientific discipline that aims at interpreting the complexity of phenomena in our contemporary world. People talk about possible worlds or possible aesthetic regimes rather than a unique and consistent philosophical, scientific or theoretical discipline.

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model addresses face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of nonverbal social signals in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model allows the quality of the interaction to be evaluated quantitatively, using statistical tools that measure how effective the recognition phase is. In this paper we apply this theory to the case where one of the interactants is a robot; in this case, the recognition phases performed by the robot and the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gaze is the social signal considered.

    Premodern Experience of the Natural World in Translation

    This innovative collection showcases the importance of the relationship between translation and experience in premodern science, bringing together an interdisciplinary group of scholars to offer a nuanced understanding of knowledge transfer across premodern time and space. The volume considers experience as a tool and object of science in the premodern world, using this idea as a jumping-off point from which to view translation as a process of interaction between different epistemic domains. The book is structured around four dimensions of translation (between terms within and across languages; across sciences and scientific norms; between verbal and visual systems; and through the expertise of practitioners and translators), which raise key questions about what constituted experience of the natural world in the premodern era and about the impact of translation processes and agents in shaping experience. Providing a wide-ranging global account of historical studies on the travel and translation of experience in the premodern world, this book will be of interest to scholars in history, the history of translation, and the history and philosophy of science.