
    Live Coding as a Model for Cultural Practice & Cultural-Epistemological Aspects of Live Coding

    This report documents the program and the outcomes of Dagstuhl Seminar 13382, “Collaboration and learning through live coding”. Live coding is improvised interactive programming, typically performed live with an audience to create electronic music and other digital media. Our seminar was motivated by the phenomenon and experience of live coding, and by our conviction that it represents an important and broad, but seldom articulated, set of opportunities for computer science and for the arts and humanities. The seminar participants included a broad range of scholars, researchers, and practitioners spanning fields from music theory to software engineering. We held live coding performances and facilitated discussions from three main perspectives: the humanities, computing education, and software engineering. The main outcome of the seminar was a better understanding of the potential of live coding to inform cross-disciplinary scholarship and practice, connecting the arts, cultural studies, and computing. The report is edited by Alan Blackwell, Alex McLean, James Noble, and Julian Rohrhuber.

    Combining visual recognition and computational linguistics: linguistic knowledge for visual recognition and natural language descriptions of visual content

    Extensive efforts are being made to improve visual recognition and the semantic understanding of language. However, surprisingly little has been done to exploit the mutual benefits of combining both fields. In this thesis we show how these fields of research can profit from each other. First, we scale recognition to 200 unseen object classes and show how to extract robust semantic relatedness from linguistic resources. Our novel approach extends zero-shot to few-shot recognition and exploits unlabeled data by adopting label propagation for transfer learning. Second, we capture the high variability but low availability of composite activity videos by extracting the essential information from text descriptions. For this we recorded and annotated a corpus for fine-grained activity recognition. We show improvements in the supervised case, and we are also able to recognize unseen composite activities. Third, we present a corpus of videos and aligned descriptions. We use it for grounding activity descriptions and for learning how to automatically generate natural language descriptions for a video. We show that our proposed approach is also applicable to image description and that it outperforms baselines and related work. In summary, this thesis presents a novel approach for automatic video description and shows the benefits of extracting linguistic knowledge for object and activity recognition, as well as the advantage of visual recognition for understanding activity descriptions.
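
    The approach above adopts label propagation over unlabeled data to extend zero-shot to few-shot recognition. The thesis's exact formulation is not reproduced in this listing, so the following is only a minimal sketch of a generic graph-based label-propagation step (in the style of Zhou et al.'s local-and-global-consistency method) that such transfer learning can build on; the variable names, the source of the initial scores, and the parameter values are illustrative assumptions, not details taken from the work.

        import numpy as np

        def label_propagation(W, Y0, alpha=0.8, n_iter=50):
            """Propagate class scores over a sample-similarity graph.

            W     : (n, n) symmetric similarity matrix between samples
            Y0    : (n, c) initial class scores, e.g. zero-shot scores derived from
                    semantic relatedness; rows for unlabeled samples may be all zeros
            alpha : how much to trust the graph versus the initial scores
            """
            # Symmetrically normalize the graph: S = D^{-1/2} W D^{-1/2}
            d = W.sum(axis=1)
            d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
            S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

            F = Y0.astype(float).copy()
            for _ in range(n_iter):
                F = alpha * (S @ F) + (1.0 - alpha) * Y0
            return F.argmax(axis=1)  # predicted class index per sample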

    Experience requirements

    Video game development is a high-risk effort with a low probability of success. The interactive nature of the resulting artifact increases production complexity, often in unexpected ways. New methodologies are needed to address issues in this domain. Video game development has two major phases: preproduction and production. During preproduction, the game designer and other members of the creative team create and capture a vision of the intended player experience in the game design document. The game design document tells the story and describes the game; it does not usually explicitly elaborate all of the details of the intended player experience, particularly with respect to how the player is intended to feel as the game progresses. Details of the intended experience tend to be communicated verbally, on an as-needed basis, during iterations of the production effort. During production, the software and media development teams attempt to realize the preproduction vision in a game artifact. However, the game design document is not traditionally intended to capture production-ready requirements, particularly for software development. As a result, there is a communications chasm between preproduction and production efforts that can lead to production issues such as excessive reliance on direct communication with the game designer, difficulty scoping project elements, and difficulty in determining reasonably accurate effort estimates. We posit that defining and capturing the intended player experience in a manner that is influenced and informed by established requirements engineering principles and techniques will help cross the communications chasm between preproduction and production. The proposed experience requirements methodology is a novel contribution composed of three parts: a model of the elements that compose experience requirements, a framework that provides guidance for expressing experience requirements, and an exemplary process for the elicitation, capture, and negotiation of experience requirements. Experience requirements capture the designer's intent for the user experience; they represent user experience goals for the artifact and constraints upon the implementation, and they are not expected to be formal in the mathematical sense. Experience requirements are evolutionary in intent: they incrementally enhance and extend existing practices in a relatively lightweight manner, using language and representations that are intended to be mutually acceptable to both preproduction and production.
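
    The methodology above is described at the level of a model, a framework, and a process rather than a concrete notation. Purely as an illustration of what a captured experience requirement might look like as a data structure, here is a hypothetical sketch; every field name and the example values are assumptions, not a format prescribed by the thesis.

        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class ExperienceRequirement:
            """Hypothetical record of a designer's intended player experience."""
            identifier: str                 # e.g. "ER-07"
            game_moment: str                # where in the game the experience occurs
            intended_feeling: str           # how the player is intended to feel
            rationale: str                  # why the designer wants this experience
            constraints: List[str] = field(default_factory=list)       # limits on the implementation
            acceptance_notes: List[str] = field(default_factory=list)  # how the team will judge it

        boss_intro = ExperienceRequirement(
            identifier="ER-07",
            game_moment="first encounter with the boss",
            intended_feeling="mounting dread that resolves into relief",
            rationale="pacing contrast after a long exploration section",
            constraints=["no cutscene longer than 20 seconds"],
        )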

    TOWARDS BUILDING AN INTELLIGENT INTEGRATED MULTI-MODE TIME DIARY SURVEY FRAMEWORK

    Enabling true responses, responses free from bias and satisficing, is an important characteristic of surveys. In this thesis, we examine the current state of surveys, briefly touching upon questionnaire surveys and then focusing on time diary surveys (TDS). TDS are open-ended, free-form conversational surveys in which both the interviewer and the respondent play a part in the survey's progress and successful completion. With limited research available on how intelligent and assistive components can affect TDS respondents, we explore ways in which intelligent systems such as Computer Adaptive Testing, Intelligent Tutoring Systems, Recommender Systems, and Decision Support Systems can be leveraged for use in TDS. The motivation for this work is the opportunity that an enhanced web-based instrument offers the survey domain: uniting the various facets of web-based surveys to create an intelligent, integrated, multi-mode TDS framework. We envision the framework providing all the advantages of both web-based and interviewer-assisted surveys. The two primary challenges are determining what data the system should use and how it should interact with the user, specifically when integrating (1) the interviewer-assisted mode and (2) the self-administered mode. Our proposed solution, the intelligent integrated multi-mode framework, is essentially the solution to a set of modeling problems, and we propose two sets of overarching mechanisms: (1) Knowledge Engineering Mechanisms (KEM) and (2) Interaction Mechanisms (IxM). KEM is concerned with understanding what data can be created, used, and stored, while IxM deals with interacting with the user. We build and study a prototype instrument in the interviewer-assisted mode based on the framework. We find that the instrument improves the interview process as intended, increases the quality of the response data, and assists the interviewer. We also observe that the framework's mechanisms reduce interviewers' cognitive load, data-entry times, and overall interview time by predicting the next activity. Advisor: Leenkiat So
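
    The abstract notes that the prototype assists the interviewer by predicting the next activity, but does not describe how the prediction is made. As a purely illustrative stand-in, the sketch below uses a simple first-order transition model learned from previously recorded diaries; the class and method names are assumptions, not the thesis's design.

        from collections import Counter, defaultdict

        class NextActivityPredictor:
            """Suggest likely follow-up activities from transition counts
            observed in earlier time diaries (illustrative only)."""

            def __init__(self):
                self.transitions = defaultdict(Counter)

            def fit(self, diaries):
                # diaries: iterable of activity sequences, e.g. [["sleep", "breakfast", "commute"], ...]
                for sequence in diaries:
                    for current, nxt in zip(sequence, sequence[1:]):
                        self.transitions[current][nxt] += 1

            def suggest(self, current_activity, k=3):
                # The k most frequent follow-ups, to pre-fill for the interviewer.
                return [a for a, _ in self.transitions[current_activity].most_common(k)]

        model = NextActivityPredictor()
        model.fit([["sleep", "breakfast", "commute", "work"],
                   ["sleep", "breakfast", "exercise", "work"]])
        print(model.suggest("breakfast"))  # ['commute', 'exercise']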

    The Visual Novel: Fictional Space and Print After 1900

    My dissertation, The Visual Novel: Fictional Space and Print After 1900, examines how and why the novel has assimilated visual mediums—film, art, and the digital—into the genre as a means of adapting to the proliferation of mass media and technology. This project connects a history of the novel (genre) with a history of the book (the genre’s physical form), thereby theorizing and narrating a history of the visual novel. I demonstrate that through fictional space, a critical term used by narratologists and textual studies scholars, visual writing emerges as a hybridized mode of creative composition where we can see most vividly the relationship between author, text, and reader. Characterized by the use of eccentric typography, nonstandard design, and experimental layout, the visual novel relies on text as image, defamiliarizing readerly expectations of print, type, and page space by assimilating composition techniques from the visual arts such as montage and collage. I argue that the visual novel’s multimodality expands definitions of “novel” and “narrative” through a discussion of British and American writers—A. S. Byatt, John Dos Passos, Steven Hall, and James Joyce, as well as contemporary small press editions of works by Laurence Sterne and Oscar Wilde. At times when screen culture advances print’s obsolescence, both historically and more recently, visual writing makes print predominant in the media ecology once again by drawing upon the very technologies that threaten it, and my dissertation responds to this recurrent milieu by arguing that these novelists utilize self-reflexive techniques to create works that actualize print’s potential and the novel’s flexibility.

    Toward Relational Craft

    This thesis report contends with the contemporary context of studio craft in the USA. Studio craft is predicated on habits of a modern world, and the political and economic relations comprising our social landscape have left the viability of those habits behind. The framework of studio craft, i.e. the politics that form its enactment, is necessarily shifting with the pressures of the environment. This project identifies this shift as an opportunity to recognize and intervene in these politics, and thus in craft's reproductive power. This report focuses particularly on craft's process of political reproduction as an ideological one, and thus an enabler of the status relations of its contemporary place. Moreover, because it is an ideological process, whether they win or lose in the status-reproduction game, craft's participants are left unable to account for the relations they promulgate. This report identifies the characteristics of craft's modernist mode, including the dysfunctions buried in the practice of its forms. The report articulates frames for the shifting contemporary landscape that impose new pressures on craft. The report then moves toward modes of intervention in the political landscape that craft is both produced by and reproduces. The field of craft is conventionally assumed to be an object moving on its own terms through the world. This report instead finds the field of craft to be an activity, i.e. the sharing of sensibilities which enable action, and finds those sensibilities to be rooted in the political and economic relations of its location. The force of this report, in interventional terms, is therefore to locate craft's material effects within the site of those relations. Here we can account, and thus take responsibility, for the habits of craft's enactment. M.F.A.

    BRIDGING THE SEMANTIC GAP: IMAGE AND VIDEO UNDERSTANDING BY EXPLOITING ATTRIBUTES

    Understanding images and video is one of the fundamental problems in the field of computer vision. Traditionally, research in this area focused on extracting low-level features from images and videos and learning classifiers to categorize these features into pre-defined classes of objects, scenes, or activities. However, it is well known that there exists a "semantic gap" between low-level features and high-level semantic concepts, which greatly obstructs progress in image and video understanding. Our work departs from this traditional view in that we add a middle layer between high-level concepts and low-level features, called attributes, and use this layer to facilitate the description of concepts and the detection of entities in images and videos. On one hand, attributes are relatively simple and thus can be detected more reliably from low-level features; on the other hand, we can exploit high-level knowledge about the relationships between attributes and high-level concepts, and among the attributes themselves, and thereby reduce the semantic gap. Our ideas are demonstrated in three applications. First, we present an attribute-based learning approach for object recognition, where attributes are used to transfer knowledge about object properties from known classes to unknown classes and consequently reduce the number of training examples needed to learn new object classes. Next, we illustrate an active framework for recognizing scenes based on the objects therein, which are considered the attributes of the scenes. The active framework exploits the correlation among objects in a scene and thus significantly reduces the number of objects that must be detected in order to recognize the scene. Finally, we propose a novel approach to detect activity attributes in sports videos, where contextual constraints are exploited to reduce the ambiguity in attribute detection. The activity attributes enable us to go beyond naming activity categories and achieve a fine-grained description of the activities in the videos.
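
    The first application above transfers knowledge to unseen classes through attributes. The abstract does not spell out the exact model, so the snippet below is only a sketch of the standard direct-attribute-prediction style of scoring: an unseen class is scored by how well its attribute signature matches the per-attribute probabilities predicted from the image. The independence assumption and the toy numbers are illustrative, not taken from the thesis.

        import numpy as np

        def score_unseen_classes(attribute_probs, class_signatures):
            """attribute_probs  : (n_attributes,) predicted probability that each
                                  attribute is present in the image
               class_signatures : (n_classes, n_attributes) binary matrix saying
                                  which attributes each unseen class has
               Scores each unseen class, assuming attributes are independent."""
            eps = 1e-12
            p = np.clip(attribute_probs, eps, 1 - eps)
            # Log-likelihood of each class signature under the predicted attribute probabilities
            return class_signatures @ np.log(p) + (1 - class_signatures) @ np.log(1 - p)

        # Toy example: 3 unseen classes described by 4 attributes
        signatures = np.array([[1, 0, 1, 1],
                               [0, 1, 1, 0],
                               [1, 1, 0, 0]])
        probs = np.array([0.9, 0.2, 0.8, 0.7])  # from any attribute classifier
        print(score_unseen_classes(probs, signatures).argmax())  # index of the best-matching class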

    Data and methods for a visual understanding of sign languages

    Signed languages are complete and natural languages used as the first or preferred mode of communication by millions of people worldwide. Unfortunately, however, they continue to be marginalized languages. Designing, building, and evaluating models that work on sign languages presents compelling research challenges and requires interdisciplinary and collaborative efforts. Recent advances in Machine Learning (ML) and Artificial Intelligence (AI) have the potential to enable better accessibility for sign language users and to narrow the existing communication barrier between the Deaf community and non-sign language users. However, recent AI-powered technologies still do not account for sign language in their pipelines. This is mainly because sign languages are visual languages that use manual and non-manual features to convey information and do not have a standard written form. The goal of this thesis is therefore to contribute to the development of new technologies that account for sign language, by creating large-scale multimodal resources suitable for training modern data-hungry machine learning models and by developing automatic systems for computer vision tasks that aim at a better visual understanding of sign languages. In Part I, we introduce the How2Sign dataset, a large-scale collection of multimodal and multiview sign language videos in American Sign Language. In Part II, we contribute to the development of technologies that account for sign languages. In Chapter 4 we present Spot-Align, a framework based on sign spotting methods for automatically annotating sign instances in continuous sign language; we present the benefits of this framework and establish a baseline for the sign language recognition task on the How2Sign dataset. In Chapter 5 we make use of the different annotations and modalities of How2Sign to explore sign language video retrieval by learning cross-modal embeddings. Finally, in Chapter 6, we explore sign language video generation by applying Generative Adversarial Networks to the sign language domain and assess whether, and how well, sign language users can understand automatically generated sign language videos, proposing an evaluation protocol based on How2Sign topics and English translations.
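
    Chapter 5 of the thesis above learns cross-modal embeddings for sign language video retrieval. The abstract does not specify the training objective, so the snippet below is only a generic sketch of a symmetric contrastive (InfoNCE-style) loss between paired video and text embeddings, not the actual method; the tensor shapes, the temperature value, and the random stand-in encoders are assumptions.

        import torch
        import torch.nn.functional as F

        def symmetric_contrastive_loss(video_emb, text_emb, temperature=0.07):
            """video_emb, text_emb: (batch, dim) tensors where row i of each is a matching pair."""
            v = F.normalize(video_emb, dim=-1)
            t = F.normalize(text_emb, dim=-1)
            logits = v @ t.T / temperature                     # (batch, batch) similarity matrix
            targets = torch.arange(v.size(0), device=v.device)
            # Pull each video toward its own description and vice versa
            return 0.5 * (F.cross_entropy(logits, targets) +
                          F.cross_entropy(logits.T, targets))

        # Toy usage with random embeddings standing in for video and text encoders
        video_emb = torch.randn(8, 256, requires_grad=True)
        text_emb = torch.randn(8, 256, requires_grad=True)
        loss = symmetric_contrastive_loss(video_emb, text_emb)
        loss.backward()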
