    Pathway to Future Symbiotic Creativity

    This report presents a comprehensive view of our vision for the development path of human-machine symbiotic art creation. We propose a classification of creative systems into a hierarchy of five classes, showing the pathway of creativity evolving from mimic-human artists (Turing Artists) to machine artists in their own right. We begin with an overview of the limitations of Turing Artists and then focus on the top two classes, Machine Artists, emphasizing machine-human communication in art creation. In art creation, machines must understand humans' mental states, including desires, appreciation, and emotions, while humans also need to understand machines' creative capabilities and limitations. The rapid development of immersive environments, and their further evolution into the metaverse, enables symbiotic art creation through unprecedented flexibility in bi-directional communication between artists and art manifestation environments. By examining the latest sensor and XR technologies, we illustrate a novel way to collect art data that forms the basis of a new form of human-machine bidirectional communication and understanding in art creation. Based on such communication and understanding mechanisms, we propose a novel framework for building future machine artists, grounded in the philosophy that a human-compatible AI system should follow the "human-in-the-loop" principle rather than the traditional "end-to-end" dogma. By proposing a new form of inverse reinforcement learning model, we outline the platform design for machine artists, demonstrate its functions, and showcase examples of technologies we have developed. We also provide a systematic exposition of the ecosystem for an AI-based symbiotic art form and community, with an economic model built on NFT technology. Ethical issues in the development of machine artists are also discussed.
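
    The framework above centers on a human-in-the-loop inverse reinforcement learning model. As a rough, hypothetical illustration of that general idea (not the report's actual model), the Python sketch below learns a linear reward over invented artwork features from simulated binary human preferences using a Bradley-Terry-style update; every feature, constant, and the stand-in feedback function are assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        n_features = 4
        w = np.zeros(n_features)  # learned reward weights (the machine's "taste")

        def reward(features, w):
            return features @ w

        def human_prefers(a, b):
            # Stand-in for live human feedback: a hidden preference vector.
            hidden = np.array([1.0, -0.5, 0.3, 0.8])
            return a @ hidden > b @ hidden

        lr = 0.1
        for _ in range(2000):
            # Propose two candidate artworks as random feature vectors.
            a, b = rng.normal(size=(2, n_features))
            pref, other = (a, b) if human_prefers(a, b) else (b, a)
            # Bradley-Terry gradient step: raise the preferred item's reward.
            p = 1.0 / (1.0 + np.exp(reward(other, w) - reward(pref, w)))
            w += lr * (1.0 - p) * (pref - other)

        print("learned reward weights:", np.round(w, 2))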

    The grammar of immersion: a social semiotic study of nonfiction cinematic virtual reality

    Cinematic virtual reality (CVR) is an audio-visual form viewed in a virtual reality headset. Its novelty lies in the way it immerses its audience in highly realistic 360° visual representations. Being camera-based, CVR facilitates many of the practices of conventional filmmaking but fundamentally alters them through its lack of a rectangular frame. As such, CVR has garnered scholarly attention as a ‘frameless’ storytelling medium yet to develop its own language. The form has gained traction with producers of nonfiction who recognize CVR’s capacity to transport audiences to remote social worlds, leading to claims that equate CVR’s immersion with a social and emotional response to its filmed subjects. A strand of CVR scholarship has emerged, grounding nonfiction CVR theoretically and critiquing such deterministic claims. Broadly speaking, these parallel strands of inquiry point to a common concern with CVR’s semiotics: the meaning potential of the 360° format and the social aspects of its use in documenting reality. Currently, however, there is a lack of systematic analyses that foreground CVR’s semiotics. This study addresses this gap by using social semiotic methods to complement these threads of inquiry, subsuming them into a holistic account of CVR’s semantics. Utilizing systemic functional methods, multimodal discourse analyses were performed on nonfiction CVR texts to address three core research objectives. The first objective is the systematic description of CVR as a semiotic technology and of how discourse is configured through its novel 360° modality. The CVR spectator is described in terms of their role in the real-time construction of low-level meanings, while higher-level concepts further characterize CVR texts as technologically enabled, virtual sites of social discourse. The second objective concerns clarifying the implications of CVR for nonfiction practitioners. Nonfiction discourse is conceptualized as the negotiation of semiotic autonomy, independence, and control between viewing spectator, filmed subject, and CVR author respectively. The third objective concerns the development of an analytical approach tailored specifically to CVR. Extant systems from image, text, film, and action analyses are reflexively applied, appraised, and adapted for use in the study of CVR, and new frames are presented to cater for the 360° modality. The findings show CVR to be an inherently logical, contextualizing form in which the spectator has a degree of sense-making autonomy in the construction of representational and social meanings. This semantic autonomy is found to camouflage the deeper textual constructions in what appear as ‘reality experiences’. The repercussions for the CVR producer are the indeterminacy of meanings, which are ‘at risk’ in particular ways when conventional framing methods cannot be utilized and when the spectator is given reflexive agency to make meaningful connections across the 360° image. Systemic functional analytical methods prove flexible enough to be applied to the texts, and open enough for the study to present additional systems and frames for a fuller approach to the analysis of CVR.

    Efficient image-based rendering

    Recent advancements in real-time ray tracing and deep learning have significantly enhanced the realism of computer-generated images. However, conventional 3D computer graphics (CG) can still be time-consuming and resource-intensive, particularly when creating photo-realistic simulations of complex or animated scenes. Image-based rendering (IBR) has emerged as an alternative approach that utilizes pre-captured images from the real world to generate realistic images in real-time, eliminating the need for extensive modeling. Although IBR has its advantages, it struggles to provide the same level of control over scene attributes as traditional CG pipelines and to accurately reproduce complex scenes and objects with difficult materials, such as transparent objects. This thesis endeavors to address these issues by harnessing the power of deep learning and incorporating the fundamental principles of graphics and physically based rendering. It offers an efficient solution that enables interactive manipulation of real-world dynamic scenes captured from sparse views, lighting positions, and times, as well as a physically-based approach that facilitates accurate reproduction of the view-dependent effects resulting from the interaction between transparent objects and their surrounding environment. Additionally, this thesis develops a visibility metric that can identify artifacts in the reconstructed IBR images without observing the reference image, thereby contributing to the design of an effective IBR acquisition pipeline. Lastly, a perception-driven rendering technique is developed to provide high-fidelity visual content in virtual reality displays while retaining computational efficiency.
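
    As general background on the core idea of image-based rendering (synthesizing a novel view directly from pre-captured images), the Python sketch below blends the captured views closest in angle to a requested view direction. This is a generic heuristic shown purely for illustration, not the thesis's method; all directions and colors are invented, and in practice the per-view colors would be reprojected pixel values rather than single RGB triples.

        import numpy as np

        def blend_views(view_dirs, view_colors, novel_dir, k=3):
            """Blend the k captured views whose directions best match novel_dir."""
            view_dirs = view_dirs / np.linalg.norm(view_dirs, axis=1, keepdims=True)
            novel_dir = novel_dir / np.linalg.norm(novel_dir)
            sims = view_dirs @ novel_dir          # cosine similarity per view
            idx = np.argsort(-sims)[:k]           # k angularly closest views
            weights = np.clip(sims[idx], 0.0, None)
            weights /= weights.sum() + 1e-8       # normalize blending weights
            return weights @ view_colors[idx]

        # Four captured views (directions and RGB colors are made-up data).
        dirs = np.array([[1, 0, 0], [0, 1, 0], [0.7, 0.7, 0], [0, 0, 1]], float)
        colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                           [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])
        print(blend_views(dirs, colors, np.array([0.9, 0.4, 0.0])))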

    50 Years of quantum chromodynamics – Introduction and Review

    Representations of girlhood trauma in Aotearoa, New Zealand literature written by women

    In “Representations of Girlhood Trauma in Aotearoa New Zealand Literature Written by Women”, I investigate the way literary genres affect trauma-telling and how culturally sensitive forms of trauma-reading allow girl trauma-tellers to be heard and not re-traumatised by a patriarchal and colonial interpretation of their pain. I analyse about twenty texts by women writers of diverse ethnic descent, encompassing Aotearoa’s four main ethnic groups (Pākehā [also called European New Zealanders], Māori, Asian, and Pasifika), to illustrate New Zealand’s contemporary multiculturalism and multilingualism, an approach that contests a Western-imported, male-dominated, and Freudian-inspired reading of minorities’ cultural traumas. Following Dominick LaCapra’s theory, I argue that the signing of the Treaty of Waitangi on 6 February 1840 can be understood as a foundational trauma for the Indigenous people of Aotearoa. This document is at the root of their intergenerational cultural trauma, as it led to massive land confiscation, the loss of their sovereignty, and their forced assimilation into a Western way of life. As New Zealand is still a settler colony today, women writers employ feminist and, especially when they are of non-Pākehā descent, decolonial trauma-telling devices to formulate traumatic narratives which otherwise would remain unspeakable. The five chapters of the thesis are each dedicated to a literary genre: life writing, poetry, fictional diaries and the epistolary mode, the female Bildungsroman, and young adult fiction. To analyse the interpersonal, intergenerational, transgenerational, and/or vicarious forms of girlhood trauma expressed in the corpus, I create a dialogue between the literary field of Trauma Studies, founded at Yale University in the 1990s, and four New Zealand-based trauma-reading traditions: Mason Durie’s theory of te whare tapa whā (the four-walled house), a holistic approach to Māori health in which physical, mental, spiritual, and familial health are intertwined; David Epston’s and Michael White’s research on Narrative Therapy; Charles Waldegrave’s and Kiwi Tamasese’s work on Just Therapy; as well as Linda and Mark Kopua’s storytelling practices during Mahi a atua (healing with ancestors) sessions.

    Downstream Task Self-Supervised Learning for Object Recognition and Tracking

    This dissertation addresses three limitations of deep learning methods in machine vision applications based on image and video understanding. Firstly, although deep convolutional neural networks (CNNs) are efficient for image recognition applications such as object detection and segmentation, they perform poorly under perspective distortions. In real-world applications, the camera perspective is a common problem that can only be addressed by annotating large amounts of data, which limits the applicability of deep learning models. Secondly, the typical approach to single-camera tracking problems is to use separate motion and appearance models, which are expensive in terms of computation and training data requirements. Finally, conventional multi-camera video understanding techniques use supervised learning algorithms to determine temporal relationships among objects. In large-scale applications, these methods are also limited by the requirement for extensive manually annotated data and computational resources. To address these limitations, we develop an uncertainty-aware self-supervised learning (SSL) technique that captures a model's instance or semantic segmentation uncertainty from overhead images and guides the model to learn the impact of the new perspective on object appearance. The test-time data-augmentation-based pseudo-label refinement technique continuously trains a model until convergence on new-perspective images. The proposed method can be applied for both self-supervision and semi-supervision, thus increasing the effectiveness of a deep pre-trained model in new domains. Extensive experiments demonstrate the effectiveness of the SSL technique in both object detection and semantic segmentation problems. In video understanding applications, we introduce simultaneous segmentation and tracking as an unsupervised spatio-temporal latent feature clustering problem. The jointly learned multi-task features leverage the task-dependent uncertainty to generate discriminative features in multi-object videos. Experiments have shown that the proposed tracker outperforms several state-of-the-art supervised methods. Finally, we propose an unsupervised multi-camera tracklet association (MCTA) algorithm to track multiple objects in real-time. MCTA leverages the self-supervised detector model for single-camera tracking and solves the multi-camera tracking problem using multiple pair-wise camera associations modeled as a connected graph. The graph optimization method generates a global solution for partially or fully overlapping camera networks.
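
    As a rough, hypothetical sketch of test-time-augmentation-based pseudo-label refinement with an uncertainty gate (the dissertation's actual models, losses, and thresholds are not reproduced here), the PyTorch code below averages a toy segmenter's predictions over horizontal flips, keeps only low-entropy pixels as pseudo-labels, and retrains on them; every module and constant is an assumption.

        import torch
        import torch.nn.functional as F

        model = torch.nn.Conv2d(3, 2, kernel_size=3, padding=1)  # toy segmenter
        opt = torch.optim.SGD(model.parameters(), lr=1e-2)
        images = torch.rand(4, 3, 32, 32)                        # unlabeled batch

        for epoch in range(5):
            with torch.no_grad():
                # Test-time augmentation: average predictions over a flip.
                p = F.softmax(model(images), dim=1)
                p_flip = F.softmax(model(torch.flip(images, dims=[3])), dim=1)
                p_mean = 0.5 * (p + torch.flip(p_flip, dims=[3]))
                # Per-pixel entropy as a simple uncertainty estimate.
                entropy = -(p_mean * p_mean.clamp_min(1e-8).log()).sum(1)
                pseudo = p_mean.argmax(1)
                confident = entropy < 0.3          # uncertainty gate
            # Retrain only on pixels whose pseudo-labels are confident.
            loss = F.cross_entropy(model(images), pseudo, reduction="none")
            loss = (loss * confident).sum() / confident.sum().clamp_min(1)
            opt.zero_grad()
            loss.backward()
            opt.step()
            print(f"epoch {epoch}: confident pixels = {confident.float().mean():.2f}")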

    Goal-seeking compresses neural codes for space in the human hippocampus and orbitofrontal cortex

    Humans can navigate flexibly to meet their goals. Here, we asked how the neural representation of allocentric space is distorted by goal-directed behavior. Participants navigated an agent to two successive goal locations in a grid-world environment comprising four interlinked rooms, with a contextual cue indicating the conditional dependence of one goal location on another. Examining the neural geometry by which room and context were encoded in fMRI signals, we found that map-like representations of the environment emerged in both the hippocampus and neocortex. Cognitive maps in the hippocampus and orbitofrontal cortex were compressed so that locations cued as goals were coded together in neural state space, and these distortions predicted successful learning. This effect was captured by a computational model in which current and prospective locations are jointly encoded in a place code, providing a theory of how goals warp the neural representation of space in macroscopic neural signals.
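
    To make the joint-coding idea concrete, the NumPy sketch below implements an invented one-dimensional population of Gaussian place fields and shows that adding a prospective-goal component to each location's code shrinks the neural distance between two goal-linked locations, mirroring the compression described above; the field centers, widths, and goal weight are arbitrary assumptions.

        import numpy as np

        centers = np.linspace(0, 1, 20)   # place-cell preferred locations, 1D track

        def place_code(pos, sigma=0.08):
            # Population response: one Gaussian tuning curve per cell.
            return np.exp(-((pos - centers) ** 2) / (2 * sigma ** 2))

        def joint_code(pos, goal, w_goal=0.5):
            # Jointly encode the current AND the prospective (goal) location.
            return place_code(pos) + w_goal * place_code(goal)

        loc_a, loc_b = 0.2, 0.8           # two locations linked as goals
        d_plain = np.linalg.norm(place_code(loc_a) - place_code(loc_b))
        d_joint = np.linalg.norm(joint_code(loc_a, goal=loc_b)
                                 - joint_code(loc_b, goal=loc_a))
        print(f"separation, pure place code: {d_plain:.3f}")
        print(f"separation, joint goal code: {d_joint:.3f}")  # compressed

    With the goal weight set to 0.5, the shared prospective component cancels half of the difference between the two codes, halving their separation: a toy analogue of the goal-driven compression reported in the fMRI signals.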