
    An Interactive System for Generating Music from Moving Images

    Moving images contain a wealth of information pertaining to motion. Motivated by the interconnectedness of music and movement, we present a framework for transforming the kinetic qualities of moving images into music. We developed an interactive software system that takes video as input and maps its motion attributes into the musical dimension based on perceptually grounded principles. The system combines existing sonification frameworks with theories and techniques of generative music. To evaluate the system, we conducted a two-part experiment. First, we asked participants to judge the video-audio correspondence of clips generated by the system. Second, we asked participants to rate audiovisual works created using the system. These experiments revealed that 1) the system is able to generate music with a significant level of perceptual correspondence to the source video’s motion and 2) the system can effectively be used as an artistic tool for generative composition.
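A minimal sketch of one possible motion-to-music mapping, assuming grayscale video frames held as NumPy arrays. The abstract does not specify the system's actual motion features or musical mappings, so the frame-difference feature, the helper names `motion_energy` and `energy_to_notes`, and the pitch and velocity ranges below are all hypothetical choices made for illustration.

```python
import numpy as np


def motion_energy(frames: np.ndarray) -> np.ndarray:
    """Mean absolute frame difference for each consecutive pair of frames.

    frames: array of shape (T, H, W) with grayscale intensities in [0, 1].
    Returns an array of length T - 1 (hypothetical motion feature).
    """
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))
    return diffs.mean(axis=(1, 2))


def energy_to_notes(energy, base_pitch=48, pitch_range=24, max_energy=None):
    """Map each motion-energy value to a (MIDI pitch, MIDI velocity) pair.

    Hypothetical mapping: more motion gives a higher and louder note.
    Energies are normalised by max_energy (or by the observed peak).
    """
    energy = np.asarray(energy, dtype=np.float64)
    peak = max_energy if max_energy is not None else energy.max()
    norm = energy / peak if peak > 0 else np.zeros_like(energy)
    norm = np.clip(norm, 0.0, 1.0)
    pitches = base_pitch + np.round(norm * pitch_range).astype(int)
    velocities = 30 + np.round(norm * 97).astype(int)  # velocity spans 30..127
    return list(zip(pitches.tolist(), velocities.tolist()))
```

For example, three 2×2 frames that are black, black, then white give motion energies of 0 and 1, which map to the notes (48, 30) and (72, 127). A real system would use richer motion attributes (direction, speed, location) and map them onto further musical dimensions.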

    Data-driven deep-learning methods for the accelerated simulation of Eulerian fluid dynamics

    Deep-learning (DL) methods for the fast inference of the temporal evolution of fluid-dynamics systems, based on the prior recognition of features underlying large sets of fluid-dynamics data, have been studied. Specifically, models based on convolutional neural networks (CNNs) and graph neural networks (GNNs) were proposed and discussed. A U-Net, a popular fully convolutional architecture, was trained to infer wave dynamics on liquid surfaces surrounded by walls, given as input the system state at previous time points. A term penalising the error of the spatial derivatives was added to the loss function, which suppressed spurious oscillations and yielded more accurate locations and lengths of the predicted wavefronts. This model generalised accurately to complex wall geometries not seen during training. Unlike the image data structures processed by CNNs, graphs offer greater freedom in how data is organised and processed. This motivated the use of graphs to represent the state of fluid-dynamics systems discretised by unstructured sets of nodes, and of GNNs to process such graphs. Graphs enabled more accurate representations of curvilinear geometries and allowed higher resolution to be placed exclusively in areas where the physics is more challenging to resolve. Two novel GNN architectures were designed for fluid-dynamics inference: the MuS-GNN, a multi-scale GNN, and the REMuS-GNN, a rotation-equivariant multi-scale GNN. Both architectures work by repeatedly passing messages from each node to its nearest nodes in the graph. Additionally, lower-resolution graphs with fewer nodes are derived from the original graph, and messages are also passed from finer to coarser graphs and vice versa. These low-resolution graphs allowed physics spanning a range of length scales to be captured efficiently.
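The derivative penalty described above can be illustrated with a small NumPy sketch. It assumes forward finite differences on a regular grid and a hypothetical weight `lam`; the abstract gives neither the exact derivative scheme nor the weighting, and the real loss would be computed inside a deep-learning framework during U-Net training. An oscillating prediction with the same pointwise error as a smooth one receives a larger loss, which is how such a term discourages spurious oscillations.

```python
import numpy as np


def spatial_gradients(u: np.ndarray):
    """Forward finite differences of a 2-D field along rows (y) and columns (x)."""
    du_dy = np.diff(u, axis=0)
    du_dx = np.diff(u, axis=1)
    return du_dy, du_dx


def derivative_penalised_loss(pred, target, lam=1.0):
    """Mean squared error on the field plus lam times the MSE of its gradients.

    lam is a hypothetical weighting factor, not a value from the thesis.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    field_term = np.mean((pred - target) ** 2)
    py, px = spatial_gradients(pred)
    ty, tx = spatial_gradients(target)
    grad_term = np.mean((py - ty) ** 2) + np.mean((px - tx) ** 2)
    return field_term + lam * grad_term
```

For example, against a zero target on a 4×4 grid, a uniform offset of 0.1 gives a loss of 0.01, while a checkerboard of ±0.1 (identical pointwise error) gives 0.09 with `lam=1.0`, because every neighbouring difference is off by 0.2.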
Advection and fluid flow, modelled by the incompressible Navier-Stokes equations, were the two types of problems used to assess the proposed GNNs. Whereas a single-scale GNN was sufficient to achieve high generalisation accuracy in advection simulations, flow simulations benefited greatly from an increasing number of low-resolution graphs. The generalisation and long-term accuracy of these simulations were further improved by the REMuS-GNN architecture, which processes the system state independently of the orientation of the coordinate system thanks to a rotation-invariant representation and carefully designed components. To the best of the author’s knowledge, the REMuS-GNN architecture was the first rotation-equivariant and multi-scale GNN. The simulations were accelerated by one (on a CPU) to three (on a GPU) orders of magnitude with respect to a CPU-based numerical solver. Additionally, the parallelisation of multi-scale GNNs resulted in a close-to-linear speedup with the number of CPU cores or GPUs.
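The idea of a rotation-invariant representation can be sketched by expressing each sender node's velocity in a local frame aligned with the edge, alongside the edge length. This is one common construction assumed here for illustration; it is not claimed to be the REMuS-GNN's exact representation, and `rotation_invariant_edge_features` and `rotation_matrix` are hypothetical helpers. Rotating all positions and velocities by the same angle leaves these features unchanged.

```python
import numpy as np


def rotation_invariant_edge_features(pos, vel, edges):
    """Per-edge features that do not change under a global 2-D rotation.

    pos: (N, 2) node coordinates; vel: (N, 2) node velocities;
    edges: (E, 2) integer pairs (i, j) for a directed edge i -> j.
    Returns (E, 3): edge length, and the sender's velocity components
    along and across the edge direction.
    """
    i, j = edges[:, 0], edges[:, 1]
    rel = pos[j] - pos[i]                          # edge vectors
    length = np.linalg.norm(rel, axis=1)
    e = rel / length[:, None]                      # unit edge direction
    n = np.stack([-e[:, 1], e[:, 0]], axis=1)      # e rotated by +90 degrees
    u = vel[i]
    along = np.sum(u * e, axis=1)                  # component along the edge
    across = np.sum(u * n, axis=1)                 # component normal to the edge
    return np.stack([length, along, across], axis=1)


def rotation_matrix(theta):
    """Counter-clockwise 2-D rotation by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])
```

For instance, computing the features for random positions and velocities before and after applying `R = rotation_matrix(0.7)` to both (as `pos @ R.T` and `vel @ R.T`) yields the same array up to floating-point error, whereas the raw velocity components change. Because the local frame rotates with the data, dot products with it are preserved.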

    Towards Object-Centric Scene Understanding

    Visual perception for autonomous agents continues to attract community attention due to the disruptive technologies and the wide applicability of such solutions. Autonomous Driving (AD), a major application in this domain, promises to revolutionize our approach to mobility while bringing critical advantages in limiting accident fatalities. Fueled by recent advances in Deep Learning (DL), more computer vision tasks are being addressed using a learning paradigm. Deep Neural Networks (DNNs) have consistently pushed performance to unprecedented levels and demonstrated the ability of such approaches to generalize to an increasing number of difficult problems, such as 3D vision tasks. In this thesis, we address two main challenges arising from current approaches: the computational complexity of multi-task pipelines, and the increasing need for manual annotations. On the one hand, AD systems need to perceive the surrounding environment at different levels of detail and, subsequently, take timely actions; this multitasking further limits the time available for each perception task. On the other hand, the need for such systems to generalize universally to massively diverse situations requires large-scale datasets covering long-tailed cases. This requirement renders traditional supervised approaches unsustainable in terms of annotation costs, especially for 3D tasks, despite the data readily available in the AD domain. Given the nature of the AD environment, whose complexity (unlike that of indoor scenes) is dominated by the presence of other scene elements (mainly cars and pedestrians), we focus on the above-mentioned challenges in object-centric tasks. We then situate our contributions within the fast-paced literature, supporting our claims with extensive experimental analysis leveraging up-to-date state-of-the-art results and community-adopted benchmarks.

    INSAM Journal of Contemporary Music, Art and Technology 10 (I/2023)

    Having in mind the foundational idea not only of our Journal but also of the INSAM Institute itself, the main theme of this issue is titled “Technological Aspects of Contemporary Artistic and Scientific Research”. This theme was recognized as important, timely, and necessary by a number of authors from various disciplines. The (Inter)Views section brings us three diverse pieces: the issue opens with Aida Adžović’s interview with the legendary Slovene act Laibach about their performance of the Wir sind das Volk project at the Sarajevo National Theater on May 9, 2023. Following this, Marija Mitrović presents an interview with media artist Leon Eckard on this artist’s views on contemporary art and the interaction between technology and human sensitivity. An essay by Alexander Liebermann on the early 20th-century composer Erwin Schulhoff, whose search for a unique personal voice could be encouraging in any period, closes this rubric. The Main theme section contains seven scientific articles. In the first, Filipa Magalhães, Inês Filipe, Mariana Silva and Henrique Carvalho explore the technological and artistic challenges of reviving the music theater work FE...DE...RI...CO... (1987) by Constança Capdeville. The second article, written by Milan Milojković, is dedicated to the analysis of the historical composer Vojislav Vučković and his ChatGPT-generated doppelganger and opera. The fictional narrative woven around the actual historical figure served as an example of the current possibilities of AI in the domain of musicological work. In the next paper, Luís Arandas, Miguel Carvalhais and Mick Grierson expand on their work on the film Irreplaceable Biography, which was created via language-guided generative models in audiovisual production. 
Thomas Moore focuses on the Belgium-based Nadar Ensemble and discusses how the ensemble’s performers understand the concept of the integrated concert and distinguish themselves from it, specifying the broadening of performers’ competencies and responsibilities. In her paper, Dana Papachristou contributes to the discussion on the politics of connectivity by examining three projects: the online project Xenakis Networked Performance Marathon 2022; 2023Eleusis Mystery 91_Magnetic Dance in Elefsina, European Capital of Culture; and the Spaces of Reflection offline PirateBox network at the 10th Berlin Biennale. The penultimate article in the section, by Kenrick Ho, presents the author’s composition Flou for solo violin through the prism of the relationship between (historically present) algorithmic processes, the composer, and the performer. Finally, Rijad Kaniža adds to the critical discourse on the reshaping of musical experience through technology, and on the understanding of that technology, using the example of musique concrète. In the final Review section, Bakir Memišević gives an overview of the 13th International Symposium “Music in Society”, held in Sarajevo in December 2022.

    Animate Being: Extending a Practice of the Image to New Mediums via Speculative Game Design

    This post-disciplinary practice-as-research thesis examines the potential of Carl Jung's therapeutic method of active imagination as a strategy for engaging with an increasingly complex and interconnected technological reality. Embracing a non-clinical, practice-driven approach, I harness James Hillman’s notion of the image and the imaginal to investigate the interdisciplinary capacity and ethical dimensions of an expansive mode of image-work. My approach to practice theoretically and practically intertwines analytical psychology, feminist worlding and design speculation. Building upon Susan Rowland’s work, I study image-work as an ecological alchemical craft that seeks to matter the immaterial. Through the cyclic, iterative design of a video game, I mobilise and respond to image-work as a mode of myth-making that may facilitate dialogue between human and non-human intelligences. Departing from the essentialism of the hero's journey, I adopt Le Guin's Carrier Bag (1986/2019) as a feminist video game form, and by utilising the framework of a video game (Bogost, 2007; Flannigan, 2013), I transform the alchemical processes of image-work into novel interactive game mechanics. The game I design is both a vessel and a portal to an imaginal ecological realm: an open-world, procedurally generated ‘living world’ sandbox exploration game. This game integrates real-time, real-world data streams to invite the non-human to enter into play as player two, facilitating experimentation with possible new forms of cross-species dialogue, collaboration, and healing.

    Mixed Reality Annotation of Robotic-Assisted Surgery videos with real-time tracking and stereo matching

    Robotic-Assisted Surgery (RAS) is beginning to unlock its potential. However, despite the latest advances in RAS, the steep learning curve of RAS devices remains a problem. A common teaching resource in surgery is videos of previous procedures, which in RAS are almost always stereoscopic. It is important to be able to add virtual annotations to these videos so that certain elements of the surgical process are tracked and highlighted during the teaching session. Including virtual annotations in stereoscopic videos turns them into Mixed Reality (MR) experiences, in which tissues, tools and procedures can be better observed. However, MR-based annotation of objects requires tracking and some form of depth estimation. For this reason, this paper proposes a real-time hybrid tracking–matching method for performing virtual annotations on RAS videos. The proposed method is hybrid because it combines tracking and stereo matching, avoiding the need to calculate the real depth of the pixels. The method was tested with six different state-of-the-art trackers and assessed with videos of a sigmoidectomy of a sigma neoplasia, performed with a Da Vinci® X surgical system. Objective assessment metrics are proposed, presented and calculated for the different solutions. The results show that the method can successfully annotate RAS videos in real time. Of all the trackers tested with the presented method, the CSRT (Channel and Spatial Reliability Tracking) tracker appears to be the most reliable and robust in terms of tracking capabilities. In addition, in the absence of an absolute ground truth, an assessment with a domain expert was performed using a novel continuous-rating method with an Oculus Quest 2 Virtual Reality device, showing that the depth perception of the virtual annotations is good, even though no absolute depth values are calculated.
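A minimal sketch of the matching half of a hybrid tracking–matching scheme, assuming a rectified stereo pair so that corresponding points lie on the same image rows. The box would come from a tracker (such as CSRT) running on the left view; the sum-of-squared-differences search over a hypothetical `max_disparity` range, and the helper name `match_in_right_view`, are simple stand-ins rather than the paper's actual method. The recovered horizontal disparity is enough to place the annotation in the right view without computing metric depth.

```python
import numpy as np


def match_in_right_view(left, right, box, max_disparity):
    """Locate the tracked left-view patch in the right view of a rectified pair.

    left, right: 2-D grayscale arrays of equal shape.
    box: (x, y, w, h) of the patch in the left frame, e.g. from a tracker.
    Assumes rectification, so the match lies on the same rows, shifted
    left by a disparity d in [0, max_disparity].
    Returns (x_right, d), with d chosen by minimum sum of squared differences.
    """
    x, y, w, h = box
    template = left[y:y + h, x:x + w].astype(np.float64)
    best_d, best_cost = 0, np.inf
    for d in range(0, max_disparity + 1):
        xr = x - d
        if xr < 0:  # candidate window would leave the image
            break
        candidate = right[y:y + h, xr:xr + w].astype(np.float64)
        cost = np.sum((template - candidate) ** 2)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return x - best_d, best_d
```

For example, if the right image is the left image shifted five pixels to the left, a 6×6 box at x = 15 is matched at x = 10 with a disparity of 5. Drawing the annotation at the tracked box in the left view and at the shifted box in the right view gives the stereoscopic effect directly.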

    Offene-Welt-Strukturen: Architektur, Stadt- und Naturlandschaft im Computerspiel (Open-World Structures: Architecture, Urban and Natural Landscape in Computer Games)

    What role do algorithms play in constructing images and in depicting world and weather in computer games? How does the design of spaces, levels and topographies influence players' decisions and behaviour? Is Brutalism the first genuine architectural style of computer games? What significance do landscape gardens and national parks have in structuring game worlds? How is nature depicted in times of climate change? Particularly over the last 20 years, digital game worlds have adopted features of the physical, real world more meticulously than ever. Through elaborate production processes and complex visualisation strategies, this approximation to the rest of our everyday world is always produced in dependence on game mechanics and worldliness. As the example of open-world games shows at the latest, the adoption of certain worldviews and pictorial traditions leads to ideological implications that go far beyond the narrative conventions transferred from other media formats, which have so far been the focus of research. With his theory of architecture as a media hinge, the author reveals that digital game worlds exhibit media-specific properties that had previously eluded analysis and awaited investigation. By interweaving concepts from media studies, game studies, philosophy, architectural theory, human geography, landscape theory and art history, among other fields, Bonner develops a transdisciplinary theoretical model and, with the analytical methods derived from it, makes it possible for the first time to understand and name the complex structure of today's computer games, from the indie game to the AAA open world. With "Offene-Welt-Strukturen", the architectonics of digital game worlds becomes comprehensively accessible.

    Intelligent Sensing and Learning for Advanced MIMO Communication Systems


    Towards Generalizable Deep Image Matting: Decomposition, Interaction, and Merging

    Image matting refers to extracting precise alpha mattes from images and plays a critical role in many downstream applications. Despite extensive attention, key challenges persist and motivate the research presented in this thesis. One major challenge is the reliance of previous methods on auxiliary inputs, which hinders real-time practicality. To address this, we introduce fully automatic image matting by decomposing the task into high-level semantic segmentation and low-level detail matting. We then incorporate plug-in modules that enhance the interaction between the sub-tasks through feature integration. Furthermore, we propose an attention-based mechanism that guides the matting process through collaborative merging. Another challenge lies in the limited availability of matting datasets, which leads to reliance on composite images and inferior performance on images in the wild. In response, our research proposes a composition route that mitigates these discrepancies and yields remarkable generalization ability. Additionally, we construct numerous large datasets of high-quality real-world images with manually labeled alpha mattes, providing a solid foundation for training and evaluation. Moreover, our research uncovers new observations that warrant further investigation. First, we systematically analyze and address privacy issues that have been neglected in previous portrait matting research. Second, we explore the adaptation of automatic matting methods to non-salient or transparent categories beyond salient ones. Furthermore, we incorporate the language modality to achieve a more controllable matting process, enabling specific target selection at low cost. To validate our studies, we conduct extensive experiments and provide all code and datasets through the link (https://github.com/JizhiziLi/). We believe that the analyses, methods, and datasets presented in this thesis will offer valuable insights for future research in the field of image matting.
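Two ideas from this abstract can be illustrated in NumPy. `composite` implements the standard compositing equation I = αF + (1 − α)B, which underlies the composite training images mentioned above. `merge_predictions` is a hypothetical, simplified merge in which a three-class semantic map decides background and foreground pixels and a detail matte fills the transition region; the thesis instead describes an attention-based collaborative merging mechanism, which this sketch does not reproduce.

```python
import numpy as np


def composite(fg, bg, alpha):
    """Standard compositing equation: I = alpha * F + (1 - alpha) * B.

    fg, bg: (H, W, 3) float images; alpha: (H, W) matte in [0, 1].
    """
    a = alpha[..., None]  # broadcast the matte over colour channels
    return a * fg + (1.0 - a) * bg


def merge_predictions(semantic, detail_alpha):
    """Hypothetical merge of a semantic map and a detail matte.

    semantic: (H, W) ints, 0 = background, 1 = transition, 2 = foreground.
    detail_alpha: (H, W) alpha predicted for fine detail.
    Background pixels get 0, foreground pixels get 1, and transition
    pixels take the detail prediction.
    """
    return np.where(semantic == 2, 1.0,
                    np.where(semantic == 1, detail_alpha, 0.0))
```

Splitting the work this way lets the semantic branch settle the confidently classified regions, so the detail branch only has to be accurate along boundaries such as hair or fur.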