132 research outputs found

    Exemplar-Based Image and Video Stylization Using Fully Convolutional Semantic Features

    Get PDF
    postprint

    AI-generated Content for Various Data Modalities: A Survey

    Full text link
    AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms. Owing to its wide range of applications and the potential demonstrated by recent works, AIGC has been attracting considerable attention, and AIGC methods have been developed for various data modalities, such as image, video, text, 3D shape (as voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human avatar (body and head), 3D motion, and audio -- each presenting different characteristics and challenges. Furthermore, there have been many significant developments in cross-modality AIGC methods, where generative methods receive conditioning input in one modality and produce outputs in another. Examples include going from various modalities to the image, video, 3D shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar), and audio modalities. In this paper, we provide a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods, highlighting the various challenges, representative works, and recent technical directions in each setting. We also survey the representative datasets throughout the modalities and present comparative results for various modalities. Finally, we discuss the challenges and potential future research directions.
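    As a concrete illustration of the cross-modality setting described above (conditioning input in one modality, output in another), here is a minimal text-to-image sketch using the Hugging Face diffusers library; the checkpoint name and prompt are illustrative choices, not tied to any specific method surveyed in the paper.

```python
# A minimal cross-modality (text -> image) generation sketch using diffusers.
# The checkpoint is an example choice; any compatible model id would do.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The prompt is the conditioning input (text modality);
# the output is produced in the image modality.
image = pipe("a watercolor painting of a mountain lake").images[0]
image.save("output.png")
```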

    Confocal Laser Endomicroscopy Image Analysis with Deep Convolutional Neural Networks

    Get PDF
    Rapid intraoperative diagnosis of brain tumors is of great importance for planning treatment and guiding the surgeon about the extent of resection. Currently, the standard for preliminary intraoperative tissue analysis is frozen section biopsy, which has major limitations such as tissue freezing and cutting artifacts, sampling errors, lack of immediate interaction between the pathologist and the surgeon, and long processing time. Handheld, portable confocal laser endomicroscopy (CLE) is being explored in neurosurgery for its ability to image histopathological features of tissue at cellular resolution in real time during brain tumor surgery. Over the course of examining the surgical tumor resection, hundreds to thousands of images may be collected. The high number of images imposes a significant time and storage burden on subsequent review, which has motivated several research groups to employ deep convolutional neural networks (DCNNs) to improve CLE's utility during surgery. DCNNs have proven useful in natural and medical image analysis tasks such as classification, object detection, and image segmentation. This thesis proposes using DCNNs to analyze CLE images of brain tumors. In particular, it explores the practicality of DCNNs in three main tasks. First, off-the-shelf DCNNs were used to classify images as diagnostic or non-diagnostic. Further experiments showed that both ensemble modeling and transfer learning improved the classifier's accuracy in evaluating the diagnostic quality of new images at the test stage. Second, a weakly supervised learning pipeline was developed for localizing key features of diagnostic CLE images from gliomas. Third, image style transfer was used to improve the diagnostic quality of CLE images from glioma tumors by transforming the histology patterns in CLE images of fluorescein sodium-stained tissue into those of conventional hematoxylin and eosin-stained tissue slides. These studies suggest that DCNNs are well suited to the analysis of CLE images: they may assist surgeons by sorting out non-diagnostic images, highlighting key regions, and enhancing their appearance through pattern transformation in real time. With recent advances in deep learning such as generative adversarial networks and semi-supervised learning, new research directions should be pursued to uncover further promise of DCNNs in CLE image analysis.
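    To make the transfer-learning step above concrete, the following is a minimal sketch of fine-tuning an ImageNet-pretrained backbone as a binary diagnostic/non-diagnostic classifier; the backbone choice, learning rate, and training loop are illustrative assumptions, not the thesis's exact configuration.

```python
# A minimal transfer-learning sketch: adapt a pretrained CNN to classify
# CLE images as diagnostic vs. non-diagnostic. Hyperparameters are assumed.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # two classes

# Freeze the pretrained feature extractor; train only the new head.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One optimization step on a batch of labeled CLE images."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

    The ensemble modeling mentioned in the abstract would then amount to averaging the predicted class probabilities of several such independently fine-tuned networks.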

    Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control

    Full text link
    We present Free-HeadGAN, a person-generic neural talking head synthesis system. We show that modeling faces with sparse 3D facial landmarks is sufficient for achieving state-of-the-art generative performance, without relying on strong statistical priors of the face, such as 3D Morphable Models. Apart from 3D pose and facial expressions, our method is capable of fully transferring the eye gaze from a driving actor to a source identity. Our complete pipeline consists of three components: a canonical 3D key-point estimator that regresses 3D pose and expression-related deformations, a gaze estimation network, and a generator built upon the architecture of HeadGAN. We further experiment with an extension of our generator that accommodates few-shot learning using an attention mechanism, for the case where more than one source image is available. Compared to the latest models for reenactment and motion transfer, our system achieves higher photo-realism combined with superior identity preservation, while offering explicit gaze control.
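    The following structural sketch shows how the three components could fit together at inference time; every function name here is a hypothetical placeholder rather than the authors' published API.

```python
# A structural sketch of the three-component Free-HeadGAN pipeline.
# All names are hypothetical placeholders, not the authors' API.
def reenact(source_img, driving_img, kp_estimator, gaze_net, generator):
    # 1) Canonical 3D key-points with pose/expression-related deformations.
    kp_source = kp_estimator(source_img)
    kp_driving = kp_estimator(driving_img)
    # 2) Explicit gaze direction estimated from the driving actor.
    gaze = gaze_net(driving_img)
    # 3) A HeadGAN-style generator renders the source identity under the
    #    driving pose, expression, and gaze.
    return generator(source_img, kp_source, kp_driving, gaze)
```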

    3D Hand reconstruction from monocular camera with model-based priors

    Get PDF
    As virtual and augmented reality (VR/AR) technology gains popularity, facilitating intuitive digital interactions in 3D is of crucial importance. Tools such as VR controllers exist, but such devices support only a limited range of interactions, mapped onto complex sequences of button presses that can be intimidating to learn. In contrast, users already have an instinctive understanding of manual interactions in the real world, which is readily transferable to the virtual world. This makes hands the ideal mode of interaction for downstream applications such as robotic teleoperation, sign-language translation, and computer-aided design. Existing hand-tracking systems come with several inconvenient limitations. Wearable solutions such as gloves and markers unnaturally limit the range of articulation. Multi-camera systems are not trivial to calibrate and have specialized hardware requirements that make them cumbersome to use. Given these drawbacks, recent research tends to focus on monocular inputs, as these do not constrain articulation and suitable devices are pervasive in everyday life. 3D reconstruction in this setting is severely under-constrained, however, due to occlusions and depth ambiguities. The majority of state-of-the-art works rely on a learning framework to resolve these ambiguities statistically; as a result, they share several limitations. For example, they require a vast amount of annotated 3D data that is labor-intensive to obtain and prone to systematic error. Additionally, traits that are hard to quantify with annotations - the details of individual hand appearance - are difficult to reconstruct in such a framework. Existing methods also make the simplifying assumption that only a single hand is present in the scene. Two-hand interactions introduce additional challenges, however, in the form of inter-hand occlusion, left-right confusion, and collision constraints that single-hand methods cannot address. To tackle the aforementioned shortcomings of previous methods, this thesis advances the state of the art through the novel use of model-based priors to incorporate hand-specific knowledge. In particular, this thesis presents a training method that reduces the amount of annotations required and is robust to systemic biases; it presents the first tracking method that addresses the challenging two-hand-interaction scenario using monocular RGB video, and also the first probabilistic method to model image ambiguity for two-hand interactions. Additionally, this thesis contributes the first parametric hand texture model, with example applications in hand personalization.
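    As a rough illustration of the model-based priors the thesis builds on, the sketch below expresses a hand mesh as a template deformed by low-dimensional shape and pose parameters, in the spirit of linear parametric models such as MANO; the dimensions and the randomly initialized bases are placeholders for quantities learned from registered 3D scans.

```python
# A minimal linear parametric hand-model sketch (MANO-like dimensions).
# The random bases stand in for bases learned from registered 3D scans.
import numpy as np

rng = np.random.default_rng(0)
N_VERTS, N_SHAPE, N_POSE = 778, 10, 45

template = rng.standard_normal((N_VERTS, 3))          # mean hand mesh
shape_basis = rng.standard_normal((N_SHAPE, N_VERTS, 3))
pose_basis = rng.standard_normal((N_POSE, N_VERTS, 3))

def hand_mesh(beta, theta):
    """Vertices as a linear function of shape (beta) and pose (theta)."""
    v = template.copy()
    v += np.einsum("s,svk->vk", beta, shape_basis)    # shape blend shapes
    v += np.einsum("p,pvk->vk", theta, pose_basis)    # pose correctives
    return v  # (N_VERTS, 3); skeletal articulation omitted for brevity

verts = hand_mesh(0.1 * rng.standard_normal(N_SHAPE),
                  0.1 * rng.standard_normal(N_POSE))
```

    A reconstruction method can then optimize or regress the low-dimensional (beta, theta) instead of unconstrained per-vertex geometry, which is what makes such priors effective in the under-constrained monocular setting.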

    Pathway to Future Symbiotic Creativity

    Full text link
    This report presents a comprehensive view of our vision for the development path of human-machine symbiotic art creation. We propose a classification of creative systems into a hierarchy of five classes, showing the pathway of creativity evolving from mimic-human artists (Turing Artists) to Machine Artists in their own right. We begin with an overview of the limitations of Turing Artists, then focus on the top two levels of the hierarchy, Machine Artists, emphasizing machine-human communication in art creation. In art creation, machines need to understand humans' mental states, including desires, appreciation, and emotions; humans, in turn, need to understand machines' creative capabilities and limitations. The rapid development of immersive environments, and their further evolution into the new concept of the metaverse, enables symbiotic art creation through unprecedented flexibility of bi-directional communication between artists and art manifestation environments. By examining the latest sensor and XR technologies, we illustrate a novel way of collecting art data that constitutes the basis of a new form of human-machine bi-directional communication and understanding in art creation. Based on such communication and understanding mechanisms, we propose a novel framework for building future Machine Artists, guided by the philosophy that a human-compatible AI system should be based on the "human-in-the-loop" principle rather than the traditional "end-to-end" dogma. By proposing a new form of inverse reinforcement learning model, we outline the platform design of machine artists, demonstrate its functions, and showcase some examples of technologies we have developed. We also provide a systematic exposition of the ecosystem for an AI-based symbiotic art form and community, with an economic model built on NFT technology. Ethical issues for the development of machine artists are also discussed.
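    To ground the "human-in-the-loop" idea, here is a minimal preference-learning sketch in which a reward model is fitted from pairwise human judgements rather than trained end-to-end; the simulated judge, feature dimensionality, and learning rate are generic illustrations, not the report's actual model.

```python
# Human-in-the-loop reward learning from pairwise preferences (generic sketch).
import numpy as np

rng = np.random.default_rng(0)
w_true = rng.standard_normal(8)   # hidden taste of the simulated human judge
w = np.zeros(8)                   # learned linear reward weights

def human_prefers(a, b):
    """Stand-in for a real human comparing two candidate artworks."""
    return a @ w_true > b @ w_true

for _ in range(500):
    a, b = rng.standard_normal(8), rng.standard_normal(8)
    pref, other = (a, b) if human_prefers(a, b) else (b, a)
    # Bradley-Terry likelihood gradient step toward the preferred candidate.
    p = 1.0 / (1.0 + np.exp((other - pref) @ w))
    w += 0.1 * (1.0 - p) * (pref - other)
```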

    Parametric meta-filter modeling from a single example pair

    Get PDF
    We present a method for learning a meta-filter from an example pair comprising an original image A and its filtered version A', produced by an unknown image filter. A meta-filter is a parametric model consisting of a spatially varying linear combination of simple basis filters. We introduce a technique for learning the parameters of the meta-filter f such that it approximates the effects of the unknown filter, i.e., f(A) approximates A'. The meta-filter can be transferred to novel input images, and its parametric representation enables intuitive tuning of its parameters to achieve controlled variations. We show that our technique successfully learns and models meta-filters that approximate a large variety of common image filters with high accuracy, both visually and quantitatively.
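    The core fitting step can be sketched in a few lines: build a bank of simple basis-filter responses and solve a least-squares problem for combination weights such that f(A) approximates A'. For brevity this version fits one global weight per basis filter, whereas the paper's meta-filter lets the weights vary spatially (e.g., per patch); the particular basis-filter bank here is an illustrative assumption.

```python
# Fit a (global) linear combination of basis filters to mimic an unknown
# filter, given one example pair (A, A_prime). NumPy/SciPy sketch.
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def basis_responses(img):
    # Example basis bank: identity, two blurs, and an edge response.
    return [img, gaussian_filter(img, 1.0), gaussian_filter(img, 3.0), laplace(img)]

def fit_meta_filter(A, A_prime):
    B = np.stack([r.ravel() for r in basis_responses(A)], axis=1)
    # Least-squares weights minimizing ||B w - A'||^2.
    w, *_ = np.linalg.lstsq(B, A_prime.ravel(), rcond=None)
    return w

def apply_meta_filter(img, w):
    # Transfer the learned meta-filter to a novel input image.
    return sum(wi * r for wi, r in zip(w, basis_responses(img)))
```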