26 research outputs found

    IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models

    We present a diffusion-based image morphing approach with perceptually-uniform sampling (IMPUS) that produces smooth, direct, and realistic interpolations given an image pair. A latent diffusion model has distinct conditional distributions and data embeddings for each of the two images, especially when they are from different classes. To bridge this gap, we interpolate in the locally linear and continuous text embedding space and Gaussian latent space. We first optimize the endpoint text embeddings and then map the images to the latent space using a probability flow ODE. Unlike existing work that takes an indirect morphing path, we show that our model adaptation yields a direct path and suppresses ghosting artifacts in the interpolated images. To achieve this, we propose an adaptive bottleneck constraint based on a novel relative perceptual path diversity score that automatically controls the bottleneck size and balances the diversity along the path with its directness. We also propose a perceptually-uniform sampling technique that enables visually smooth changes between the interpolated images. Extensive experiments validate that our IMPUS can achieve smooth, direct, and realistic image morphing and can be applied to other image generation tasks.
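
    As a rough illustration of the interpolation scheme described above, the minimal NumPy sketch below spherically interpolates between two Gaussian latent codes and linearly interpolates between two text embeddings at a set of blending weights. The arrays and the uniform schedule are placeholder assumptions; the endpoint embedding optimisation, the probability flow ODE inversion, the adaptive bottleneck constraint, and the perceptually-uniform spacing are specific to IMPUS and are not reproduced here.

        import numpy as np

        def slerp(z0, z1, t, eps=1e-8):
            # Spherical interpolation between two Gaussian latent codes.
            a, b = z0.ravel(), z1.ravel()
            cos_theta = np.clip(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps), -1.0, 1.0)
            theta = np.arccos(cos_theta)
            if theta < eps:
                return (1 - t) * z0 + t * z1      # endpoints nearly parallel: plain lerp
            return (np.sin((1 - t) * theta) * z0 + np.sin(t * theta) * z1) / np.sin(theta)

        def lerp(e0, e1, t):
            # Linear interpolation in the (locally linear) text-embedding space.
            return (1 - t) * e0 + t * e1

        rng = np.random.default_rng(0)
        # Placeholder endpoints standing in for the ODE-inverted latents and optimised embeddings.
        z_a, z_b = rng.standard_normal((4, 64, 64)), rng.standard_normal((4, 64, 64))
        e_a, e_b = rng.standard_normal(768), rng.standard_normal(768)

        # Uniform schedule for brevity; IMPUS instead spaces t so that perceptual change is uniform.
        for t in np.linspace(0.0, 1.0, 7):
            z_t, e_t = slerp(z_a, z_b, t), lerp(e_a, e_b, t)
            # z_t and e_t would be passed to the diffusion model to render the intermediate frame.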

    DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

    Diffusion models have achieved remarkable image generation quality surpassing previous generative models. However, a notable limitation of diffusion models, in comparison to GANs, is their difficulty in smoothly interpolating between two image samples, due to their highly unstructured latent space. Such a smooth interpolation is intriguing as it naturally serves as a solution for the image morphing task with many applications. In this work, we present DiffMorpher, the first approach enabling smooth and natural image interpolation using diffusion models. Our key idea is to capture the semantics of the two images by fitting two LoRAs to them respectively, and interpolate between both the LoRA parameters and the latent noises to ensure a smooth semantic transition, where correspondence automatically emerges without the need for annotation. In addition, we propose an attention interpolation and injection technique and a new sampling schedule to further enhance the smoothness between consecutive images. Extensive experiments demonstrate that DiffMorpher achieves starkly better image morphing effects than previous methods across a variety of object categories, bridging a critical functional gap that distinguished diffusion models from GANs.
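
    The core idea of interpolating both the latent noise and the per-image LoRA parameters can be sketched as follows, assuming two hypothetical LoRA state dicts with matching keys and two noise tensors. The attention interpolation and injection technique and the paper's sampling schedule are omitted, so this is an illustrative outline rather than the DiffMorpher implementation.

        import torch

        def interpolate_lora(lora_a: dict, lora_b: dict, alpha: float) -> dict:
            # Linearly blend two LoRA state dicts with matching keys (an assumption here).
            return {k: (1 - alpha) * lora_a[k] + alpha * lora_b[k] for k in lora_a}

        def slerp(z0: torch.Tensor, z1: torch.Tensor, alpha: float) -> torch.Tensor:
            # Spherical interpolation of the two latent noises.
            z0f, z1f = z0.flatten(), z1.flatten()
            cos_theta = torch.clamp(torch.dot(z0f, z1f) / (z0f.norm() * z1f.norm()), -1.0, 1.0)
            theta = torch.arccos(cos_theta)
            if theta < 1e-6:
                return (1 - alpha) * z0 + alpha * z1
            return (torch.sin((1 - alpha) * theta) * z0 + torch.sin(alpha * theta) * z1) / torch.sin(theta)

        # Placeholder LoRA weights and latent noises for the two endpoint images.
        lora_a = {"attn.lora_A": torch.randn(4, 320), "attn.lora_B": torch.randn(320, 4)}
        lora_b = {k: torch.randn_like(v) for k, v in lora_a.items()}
        noise_a, noise_b = torch.randn(4, 64, 64), torch.randn(4, 64, 64)

        for alpha in torch.linspace(0, 1, 5):
            lora_t = interpolate_lora(lora_a, lora_b, float(alpha))   # blended adapter
            noise_t = slerp(noise_a, noise_b, float(alpha))           # blended latent noise
            # lora_t would be loaded into the denoiser and noise_t denoised to render this frame.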

    Multiview Regenerative Morphing with Dual Flows

    This paper aims to address a new task of image morphing under a multiview setting, which takes two sets of multiview images as input and generates intermediate renderings that not only exhibit smooth transitions between the two input sets but also ensure visual consistency across different views at any transition state. To achieve this goal, we propose a novel approach called Multiview Regenerative Morphing that formulates the morphing process as an optimization to solve for rigid transformation and optimal-transport interpolation. Given the multiview input images of the source and target scenes, we first learn a volumetric representation that models the geometry and appearance for each scene to enable the rendering of novel views. Then, the morphing between the two scenes is obtained by solving optimal transport between the two volumetric representations under the Wasserstein metric. Our approach does not rely on user-specified correspondences or 2D/3D input meshes, and we do not assume any predefined categories of the source and target scenes. The proposed view-consistent interpolation scheme directly works on multiview images to yield a novel and visually plausible effect of multiview free-form morphing.
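
    The optimal-transport interpolation at the heart of the method can be illustrated with a generic toy: an entropically regularised (Sinkhorn) transport plan between two weighted 2D point clouds, followed by moving each source point toward the barycentre of its transported mass. The point clouds below are synthetic stand-ins for the volumetric scene representations, and the rigid-transformation solve and the rendering steps are not shown.

        import numpy as np

        def sinkhorn(a, b, cost, reg=0.1, n_iter=200):
            # Entropically regularised optimal transport plan between histograms a and b.
            K = np.exp(-cost / reg)
            u = np.ones_like(a)
            for _ in range(n_iter):
                v = b / (K.T @ u)
                u = a / (K @ v)
            return u[:, None] * K * v[None, :]      # transport plan; rows sum to a, columns to b

        def displacement_interpolation(x, y, plan, t):
            # Barycentric projection of the plan: where, on average, the mass from x[i] goes.
            targets = (plan @ y) / plan.sum(axis=1, keepdims=True)
            return (1 - t) * x + t * targets

        rng = np.random.default_rng(0)
        x = rng.normal(loc=[-1.0, 0.0], scale=0.3, size=(100, 2))   # toy "source" point cloud
        y = rng.normal(loc=[1.0, 0.5], scale=0.3, size=(120, 2))    # toy "target" point cloud
        a = np.full(len(x), 1.0 / len(x))
        b = np.full(len(y), 1.0 / len(y))
        cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)       # squared Euclidean cost matrix

        plan = sinkhorn(a, b, cost)
        frames = [displacement_interpolation(x, y, plan, t) for t in np.linspace(0, 1, 5)]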

    Object-centric generative models for robot perception and action

    The system of robot manipulation involves a pipeline consisting of the perception of objects in the environment and the planning of actions in 3D space. Deep learning approaches are employed to segment scenes into components of objects and then learn object-centric features to predict actions for downstream tasks. Despite having achieved promising performance in several manipulation tasks, supervised approaches lack inductive biases related to general properties of objects. Recent advances show that by encoding and reconstructing scenes in an object-centric fashion, a model can discover object-like entities from raw data without human supervision. Moreover, by reconstructing the discovered objects, the model can learn a variational latent space that captures the various shapes and textures of the objects, regularised by a chosen prior distribution. In this thesis, we investigate the properties of this learned object-centric latent space and develop novel object-centric generative models (OCGMs) that can be applied to real-world robotics scenarios. In the first part of the thesis, we investigate a tool-synthesis task that leverages a learned latent space to optimise a wide range of tools for a reaching task. Given an image that illustrates the obstacles and the reaching target in the scene, an affordance predictor is trained to predict the feasibility of a tool for the given task. To imitate human tool-use experience, feasibility labels are acquired from simulated trial and error on the reaching task. We found that by employing an activation maximisation step, the model can synthesise appropriate tools for the given tasks with high accuracy. Moreover, the tool-synthesis process indicates the existence of a task-relevant trajectory in the learned latent space that can be found by a trained affordance predictor. The second part of the thesis focuses on the development of novel OCGMs and their application to robotic tasks. We first introduce a 2D OCGM that is deployed on robot manipulation datasets in both simulated and real-world scenarios. Despite the intensive interactions between the robot arm and objects, we find that the model discovers meaningful object entities from the raw observations without any human supervision. We then extend the 2D OCGM to 3D by leveraging NeRFs as decoders to explicitly model the 3D geometry of objects and the background. To disentangle the spatial information of an object from its appearance information, we propose a minimum volume principle for unsupervised 6D pose estimation of the objects. Considering occlusion in the scene, we further improve the pose estimation by introducing a shape completion module that imagines the unobserved parts of the objects before the pose estimation step. Finally, we apply the model in real-world robotics scenarios and compare its performance against several baselines on tasks including 3D reconstruction, object-centric latent representation learning, and 6D pose estimation for object rearrangement. We find that, despite being unsupervised, our model achieves improved performance across a range of real-world tasks.
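
    The activation-maximisation step used for tool synthesis can be sketched generically as gradient ascent on a latent code so that a learned affordance predictor scores the decoded tool as feasible. The decoder and predictor below are placeholder PyTorch modules rather than the thesis models, and the conditioning on the scene image is omitted.

        import torch
        import torch.nn as nn

        latent_dim = 16

        # Placeholder decoder and affordance predictor standing in for the trained models.
        decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, 64 * 64))
        affordance = nn.Sequential(nn.Linear(64 * 64, 64), nn.ReLU(), nn.Linear(64, 1))

        z = torch.zeros(1, latent_dim, requires_grad=True)        # start from the prior mean
        optimizer = torch.optim.Adam([z], lr=0.05)

        for step in range(200):
            optimizer.zero_grad()
            tool_image = decoder(z)                               # decode a candidate tool
            feasibility = affordance(tool_image)                  # predicted task feasibility (logit)
            loss = -feasibility.mean() + 1e-3 * z.pow(2).sum()    # maximise feasibility, stay near the prior
            loss.backward()
            optimizer.step()

        # After optimisation, decoder(z) is the synthesised tool for the task encoded upstream.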

    Efficient image-based rendering

    Recent advancements in real-time ray tracing and deep learning have significantly enhanced the realism of computer-generated images. However, conventional 3D computer graphics (CG) can still be time-consuming and resource-intensive, particularly when creating photo-realistic simulations of complex or animated scenes. Image-based rendering (IBR) has emerged as an alternative approach that utilizes pre-captured images from the real world to generate realistic images in real time, eliminating the need for extensive modeling. Although IBR has its advantages, it faces challenges in providing the same level of control over scene attributes as traditional CG pipelines and in accurately reproducing complex scenes and objects with different materials, such as transparent objects. This thesis endeavors to address these issues by harnessing the power of deep learning and incorporating the fundamental principles of graphics and physically based rendering. It offers an efficient solution that enables interactive manipulation of real-world dynamic scenes captured from sparse views, lighting positions, and times, as well as a physically based approach that facilitates accurate reproduction of the view-dependent effects resulting from the interaction between transparent objects and their surrounding environment. Additionally, this thesis develops a visibility metric that can identify artifacts in the reconstructed IBR images without observing the reference image, thereby contributing to the design of an effective IBR acquisition pipeline. Lastly, a perception-driven rendering technique is developed to provide high-fidelity visual content in virtual reality displays while retaining computational efficiency.

    Animation and Interaction of Responsive, Expressive, and Tangible 3D Virtual Characters

    This thesis is framed within the field of 3D character animation. Virtual characters are used in many human-computer interaction applications, such as video games and serious games, where they move and act in similar ways to humans, controlled by users through some form of interface or by artificial intelligence. This work addresses the challenges of developing smoother movements and more natural behaviors, of driving motion in real time intuitively and accurately, and of exploring the interaction between virtual characters and intelligent objects. Researching these subjects contributes to creating more responsive, expressive, and tangible virtual characters. Navigation within virtual worlds uses locomotion such as walking, running, etc. To achieve maximum realism, actors' movements are captured and reused to animate virtual characters. This is the philosophy of motion graphs: a structure that embeds movements and generates a continuous motion stream by concatenating motion pieces. However, locomotion synthesis using motion graphs involves a trade-off between the number of possible transitions between different kinds of locomotion and the quality of those transitions, i.e., how smoothly the connected poses blend. To overcome this drawback, we propose progressive transitions using Body Part Motion Graphs (BPMGs). This method deals with partial movements and generates specific, synchronized transitions for each body part (group of joints) within a window of time. Therefore, the connectivity within the system is not tied to the similarity between global poses, allowing us to find more and better-quality transition points while increasing the speed of response and execution of these transitions compared with the standard motion graph method. Secondly, beyond faster transitions and smoother movements, virtual characters also interact with each other and with users by speaking, which requires generating gestures appropriate to the voice they reproduce. Gestures are the nonverbal language that accompanies spoken language, and the credibility of speaking virtual characters is linked to how naturally their movements match the speech and its intonation. Consequently, we analyzed the relationship between speech and the gestures performed with it. We defined intensity indicators for both gestures (GSI, Gesture Strength Indicator) and speech (PSI, Pitch Strength Indicator), and studied the relationship in time and intensity between these cues in order to establish synchrony and intensity rules. We then applied these rules in the Gesture Motion Graph (GMG) to select gestures appropriate to the speech input (text tagged from the speech signal). The evaluation of the resulting animations shows the importance of relating the intensity of speech and gestures, beyond time synchronization, to generate believable animations. Subsequently, we present BodySpeech, a system for automatic generation of gestures and facial animation from a speech signal. This system also includes animation improvements, such as greater reuse of the input data and more flexible time synchronization, and new features such as editing the style of the output animations. In addition, the facial animation takes speech intonation into account. Finally, we have moved virtual characters from virtual environments to the physical world in order to explore their interaction with real objects. To this end, we present AvatARs, virtual characters that have a tangible representation and are integrated into reality through augmented-reality apps on mobile devices. Users manipulate a physical object to select and configure the animation, and the object also serves as a support for the represented virtual character. We then explored the interaction of AvatARs with intelligent physical objects such as the Pleo social robot. Pleo is used to assist hospitalized children in therapy or simply for play. Despite its benefits, there is a lack of emotional relationship and interaction between the children and Pleo, which eventually makes the children lose interest. We therefore created a mixed-reality scenario in which Vleo (an AvatAR in the form of Pleo, the virtual element) and Pleo (the real element) interact naturally. This scenario has been tested, and the results show that AvatARs enhance children's motivation to play with Pleo, opening a new horizon in the interaction between virtual characters and robots.
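
    A toy sketch of the intensity-matching idea behind the GSI/PSI rules: from a small gesture library annotated with a gesture strength indicator, pick the gesture whose strength is closest to the pitch strength of the current speech segment while roughly respecting its duration. The library, indicator values, and matching rule below are illustrative assumptions rather than the thesis implementation.

        # Toy gesture selection by intensity matching (illustrative values only).
        gesture_library = [
            {"name": "small_nod",  "gsi": 0.2, "duration": 0.6},
            {"name": "open_palm",  "gsi": 0.5, "duration": 0.9},
            {"name": "wide_sweep", "gsi": 0.9, "duration": 1.4},
        ]

        def select_gesture(psi, segment_duration, max_stretch=0.3):
            # Keep gestures that fit the segment once stretched by at most max_stretch,
            # then pick the one whose strength (GSI) best matches the speech strength (PSI).
            candidates = [g for g in gesture_library
                          if abs(g["duration"] - segment_duration) <= max_stretch * segment_duration]
            if not candidates:
                candidates = gesture_library
            return min(candidates, key=lambda g: abs(g["gsi"] - psi))

        # Tagged speech segments: (pitch strength indicator, duration in seconds).
        speech_segments = [(0.15, 0.7), (0.85, 1.3), (0.45, 0.8)]
        timeline = [select_gesture(psi, dur)["name"] for psi, dur in speech_segments]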

    Detecting a visual object in the presence of other objects: the flanker facilitation effect in contour integration

    When an observer views a complex visual scene and tries to identify an object, his or her visual system must decide which regions of the visual field correspond to the object of interest and which do not. One aspect of this process involves grouping local contrast information (e.g., orientation, position, and frequency) into a smooth contour object. This thesis investigated whether the presence of other flanking objects affected the contour integration of a central target contour. To test this, a set of Gaborized contour shapes was embedded in a randomised Gabor noise field. The detectability of the contours was altered by adjusting the alignment of the Gabor patches in the contour (orientation jitter) until a participant was unable to distinguish between a field with and without a target shape (2-AFC procedure). By varying the magnitude of this jitter, detection thresholds were determined for target contours under various experimental conditions. These thresholds were used to investigate whether contour integration is sensitive to shared shape information between objects across the visual field. The thesis found that the presence of flanking contours of a shape similar to the target facilitated the detection of a noisy target contour. The results suggest that this facilitation does not involve simple template matching or shape priming but is associated with the integration of shape-level information in the detection of the most likely smooth closed contour. The magnitude of this flanker facilitation effect was sensitive to a number of factors (e.g., numerosity, the relative position of the flankers, and perimeter complexity/compactness). The implication of these findings is that the processing of highly localised contrast and orientation information originating from a single object is subject to modulation from other sources of shape information across the whole of the visual field.
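
    The threshold procedure described above can be illustrated with a simulated adaptive staircase of the kind commonly used for such 2-AFC measurements: the orientation jitter is increased after two consecutive correct responses and decreased after an error, converging near a fixed performance level. The simulated observer model and step sizes are assumptions for illustration and do not reproduce the thesis's exact procedure.

        import math
        import random

        def p_correct(jitter_deg, threshold_deg=18.0, slope=0.25, guess=0.5):
            # Toy psychometric function: accuracy falls from ~1.0 toward chance (0.5) as jitter grows.
            return guess + (1 - guess) / (1 + math.exp(slope * (jitter_deg - threshold_deg)))

        def staircase_2down_1up(start_jitter=0.0, step=2.0, n_trials=120, seed=1):
            # 2-down/1-up staircase on orientation jitter; converges near the ~70.7%-correct level.
            random.seed(seed)
            jitter, streak, reversals, last_direction = start_jitter, 0, [], None
            for _ in range(n_trials):
                if random.random() < p_correct(jitter):          # simulated correct response
                    streak += 1
                    if streak < 2:
                        continue
                    jitter, streak, direction = jitter + step, 0, "harder"
                else:                                            # simulated error
                    jitter, streak, direction = max(0.0, jitter - step), 0, "easier"
                if last_direction is not None and direction != last_direction:
                    reversals.append(jitter)
                last_direction = direction
            tail = reversals[-6:] or [jitter]
            return sum(tail) / len(tail)                         # threshold estimate (degrees of jitter)

        estimated_threshold_deg = staircase_2down_1up()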