305 research outputs found

    Avatar led interventions in the Metaverse reveal that interpersonal effectiveness can be measured, predicted, and improved

    Experiential learning is known to be an engaging and effective modality for personal and professional development. The Metaverse provides ample opportunities for the creation of environments in which such experiential learning can occur. In this work, we introduce a novel interpersonal effectiveness improvement framework (ELAINE) that combines Artificial Intelligence and Virtual Reality to create a highly immersive and efficient learning experience using avatars. We present findings from a study that uses this framework to measure and improve the interpersonal effectiveness of individuals interacting with an avatar. Results reveal that individuals with deficits in their interpersonal effectiveness show a significant improvement (p < 0.02) after multiple interactions with an avatar. The results also reveal that individuals interact naturally with avatars within this framework, and exhibit behavioral traits similar to those they would show in the real world. We use this as a basis to analyze the underlying audio and video data streams of individuals during these interactions. We extract relevant features from these data and present a machine-learning-based approach to predict interpersonal effectiveness during human-avatar conversation. We conclude by discussing the implications of these findings for building beneficial applications for the real world.

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest that Gestalt grouping is not used as a strategy in these tasks, and it lends further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
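
    As a reader's check on the reported statistic, the p-value for F(1,4)=2.565 can be recomputed by hand. The sketch below is not taken from the paper: it relies on the standard identity that an F statistic with (1, n) degrees of freedom is the square of a t statistic with n degrees of freedom, and on the closed-form CDF of Student's t with 4 degrees of freedom. The function names are hypothetical.

```python
import math


def t_cdf_4df(t):
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    s = t / math.sqrt(1.0 + t * t / 4.0)
    return 0.5 + (3.0 / 8.0) * s * (1.0 - s * s / 12.0)


def p_value_f_1_4(f_stat):
    """Upper-tail p-value of an F(1, 4) statistic, via F(1, n) = t(n)^2."""
    if f_stat < 0:
        raise ValueError("an F statistic cannot be negative")
    t = math.sqrt(f_stat)
    # P(F > f) equals the two-sided tail probability of the matching t value.
    return 2.0 * (1.0 - t_cdf_4df(t))


p = p_value_f_1_4(2.565)
print(f"p = {p:.3f}")
```

    Under these assumptions the result agrees with the reported p=0.185 to three decimal places, well above the conventional 0.05 threshold.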

    Artificial Intelligence in the Creative Industries: A Review

    This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided, including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the 'creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human-centric, that is, where it is designed to augment, rather than replace, human creativity.

    Computational Multimedia for Video Self Modeling

    Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of themselves. This is the idea behind the psychological theory of self-efficacy: you can learn or model how to perform certain tasks because you see yourself doing them, which provides the most ideal form of behavior modeling. The effectiveness of VSM has been demonstrated for many different types of disabilities and behavioral problems, ranging from stuttering, inappropriate social behaviors, autism and selective mutism to sports training. However, there is an inherent difficulty associated with the production of VSM material. Prolonged and persistent video recording is required to capture the rare, if not nonexistent, snippets that can be strung together to form novel video sequences of the target skill. To solve this problem, in this dissertation, we use computational multimedia techniques to facilitate the creation of synthetic visual content for self-modeling that can be used by a learner and his/her therapist with a minimum amount of training data. There are three major technical contributions in my research. First, I developed an Adaptive Video Re-sampling algorithm to synthesize realistic lip-synchronized video with minimal motion jitter. Second, to denoise and complete the depth map captured by structured-light sensing systems, I introduced a layer-based probabilistic model to account for various types of uncertainties in the depth measurement. Third, I developed a simple and robust bundle-adjustment-based framework for calibrating a network of multiple wide-baseline RGB and depth cameras.

    Development of the components of a low cost, distributed facial virtual conferencing system

    This thesis investigates the development of a low-cost, component-based facial virtual conferencing system. The design is decomposed into an encoding phase and a decoding phase, which communicate with each other via a network connection. The encoding phase is composed of three components: model acquisition (which handles avatar generation), pose estimation and expression analysis. Audio is not considered part of the encoding and decoding process, and as such is not evaluated. The model acquisition component is implemented using a visual hull reconstruction algorithm that is able to reconstruct real-world objects using only sets of images of the object as input. The object to be reconstructed is assumed to lie in a bounding volume of voxels. The reconstruction process involves the following stages: (1) space carving for basic shape extraction; (2) isosurface extraction to remove voxels that are not part of the surface of the reconstruction; (3) mesh connection to generate a closed, connected polyhedral mesh; (4) texture generation, achieved by Gouraud shading the reconstruction with a vertex colour map; and (5) mesh decimation to simplify the object. The original algorithm has complexity O(n), but suffers from an inability to reconstruct concave surfaces that do not form part of the visual hull of the object. A novel extension to this algorithm based on Normalised Cross Correlation (NCC) is proposed to overcome this problem. An extension to speed up traditional NCC evaluations is proposed, which reduces the NCC search space from a 2D search problem down to a single evaluation. Pose estimation and expression analysis are performed by tracking six fiducial points on the face of a subject. A tracking algorithm is developed that uses Normalised Cross Correlation to facilitate robust tracking that is invariant to changing lighting conditions, rotations and scaling. Pose estimation involves the recovery of the head position and orientation through the tracking of the triangle formed by the subject's eyebrows and nose tip. A rule-based evaluation of points that are tracked around the subject's mouth forms the basis of the expression analysis. A user-assisted feedback loop and caching mechanism is used to overcome tracking errors due to fast motion or occlusions. The NCC tracker is shown to achieve a tracking performance of 10 fps when tracking the six fiducial points. The decoding phase is divided into three tasks, namely avatar movement, expression generation and expression management. Avatar movement is implemented using the base VR system. Expression generation is facilitated using a Vertex Interpolation Deformation method. A weighting system is proposed for expression management. Its function is to gradually transform from one expression to the next. The use of the vertex interpolation method allows real-time deformations of the avatar representation, achieving 16 fps when applied to a model consisting of 7500 vertices. An Expression Parameter Lookup Table (EPLT) facilitates an independent mapping between the two phases. It defines a list of generic expressions that are known to the system and associates an Expression ID with each one. For each generic expression, it relates the expression analysis rules for any subject with the expression generation parameters for any avatar model. The result is that facial expression replication between any subject and avatar combination can be performed by transferring only the Expression ID from the encoder application to the decoder application. The ideas developed in the thesis are demonstrated in an implementation using the CoRgi Virtual Reality system. It is shown that the virtual-conferencing application based on this design requires only a bandwidth of 2 Kbps.
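
    To illustrate why Normalised Cross Correlation suits tracking under changing lighting, the following is a minimal sketch of the common zero-mean NCC score between two equal-sized patches, flattened to lists of intensities. It is not the thesis's tracker: the zero-mean variant, the function name and the sample values are assumptions, and the sketch shows only the lighting aspect, not rotation or scale handling.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalised cross-correlation of two flattened patches.

    Returns a score in [-1, 1]; 1 means identical up to gain and offset.
    """
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and the same size")
    mean_a = sum(patch_a) / len(patch_a)
    mean_b = sum(patch_b) / len(patch_b)
    dev_a = [x - mean_a for x in patch_a]
    dev_b = [y - mean_b for y in patch_b]
    denom = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denom == 0.0:
        return 0.0  # a flat patch has no texture to correlate; treat as no match
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


# Hypothetical 3x3 template, flattened row by row.
template = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# The same patch under a brightness gain and offset (a simple lighting change).
relit = [2.0 * v + 15.0 for v in template]
# A contrast-inverted patch, which should score as an anti-match.
inverted = [255 - v for v in template]

score_relit = ncc(template, relit)
score_inverted = ncc(template, inverted)
```

    Subtracting the mean cancels a brightness offset and dividing by the norms cancels a gain, so the relit patch still scores as a near-perfect match while the inverted one scores near -1.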

    Die Virtuelle Videokamera: ein System zur Blickpunktsynthese in beliebigen, dynamischen Szenen [The Virtual Video Camera: a system for viewpoint synthesis in arbitrary, dynamic scenes]

    The Virtual Video Camera project strives to create free viewpoint video from casually captured multi-view data. Multiple video streams of a dynamic scene are captured with off-the-shelf camcorders, and the user can re-render the scene from novel perspectives that the original cameras did not cover. In this thesis the algorithmic core of the Virtual Video Camera is presented. This includes the algorithm for image correspondence estimation as well as the image-based renderer. Furthermore, its application in the context of an actual video production is showcased, and the rendering and image processing pipeline is extended to incorporate depth information.

    3D object reconstruction using computer vision : reconstruction and characterization applications for external human anatomical structures

    Doctoral thesis. Informatics Engineering. Faculdade de Engenharia. Universidade do Porto. 201

    Video based reconstruction system for mixed reality environments supporting contextualised non-verbal communication and its study

    This thesis presents a system to capture, reconstruct and render the three-dimensional form of people and objects of interest in such detail that the spatial and visual aspects of non-verbal behaviour can be communicated. The system supports live distribution and simultaneous rendering in multiple locations, enabling the apparent teleportation of people and objects. Additionally, the system allows for the recording of live sessions and their playback in natural time with free viewpoint. It utilises components of a video-based reconstruction and a distributed video implementation to create an end-to-end system that can operate in real time and on commodity hardware. The research addresses the specific challenges of spatial and colour calibration, segmentation and overall system architecture in order to overcome technical barriers and the need for domain-specific knowledge to set up the system and generate avatars of consistently high quality. Applications of the system include, but are not limited to, telepresence, where the computer-generated avatars used in Immersive Collaborative Virtual Environments can be replaced with ones that are faithful to the people they represent, and support for researchers studying aspects of human communication such as gaze, inter-personal distance and facial expression. The system has been adopted in other research projects and is integrated with a mixed reality application where, during a live linkup, a three-dimensional avatar is streamed to multiple end-points across different countries.