375 research outputs found

    The Effects of Hand Tracking on User Performance: an experimental study of an object selection based memory game

    Get PDF
    Until recently, Virtual Reality (VR) applications relied on controllers to enable user interaction in virtual environments. With advances in tracking technology, HMDs are now able to track the movements of users’ hands in real-time with significantly greater accuracy, allowing us to interact with the digital world directly with our hands. However, it is not entirely clear how hand tracking affects users’ performance. In this study, we investigate user performance using an in-game analytics-based assessment methodology for a VR memory puzzle task. We conducted a within-subjects experiment with 30 participants in three conditions: 1- Hand-tracking, 2- Controller Without Haptics, and 3- Controller With Haptics. In all our measurements (correct order and pattern, correct pattern only, and trial completion) except for the initial selection time, participants performed best with hand tracking. The use of controllers with haptics did not outperform controllers without haptics in most measures, possibly because other feedback cues compensated for the lack of haptics. This study helps us better understand the three selected interactivity methods when used in VR, as well as the importance of naturalistic experience in interaction design

    Human Motion Generation: A Survey

    Full text link
    Human motion generation aims to generate natural human pose sequences and shows immense potential for real-world applications. Substantial progress has been made recently in motion data collection technologies and generation methods, laying the foundation for increasing interest in human motion generation. Most research within this field focuses on generating human motions based on conditional signals, such as text, audio, and scene contexts. While significant advancements have been made in recent years, the task continues to pose challenges due to the intricate nature of human motion and its implicit relationship with conditional signals. In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field. We begin by introducing the background of human motion and generative models, followed by an examination of representative methods for three mainstream sub-tasks: text-conditioned, audio-conditioned, and scene-conditioned human motion generation. Additionally, we provide an overview of common datasets and evaluation metrics. Lastly, we discuss open problems and outline potential future research directions. We hope that this survey could provide the community with a comprehensive glimpse of this rapidly evolving field and inspire novel ideas that address the outstanding challenges.Comment: 20 pages, 5 figure

    Application of Machine Learning within Visual Content Production

    Get PDF
    We are living in an era where digital content is being produced at a dazzling pace. The heterogeneity of contents and contexts is so varied that a numerous amount of applications have been created to respond to people and market demands. The visual content production pipeline is the generalisation of the process that allows a content editor to create and evaluate their product, such as a video, an image, a 3D model, etc. Such data is then displayed on one or more devices such as TVs, PC monitors, virtual reality head-mounted displays, tablets, mobiles, or even smartwatches. Content creation can be simple as clicking a button to film a video and then share it into a social network, or complex as managing a dense user interface full of parameters by using keyboard and mouse to generate a realistic 3D model for a VR game. In this second example, such sophistication results in a steep learning curve for beginner-level users. In contrast, expert users regularly need to refine their skills via expensive lessons, time-consuming tutorials, or experience. Thus, user interaction plays an essential role in the diffusion of content creation software, primarily when it is targeted to untrained people. In particular, with the fast spread of virtual reality devices into the consumer market, new opportunities for designing reliable and intuitive interfaces have been created. Such new interactions need to take a step beyond the point and click interaction typical of the 2D desktop environment. The interactions need to be smart, intuitive and reliable, to interpret 3D gestures and therefore, more accurate algorithms are needed to recognise patterns. In recent years, machine learning and in particular deep learning have achieved outstanding results in many branches of computer science, such as computer graphics and human-computer interface, outperforming algorithms that were considered state of the art, however, there are only fleeting efforts to translate this into virtual reality. In this thesis, we seek to apply and take advantage of deep learning models to two different content production pipeline areas embracing the following subjects of interest: advanced methods for user interaction and visual quality assessment. First, we focus on 3D sketching to retrieve models from an extensive database of complex geometries and textures, while the user is immersed in a virtual environment. We explore both 2D and 3D strokes as tools for model retrieval in VR. Therefore, we implement a novel system for improving accuracy in searching for a 3D model. We contribute an efficient method to describe models through 3D sketch via an iterative descriptor generation, focusing both on accuracy and user experience. To evaluate it, we design a user study to compare different interactions for sketch generation. Second, we explore the combination of sketch input and vocal description to correct and fine-tune the search for 3D models in a database containing fine-grained variation. We analyse sketch and speech queries, identifying a way to incorporate both of them into our system's interaction loop. Third, in the context of the visual content production pipeline, we present a detailed study of visual metrics. We propose a novel method for detecting rendering-based artefacts in images. It exploits analogous deep learning algorithms used when extracting features from sketches

    Exploring Robot Teleoperation in Virtual Reality

    Get PDF
    This thesis presents research on VR-based robot teleoperation with a focus on remote environment visualisation in virtual reality, the effects of remote environment reconstruction scale in virtual reality on the human-operator's ability to control the robot and human-operator's visual attention patterns when teleoperating a robot from virtual reality. A VR-based robot teleoperation framework was developed, it is compatible with various robotic systems and cameras, allowing for teleoperation and supervised control with any ROS-compatible robot and visualisation of the environment through any ROS-compatible RGB and RGBD cameras. The framework includes mapping, segmentation, tactile exploration, and non-physically demanding VR interface navigation and controls through any Unity-compatible VR headset and controllers or haptic devices. Point clouds are a common way to visualise remote environments in 3D, but they often have distortions and occlusions, making it difficult to accurately represent objects' textures. This can lead to poor decision-making during teleoperation if objects are inaccurately represented in the VR reconstruction. A study using an end-effector-mounted RGBD camera with OctoMap mapping of the remote environment was conducted to explore the remote environment with fewer point cloud distortions and occlusions while using a relatively small bandwidth. Additionally, a tactile exploration study proposed a novel method for visually presenting information about objects' materials in the VR interface, to improve the operator's decision-making and address the challenges of point cloud visualisation. Two studies have been conducted to understand the effect of virtual world dynamic scaling on teleoperation flow. The first study investigated the use of rate mode control with constant and variable mapping of the operator's joystick position to the speed (rate) of the robot's end-effector, depending on the virtual world scale. The results showed that variable mapping allowed participants to teleoperate the robot more effectively but at the cost of increased perceived workload. The second study compared how operators used a virtual world scale in supervised control, comparing the virtual world scale of participants at the beginning and end of a 3-day experiment. The results showed that as operators got better at the task they as a group used a different virtual world scale, and participants' prior video gaming experience also affected the virtual world scale chosen by operators. Similarly, the human-operator's visual attention study has investigated how their visual attention changes as they become better at teleoperating a robot using the framework. The results revealed the most important objects in the VR reconstructed remote environment as indicated by operators' visual attention patterns as well as their visual priorities shifts as they got better at teleoperating the robot. The study also demonstrated that operators’ prior video gaming experience affects their ability to teleoperate the robot and their visual attention behaviours

    Capturing Hand-Object Interaction and Reconstruction of Manipulated Objects

    Get PDF
    Hand motion capture with an RGB-D sensor gained recently a lot of research attention, however, even most recent approaches focus on the case of a single isolated hand. We focus instead on hands that interact with other hands or with a rigid or articulated object. Our framework successfully captures motion in such scenarios by combining a generative model with discriminatively trained salient points, collision detection and physics simulation to achieve a low tracking error with physically plausible poses. All components are unified in a single objective function that can be optimized with standard optimization techniques. We initially assume a-priori knowledge of the object’s shape and skeleton. In case of unknown object shape there are existing 3d reconstruction methods that capitalize on distinctive geometric or texture features. These methods though fail for textureless and highly symmetric objects like household articles, mechanical parts or toys. We show that extracting 3d hand motion for in-hand scanning e↵ectively facilitates the reconstruction of such objects and we fuse the rich additional information of hands into a 3d reconstruction pipeline. Finally, although shape reconstruction is enough for rigid objects, there is a lack of tools that build rigged models of articulated objects that deform realistically using RGB-D data. We propose a method that creates a fully rigged model consisting of a watertight mesh, embedded skeleton and skinning weights by employing a combination of deformable mesh tracking, motion segmentation based on spectral clustering and skeletonization based on mean curvature flow

    Agents for educational games and simulations

    Get PDF
    This book consists mainly of revised papers that were presented at the Agents for Educational Games and Simulation (AEGS) workshop held on May 2, 2011, as part of the Autonomous Agents and MultiAgent Systems (AAMAS) conference in Taipei, Taiwan. The 12 full papers presented were carefully reviewed and selected from various submissions. The papers are organized topical sections on middleware applications, dialogues and learning, adaption and convergence, and agent applications

    Emerging Approaches for THz Array Imaging: A Tutorial Review and Software Tool

    Full text link
    Accelerated by the increasing attention drawn by 5G, 6G, and Internet of Things applications, communication and sensing technologies have rapidly evolved from millimeter-wave (mmWave) to terahertz (THz) in recent years. Enabled by significant advancements in electromagnetic (EM) hardware, mmWave and THz frequency regimes spanning 30 GHz to 300 GHz and 300 GHz to 3000 GHz, respectively, can be employed for a host of applications. The main feature of THz systems is high-bandwidth transmission, enabling ultra-high-resolution imaging and high-throughput communications; however, challenges in both the hardware and algorithmic arenas remain for the ubiquitous adoption of THz technology. Spectra comprising mmWave and THz frequencies are well-suited for synthetic aperture radar (SAR) imaging at sub-millimeter resolutions for a wide spectrum of tasks like material characterization and nondestructive testing (NDT). This article provides a tutorial review of systems and algorithms for THz SAR in the near-field with an emphasis on emerging algorithms that combine signal processing and machine learning techniques. As part of this study, an overview of classical and data-driven THz SAR algorithms is provided, focusing on object detection for security applications and SAR image super-resolution. We also discuss relevant issues, challenges, and future research directions for emerging algorithms and THz SAR, including standardization of system and algorithm benchmarking, adoption of state-of-the-art deep learning techniques, signal processing-optimized machine learning, and hybrid data-driven signal processing algorithms...Comment: Submitted to Proceedings of IEE

    A neural integrator model for planning and value-based decision making of a robotics assistant

    Get PDF
    Modern manufacturing and assembly environments are characterized by a high variability in the built process which challenges human–robot cooperation. To reduce the cognitive workload of the operator, the robot should not only be able to learn from experience but also to plan and decide autonomously. Here, we present an approach based on Dynamic Neural Fields that apply brain-like computations to endow a robot with these cognitive functions. A neural integrator is used to model the gradual accumulation of sensory and other evidence as time-varying persistent activity of neural populations. The decision to act is modeled by a competitive dynamics between neural populations linked to different motor behaviors. They receive the persistent activation pattern of the integrators as input. In the first experiment, a robot learns rapidly by observation the sequential order of object transfers between an assistant and an operator to subsequently substitute the assistant in the joint task. The results show that the robot is able to proactively plan the series of handovers in the correct order. In the second experiment, a mobile robot searches at two different workbenches for a specific object to deliver it to an operator. The object may appear at the two locations in a certain time period with independent probabilities unknown to the robot. The trial-by-trial decision under uncertainty is biased by the accumulated evidence of past successes and choices. The choice behavior over a longer period reveals that the robot achieves a high search efficiency in stationary as well as dynamic environments.The work received financial support from FCT through the PhD fellowships PD/BD/128183/2016 and SFRH/BD/124912/2016, the project “Neurofield” (PTDC/MAT-APL/31393/2017) and the research centre CMAT within the project UID/MAT/00013/2013

    Personalized face and gesture analysis using hierarchical neural networks

    Full text link
    The video-based computational analyses of human face and gesture signals encompass a myriad of challenging research problems involving computer vision, machine learning and human computer interaction. In this thesis, we focus on the following challenges: a) the classification of hand and body gestures along with the temporal localization of their occurrence in a continuous stream, b) the recognition of facial expressivity levels in people with Parkinson's Disease using multimodal feature representations, c) the prediction of student learning outcomes in intelligent tutoring systems using affect signals, and d) the personalization of machine learning models, which can adapt to subject and group-specific nuances in facial and gestural behavior. Specifically, we first conduct a quantitative comparison of two approaches to the problem of segmenting and classifying gestures on two benchmark gesture datasets: a method that simultaneously segments and classifies gestures versus a cascaded method that performs the tasks sequentially. Second, we introduce a framework that computationally predicts an accurate score for facial expressivity and validate it on a dataset of interview videos of people with Parkinson's disease. Third, based on a unique dataset of videos of students interacting with MathSpring, an intelligent tutoring system, collected by our collaborative research team, we build models to predict learning outcomes from their facial affect signals. Finally, we propose a novel solution to a relatively unexplored area in automatic face and gesture analysis research: personalization of models to individuals and groups. We develop hierarchical Bayesian neural networks to overcome the challenges posed by group or subject-specific variations in face and gesture signals. We successfully validate our formulation on the problems of personalized subject-specific gesture classification, context-specific facial expressivity recognition and student-specific learning outcome prediction. We demonstrate the flexibility of our hierarchical framework by validating the utility of both fully connected and recurrent neural architectures
    • …
    corecore