    Volumetric Kombat: a case study on developing a VR game with Volumetric Video

    This paper presents a case study on the development of a Virtual Reality (VR) game using Volumetric Video (VV) for character animation. We delve into the potential of VV, a technology that fuses video and depth sensor data and has progressively matured since its initial introduction in 1995. Despite its potential to deliver unmatched realism and dynamic 4D sequences, VV has predominantly been used in non-interactive scenarios. We explore the barriers to entry, such as the high costs associated with large-scale VV capture systems and the lack of tools optimized for VV in modern game engines. By actively using VV to develop a VR game, we examine and overcome these constraints, developing a set of tools that address them. Drawing lessons from past games, we propose an open-source data processing workflow for future VV games. This case study provides insights into the opportunities and challenges of VV in game development and contributes towards making VV more accessible for creators and researchers.
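
    A minimal, hypothetical playback sketch for the kind of data such a workflow produces, assuming the common convention that a volumetric clip is exported as a numbered sequence of per-frame meshes; the directory name and file pattern are illustrative and this is not the paper's released tooling.

```python
import time
from pathlib import Path

# Assumed convention: a volumetric video clip stored as per-frame meshes,
# e.g. frame_0001.obj, frame_0002.obj, ... Playback maps elapsed time to a frame.

class VolumetricClip:
    def __init__(self, frame_dir: str, fps: float = 30.0):
        self.frames = sorted(Path(frame_dir).glob("frame_*.obj"))
        self.fps = fps

    def frame_at(self, elapsed_seconds: float):
        """Return the mesh file that should be displayed at the given time."""
        if not self.frames:
            return None
        index = int(elapsed_seconds * self.fps) % len(self.frames)  # loop the clip
        return self.frames[index]

if __name__ == "__main__":
    clip = VolumetricClip("captured_sequence", fps=30.0)  # illustrative directory
    start = time.monotonic()
    print("Frame to render now:", clip.frame_at(time.monotonic() - start))
```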

    Bridging formal methods and machine learning with model checking and global optimisation

    Formal methods and machine learning are two research fields with drastically different foundations and philosophies. Formal methods utilise mathematically rigorous techniques for the specification, development and verification of software and hardware systems. Machine learning focuses on pragmatic approaches that gradually improve a parameterised model by observing a training data set. While the two fields have historically lacked communication, this trend has changed in the past few years with a surge of research interest in the robustness verification of neural networks. This paper briefly reviews these works and focuses on the urgent need for broader and deeper communication between the two fields, with the ultimate goal of developing learning-enabled systems with excellent performance and acceptable safety and security. We present a specification language, MLS2, and show that it can express a set of known safety and security properties, including generalisation, uncertainty, robustness, data poisoning, backdoors, model stealing, membership inference, model inversion, interpretability, and fairness. To verify MLS2 properties, we promote global optimisation-based methods, which have provable guarantees on convergence to the optimal solution; many of them also have theoretical bounds on the gap between the current solution and the optimal solution.
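
    As an illustration of the kind of guarantee the last sentence refers to, the sketch below implements a simple Lipschitz-based branch-and-bound minimiser in one dimension; it is a generic toy, not the MLS2 toolchain, and the example objective is a made-up surrogate.

```python
# Over an interval [a, b], a function f with Lipschitz constant L satisfies
# f(x) >= f(midpoint) - L * (b - a) / 2, so the smallest such bound across all
# intervals certifies how far the best value found can be from the optimum.
import heapq

def lipschitz_minimise(f, a, b, lipschitz, tolerance=1e-3, max_iterations=10_000):
    """Branch-and-bound minimisation of a 1-D Lipschitz function on [a, b]."""
    mid = (a + b) / 2.0
    best_x, best_value = mid, f(mid)
    # Priority queue of (lower_bound, left, right), lowest bound popped first.
    queue = [(best_value - lipschitz * (b - a) / 2.0, a, b)]
    lower_bound = queue[0][0]
    for _ in range(max_iterations):
        lower_bound, left, right = heapq.heappop(queue)
        if best_value - lower_bound <= tolerance:  # provable optimality gap
            break
        mid = (left + right) / 2.0
        for lo, hi in ((left, mid), (mid, right)):
            centre = (lo + hi) / 2.0
            value = f(centre)
            if value < best_value:
                best_x, best_value = centre, value
            heapq.heappush(queue, (value - lipschitz * (hi - lo) / 2.0, lo, hi))
    return best_x, best_value, best_value - lower_bound

# Toy query: minimise a surrogate margin over a 1-D input interval.
x_star, f_star, gap = lipschitz_minimise(lambda x: (x - 0.3) ** 2 + 0.1,
                                         a=0.0, b=1.0, lipschitz=2.0)
print(f"minimiser ~ {x_star:.3f}, minimum ~ {f_star:.3f}, certified gap <= {gap:.2e}")
```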

    A deep learning-enhanced digital twin framework for improving safety and reliability in human-robot collaborative manufacturing

    In Industry 5.0, Digital Twins bring flexibility and efficiency to smart manufacturing. Recently, the success of artificial intelligence techniques such as deep learning has led to their adoption in manufacturing, and especially in human–robot collaboration. Collaborative manufacturing tasks involving human operators and robots pose significant safety and reliability concerns. In response to these concerns, a deep learning-enhanced Digital Twin framework is introduced through which human operators and robots can be detected and their actions classified during the manufacturing process, enabling autonomous decision making by the robot control system. Developed using Unreal Engine 4, our Digital Twin framework complies with the Robot Operating System (ROS) specification and supports synchronous control and communication between the Digital Twin and the physical system. In our framework, a fully-supervised detector based on a faster region-based convolutional neural network (Faster R-CNN) is first trained on synthetic data generated by the Digital Twin and then tested on the physical system to demonstrate the effectiveness of the proposed framework. To ensure safety and reliability, a semi-supervised detector is further designed to bridge the gap between the twin system and the physical system; it achieves better performance than a fully-supervised detector trained only on synthetic data or only on real data. The evaluation of the framework in multiple scenarios in which human operators collaborate with a Universal Robot 10 shows that it can accurately detect the human and robot and classify their actions under a variety of conditions. The data from this evaluation have been made publicly available and can be widely used for research and operational purposes. Additionally, a semi-automated annotation tool from the Digital Twin framework is published to benefit the collaborative robotics community.
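
    A minimal sketch of the synthetic-to-real training loop described above, assuming torchvision's detection API; the class count, confidence threshold, and overall structure are illustrative and this is not the authors' released code.

```python
import torch
import torchvision

NUM_CLASSES = 3  # assumed: background, human operator, robot

def build_detector():
    # Faster R-CNN with a ResNet-50 FPN backbone (torchvision >= 0.13 API).
    return torchvision.models.detection.fasterrcnn_resnet50_fpn(
        weights=None, num_classes=NUM_CLASSES
    )

@torch.no_grad()
def pseudo_label(model, real_images, score_threshold=0.8):
    """Turn confident detections on unlabelled real frames into training targets.
    real_images is a list of 3xHxW float tensors."""
    model.eval()
    targets = []
    for prediction in model(real_images):
        keep = prediction["scores"] >= score_threshold
        targets.append({"boxes": prediction["boxes"][keep],
                        "labels": prediction["labels"][keep]})
    return targets

def training_step(model, images, targets, optimizer):
    """One supervised step; used for synthetic labels and pseudo-labels alike."""
    model.train()
    loss = sum(model(images, targets).values())  # dict of detection losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```

    In a semi-supervised round of this kind, pseudo-labelled real frames would be mixed with the labelled synthetic frames in the same training_step.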

    Development and user evaluation of an immersive light field system for space exploration

    This paper presents the developmental work and user evaluation results of an immersive light field system built for the European Space Agency’s (ESA) project called “Light field-enhanced immersive teleoperation system for space station and ground control.” The main aim of the project is to evaluate the usefulness and feasibility of light fields in space exploration and to compare them to other types of immersive content, such as 360° photos and point clouds. In the course of the project, light field data were captured with a robotically controlled camera and processed into a suitable format. The light field authoring process was performed, and a light field renderer capable of displaying immersive panoramic or planar light fields on modern virtual reality hardware was developed. The planetary surface points of interest (POIs) were modeled in the laboratory environment, and three distinct test use cases utilizing them were developed. The user evaluation was held at the European Astronaut Centre (EAC) in the summer of 2023, involving prospective end-users of various backgrounds. During the evaluation, questionnaires, interviews, and observation were used for data collection. At the end of the paper, the evaluation results are presented, together with a discussion of lessons learned and possible improvements to the light field system.
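
    A toy sketch of the core operation behind a planar light-field renderer of this kind, under the simplifying assumption that views sit on a regular two-dimensional camera grid and that a plain bilinear blend of the four nearest views is acceptable (a production renderer resamples per ray).

```python
import numpy as np

def blend_nearest_views(views: np.ndarray, s: float, t: float) -> np.ndarray:
    """views: (S, T, H, W, 3) image grid; (s, t): fractional camera-grid coords."""
    S, T = views.shape[:2]
    s0, t0 = int(np.floor(s)), int(np.floor(t))
    s1, t1 = min(s0 + 1, S - 1), min(t0 + 1, T - 1)
    ws, wt = s - s0, t - t0  # bilinear weights
    return ((1 - ws) * (1 - wt) * views[s0, t0]
            + ws * (1 - wt) * views[s1, t0]
            + (1 - ws) * wt * views[s0, t1]
            + ws * wt * views[s1, t1])

# Example: a 4x4 grid of small random images, viewed from between cameras.
grid = np.random.rand(4, 4, 8, 8, 3)
novel_view = blend_nearest_views(grid, s=1.3, t=2.7)
print(novel_view.shape)  # (8, 8, 3)
```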

    Fast Learning Radiance Fields by Shooting Much Fewer Rays

    Learning radiance fields has shown remarkable results for novel view synthesis. The learning procedure is usually time-consuming, which motivates the latest methods to speed it up by learning without neural networks or by using more efficient data structures. However, these specially designed approaches do not work for most radiance-field-based methods. To resolve this issue, we introduce a general strategy to speed up the learning procedure for almost all radiance-field-based methods. Our key idea is to reduce redundancy by shooting much fewer rays in the multi-view volume rendering procedure that underlies almost all radiance-field-based methods. We find that shooting rays at pixels with dramatic color change not only significantly reduces the training burden but also barely affects the accuracy of the learned radiance fields. In addition, we adaptively subdivide each view into a quadtree according to the average rendering error in each tree node, which lets us dynamically shoot more rays in more complex regions with larger rendering errors. We evaluate our method with different radiance-field-based methods on widely used benchmarks. Experimental results show that our method achieves comparable accuracy to the state of the art with much faster training. Comment: Accepted by IEEE Transactions on Image Processing 2023. Project Page: https://zparquet.github.io/Fast-Learning . Code: https://github.com/zParquet/Fast-Learnin
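
    A small sketch of the ray-selection idea, with assumed details (gradient-based saliency and a fixed uniform fraction) rather than the authors' exact scheme; the quadtree refinement would additionally concentrate rays in view regions with high rendering error.

```python
import numpy as np

def select_ray_pixels(image: np.ndarray, num_rays: int, uniform_fraction: float = 0.2):
    """image: (H, W, 3) float array. Returns (num_rays, 2) pixel coordinates,
    sampled preferentially where the colour changes sharply."""
    grad_y = np.abs(np.diff(image, axis=0, prepend=image[:1])).sum(axis=-1)
    grad_x = np.abs(np.diff(image, axis=1, prepend=image[:, :1])).sum(axis=-1)
    saliency = (grad_x + grad_y).ravel() + 1e-8
    probabilities = saliency / saliency.sum()

    num_pixels = image.shape[0] * image.shape[1]
    num_salient = int(num_rays * (1.0 - uniform_fraction))
    salient = np.random.choice(num_pixels, size=num_salient, replace=False,
                               p=probabilities)
    uniform = np.random.choice(num_pixels, size=num_rays - num_salient, replace=False)
    chosen = np.concatenate([salient, uniform])  # occasional duplicates are harmless
    return np.stack(np.unravel_index(chosen, image.shape[:2]), axis=1)

rays = select_ray_pixels(np.random.rand(64, 64, 3), num_rays=512)
print(rays.shape)  # (512, 2)
```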

    DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models

    The recent progress in diffusion-based text-to-image generation models has significantly expanded generative capabilities via conditioning on text descriptions. However, since relying solely on text prompts is still restrictive for fine-grained customization, we aim to extend the boundaries of conditional generation to incorporate diverse types of modalities, e.g., sketch, box, and style embedding, simultaneously. We thus design a multimodal text-to-image diffusion model, coined DiffBlender, that achieves this goal in a single model by training only a few small hypernetworks. DiffBlender facilitates convenient scaling of input modalities without altering the parameters of the existing large-scale generative model, retaining its well-established knowledge. Furthermore, our study sets new standards for multimodal generation by conducting quantitative and qualitative comparisons with existing approaches. By diversifying the channels of conditioning modalities, DiffBlender faithfully reflects the provided information or, in its absence, produces imaginative generations. Comment: 18 pages, 16 figures, and 3 tables
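
    A hypothetical sketch of the general pattern the abstract describes: small trainable modules condition a frozen backbone. Module names, dimensions, and the residual wiring are illustrative, not DiffBlender's actual architecture.

```python
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Small trainable network mapping one conditioning modality (e.g. a box or
    sketch embedding) into a residual added to a frozen feature map."""
    def __init__(self, condition_dim: int, feature_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(condition_dim, feature_dim),
                                 nn.SiLU(),
                                 nn.Linear(feature_dim, feature_dim))

    def forward(self, features: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        return features + self.net(condition)

# Frozen "backbone" stand-in; only the adapters would receive gradients.
backbone = nn.Linear(256, 256)
for parameter in backbone.parameters():
    parameter.requires_grad_(False)

adapters = nn.ModuleDict({
    "sketch": ModalityAdapter(condition_dim=64, feature_dim=256),
    "box": ModalityAdapter(condition_dim=16, feature_dim=256),
})

features = backbone(torch.randn(2, 256))
features = adapters["sketch"](features, torch.randn(2, 64))
features = adapters["box"](features, torch.randn(2, 16))
print(features.shape)  # torch.Size([2, 256])
```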

    Scaling up GANs for Text-to-Image Synthesis

    The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture for designing generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL-E 2, auto-regressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent-space editing applications such as latent interpolation, style mixing, and vector arithmetic operations. Comment: CVPR 2023. Project webpage at https://mingukkang.github.io/GigaGAN
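
    The latent-space edits listed at the end are generic GAN operations; a toy sketch with a stand-in generator (not GigaGAN, and with a made-up attribute direction) looks like this.

```python
import torch

def lerp(z_a: torch.Tensor, z_b: torch.Tensor, t: float) -> torch.Tensor:
    """Linear interpolation between two latent codes."""
    return (1.0 - t) * z_a + t * z_b

# Stand-in generator mapping a 128-D latent to a flat "image".
generator = torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.Tanh())

z_a, z_b = torch.randn(1, 128), torch.randn(1, 128)
midpoint_image = generator(lerp(z_a, z_b, 0.5))             # latent interpolation

attribute_direction = torch.randn(1, 128)                   # would be estimated from data
edited_image = generator(z_a + 1.5 * attribute_direction)   # vector arithmetic
```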

    Scalable Exploration of Complex Objects and Environments Beyond Plain Visual Replication

    Digital multimedia content and presentation means are rapidly increasing in sophistication and are now capable of describing detailed representations of the physical world. 3D exploration experiences allow people to appreciate, understand and interact with intrinsically virtual objects. Communicating information on objects requires the ability to explore them from different angles, as well as to mix highly photorealistic or illustrative presentations of the objects themselves with complementary data that provides further insight, typically represented in the form of annotations. Effectively providing these capabilities requires the solution of important problems in visualization and user interaction. In this thesis, I studied these problems in the cultural heritage computing domain, focusing on the very common and important special case of mostly planar, but visually, geometrically, and semantically rich objects. These are generally roughly flat objects with a standard frontal viewing direction (e.g., paintings, inscriptions, bas-reliefs), as well as visualizations of fully 3D objects from a particular point of view (e.g., canonical views of buildings or statues). Selecting a precise application domain and a specific presentation mode allowed me to concentrate on the well-defined use case of the exploration of annotated relightable stratigraphic models (in particular, for local and remote museum presentation). My main results and contributions to the state of the art have been a novel technique for interactively controlling visualization lenses while automatically maintaining good focus-and-context parameters, a novel approach for avoiding clutter in an annotated model and for guiding users towards interesting areas, and a method for structuring audio-visual object annotations into a graph and for using that graph to improve guidance and support storytelling and automated tours. We demonstrated the effectiveness and potential of our techniques by performing interactive exploration sessions on various screen sizes and types, ranging from desktop devices to large-screen displays for a walk-up-and-use museum installation. KEYWORDS - Computer Graphics, Human-Computer Interaction, Interactive Lenses, Focus-and-Context, Annotated Models, Cultural Heritage Computing
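
    A small, hypothetical sketch of the annotation-graph idea: annotations become nodes, suggested-next relations become edges, and an automated tour is a repeat-free walk over the graph. Field names and media URIs are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    identifier: str
    title: str
    media_uri: str                                 # audio/visual payload of the hotspot
    next_ids: list = field(default_factory=list)   # suggested follow-up annotations

def automated_tour(annotations: dict, start_id: str, max_steps: int = 10):
    """Follow suggested-next edges from a starting annotation, avoiding repeats."""
    tour, visited, current = [], set(), start_id
    while current is not None and current not in visited and len(tour) < max_steps:
        visited.add(current)
        node = annotations[current]
        tour.append(node)
        current = next((n for n in node.next_ids if n not in visited), None)
    return tour

graph = {
    "sig": Annotation("sig", "Painter's signature", "audio/sig.ogg", ["retouch"]),
    "retouch": Annotation("retouch", "19th-century retouching", "audio/retouch.ogg"),
}
for stop in automated_tour(graph, "sig"):
    print(stop.title)
```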

    3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation

    Text-guided 3D object generation aims to generate 3D objects described by user-defined captions, which paves a flexible way to visualize what we imagine. Although some works have been devoted to solving this challenging task, they either utilize explicit 3D representations (e.g., meshes), which lack texture and require post-processing for rendering photo-realistic views, or require time-consuming optimization for every single case. Here, we make the first attempt to achieve generic text-guided cross-category 3D object generation via a new 3D-TOGO model, which integrates a text-to-views generation module and a views-to-3D generation module. The text-to-views generation module is designed to generate different views of the target 3D object given an input caption. Prior guidance, caption guidance and view contrastive learning are proposed to achieve better view consistency and caption similarity. Meanwhile, a pixelNeRF model is adopted for the views-to-3D generation module to obtain the implicit 3D neural representation from the previously generated views. Our 3D-TOGO model generates 3D objects in the form of a neural radiance field with good texture and requires no time-consuming per-caption optimization. Besides, 3D-TOGO can control the category, color and shape of generated 3D objects with the input caption. Extensive experiments on the largest 3D object dataset (i.e., ABO) verify that 3D-TOGO better generates high-quality 3D objects according to the input captions across 98 different categories, in terms of PSNR, SSIM, LPIPS and CLIP-score, compared with text-NeRF and Dreamfields.
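
    A schematic sketch of the two-stage wiring described above, with both stages stubbed out; the real modules are a text-to-views generator and a pixelNeRF-style views-to-3D network, neither of which is reproduced here.

```python
from typing import List
import numpy as np

def text_to_views(caption: str, num_views: int = 8) -> List[np.ndarray]:
    """Stub for the text-to-views module: one image per target viewpoint."""
    rng = np.random.default_rng(abs(hash(caption)) % (2**32))
    return [rng.random((64, 64, 3)) for _ in range(num_views)]

def views_to_radiance_field(views: List[np.ndarray]) -> dict:
    """Stub for the views-to-3D module (pixelNeRF-style conditioning)."""
    conditioning = np.mean(np.stack(views), axis=0)  # placeholder latent
    return {"conditioning": conditioning}

radiance_field = views_to_radiance_field(text_to_views("a red armchair with wooden legs"))
print(radiance_field["conditioning"].shape)  # (64, 64, 3)
```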