
    Survey of image-based representations and compression techniques

    In this paper, we survey the techniques for image-based rendering (IBR) and for compressing image-based representations. Unlike traditional three-dimensional (3-D) computer graphics, in which the 3-D geometry of the scene is known, IBR techniques render novel views directly from input images. IBR techniques can be classified into three categories according to how much geometric information is used: rendering without geometry, rendering with implicit geometry (i.e., correspondence), and rendering with explicit geometry (either approximate or accurate). We discuss the characteristics of these categories and their representative techniques. IBR techniques demonstrate a surprisingly diverse range in their use of images and geometry to represent 3-D scenes. We explore the trade-offs between images and geometry by revisiting plenoptic-sampling analysis and the notions of view dependency and geometric proxies. Finally, we highlight compression techniques specifically designed for image-based representations; such compression techniques are important in making IBR practical.
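    As a hedged illustration of the "rendering without geometry" category, the sketch below synthesizes a novel view from a two-plane-parameterized light field by bilinear interpolation over the camera plane; the array name `lightfield` and its (U, V, S, T) layout are assumptions for illustration, not taken from the survey.

```python
import numpy as np

def render_novel_view(lightfield, u, v):
    """Synthesize a view at continuous camera-plane position (u, v)
    from a two-plane light field L[u, v, s, t] (geometry-free IBR).

    lightfield: float array of shape (U, V, S, T) -- assumed layout.
    """
    U, V = lightfield.shape[:2]
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, U - 1), min(v0 + 1, V - 1)
    du, dv = u - u0, v - v0
    # Bilinear blend of the four nearest captured views.
    return ((1 - du) * (1 - dv) * lightfield[u0, v0]
            + du * (1 - dv) * lightfield[u1, v0]
            + (1 - du) * dv * lightfield[u0, v1]
            + du * dv * lightfield[u1, v1])

L = np.random.rand(8, 8, 32, 32)   # toy 8x8 grid of 32x32 views
view = render_novel_view(L, 3.4, 5.7)
```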

    Deliverable D2.2 of the PERSEE project: Analyse/Synthese de Texture

    Deliverable D2.2 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170); specifically, it corresponds to deliverable D2.2 of the project. Its title: Analyse/Synthese de Texture (Texture Analysis/Synthesis).

    Deliverable D5.2 of the PERSEE project: 2D/3D Codec architecture

    Deliverable D5.2 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170); specifically, it corresponds to deliverable D5.2 of the project. Its title: 2D/3D Codec architecture.

    Navigation domain representation for interactive multiview imaging

    Enabling users to interactively navigate through different viewpoints of a static scene is an interesting new capability in 3D streaming systems. While it opens exciting perspectives for rich multimedia applications, it requires the design of novel representations and coding techniques to solve the challenges imposed by interactive navigation. Interactivity clearly brings new design constraints: the encoder is unaware of the exact decoding process, while the decoder has to reconstruct information from incomplete subsets of data, since the server generally cannot transmit images for all possible viewpoints due to resource constraints. In this paper, we propose a novel multiview data representation that satisfies bandwidth and storage constraints in an interactive multiview streaming system. In particular, we partition the multiview navigation domain into segments, each of which is described by a reference image and some auxiliary information. The auxiliary information enables the client to recreate any viewpoint in the navigation segment via view synthesis. The decoder is then able to navigate freely within the segment without further data requests to the server; it requests additional data only when it moves to a different segment. We discuss the benefits of this novel representation in interactive navigation systems and further propose a method to optimize the partitioning of the navigation domain into independent segments under bandwidth and storage constraints. Experimental results confirm the potential of the proposed representation: our system achieves compression performance similar to classical inter-view coding while providing the high level of flexibility required for interactive streaming. Hence, our new framework represents a promising solution for 3D data representation in novel interactive multimedia services.
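    A minimal sketch of the client-side logic the abstract describes, under stated assumptions: the helper names `segment_of`, `request_segment`, and `synthesize_view` are hypothetical (invented here for illustration, not the paper's API). The client synthesizes all viewpoints inside the current segment locally and contacts the server only on a segment change.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    segment_id: int
    reference_image: object   # reference view for this navigation segment
    auxiliary_info: object    # e.g., depth data enabling view synthesis

def synthesize_view(reference_image, auxiliary_info, viewpoint):
    """Placeholder for view synthesis from the segment's reference data."""
    raise NotImplementedError

class NavigationClient:
    def __init__(self, server):
        self.server = server   # hypothetical server interface
        self.current = None

    def view(self, viewpoint):
        seg_id = self.server.segment_of(viewpoint)
        if self.current is None or seg_id != self.current.segment_id:
            # Only a segment change triggers a request to the server.
            self.current = self.server.request_segment(seg_id)
        # Viewpoints inside the segment are rendered without new requests.
        return synthesize_view(self.current.reference_image,
                               self.current.auxiliary_info, viewpoint)
```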

    Neural radiance fields in the industrial and robotics domain: applications, research opportunities and use cases

    The proliferation of technologies such as extended reality (XR) has increased the demand for high-quality three-dimensional (3D) graphical representations. Industrial 3D applications encompass computer-aided design (CAD), finite element analysis (FEA), scanning, and robotics. However, current methods for industrial 3D representation suffer from high implementation costs and reliance on manual human input for accurate 3D modeling. To address these challenges, neural radiance fields (NeRFs) have emerged as a promising approach for learning 3D scene representations from training 2D images. Despite growing interest in NeRFs, their potential applications in various industrial subdomains are still unexplored. In this paper, we deliver a comprehensive examination of NeRF industrial applications while also providing direction for future research. We also present a series of proof-of-concept experiments that demonstrate the potential of NeRFs in the industrial domain. These experiments include NeRF-based video compression and the use of NeRFs for 3D motion estimation in the context of collision avoidance. In the video compression experiment, our results show compression savings of up to 48% and 74% for resolutions of 1920×1080 and 300×168, respectively. The motion estimation experiment used a 3D animation of a robotic arm to train Dynamic-NeRF (D-NeRF) and achieved an average peak signal-to-noise ratio (PSNR) of 23 dB on the disparity maps and a structural similarity index measure (SSIM) of 0.97.
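    For context, a NeRF renders a pixel by integrating predicted color and density along a camera ray; the numpy sketch below implements the standard volume-rendering quadrature from the original NeRF formulation. The sampled `sigma`, `rgb`, and `deltas` arrays are synthetic placeholders, not values from this paper's experiments.

```python
import numpy as np

def volume_render(sigma, rgb, deltas):
    """Composite per-sample density and color along a ray (NeRF quadrature).

    sigma:  (N,)   volume densities at N samples along the ray
    rgb:    (N, 3) predicted colors at those samples
    deltas: (N,)   distances between consecutive samples
    """
    alpha = 1.0 - np.exp(-sigma * deltas)        # opacity of each segment
    trans = np.cumprod(1.0 - alpha + 1e-10)      # accumulated transmittance
    trans = np.concatenate([[1.0], trans[:-1]])  # T_i depends on samples before i
    weights = alpha * trans                      # contribution of each sample
    return (weights[:, None] * rgb).sum(axis=0)  # final pixel color

# Toy usage with synthetic samples along one ray.
N = 64
color = volume_render(np.random.rand(N), np.random.rand(N, 3), np.full(N, 0.05))
```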

    Three-dimensional range data compression using computer graphics rendering pipeline

    This paper presents the idea of naturally encoding three-dimensional (3D) range data into regular two-dimensional (2D) images utilizing the computer graphics rendering pipeline. The graphics pipeline provides a means to sample 3D geometry data into regular 2D images and to retrieve the depth information for each sampled pixel. The depth information for each pixel is then encoded into the red, green, and blue color channels of regular 2D images, which can in turn be compressed with existing 2D image compression techniques. By this means, 3D geometry data obtained by 3D range scanners can be instantaneously compressed into 2D images, providing a novel way of storing 3D range data in 2D form. We present experimental results to verify the performance of the proposed technique.
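    A hedged sketch of the general idea: a normalized depth value can be packed into the 8-bit red, green, and blue channels of a regular image and recovered later. The fixed-point packing below is one common scheme and an assumption here, not necessarily the paper's exact encoding.

```python
import numpy as np

def depth_to_rgb(depth):
    """Pack normalized depth in [0, 1) into three 8-bit channels (24-bit fixed point).

    NOTE: this particular layout is an illustrative assumption.
    """
    q = (np.clip(depth, 0.0, 1.0 - 1e-9) * (1 << 24)).astype(np.uint32)
    r = (q >> 16) & 0xFF          # most significant 8 bits
    g = (q >> 8) & 0xFF
    b = q & 0xFF                  # least significant 8 bits
    return np.stack([r, g, b], axis=-1).astype(np.uint8)

def rgb_to_depth(rgb):
    """Invert the packing back to normalized depth."""
    r, g, b = (rgb[..., i].astype(np.uint32) for i in range(3))
    return ((r << 16) | (g << 8) | b) / float(1 << 24)

depth = np.random.rand(600, 800)
assert np.allclose(rgb_to_depth(depth_to_rgb(depth)), depth, atol=1 / (1 << 24))
```

    A straight fixed-point packing like this is fragile under lossy 2D compression, which corrupts the low-order bits; published encodings of this kind therefore tend to favor smoother channel mappings.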

    3D video compression with the H.264 codec

    Advances in 3D scanning have enabled the real-time capture of high-resolution 3D video. With these advances comes the challenge of streaming and storing this 3D video in a manner that can be quickly and effectively used. Different approaches have been taken to do this, a popular one being image-based encoding, which projects from 3D into 2D, applies 2D compression techniques, and then decodes from 2D back to 3D. One such technique is Holovideo, which has been shown to yield high compression ratios. However, the technique was originally designed for the RGB color space and until recently could not be used with codecs that operate in the YUV color space, such as the H.264 codec. This paper addresses this issue, generalizing Holovideo to the YUV color space and allowing it to leverage the H.264 codec. Compression ratios of over 352:1 have been achieved relative to the OBJ file format, with mean squared error as low as 0.204%, making it a viable solution for 3D video compression.
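    The crux of moving an RGB-designed encoding into an H.264 pipeline is surviving the RGB-to-YUV conversion (and, in practice, chroma subsampling). As a hedged illustration, the sketch below applies the standard BT.601 full-range transform, which is one common convention and an assumption here rather than the paper's exact configuration.

```python
import numpy as np

# BT.601 full-range RGB -> YUV matrix (one common convention).
RGB2YUV = np.array([[ 0.299,     0.587,     0.114   ],
                    [-0.168736, -0.331264,  0.5     ],
                    [ 0.5,      -0.418688, -0.081312]])

def rgb_to_yuv(rgb):
    """rgb: float array (..., 3) in [0, 1] -> Y in [0, 1], U/V in [-0.5, 0.5]."""
    return rgb @ RGB2YUV.T

def yuv_to_rgb(yuv):
    """Invert the linear transform (no chroma subsampling modeled here)."""
    return yuv @ np.linalg.inv(RGB2YUV).T

frame = np.random.rand(4, 4, 3)
assert np.allclose(yuv_to_rgb(rgb_to_yuv(frame)), frame, atol=1e-8)
```

    The round trip above is lossless in floating point; the real design question is keeping the depth-carrying channels intact once the codec quantizes and subsamples the chroma planes.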

    Portal-s: High-resolution real-time 3D video telepresence

    The goal of telepresence is to allow a person to feel as if they are present in a location other than their true location; a common application of telepresence is video conferencing, in which live video of a user is transmitted to a remote location for viewing. In conventional two-dimensional (2D) video conferencing, loss of correct eye gaze commonly occurs due to a disparity between the capture and display optical axes. Newer systems allow for three-dimensional (3D) video conferencing, circumventing this disparity, but new challenges arise in the capture, delivery, and redisplay of 3D content across existing infrastructure. To address these challenges, a novel system is proposed which allows for 3D video conferencing across existing networks while delivering full-resolution 3D video and establishing correct eye gaze. During the development of Portal-s, many innovations were made in the field of 3D scanning and its applications; specifically, this dissertation research achieved the following: a technique to realize 3D video processing entirely on a graphics processing unit (GPU), methods to compress 3D videos on a GPU, and the combination of these innovations with special holographic display hardware to enable the novel 3D telepresence system entitled Portal-s.

    The first challenge this dissertation addresses is the cost of real-time 3D scanning technology, both from a monetary and a computing-power perspective. Advancements in 3D scanning and computation technology continue to simplify the acquisition and display of 3D data, allowing users new methods of interaction with and analysis of the 3D world around them. Although the acquisition of static 3D geometry is becoming easy, the same cannot be said of dynamic geometry, since all aspects of the 3D processing pipeline (capture, processing, and display) must be realized in real-time simultaneously. Conventional approaches utilize workstation computers with powerful central processing units (CPUs) and GPUs to provide the large amount of processing power required for a single 3D frame; a challenge arises when trying to realize real-time 3D scanning on commodity hardware such as a laptop computer. To address this cost, an entirely parallel 3D data processing pipeline based on a multi-frequency phase-shifting technique is presented. This pipeline achieves simultaneous 3D data capture, processing, and display at 30 frames per second (fps) on a laptop computer. Because the pipeline is implemented in the OpenGL Shading Language (GLSL), nearly any modern computer with a dedicated graphics device can run it; by using multiple threads sharing GPU resources and direct-memory-access transfers, high frame rates are achieved on low-compute-power devices. This technique is not without challenges, the main one being the selection of frequencies that yield high-quality phase without introducing phase jumps in the equivalent frequencies. To address this issue, a modified multi-frequency phase-shifting technique was developed that allows phase jumps to be introduced in equivalent frequencies yet unwrapped in parallel, increasing phase quality and reducing reconstruction error.
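    For orientation, a single-frequency building block of such pipelines is N-step phase-shifted fringe analysis; the sketch below recovers the wrapped phase from three fringe images with 2π/3 shifts. This is the standard three-step formulation, stated here as an assumption rather than the dissertation's exact multi-frequency algorithm.

```python
import numpy as np

def wrapped_phase(i1, i2, i3):
    """Wrapped phase from three fringe images I_k = I' + I'' * cos(phi + d_k),
    with shifts d = (-2*pi/3, 0, +2*pi/3).

    Standard three-step phase-shifting result; the output is wrapped to
    (-pi, pi] and must still be unwrapped (e.g., via a second frequency).
    """
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)

# Synthetic check: recover a known phase ramp from three shifted fringes.
phi = np.linspace(-np.pi + 0.01, np.pi - 0.01, 512)
shifts = (-2 * np.pi / 3, 0.0, 2 * np.pi / 3)
imgs = [0.5 + 0.4 * np.cos(phi + s) for s in shifts]
assert np.allclose(wrapped_phase(*imgs), phi, atol=1e-6)
```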
    Utilizing these techniques, a real-time 3D scanner was developed that captures 3D geometry at 30 fps with a root-mean-square error (RMSE) of 0.00081 mm for a measurement area of 100 mm × 75 mm at a resolution of 800 × 600 on a laptop computer. With this pipeline the CPU is nearly idle, freeing it to perform additional tasks such as image processing and analysis.

    The second challenge this dissertation addresses is delivering huge amounts of 3D video data in real-time across existing network infrastructure. As the speed of 3D scanning continues to increase and real-time scanning is achieved on low-compute-power devices, a way of compressing the massive amounts of 3D data being generated is needed. At a scan resolution of 800 × 600, streaming a 3D point cloud at 30 fps would require a throughput of over 1.3 Gbps. This is a heavy load for a PCIe bus, and too much for most commodity network cards. Conventional approaches serialize the data into a compressible state such as the polygon file format (PLY) or a Wavefront object (OBJ) file. While this works well for structured 3D geometry, such as that created with computer-aided drafting (CAD) or 3D modeling software, it does not hold for 3D scanned data, which is inherently unstructured; the challenge is to compress this unstructured 3D information in such a way that it can be easily used with existing infrastructure. To address the need for real-time 3D video compression, new techniques entitled Holoimage and Holovideo are presented, which compress 3D geometry and 3D video, respectively, into 2D counterparts and apply both lossless and lossy encoding. Like the aforementioned 3D scanning pipeline, these techniques use a completely parallel pipeline for encoding and decoding; this affords high-speed processing on the GPU, as well as compression before streaming the data over the PCIe bus. Once in the compressed 2D state, the information can be streamed and stored until the 3D information is needed, at which point the 3D geometry can be reconstructed with a low amount of reconstruction error. Further enhancements allow additional information, such as texture, to be encoded by reducing the bit rate of the data through image dithering; this allows both the 3D video and the associated 2D texture information to be interlaced and compressed into 2D video, synchronizing the streams automatically.

    The third challenge this dissertation addresses is achieving correct eye gaze in video conferencing. In 2D video conferencing, loss of correct eye gaze commonly occurs due to a disparity between the capture and display optical axes. Conventional approaches mitigate this issue either by reducing the angle of disparity between the axes (increasing the distance of the user to the system) or by merging the axes through the use of beam splitters. Newer approaches use 3D capture and display technology, as the angle of disparity can be corrected through transforms of the 3D data. Challenges arise when trying to create such systems, as all aspects of the pipeline (capture, transmission, and redisplay) must be achieved simultaneously in real-time with massive amounts of 3D data.
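    The quoted 1.3 Gbps figure is consistent with streaming raw xyz coordinates as 32-bit floats; the back-of-the-envelope check below makes that assumption explicit (the dissertation does not state the exact per-point layout).

```python
# Raw point-cloud throughput at 800 x 600, 30 fps, assuming
# three 32-bit floats (x, y, z) per point -- layout is an assumption.
points_per_frame = 800 * 600
bytes_per_point = 3 * 4                      # xyz as float32
fps = 30
gbps = points_per_frame * bytes_per_point * fps * 8 / 1e9
print(f"{gbps:.3f} Gbps")                    # -> 1.382 Gbps, i.e. "over 1.3 Gbps"
```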
    Finally, the Portal-s system is presented: an integration of all the aforementioned technologies into a holistic software and hardware system that enables real-time 3D video conferencing with correct mutual eye gaze. To overcome the loss of eye contact in conventional video conferencing, Portal-s uses dual structured-light scanners that capture through the same optical axis as the display. The real-time 3D video frames generated on the GPU are compressed using the Holovideo technique, which allows the 3D video to be streamed across a conventional network or the Internet and redisplayed on the holographic display glass at a remote node for another user. Using two connected Portal-s nodes, users can engage in 3D video conferencing with natural eye gaze established. In conclusion, this dissertation research substantially advances the field of real-time 3D scanning and its applications; its contributions span both academic and industrial practice, providing users new methods of interacting with and analyzing the 3D world around them.