Survey of image-based representations and compression techniques
In this paper, we survey the techniques for image-based rendering (IBR) and for compressing image-based representations. Unlike traditional three-dimensional (3-D) computer graphics, in which the 3-D geometry of the scene is known, IBR techniques render novel views directly from input images. IBR techniques can be classified into three categories according to how much geometric information is used: rendering without geometry, rendering with implicit geometry (i.e., correspondence), and rendering with explicit geometry (either approximate or accurate). We discuss the characteristics of these categories and their representative techniques. IBR techniques demonstrate a surprisingly diverse range in the extent to which they use images and geometry to represent 3-D scenes. We explore the issues in trading off the use of images and geometry by revisiting plenoptic-sampling analysis and the notions of view dependency and geometric proxies. Finally, we highlight compression techniques specifically designed for image-based representations. Such compression techniques are important in making IBR techniques practical.
Deliverable D2.2 of the PERSEE project: Texture Analysis/Synthesis
Deliverable D2.2 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D2.2 of the project. Its title: Texture Analysis/Synthesis.
Deliverable D5.2 of the PERSEE project: 2D/3D Codec Architecture
Deliverable D5.2 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D5.2 of the project. Its title: 2D/3D Codec Architecture.
Navigation domain representation for interactive multiview imaging
Enabling users to interactively navigate through different viewpoints of a
static scene is a new interesting functionality in 3D streaming systems. While
it opens exciting perspectives towards rich multimedia applications, it
requires the design of novel representations and coding techniques in order to
solve the new challenges imposed by interactive navigation. Interactivity
clearly brings new design constraints: the encoder is unaware of the exact
decoding process, while the decoder has to reconstruct information from
incomplete subsets of data since the server can generally not transmit images
for all possible viewpoints due to resource constraints. In this paper, we
propose a novel multiview data representation that satisfies bandwidth
and storage constraints in an interactive multiview streaming system. In
particular, we partition the multiview navigation domain into segments, each of
which is described by a reference image and some auxiliary information. The
auxiliary information enables the client to recreate any viewpoint in the
navigation segment via view synthesis. The decoder is then able to navigate
freely in the segment without further data requests to the server; it requests
additional data only when it moves to a different segment. We discuss the
benefits of this novel representation in interactive navigation systems and
further propose a method to optimize the partitioning of the navigation domain
into independent segments, under bandwidth and storage constraints.
Experimental results confirm the potential of the proposed representation;
namely, our system achieves compression performance similar to classical
inter-view coding, while it provides the high level of flexibility that is
required for interactive streaming. Hence, our new framework represents a
promising solution for 3D data representation in novel interactive multimedia
services.
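The request behaviour described in the abstract above (the client navigates freely inside a segment and contacts the server only when it crosses into another one) can be illustrated with a minimal Python sketch. The one-dimensional viewpoint axis, the class name `SegmentedNavigationClient`, and the boundary values are assumptions made for illustration only; the abstract does not specify the authors' data layout or partitioning method.

```python
from bisect import bisect_right


class SegmentedNavigationClient:
    """Toy client: tracks the current navigation segment and counts server requests."""

    def __init__(self, boundaries):
        # Hypothetical layout: sorted interior cut points on a 1-D viewpoint axis.
        self.boundaries = sorted(boundaries)
        self.current_segment = None
        self.requests = 0

    def segment_of(self, viewpoint):
        # Index of the segment that contains the given viewpoint.
        return bisect_right(self.boundaries, viewpoint)

    def move_to(self, viewpoint):
        segment = self.segment_of(viewpoint)
        if segment != self.current_segment:
            # New segment: its reference image and auxiliary information
            # would be requested from the server here.
            self.requests += 1
            self.current_segment = segment
        # Inside the same segment the view is synthesized locally, with no request.
        return segment


client = SegmentedNavigationClient([10.0, 20.0])
path = [1.0, 4.0, 9.5, 12.0, 15.0, 25.0]
segments = [client.move_to(v) for v in path]
print(segments, client.requests)
```

In this toy run six viewpoint changes trigger only three requests, one per segment entered.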
Neural radiance fields in the industrial and robotics domain: applications, research opportunities and use cases
The proliferation of technologies, such as extended reality (XR), has
increased the demand for high-quality three-dimensional (3D) graphical
representations. Industrial 3D applications encompass computer-aided design
(CAD), finite element analysis (FEA), scanning, and robotics. However, current
methods employed for industrial 3D representations suffer from high
implementation costs and reliance on manual human input for accurate 3D
modeling. To address these challenges, neural radiance fields (NeRFs) have
emerged as a promising approach for learning 3D scene representations based on
provided training 2D images. Despite a growing interest in NeRFs, their
potential applications in various industrial subdomains are still unexplored.
In this paper, we deliver a comprehensive examination of NeRF industrial
applications while also providing direction for future research endeavors. We
also present a series of proof-of-concept experiments that demonstrate the
potential of NeRFs in the industrial domain. These experiments include
NeRF-based video compression techniques and using NeRFs for 3D motion
estimation in the context of collision avoidance. In the video compression
experiment, our results show compression savings of up to 48% and 74% for
resolutions of 1920x1080 and 300x168, respectively. The motion estimation
experiment used a 3D animation of a robotic arm to train Dynamic-NeRF (D-NeRF)
and achieved, for the disparity map, an average peak signal-to-noise ratio
(PSNR) of 23 dB and a structural similarity index measure (SSIM) of 0.97.
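For readers unfamiliar with the PSNR figure quoted above, the following is a minimal sketch of the standard definition, PSNR = 10 log10(peak^2 / MSE), in plain Python. The flat pixel lists and the 8-bit peak value of 255 are assumptions for illustration; the abstract does not state how the authors computed their disparity-map metric.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


reference = [0, 64, 128, 255]
distorted = [0, 66, 126, 255]
print(round(psnr(reference, distorted), 2))
```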
Three-dimensional range data compression using computer graphics rendering pipeline
This paper presents the idea of naturally encoding three-dimensional (3D) range data into regular two-dimensional (2D) images utilizing the computer graphics rendering pipeline. The computer graphics pipeline provides a means to sample 3D geometry data into regular 2D images, and also to retrieve the depth information for each sampled pixel. The depth information for each pixel is further encoded into the red, green, and blue color channels of regular 2D images. The 2D images can further be compressed with existing 2D image compression techniques. By this novel means, 3D geometry data obtained by 3D range scanners can be instantaneously compressed into 2D images, providing a novel way of storing 3D range data as its 2D counterpart. We present experimental results to verify the performance of the proposed technique.
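One simple way to store a per-pixel depth value in the three color channels is to quantize it to 24 bits and split the result across R, G, and B. The sketch below shows that idea only; it is an assumption for illustration and not necessarily the encoding used in the paper, whose details the abstract does not give. The function names and the `near`/`far` range parameters are hypothetical. Note that plain bit-packing like this survives only lossless image compression, because a lossy codec would corrupt the low-order bytes.

```python
def encode_depth(depth, near, far):
    """Quantize depth in [near, far] to 24 bits and split it into (R, G, B) bytes."""
    t = (depth - near) / (far - near)
    t = min(max(t, 0.0), 1.0)          # clamp to the representable range
    q = round(t * 0xFFFFFF)            # 24-bit integer code
    return (q >> 16) & 0xFF, (q >> 8) & 0xFF, q & 0xFF


def decode_depth(rgb, near, far):
    """Inverse of encode_depth: rebuild the 24-bit code and map it back to depth."""
    r, g, b = rgb
    q = (r << 16) | (g << 8) | b
    return near + (q / 0xFFFFFF) * (far - near)


near, far = 0.5, 2.0
pixel = encode_depth(1.2345, near, far)
restored = decode_depth(pixel, near, far)
print(pixel, restored)
```

The round-trip error is bounded by half a quantization step, here (far - near) / (2 * 0xFFFFFF).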
3D video compression with the H.264 codec
Advances in 3D scanning have enabled the real-time capture of high-resolution 3D video. With these advances comes the challenge of streaming and storing this 3D video in a manner that can be quickly and effectively used. To do this, different approaches have been taken, a popular one being image-based encoding, which projects from 3D into 2D, uses 2D compression techniques, and then decodes from 2D back to 3D. One such technique is Holovideo, which has been shown to yield high compression ratios. However, the technique was originally designed for the RGB color space and until recently could not be used with codecs that use the YUV color space, such as the H.264 codec. This paper addresses this issue, generalizing Holovideo to the YUV color space and allowing it to leverage the H.264 codec. Compression ratios of over 352:1 have been achieved compared with the OBJ file format, with a mean squared error as low as 0.204%, making it a viable solution for 3D video compression.
Portal-s: High-resolution real-time 3D video telepresence
The goal of telepresence is to allow a person to feel as if they are present in a location other than their true location; a common application of telepresence is video conferencing, in which live video of a user is transmitted to a remote location for viewing. In conventional two-dimensional (2D) video conferencing, loss of correct eye gaze commonly occurs due to a disparity between the capture and display optical axes. Newer systems are being developed that allow for three-dimensional (3D) video conferencing, circumventing issues with this disparity, but new challenges are arising in the capture, delivery, and redisplay of 3D content across existing infrastructure. To address these challenges, a novel system is proposed that allows for 3D video conferencing across existing networks while delivering full-resolution 3D video and establishing correct eye gaze. During the development of Portal-s, many innovations in the field of 3D scanning and its applications were made; specifically, this dissertation research has achieved the following innovations: a technique to realize 3D video processing entirely on a graphics processing unit (GPU), methods to compress 3D videos on a GPU, and the combination of the aforementioned innovations with a special holographic display hardware system to enable the novel 3D telepresence system entitled Portal-s.
The first challenge this dissertation addresses is the cost of real-time 3D scanning technology, in terms of both money and computing power. Advancements in 3D scanning and computation technology continue to simplify the acquisition and display of 3D data, giving users new methods of interaction with and analysis of the 3D world around them. Although the acquisition of static 3D geometry is becoming easy, the same cannot be said of dynamic geometry, since all aspects of the 3D processing pipeline (capture, processing, and display) must be realized simultaneously in real time. Conventional approaches to these problems use workstation computers with powerful central processing units (CPUs) and GPUs to supply the large amount of processing power required for a single 3D frame. A challenge arises when trying to realize real-time 3D scanning on commodity hardware such as a laptop computer.
To address the cost of a real-time 3D scanning system, an entirely parallel 3D data processing pipeline that makes use of a multi-frequency phase-shifting technique is presented. This novel processing pipeline can achieve simultaneous 3D data capturing, processing, and display at 30 frames per second (fps) on a laptop computer. By implementing the pipeline within the OpenGL Shading Language (GLSL), nearly any modern computer with a dedicated graphics device can run the pipeline. By making use of multiple threads sharing GPU resources and direct memory access transfers, the pipeline achieves high frame rates on devices with low compute power. Although these advancements allow low-compute-power devices such as a laptop to achieve real-time 3D scanning, the technique is not without challenges. The main challenge is selecting frequencies that allow for high-quality phase, yet do not include phase jumps in equivalent frequencies. To address this issue, a new modified multi-frequency phase-shifting technique was developed that allows phase jumps to be introduced in equivalent frequencies yet unwrapped in parallel, increasing phase quality and reducing reconstruction error. Utilizing these techniques, a real-time 3D scanner was developed that captures 3D geometry at 30 fps with a root mean square error (RMSE) of 0.00081 mm for a measurement area of 100 mm × 75 mm at a resolution of 800 × 600 on a laptop computer. With the above-mentioned pipeline the CPU is nearly idle, freeing it to perform additional tasks such as image processing and analysis.
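As background for the phase-shifting terminology above, the following is a minimal single-pixel Python sketch of the textbook three-step phase-shifting formula and of the equivalent wavelength obtained from two fringe wavelengths. It is an illustration under stated assumptions, not the dissertation's GLSL pipeline or its modified technique: the phase shifts of -2π/3, 0, and +2π/3, the function names, and the example wavelengths are all hypothetical choices.

```python
import math

SHIFT = 2.0 * math.pi / 3.0  # assumed phase shift between the three fringe images


def fringe_samples(phase, offset=0.5, amplitude=0.4):
    """Intensities one pixel would record under shifts of -SHIFT, 0, +SHIFT."""
    return tuple(offset + amplitude * math.cos(phase + k * SHIFT) for k in (-1, 0, 1))


def wrapped_phase(i1, i2, i3):
    """Three-step phase retrieval; the result is wrapped to (-pi, pi]."""
    return math.atan2(math.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)


def equivalent_wavelength(lambda_a, lambda_b):
    """Wavelength of the beat pattern formed by two fringe wavelengths."""
    return lambda_a * lambda_b / abs(lambda_a - lambda_b)


def equivalent_phase(phi_a, phi_b):
    """Phase of the equivalent (longer) wavelength, wrapped to [0, 2*pi)."""
    return (phi_a - phi_b) % (2.0 * math.pi)


recovered = wrapped_phase(*fringe_samples(1.0))
print(recovered, equivalent_wavelength(60.0, 90.0))
```

Because each pixel is processed independently, this kind of computation maps naturally onto a fragment shader, which is consistent with the GPU-based approach described in the abstract.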
The second challenge this dissertation addresses is associated with delivering huge amounts of 3D video data in real time across existing network infrastructure. As the speed of 3D scanning continues to increase, and real-time scanning is achieved on low-compute-power devices, a way of compressing the massive amounts of 3D data being generated is needed. At a scan resolution of 800 × 600, streaming a 3D point cloud at 30 fps would require a throughput of over 1.3 Gbps. This amount of throughput is large for a PCIe bus, and too much for most commodity network cards. Conventional approaches involve serializing the data into a compressible state such as a polygon file format (PLY) or Wavefront object (OBJ) file. While this technique works well for structured 3D geometry, such as that created with computer-aided drafting (CAD) or 3D modeling software, this does not hold true for 3D scanned data, as it is inherently unstructured. A challenge arises when trying to compress this unstructured 3D information in such a way that it can be easily utilized with existing infrastructure.
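The throughput figure above can be checked with a short calculation. The sketch below assumes each point carries three coordinates stored as 32-bit floats; the abstract does not state the point format, so that is an assumption, but it reproduces the quoted "over 1.3 Gbps".

```python
WIDTH, HEIGHT = 800, 600      # scan resolution from the abstract
FPS = 30                      # frame rate from the abstract
COORDS_PER_POINT = 3          # assumption: x, y, z per point
BYTES_PER_COORD = 4           # assumption: 32-bit floating point


def point_cloud_bitrate_bps(width, height, fps, coords=COORDS_PER_POINT,
                            bytes_per_coord=BYTES_PER_COORD):
    """Uncompressed bit rate of a dense point-cloud stream, in bits per second."""
    bits_per_frame = width * height * coords * bytes_per_coord * 8
    return bits_per_frame * fps


bitrate = point_cloud_bitrate_bps(WIDTH, HEIGHT, FPS)
print(f"{bitrate / 1e9:.4f} Gbps")  # 1.3824 Gbps
```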
To address the need for real-time 3D video compression, new techniques entitled Holoimage and Holovideo are presented, which have the ability to compress, respectively, 3D geometry and 3D video into 2D counterparts and apply both lossless and lossy encoding. Similar to the aforementioned 3D scanning pipeline, these techniques make use of a completely parallel pipeline for encoding and decoding; this affords high speed processing on the GPU, as well as compression before streaming the data over the PCIe bus. Once in the compressed 2D state, the information can be streamed and saved until the 3D information is needed, at which point 3D geometry can be reconstructed while maintaining a low amount of reconstruction error. Further enhancements of the technique have allowed additional information, such as texture information, to be encoded by reducing the bit rate of the data through image dithering. This allows both the 3D video and associated 2D texture information to be interlaced and compressed into 2D video, synchronizing the streams automatically.
The third challenge this dissertation addresses is achieving correct eye gaze in video conferencing. In 2D video conferencing, loss of correct eye gaze commonly occurs due to a disparity between the capture and display optical axes. Conventional approaches to mitigate this issue involve either reducing the angle of disparity between the axes by increasing the distance of the user from the system, or merging the axes through the use of beam splitters. Newer approaches to this issue make use of 3D capture and display technology, as the angle of disparity can be corrected through transforms of the 3D data. Challenges arise when trying to create such novel systems, as all aspects of the pipeline (capture, transmission, and redisplay) must be achieved simultaneously in real time with massive amounts of 3D data.
Finally, the Portal-s system is presented, which is an integration of all the aforementioned technologies into a holistic software and hardware system that enables real-time 3D video conferencing with correct mutual eye gaze. To overcome the loss of eye contact in conventional video conferencing, Portal-s makes use of dual structured-light scanners that capture through the same optical axis as the display. The real-time 3D video frames generated on the GPU are then compressed using the Holovideo technique. This allows the 3D video to be streamed across a conventional network or the Internet, and redisplayed at a remote node for another user on the holographic display glass. Utilizing two connected Portal-s nodes, users of the systems can engage in 3D video conferencing with natural eye gaze established. In conclusion, this dissertation research substantially advances the field of real-time 3D scanning and its applications. Contributions of this research span both academic and industrial practices, where the use of this information has allowed users new methods of interaction with and analysis of the 3D world around them.