RLFC: Random Access Light Field Compression using Key Views and Bounded Integer Encoding
We present a new hierarchical compression scheme for encoding light field
images (LFI) that is suitable for interactive rendering. Our method (RLFC)
exploits redundancies in the light field images by constructing a tree
structure. The top level (root) of the tree captures the common high-level
details across the LFI, and other levels (children) of the tree capture
specific low-level details of the LFI. Our decompressing algorithm corresponds
to tree traversal operations and gathers the values stored at different levels
of the tree. Furthermore, we use bounded integer sequence encoding which
provides random access and fast hardware decoding for compressing the blocks of
children of the tree. We have evaluated our method for 4D two-plane
parameterized light fields. The compression rates vary from 0.08 - 2.5 bits per
pixel (bpp), resulting in compression ratios of around 200:1 to 20:1 for a PSNR
quality of 40 to 50 dB. The decompression times for decoding the blocks of LFI
are 1 - 3 microseconds per channel on an NVIDIA GTX-960 and we can render new
views with a resolution of 512×512 at 200 fps. Our overall scheme is simple to
implement and involves only bit manipulations and integer arithmetic
operations.
Comment: Accepted for publication at Symposium on Interactive 3D Graphics and
Games (I3D '19)
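
The random-access property of bounded integer sequence encoding mentioned in the abstract above can be illustrated with a minimal sketch. It assumes values in the range {0, 1, 2} (trits) and packs five of them per byte, since 3^5 = 243 fits in 8 bits (1.6 bits per value instead of 2). This is plain base-3 packing, not the exact bit layout of a hardware decoder or of the RLFC implementation; the function names `pack_trits` and `read_trit` are hypothetical.

```python
# Simplified sketch of bounded-integer packing: five trits per byte.
# Assumption: plain base-3 packing, not the bit layout used by real hardware.

TRITS_PER_BYTE = 5  # 3**5 = 243 <= 256, so five trits fit in one byte


def pack_trits(values):
    """Pack a sequence of trits (0, 1, or 2) into bytes, five per byte."""
    out = bytearray()
    for start in range(0, len(values), TRITS_PER_BYTE):
        block = values[start:start + TRITS_PER_BYTE]
        packed = 0
        # Trit j of the block is stored with weight 3**j.
        for j, v in enumerate(block):
            if not 0 <= v <= 2:
                raise ValueError("trit out of range")
            packed += v * 3 ** j
        out.append(packed)
    return bytes(out)


def read_trit(packed, index):
    """Random access: decode only the byte that holds trit `index`."""
    byte = packed[index // TRITS_PER_BYTE]
    return (byte // 3 ** (index % TRITS_PER_BYTE)) % 3


if __name__ == "__main__":
    data = [2, 0, 1, 1, 2, 0, 2]
    blob = pack_trits(data)
    print(len(blob), [read_trit(blob, i) for i in range(len(data))])
```

Because every block has a fixed size, the byte holding any given value is found by integer division alone, which is what makes per-element random access possible without decoding the preceding data.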
Comparative Analysis of Open Source Frameworks for Machine Learning with Use Case in Single-Threaded and Multi-Threaded Modes
The basic features of some of the most versatile and popular open source
frameworks for machine learning (TensorFlow, Deep Learning4j, and H2O) are
considered and compared. A comparative analysis was performed, and
conclusions were drawn about the advantages and disadvantages of these
platforms. Performance tests on the de facto standard MNIST data set were
carried out with the H2O framework for deep learning algorithms designed for
CPU and GPU platforms, in single-threaded and multithreaded modes of operation.
Comment: 4 pages, 6 figures, 4 tables; XIIth International Scientific and
Technical Conference on Computer Sciences and Information Technologies (CSIT
2017), Lviv, Ukraine
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
Global covariance pooling in convolutional neural networks has achieved
impressive improvement over the classical first-order pooling. Recent works
have shown matrix square root normalization plays a central role in achieving
state-of-the-art performance. However, existing methods depend heavily on
eigendecomposition (EIG) or singular value decomposition (SVD), suffering from
inefficient training due to limited support of EIG and SVD on GPU. Towards
addressing this problem, we propose an iterative matrix square root
normalization method for fast end-to-end training of global covariance pooling
networks. At the core of our method is a meta-layer designed with loop-embedded
directed graph structure. The meta-layer consists of three consecutive
nonlinear structured layers, which perform pre-normalization, coupled matrix
iteration and post-compensation, respectively. Our method is much faster than
EIG- or SVD-based ones, since it involves only matrix multiplications, which are
well suited to parallel implementation on GPU. Moreover, the proposed network
with ResNet architecture can converge in far fewer epochs, further accelerating
network training. On large-scale ImageNet, we achieve competitive performance superior
to existing counterparts. By finetuning our models pre-trained on ImageNet, we
establish state-of-the-art results on three challenging fine-grained
benchmarks. The source code and network models will be available at
http://www.peihuali.org/iSQRT-COV
Comment: Accepted to CVPR 201
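
The three stages named in the abstract above (pre-normalization, coupled matrix iteration, post-compensation) can be sketched with the classical coupled Newton-Schulz iteration, which uses only matrix multiplications. This is a minimal pure-Python illustration under stated assumptions, not the authors' released code: the input is assumed to be a symmetric positive definite matrix given as a list of lists, normalization is by the trace, and the function name `sqrt_by_iteration` and the default iteration count are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def sqrt_by_iteration(sigma, iterations=5):
    """Approximate the square root of an SPD matrix without EIG or SVD."""
    n = len(sigma)
    trace = sum(sigma[i][i] for i in range(n))

    # Pre-normalization: divide by the trace so all eigenvalues lie in (0, 1],
    # which the Newton-Schulz iteration needs in order to converge.
    y = [[v / trace for v in row] for row in sigma]
    z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    # Coupled iteration: Y converges to A^(1/2) and Z to A^(-1/2).
    for _ in range(iterations):
        zy = matmul(z, y)
        t = [[0.5 * ((3.0 if i == j else 0.0) - zy[i][j]) for j in range(n)]
             for i in range(n)]
        y = matmul(y, t)
        z = matmul(t, z)

    # Post-compensation: undo the trace scaling applied before the iteration.
    scale = math.sqrt(trace)
    return [[scale * v for v in row] for row in y]


if __name__ == "__main__":
    root = sqrt_by_iteration([[2.0, 1.0], [1.0, 2.0]], iterations=15)
    print(matmul(root, root))  # should be close to the input matrix
```

Since each step is a fixed sequence of matrix products, the same computation maps directly onto batched GPU matrix multiplication, which is the efficiency argument the abstract makes.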
GPU-based Streaming for Parallel Level of Detail on Massive Model Rendering
Rendering massive 3D models in real time has long been recognized as a very challenging problem because of the limited computational power and memory space available in a workstation. Most existing rendering techniques, especially level of detail (LOD) processing, suffer from their sequential execution and do not scale well with the size of the models. We present a GPU-based progressive mesh simplification approach which enables the interactive rendering of large 3D models with hundreds of millions of triangles. Our work contributes to massive model rendering research in two ways. First, we develop a novel data structure to represent the progressive LOD mesh, and design a parallel mesh simplification algorithm for the GPU architecture. Second, we propose a GPU-based streaming approach which adopts a frame-to-frame coherence scheme in order to minimize the high communication cost between CPU and GPU. Our results show that the parallel mesh simplification algorithm and GPU-based streaming approach significantly improve the overall rendering performance.
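
The idea of exploiting frame-to-frame coherence to reduce CPU-to-GPU traffic can be illustrated with a small sketch. The abstract does not describe the actual scheme, so everything here is an assumption made for illustration: the mesh is assumed to be split into chunks with integer IDs, and the hypothetical function `plan_transfer` simply compares the chunks already resident on the GPU with those needed for the next frame.

```python
def plan_transfer(resident, needed):
    """Return (to_upload, to_evict) so that the GPU ends up holding `needed`.

    resident: set of chunk IDs currently in GPU memory (hypothetical model).
    needed:   set of chunk IDs required to render the next frame.
    """
    to_upload = sorted(needed - resident)  # only chunks not already on the GPU
    to_evict = sorted(resident - needed)   # chunks no longer required
    return to_upload, to_evict


if __name__ == "__main__":
    frame_1 = {1, 2, 3, 4}
    frame_2 = {2, 3, 4, 5}
    upload, evict = plan_transfer(frame_1, frame_2)
    print(upload, evict)
```

When consecutive frames need mostly the same data, the upload list stays short, so the per-frame transfer cost depends on how much the view changed rather than on the model size.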
RSGM: Real-time Raster-Respecting Semi-Global Matching for Power-Constrained Systems
Stereo depth estimation is used for many computer vision applications. Though
many popular methods strive solely for depth quality, for real-time mobile
applications (e.g. prosthetic glasses or micro-UAVs), speed and power
efficiency are equally, if not more, important. Many real-world systems rely on
Semi-Global Matching (SGM) to achieve a good accuracy vs. speed balance, but
power efficiency is hard to achieve with conventional hardware, making the use
of embedded devices such as FPGAs attractive for low-power applications.
However, the full SGM algorithm is ill-suited to deployment on FPGAs, and so
most FPGA variants of it are partial, at the expense of accuracy. In a non-FPGA
context, the accuracy of SGM has been improved by More Global Matching (MGM),
which also helps tackle the streaking artifacts that afflict SGM. In this
paper, we propose a novel, resource-efficient method that is inspired by MGM's
techniques for improving depth quality, but which can be implemented to run in
real time on a low-power FPGA. Through evaluation on multiple datasets (KITTI
and Middlebury), we show that in comparison to other real-time capable stereo
approaches, we can achieve a state-of-the-art balance between accuracy, power
efficiency and speed, making our approach highly desirable for use in real-time
systems with limited power.
Comment: Accepted in FPT 2018 as Oral presentation, 8 pages, 6 figures, 4
tables
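
For readers unfamiliar with Semi-Global Matching, the cost aggregation that the abstract above builds on can be sketched for a single path direction. This is the standard SGM recurrence, not the RSGM or MGM variant described in the paper; the function name `aggregate_path` and the penalty values are hypothetical. Each pixel's aggregated cost adds its own matching cost to the cheapest predecessor option: same disparity, a neighboring disparity with a small penalty `p1`, or any disparity with a larger penalty `p2`.

```python
def aggregate_path(costs, p1, p2):
    """Aggregate matching costs along one path (e.g. left to right on a row).

    costs[x][d] is the matching cost of pixel x at disparity d.
    p1 penalizes a disparity change of 1, p2 penalizes larger jumps.
    """
    num_disp = len(costs[0])
    aggregated = [list(costs[0])]  # the first pixel has no predecessor
    for x in range(1, len(costs)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d in range(num_disp):
            best = prev[d]                            # same disparity
            if d > 0:
                best = min(best, prev[d - 1] + p1)    # disparity changed by -1
            if d + 1 < num_disp:
                best = min(best, prev[d + 1] + p1)    # disparity changed by +1
            best = min(best, prev_min + p2)           # any larger jump
            # Subtracting prev_min keeps the values from growing along the path.
            row.append(costs[x][d] + best - prev_min)
        aggregated.append(row)
    return aggregated


if __name__ == "__main__":
    costs = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]
    for row in aggregate_path(costs, p1=1, p2=3):
        print(row)
```

Full SGM sums such aggregated costs over several path directions before picking the minimum-cost disparity per pixel; paths that run against the raster scan order are what make the full algorithm awkward to stream through an FPGA.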