
    An optimal yet fast pruning algorithm to reduce latency in multiview prediction structures

    We propose a new algorithm for the design of prediction structures with low delay and limited penalty in rate-distortion performance for multiview video coding schemes. This algorithm constitutes one of the elements of a framework for the analysis and optimization of delay in multiview coding schemes that is based on graph theory. The objective of the algorithm is to find the best combination of prediction dependencies to prune from a multiview prediction structure, given a number of cuts. Taking into account the properties of the graph-based analysis of the encoding delay, the algorithm is able to find the best prediction dependencies to eliminate from an original prediction structure while limiting the number of cut combinations to evaluate. We show that this algorithm obtains optimal results in the reduction of the encoding latency with a lower computational complexity than exhaustive-search alternatives.
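
    As a rough illustration of the search space such an algorithm must handle, the minimal sketch below (assuming hypothetical picture names and unit encoding times, not the paper's data or method) models the encoding delay as the longest weighted path in the dependency DAG and brute-forces the best set of k dependencies to cut; the proposed algorithm is designed to avoid exactly this exhaustive enumeration.

        # Brute-force baseline for pruning k prediction dependencies (illustrative only).
        from itertools import combinations

        # Hypothetical pictures (view/frame) with per-picture encoding times.
        proc_time = {"V0F0": 1.0, "V0F1": 1.0, "V1F0": 1.0, "V1F1": 1.0}
        # Prediction dependencies: (reference picture -> dependent picture).
        edges = [("V0F0", "V0F1"), ("V0F0", "V1F0"), ("V1F0", "V1F1"), ("V0F1", "V1F1")]

        def encoding_delay(times, deps):
            # Delay modelled as the longest weighted path in the dependency DAG.
            children = {n: [] for n in times}
            for u, v in deps:
                children[u].append(v)
            memo = {}
            def finish(n):
                if n not in memo:
                    memo[n] = times[n] + max((finish(c) for c in children[n]), default=0.0)
                return memo[n]
            return max(finish(n) for n in times)

        def best_cuts(times, deps, k):
            # Exhaustive baseline: evaluate every combination of k dependencies to remove.
            return min(combinations(deps, k),
                       key=lambda cut: encoding_delay(times, [e for e in deps if e not in cut]))

        cut = best_cuts(proc_time, edges, k=1)
        print(cut, encoding_delay(proc_time, [e for e in edges if e not in cut]))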

    High Performance Multiview Video Coding

    Following the standardization of the latest video coding standard, High Efficiency Video Coding (HEVC), in 2013, its multiview extension (MV-HEVC) was published in 2014, bringing significantly better compression performance, around 50%, for multiview and 3D videos compared to multiple independent single-view HEVC encodings. However, the extremely high computational complexity of MV-HEVC demands significant optimization of the encoder. To tackle this problem, this work investigates the use of modern parallel computing platforms and tools, such as single-instruction-multiple-data (SIMD) instructions, multi-core CPUs, massively parallel GPUs, and computer clusters, to significantly enhance the performance of the MVC encoder. These computing tools have very different characteristics, and misusing them may result in poor performance improvement and sometimes even performance reduction. To achieve the best possible encoding performance from modern computing tools, different levels of parallelism inside a typical MVC encoder are identified and analyzed, and novel optimization techniques at various levels of abstraction are proposed: non-aggregation massively parallel motion estimation (ME) and disparity estimation (DE) at the prediction unit (PU) level, fractional and bi-directional ME/DE acceleration through SIMD, quantization parameter (QP)-based early termination for coding tree units (CTUs), optimized resource-scheduled wave-front parallel processing for CTUs, and workload-balanced, cluster-based multiple-view parallelism. The results show that the proposed parallel optimization techniques significantly improve execution time with insignificant loss of coding efficiency. This, in turn, demonstrates that modern parallel computing platforms, with appropriate platform-specific algorithm design, are valuable tools for improving the performance of computationally intensive applications.
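
    As a small illustration of one of the levels of parallelism mentioned above, the sketch below (with a hypothetical CTU grid size, and without the resource scheduling or GPU offloading described in the thesis) computes the earliest parallel step at which each CTU can be encoded under the standard wave-front dependency pattern: a CTU may start once its left and upper-right neighbours are finished.

        # Wave-front parallel processing (WPP) dependency sketch for a CTU grid.
        ROWS, COLS = 4, 6   # hypothetical picture size in CTUs

        def wpp_schedule(rows, cols):
            # step[r][c] = earliest parallel step at which CTU (r, c) can be encoded.
            step = [[0] * cols for _ in range(rows)]
            for r in range(rows):
                for c in range(cols):
                    left = step[r][c - 1] + 1 if c > 0 else 0
                    # In the last column the upper-right neighbour degenerates to the CTU above.
                    upper_right = step[r - 1][min(c + 1, cols - 1)] + 1 if r > 0 else 0
                    step[r][c] = max(left, upper_right)
            return step

        for row in wpp_schedule(ROWS, COLS):
            print(row)
        # CTUs that share a step value are independent and can be encoded concurrently,
        # e.g. one CTU row per CPU thread, with ME/DE inside each CTU offloaded elsewhere.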

    Study of Pruning Techniques to Predict Efficient Business Decisions for a Shopping Mall

    The shopping mall domain is a dynamic and unpredictable environment. Traditional techniques such as fundamental and technical analysis can provide investors with some tools for managing their shops and predicting their business growth. However, these techniques cannot discover all the possible relations that affect business growth, so a different approach is needed to provide a deeper kind of analysis. Data mining can be used extensively in shopping malls and can help to increase business growth; therefore, there is a need to find a suitable algorithm for this kind of environment. We therefore study several decision-tree pruning methods and, finally, validate and make use of the cost-based pruning method to obtain an objective evaluation of the tendency to over-prune or under-prune observed in each method.
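
    By way of illustration, the snippet below applies cost-complexity pruning as implemented in scikit-learn, one common cost-based pruning scheme, to a synthetic data set standing in for shopping-mall sales records; it is a sketch of the general procedure, not the study's actual data or evaluation protocol.

        # Cost-complexity pruning of a decision tree on synthetic data (illustrative).
        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 5))            # stand-ins for footfall, season, promotions, ...
        y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        # Candidate complexity penalties along the pruning path of the fully grown tree.
        path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

        best = max(
            (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
             for a in path.ccp_alphas),
            key=lambda tree: tree.score(X_te, y_te),
        )
        print("leaves:", best.get_n_leaves(), "test accuracy:", round(best.score(X_te, y_te), 3))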

    Architectures for Adaptive Low-Power Embedded Multimedia Systems

    This Ph.D. thesis describes novel hardware/software architectures for adaptive low-power embedded multimedia systems. Novel techniques for run-time adaptive energy management are proposed, such that hardware and software adapt together to react to unpredictable scenarios. A complete power-aware H.264 video encoder was developed. Comparison with the state of the art demonstrates significant energy savings while meeting the performance constraints and keeping video quality degradation unnoticeable.

    Social Network Data Management

    With the increasing usage of online social networks and the semantic web's graph-structured RDF framework, and the rising adoption of networks in various fields from biology to social science, there is a rapidly growing need for indexing, querying, and analyzing massive graph-structured data. Facebook has amassed over 500 million users, creating huge volumes of highly connected data. Governments have made RDF datasets containing billions of triples available to the public. In the life sciences, researchers have started to connect disparate data sets of research results into one giant network of valuable information. Clearly, networks are becoming increasingly popular and growing rapidly in size, requiring scalable solutions for network data management. This thesis focuses on the following aspects of network data management. We present a hierarchical index structure for external-memory storage of network data that aims to maximize data locality. We propose efficient algorithms to answer subgraph matching queries against network databases and discuss effective pruning strategies to improve performance. We show how adaptive cost models can speed up subgraph matching query answering by assigning budgets to index retrieval operations and adjusting the query plan during execution. We develop a cloud-oriented social network database, COSI, which handles massive network datasets too large for a single computer by partitioning the data across multiple machines and achieving high-performance query answering through asynchronous parallelization and cluster-aware heuristics. Tracking multiple standing queries against a social network database is much faster with our novel multi-view maintenance algorithm, which exploits common substructures between queries. To capture the uncertainty inherent in social network querying, we define probabilistic subgraph matching queries over deterministic graph data and propose algorithms to answer them efficiently. Finally, we introduce a general relational machine learning framework and rule-based language, Probabilistic Soft Logic, to learn from and probabilistically reason about social network data, and describe applications to information integration and information fusion.
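
    To make the pruning idea behind subgraph matching concrete, the following toy sketch (hand-written backtracking over tiny label-annotated graphs, not COSI or the thesis's index-based algorithms) enumerates matches while discarding candidates whose label, degree, or already-mapped neighbours cannot lead to a valid embedding.

        # Backtracking subgraph matching with simple label/degree/neighbour pruning.
        def subgraph_matches(query, data):
            """Yield mappings query-node -> data-node that preserve labels and edges.

            Both graphs are dicts: node -> (label, set of neighbour nodes)."""
            def candidates(q, assigned):
                q_label, q_nbrs = query[q]
                for d, (d_label, d_nbrs) in data.items():
                    if d in assigned.values() or d_label != q_label or len(d_nbrs) < len(q_nbrs):
                        continue  # prune: reused node, label mismatch, or too few neighbours
                    if all(assigned[n] in d_nbrs for n in q_nbrs if n in assigned):
                        yield d   # already-mapped query neighbours must map to data neighbours
            def extend(assigned):
                if len(assigned) == len(query):
                    yield dict(assigned)
                    return
                q = next(n for n in query if n not in assigned)
                for d in candidates(q, assigned):
                    assigned[q] = d
                    yield from extend(assigned)
                    del assigned[q]
            yield from extend({})

        query = {"a": ("person", {"b"}), "b": ("person", {"a"})}
        data = {"x": ("person", {"y", "z"}), "y": ("person", {"x"}), "z": ("page", {"x"})}
        print(list(subgraph_matches(query, data)))   # two symmetric embeddings of the query edge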

    Acquisition, compression and rendering of depth and texture for multi-view video

    Three-dimensional (3D) video and imaging technologies are an emerging trend in the development of digital video systems, as we presently witness the appearance of 3D displays, coding systems, and 3D camera setups. Three-dimensional multi-view video is typically obtained from a set of synchronized cameras that capture the same scene from different viewpoints. This technique notably enables applications such as free-viewpoint video or 3D-TV. Free-viewpoint video applications provide the feature to interactively select and render a virtual viewpoint of the scene. A 3D experience, such as in 3D-TV, is obtained if the data representation and display make it possible to distinguish the relief of the scene, i.e., the depth within the scene. With 3D-TV, the depth of the scene can be perceived using a multi-view display that simultaneously renders several views of the same scene. To render these multiple views on a remote display, efficient transmission, and thus compression, of the multi-view video is necessary. However, a major problem when dealing with multi-view video is the intrinsically large amount of data to be compressed, decompressed and rendered. We aim at an efficient and flexible multi-view video system and explore three different aspects. First, we develop an algorithm for acquiring a depth signal from a multi-view setup. Second, we present efficient 3D rendering algorithms for a multi-view signal. Third, we propose coding techniques for 3D multi-view signals, based on the use of an explicit depth signal. Accordingly, the thesis is divided into three parts. The first part (Chapter 3) addresses the problem of 3D multi-view video acquisition. Multi-view video acquisition refers to the task of estimating and recording a 3D geometric description of the scene. A 3D description of the scene can be represented by a so-called depth image, which can be estimated by triangulation of the corresponding pixels in the multiple views. Initially, we focus on the problem of depth estimation using two views, and present the basic geometric model that enables the triangulation of corresponding pixels across the views. Next, we review two calculation/optimization strategies for determining corresponding pixels: a local and a one-dimensional optimization strategy. Second, to generalize from the two-view case, we introduce a simple geometric model for estimating the depth using multiple views simultaneously. Based on this geometric model, we propose a new multi-view depth-estimation technique, employing a one-dimensional optimization strategy that (1) reduces the noise level in the estimated depth images and (2) enforces consistent depth images across the views. The second part (Chapter 4) details the problem of multi-view image rendering. Multi-view image rendering refers to the process of generating synthetic images using multiple views. Two different rendering techniques are initially explored: a 3D image warping and a mesh-based rendering technique. Each of these methods has its limitations and suffers from either high computational complexity or low image rendering quality. As a consequence, we present two image-based rendering algorithms that improve the balance between these issues. First, we derive an alternative formulation of the relief texture algorithm, which was extended to the geometry of multiple views.
The proposed technique features two advantages: it avoids rendering artifacts ("holes") in the synthetic image and it is suitable for execution on a standard Graphics Processing Unit (GPU). Second, we propose an inverse mapping rendering technique that allows a simple and accurate re-sampling of synthetic pixels. Experimental comparisons with 3D image warping show an improvement of rendering quality of 3.8 dB for the relief texture mapping and 3.0 dB for the inverse mapping rendering technique. The third part concentrates on the compression problem of multi-view texture and depth video (Chapters 5–7). In Chapter 5, we extend the standard H.264/MPEG-4 AVC video compression algorithm for handling the compression of multi-view video. As opposed to the Multi-view Video Coding (MVC) standard that encodes only the multi-view texture data, the proposed encoder performs the compression of both the texture and the depth multi-view sequences. The proposed extension is based on exploiting the correlation between the multiple camera views. To this end, two different approaches for predictive coding of views have been investigated: a block-based disparity-compensated prediction technique and a View Synthesis Prediction (VSP) scheme. Whereas VSP relies on an accurate depth image, the block-based disparity-compensated prediction scheme can be performed without any geometry information. Our encoder adaptively selects the most appropriate prediction scheme using a rate-distortion criterion for an optimal prediction-mode selection. We present experimental results for several texture and depth multi-view sequences, yielding a quality improvement of up to 0.6 dB for the texture and 3.2 dB for the depth, when compared to solely performing H.264/MPEG-4 AVC disparity-compensated prediction. Additionally, we discuss the trade-off between random access to a user-selected view and the coding efficiency. Experimental results illustrating and quantifying this trade-off are provided. In Chapter 6, we focus on the compression of a depth signal. We present a novel depth image coding algorithm which concentrates on the special characteristics of depth images: smooth regions delineated by sharp edges. The algorithm models these smooth regions using parameterized piecewise-linear functions and the sharp edges by straight lines, so that it is more efficient than a conventional transform-based encoder. To optimize the quality of the coding system for a given bit rate, a special global rate-distortion optimization balances the rate against the accuracy of the signal representation. For typical bit rates, i.e., between 0.01 and 0.25 bit/pixel, experiments have revealed that the coder outperforms a standard JPEG-2000 encoder by 0.6-3.0 dB. Preliminary results were published in the Proceedings of the 26th Symposium on Information Theory in the Benelux. In Chapter 7, we propose a novel joint depth-texture bit-allocation algorithm for the joint compression of texture and depth images. The described algorithm combines the depth and texture Rate-Distortion (R-D) curves to obtain a single R-D surface that allows the optimization of the joint bit-allocation in relation to the obtained rendering quality. Experimental results show an estimated gain of 1 dB compared to a compression performed without joint bit-allocation optimization. Besides this, our joint R-D model can be readily integrated into a multi-view H.264/MPEG-4 AVC coder because it yields the optimal compression setting with a limited computation effort.
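
    As a simplified illustration of the joint bit-allocation idea in Chapter 7, the sketch below picks the texture/depth quantization pair that maximizes a rendering-quality proxy under a total rate budget; the rate and quality numbers are made-up placeholders, whereas in the thesis each point of the R-D surface comes from actual encoding and rendering experiments.

        # Joint texture/depth bit allocation over a toy R-D surface (illustrative numbers).
        from itertools import product

        qps = [22, 27, 32, 37]
        rate_tex = {22: 4.0, 27: 2.5, 32: 1.5, 37: 0.9}   # Mbit/s, hypothetical
        rate_dep = {22: 1.2, 27: 0.8, 32: 0.5, 37: 0.3}   # Mbit/s, hypothetical

        def rendered_psnr(qp_tex, qp_dep):
            # Hypothetical quality model of the rendered virtual view.
            return 45.0 - 0.25 * (qp_tex - 22) - 0.10 * (qp_dep - 22)

        def allocate(budget_mbps):
            feasible = [(t, d) for t, d in product(qps, qps)
                        if rate_tex[t] + rate_dep[d] <= budget_mbps]
            return max(feasible, key=lambda td: rendered_psnr(*td)) if feasible else None

        print(allocate(3.0))   # best (texture QP, depth QP) pair on the toy R-D surface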

    Fusing Multimedia Data Into Dynamic Virtual Environments

    In spite of the dramatic growth of virtual and augmented reality (VR and AR) technology, content creation for immersive and dynamic virtual environments remains a significant challenge. In this dissertation, we present our research in fusing multimedia data, including text, photos, panoramas, and multi-view videos, to create rich and compelling virtual environments. First, we present Social Street View, which renders geo-tagged social media in its natural geo-spatial context provided by 360° panoramas. Our system takes into account visual saliency and uses maximal Poisson-disc placement with spatiotemporal filters to render social multimedia in an immersive setting. We also present a novel GPU-driven pipeline for saliency computation in 360° panoramas using spherical harmonics (SH). Our spherical residual model can be applied to virtual cinematography in 360° videos. We further present Geollery, a mixed-reality platform that renders an interactive mirrored world in real time with three-dimensional (3D) buildings, user-generated content, and geo-tagged social media. Our user study has identified several use cases for these systems, including immersive social storytelling, experiencing local culture, and crowd-sourced tourism. We next present Video Fields, a web-based interactive system to create, calibrate, and render dynamic videos overlaid on 3D scenes. Our system renders dynamic entities from multiple videos, using early and deferred texture sampling. Video Fields can be used for immersive surveillance in virtual environments. Furthermore, we present the VRSurus and ARCrypt projects to explore applications of gesture recognition, haptic feedback, and visual cryptography for virtual and augmented reality. Finally, we present our work on Montage4D, a real-time system for seamlessly fusing multi-view video textures with dynamic meshes. We use geodesics on meshes with view-dependent rendering to mitigate spatial occlusion seams while maintaining temporal consistency. Our experiments show significant enhancement in rendering quality, especially for salient regions such as faces. We believe that Social Street View, Geollery, Video Fields, and Montage4D will greatly facilitate applications such as virtual tourism, immersive telepresence, and remote education.
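
    For readers unfamiliar with Poisson-disc placement, the toy dart-throwing sketch below enforces a minimum spacing between placed items in 2D panorama coordinates; it is only a rough stand-in (it is not maximal and ignores the saliency and spatiotemporal filters described above), with arbitrary dimensions and radius.

        # Toy dart-throwing placement with a minimum-distance (Poisson-disc style) constraint.
        import random

        def disc_place(width, height, radius, attempts=5000, seed=0):
            rng = random.Random(seed)
            points = []
            for _ in range(attempts):
                p = (rng.uniform(0, width), rng.uniform(0, height))
                # Accept the point only if it keeps at least `radius` from all accepted points.
                if all((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 >= radius ** 2 for q in points):
                    points.append(p)
            return points

        widgets = disc_place(360.0, 180.0, radius=20.0)
        print(len(widgets), "widget anchors placed without overlap")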

    Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

    Simultaneous Localization and Mapping (SLAM) consists of the concurrent construction of a model of the environment (the map) and the estimation of the state of the robot moving within it. The SLAM community has made astonishing progress over the last 30 years, enabling large-scale real-world applications and witnessing a steady transition of this technology to industry. We survey the current state of SLAM. We start by presenting what is now the de-facto standard formulation for SLAM. We then review related work, covering a broad set of topics including robustness and scalability in long-term mapping, metric and semantic representations for mapping, theoretical performance guarantees, active SLAM and exploration, and other new frontiers. This paper simultaneously serves as a position paper and a tutorial for users of SLAM. By looking at the published research with a critical eye, we delineate open challenges and new research issues that still deserve careful scientific investigation. The paper also contains the authors' take on two questions that often animate discussions during robotics conferences: Do robots need SLAM? And is SLAM solved?
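
    For reference, the de-facto standard formulation mentioned above is maximum a posteriori estimation over a factor graph; under Gaussian noise assumptions it reduces to a nonlinear least-squares problem, summarized here in our own notation (not quoted from the paper) as

        X^{*} = \arg\max_{X} \, p(X \mid Z)
              = \arg\min_{X} \, \sum_{k} \lVert h_{k}(X_{k}) - z_{k} \rVert^{2}_{\Sigma_{k}}

    where X stacks the robot poses and map variables, each measurement z_k involves a subset X_k of the variables through the measurement model h_k, and the squared Mahalanobis norm is weighted by the measurement covariance Sigma_k.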

    Design Efficient Deep Neural Networks with System Optimization

    The pursuit of enhanced accuracy in Deep Neural Networks (DNNs) has led to increasingly complex model structures, notably in Convolutional Neural Networks (CNNs) and Transformers. While these advancements have propelled the capabilities of intelligent applications, they also introduce significant challenges, primarily an increase in inference latency. This issue is particularly critical in time-sensitive applications, such as self-driving vehicles, where delays could have severe consequences. In addressing these challenges, this dissertation focuses on optimizing DNNs for efficiency with the goal of maintaining or minimally impacting their accuracy. The study is structured into five chapters, each targeting a specific aspect of DNN optimization in the context of CNNs and Transformers: (1) Efficient Model Design for CNNs, (2) System Optimization for CNNs, (3) Efficient Model Design for Transformers, (4) System Optimization for Transformers, and (5) Model Compression Methods. Throughout these studies, the emphasis is placed not only on the technical advancements in DNN efficiency but also on the broader implications of these improvements. The research highlights how optimizing DNNs can lead to significant benefits in real-world applications, particularly those requiring real-time processing and operating under resource constraints. By advancing the field of DNN efficiency, this work contributes to the development of more sustainable, accessible, and powerful AI technologies, reinforcing the role of DNNs in the future of intelligent systems.
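
    As a concrete illustration of one model compression method, the sketch below applies magnitude (L1) weight pruning with PyTorch's pruning utilities to a throwaway two-layer convolutional model; the architecture and the 50% sparsity level are arbitrary placeholders, not the dissertation's networks or proposed techniques.

        # Magnitude-based weight pruning with PyTorch (illustrative model and sparsity).
        import torch
        import torch.nn as nn
        import torch.nn.utils.prune as prune

        model = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 8, 3, padding=1),
        )

        for module in model.modules():
            if isinstance(module, nn.Conv2d):
                prune.l1_unstructured(module, name="weight", amount=0.5)  # zero the 50% smallest weights
                prune.remove(module, "weight")                            # fold the mask into the tensor

        total = sum(p.numel() for p in model.parameters())
        zeros = sum((p == 0).sum().item() for p in model.parameters())
        print(f"overall sparsity: {zeros / total:.2%}")
        # Note: real latency gains additionally require sparse-aware kernels or structured pruning.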
