
    Steered mixture-of-experts for light field images and video: representation and coding

    Research in light field (LF) processing has increased substantially over the last decade, largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids, which are then further decorrelated through hybrid DPCM/transform techniques. However, such 2-D regular grids are poorly suited to high-dimensional data such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays arriving at a certain region from any angle. The global model thus consists of a set of kernels that define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application to 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art at low-to-mid bitrates with respect to subjective visual quality of 4-D LF images. For 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4 in bitrate at the same quality. At least equally important, our method inherently offers functionality desirable for LF rendering that other state-of-the-art techniques lack: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
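    To make the model concrete, below is a minimal Python sketch of the reconstruction side of such a kernel-based representation for the 2-D image case: kernels are Gaussians with full (steered) covariances, experts are affine functions of position, and the reconstructed intensity is the gate-weighted sum of expert predictions. Parameter fitting (typically via expectation-maximization) and the bitstream conversion are omitted; the affine-expert choice and all names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def smoe_reconstruct(coords, means, covs, experts):
    """Evaluate a simplified SMoE model at the given pixel positions.

    coords : (N, 2) pixel positions to reconstruct
    means  : (K, 2) kernel centers
    covs   : (K, 2, 2) steering covariance matrices
    experts: (K, 3) affine experts m_k(x, y) = w0 + w1*x + w2*y
    """
    N, K = coords.shape[0], means.shape[0]
    resp = np.empty((N, K))
    for k in range(K):
        d = coords - means[k]                       # offsets to kernel center
        P = np.linalg.inv(covs[k])                  # precision matrix
        resp[:, k] = np.exp(-0.5 * np.einsum('ni,ij,nj->n', d, P, d))
    gates = resp / resp.sum(axis=1, keepdims=True)  # soft gating weights
    preds = experts[:, 0] + coords @ experts[:, 1:].T  # (N, K) expert outputs
    return (gates * preds).sum(axis=1)              # gate-weighted blend
```

    For 4-D LF images or 5-D LF video the same structure applies with 4- or 5-dimensional coordinates and covariances, which is what makes the representation continuous across viewpoints and time.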

    HDTV transmission format conversion and migration path

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (leaves 77-79). By Lon E. Sunshine, Ph.D.

    Improved depth coding for HEVC focusing on depth edge approximation

    The latest High Efficiency Video Coding (HEVC) standard has greatly improved coding efficiency compared to its predecessor H.264, an important share of which stems from the adoption of hierarchical block partitioning structures and an extended set of modes. The existing inter-modes are structured mainly to handle rectangular and square-aligned motion patterns; they are less suitable for partitioning depth objects that exhibit partial foreground motion with irregular edges against a background. In such cases, the HEVC reference test model (HM) normally explores finer-level block partitioning, which requires more bits and encoding time to compensate for large residuals. Since motion detection is the underlying criterion for mode selection, this work uses the energy concentration ratio feature of phase correlation to capture the different types of motion in depth objects. For better motion modeling focused on depth edges, the proposed technique also uses an extra pattern mode comprising a group of templates with various rectangular and non-rectangular object shapes and edges. As the pattern mode, once selected, saves bits by encoding only the foreground areas and outperforms all other inter-modes in a block, the proposed technique improves rate-distortion performance. It also reduces encoding time by skipping further branching once the pattern mode is chosen and by selecting a subset of modes using innovative pre-processing criteria. Experimentally, it saves 29% of encoding time on average and improves the Bjontegaard Delta peak signal-to-noise ratio by 0.10 dB compared to HM.
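    As an illustration of the motion-classification idea, the following Python sketch computes an energy concentration ratio from the phase-correlation surface of two co-located blocks: a single dominant peak suggests one coherent motion, while dispersed energy hints at mixed foreground/background motion, as occurs at depth edges. The peak-window size and the exact formulation are assumptions for illustration, not necessarily the feature definition used in the paper.

```python
import numpy as np

def phase_correlation_ecr(block_ref, block_cur, peak_radius=1):
    """Energy concentration ratio (ECR) of the phase-correlation surface."""
    F1 = np.fft.fft2(block_ref)
    F2 = np.fft.fft2(block_cur)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12            # normalize: keep phase only
    surface = np.abs(np.fft.ifft2(cross))     # phase-correlation surface
    py, px = np.unravel_index(np.argmax(surface), surface.shape)
    # energy in a small window around the peak, with circular wrap-around
    ys = np.arange(py - peak_radius, py + peak_radius + 1) % surface.shape[0]
    xs = np.arange(px - peak_radius, px + peak_radius + 1) % surface.shape[1]
    peak_energy = (surface[np.ix_(ys, xs)] ** 2).sum()
    return peak_energy / (surface ** 2).sum()

# a low ECR for a block would steer the encoder toward the pattern mode
```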

    Complexity adaptation in video encoders for power limited platforms

    With the emergence of video services on power-limited platforms, it is necessary to consider both performance-centric and constraint-centric signal processing techniques. Traditionally, video applications are constrained by bandwidth, computational resources, or both. The recent H.264/AVC video compression standard offers significantly improved efficiency and flexibility compared to previous standards, which reduces the emphasis on bandwidth. However, its high computational complexity is a problem for codecs running on power-limited platforms. Therefore, a technique that integrates both complexity and bandwidth issues in a single framework should be considered. This thesis investigates complexity adaptation of a video coder, focusing on managing computational complexity, and provides significant complexity savings when applied to recent standards. It consists of three sub-functions specially designed to reduce complexity, plus a framework for using them: Variable Block Size (VBS) partitioning, fast motion estimation, skip-macroblock detection, and the complexity adaptation framework itself. Firstly, a VBS partitioning algorithm based on the Walsh-Hadamard Transform (WHT) is presented. The key idea is to segment regions of an image as edges or flat regions, based on the fact that prediction errors are mainly affected by edges. Secondly, a fast motion estimation algorithm called Fast Walsh Boundary Search (FWBS), operating on the VBS-partitioned images, is presented; its results outperform other commonly used fast algorithms. Thirdly, a skip-macroblock detection algorithm is proposed for use prior to motion estimation; it estimates the Discrete Cosine Transform (DCT) coefficients after quantisation. A new orthogonal transform called the S-transform is presented for predicting integer DCT coefficients from WHT coefficients. Complexity savings are achieved by deciding which macroblocks need to be processed and which can be skipped without processing. Simulation results show that the proposed algorithm achieves significant complexity savings with a negligible loss in rate-distortion performance. Finally, a complexity adaptation framework that combines all three techniques is proposed for maximizing the perceptual quality of coded video on a complexity-constrained platform.
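    The WHT-based edge/flat segmentation idea can be sketched as follows: since prediction errors are mainly driven by edges, a block whose WHT spectrum carries substantial energy outside the DC coefficient is treated as an edge block and becomes a candidate for finer partitioning. The block size, normalisation, and threshold here are assumptions; the thesis's actual criteria may differ.

```python
import numpy as np
from scipy.linalg import hadamard

def classify_block(block, ac_ratio_threshold=0.1):
    """Label an 8x8 block 'edge' or 'flat' from its 2-D WHT spectrum."""
    H = hadamard(8) / np.sqrt(8)        # orthonormal Hadamard matrix
    spec = H @ block @ H.T              # 2-D Walsh-Hadamard transform
    total = (spec ** 2).sum() + 1e-12
    ac = total - spec[0, 0] ** 2        # energy outside the DC coefficient
    return 'edge' if ac / total > ac_ratio_threshold else 'flat'
```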

    Novel block-based motion estimation and segmentation for video coding

    EThOS - Electronic Theses Online Service. GB: United Kingdom.

    Robust Subspace Estimation via Low-Rank and Sparse Decomposition and Applications in Computer Vision

    Recent advances in robust subspace estimation have made dimensionality reduction and noise and outlier suppression an active area of research, alongside continuous improvements in computer vision applications. Because image and video signals require a high-dimensional representation, their storage, processing, transmission, and analysis is often a difficult task. It is therefore desirable to obtain a low-dimensional representation for such signals and at the same time correct for corruptions, errors, and outliers, so that the signals can be readily used for later processing. Major recent advances in low-rank modelling in this context were initiated by the work of Candès et al. [17], where the authors provided a solution for the long-standing problem of decomposing a matrix into low-rank and sparse components in a Robust Principal Component Analysis (RPCA) framework. However, for computer vision applications RPCA is often too complex and/or may not yield desirable results. The low-rank component obtained by RPCA usually has an unnecessarily high rank, while certain tasks require lower-dimensional representations. RPCA can robustly estimate noise and outliers and separate them from the low-rank component via a sparse part, but it offers no insight into the structure of the sparse solution, nor a way to further decompose the sparse part into random noise and a structured sparse component, which would be advantageous in many computer vision tasks. Moreover, as video signals are usually captured by a moving camera, obtaining a low-rank component by RPCA becomes impossible. In this thesis, novel approximated RPCA algorithms are presented, targeting different shortcomings of the RPCA. The RPCA solvers were analysed to identify their most time-consuming steps, which are replaced with simpler yet tractable alternatives. The proposed method obtains the exact desired rank for the low-rank component while estimating a global transformation that describes camera-induced motion. Furthermore, it decomposes the sparse part into a foreground sparse component and a random-noise part that contains no useful information for computer vision processing. The foreground sparse component is obtained via several novel structured sparsity-inducing norms that better encapsulate the pixel structure needed in visual signals. Moreover, algorithms for reducing the complexity of low-rank estimation are proposed that achieve significant complexity reduction without sacrificing the visual representation of video and image information. The proposed algorithms are applied to several fundamental computer vision tasks, namely high-efficiency video coding, batch image alignment, inpainting and recovery, video stabilisation, background modelling and foreground segmentation, robust subspace clustering and motion estimation, face recognition, and ultra-high-definition image and video super-resolution. The algorithms proposed in this thesis, including batch image alignment and recovery, background modelling and foreground segmentation, robust subspace clustering and motion segmentation, and ultra-high-definition image and video super-resolution, achieve results that are either state-of-the-art or comparable to existing methods.
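    For reference, the classical RPCA decomposition that the thesis builds on (principal component pursuit solved by inexact augmented Lagrange multipliers) can be sketched compactly in Python as below. The parameter defaults follow common choices in the literature, not necessarily those of the thesis; the approximated variants proposed above target, among other things, the costly full SVD in each iteration.

```python
import numpy as np

def rpca_ialm(M, lam=None, tol=1e-7, max_iter=500):
    """Decompose M into low-rank L and sparse S:
    min ||L||_* + lam * ||S||_1  subject to  M = L + S."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    norm_M = np.linalg.norm(M, 'fro')
    Y = M / max(np.linalg.norm(M, 2), np.abs(M).max() / lam)  # dual variable
    mu = 1.25 / np.linalg.norm(M, 2)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # singular-value thresholding updates the low-rank component
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1 / mu, 0)) @ Vt
        # element-wise soft-thresholding updates the sparse component
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0)
        Y += mu * (M - L - S)          # dual ascent on the constraint
        mu *= 1.5
        if np.linalg.norm(M - L - S, 'fro') < tol * norm_M:
            break
    return L, S
```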

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before such applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery-life perspective. This thesis tackles these issues by first constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, designed from the outset to be offloaded from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN) whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors, and carry-save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking and object-based shape encoding, a novel energy-efficient binary motion estimation architecture is proposed; it reduces energy by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.
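    A toy Python sketch of the neuroevolution idea follows, simplified to evolving only the weights of a small fixed topology (the thesis evolves the topology as well); the population size, mutation scale, and task are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(w, x):
    """Tiny fixed 2-8-1 MLP decoded from a flat genome of 33 weights."""
    W1, b1 = w[:16].reshape(2, 8), w[16:24]
    W2, b2 = w[24:32].reshape(8, 1), w[32]
    return np.tanh(np.tanh(x @ W1 + b1) @ W2 + b2).ravel()

def evolve(fitness, pop_size=40, genome_len=33, gens=100):
    """Elitist GA: keep the best half, refill with mutated copies."""
    pop = rng.normal(size=(pop_size, genome_len))
    for _ in range(gens):
        scores = np.array([fitness(ind) for ind in pop])
        elite = pop[np.argsort(scores)[-pop_size // 2:]]
        picks = rng.integers(len(elite), size=pop_size - len(elite))
        children = elite[picks] + 0.1 * rng.normal(size=(len(picks), genome_len))
        pop = np.vstack([elite, children])
    return max(pop, key=fitness)

# toy stand-in for face/non-face: separate two 2-D point clusters
X = np.vstack([rng.normal(-1, 0.3, (50, 2)), rng.normal(1, 0.3, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])
best = evolve(lambda w: -np.mean((mlp_forward(w, X) - y) ** 2))
```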

    Adaptive format conversion information as enhancement data for scalable video coding

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (p. 143-145). Scalable coding techniques can be used to efficiently provide multicast video service; they involve transmitting a single independently coded base layer and one or more dependently coded enhancement layers. Clients can decode the base layer bitstream and none, some, or all of the enhancement layer bitstreams to obtain video quality commensurate with their available resources. In many scalable coding algorithms, residual coding information is the only type of data coded in the enhancement layers. However, since the transmitter has access to the original sequence, it can intelligently select different format conversion methods for different regions. This adaptive format conversion information can then be transmitted as enhancement data to assist processing at the decoder. The use of adaptive format conversion has not been studied in detail, and this thesis examines when and how it can be used for scalable video compression. A new scalable codec is developed that can utilize adaptive format conversion information and/or residual coding information as enhancement data. This codec was used in various simulations to investigate different aspects of adaptive format conversion, such as the effect of the base layer, a comparison of adaptive format conversion and residual coding, and the use of both together. The experimental results show that adaptive format conversion can provide video scalability at low enhancement bitrates not possible with residual coding, and can also assist residual coding at higher enhancement-layer bitrates. The thesis also discusses the application of adaptive format conversion to the migration path for digital television: it is well suited to the unique problems of the migration path and can provide initial video scalability as well as assist a future migration path. By Wade K. Wan, Ph.D.
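    A hypothetical sketch of the core mechanism: for each region, the encoder (which has the original frame) tries several format conversion methods on the base layer and signals only the index of the best one as enhancement data; the decoder reruns the selected method. The two-filter set, block granularity, and MSE criterion are assumptions for illustration, not the thesis's actual codec.

```python
import numpy as np

def upsample_repeat(block):
    """Nearest-neighbour 2x upconversion."""
    return np.repeat(np.repeat(block, 2, axis=0), 2, axis=1).astype(float)

def upsample_smooth(block):
    """Crude separable averaging as a stand-in for a bilinear filter."""
    up = upsample_repeat(block)
    up[1:-1] = 0.5 * (up[:-2] + up[2:])
    up[:, 1:-1] = 0.5 * (up[:, :-2] + up[:, 2:])
    return up

METHODS = [upsample_repeat, upsample_smooth]

def choose_conversion(base_block, original_block):
    """Return (method_index, mse); the index is the only side information
    the enhancement layer needs to carry for this block."""
    errors = [np.mean((m(base_block) - original_block) ** 2) for m in METHODS]
    best = int(np.argmin(errors))
    return best, errors[best]
```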

    Architecture concepts for processor-based MPEG video decoders with a focus on mobile applications

    Mobile terminals are currently based on System-on-a-Chip designs whose main functionality is realised in software running on embedded processors. Owing to ever shorter innovation cycles, the software-based implementation of applications for embedded systems is therefore of growing importance. In the domain of mobile-network applications, multimedia applications are increasingly found that until recently were reserved for powerful systems such as desktop PCs or dedicated implementations in the form of special ASICs. These include standards-based applications that support the real-time transmission of moving-picture data, such as video telephony or broadcast services. The high computational demands of real-time multimedia applications, however, stand in contrast to the relatively weak processors of embedded systems. For video coding in mobile applications, the MPEG-4 standard has become largely established; it is described and analysed here in detail as representative of the other MPEG video standards. The core of this thesis is therefore the design of a dedicated instruction-set and coprocessor extension for MPEG-4-based video decoder algorithms, achieving a performance gain of about 50% over a pure software implementation. The extensions are generic in their definition and therefore not tailored to a specific processor type. Here, a RISC architecture typical of mobile terminals is used to demonstrate the performance and to show deployment in embedded systems. The individual concepts are derived from an algorithm analysis; the generic extension is described first, followed by its integration into the processor core using the hardware description language VHDL. To assess real-time behaviour, a dedicated profiler is designed as part of this work, which among other things also permits the analysis and optimisation of memory-access behaviour. Measurements performed with the profiler show the computation-time gains of the individual algorithm parts when the implemented optimisations are used. The performance of the presented architecture is also compared with common processor architectures such as superscalar and VLIW processors, showing that the developed concept yields results similar to those of the comparatively more complex processors. Besides performance, the silicon area required for a CMOS gate-array-based implementation is also determined and reported for each individual extension.