20 research outputs found

    Scalable light field representation and coding

    Get PDF
    This Thesis aims to advance the state-of-the-art in light field representation and coding. In this context, proposals to improve functionalities like light field random access and scalability are also presented. As the light field representation constrains the coding approach to be used, several light field coding techniques are proposed and studied to exploit the inherent characteristics of the most popular types of light field representation, which are normally based on micro-images or sub-aperture images. To encode micro-images, two solutions are proposed that exploit the redundancy between neighboring micro-images using a high-order prediction model, where the model parameters are either explicitly transmitted or inferred at the decoder. In both cases, the proposed solutions outperform low-order prediction solutions. To encode sub-aperture images, an HEVC-based solution that exploits their inherent intra and inter redundancies is proposed. In this case, the light field image is encoded as a pseudo video sequence, where the scanning order is signaled, allowing the encoder and decoder to optimize the reference picture lists and improve coding efficiency. A novel hybrid light field representation coding approach is also proposed, exploiting the combined use of both the micro-image and sub-aperture-image representation types instead of using each representation individually. To aid the fast deployment of light field technology, this Thesis also proposes scalable coding and representation approaches that enable adequate compatibility with legacy displays (e.g., 2D, stereoscopic, or multiview) and with future light field displays, while maintaining high coding efficiency. Additionally, viewpoint random access, which improves light field navigation and reduces decoding delay, is also enabled with a flexible trade-off between coding efficiency and viewpoint random access.
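    The pseudo video sequence idea above is easy to illustrate. Below is a minimal sketch in Python, assuming a grid of sub-aperture views and a hypothetical serpentine_scan helper; the thesis signals the chosen scanning order in the bitstream so encoder and decoder can build matching reference picture lists, which this sketch does not model.

        # A boustrophedon (serpentine) scan is one plausible ordering of a
        # grid of sub-aperture views into a pseudo video sequence.
        def serpentine_scan(num_rows, num_cols):
            """Yield (row, col) view coordinates, reversing direction each row."""
            for r in range(num_rows):
                cols = range(num_cols) if r % 2 == 0 else reversed(range(num_cols))
                for c in cols:
                    yield (r, c)

        # Example: a 4x4 grid of sub-aperture views becomes a 16-frame sequence.
        order = list(serpentine_scan(4, 4))
        print(order[:6])  # [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (1, 2)]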

    Dense light field coding: a survey

    Get PDF
    Light Field (LF) imaging is a promising solution for providing more immersive and closer to reality multimedia experiences to end-users with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to the recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that soon many LF transmission systems will be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems. Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.

    Data compression and computational efficiency

    Get PDF
    In this thesis we seek to make advances towards the goal of effective learned compression. This entails using machine learning models as the core constituent of compression algorithms, rather than hand-crafted components. To that end, we first describe a new method for lossless compression. This method allows a class of existing machine learning models – latent variable models – to be turned into lossless compressors. Thus, many future advancements in the field of latent variable modelling can be leveraged in the field of lossless compression. We demonstrate a proof-of-concept of this method on image compression. Further, we show that it can scale to very large models and to image compression problems which closely resemble the real-world use cases that we seek to tackle. The use of the above compression method relies on executing a latent variable model. Since these models can be large in size and slow to run, we consider how to mitigate these computational costs. We show that by implementing much of the models using binary-precision parameters, rather than floating-point precision, we can still achieve reasonable modelling performance while requiring only a fraction of the storage space and execution time. Lastly, we consider how learned compression can be applied to 3D scene data – a medium increasing in prevalence, and one which can require a significant amount of space. A recently developed class of machine learning models – scene representation functions – has demonstrated good results on modelling such 3D scene data. We show that by compressing these representation functions themselves we can achieve good scene reconstruction with a very small model size.
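    The binary-precision idea lends itself to a short sketch. The Python/NumPy snippet below shows one common binarization scheme, a per-tensor scale times the sign pattern in the style of BinaryConnect/XNOR-Net; this is an assumption made for illustration and may differ from the exact scheme used in the thesis.

        import numpy as np

        # Replace each weight tensor by its sign pattern plus one scaling
        # factor, cutting storage from 32 bits to roughly 1 bit per parameter.
        def binarize(w):
            alpha = np.abs(w).mean()      # per-tensor scale
            return alpha * np.sign(w)     # weights in {-alpha, +alpha}

        rng = np.random.default_rng(0)
        w = rng.normal(size=(256, 256)).astype(np.float32)
        rel_err = np.linalg.norm(w - binarize(w)) / np.linalg.norm(w)
        print(f"relative approximation error: {rel_err:.3f}")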

    Discontinuity-Aware Base-Mesh Modeling of Depth for Scalable Multiview Image Synthesis and Compression

    Full text link
    This thesis is concerned with the challenge of deriving disparity from sparsely communicated depth for performing disparity-compensated view synthesis for the compression and rendering of multiview images. The modeling of depth is essential for deducing disparity at view locations where depth is not available, and is also critical for visibility reasoning and occlusion handling. This thesis first explores disparity derivation methods and disparity-compensated view synthesis approaches. Investigations reveal the merits of adopting a piece-wise continuous mesh description of depth for deriving disparity at target view locations, enabling disparity-compensated backward warping of texture. Visibility information can be reasoned about via the correspondence relationship between views that a mesh model provides, while the connectivity of a mesh model assists in resolving depth occlusion. The recent JPEG 2000 Part-17 extension defines tools for scalable coding of discontinuous media using a breakpoint-dependent DWT, where breakpoints describe discontinuity boundary geometry. This thesis proposes a method to efficiently reconstruct depth coded using JPEG 2000 Part-17 as a piece-wise continuous mesh, where discontinuities are driven by the encoded breakpoints. Results show that the proposed mesh can accurately represent decoded depth while its complexity scales along with decoded depth quality. The piece-wise continuous mesh model anchored at a single viewpoint, or base-view, can be augmented to form a multi-layered structure where the underlying layers carry depth information for regions that are occluded at the base-view. Such a consolidated mesh representation is termed a base-mesh model; it can be projected to many viewpoints to deduce complete disparity fields between any pair of views that are inherently consistent. Experimental results demonstrate the superior performance of the base-mesh model in multiview synthesis and compression compared to other state-of-the-art methods, including the JPEG Pleno light field codec. The proposed base-mesh model departs greatly from the conventional pixel-wise or block-wise depth models, and their forward depth mapping for deriving disparity, that are ingrained in existing multiview processing systems. When performing disparity-compensated view synthesis, there can be regions for which reference texture is unavailable, and inpainting is required. A new depth-guided texture inpainting algorithm is proposed to restore occluded texture in regions where depth information is either available or can be inferred using the base-mesh model.
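    A minimal sketch of the disparity-compensated backward warping described above, assuming a rectified horizontal camera shift so that disparity = focal * baseline / depth; the backward_warp helper and its nearest-neighbour sampling are illustrative simplifications, and the thesis' mesh-based visibility reasoning and occlusion handling are omitted.

        import numpy as np

        # Each target pixel samples the reference texture at (x - d, y),
        # where d is the disparity derived from depth at the target view.
        def backward_warp(ref_texture, target_depth, focal, baseline):
            h, w = target_depth.shape
            disparity = focal * baseline / target_depth       # in pixels
            ys, xs = np.mgrid[0:h, 0:w]
            src_x = np.clip(np.round(xs - disparity).astype(int), 0, w - 1)
            return ref_texture[ys, src_x]                     # nearest-neighbour

        depth = np.full((4, 8), 10.0); depth[:, 4:] = 5.0     # toy depth step
        tex = np.tile(np.arange(8.0), (4, 1))
        print(backward_warp(tex, depth, focal=10.0, baseline=1.0))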

    Visual and Geometric Data Compression for Immersive Technologies

    Get PDF
    The contributions of this thesis are new compression algorithms for light field images and point cloud geometry. Light field imaging has attracted wide attention in the recent decade, partly due to the emergence of relatively low-cost handheld light field cameras designed for commercial purposes, whereas point clouds are used more and more frequently in immersive technologies, replacing other forms of 3D representation. We obtain successful coding performance by combining conventional image processing methods, entropy coding, learning-based disparity estimation, and optimization of neural networks for context probability modeling. On the light field coding side, we develop a lossless light field coding method which uses learning-based disparity estimation to predict any view in a light field from a set of reference views. On the point cloud geometry compression side, we develop four different algorithms. The first two of these follow the so-called bounding-volumes approach, which initially represents part of the point cloud in two depth maps; the remaining points of the cloud are contained in a bounding volume which can be derived using only the two depth maps that are losslessly transmitted. One of the two algorithms is a lossy coder that reconstructs some of the remaining points in several steps involving conventional image processing and image coding techniques. The other is a lossless coder which applies a novel context arithmetic coding approach involving gradual expansion of the reconstructed point cloud into neighboring voxels. The last two of the proposed point cloud compression algorithms use neural networks for context probability modeling when coding the octree representation of point clouds with arithmetic coding. One of these is a learning-based intra-frame coder which requires an initial training stage on a set of training point clouds. The last algorithm is an inter-frame (sequence) encoder which incorporates the neural network training into the encoding stage; thus, for each sequence of point clouds, a specific neural network model is optimized and transmitted as a header in the bitstream.
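    The octree representation mentioned above can be sketched briefly. In the toy Python snippet below (illustrative only, not the thesis' codec), points in a cubic volume are recursively split into eight children and each occupied node emits one occupancy byte; a real coder would pass these symbols to a context arithmetic coder, possibly with learned context probabilities.

        import numpy as np

        def octree_bytes(points, origin, size, depth):
            """Serialize points into one occupancy byte per occupied node."""
            if depth == 0 or len(points) == 0:
                return []
            half = size / 2.0
            mask = (points >= origin + half).astype(int)   # per-axis child bit
            child_idx = mask[:, 0] * 4 + mask[:, 1] * 2 + mask[:, 2]
            occupancy = 0
            for c in range(8):
                if np.any(child_idx == c):
                    occupancy |= 1 << c
            stream = [occupancy]
            for c in range(8):                             # recurse per child
                sel = points[child_idx == c]
                if len(sel):
                    off = origin + half * np.array([c >> 2 & 1, c >> 1 & 1, c & 1])
                    stream += octree_bytes(sel, off, half, depth - 1)
            return stream

        pts = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9], [0.9, 0.1, 0.1]])
        print(octree_bytes(pts, origin=np.zeros(3), size=1.0, depth=3))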

    12th Young Scientists Meeting 2019, Kleinmachnow, Germany, November 6-8

    Get PDF

    Multi-modal Video Content Understanding

    Get PDF
    Video is an important format of information. Humans use videos for a variety of purposes such as entertainment, education, communication, information sharing, and capturing memories. To date, humankind has accumulated a colossal amount of video material online which is freely available. Manual processing at this scale is simply impossible. To this end, many research efforts have been dedicated to the automatic processing of video content. At the same time, human perception of the world is multi-modal. A human uses multiple senses to understand the environment, its objects, and their interactions. When watching a video, we perceive the content via both audio and visual modalities, and removing one of these modalities results in a less immersive experience. Similarly, if information in both modalities does not correspond, it may create a sense of dissonance. Therefore, joint modelling of multiple modalities (such as audio, visual, and text) within one model is an active research area. In the last decade, the fields of automatic video understanding and multi-modal modelling have seen exceptional progress due to the ubiquitous success of deep learning models and, more recently, transformer-based architectures in particular. Our work draws on these advances and pushes the state-of-the-art of multi-modal video understanding forward. Applications of automatic multi-modal video processing are broad and exciting! For instance, the content-based textual description of a video (video captioning) may allow a visually- or auditory-impaired person to understand the content and, thus, engage in brighter social interactions. However, prior work in video content description relies on the visual input alone, missing vital information only available in the audio stream. To this end, we proposed two novel multi-modal transformer models that encode audio and visual interactions simultaneously. More specifically, first, we introduced a late-fusion multi-modal transformer that is highly modular and allows the processing of an arbitrary set of modalities. Second, an efficient bi-modal transformer was presented to encode audio-visual cues starting from the lower network layers, allowing richer audio-visual features and stronger performance as a result. Another application is automatic visually-guided sound generation, which might help professional sound (foley) designers who spend hours searching a database for relevant audio for a movie scene. Previous approaches for automatic conditional audio generation support only one class (e.g. “dog barking”), while real-life applications may require generation for hundreds of data classes, and one would need to train one model for every data class, which can be infeasible. To bridge this gap, we introduced a novel two-stage model that first efficiently encodes audio as a set of codebook vectors (i.e. trains to make “building blocks”) and then learns to sample these audio vectors given visual inputs to make a relevant audio track for this visual input. Moreover, we studied the automatic evaluation of the conditional audio generation model and proposed metrics that measure both the quality and the relevance of the generated samples. Finally, as video editing is becoming more common among non-professionals due to the increased popularity of services such as YouTube, automatic assistance during video editing grows in demand, e.g. off-sync detection between audio and visual tracks.
Prior work in audio-visual synchronization was devoted to solving the task on lip-syncing datasets with “dense” signals, such as interviews and presentations. In such videos, synchronization cues occur “densely” across time, and it is enough to process just a few tenths of a second to synchronize the tracks. In contrast, open-domain videos mostly have only “sparse” cues that occur just once in a seconds-long video clip (e.g. “chopping wood”). To address this, we: a) proposed a novel dataset with “sparse” sounds; b) designed a model which can efficiently encode seconds-long audio-visual tracks into a small set of “learnable selectors” that is then used for synchronization. In addition, we explored the temporal artefacts that common audio and video compression algorithms leave in data streams. To prevent a model from learning to rely on these artefacts, we introduced a list of recommendations on how to mitigate them. This thesis provides the details of the proposed methodologies as well as a comprehensive overview of advances in relevant fields of multi-modal video understanding. In addition, we provide a discussion of potential research directions that can bring significant contributions to the field.
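    A minimal late-fusion sketch in the spirit of the first proposed model, written in Python with PyTorch; all names, shapes, and the offset-classification head are illustrative assumptions rather than the thesis' actual architecture.

        import torch
        import torch.nn as nn

        class BiModalFusion(nn.Module):
            """Toy sketch: modality-tagged tokens, shared transformer, pooled head."""
            def __init__(self, dim=64, num_classes=5):
                super().__init__()
                self.mod_emb = nn.Parameter(torch.zeros(2, dim))   # audio / visual tags
                layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                                   batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, num_layers=2)
                self.head = nn.Linear(dim, num_classes)           # e.g. sync offsets

            def forward(self, audio_tokens, visual_tokens):
                a = audio_tokens + self.mod_emb[0]
                v = visual_tokens + self.mod_emb[1]
                x = self.encoder(torch.cat([a, v], dim=1))        # joint attention
                return self.head(x.mean(dim=1))                   # clip-level logits

        model = BiModalFusion()
        logits = model(torch.randn(2, 10, 64), torch.randn(2, 16, 64))
        print(logits.shape)  # torch.Size([2, 5])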

    Decryption Failure Attacks on Post-Quantum Cryptography

    Get PDF
    This dissertation mainly discusses new cryptanalytic results related to the secure implementation of the next generation of asymmetric cryptography, or Public-Key Cryptography (PKC). PKC, as it has been deployed until today, depends heavily on the integer factorization and discrete logarithm problems. Unfortunately, it has been well known since the mid-90s that these mathematical problems can be solved in polynomial time on a quantum computer using Peter Shor's algorithm. The recently accelerated pace of R&D towards quantum computers, eventually of sufficient size and power to threaten cryptography, has led the crypto research community towards a major shift of focus. A project towards standardization of Post-Quantum Cryptography (PQC) was launched by the US-based standardization organization NIST. PQC is the name given to algorithms designed to run on classical hardware/software while being resistant to attacks from quantum computers, making it well suited to replace the current asymmetric schemes. A primary motivation for the project is to guide publicly available research toward the singular goal of finding weaknesses in the proposed next generation of PKC. For public-key encryption (PKE) or digital signature (DS) schemes to be considered secure, they must be shown to rely on well-known mathematical problems, with theoretical proofs of security under established models such as indistinguishability under chosen ciphertext attack (IND-CCA). They must also withstand serious attack attempts by well-renowned cryptographers, concerning both theoretical security and the actual software/hardware instantiations. It is well known that security models such as IND-CCA are not designed to capture the intricacies of inner-state leakages. Such leakages are called side-channels, currently a major topic of interest in the NIST PQC project. This dissertation focuses on two questions: 1) how does the low but non-zero probability of decryption failures affect the cryptanalysis of these new PQC candidates? And 2) how might side-channel vulnerabilities inadvertently be introduced when going from theory to the practice of software/hardware implementations? Of main concern are PQC algorithms based on lattice theory and coding theory. The primary contributions are the discovery of novel decryption-failure side-channel attacks, improvements on existing attacks, an alternative implementation of part of a PQC scheme, and some more theoretical cryptanalytic results.
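    The role of decryption failures can be illustrated with a toy lattice-style example in Python. The snippet below is not any specific NIST candidate: a bit is encoded as 0 or q/2, Gaussian noise stands in for the accumulated error terms, and decryption fails whenever the noise pushes the value past the q/4 decision boundary, giving the low but non-zero failure probability that such attacks exploit.

        import numpy as np

        q = 3329                 # toy modulus (Kyber's q, chosen only for familiarity)
        rng = np.random.default_rng(1)

        def noisy_decrypt(bit, noise_scale):
            encoded = (q // 2) * bit
            noise = int(rng.normal(0, noise_scale))    # accumulated error term
            received = (encoded + noise) % q
            # decode to the nearest of {0, q/2}
            return 1 if q // 4 <= received < 3 * q // 4 else 0

        trials = 100_000
        failures = sum(noisy_decrypt(1, noise_scale=300) != 1 for _ in range(trials))
        print(f"failure rate: {failures / trials:.4%}")   # small but non-zero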

    The effect of ethanol extract of soursop (Annona muricata L.) leaves on transaminase enzyme (SGPT and SGOT) levels in mice (Mus musculus) induced with DMBA

    Get PDF
    This study was an experimental study using a completely randomized design (CRD) with five treatment groups and four replications. The treatment groups were mice induced with 7,12-dimethylbenz(alpha)anthracene (DMBA) and given soursop leaf extract at a dose of 0 mg/kg BW (K+), 100 mg/kg BW (P1), 150 mg/kg BW (P2), or 200 mg/kg BW (P3), plus normal mice, i.e., mice neither induced with DMBA nor given soursop leaf extract (K-). The data were then analyzed using one-way analysis of variance (One-Way ANOVA); where an effect was indicated, a follow-up least-significant-difference (LSD) test was performed at a significance level of alpha = 1%. Based on the results, the SGPT level in (K+) was 154.472 ± 4.09 U/L, while in P1, P2, and P3 it was 134.352 ± 3.03 U/L, 118.137 ± 2.69 U/L, and 88.950 ± 3.34 U/L respectively, and 11.516 ± 1.62 U/L in (K-). The SGOT level in (K+) was 132.174 ± 2.63 U/L, while in P1, P2, and P3 it was 117.674 ± 2.63 U/L, 105.232 ± 2.34 U/L, and 79.717 ± 3.56 U/L respectively, and 8.339 ± 1.25 U/L in (K-). It can therefore be concluded that the ethanol extract of soursop (Annona muricata L.) leaves affects transaminase enzyme (SGPT and SGOT) levels in mice (Mus musculus) induced with 7,12-dimethylbenz(alpha)anthracene (DMBA) in vivo, and that the most effective dose for lowering the SGPT and SGOT transaminase levels was P3, i.e., 200 mg/kg BW.
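    The reported analysis pipeline (one-way ANOVA followed by a post-hoc test at alpha = 1%) can be sketched in a few lines of Python with SciPy; the group values below are illustrative toy numbers near the reported means, not the study's raw data.

        from scipy import stats

        k_plus  = [154.1, 155.0, 153.9, 154.8]  # SGPT, DMBA only (toy values)
        p3      = [88.5, 89.2, 88.7, 89.4]      # SGPT, DMBA + 200 mg/kg BW
        k_minus = [11.3, 11.8, 11.4, 11.6]      # SGPT, untreated controls

        f_stat, p_value = stats.f_oneway(k_plus, p3, k_minus)
        print(f"F = {f_stat:.1f}, p = {p_value:.2e}")
        if p_value < 0.01:
            print("omnibus significant; proceed to post-hoc (e.g. LSD) tests")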

    Town of Newington, NH town report year ending December 31, 2015, school district report FY ending June 30, 2015.

    Get PDF
    This is an annual report containing vital statistics for a town/city in the state of New Hampshire.