3,181 research outputs found

    Steered mixture-of-experts for light field images and video : representation and coding

    Research in light field (LF) processing has increased greatly over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model thus consists of a set of kernels which define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art for low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4x in bitrate for the same quality. At least equally important is the fact that our method inherently provides desirable functionality for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
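The idea of a set of kernels defining a continuous approximation can be sketched for the 2-D image case as follows. This is a minimal illustration, not the authors' implementation: it assumes Gaussian gating and linear experts, which the abstract does not spell out, and the dictionary fields (`mu`, `prec`, `pi`, `bias`, `slope`) are hypothetical names chosen for this example.

```python
import math

def smoe_reconstruct(x, kernels):
    """Evaluate a kernel mixture at a 2-D position x = (x0, x1).

    Assumed (hypothetical) kernel fields:
      mu    : (2,) kernel center
      prec  : 2x2 precision matrix (inverse covariance); its
              off-diagonal terms "steer" the kernel
      pi    : mixing weight
      bias, slope : linear expert m(x) = bias + slope . (x - mu)
    """
    weights, experts = [], []
    for k in kernels:
        dx = (x[0] - k["mu"][0], x[1] - k["mu"][1])
        p = k["prec"]
        # Squared Mahalanobis distance dx^T * prec * dx
        maha = (dx[0] * (p[0][0] * dx[0] + p[0][1] * dx[1])
                + dx[1] * (p[1][0] * dx[0] + p[1][1] * dx[1]))
        det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
        # Unnormalized Gaussian gate; the 1/(2*pi) factor cancels below
        weights.append(k["pi"] * math.sqrt(det) * math.exp(-0.5 * maha))
        experts.append(k["bias"] + k["slope"][0] * dx[0] + k["slope"][1] * dx[1])
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all gates underflowed at this position")
    # Soft-gated blend of the experts: each pixel is computed independently
    return sum(w * m for w, m in zip(weights, experts)) / total
```

Because each position is evaluated independently from the kernel parameters alone, a sketch like this is consistent with the pixel-parallel reconstruction and resolution-independent sampling the abstract mentions.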

    Discontinuity-Aware Base-Mesh Modeling of Depth for Scalable Multiview Image Synthesis and Compression

    This thesis is concerned with the challenge of deriving disparity from sparsely communicated depth for performing disparity-compensated view synthesis for compression and rendering of multiview images. The modeling of depth is essential for deducing disparity at view locations where depth is not available and is also critical for visibility reasoning and occlusion handling. This thesis first explores disparity derivation methods and disparity-compensated view synthesis approaches. Investigations reveal the merits of adopting a piece-wise continuous mesh description of depth for deriving disparity at target view locations to enable disparity-compensated backward warping of texture. Visibility can be inferred from the correspondence relationship between views that a mesh model provides, while the connectivity of a mesh model assists in resolving depth occlusion. The recent JPEG 2000 Part-17 extension defines tools for scalable coding of discontinuous media using breakpoint-dependent DWT, where breakpoints describe discontinuity boundary geometry. This thesis proposes a method to efficiently reconstruct depth coded using JPEG 2000 Part-17 as a piece-wise continuous mesh, where discontinuities are driven by the encoded breakpoints. Results show that the proposed mesh can accurately represent decoded depth while its complexity scales along with decoded depth quality. The piece-wise continuous mesh model anchored at a single viewpoint or base-view can be augmented to form a multi-layered structure where the underlying layers carry depth information of regions that are occluded at the base-view. Such a consolidated mesh representation is termed a base-mesh model and can be projected to many viewpoints to deduce complete, inherently consistent disparity fields between any pair of views. 
Experimental results demonstrate the superior performance of the base-mesh model in multiview synthesis and compression compared to other state-of-the-art methods, including the JPEG Pleno light field codec. The proposed base-mesh model departs greatly from the conventional pixel-wise or block-wise depth models, and their forward depth mapping for deriving disparity, that are ingrained in existing multiview processing systems. When performing disparity-compensated view synthesis, there can be regions for which reference texture is unavailable, and inpainting is required. A new depth-guided texture inpainting algorithm is proposed to restore occluded texture in regions where depth information is either available or can be inferred using the base-mesh model.

    Structured Sparsity: Discrete and Convex approaches

    Compressive sensing (CS) exploits sparsity to recover sparse or compressible signals from dimensionality-reducing, non-adaptive sensing mechanisms. Sparsity is also used to enhance interpretability in machine learning and statistics applications: while the ambient dimension is vast in modern data analysis problems, the relevant information therein typically resides in a much lower-dimensional space. However, many solutions proposed nowadays do not leverage the true underlying structure. Recent results in CS extend the simple sparsity idea to more sophisticated structured sparsity models, which describe the interdependency between the nonzero components of a signal; this increases the interpretability of the results and leads to better recovery performance. In order to better understand the impact of structured sparsity, in this chapter we analyze the connections between the discrete models and their convex relaxations, highlighting their relative advantages. We start with the general group sparse model and then elaborate on two important special cases: the dispersive and the hierarchical models. For each, we present the models in their discrete nature, discuss how to solve the ensuing discrete problems and then describe convex relaxations. We also consider more general structures as defined by set functions and present their convex proxies. Further, we discuss efficient optimization solutions for structured sparsity problems and illustrate structured sparsity in action via three applications. Comment: 30 pages, 18 figures
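As one concrete instance of a convex relaxation of the group sparse model, the sketch below applies the proximal operator of the group-lasso penalty (block soft-thresholding). It assumes non-overlapping groups and is only an illustration of the general idea, not the chapter's own algorithm; the function name and arguments are chosen for this example.

```python
import math

def group_soft_threshold(x, groups, lam):
    """Proximal operator of lam * sum_g ||x_g||_2 for non-overlapping groups.

    x      : list of floats
    groups : list of index lists, each index appearing in at most one group
    lam    : non-negative threshold
    """
    out = list(x)
    for g in groups:
        norm = math.sqrt(sum(x[i] ** 2 for i in g))
        # Shrink the whole group toward zero; groups with norm <= lam vanish
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        for i in g:
            out[i] = scale * x[i]
    return out
```

Unlike element-wise soft-thresholding, this operator keeps or discards all coefficients of a group together, which is how the convex penalty encodes the interdependency between nonzero components.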

    New inter-image prediction methods for image and video compression (Nouvelles méthodes de prédiction inter-images pour la compression d’images et de vidéos)

    Due to the wide availability of video cameras and new social media practices, as well as the emergence of cloud services, images and videos today constitute a significant share of the total data transmitted over the internet. Video streaming applications account for more than 70% of the world's internet bandwidth. Meanwhile, billions of images are already stored in the cloud, and millions are uploaded every day. The ever-growing streaming and storage requirements of these media demand constant improvement of image and video coding tools. This thesis aims at exploring novel approaches for improving current inter-prediction methods. Such methods leverage redundancies between similar frames, and were originally developed in the context of video compression. In a first approach, novel global and local inter-prediction tools are combined to improve the efficiency of image-set compression schemes based on video codecs. By combining a global geometric and photometric compensation with a locally linear prediction, significant improvements can be obtained. A second approach is then proposed which introduces a region-based inter-prediction scheme. The proposed method improves coding performance compared to existing solutions by estimating and compensating geometric and photometric distortions on a semi-local level. This approach is then adapted and validated in the context of video compression. Bit-rate improvements are obtained, especially for sequences displaying complex real-world motions such as zooms and rotations. The last part of the thesis focuses on deep learning approaches for inter-prediction. Deep neural networks have shown striking results for a large number of computer vision tasks in recent years. Deep learning based methods proposed for frame interpolation applications are studied here in the context of video compression. 
Coding performance improvements over traditional motion estimation and compensation methods highlight the potential of these deep architectures.

    Discriminative Representations for Heterogeneous Images and Multimodal Data

    Histology images of tumor tissue are an important diagnostic and prognostic tool for pathologists. Recently developed molecular methods group tumors into subtypes to further guide treatment decisions, but they are not routinely performed on all patients. A lower-cost and repeatable method to predict tumor subtypes from histology could bring benefits to more cancer patients. Further, combining imaging and genomic data types provides a more complete view of the tumor and may improve prognostication and treatment decisions. While molecular and genomic methods capture the state of a small sample of tumor, histological image analysis provides a spatial view and can identify multiple subtypes in a single tumor. This intra-tumor heterogeneity has yet to be fully understood and its quantification may lead to future insights into tumor progression. In this work, I develop methods to learn appropriate features directly from images using dictionary learning or deep learning. I use multiple instance learning to account for intra-tumor variations in subtype during training, improving subtype predictions and providing insights into tumor heterogeneity. I also integrate image and genomic features to learn a projection to a shared space that is also discriminative. This method can be used for cross-modal classification or to improve predictions from images by also learning from genomic data during training, even if only image data is available at test time. Doctor of Philosophy

    Article Search Tool and Topic Classifier

    This thesis focuses on three main tasks related to document recommendation. The first applies existing techniques to document recommendation using Doc2Vec. A robust representation is presented to understand how noise induced in the embedding space affects the predicted recommendations. The next phase focuses on improving these recommendations using a topic classifier, for which a Hierarchical Attention Network is employed. To increase prediction accuracy, this work relates accuracy to the embedding size of the words in the article. In the last phase, model-agnostic Explainable AI (XAI) techniques are implemented to support the findings of this thesis. XAI techniques are also employed to show how model hyper-parameters can be fine-tuned for a black-box model.

    Machine learning-based automated segmentation with a feedback loop for 3D synchrotron micro-CT

    The development of third-generation synchrotron light sources laid the foundation for investigating the 3D structure of opaque samples at micrometer resolution and beyond. This led to the development of X-ray synchrotron micro-computed tomography, which fostered the creation of imaging facilities for studying samples of many kinds, e.g., model organisms, to better understand the physiology of complex living systems. The development of modern control systems and robotics enabled the full automation of X-ray imaging experiments and the calibration of experimental setup parameters during operation. Advances in digital detector systems brought improvements in resolution, dynamic range, sensitivity, and other essential properties. These improvements considerably increased the throughput of the imaging process, but on the other hand the experiments began to generate substantially larger amounts of data, up to dozens of terabytes, which were subsequently processed manually. These technical advances thus paved the way for more efficient high-throughput experiments that examine large numbers of samples and produce datasets of better quality. The scientific community therefore has a strong need for an efficient, automated workflow for X-ray data analysis that can handle such a data load and deliver valuable insights to domain experts. Existing solutions for such a workflow are not directly applicable to high-throughput experiments because they were developed for ad hoc scenarios in medical imaging. Consequently, they are neither optimized for high-throughput data streams nor able to exploit the hierarchical nature of samples. 
The main contribution of this work is a new automated analysis workflow suited to the efficient processing of heterogeneous X-ray datasets of a hierarchical nature. The developed workflow is based on improved methods for data preprocessing, registration, localization, and segmentation. Every stage of the workflow that involves a training phase can be fine-tuned automatically to find the best hyperparameters for the specific dataset. For the analysis of fiber structures in samples, a new, highly parallelizable 3D orientation analysis method was developed; it is based on a novel concept of emitting rays and enables more precise morphological analysis. All developed methods were thoroughly validated on synthetic datasets to quantitatively assess their applicability under different imaging conditions. The workflow was shown to be capable of processing a series of datasets of a similar kind. In addition, efficient CPU/GPU implementations of the developed workflow and methods are presented and made available to the community as Python modules. The developed automated analysis workflow was successfully applied to micro-CT datasets acquired in high-throughput X-ray experiments in developmental biology and materials science. In particular, the workflow was applied to the analysis of medaka fish datasets, enabling automated segmentation and subsequent morphological analysis of the brain, liver, head nephrons, and heart. Furthermore, the developed 3D orientation analysis method was used in the morphological analysis of polymer scaffold datasets to steer a fabrication process toward desirable properties.