
    Steered mixture-of-experts for light field images and video : representation and coding

    Research in light field (LF) processing has increased heavily over the last decade. This is largely driven by the desire to achieve, for camera-captured scenes, the same level of immersion and navigational freedom as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, such 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays arriving at a certain region from any angle. The global model thus consists of a set of kernels which define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application to 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art at low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4x in bitrate for the same quality. At least equally important is the fact that our method inherently provides functionality desired for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
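
    A minimal 2-D sketch of the kernel-based reconstruction idea described above, assuming Gaussian gating kernels combined with linear experts; the kernel parameters below are random placeholders rather than fitted SMoE model parameters, and the full 4-D/5-D LF case is not shown.

```python
import numpy as np

def smoe_reconstruct(coords, centers, covs, weights, expert_offsets, expert_slopes):
    """Evaluate a steered mixture-of-experts regressor at pixel coordinates.

    coords:         (N, 2) pixel positions
    centers:        (K, 2) kernel centers
    covs:           (K, 2, 2) kernel covariance matrices (steering)
    weights:        (K,) mixing weights
    expert_offsets: (K,) per-kernel intensity offset
    expert_slopes:  (K, 2) per-kernel linear intensity gradient
    """
    K = centers.shape[0]
    gates = np.zeros((coords.shape[0], K))
    for k in range(K):
        diff = coords - centers[k]
        inv = np.linalg.inv(covs[k])
        mahal = np.einsum("ni,ij,nj->n", diff, inv, diff)
        norm = weights[k] / np.sqrt(np.linalg.det(2 * np.pi * covs[k]))
        gates[:, k] = norm * np.exp(-0.5 * mahal)        # soft spatial assignment
    gates /= gates.sum(axis=1, keepdims=True)            # normalise the gates

    # Each expert predicts intensity as a plane over the coordinates.
    experts = expert_offsets + coords @ expert_slopes.T  # (N, K)
    return (gates * experts).sum(axis=1)                 # gated combination

# Toy usage: reconstruct a 32x32 patch from 8 random kernels.
rng = np.random.default_rng(0)
K = 8
ys, xs = np.mgrid[0:32, 0:32]
coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
centers = rng.uniform(0, 32, size=(K, 2))
covs = np.stack([np.eye(2) * rng.uniform(4, 16) for _ in range(K)])
patch = smoe_reconstruct(coords, centers, covs,
                         np.ones(K) / K,
                         rng.uniform(0, 1, K),
                         rng.normal(0, 0.01, (K, 2))).reshape(32, 32)
```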

    Advances in single molecule microscopy: Protein characterization, force analysis and fluorescence localization

    Recent advances in single molecule techniques have allowed scientists to address biological questions which cannot be resolved by traditional ensemble measurements. In this dissertation, I integrate single molecule and bulk measurements to establish a direct link between copper exposure and neurotoxicity in prion disease. Furthermore, I develop a new analysis method to improve the accuracy of kinetic parameter estimation in single molecule Atomic Force Microscope (AFM) experiments. Finally, I develop a new fluorescence localization microscopy technique to identify the axial position of a single fluorescent object with sub-nanometer accuracy. Prion diseases are characterized by the misfolding and oligomerization of prion protein (PrP). Copper exposure has been linked to prion pathogenesis; however, the molecular mechanism is still unknown. In the first part of this dissertation, I use a single molecule fluorescence assay, dynamic force spectroscopy (DFS) with AFM, and a real-time quaking-induced conversion (RT-QuIC) assay to resolve, for the first time, the mechanistic basis by which Cu2+ ions induce a structural change in PrP that further promotes oligomerization, templates amyloid formation, and leads to neurotoxicity. In the second part of this dissertation, I establish a more accurate analysis method for single molecule DFS experiments. DFS is a widely used technique to characterize the dissociation kinetics between individual biomolecules. In AFM-DFS, receptor-ligand complexes are ruptured at different force rates by varying the speed at which the AFM tip and substrate are pulled away from each other. The rupture events are grouped according to their pulling speeds, and the mean force and loading rate of each group are calculated. These data are subsequently fit to established models to extract kinetic parameters such as the intrinsic off-rate (koff) and the width of the potential energy barrier (xβ). However, due to the large uncertainty in determining the mean forces and loading rates, errors in the estimated koff and xβ can be substantial. Here, I demonstrate that these errors can be dramatically reduced by sorting rupture events into groups using cluster analysis. Monte Carlo simulations show that the cluster analysis is very effective at improving the accuracy of parameter estimation, especially when the number of unbinding events is limited and the events are not well separated into distinct groups. Finally, I describe a new technique, standing wave axial nanometry (SWAN), to image the axial location of a single nanoscale fluorescent object with sub-nanometer accuracy and 3.7 nm precision. A standing wave, generated by positioning an AFM tip over a focused laser beam, is used to excite fluorescence; the axial position is determined from the phase of the emission intensity. I use SWAN to measure the orientation of single DNA molecules of different lengths, grafted on surfaces with different functionalities.
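
    A minimal sketch of the group-then-fit idea for DFS, assuming the standard Bell-Evans relation F* = (kBT/xβ)·ln(r·xβ/(koff·kBT)) and using k-means on the log loading rate as a stand-in for the dissertation's specific cluster analysis; the rupture data are synthetic placeholders.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.optimize import curve_fit

kBT = 4.11  # thermal energy at ~298 K, in pN*nm

def bell_evans(log_r, koff, x_beta):
    """Most probable rupture force (pN) vs. natural log of the loading rate (pN/s)."""
    r = np.exp(log_r)
    return (kBT / x_beta) * np.log(r * x_beta / (koff * kBT))

# Synthetic rupture events (placeholders): loading rates in pN/s, forces in pN.
rng = np.random.default_rng(1)
true_koff, true_xb = 0.5, 0.3               # 1/s, nm
rates = 10 ** rng.uniform(2, 5, 600)        # 1e2 to 1e5 pN/s
forces = bell_evans(np.log(rates), true_koff, true_xb)
forces = forces + rng.normal(0, 10, forces.size)   # ~10 pN measurement noise

# Sort rupture events into groups by clustering on the log loading rate,
# rather than relying on the nominal pulling-speed labels.
n_groups = 5
_, labels = kmeans2(np.log(rates)[:, None], n_groups, minit="++", seed=1)
mean_logr = np.array([np.log(rates)[labels == g].mean() for g in range(n_groups)])
mean_force = np.array([forces[labels == g].mean() for g in range(n_groups)])

# Fit the group means to extract koff and x_beta.
(koff_est, xb_est), _ = curve_fit(bell_evans, mean_logr, mean_force,
                                  p0=(1.0, 0.2), bounds=([1e-3, 1e-3], [1e3, 10]))
print(f"koff ~ {koff_est:.2f} 1/s, x_beta ~ {xb_est:.2f} nm")
```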

    Robust density modelling using the student's t-distribution for human action recognition

    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (hidden Markov models and others), and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show, through experiments on two well-known datasets (Weizmann, MuHAVi), a remarkable improvement in classification accuracy. © 2011 IEEE
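
    A minimal sketch (not the paper's HMM) of why the heavier tails matter: a few outlying feature values drag a maximum-likelihood Gaussian fit far more than a Student's t fit, using scipy on synthetic data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
features = rng.normal(loc=5.0, scale=1.0, size=200)       # clean feature values
contaminated = np.append(features, [50.0, 60.0, 70.0])    # a few tracking outliers

# Gaussian ML fit: mean and scale are dragged toward the outliers.
g_clean = stats.norm.fit(features)
g_dirty = stats.norm.fit(contaminated)

# Student's t ML fit (degrees of freedom estimated too): location/scale barely move.
t_clean = stats.t.fit(features)
t_dirty = stats.t.fit(contaminated)

print("Gaussian  (loc, scale):", g_clean, "->", g_dirty)
print("Student t (df, loc, scale):", t_clean, "->", t_dirty)
```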

    Multi-frame reconstruction using super-resolution, inpainting, segmentation and codecs

    In this thesis, different aspects of video and light field reconstruction are considered, such as super-resolution, inpainting, segmentation and codecs. Each of these strategies is analyzed with respect to a specific goal and a specific database; accordingly, databases relevant to the film industry, sport videos, light fields and hyperspectral videos are used. The thesis is constructed around six related manuscripts, in which several approaches are proposed for multi-frame reconstruction. First, a novel multi-frame reconstruction strategy is proposed for light field super-resolution, in which graph-based regularization is applied along with edge-preserving filtering to improve the spatio-angular quality of the light field. Second, a novel video reconstruction method is proposed, built on compressive sensing (CS), Gaussian mixture models (GMM) and sparse 3D transform-domain block matching. The motivation of the proposed technique is to improve the visual quality of the video frames and to decrease the reconstruction error compared with earlier video reconstruction methods. In the next approach, Student-t mixture models and edge-preserving filtering are applied for video super-resolution. The Student-t mixture model has heavy tails, which makes it robust and well suited as a video frame patch prior, and rich in terms of log-likelihood for information retrieval. In another approach, a hyperspectral video database is considered and a Bayesian dictionary learning process is used for hyperspectral video super-resolution. To that end, a Beta process is used in the Bayesian dictionary learning and a sparse coding is generated for the hyperspectral video super-resolution. The spatial super-resolution is followed by a spectral video restoration strategy, and the whole process leverages two different learned dictionaries: the first trained for spatial super-resolution and the second for spectral restoration. Furthermore, in another approach, a novel framework is proposed for automatically replacing advertisement content in soccer videos using deep learning strategies. For this purpose, a U-Net architecture (a convolutional neural network for image segmentation) is applied for content segmentation and detection. Subsequently, after reconstructing the segmented content in the video frames (accounting for the apparent loss in detection), the unwanted content is replaced with new content using a homography mapping procedure. In addition, in another research work, a novel video compression framework is presented using autoencoder networks that encode and decode videos using less chroma information than luma information. For this purpose, instead of converting Y'CbCr 4:2:2/4:2:0 videos to and from RGB 4:4:4, the video is kept in Y'CbCr 4:2:2/4:2:0 and the luma and chroma channels are merged after the luma is downsampled to match the chroma size. The inverse operation is performed at the decoder. The performance of these models is evaluated using the CPSNR, MS-SSIM, and VMAF metrics. The experiments reveal that, compared to video compression involving conversion to and from RGB 4:4:4, the proposed method increases the video quality by about 5.5% for Y'CbCr 4:2:2 and 8.3% for Y'CbCr 4:2:0, while reducing the amount of computation by nearly 37% for Y'CbCr 4:2:2 and 40% for Y'CbCr 4:2:0.
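
    A minimal numpy sketch of the channel-merging step for a 4:2:0 frame, assuming planar Y, Cb, Cr arrays and 2x2 average pooling for the luma downsampling; the autoencoder itself is not shown.

```python
import numpy as np

def pack_ycbcr_420(y, cb, cr):
    """Downsample luma to the chroma resolution and stack the three planes.

    y:  (H, W) luma plane; cb, cr: (H//2, W//2) chroma planes
    returns an (H//2, W//2, 3) tensor to feed the encoder
    """
    y_ds = y.reshape(y.shape[0] // 2, 2, y.shape[1] // 2, 2).mean(axis=(1, 3))
    return np.stack([y_ds, cb, cr], axis=-1)

def unpack_ycbcr_420(packed):
    """Inverse step at the decoder: split the planes and upsample luma by 2x."""
    y_ds, cb, cr = packed[..., 0], packed[..., 1], packed[..., 2]
    y = np.kron(y_ds, np.ones((2, 2)))   # nearest-neighbour upsampling
    return y, cb, cr

# Round trip on a random 64x64 frame: chroma is preserved exactly,
# luma loses the detail removed by the downsampling step.
rng = np.random.default_rng(0)
y = rng.random((64, 64))
cb, cr = rng.random((32, 32)), rng.random((32, 32))
y2, cb2, cr2 = unpack_ycbcr_420(pack_ycbcr_420(y, cb, cr))
assert y2.shape == y.shape and np.allclose(cb2, cb) and np.allclose(cr2, cr)
```
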
The thread that ties these approaches together is the reconstruction of video and light field frames under different problem aspects, such as loss of information, blur in the frames, residual noise after reconstruction, unwanted content, excessive data size and high computational overhead. In three of the proposed approaches, we use the Plug-and-Play ADMM model for the first time for the reconstruction of videos and light fields, in order to address information retrieval in the frames while tackling noise and blur at the same time. In two of the proposed models, we apply sparse dictionary learning to reduce the data dimension and represent the frames as an efficient linear combination of basis frame patches. Two of the proposed approaches are developed in collaboration with industry, in which deep learning frameworks are used to handle large sets of features and to learn high-level features from the data.
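
    A minimal sketch of one Plug-and-Play ADMM loop for deblurring a single frame, with a Gaussian filter standing in as the plug-in denoiser; the thesis uses stronger learned priors, and the blur operator, step sizes and parameters here are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur(x, sigma=1.5):
    """Forward model A: Gaussian blur (approximately self-adjoint)."""
    return gaussian_filter(x, sigma)

def denoise(x, strength=0.8):
    """Plug-in denoiser D: any off-the-shelf denoiser could be swapped in here."""
    return gaussian_filter(x, strength)

def pnp_admm_deblur(y, rho=0.5, iters=30, gd_steps=20):
    """Solve min_x 0.5*||A x - y||^2 + prior(x) via Plug-and-Play ADMM."""
    x = y.copy()
    z = y.copy()
    u = np.zeros_like(y)
    for _ in range(iters):
        # x-update: (A^T A + rho I) x = A^T y + rho (z - u),
        # approximated here by a few gradient-descent steps.
        b = blur(y) + rho * (z - u)
        for _ in range(gd_steps):
            grad = blur(blur(x)) + rho * x - b
            x = x - 0.1 * grad
        # z-update: the proximal step is replaced by the plug-in denoiser.
        z = denoise(x + u)
        # Dual variable update.
        u = u + x - z
    return x

# Usage: restore a synthetic blurred, noisy frame.
rng = np.random.default_rng(0)
clean = rng.random((64, 64))
observed = blur(clean) + rng.normal(0, 0.01, clean.shape)
restored = pnp_admm_deblur(observed)
```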

    Artificial Intelligence for Multimedia Signal Processing

    Artificial intelligence technologies are being actively applied to broadcasting and multimedia processing. A large body of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and over the past two to three years these efforts have aimed at improving the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and scenario creation are very important areas of research in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, including computer vision, speech/sound/text processing, and content analysis/information mining.

    Automatic Relative Radiometric Normalization of Bi-Temporal Satellite Images Using a Coarse-to-Fine Pseudo-Invariant Features Selection and Fuzzy Integral Fusion Strategies

    Relative radiometric normalization (RRN) is important for pre-processing and analyzing multitemporal remote sensing (RS) images. Multitemporal RS images usually include different land use/land cover (LULC) types; therefore, assuming a single linear relationship during RRN modeling may introduce errors in the RRN results. To resolve this issue, we propose a new automatic RRN technique that efficiently selects clustered pseudo-invariant features (PIFs) through a coarse-to-fine strategy and uses them in a fusion-based RRN modeling approach. In the coarse stage, an efficient difference index is first generated from the down-sampled reference and target images by combining the spectral correlation, the spectral angle mapper (SAM), and the Chebyshev distance. This index is then categorized into changed, unchanged, and uncertain classes using a fast multiple thresholding technique. In the fine stage, the subject image is segmented into different clusters by the histogram-based fuzzy c-means (HFCM) algorithm, and the optimal PIFs are selected from the unchanged and uncertain regions using a bivariate joint distribution analysis within each cluster. In the RRN modeling step, two normalized subject images are produced using the robust linear regression (RLR) and cluster-wise RLR (CRLR) methods based on the clustered PIFs. Finally, the normalized images are fused using the Choquet fuzzy integral fusion strategy to overcome the discontinuity between clusters in the final results and keep the radiometric rectification optimal. Several experiments were conducted on four different bi-temporal satellite images and a simulated dataset to demonstrate the efficiency of the proposed method. The results show that the proposed method yields superior RRN results and outperforms the other well-known RRN algorithms considered, in terms of both accuracy and execution time.
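
    A minimal sketch of two of the building blocks, assuming co-registered reference and target arrays of shape (bands, H, W): a per-pixel difference index combining spectral correlation, SAM and the Chebyshev distance, and a per-band robust (Huber-loss) linear regression over already-selected PIF pixels. The equal weighting of the three measures and the PIF selection itself are simplified assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import least_squares

def difference_index(ref, tgt, eps=1e-8):
    """Per-pixel change score from spectral correlation, SAM and Chebyshev distance."""
    r = ref.reshape(ref.shape[0], -1).T      # (pixels, bands)
    t = tgt.reshape(tgt.shape[0], -1).T
    rc = r - r.mean(axis=1, keepdims=True)
    tc = t - t.mean(axis=1, keepdims=True)
    corr = (rc * tc).sum(1) / (np.linalg.norm(rc, axis=1) *
                               np.linalg.norm(tc, axis=1) + eps)
    sam = np.arccos(np.clip((r * t).sum(1) /
                            (np.linalg.norm(r, axis=1) *
                             np.linalg.norm(t, axis=1) + eps), -1, 1))
    cheb = np.abs(r - t).max(axis=1)
    # Normalise each measure to [0, 1] and average them (equal weights assumed).
    def norm01(v):
        return (v - v.min()) / (np.ptp(v) + eps)
    score = (norm01(1 - corr) + norm01(sam) + norm01(cheb)) / 3.0
    return score.reshape(ref.shape[1:])

def normalize_band(ref_band, tgt_band, pif_mask):
    """Fit ref = a*tgt + b on PIF pixels with a Huber loss, then normalise the band."""
    x, y = tgt_band[pif_mask], ref_band[pif_mask]
    res = least_squares(lambda p: p[0] * x + p[1] - y, x0=[1.0, 0.0], loss="huber")
    a, b = res.x
    return a * tgt_band + b
```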

    Generative Models for Preprocessing of Hospital Brain Scans

    In this thesis, I present novel computational methods for processing routine clinical brain scans. Such scans were originally acquired for qualitative assessment by trained radiologists, and they present a number of difficulties for computational models, such as those within common neuroimaging analysis software. The overarching objective of this work is to enable efficient and fully automated analysis of large neuroimaging datasets of the type currently present in many hospitals worldwide. The methods presented are based on probabilistic, generative models of the observed imaging data, and therefore rely on informative priors and realistic forward models. The first part of the thesis presents a model for image quality improvement, whose key component is a novel prior for multimodal datasets. I demonstrate its effectiveness for super-resolving thick-sliced clinical MR scans and for denoising CT images and MR-based, multi-parametric mapping acquisitions. I then show how the same prior can be used for within-subject, intermodal image registration, enabling more robust registration of large numbers of clinical scans. The second part of the thesis focusses on improved, automatic segmentation and spatial normalisation of routine clinical brain scans. I propose two extensions to a widely used segmentation technique. The first is a method for handling missing data, which allows me to predict entirely missing modalities from one, or a few, MR contrasts. The second is a principled way of combining the strengths of probabilistic, generative models with the unprecedented discriminative capability of deep learning. By introducing a convolutional neural network as a Markov random field prior, I can model nonlinear class interactions and learn these using backpropagation. I show that this model is robust to sequence and scanner variability. Finally, I show examples of fitting a population-level, generative model to various neuroimaging data, which can model, e.g., CT scans with haemorrhagic lesions.
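
    A minimal sketch of the missing-data idea in a Gaussian mixture setting: given a fitted multimodal mixture over, say, two MR contrasts, an unobserved modality at a voxel is predicted as the responsibility-weighted conditional mean given the observed one. The mixture parameters below are toy placeholders, not the thesis's fitted model.

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict_missing(x_obs, weights, means, covs, obs_idx, mis_idx):
    """Predict the missing modality of one voxel from the observed ones.

    weights: (K,) mixture weights; means: (K, D); covs: (K, D, D)
    obs_idx / mis_idx: indices of the observed / missing modalities
    """
    K = len(weights)
    resp = np.zeros(K)
    cond_means = np.zeros((K, len(mis_idx)))
    for k in range(K):
        mu_o, mu_m = means[k][obs_idx], means[k][mis_idx]
        S_oo = covs[k][np.ix_(obs_idx, obs_idx)]
        S_mo = covs[k][np.ix_(mis_idx, obs_idx)]
        # Class responsibility computed from the observed marginal only.
        resp[k] = weights[k] * multivariate_normal.pdf(x_obs, mu_o, S_oo)
        # Conditional mean of the missing modality given the observed one.
        cond_means[k] = mu_m + S_mo @ np.linalg.solve(S_oo, x_obs - mu_o)
    resp /= resp.sum()
    return resp @ cond_means

# Toy 2-class, 2-modality example: predict a "T2" intensity from "T1".
weights = np.array([0.6, 0.4])
means = np.array([[0.3, 0.7], [0.8, 0.2]])
covs = np.array([[[0.01, 0.004], [0.004, 0.02]],
                 [[0.02, -0.006], [-0.006, 0.01]]])
t2_hat = predict_missing(np.array([0.35]), weights, means, covs, [0], [1])
```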