    Light field reconstruction from multi-view images

    Kang Han studied recovering the 3D world from multi-view images. He proposed several algorithms to handle occlusions in depth estimation and to find effective representations for view rendering. The proposed algorithms can be used in many innovative applications based on machine intelligence, such as autonomous driving and the Metaverse.

    Deep Depth From Focus

    Depth from focus (DFF) is one of the classical ill-posed inverse problems in computer vision. Most approaches recover the depth at each pixel based on the focal setting which exhibits maximal sharpness. Yet, it is not obvious how to reliably estimate the sharpness level, particularly in low-textured areas. In this paper, we propose 'Deep Depth From Focus (DDFF)' as the first end-to-end learning approach to this problem. One of the main challenges we face is the data hunger of deep neural networks. In order to obtain a significant amount of focal stacks with corresponding ground-truth depth, we propose to leverage a light-field camera with a co-calibrated RGB-D sensor. This allows us to digitally create focal stacks of varying sizes. Compared to existing benchmarks, our dataset is 25 times larger, enabling the use of machine learning for this inverse problem. We compare our results with state-of-the-art DFF methods and we also analyze the effect of several key deep architectural components. These experiments show that our proposed method 'DDFFNet' achieves state-of-the-art performance in all scenes, reducing depth error by more than 75% compared to the classical DFF methods. Comment: accepted to the Asian Conference on Computer Vision (ACCV) 2018.
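    The classical baseline this paper improves upon, assigning each pixel the focal setting at which local sharpness peaks, can be sketched in a few lines. Below is a minimal illustration (not the paper's DDFFNet), assuming a grayscale focal stack and using a locally averaged squared Laplacian as the sharpness measure; both choices are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def classical_dff(focal_stack, focus_dists):
    """Classical depth-from-focus baseline: assign each pixel the focus
    distance of the slice where local sharpness peaks.

    focal_stack: (S, H, W) grayscale slices, one per focal setting
    focus_dists: (S,) focus distance of each slice
    """
    # Sharpness per slice: squared Laplacian, averaged over a local window
    sharpness = np.stack(
        [uniform_filter(laplace(s.astype(float)) ** 2, size=9)
         for s in focal_stack]
    )
    best = np.argmax(sharpness, axis=0)    # sharpest slice index per pixel
    return np.asarray(focus_dists)[best]   # (H, W) depth map
```

    In low-textured regions the sharpness curve over the stack is nearly flat, so the argmax is unreliable; this is exactly the failure mode that motivates the learned approach.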

    Occupancy Analysis of the Outdoor Football Fields


    Light Field Salient Object Detection: A Review and Benchmark

    Salient object detection (SOD) is a long-standing research topic in computer vision and has drawn an increasing amount of research interest in the past decade. This paper provides the first comprehensive review and benchmark for light field SOD, which has long been lacking in the saliency community. Firstly, we introduce preliminary knowledge on light fields, including theory and data forms, and then review existing studies on light field SOD, covering ten traditional models, seven deep learning-based models, one comparative study, and one brief review. Existing datasets for light field SOD are also summarized with detailed information and statistical analyses. Secondly, we benchmark nine representative light field SOD models together with several cutting-edge RGB-D SOD models on four widely used light field datasets, from which insightful discussions and analyses, including a comparison between light field SOD and RGB-D SOD models, are drawn. Besides, because the datasets are inconsistent in their current forms, we further generate complete data and supplement focal stacks, depth maps, and multi-view images for the inconsistent datasets, making them consistent and unified; this supplemental data makes a universal benchmark possible. Lastly, because light field SOD is a rather special problem, owing to diverse data representations and a high dependency on acquisition hardware that set it apart from other saliency detection tasks, we provide nine hints on the challenges and future directions, and outline several open issues. We hope our review and benchmarking can help advance research in this field. All the materials, including collected models, datasets, benchmarking results, and supplemented light field datasets, will be publicly available on our project site https://github.com/kerenfu/LFSOD-Survey
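    For context on the benchmarking above: light field SOD models are typically scored with the same metrics as RGB and RGB-D SOD. A minimal sketch of two of the most common ones, mean absolute error and the adaptive-threshold F-measure, follows. The abstract does not state which metrics the survey uses, so the conventions here (saliency maps normalized to [0, 1], threshold at twice the mean saliency, beta^2 = 0.3) are standard choices from the SOD literature, not necessarily the survey's.

```python
import numpy as np

def mae(sal, gt):
    """Mean absolute error between a saliency map and a binary mask."""
    return np.abs(sal - gt).mean()

def f_measure(sal, gt, beta2=0.3):
    """F-measure at an adaptive threshold (2x mean saliency), with
    beta^2 = 0.3 as is conventional in the SOD literature."""
    thresh = min(2 * sal.mean(), 1.0)
    pred = sal >= thresh
    tp = np.logical_and(pred, gt > 0.5).sum()
    precision = tp / (pred.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)
```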

    Evidence for Diffuse Central Retinal Edema In Vivo in Diabetic Male Sprague Dawley Rats

    Background: Investigations into the mechanism of diffuse retinal edema in diabetic subjects have been limited by a lack of animal models and techniques that co-localize retinal thickness and hydration in vivo. In this study we test the hypothesis that a previously reported supernormal central retinal thickness on MRI, measured in experimental diabetic retinopathy in vivo, represents a persistent and diffuse edema. Methodology/Principal Findings: In diabetic and age-matched control rats, and in rats experiencing dilutional hyponatremia (as a positive edema control), whole central retinal thickness, intraretinal water content, and apparent diffusion coefficients (ADC, 'water mobility') were measured in vivo using quantitative MRI methods. Glycated hemoglobin and retinal thickness ex vivo (histology) were also measured in control and diabetic groups. In the dilutional hyponatremia model, central retinal thickness and water content were supernormal by quantitative MRI, and intraretinal water mobility profiles changed in a manner consistent with intracellular edema. Groups of diabetic rats (2, 3, 4, 6, and 9 mo of diabetes) and age-matched controls were then investigated with MRI, and all diabetic rats showed supernormal whole central retinal thickness. In a separate study in 4 mo diabetic rats (and controls), MRI retinal thickness and water content metrics were significantly greater than normal, and ADC was subnormal in the outer retina; the increase in retinal thickness was not detected histologically on sections of fixed and dehydrated retinas from these rats.
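    The ADC ('water mobility') maps referred to above are conventionally derived from diffusion-weighted MRI via the mono-exponential model S(b) = S0 * exp(-b * ADC). A minimal two-point sketch follows; the b-values are illustrative assumptions, not those used in the study.

```python
import numpy as np

def adc_two_point(s_low, s_high, b_low=0.0, b_high=1000.0):
    """Apparent diffusion coefficient from two diffusion weightings,
    assuming the mono-exponential model S(b) = S0 * exp(-b * ADC).

    s_low, s_high: signal images acquired at b_low and b_high (s/mm^2)
    Returns the ADC map in mm^2/s.
    """
    eps = 1e-8  # avoid division by zero and log of zero
    return np.log((s_low + eps) / (s_high + eps)) / (b_high - b_low)
```

    Under this model, a subnormal ADC, as reported here in the outer retina, corresponds by definition to reduced water mobility.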

    Exploiting Multimodal Information in Deep Learning

    Humans are good at using multimodal information to perceive and interact with the world; such information includes visual, auditory, and kinesthetic signals. Despite the advances in single-modality deep learning over the past decade, relatively few works have focused on multimodal learning, and even among those, most consider only a small number of modalities. This dissertation investigates several distinct forms of multimodal learning: multiple visual modalities as input, audio-visual multimodal input, and visual and proprioceptive (kinesthetic) multimodal input.

    In the first task, we investigate synthesizing light fields from a single RGB image and its estimated depth. Synthesizing novel views (light fields) from a single image is very challenging, since the depth information is lost, and depth information is crucial for view synthesis. We propose to use a pre-trained model to estimate the depth, and then fuse the depth information together with the RGB image to generate the light fields. Our experiments showed that the multimodal input (RGB image and depth) significantly improved performance over single-image input.

    In the second task, we focus on face recognition for low quality videos. For low quality videos such as low-resolution online videos and surveillance videos, recognizing faces from video frames alone is very challenging. We propose to use the audio information in the video clip to aid the face recognition task. To achieve this goal, we propose the Audio-Visual Aggregation Network (AVAN), which aggregates audio features and visual features using an attention mechanism. Empirical results show that our approach using both visual and audio information significantly improves face recognition accuracy on unconstrained videos.

    Finally, in the third task, we propose to use visual, proprioceptive, and kinesthetic inputs to learn to construct and use tools. The use of tools in animals indicates high levels of cognitive capability; aside from humans, it is observed only in a small number of higher mammals and avian species, and constructing novel tools is an even more challenging task. Learning this task with only visual input is difficult, so we propose to use visual and proprioceptive (kinesthetic) inputs to accelerate learning. We build a physically simulated environment for the tool construction task and introduce a hierarchical reinforcement learning approach to learn to construct tools and reach the target, without any prior knowledge.

    The main contribution of this dissertation is the investigation of multiple scenarios where multimodal processing leads to enhanced performance. We expect the specific methods developed in this work, such as the extraction of hidden modalities (depth), the use of attention, and hierarchical rewards, to help us better understand multimodal processing in deep learning.
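    The abstract describes AVAN only at the level of "aggregate audio features and visual features using an attention mechanism". As an illustration of that general idea, not the actual AVAN architecture, the sketch below pools a variable number of per-frame feature vectors into a single descriptor with learned attention weights; all names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AttentionAggregator(nn.Module):
    """Pools N feature vectors into one descriptor via learned attention.
    Illustrative only; not the AVAN architecture from the dissertation."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        # Small MLP that scores the importance of each feature vector
        self.score = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, dim) stacked audio and visual frame features
        weights = torch.softmax(self.score(feats), dim=0)  # (N, 1)
        return (weights * feats).sum(dim=0)                # (dim,)
```

    The attention weights let the aggregate emphasize informative frames (e.g., sharp, frontal faces or clean audio segments) and down-weight degraded ones, which is the usual motivation for attention pooling on unconstrained video.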

    Shape Dynamical Models for Activity Recognition and Coded Aperture Imaging for Light-Field Capture

    Classical applications of pattern recognition in image processing and computer vision have typically dealt with modeling, learning, and recognizing static patterns in images and videos. There are, of course, in nature, a whole class of patterns that dynamically evolve over time: human activities, behaviors of insects and animals, facial expression changes, lip reading, and genetic expression profiles are some examples. Models and algorithms to study these patterns must take the dynamics of these patterns into account while exploiting classical pattern recognition techniques.

    The first part of this dissertation is an attempt to model and recognize such dynamically evolving patterns. We look at specific instances of such dynamic patterns, like human activities and behaviors of insects, and develop algorithms to learn models of such patterns and to classify them. The proposed models and algorithms are validated by extensive experiments on gait-based person identification, activity recognition, and simultaneous tracking and behavior analysis of insects. The problem of comparing dynamically deforming shape sequences arises repeatedly in problems like activity recognition and lip reading. We describe and evaluate parametric and non-parametric models for shape sequences. In particular, we emphasize the need to model activity execution rate variations and propose a non-parametric model that is insensitive to such variations. These models and the resulting algorithms are shown to be extremely effective for a wide range of applications, from gait-based person identification to human action recognition. We further show that the shape dynamical models are not only effective for the problem of recognition, but can also be used as effective priors for the problem of simultaneous tracking and behavior analysis. We validate the proposed algorithm for performing simultaneous behavior analysis and tracking on videos of bees dancing in a hive.

    In the last part of this dissertation, we investigate computational imaging, an emerging field where the process of image formation involves the use of a computer. The current trend in computational imaging is to capture as much information about the scene as possible at capture time, so that images with varying focus, aperture, blur, and colorimetric settings may be rendered as required. In this regard, capturing the 4D light field, as opposed to a 2D image, allows us to freely vary viewpoint and focus at the time of rendering an image. We describe a theoretical framework for reversibly modulating 4D light fields using an attenuating mask in the optical path of a lens-based camera. Based on this framework, we present a novel design to reconstruct the 4D light field from a 2D camera image without any additional refractive elements, as required by previous light field cameras. The patterned mask attenuates light rays inside the camera instead of bending them, and the attenuation recoverably encodes the rays on the 2D sensor. Our mask-equipped camera focuses just as a traditional camera does, capturing conventional 2D photos at full sensor resolution, but the raw pixel values also hold a modulated 4D light field. The light field can be recovered by rearranging the tiles of the 2D Fourier transform of the sensor values into 4D planes and computing the inverse Fourier transform. In addition, one can also recover full-resolution image information for the in-focus parts of the scene.
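    The Fourier-domain recovery step described at the end of this abstract admits a compact sketch. Assuming the mask multiplexes an s x s grid of angular samples into the sensor spectrum as a regular tiling (the actual tile layout and weights depend on the mask design), the recovery is: take the 2D FFT of the sensor image, rearrange its tiles into 4D planes, and inverse transform.

```python
import numpy as np

def recover_light_field(sensor, s=3):
    """Recover an s x s angular light field from a mask-modulated photo.

    sensor: (s*H, s*W) raw 2D sensor image
    Returns an (s, s, H, W) array of sub-aperture views (illustrative
    tiling; real heterodyne mask designs differ in tile placement).
    """
    SH, SW = sensor.shape
    H, W = SH // s, SW // s
    spectrum = np.fft.fftshift(np.fft.fft2(sensor))      # centered spectrum
    # Split the 2D spectrum into an s x s grid of (H, W) tiles -> 4D planes
    tiles = spectrum.reshape(s, H, s, W).transpose(0, 2, 1, 3)
    views = np.fft.ifft2(np.fft.ifftshift(tiles, axes=(-2, -1)),
                         axes=(-2, -1))
    return views.real
```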

    Remote refocusing light-sheet fluorescence microscopy for high-speed 2D and 3D imaging of calcium dynamics in cardiomyocytes

    The high prevalence and poor prognosis of heart failure are two key drivers for research into cardiac electrophysiology and regeneration. Dyssynchrony in calcium release and loss of structural organization within individual cardiomyocytes (CM) have been linked to reduced contractile strength and arrhythmia. Correlating calcium dynamics and cell microstructure requires multidimensional imaging with high spatiotemporal resolution. In light-sheet fluorescence microscopy (LSFM), selective plane illumination enables fast optically sectioned imaging with lower phototoxicity, making it suitable for imaging subcellular dynamics. In this work, a custom remote refocusing LSFM system is applied to studying calcium dynamics in isolated CM, cardiac cell cultures, and tissue slices. The spatial resolution of the LSFM system was modelled and experimentally characterized. Simulation of the illumination path in Zemax was used to estimate the light-sheet beam waist and confocal parameter. Automated MATLAB-based image analysis was used to quantify the optical sectioning and the 3D point spread function using Gaussian fitting of bead image intensity distributions. The results demonstrated improved and more uniform axial resolution and optical sectioning with the tighter focused beam used for axially swept light-sheet microscopy. High-speed dual-channel LSFM was used for 2D imaging of calcium dynamics in correlation with the t-tubule structure in left and right ventricle cardiomyocytes at 395 fps. The high spatiotemporal resolution enabled the characterization of calcium sparks. The use of para-nitro-blebbistatin (NBleb), a non-phototoxic, low-fluorescence contraction uncoupler, allowed 2D mapping of the spatial dyssynchrony of calcium transient development across the cell. Finally, aberration-free remote refocusing was used for high-speed volumetric imaging of calcium dynamics in human induced pluripotent stem-cell derived cardiomyocytes (hiPSC-CM) and their co-culture with adult CM. 3D imaging at up to 8 Hz demonstrated the synchronization of calcium transients in co-culture, with increased coupling at longer co-culture durations, uninhibited by motion uncoupling with NBleb.
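    The Gaussian fitting used above to quantify the point spread function has a standard one-dimensional form: fit a Gaussian to a bead intensity profile and report the full width at half maximum, FWHM = 2*sqrt(2 ln 2)*sigma (about 2.355*sigma). A minimal sketch follows, in Python rather than the MATLAB used in the work; the function and variable names are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, mu, sigma, offset):
    return amp * np.exp(-((x - mu) ** 2) / (2 * sigma**2)) + offset

def fit_fwhm(z, intensity):
    """Fit a 1D Gaussian to an axial bead intensity profile and return
    the full width at half maximum (in the same units as z)."""
    p0 = [intensity.max() - intensity.min(),   # amplitude guess
          z[np.argmax(intensity)],             # center at the peak
          (z[-1] - z[0]) / 10,                 # rough width guess
          intensity.min()]                     # background offset
    (amp, mu, sigma, offset), _ = curve_fit(gaussian, z, intensity, p0=p0)
    return 2 * np.sqrt(2 * np.log(2)) * abs(sigma)
```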

    Preclinical MRI of the Kidney

    This Open Access volume provides readers with an open access protocol collection and wide-ranging recommendations for preclinical renal MRI used in translational research. The chapters in this book are interdisciplinary in nature and bridge the gaps between physics, physiology, and medicine. They are designed to enhance training in renal MRI sciences and improve the reproducibility of renal imaging research. Chapters provide guidance for exploring, using and developing small animal renal MRI in your laboratory as a unique tool for advanced in vivo phenotyping, diagnostic imaging, and research into potential new therapies. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Cutting-edge and thorough, Preclinical MRI of the Kidney: Methods and Protocols is a valuable resource and will be of importance to anyone interested in the preclinical aspect of renal and cardiorenal diseases in the fields of physiology, nephrology, radiology, and cardiology. This publication is based upon work from COST Action PARENCHIMA, supported by European Cooperation in Science and Technology (COST). COST (www.cost.eu) is a funding agency for research and innovation networks. COST Actions help connect research initiatives across Europe and enable scientists to grow their ideas by sharing them with their peers, boosting their research, careers, and innovation. PARENCHIMA (renalmri.org) is a community-driven Action in the COST program of the European Union, which unites more than 200 experts in renal MRI from 30 countries with the aim to improve the reproducibility and standardization of renal MRI biomarkers.