
    Self-Supervised Relative Depth Learning for Urban Scene Understanding

    As an agent moves through the world, the apparent motion of scene elements is (usually) inversely proportional to their depth. It is natural for a learning agent to associate image patterns with the magnitude of their displacement over time: as the agent moves, faraway mountains don't move much; nearby trees move a lot. This natural relationship between the appearance of objects and their motion is a rich source of information about the world. In this work, we start by training a deep network, using fully automatic supervision, to predict relative scene depth from single images. The relative depth training images are automatically derived from simple videos of cars moving through a scene, using recent motion segmentation techniques, and no human-provided labels. This proxy task of predicting relative depth from a single image induces features in the network that yield large improvements over a network trained from scratch on a set of downstream tasks, including semantic segmentation, joint road segmentation and car detection, and monocular (absolute) depth estimation. The improvement on the semantic segmentation task is greater than that produced by any other automatically supervised method. Moreover, for monocular depth estimation, our unsupervised pre-training even outperforms supervised pre-training with ImageNet. In addition, we demonstrate benefits from learning to predict (unsupervised) relative depth in the specific videos associated with various downstream tasks, adapting to the scenes in those tasks in an unsupervised manner to improve performance. In summary, for semantic segmentation we present state-of-the-art results among methods that do not use supervised pre-training, and for monocular depth estimation we even exceed the performance of supervised ImageNet-pre-trained models, achieving results comparable with state-of-the-art methods.
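
    A minimal sketch of the underlying idea, assuming a mostly translating camera and a precomputed optical flow field (the function below is illustrative, not the authors' pipeline): apparent motion is roughly inversely proportional to depth, so the inverse flow magnitude yields a per-pixel relative-depth ordering that can serve as a training target.

    ```python
    import numpy as np

    def relative_depth_targets(flow, eps=1e-6):
        """flow: (H, W, 2) optical flow between consecutive frames."""
        magnitude = np.linalg.norm(flow, axis=-1)   # apparent pixel motion
        inv = 1.0 / (magnitude + eps)               # faraway -> large value
        # Rank-normalize so the target encodes relative, not absolute, depth.
        ranks = inv.ravel().argsort().argsort()
        return (ranks / ranks.max()).reshape(inv.shape)
    ```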

    Incorporating Relaxivities to More Accurately Reconstruct MR Images

    Purpose To develop a mathematical model that incorporates the magnetic resonance relaxivities into the image reconstruction process in a single step. Materials and methods In magnetic resonance imaging, the complex-valued measurements of the acquired signal at each point in frequency space are expressed as a Fourier transformation of the proton spin density weighted by Fourier encoding anomalies: T2⁎, T1, and a phase determined by magnetic field inhomogeneity (∆B), according to the MR signal equation. Such anomalies alter the expected symmetry and the signal strength of the k-space observations, resulting in images distorted by warping, blurring, and loss of intensity. Although the T1 tissue relaxation time provides valuable quantitative information on tissue characteristics, the T1 recovery term is typically neglected by assuming a long repetition time. In this study, the linear framework presented in the work of Rowe et al., 2007, and of Nencka et al., 2009 is extended to develop a Fourier reconstruction operation, in terms of a real-valued isomorphism, that incorporates the effects of T2⁎, ∆B, and T1. This framework makes it possible to precisely quantify the statistical properties of the corrected image-space data by offering a linear relationship between the observed frequency-space measurements and the reconstructed, corrected image-space measurements. The model is illustrated both on theoretical data generated by considering T2⁎, T1, and/or ∆B effects, and on experimentally acquired fMRI data, focusing on the incorporation of T1. A comparison is also made between the activation statistics computed from the reconstructed data with and without the incorporation of T1 effects. Results Accounting for T1 effects in image reconstruction is shown to recover image contrast that exists prior to T1 equilibrium. The incorporation of T1 is also shown to induce negligible correlation in the reconstructed images and to preserve functional activations. Conclusion With the proposed method, the effects of T2⁎ and ∆B can be corrected, and T1 can be incorporated into the time-series image-space data during image reconstruction in a single step. Incorporating T1 provides improved tissue segmentation over the course of the time series and can therefore improve the precision of motion correction and image registration.
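
    For context, a schematic form of the MR signal equation referenced above (the notation here is assumed, not copied from the paper): the k-space observation is the Fourier transform of the spin density $\rho$ weighted by T1 recovery, T2⁎ decay, and a phase from the field inhomogeneity $\Delta B$,

    $$ s(\mathbf{k}) = \int \rho(\mathbf{r})\,\bigl(1 - e^{-T_R/T_1(\mathbf{r})}\bigr)\, e^{-T_E/T_2^{*}(\mathbf{r})}\, e^{\,i\gamma\,\Delta B(\mathbf{r})\,T_E}\, e^{-i 2\pi\,\mathbf{k}\cdot\mathbf{r}}\, d\mathbf{r}. $$

    Dropping the recovery factor by assuming $T_R \gg T_1$ gives the usual simplification that the paper avoids.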

    Evaluation of Motion Artifact Metrics for Coronary CT Angiography

    Purpose This study quantified the performance of coronary artery motion artifact metrics relative to human observer ratings. Motion artifact metrics have been used as part of motion correction and best-phase selection algorithms for Coronary Computed Tomography Angiography (CCTA). However, the lack of ground truth makes it difficult to validate how well these metrics quantify the level of motion artifact. This study investigated five motion artifact metrics, including two novel metrics, using a dynamic phantom, clinical CCTA images, and an observer study that provided ground-truth motion artifact scores from a series of pairwise comparisons. Methods Five motion artifact metrics were calculated for the coronary artery regions on both phantom and clinical CCTA images: positivity, entropy, normalized circularity, Fold Overlap Ratio (FOR), and Low-Intensity Region Score (LIRS). CT images were acquired of a dynamic cardiac phantom that simulated cardiac motion and contained six iodine-filled vessels of varying diameter, with regions of soft plaque and calcifications. Scans were repeated with different gantry start angles, and images were reconstructed at five phases of the motion cycle. Clinical images were acquired from 14 CCTA exams with patient heart rates ranging from 52 to 82 bpm. The vessel and shading artifacts were manually segmented by three readers and combined to create ground-truth artifact regions. Motion artifact levels were also assessed by readers using a pairwise comparison method to establish a ground-truth reader score. Kendall's Tau coefficients were calculated to evaluate the statistical agreement in ranking between the motion artifact metrics and the reader scores, and linear regression between the reader scores and the metrics was also performed. Results On phantom images, the Kendall's Tau coefficients of the five motion artifact metrics were 0.50 (normalized circularity), 0.35 (entropy), 0.82 (positivity), 0.77 (FOR), and 0.77 (LIRS), where a higher Kendall's Tau signifies higher agreement. The FOR, LIRS, and transformed positivity (the fourth root of positivity) were further evaluated on the clinical images, where the Kendall's Tau coefficients were 0.59 (FOR), 0.53 (LIRS), and 0.21 (transformed positivity). On the clinical data, a Motion Artifact Score, defined as the product of the FOR and LIRS metrics, further improved agreement with the reader scores, with a Kendall's Tau coefficient of 0.65. Conclusion The FOR and LIRS metrics, and the product of the two, provided the highest agreement in motion artifact ranking relative to the readers, and the highest linear correlation to the reader scores. The validated motion artifact metrics may be useful for developing and evaluating methods to reduce motion artifacts in CCTA images.
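
    A hedged sketch of the evaluation step described above, operating on precomputed metric values (the FOR and LIRS computations themselves are not reproduced here): the combined Motion Artifact Score is the product of the two metrics, and agreement with the ground-truth reader ranking is measured with Kendall's Tau.

    ```python
    from scipy.stats import kendalltau

    def motion_artifact_score(fold_overlap_ratio, low_intensity_region_score):
        # The combined score reported in the study is the product of FOR and LIRS.
        return fold_overlap_ratio * low_intensity_region_score

    def rank_agreement(metric_values, reader_scores):
        # Kendall's Tau: 1.0 means the metric ranks images exactly as readers do.
        tau, _p_value = kendalltau(metric_values, reader_scores)
        return tau
    ```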

    Cellular tracking in time-lapse phase contrast images

    The quantitative analysis of live cells is a key issue in evaluating biological processes. Current clinical practice involves applying a tedious and time-consuming manual tracking procedure to large amounts of data. As a result, automatic tracking systems are being developed and evaluated. However, problems caused by cellular division, agglomeration, Brownian motion, and topology changes are difficult issues that have to be accommodated by automatic tracking techniques. In this paper, we detail the development of a fully automated multi-target tracking system that is able to deal with Brownian motion and cellular division. During the tracking process, our approach incorporates neighbourhood relationships and motion history to enforce continuity of the cellular tracks in the spatial and temporal domains. The experimental results reported in this paper indicate that our method is able to accurately track cellular structures in time-lapse data.
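
    A minimal sketch of one frame-to-frame linking step such a tracker might use (assumed here, not the authors' exact method): detections in consecutive frames are matched by minimizing total centroid distance, and unmatched current-frame detections become candidate divisions or newly entering cells.

    ```python
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def link_frames(prev_centroids, curr_centroids, max_dist=20.0):
        """Both arguments: (N, 2) arrays of cell centroids; returns index pairs."""
        cost = np.linalg.norm(
            prev_centroids[:, None, :] - curr_centroids[None, :, :], axis=-1)
        rows, cols = linear_sum_assignment(cost)
        # Reject implausible jumps; leftover detections may be divisions.
        return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
    ```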

    Joint Optical Flow and Temporally Consistent Semantic Segmentation

    The importance of and demands on visual scene understanding have been steadily increasing along with the active development of autonomous systems. Consequently, a large amount of research has been dedicated to semantic segmentation and dense motion estimation. In this paper, we propose a method for jointly estimating optical flow and temporally consistent semantic segmentation, which closely connects the two problem domains and lets each leverage the other. Semantic segmentation provides information on plausible physical motion to its associated pixels, and accurate pixel-level temporal correspondences enhance the accuracy of semantic segmentation in the temporal domain. We demonstrate the benefits of our approach on the KITTI benchmark, where we observe performance gains for both flow and segmentation. We achieve state-of-the-art optical flow results and outperform all published algorithms by a large margin on challenging, but crucial, dynamic objects. (Comment: 14 pages; accepted for the CVRSUAD workshop at ECCV 2016.)
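
    One direction of the coupling can be sketched as follows (an illustrative simplification, not the paper's model): warping the previous frame's class probabilities along the estimated flow yields a temporal prior that is fused with the current frame's prediction.

    ```python
    import numpy as np

    def warp_probs(prev_probs, flow):
        """prev_probs: (H, W, C) class probabilities; flow: (H, W, 2) forward flow."""
        h, w = flow.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w]
        # Nearest-neighbor backward lookup; bilinear sampling would be smoother.
        src_x = np.clip(np.rint(xs - flow[..., 0]).astype(int), 0, w - 1)
        src_y = np.clip(np.rint(ys - flow[..., 1]).astype(int), 0, h - 1)
        return prev_probs[src_y, src_x]

    def fuse(curr_probs, warped_probs, alpha=0.7):
        # Convex combination keeps the result a valid probability map.
        return alpha * curr_probs + (1 - alpha) * warped_probs
    ```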

    Dynamic Body VSLAM with Semantic Constraints

    Image-based reconstruction of urban environments is a challenging problem that involves optimizing a large number of variables and has several sources of error, such as the presence of dynamic objects. Since most large-scale approaches assume they observe static scenes, dynamic objects are relegated to the noise-modeling section of such systems. This is an approach of convenience, since the RANSAC-based framework used to compute most multiview geometric quantities for static scenes naturally confines dynamic objects to the class of outlier measurements. However, reconstructing dynamic objects along with the static environment gives us a complete picture of an urban environment; such understanding can then be used for important robotic tasks like path planning for autonomous navigation, obstacle tracking and avoidance, and other areas. In this paper, we propose a system for robust SLAM that works in both static and dynamic environments. To overcome the challenge of dynamic objects in the scene, we propose a new model that incorporates semantic constraints into the reconstruction algorithm. While some of these constraints are based on multi-layered dense CRFs trained over appearance as well as motion cues, others can be expressed as additional terms in the bundle adjustment optimization, which iteratively refines the 3D structure and the camera/object motion trajectories. We show results on the challenging KITTI urban dataset for the accuracy of motion segmentation and of the reconstructed trajectory and shape of moving objects relative to ground truth. We achieve a significant reduction in average relative error for moving-object trajectory reconstruction compared to state-of-the-art methods such as VISO 2, as well as standard bundle adjustment algorithms.
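
    Schematically (the notation here is assumed, not taken from the paper), the augmented objective adds semantic terms to the standard reprojection error minimized by bundle adjustment,

    $$ E = \sum_{i,j} \rho\!\left(\lVert \pi(C_j, X_i) - x_{ij} \rVert^2\right) + \lambda \sum_k \phi_{\mathrm{sem}}(O_k), $$

    where $\pi(C_j, X_i)$ projects point $X_i$ into camera $C_j$, $x_{ij}$ is the observed feature, $\rho$ is a robust loss, and $\phi_{\mathrm{sem}}$ penalizes object states $O_k$ that contradict the appearance and motion cues from the dense CRF.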

    Recursive image sequence segmentation by hierarchical models

    This paper addresses the problem of image sequence segmentation. A technique using a sequence model based on compound random fields is presented. The technique is recursive in the sense that frames are processed at the same cadence as they are produced. New regions appearing in the sequence are detected by a morphological procedure.
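
    A loose sketch of the new-region detection step (assumed here; the paper's compound-random-field model is not reproduced): regions propagated from the previous frame leave a residual wherever something new appears, and morphological cleaning of that residual seeds the new regions.

    ```python
    import numpy as np
    from scipy import ndimage

    def detect_new_regions(curr_frame, predicted_frame, thresh=30, min_size=25):
        residual = np.abs(curr_frame.astype(int) - predicted_frame.astype(int))
        mask = ndimage.binary_opening(residual > thresh)  # remove speckle noise
        labels, n = ndimage.label(mask)
        sizes = ndimage.sum(mask, labels, range(1, n + 1))
        keep_ids = np.flatnonzero(sizes >= min_size) + 1
        return np.isin(labels, keep_ids)  # seeds for newly appearing regions
    ```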

    Unsupervised dynamic modeling of medical image transformation

    Spatiotemporal imaging has applications in, e.g., cardiac diagnostics, surgical guidance, and radiotherapy monitoring. In this paper, we explain the temporal motion in such sequences by identifying the underlying dynamics, based only on the sequential images. Our dynamical model maps the observed high-dimensional sequential images to a low-dimensional latent space, in which a linear relationship holds between a hidden state process and the lower-dimensional representation of the inputs. Specifically, we use a conditional variational auto-encoder (CVAE) to nonlinearly map each higher-dimensional image to a lower-dimensional space, in which we model the dynamics with a linear Gaussian state-space model (LG-SSM). The model, a modified version of the Kalman variational auto-encoder, is end-to-end trainable, and the weights of both the CVAE and the LG-SSM are updated simultaneously by maximizing the evidence lower bound of the marginal likelihood. In contrast to the original model, we explain the motion with a spatial transformation from one image to another, which results in sharper reconstructions and the possibility of transferring auxiliary information, such as segmentations, through the image sequence. Our experiments on cardiac ultrasound time series show that the dynamic model outperforms traditional image registration in execution time at similar performance, and it additionally offers the possibility to impute and extrapolate missing samples. (Comment: published in the 2022 25th International Conference on Information Fusion (FUSION).)
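
    Schematically (the notation here is assumed), the latent dynamics take the form of a linear Gaussian state-space model on the low-dimensional encodings $a_t$ of the images $x_t$:

    $$ z_t = A z_{t-1} + w_t, \quad w_t \sim \mathcal{N}(0, Q), \qquad a_t = C z_t + v_t, \quad v_t \sim \mathcal{N}(0, R), $$

    with the CVAE decoder generating $x_t$ from $a_t$; training maximizes the evidence lower bound on $\log p(x_{1:T})$ jointly over the CVAE weights and the LG-SSM parameters $(A, C, Q, R)$.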