Self-Supervised Relative Depth Learning for Urban Scene Understanding
As an agent moves through the world, the apparent motion of scene elements is
(usually) inversely proportional to their depth. It is natural for a learning
agent to associate image patterns with the magnitude of their displacement over
time: as the agent moves, faraway mountains don't move much; nearby trees move
a lot. This natural relationship between the appearance of objects and their
motion is a rich source of information about the world. In this work, we start
by training a deep network, using fully automatic supervision, to predict
relative scene depth from single images. The relative depth training images are
automatically derived from simple videos of cars moving through a scene, using
recent motion segmentation techniques, and no human-provided labels. This proxy
task of predicting relative depth from a single image induces features in the
network that result in large improvements in a set of downstream tasks
including semantic segmentation, joint road segmentation and car detection, and
monocular (absolute) depth estimation, over a network trained from scratch. The
improvement on the semantic segmentation task is greater than those produced by
any other automatically supervised methods. Moreover, for monocular depth
estimation, our unsupervised pre-training method even outperforms supervised
pre-training with ImageNet. In addition, we demonstrate benefits from learning
to predict (unsupervised) relative depth in the specific videos associated with
various downstream tasks. We adapt to the specific scenes in those tasks in an
unsupervised manner to improve performance. In summary, for semantic
segmentation, we present state-of-the-art results among methods that do not use
supervised pre-training, and we even exceed the performance of supervised
ImageNet pre-trained models for monocular depth estimation, achieving results
that are comparable with state-of-the-art methods.
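The inverse relation between apparent motion and depth stated in this abstract can be illustrated with a small sketch. It assumes a pinhole camera translating sideways, so that a static point at depth Z shifts by roughly f * t / Z pixels. The function names, focal length, and distances are hypothetical, and this is not the authors' training pipeline, which derives its relative depth labels from motion segmentation.

```python
def displacement_px(depth_m, focal_px=700.0, translation_m=0.5):
    """Approximate image shift (pixels) of a static point at depth_m for a
    sideways camera translation, under a pinhole model (an assumption)."""
    return focal_px * translation_m / depth_m


def relative_depth(displacements):
    """Convert displacement magnitudes into depth up to an unknown scale.

    Depth is taken as proportional to 1 / displacement and normalised so
    that the nearest point has relative depth 1.
    """
    inverse = [1.0 / d for d in displacements]
    nearest = min(inverse)
    return [v / nearest for v in inverse]


# Hypothetical scene: a tree 10 m away and a mountain 5 km away.
tree = displacement_px(10.0)
mountain = displacement_px(5000.0)
print(tree, mountain)
print(relative_depth([tree, mountain]))
```

The sketch only recovers depth ratios, which is why the abstract speaks of relative rather than absolute depth.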
Incorporating Relaxivities to More Accurately Reconstruct MR Images
Purpose
To develop a mathematical model that incorporates the magnetic resonance relaxivities into the image reconstruction process in a single step.
Materials and methods
In magnetic resonance imaging, the complex-valued measurements of the acquired signal at each point in frequency space are expressed as a Fourier transformation of the proton spin density weighted by Fourier encoding anomalies: T2*, T1, and a phase determined by magnetic field inhomogeneity (ΔB) according to the MR signal equation. Such anomalies alter the expected symmetry and the signal strength of the k-space observations, resulting in images distorted by image warping, blurring, and loss in image intensity. Although the T1 relaxation time provides valuable quantitative information on tissue characteristics, the T1 recovery term is typically neglected by assuming a long repetition time. In this study, the linear framework presented in the work of Rowe et al., 2007, and of Nencka et al., 2009 is extended to develop a Fourier reconstruction operation in terms of a real-valued isomorphism that incorporates the effects of T2*, ΔB, and T1. This framework provides a way to precisely quantify the statistical properties of the corrected image-space data by offering a linear relationship between the observed frequency space measurements and reconstructed corrected image-space measurements. The model is illustrated both on theoretical data generated by considering T2*, T1, and/or ΔB effects, and on experimentally acquired fMRI data by focusing on the incorporation of T1. A comparison is also made between the activation statistics computed from the reconstructed data with and without the incorporation of T1 effects.
Result
Accounting for T1 effects in image reconstruction is shown to recover image contrast that exists prior to T1 equilibrium. The incorporation of T1 is also shown to induce negligible correlation in reconstructed images and preserve functional activations.
Conclusion
With the use of the proposed method, the effects of T2* and ΔB can be corrected, and T1 can be incorporated into the time series image-space data during image reconstruction in a single step. Incorporation of T1 provides improved tissue segmentation over the course of time series and therefore can improve the precision of motion correction and image registration.
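For orientation, one commonly written form of the MR signal equation with the three weighting terms named in this abstract is sketched below. The abstract does not give the equation itself, so the symbols TR (repetition time), t (time after excitation), and γ (gyromagnetic ratio), as well as the exact arrangement of the terms, are assumptions.

```latex
s(k_x, k_y) = \iint \rho(x, y)\,
  \left(1 - e^{-TR / T_1(x, y)}\right)
  e^{-t / T_2^{*}(x, y)}\,
  e^{\,i \gamma \Delta B(x, y)\, t}\,
  e^{-i 2\pi (k_x x + k_y y)}\, dx\, dy
```

As TR grows large relative to T1, the factor in parentheses approaches 1, which corresponds to the long repetition time assumption under which the T1 recovery term is typically neglected.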
Evaluation of Motion Artifact Metrics for Coronary CT Angiography
Purpose
This study quantified the performance of coronary artery motion artifact metrics relative to human observer ratings. Motion artifact metrics have been used as part of motion correction and best-phase selection algorithms for Coronary Computed Tomography Angiography (CCTA). However, the lack of ground truth makes it difficult to validate how well the metrics quantify the level of motion artifact. This study investigated five motion artifact metrics, including two novel metrics, using a dynamic phantom, clinical CCTA images, and an observer study that provided ground-truth motion artifact scores from a series of pairwise comparisons.
Method
Five motion artifact metrics were calculated for the coronary artery regions on both phantom and clinical CCTA images: positivity, entropy, normalized circularity, Fold Overlap Ratio (FOR), and Low-Intensity Region Score (LIRS). CT images were acquired of a dynamic cardiac phantom that simulated cardiac motion and contained six iodine-filled vessels of varying diameter and with regions of soft plaque and calcifications. Scans were repeated with different gantry start angles. Images were reconstructed at five phases of the motion cycle. Clinical images were acquired from 14 CCTA exams with patient heart rates ranging from 52 to 82 bpm. The vessel and shading artifacts were manually segmented by three readers and combined to create ground-truth artifact regions. Motion artifact levels were also assessed by readers using a pairwise comparison method to establish a ground-truth reader score. Kendall's Tau coefficients were calculated to evaluate the statistical agreement in ranking between the motion artifact metrics and reader scores. Linear regression between the reader scores and the metrics was also performed.
Results
On phantom images, the Kendall's Tau coefficients of the five motion artifact metrics were 0.50 (normalized circularity), 0.35 (entropy), 0.82 (positivity), 0.77 (FOR), 0.77 (LIRS), where higher Kendall's Tau signifies higher agreement. The FOR, LIRS, and transformed positivity (the fourth root of the positivity) were further evaluated in the study of clinical images. The Kendall's Tau coefficients of the selected metrics were 0.59 (FOR), 0.53 (LIRS), and 0.21 (Transformed positivity). In the study of clinical data, a Motion Artifact Score, defined as the product of FOR and LIRS metrics, further improved agreement with reader scores, with a Kendall's Tau coefficient of 0.65.
Conclusion
The metrics of FOR, LIRS, and the product of the two metrics provided the highest agreement in motion artifact ranking when compared to the readers, and the highest linear correlation to the reader scores. The validated motion artifact metrics may be useful for developing and evaluating methods to reduce motion in Coronary Computed Tomography Angiography (CCTA) images.
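The rank agreement analysis described in this abstract can be sketched as follows. This is a minimal illustration, assuming the tau-a variant of Kendall's Tau (the abstract does not say which variant was used); the function names and all numeric values are hypothetical and are not data from the study.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs contribute zero. The choice of tau-a is an assumption.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    score = 0
    for i, j in combinations(range(len(xs)), 2):
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        score += (product > 0) - (product < 0)
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return score / n_pairs


def motion_artifact_score(for_values, lirs_values):
    """Product of the FOR and LIRS metrics, as defined in the abstract."""
    return [f * l for f, l in zip(for_values, lirs_values)]


# Hypothetical metric values and reader scores for four images.
for_vals = [0.9, 0.7, 0.4, 0.2]
lirs_vals = [0.8, 0.9, 0.5, 0.1]
reader = [4.0, 3.0, 2.0, 1.0]
print(kendall_tau_a(motion_artifact_score(for_vals, lirs_vals), reader))
```

A rank-based coefficient suits this comparison because the reader scores come from pairwise comparisons, which establish an ordering rather than an absolute scale.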
Cellular tracking in time-lapse phase contrast images
The quantitative analysis of live cells is a key issue in evaluating biological processes. The current clinical practice involves the application of a tedious and time-consuming manual tracking procedure to large amounts of data. As a result, automatic tracking systems are currently being developed and evaluated. However, problems caused by cellular division, agglomeration, Brownian motion and topology changes are difficult issues that have to be accommodated by automatic tracking techniques. In this paper, we detail the development of a fully automated multi-target tracking system that is able to deal with Brownian motion and cellular division. During the tracking process our approach includes the neighbourhood relationship and motion history to enforce the cellular tracking continuity in the spatial and temporal domain. The experimental results reported in this paper indicate that our method is able to accurately track cellular structures in time-lapse data.
Joint Optical Flow and Temporally Consistent Semantic Segmentation
The importance and demands of visual scene understanding have been steadily
increasing along with the active development of autonomous systems.
Consequently, there has been a large amount of research dedicated to semantic
segmentation and dense motion estimation. In this paper, we propose a method
for jointly estimating optical flow and temporally consistent semantic
segmentation, which closely connects these two problem domains and leverages
each other. Semantic segmentation provides information on plausible physical
motion to its associated pixels, and accurate pixel-level temporal
correspondences enhance the accuracy of semantic segmentation in the temporal
domain. We demonstrate the benefits of our approach on the KITTI benchmark,
where we observe performance gains for flow and segmentation. We achieve
state-of-the-art optical flow results, and outperform all published algorithms
by a large margin on challenging, but crucial dynamic objects.
Comment: 14 pages, Accepted for CVRSUAD workshop at ECCV 201
Dynamic Body VSLAM with Semantic Constraints
Image based reconstruction of urban environments is a challenging problem
that deals with optimization of a large number of variables, and has several
sources of errors like the presence of dynamic objects. Since most large scale
approaches make the assumption of observing static scenes, dynamic objects are
relegated to the noise modeling section of such systems. This is an approach of
convenience since the RANSAC based framework used to compute most multiview
geometric quantities for static scenes naturally confines dynamic objects to the
class of outlier measurements. However, reconstructing dynamic objects along
with the static environment helps us get a complete picture of an urban
environment. Such understanding can then be used for important robotic tasks
like path planning for autonomous navigation, obstacle tracking and avoidance,
and other areas. In this paper, we propose a system for robust SLAM that works
in both static and dynamic environments. To overcome the challenge of dynamic
objects in the scene, we propose a new model to incorporate semantic
constraints into the reconstruction algorithm. While some of these constraints
are based on multi-layered dense CRFs trained over appearance as well as motion
cues, other proposed constraints can be expressed as additional terms in the
bundle adjustment optimization process that does iterative refinement of 3D
structure and camera / object motion trajectories. We show results on the
challenging KITTI urban dataset for accuracy of motion segmentation and
reconstruction of the trajectory and shape of moving objects relative to ground
truth. We are able to show average relative error reduction by a significant
amount for moving object trajectory reconstruction relative to state-of-the-art
methods like VISO 2, as well as standard bundle adjustment algorithms.
Recursive image sequence segmentation by hierarchical models
This paper addresses the problem of image sequence segmentation. A technique using a sequence model based on compound random fields is presented. This technique is recursive in the sense that frames are processed at the same rate as they are produced. New regions appearing in the sequence are detected by a morphological procedure.
Peer Reviewed. Postprint (published version
Unsupervised dynamic modeling of medical image transformation
Spatiotemporal imaging has applications in e.g. cardiac diagnostics, surgical
guidance, and radiotherapy monitoring. In this paper, we explain the temporal
motion by identifying the underlying dynamics, based only on the sequential
images. Our dynamical model maps the inputs of observed high-dimensional
sequential images to a low-dimensional latent space wherein a linear
relationship between a hidden state process and the lower-dimensional
representation of the inputs holds. For this, we use a conditional variational
auto-encoder (CVAE) to nonlinearly map the higher-dimensional image to a
lower-dimensional space, wherein we model the dynamics with a linear Gaussian
state-space model (LG-SSM). The model, a modified version of the Kalman
variational auto-encoder, is end-to-end trainable, and the weights, both in the
CVAE and LG-SSM, are simultaneously updated by maximizing the evidence lower
bound of the marginal likelihood. In contrast to the original model, we explain
the motion with a spatial transformation from one image to another. This
results in sharper reconstructions and the possibility of transferring
auxiliary information, such as segmentation, through the image sequence. Our
experiments, on cardiac ultrasound time series, show that the dynamic model
outperforms traditional image registration in execution time, with similar
performance. Further, our model offers the possibility to impute and
extrapolate for missing samples.
Comment: published in 2022 25th International Conference on Information Fusion
(FUSION
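The linear Gaussian state-space model (LG-SSM) mentioned in this abstract can be illustrated with a scalar Kalman filter step. This is a minimal sketch under stated assumptions: the latent state is one-dimensional, the parameters a, q, c, and r are fixed hypothetical values, and the observations are plain numbers. In the paper the latent representation comes from a CVAE and all weights are learned jointly, which this sketch does not attempt to reproduce.

```python
def kalman_step(mean, var, obs, a=1.0, q=0.01, c=1.0, r=0.1):
    """One predict/update step of a scalar linear Gaussian state-space model.

    Model (assumed): x_t = a * x_{t-1} + noise(q), z_t = c * x_t + noise(r).
    Passing obs=None marks a missing sample, so only the prediction is applied.
    """
    # Predict: propagate the state estimate through the linear dynamics.
    pred_mean = a * mean
    pred_var = a * a * var + q
    if obs is None:
        return pred_mean, pred_var
    # Update: correct the prediction with the new observation.
    gain = pred_var * c / (c * c * pred_var + r)
    new_mean = pred_mean + gain * (obs - c * pred_mean)
    new_var = (1.0 - gain * c) * pred_var
    return new_mean, new_var


# Hypothetical observation sequence with one missing sample.
mean, var = 0.0, 1.0
for z in [1.0, 1.1, None, 1.3]:
    mean, var = kalman_step(mean, var, z)
    print(round(mean, 3), round(var, 3))
```

Skipping the update when a sample is missing is what lets a model of this kind impute or extrapolate: the state estimate keeps evolving under the dynamics while its variance grows.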