
    Saliency-aware Stereoscopic Video Retargeting

    Stereo video retargeting aims to resize a stereo video to a desired aspect ratio. The quality of retargeted videos can be significantly impacted by the stereo video's spatial, temporal, and disparity coherence, all of which can be impacted by the retargeting process. Due to the lack of a publicly accessible annotated dataset, there is little research on deep learning-based methods for stereo video retargeting. This paper proposes an unsupervised deep learning-based stereo video retargeting network. Our model first detects the salient objects, then shifts and warps all objects so as to minimize the distortion of the salient parts of the stereo frames. We use 1D convolution for shifting the salient objects and design a stereo video Transformer to assist the retargeting process. To train the network, we use the parallax attention mechanism to fuse the left and right views and feed the retargeted frames to a reconstruction module that reverses the retargeted frames to the input frames. Therefore, the network is trained in an unsupervised manner. Extensive qualitative and quantitative experiments and ablation studies on the KITTI stereo 2012 and 2015 datasets demonstrate the efficiency of the proposed method over the existing state-of-the-art methods. The code is available at https://github.com/z65451/SVR/. Comment: 8 pages excluding references. CVPRW conferenc

    Heterogeneous volumetric data mapping and its medical applications

    With the advance of data acquisition techniques, massive solid geometries are being collected routinely in scientific tasks. These complex and unstructured data need to be effectively correlated for various processing and analysis. Volumetric mapping solves bijective low-distortion correspondence between/among 3D geometric data, and can serve as an important preprocessing step in many tasks in computer-aided design and analysis, industrial manufacturing, and medical image analysis, to name a few. This dissertation studied two important volumetric mapping problems: the mapping of heterogeneous volumes (with nonuniform inner structures/layers) and the mapping of sequential dynamic volumes. To effectively handle heterogeneous volumes, first, we studied the feature-aligned harmonic volumetric mapping. Compared to previous harmonic mapping, it supports the alignment of points, curves, and iso-surfaces, which are important low-dimensional structures in heterogeneous volumetric data. Second, we proposed a biharmonic model for volumetric mapping. Unlike the conventional harmonic volumetric mapping that only supports positional continuity on the boundary, this new model allows us to have higher-order C^1 continuity along the boundary surface. This suggests a potential model to solve the volumetric mapping of complex and big geometries through divide-and-conquer. We also studied the medical applications of our volumetric mapping in lung tumor respiratory motion modeling. We were building an effective digital platform for lung tumor radiotherapy based on effective volumetric CT/MRI image matching and analysis. We developed and integrated in this platform a set of geometric/image processing techniques including advanced image segmentation, finite element meshing, volumetric registration and interpolation. 
The lung organ/tumor and surrounding tissues are treated as a heterogeneous region and a dynamic 4D registration framework is developed for lung tumor motion modeling and tracking. Compared to the previous 3D pairwise registration, our new 4D parameterization model leads to a significantly improved registration accuracy. The constructed deforming model can hence approximate the deformation of the tissues and tumor
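
As a rough illustration of the harmonic mapping idea mentioned in this abstract, the sketch below solves the discrete Laplace equation on a small 2D grid by Gauss-Seidel relaxation. It is not the dissertation's method: the grid setting, the function names `harmonic_fill` and `demo`, and the use of a `fixed` mask are assumptions made for illustration. Marking interior cells as fixed loosely mimics point or curve alignment constraints.

```python
def harmonic_fill(values, fixed, iterations=500):
    """Relax free cells toward the mean of their grid neighbours.

    values: 2D list of floats (fixed cells hold the constraint values)
    fixed:  2D list of bools, True where the value is a hard constraint
    Hypothetical helper for illustration only.
    """
    rows, cols = len(values), len(values[0])
    u = [row[:] for row in values]
    for _ in range(iterations):
        for i in range(rows):
            for j in range(cols):
                if fixed[i][j]:
                    continue
                neighbours = [
                    u[x][y]
                    for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= x < rows and 0 <= y < cols
                ]
                # Discrete Laplace equation: a free cell equals its neighbour mean.
                u[i][j] = sum(neighbours) / len(neighbours)
    return u


def demo(n=6):
    """Fix the boundary to f = x + 2y, which is linear and hence harmonic."""
    values = [[0.0] * n for _ in range(n)]
    fixed = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i in (0, n - 1) or j in (0, n - 1):
                values[i][j] = j + 2.0 * i
                fixed[i][j] = True
    return harmonic_fill(values, fixed)
```

Because a linear function is harmonic, the relaxed interior in `demo` should reproduce `x + 2y` up to the iteration tolerance, which gives a simple sanity check for the solver.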

    Deformable Image Registration for Hyperspectral Images

    Image registration is one of the basic image processing operations in remote sensing. A hyperspectral image has two spatial dimensions and one spectral dimension. There are many hyperspectral sensors used in remote sensing. Traditional intensity-based registration methods may fail for hyperspectral images because different sensors have different spectral sensitivities. In addition, not all spectral bands are required to achieve accurate registration. This thesis develops a modification of the large deformation diffeomorphic metric mappings (LDDMM) algorithm in order to deal with the challenges that arise when it is applied to hyperspectral images. The transformation generated by our method that deforms one image to match the other is differentiable, isomorphic and invertible. We also propose a mutual information based band selection algorithm to reduce the data redundancy of the hyperspectral images. The approach is applied to two hyperspectral images from the OMEGA instrument, with a better matching result than the original LDDMM method with respect to mutual information
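
The abstract does not describe how its band selection works, so the following is only a minimal sketch of the underlying quantity: a histogram estimate of mutual information between two bands, plus a hypothetical ranking helper. The names `mutual_information` and `rank_bands`, the bin count, and the assumption that pixel values lie in [0, 1] are all illustrative choices, not taken from the thesis.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=8, lo=0.0, hi=1.0):
    """Histogram estimate of mutual information (in nats) between two bands.

    a, b: flat sequences of pixel values of equal length, assumed in [lo, hi].
    """
    if len(a) != len(b):
        raise ValueError("bands must have the same number of pixels")

    def bin_index(v):
        t = (v - lo) / (hi - lo)
        return min(bins - 1, max(0, int(t * bins)))

    ia = [bin_index(v) for v in a]
    ib = [bin_index(v) for v in b]
    n = len(a)
    joint = Counter(zip(ia, ib))
    count_a, count_b = Counter(ia), Counter(ib)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log( p(i,j) / (p(i) * p(j)) ), written with raw counts.
        mi += (c / n) * math.log(c * n / (count_a[i] * count_b[j]))
    return mi


def rank_bands(image_a, image_b):
    """Hypothetical ranking: band indices sorted by MI between paired bands."""
    scores = [mutual_information(x, y) for x, y in zip(image_a, image_b)]
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
```

A band pair that carries no shared information scores zero, so a ranking like this would push such bands to the end of the list; how many bands to keep is left open here.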

    Shape Analysis Using Spectral Geometry

    Shape analysis is a fundamental research topic in computer graphics and computer vision. To date, more and more 3D data is produced by advanced acquisition devices, e.g., laser scanners, depth cameras, and CT/MRI scanners. The increasing data demands advanced analysis tools including shape matching, retrieval, deformation, etc. Nevertheless, 3D shapes are subject to Euclidean transformations such as translation, scaling, and rotation, and digital mesh representations are irregularly sampled. The shape can also deform non-linearly and the sampling may vary. In order to address these challenging problems, we investigate Laplace-Beltrami shape spectra from the differential geometry perspective, focusing more on the intrinsic properties. In this dissertation, the shapes are represented with 2-manifolds, which are differentiable. First, we discuss in detail the salient geometric feature points in the Laplace-Beltrami spectral domain instead of traditional spatial domains. Simultaneously, the local shape descriptor of a feature point is the Laplace-Beltrami spectrum of the spatial region associated with the point, which is stable and distinctive. The salient spectral geometric features are invariant to spatial Euclidean transforms, isometric deformations and mesh triangulations. Both global and partial matching can be achieved with these salient feature points. Next, we introduce a novel method to analyze a set of poses, i.e., near-isometric deformations, of 3D models that are unregistered. Different shapes of poses are transformed from the 3D spatial domain to a geometry spectral one where all near-isometric deformations, mesh triangulations and Euclidean transformations are filtered away. Semantic parts of that model are then determined based on the computed geometric properties of all the mapped vertices in the geometry spectral domain, while a semantic skeleton can be automatically built with joints detected. 
Finally, we prove the shape spectrum is a continuous function to a scale function on the conformal factor of the manifold. The derivatives of the eigenvalues are analytically expressed with those of the scale function. The property applies to both the continuous domain and discrete triangle meshes. On the triangle meshes, a spectrum alignment algorithm is developed. Given two closed triangle meshes, the eigenvalues can be aligned from one to the other and the eigenfunction distributions are aligned as well. This extends the shape spectra across non-isometric deformations, supporting a registration-free analysis of general motion data
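
To make the notion of a shape spectrum concrete, here is a small sketch that uses the combinatorial graph Laplacian as a crude stand-in for a discrete Laplace-Beltrami operator. This substitution is an assumption made for illustration; the dissertation works with triangle meshes. The eigenvalues are computed with a plain cyclic Jacobi rotation scheme, and the helper names `graph_laplacian` and `symmetric_eigenvalues` are hypothetical.

```python
import math


def graph_laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph on n vertices."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Sorted eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


# A 4-cycle and the same cycle with its vertices relabelled.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
relabelled = [(2, 0), (0, 3), (3, 1), (1, 2)]
spectrum_a = symmetric_eigenvalues(graph_laplacian(4, cycle))
spectrum_b = symmetric_eigenvalues(graph_laplacian(4, relabelled))
```

The two spectra agree up to rounding because the Laplacian depends only on connectivity, not on vertex order or on where the vertices sit in space. This is a toy analogue of the invariance properties the abstract describes, not a demonstration of them.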

    Multigranularity Representations for Human Inter-Actions: Pose, Motion and Intention

    Tracking people and their body pose in videos is a central problem in computer vision. Standard tracking representations reason about temporal coherence of detected people and body parts. They have difficulty tracking targets under partial occlusions or rare body poses, where detectors often fail, since the number of training examples is often too small to deal with the exponential variability of such configurations. We propose tracking representations that track and segment people and their body pose in videos by exploiting information at multiple detection and segmentation granularities when available: whole body, parts, or point trajectories. Detections and motion estimates provide contradictory information in case of false alarm detections or leaking motion affinities. We consolidate contradictory information via graph steering, an algorithm for simultaneous detection and co-clustering in a two-granularity graph of motion trajectories and detections, which corrects motion leakage between correctly detected objects while being robust to false alarms or spatially inaccurate detections. We first present a motion segmentation framework that exploits long-range motion of point trajectories and large spatial support of image regions. We show that the resulting video segments adapt to targets under partial occlusions and deformations. Second, we augment motion-based representations with object detection for dealing with motion leakage. We demonstrate how to combine dense optical flow trajectory affinities with repulsions from confident detections to reach a global consensus of detection and tracking in crowded scenes. Third, we study human motion and pose estimation. We segment hard-to-detect, fast-moving body limbs from their surrounding clutter and match them against pose exemplars to detect body pose under fast motion. We employ on-the-fly human body kinematics to improve tracking of body joints under wide deformations. 
We use motion segmentability of body parts for re-ranking a set of body joint candidate trajectories and jointly infer multi-frame body pose and video segmentation. We show empirically that such a multi-granularity tracking representation is worthwhile, obtaining significantly more accurate multi-object tracking and detailed body pose estimation on popular datasets

    Generalized intrinsic symmetry detection

    In this paper, we address the problem of detecting partial symmetries in 3D objects. In contrast to previous work, our algorithm is able to match deformed symmetric parts. We first develop an algorithm for the case of approximately isometric deformations, based on matching graphs of surface feature lines that are annotated with intrinsic geometric properties. The sensitivity to non-isometry is controlled by tolerance parameters for each such annotation. Using large tolerance values for some of these annotations and a robust matching of the graph topology yields a more general symmetry detection algorithm that can detect similarities in structures that have undergone strong deformations. This approach for the first time allows for detecting partial intrinsic as well as more general, non-isometric symmetries. We evaluate the recognition performance of our technique for a number of synthetic and real-world scanner data sets

    Comfort-driven disparity adjustment for stereoscopic video

    Pixel disparity—the offset of corresponding pixels between left and right views—is a crucial parameter in stereoscopic three-dimensional (S3D) video, as it determines the depth perceived by the human visual system (HVS). Unsuitable pixel disparity distribution throughout an S3D video may lead to visual discomfort. We present a unified and extensible stereoscopic video disparity adjustment framework which improves the viewing experience for an S3D video by keeping the perceived 3D appearance as unchanged as possible while minimizing discomfort. We first analyse disparity and motion attributes of S3D video in general, then derive a wide-ranging visual discomfort metric from existing perceptual comfort models. An objective function based on this metric is used as the basis of a hierarchical optimisation method to find a disparity mapping function for each input video frame. Warping-based disparity manipulation is then applied to the input video to generate the output video, using the desired disparity mappings as constraints. Our comfort metric takes into account disparity range, motion, and stereoscopic window violation; the framework could easily be extended to use further visual comfort models. We demonstrate the power of our approach using both animated cartoons and real S3D videos
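
The paper derives its disparity mapping through a hierarchical optimisation over a comfort metric, which this abstract does not spell out. The sketch below shows only the simplest possible disparity mapping function, a linear remap into an assumed comfort range that leaves already comfortable disparities unchanged. The function name `remap_disparities` and the range limits are hypothetical.

```python
def remap_disparities(disparities, comfort_min, comfort_max):
    """Linearly map pixel disparities into [comfort_min, comfort_max].

    The range is compressed only if it is wider than the comfort range,
    and shifted only as far as needed, so comfortable input is unchanged.
    """
    lo, hi = min(disparities), max(disparities)
    width = hi - lo
    budget = comfort_max - comfort_min
    # Compress only when the input range exceeds the comfort budget.
    scale = 1.0 if width <= budget else budget / width
    # Shift the lower end just enough for the whole range to fit.
    new_lo = min(max(lo, comfort_min), comfort_max - width * scale)
    return [new_lo + (d - lo) * scale for d in disparities]
```

For instance, with an assumed comfort range of -10 to 30 pixels, an input spanning -40 to 40 is compressed by half and lands on -10 to 30, while an input spanning -5 to 5 passes through untouched. A per-frame mapping like this ignores motion and window violations, which the paper's metric accounts for.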

    Using building and bridge information for adapting roads to ALS data by means of network snakes

    In the German Authoritative Topographic Cartographic Information System (ATKIS), the 2D positions and the heights of objects such as roads are stored separately in the digital landscape model (DLM) and digital terrain model (DTM), which is often acquired by airborne laser scanning (ALS). However, an increasing number of applications require a combined processing and visualization of these two data sets. Due to different kinds of acquisition, processing, and modelling, discrepancies exist between the DTM and DLM, and thus a simple integration may lead to semantically incorrect 3D objects. For example, roads may be situated on strongly tilted DTM parts and rivers sometimes flow uphill. In this paper we propose an algorithm for the adaptation of 2D road centrelines to ALS data by means of network snakes. Generally, the image energy for the snakes is defined based on ALS intensity and height information and derived products. Additionally, buildings and bridges as strong features in height data are exploited in order to support the road adaptation process. Extracted buildings as priors modified by a distance transform are used to create a force of repulsion for the road vectors integrated in the image energy. In contrast, bridges give strong evidence for the correct road position in the height data. Therefore, the image energy is adapted for the bridge points. For that purpose, bridge detection in the DTM is performed starting from an approximate position using template matching. Examples are given which apply the concept of network snakes with new image energy for the adaptation of road networks to ALS data, taking advantage of the prior known topology
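
The building-based repulsion term can be pictured with a small sketch: a brute-force Euclidean distance transform of a building mask, turned into an energy that is high near buildings and zero beyond a cutoff. The linear falloff, the `radius` parameter, and the function names are assumptions made for illustration; the paper's actual energy definition is not given in this abstract.

```python
import math


def distance_to_buildings(mask):
    """Euclidean distance from every cell to the nearest building cell.

    mask: 2D list, truthy where a building was extracted. Brute force,
    so only suitable for small illustrative grids.
    """
    cells = [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]
    if not cells:
        return [[math.inf] * len(row) for row in mask]
    return [
        [min(math.hypot(i - bi, j - bj) for bi, bj in cells) for j in range(len(row))]
        for i, row in enumerate(mask)
    ]


def repulsion_energy(mask, radius):
    """Hypothetical energy: 1 on a building, falling linearly to 0 at radius."""
    return [[max(0.0, 1.0 - d / radius) for d in row] for row in distance_to_buildings(mask)]


def path_energy(energy, path):
    """Sum the energy over the grid cells visited by a road polyline."""
    return sum(energy[i][j] for i, j in path)
```

A snake minimising an energy of this kind would be pushed away from building footprints, since a candidate road position close to a building accumulates more energy than one further away.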