
    Delving into Crispness: Guided Label Refinement for Crisp Edge Detection

    Full text link
    Learning-based edge detection usually suffers from predicting thick edges. Through an extensive quantitative study with a new edge crispness measure, we find that noisy human-labeled edges are the main cause of thick predictions. Based on this observation, we advocate that more attention should be paid to label quality than to model design to achieve crisp edge detection. To this end, we propose an effective Canny-guided refinement of human-labeled edges whose result can be used to train crisp edge detectors. Essentially, it seeks a subset of over-detected Canny edges that best aligns with the human labels. We show that several existing edge detectors can be turned into crisp edge detectors through training on our refined edge maps. Experiments demonstrate that deep models trained with refined edges achieve a significant boost in crispness, from 17.4% to 30.6%. With the PiDiNet backbone, our method improves ODS and OIS by 12.2% and 12.6% on the Multicue dataset, respectively, without relying on non-maximal suppression. We further conduct experiments showing the superiority of our crisp edge detection for optical flow estimation and image segmentation. Comment: Accepted by TI
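    As a loose illustration of the refinement idea (not the authors' implementation), the sketch below keeps only those over-detected Canny edge pixels that fall within a small distance of the human-labeled edges; the tolerance radius and Canny thresholds are assumptions chosen for clarity.

```python
import cv2
import numpy as np
from scipy.ndimage import distance_transform_edt

def refine_labels(image_gray, human_label, tol=2.0, canny_lo=50, canny_hi=150):
    """Return a thin edge map aligned with (possibly thick/noisy) human labels.

    image_gray: uint8 grayscale image.
    human_label: binary map where nonzero pixels are human-labeled edges.
    """
    # Over-detect thin candidate edges with Canny.
    canny = cv2.Canny(image_gray, canny_lo, canny_hi) > 0
    # Distance (in pixels) from every pixel to the nearest human-labeled edge.
    dist_to_label = distance_transform_edt(human_label == 0)
    # Keep only Canny pixels lying within `tol` pixels of a human label.
    refined = canny & (dist_to_label <= tol)
    return refined.astype(np.uint8) * 255
```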

    Tensorformer: Normalized Matrix Attention Transformer for High-quality Point Cloud Reconstruction

    Full text link
    Surface reconstruction from raw point clouds has been studied for decades in the computer graphics community and is highly demanded by today's modeling and rendering applications. Classic solutions, such as Poisson surface reconstruction, require point normals as extra input to produce reasonable results. Modern transformer-based methods can work without normals, but their results are less fine-grained due to limited encoding performance in local fusion from discrete points. We introduce a novel normalized matrix attention transformer (Tensorformer) to perform high-quality reconstruction. The proposed matrix attention allows simultaneous point-wise and channel-wise message passing, whereas the previous vector attention loses neighbor point information across different channels. It brings more degrees of freedom in feature learning and thus facilitates better modeling of local geometries. Our method achieves state-of-the-art results on two commonly used datasets, ShapeNetCore and ABC, and attains a 4% improvement in IoU on ShapeNet. Our implementation will be released upon acceptance.
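    One possible reading of the contrast between vector and matrix attention is sketched below: vector attention normalizes weights per channel over a point's neighbors, while a matrix attention can normalize the whole neighbor-by-channel score matrix jointly. The shapes and the joint softmax are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def vector_attention(q, k, v):
    # q: (C,), k, v: (N, C) for N neighbors with C channels.
    w = F.softmax(q.unsqueeze(0) - k, dim=0)   # per-channel weights over neighbors
    return (w * v).sum(dim=0)                  # aggregated feature, shape (C,)

def matrix_attention(q, k, v):
    # Form an N x C score matrix and normalize it jointly across neighbors
    # and channels, so point-wise and channel-wise mixing happen together.
    scores = q.unsqueeze(0) * k                             # (N, C)
    w = F.softmax(scores.flatten(), dim=0).view_as(scores)  # joint normalization
    return (w * v).sum(dim=0)                               # (C,)
```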

    2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds

    Full text link
    The commonly adopted detect-then-match approach to registration struggles in cross-modality cases due to incompatible keypoint detection and inconsistent feature description. We propose 2D3D-MATR, a detection-free method for accurate and robust registration between images and point clouds. Our method adopts a coarse-to-fine pipeline: it first computes coarse correspondences between downsampled patches of the input image and the point cloud, and then extends them to form dense correspondences between pixels and points within the patch regions. The coarse-level patch matching is based on a transformer which jointly learns global contextual constraints with self-attention and cross-modality correlations with cross-attention. To resolve the scale ambiguity in patch matching, we construct a multi-scale pyramid for each image patch and learn to find for each point patch the best matching image patch at a proper resolution level. Extensive experiments on two public benchmarks demonstrate that 2D3D-MATR outperforms the previous state-of-the-art P2-Net by around 20 percentage points on inlier ratio and over 10 points on registration recall. Our code and models are available at https://github.com/minhaolee/2D3DMATR. Comment: Accepted by ICCV 202
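    The multi-scale matching step can be pictured roughly as follows: each point-patch feature is compared against image-patch features extracted at several pyramid levels, and the level with the highest similarity is kept. This is a hedged sketch with placeholder names, not the authors' code.

```python
import torch
import torch.nn.functional as F

def match_point_patches(point_feats, image_feats_per_level):
    """point_feats: (P, C) point-patch features.
    image_feats_per_level: list of (M_l, C) image-patch features, one per pyramid level.
    Returns, for each point patch, the best (level, image_patch_index) pair."""
    matches = []
    for p in F.normalize(point_feats, dim=-1):
        best = (-1.0, None, None)
        for level, feats in enumerate(image_feats_per_level):
            sims = F.normalize(feats, dim=-1) @ p          # cosine similarity, (M_l,)
            score, idx = sims.max(dim=0)
            if score.item() > best[0]:
                best = (score.item(), level, idx.item())
        matches.append(best[1:])                           # (level, image_patch_index)
    return matches
```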

    6DOF Pose Estimation of a 3D Rigid Object based on Edge-enhanced Point Pair Features

    Full text link
    The point pair feature (PPF) is widely used for 6D pose estimation. In this paper, we propose an efficient 6D pose estimation method based on the PPF framework. We introduce a well-targeted down-sampling strategy that focuses more on edge areas for efficient feature extraction from complex geometry. A pose hypothesis validation approach is proposed to resolve symmetric ambiguity by calculating an edge matching degree. We perform evaluations on two challenging datasets and one real-world collected dataset, demonstrating the superiority of our method for pose estimation of geometrically complex, occluded, symmetrical objects. We further validate our method by applying it to simulated punctures. Comment: 16 pages, 20 figure
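    For context, the classic point pair feature underlying this framework is a four-dimensional descriptor of two oriented points; a small NumPy sketch is given below. The edge-focused down-sampling and edge matching degree are specific to the paper and are not reproduced here.

```python
import numpy as np

def point_pair_feature(p1, n1, p2, n2):
    """PPF of two oriented points: (||d||, angle(n1, d), angle(n2, d), angle(n1, n2)).
    Normals n1, n2 are assumed to be unit length."""
    d = p2 - p1
    dist = np.linalg.norm(d)
    d_unit = d / (dist + 1e-12)

    def angle(a, b):
        return np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))

    return np.array([dist, angle(n1, d_unit), angle(n2, d_unit), angle(n1, n2)])
```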

    Image layer separation and application

    Get PDF
    Image layer separation is an important step for image understanding and facilitates many image processing applications. It aims to separate a single image into multiple image layers, decomposing the different components of the image. Image layers are either physics-based layers, such as the reflectance layer in intrinsic image decomposition, or semantic layers, such as the occlusion layer in image de-hazing and raindrop removal problems. Since the number of unknowns is at least twice that of the inputs, image layer separation problems are ill-posed and challenging. To solve such ill-posed problems, traditional methods acquire additional constraints based on prior knowledge, while recent deep learning methods rely on training data. In this thesis, we propose an optimization-based method built on handcrafted priors for video de-fencing (separating fence-like occlusion layers from dynamic videos), and an unsupervised deep learning training scheme that utilizes unlabeled real images from the Internet, which is applied to highlight separation and intrinsic image decomposition. Traditional methods make assumptions based on observations and priors to acquire additional constraints and solve the resulting optimization problem. In this thesis, we solve video de-fencing with a novel bottom-up pipeline based on such a traditional optimization-based method. We present a fully automatic approach to detect and segment fence-like occluders from a video clip. Unlike previous approaches that usually assume either static scenes or static cameras, our method is capable of handling both dynamic scenes and moving cameras. After that, we introduce the main challenge of recent deep learning methods for image layer separation, which is the lack of real-world training data with ground truth. Thus, we propose an unsupervised training scheme for training the network on unlabeled real images. This unsupervised training scheme is then applied to two image layer separation problems: highlight separation for facial images trained on celebrity photos, and non-Lambertian intrinsic image decomposition trained on customer product photos. Finally, we demonstrate one application of the separated image layers, where we use faces as light probes to estimate the environment illumination. This is important for mixed reality applications, such as inserting virtual objects into real photos. Our technique estimates illumination at high precision in the form of a non-parametric environment map, and it works well for both indoor and outdoor scenes.

    STEdge: Self-training Edge Detection with Multi-layer Teaching and Regularization

    Full text link
    Learning-based edge detection has hitherto been strongly supervised with pixel-wise annotations which are tedious to obtain manually. We study the problem of self-training edge detection, leveraging the untapped wealth of large-scale unlabeled image datasets. We design a self-supervised framework with multi-layer regularization and self-teaching. In particular, we impose a consistency regularization which enforces the outputs from each of the multiple layers to be consistent for the input image and its perturbed counterpart. We adopt L0-smoothing as the 'perturbation' to encourage edge predictions to lie on salient boundaries, following the cluster assumption in self-supervised learning. Meanwhile, the network is trained with multi-layer supervision from pseudo labels which are initialized with Canny edges and then iteratively refined by the network as training proceeds. The regularization and self-teaching together attain a good balance of precision and recall, leading to a significant performance boost over supervised methods, with lightweight refinement on the target dataset. Furthermore, our method demonstrates strong cross-dataset generality. For example, it attains improvements of 4.8% in ODS and 5.8% in OIS when tested on the unseen BIPED dataset, compared to the state-of-the-art methods.
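    A minimal sketch of the multi-layer consistency term is given below, assuming a network that returns a list of per-layer edge maps and a caller-supplied `perturb` function (e.g. an L0-smoothing operator). The L2 consistency and the names are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, image, perturb):
    """Encourage each layer's prediction to agree between the original
    image and its perturbed counterpart."""
    preds_orig = model(image)            # list of side outputs, each (B, 1, H, W)
    preds_pert = model(perturb(image))   # same network on the perturbed input
    return sum(F.mse_loss(a, b) for a, b in zip(preds_orig, preds_pert)) / len(preds_orig)
```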

    Multi-Resolution Monocular Depth Map Fusion by Self-Supervised Gradient-Based Composition

    No full text
    Monocular depth estimation is a challenging problem on which deep neural networks have demonstrated great potential. However, depth maps predicted by existing deep models usually lack fine-grained details due to the convolution operations and down-sampling in networks. We find that increasing the input resolution helps preserve more local details, while estimation at low resolution is more accurate globally. Therefore, we propose a novel depth map fusion module to combine the advantages of estimations with multi-resolution inputs. Instead of merging the low- and high-resolution estimations equally, we adopt the core idea of Poisson fusion, trying to implant the gradient domain of the high-resolution depth into the low-resolution depth. While classic Poisson fusion requires a fusion mask as supervision, we propose a self-supervised framework based on guided image filtering. We demonstrate that this gradient-based composition performs much better in noise immunity than the state-of-the-art depth map fusion method. Our lightweight depth fusion is one-shot and runs in real time, making it 80x faster than a state-of-the-art depth fusion method. Quantitative evaluations demonstrate that the proposed method can be integrated into many fully convolutional monocular depth estimation backbones with a significant performance boost, leading to state-of-the-art results in detail enhancement on depth maps. Code is released at https://github.com/yuinsky/gradient-based-depth-map-fusion.
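    The gradient-domain idea can be pictured as a screened-Poisson least-squares problem: stay close to the globally accurate low-resolution depth while matching the gradients of the detailed high-resolution depth. The plain gradient-descent solver and weighting below are illustrative assumptions, not the paper's self-supervised guided-filtering formulation.

```python
import numpy as np

def fuse_depths(d_low, d_high, lam=0.1, steps=500, lr=0.1):
    """d_low, d_high: float (H, W) depth maps at the same resolution
    (the low-resolution estimate is assumed to be upsampled beforehand)."""
    gx_t = np.diff(d_high, axis=1)       # target horizontal gradients from the detailed map
    gy_t = np.diff(d_high, axis=0)       # target vertical gradients
    d = d_low.astype(np.float64)
    for _ in range(steps):
        rx = np.diff(d, axis=1) - gx_t   # horizontal gradient residual
        ry = np.diff(d, axis=0) - gy_t   # vertical gradient residual
        grad = lam * (d - d_low)         # data term: stay near the low-res depth
        grad[:, :-1] -= rx               # gradient term, x direction
        grad[:, 1:]  += rx
        grad[:-1, :] -= ry               # gradient term, y direction
        grad[1:, :]  += ry
        d -= lr * grad                   # plain gradient-descent step
    return d
```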