43 research outputs found

    Rethinking Pseudo-LiDAR Representation

    Recently proposed pseudo-LiDAR-based 3D detectors have greatly improved benchmark results on the monocular/stereo 3D detection task. However, the underlying mechanism remains obscure to the research community. In this paper, we perform an in-depth investigation and observe that the efficacy of the pseudo-LiDAR representation comes from the coordinate transformation, not from the data representation itself. Based on this observation, we design an image-based CNN detector named PatchNet, which is more general and of which pseudo-LiDAR-based 3D detectors can be seen as instances. Moreover, the pseudo-LiDAR data in PatchNet is organized as an image representation, so existing 2D CNN designs can be easily reused to extract deep features from the input data and boost 3D detection performance. We conduct extensive experiments on the challenging KITTI dataset, where the proposed PatchNet outperforms all existing pseudo-LiDAR-based counterparts. Code is available at: https://github.com/xinzhuma/patchnet. Comment: ECCV 2020; supplemental material attached.
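The coordinate transformation that the paper identifies as the source of pseudo-LiDAR's efficacy is the standard pinhole back-projection of a depth map into per-pixel 3D coordinates. A minimal illustrative sketch follows (the function name and toy intrinsics are our assumptions, not code from the paper); note that the output keeps the H x W image layout, which is what allows ordinary 2D CNNs to consume it.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, in metres) into per-pixel 3D
    coordinates using the pinhole camera model:
        x = (u - cx) * z / fx,  y = (v - cy) * z / fy,  z = depth(v, u).
    The result keeps the H x W grid, so the "point cloud" is still an
    image-like map that a 2D CNN can process directly.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel column/row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)  # shape (H, W, 3)
```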

    PatchNet: a patch-based image representation for interactive library-driven image editing

    We introduce PatchNets, a compact, hierarchical representation describing structural and appearance characteristics of image regions, for use in image editing. In a PatchNet, an image region with coherent appearance is summarized by a graph node, associated with a single representative patch, while geometric relationships between different regions are encoded by labelled graph edges giving contextual information. The hierarchical structure of a PatchNet allows a coarse-to-fine description of the image. We show how this PatchNet representation can be used as a basis for interactive, library-driven image editing. The user draws rough sketches to quickly specify editing constraints for the target image. The system then automatically queries an image library to find semantically-compatible candidate regions to meet the editing goal. Contextual image matching is performed using the PatchNet representation, allowing suitable regions to be found and applied in a few seconds, even from a library containing thousands of images.
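The structure described above can be read as a small graph data type; the sketch below is our illustrative rendering of it (class and field names are assumptions): one representative patch per coherent region, child nodes for the coarse-to-fine hierarchy, and labelled edges carrying geometric context.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PatchNode:
    # One image region with coherent appearance, summarized by a single
    # representative patch; children refine the region coarse-to-fine.
    representative_patch: object          # pixel data of the representative patch
    region_mask: object                   # which pixels of the image the region covers
    children: List["PatchNode"] = field(default_factory=list)

@dataclass
class ContextEdge:
    # Labelled edge encoding a geometric relationship between two regions,
    # used as contextual information when matching against a library.
    src: PatchNode
    dst: PatchNode
    label: str  # e.g. "above", "adjacent" (labels here are illustrative)
```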

    Scene Consistency Verification Based on PatchNet

    In the real world, objects do not exist in isolation; they always appear within a scene, often in particular spatial locations. In this paper, we propose an effective method for judging scene consistency, in which scene semantics and geometric relations play a key role. We use PatchNet to handle these high-level scene structures: we construct a consistent-scene database and use the semantic information in PatchNet to determine whether a scene is consistent. The effectiveness of the proposed algorithm is verified by extensive experiments.

    Image-based 3D Object Detection for Autonomous Driving

    Autonomous driving has the potential to radically change people's lives, improving mobility and reducing travel time, energy consumption, and emissions. As one of the key enabling technologies for autonomous driving, image-based 3D object detection has received much attention and has gradually become a hot research topic. In this thesis, we review existing image-based 3D object detection models and propose novel taxonomies to help readers understand the common pipelines in this area. We also create a simple baseline model to identify and address the key challenge: localization error. In addition to the 'result-lifting' method, we introduce a successful 'pseudo-LiDAR' approach that outperforms other methods, and we show that its effectiveness lies in the coordinate transformation rather than the data representation. We also show how to use LiDAR signals to guide image-based models. In particular, we propose a simple and effective scheme to introduce spatial information from LiDAR signals into monocular 3D detectors without any extra cost in the inference phase: we first transform the LiDAR signals into the image representation and train a LiDAR model with the same architecture as the baseline model; this LiDAR model then serves as a teacher that transfers its learned knowledge to the image model, and experiments show the effectiveness of this scheme. Moreover, to leverage massive unlabeled data, we investigate how to apply image-based 3D detection in the semi-supervised setting with the help of LiDAR signals. In summary, in this thesis we thoroughly review existing image-based 3D detection models and propose new image-based 3D detection paradigms with promising performance. We also show how to use auxiliary LiDAR signals to guide the image-based model toward learning spatial features and to achieve semi-supervised learning. Finally, we discuss open questions in this research field and point out several promising research directions.
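The LiDAR-teacher scheme described above amounts to adding a feature-imitation term to the image model's training loss. A minimal sketch, assuming an L2 penalty on intermediate features and a weighting term chosen by us for illustration (the thesis does not specify this exact form):

```python
import numpy as np

def teacher_guided_loss(student_feat, teacher_feat, detection_loss, weight=1.0):
    """Training loss for the image model: its own 3D detection loss plus an
    L2 term pulling its intermediate features toward those of a frozen,
    LiDAR-trained teacher with the same architecture. The teacher is used
    only during training, so inference-time cost is unchanged.
    """
    imitation = np.mean((np.asarray(student_feat) - np.asarray(teacher_feat)) ** 2)
    return detection_loss + weight * imitation
```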

    Faithful completion of images of scenic landmarks using internet images

    Previous works on image completion typically aim to produce visually plausible results rather than factually correct ones. In this paper, we propose an approach to faithfully complete the missing regions of an image. We assume that the input image is taken at a well-known landmark, so similar images taken at the same location can easily be found on the Internet. We first download thousands of images from the Internet using a text label provided by the user. Next, we apply two-step filtering to reduce them to a small set of candidate images for use as source images for completion. For each candidate image, a co-matching algorithm is used to find correspondences of both points and lines between the candidate image and the input image. These are used to find an optimal warp relating the two images. A completion result is obtained by blending the warped candidate image into the missing region of the input image. The completion results are ranked according to a combination score, which considers both warping and blending energy, and the highest-ranked ones are shown to the user. Experiments and results demonstrate that our method can faithfully complete images.
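Ranking completion results by the combination score described above can be sketched as follows. The linear weighting and all names are our assumptions; the abstract only states that both warping and blending energy are considered.

```python
def combination_score(warp_energy, blend_energy, alpha=0.5):
    # Lower energy means a better fit; alpha balances the two terms
    # (the linear form and the value of alpha are illustrative assumptions).
    return alpha * warp_energy + (1.0 - alpha) * blend_energy

def rank_completions(candidates):
    """candidates: list of (candidate_id, warp_energy, blend_energy) tuples.
    Returns them best-first, so the top-ranked results can be shown to the user."""
    return sorted(candidates, key=lambda c: combination_score(c[1], c[2]))
```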