
    Distance to Center of Mass Encoding for Instance Segmentation

    Instance segmentation can be considered an extension of the object detection problem in which bounding boxes are replaced by object contours. Strictly speaking, the problem requires identifying the instance and class of each pixel, independently of the method used to achieve this. The advantage of instance segmentation over conventional object detection lies in the precise delineation of objects, which improves object localization. Additionally, object contours allow partial occlusion to be evaluated with basic image processing algorithms. This work approaches instance segmentation as an annotation problem and presents a novel technique to encode and decode ground truth annotations. We propose a mathematical representation of instances that any deep semantic segmentation model can learn and generalize. Each individual instance is represented by a center of mass and a field of vectors pointing to it. This encoding technique has been named Distance to Center of Mass Encoding (DCME).
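    As a rough illustration of the encoding described above, the following Python sketch converts an instance label map into per-pixel offset vectors pointing to each instance's center of mass, then recovers the instances by grouping pixels whose vectors land on the same point. It is not the DCME implementation. The abstract does not specify normalization, the network output format, or the decoding procedure, so the function names `encode` and `decode`, the label-map input (0 = background), and the greedy clustering with a `radius` tolerance are all assumptions made for illustration.

```python
from collections import defaultdict
from math import hypot


def encode(labels):
    """Hypothetical encoder: map an instance label map (2D list, 0 = background)
    to per-pixel (dy, dx) vectors pointing at each instance's center of mass.
    Background pixels get None. Also returns the centers for inspection."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_y, sum_x, count]
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab:
                s = sums[lab]
                s[0] += y
                s[1] += x
                s[2] += 1
    centers = {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}

    field = [[None] * len(row) for row in labels]
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab:
                cy, cx = centers[lab]
                field[y][x] = (cy - y, cx - x)  # vector from pixel to its center
    return field, centers


def decode(field, radius=0.5):
    """Hypothetical decoder: each foreground pixel votes for the point its vector
    reaches; votes within `radius` of an existing cluster join that cluster.
    Instance ids are assigned in raster order, starting at 1."""
    clusters = []  # list of (cy, cx) voted centers
    out = [[0] * len(row) for row in field]
    for y, row in enumerate(field):
        for x, vec in enumerate(row):
            if vec is None:
                continue
            py, px = y + vec[0], x + vec[1]
            for i, (cy, cx) in enumerate(clusters):
                if hypot(py - cy, px - cx) <= radius:
                    out[y][x] = i + 1
                    break
            else:
                clusters.append((py, px))
                out[y][x] = len(clusters)
    return out
```

With exact vectors every pixel of an instance votes for the same point. A trained network would produce noisy vectors, which is why this hypothetical decoder groups votes within a tolerance rather than requiring exact equality.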

    Object Detection and Localization in 2D & 3D Environment

    University of Technology Sydney. Faculty of Engineering and Information Technology.
    Computer vision is a science that studies how to make machines "see." It refers to using vision sensors and computers to identify, locate, and track objects. Within this topic, this thesis proposes three frameworks to improve 2D and 3D object detection and localization performance. In the first, a 3D object detection framework, we investigated the feasibility of using bilateral convolution layers as an alternative to the widely used point cloud voxelization process. The second framework explored fusing voxel-wise and point-wise proposals to improve 3D object detection performance. The third framework, for 2D instance segmentation, is an NMS-free and anchor-free detector designed explicitly for eye-to-hand robotic systems. In existing work, most state-of-the-art 3D object detection approaches rely on voxelization, which samples the point cloud into a subdivided voxel space. Although voxelization provides an efficient way to process point cloud data, its lack of feature relationships at the voxel level limits detection accuracy. Furthermore, tuning the voxel-size hyperparameters increases model complexity and leads to fluctuating model performance. To this end, we aim to simplify the process by re-projecting the point cloud data onto a lattice hyperplane, which saves point cloud processing time while maintaining model accuracy. The proposed framework, Bilateral Lattice Point Network (BLPNet), is presented in chapter three. The second framework, Point and Voxel Fusion Net (PVF-Net), is proposed to push 3D object detection performance further. In two-stage approaches, increasing the recall of the first-stage proposals improves the final prediction performance of the model. Therefore, in PVF-Net we propose a twofold proposal fusion architecture that extracts and fuses the voxel-level and point-level features of the point clouds.
The model, detailed in chapter four, consists mainly of two novel modules: the Twofold Proposal Fusion (TPF) module and the ROI Deep Fusion (RDF) module. Lastly, it is well known that 3D and 2D sensors jointly depict the real world, so chapter five turns to improving 2D object detection. Existing 2D instance segmentation algorithms have developed significantly, and their performance has approached saturation. However, there is no solid solution for heavily occluded or diagonally arranged objects, especially in vision-guided robotic picking systems. To solve this problem, we propose a real-time, occlusion- and oblique-friendly instance segmentation framework, termed Keypoint-Mask, that helps the robotic system handle complicated detection scenarios.
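    To make the voxelization step discussed in the thesis abstract concrete, here is a minimal Python sketch of plain uniform-grid voxelization with mean pooling of the points in each voxel. It does not reproduce BLPNet's lattice re-projection or PVF-Net. The names `voxelize` and `voxel_centroids`, and the choice of the point mean as the per-voxel feature, are assumptions for illustration. The sketch only shows why voxel size behaves as a hyperparameter: the same points produce a different number of occupied voxels as `voxel_size` changes.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Group 3D points (x, y, z) into cubic voxels with edge length `voxel_size`.

    Returns a dict mapping integer voxel indices (i, j, k) to the list of
    points that fall inside that voxel."""
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        voxels[key].append((x, y, z))
    return dict(voxels)


def voxel_centroids(voxels):
    """Summarize each voxel by the mean of its points (a simple per-voxel feature)."""
    return {
        key: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for key, pts in voxels.items()
    }


if __name__ == "__main__":
    pts = [(0.1, 0.1, 0.1), (0.4, 0.2, 0.3), (0.9, 0.9, 0.1), (1.6, 0.2, 0.4)]
    for size in (0.5, 1.0, 2.0):
        # Occupied voxel counts: 3, 2 and 1 for the three sizes.
        print(size, len(voxelize(pts, size)))
```

Coarser voxels merge points, which is cheaper but discards the spatial detail that distinguishes nearby points; that accuracy-versus-efficiency trade-off is the motivation for looking beyond a fixed voxel grid.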

    Object-Oriented Dynamics Learning through Multi-Level Abstraction

    Object-based approaches for learning action-conditioned dynamics have demonstrated promise for generalization and interpretability. However, existing approaches suffer from structural limitations and optimization difficulties in common environments with multiple dynamic objects. In this paper, we present a novel self-supervised learning framework, called Multi-level Abstraction Object-oriented Predictor (MAOP), which employs a three-level learning architecture that enables efficient object-based dynamics learning from raw visual observations. We also design a spatial-temporal relational reasoning mechanism for MAOP to support instance-level dynamics learning and handle partial observability. Our results show that MAOP significantly outperforms previous methods in sample efficiency and in generalization to novel environments when learning environment models. We also demonstrate that the learned dynamics models enable efficient planning in unseen environments, performing comparably to the true environment models. In addition, MAOP learns semantically and visually interpretable disentangled representations.
    Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 202

    Reversible Recursive Instance-level Object Segmentation

    In this work, we propose a novel Reversible Recursive Instance-level Object Segmentation (R2-IOS) framework to address the challenging instance-level object segmentation task. R2-IOS consists of a reversible proposal refinement sub-network that predicts bounding box offsets to refine the object proposal locations, and an instance-level segmentation sub-network that generates the foreground mask of the dominant object instance in each proposal. Being recursive, R2-IOS iteratively optimizes the two sub-networks during joint training, alternately feeding the refined object proposals and the improved segmentation predictions into each other to progressively increase the network's capabilities. Being reversible, the proposal refinement sub-network adaptively determines the optimal number of refinement iterations for each proposal during both training and testing. Furthermore, to handle multiple overlapping instances within a proposal, an instance-aware denoising autoencoder is introduced into the segmentation sub-network to distinguish the dominant object from other distracting instances. Extensive experiments on the challenging PASCAL VOC 2012 benchmark demonstrate the superiority of R2-IOS over other state-of-the-art methods. In particular, the AP^r over 20 classes at 0.5 IoU reaches 66.7%, significantly outperforming the 58.7% achieved by PFN [PFN] and the 46.3% achieved by [liu2015multi].
    Comment: 9 page
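    The reported results are given as AP^r at 0.5 IoU, a metric that scores predicted masks by their region overlap with ground-truth masks. The following Python sketch shows how mask IoU can be computed and how, in a simplified PASCAL-VOC-style procedure, predictions sorted by confidence are marked as true or false positives at a threshold. Building the precision-recall curve and averaging over classes is omitted. The function names, the `(score, mask)` input format, and the list-of-lists mask representation are assumptions for illustration; this is not the R2-IOS evaluation code.

```python
def mask_iou(pred, gt):
    """Intersection-over-union of two binary masks given as equal-sized 2D lists."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += bool(p) and bool(g)
            union += bool(p) or bool(g)
    return inter / union if union else 0.0


def match_at_threshold(preds, gts, thr=0.5):
    """Mark each prediction as a true (True) or false (False) positive.

    preds: list of (score, mask); gts: list of ground-truth masks.
    Predictions are processed in descending score order. Each one is compared
    with its highest-IoU ground truth; it counts as a true positive only if that
    IoU is at least `thr` and the ground truth has not already been matched.
    Flags are returned in descending-score order."""
    used = set()
    flags = []
    for _score, pmask in sorted(preds, key=lambda t: -t[0]):
        best_j, best_iou = None, 0.0
        for j, gmask in enumerate(gts):
            iou = mask_iou(pmask, gmask)
            if iou > best_iou:
                best_j, best_iou = j, iou
        hit = best_iou >= thr and best_j not in used
        if hit:
            used.add(best_j)
        flags.append(hit)  # duplicates of a matched object count as false positives
    return flags
```

Counting a second detection of an already matched object as a false positive is what penalizes duplicate masks, which is one reason handling overlapping instances within a proposal matters for this metric.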