Distance to Center of Mass Encoding for Instance Segmentation
Instance segmentation can be considered an extension of the object
detection problem in which bounding boxes are replaced by object contours.
Strictly speaking, the problem requires identifying the instance and class of
each pixel, independently of the means used to do so. The advantage of
instance segmentation over conventional object detection lies in the precise
delineation of objects, which improves object localization. Additionally, object
contours allow partial occlusion to be evaluated with basic image processing
algorithms. This work approaches the instance segmentation problem as an
annotation problem and presents a novel technique to encode and decode ground
truth annotations. We propose a mathematical representation of instances that
any deep semantic segmentation model can learn and generalize. Each individual
instance is represented by a center of mass and a field of vectors pointing to
it. This encoding technique is named Distance to Center of Mass
Encoding (DCME).
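To make the representation described above concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not give the exact encoding, so the label layout (0 for background, positive integers for instance ids), the (dy, dx) vector convention, the function names, and the tolerance-based decoder are all assumptions made for illustration.

```python
import numpy as np


def encode_center_of_mass(instance_mask):
    """Encode an instance-label mask as a field of vectors to each center of mass.

    instance_mask: 2D integer array, 0 = background, k > 0 = instance id
    (assumed layout, not taken from the paper).
    Returns (field, centers), where field has shape (H, W, 2) and stores the
    (dy, dx) vector from each instance pixel to its instance's center of mass.
    """
    h, w = instance_mask.shape
    field = np.zeros((h, w, 2), dtype=np.float32)
    centers = {}
    for inst_id in np.unique(instance_mask):
        if inst_id == 0:
            continue  # background pixels keep a zero vector
        ys, xs = np.nonzero(instance_mask == inst_id)
        cy, cx = ys.mean(), xs.mean()
        centers[int(inst_id)] = (float(cy), float(cx))
        field[ys, xs, 0] = cy - ys
        field[ys, xs, 1] = cx - xs
    return field, centers


def decode_center_of_mass(field, foreground, tol=0.5):
    """Group foreground pixels by the point their vector lands on.

    A hypothetical decoder: pixels whose target points lie within `tol`
    of an already discovered center receive that center's label.
    """
    ys, xs = np.nonzero(foreground)
    targets = np.stack([ys + field[ys, xs, 0], xs + field[ys, xs, 1]], axis=1)
    labels = np.zeros(foreground.shape, dtype=np.int32)
    found = []  # centers discovered so far, in scan order
    for y, x, target in zip(ys, xs, targets):
        for idx, center in enumerate(found):
            if np.linalg.norm(target - center) <= tol:
                labels[y, x] = idx + 1
                break
        else:
            found.append(target)
            labels[y, x] = len(found)
    return labels
```

In this sketch a semantic segmentation model would regress the two channels of `field` per pixel. With noisy predictions the simple tolerance test would likely need to be replaced by a proper clustering step.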
Object Detection and Localization in 2D & 3D Environment
University of Technology Sydney. Faculty of Engineering and Information Technology.
Computer vision is a science that studies how to make machines "see." It refers to using vision sensors and computers to identify, locate, and track objects. Within this topic, this thesis proposes three frameworks to improve 2D and 3D object detection and localization performance. In the first 3D object detection framework, we investigate the feasibility of using bilateral convolution layers as an alternative to the widely used point cloud voxelization process. The second framework explores a method that fuses voxel-wise and point-wise proposals to improve 3D object detection performance. For 2D instance segmentation, the third framework forms an NMS-free and anchor-free detector designed explicitly for eye-to-hand robotic systems.
In existing work, most state-of-the-art 3D object detection approaches rely on voxelization to sample the point cloud into a subdivided voxel space. Although this provides an efficient way to process point cloud data, the lack of feature relationships at the voxel level limits the model’s detection accuracy. Furthermore, tuning the voxel-size hyperparameters increases model complexity and makes model performance fluctuate. To this end, we aim to simplify the process by re-projecting the point cloud data onto a lattice hyperplane, which saves point cloud processing time while maintaining model accuracy. The proposed framework, Bilateral Lattice Point Network (BLPNet), is presented in chapter three.
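As a point of reference for the voxelization step discussed above, the following is a minimal sketch of grouping a point cloud into a regular voxel grid. It illustrates the baseline process only, not BLPNet or the lattice re-projection; the function name `voxelize` and the parameters `voxel_size` and `max_points_per_voxel` are hypothetical.

```python
import numpy as np


def voxelize(points, voxel_size, max_points_per_voxel=32):
    """Group 3D points into cubic voxels with edge length `voxel_size`.

    points: array-like of shape (N, 3).
    Returns a dict mapping an integer voxel index (i, j, k) to an array of
    the points that fall inside that voxel, capped at `max_points_per_voxel`.
    """
    points = np.asarray(points, dtype=np.float64)
    origin = points.min(axis=0)  # grid is anchored at the minimum corner
    indices = np.floor((points - origin) / voxel_size).astype(np.int64)
    buckets = {}
    for row, point in zip(indices, points):
        key = tuple(int(v) for v in row)
        bucket = buckets.setdefault(key, [])
        if len(bucket) < max_points_per_voxel:
            bucket.append(point)  # extra points in a full voxel are dropped
    return {key: np.stack(pts) for key, pts in buckets.items()}


cloud = [[0.0, 0.0, 0.0], [0.1, 0.1, 0.1], [1.2, 0.0, 0.0], [0.0, 0.0, 2.5]]
coarse = voxelize(cloud, voxel_size=1.0)   # two nearby points share a voxel
fine = voxelize(cloud, voxel_size=0.05)    # every point gets its own voxel
```

The two calls show why the voxel size matters: the same cloud yields a different grouping, and therefore different per-voxel features, depending on this single hyperparameter.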
In the second framework, Point and Voxel Fusion Net (PVF-Net) is proposed to push 3D object detection performance further. In two-stage approaches, increasing the recall rate of the first-stage proposals improves the model’s final prediction performance. Therefore, in PVF-Net, we propose a twofold proposal fusion architecture to extract and fuse the voxel-level and point-level features of the point clouds. The model details are given in chapter four; the model mainly consists of two novel modules: the Twofold Proposal Fusion (TPF) module and the ROI Deep Fusion (RDF) module.
Lastly, it is well known that 3D and 2D sensors jointly depict the real world. In chapter five, 2D object detection becomes the next target for improvement. Existing 2D instance segmentation algorithms have developed significantly and reached saturated performance. However, there is no solid solution for heavily occluded or diagonally arranged objects, especially in vision-guided robot picking systems. To solve this problem, we propose a real-time, occlusion- and oblique-friendly instance segmentation framework, termed Keypoint-Mask, which assists the robotic system in handling complicated detection scenarios.
Reversible Recursive Instance-level Object Segmentation
In this work, we propose a novel Reversible Recursive Instance-level Object
Segmentation (R2-IOS) framework to address the challenging instance-level
object segmentation task. R2-IOS consists of a reversible proposal refinement
sub-network that predicts bounding box offsets for refining the object proposal
locations, and an instance-level segmentation sub-network that generates the
foreground mask of the dominant object instance in each proposal. By being
recursive, R2-IOS iteratively optimizes the two sub-networks during joint
training, in which the refined object proposals and improved segmentation
predictions are alternately fed into each other to progressively increase the
network capabilities. By being reversible, the proposal refinement sub-network
adaptively determines an optimal number of refinement iterations required for
each proposal during both training and testing. Furthermore, to handle multiple
overlapped instances within a proposal, an instance-aware denoising autoencoder
is introduced into the segmentation sub-network to distinguish the dominant
object from other distracting instances. Extensive experiments on the
challenging PASCAL VOC 2012 benchmark demonstrate the superiority of
R2-IOS over other state-of-the-art methods. In particular, the
over classes at IoU achieves , which significantly
outperforms the results of by PFN~\cite{PFN} and
by~\cite{liu2015multi}.
Comment: 9 pages
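The two quantities this abstract relies on, bounding box offsets and IoU, can be sketched as follows. This is not the R2-IOS code: the abstract does not state how offsets are parameterized, so the (dx, dy, dw, dh) form below follows the common R-CNN-style convention as an assumption, and both function names are hypothetical.

```python
import math


def apply_offsets(box, offsets):
    """Refine a proposal box with predicted offsets.

    box: (x1, y1, x2, y2). offsets: (dx, dy, dw, dh), where dx and dy shift
    the center in units of box width and height, and dw and dh scale the
    size in log space (assumed R-CNN-style parameterization).
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = offsets
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


proposal = (0.0, 0.0, 10.0, 10.0)
target = (2.0, 2.0, 12.0, 12.0)
refined = apply_offsets(proposal, (0.2, 0.2, 0.0, 0.0))
```

Here the refinement step moves the proposal onto the target, raising the IoU from 64/136 to 1. In a recursive scheme such a step would be repeated, with each refined box feeding the next segmentation pass.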
Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics have
demonstrated promise for generalization and interpretability. However, existing
approaches suffer from structural limitations and optimization difficulties for
common environments with multiple dynamic objects. In this paper, we present a
novel self-supervised learning framework, called Multi-level Abstraction
Object-oriented Predictor (MAOP), which employs a three-level learning
architecture that enables efficient object-based dynamics learning from raw
visual observations. We also design a spatial-temporal relational reasoning
mechanism for MAOP to support instance-level dynamics learning and handle
partial observability. Our results show that MAOP significantly outperforms
previous methods in terms of sample efficiency and generalization over novel
environments for learning environment models. We also demonstrate that learned
dynamics models enable efficient planning in unseen environments, comparable to
true environment models. In addition, MAOP learns semantically and visually
interpretable disentangled representations.
Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial
Intelligence (AAAI), 202