169 research outputs found
DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion
In this paper, we study the task of detecting semantic parts of an object,
e.g., a wheel of a car, under partial occlusion. We propose that all models
should be trained without seeing occlusions while being able to transfer the
learned knowledge to deal with occlusions. This setting alleviates the
difficulty in collecting an exponentially large dataset to cover occlusion
patterns and is more essential. In this scenario, the proposal-based deep
networks, like RCNN-series, often produce unsatisfactory results, because both
the proposal extraction and classification stages may be confused by the
irrelevant occluders. To address this, [25] proposed a voting mechanism that
combines multiple local visual cues to detect semantic parts. The semantic
parts can still be detected even though some visual cues are missing due to
occlusions. However, this method is manually designed and is therefore hard to
optimize in an end-to-end manner.
In this paper, we present DeepVoting, which incorporates the robustness shown
by [25] into a deep network, so that the whole pipeline can be jointly
optimized. Specifically, it adds two layers after the intermediate features of
a deep network, e.g., the pool-4 layer of VGGNet. The first layer extracts the
evidence of local visual cues, and the second layer performs a voting mechanism
by utilizing the spatial relationship between visual cues and semantic parts.
We also propose an improved version DeepVoting+ by learning visual cues from
context outside objects. In experiments, DeepVoting achieves significantly
better performance than several baseline methods, including Faster-RCNN, for
semantic part detection under occlusion. In addition, DeepVoting enjoys
explainability, as the detection results can be diagnosed by looking up the
voting cues.
Detecting Semantic Parts on Partially Occluded Objects
In this paper, we address the task of detecting semantic parts on partially
occluded objects. We consider a scenario where the model is trained using
non-occluded images but tested on occluded images. The motivation is that there
are an infinite number of occlusion patterns in the real world, which cannot be fully
covered in the training data. So the models should be inherently robust and
adaptive to occlusions instead of fitting / learning the occlusion patterns in
the training data. Our approach detects semantic parts by accumulating the
confidence of local visual cues. Specifically, the method uses a simple voting
method, based on log-likelihood ratio tests and spatial constraints, to combine
the evidence of local cues. These cues are called visual concepts, which are
derived by clustering the internal states of deep networks. We evaluate our
voting scheme on the VehicleSemanticPart dataset with dense part annotations.
We randomly place two, three or four irrelevant objects onto the target object
to generate testing images with various occlusions. Experiments show that our
algorithm outperforms several competitors in semantic part detection when
occlusions are present.
Comment: Accepted to BMVC 2017 (13 pages, 3 figures)
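The voting idea described in this abstract, accumulating evidence from local cues under spatial constraints, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `vote_for_part`, the array layout, and the use of one precomputed log-likelihood-ratio kernel per cue are all assumptions made for illustration.

```python
import numpy as np


def vote_for_part(cue_maps, llr_kernels):
    """Accumulate votes from local cues into a part score map (illustrative only).

    cue_maps:    (K, H, W) binary array; cue_maps[k, y, x] is 1 where cue k fires.
    llr_kernels: (K, S, S) array with odd S (hypothetical layout);
                 llr_kernels[k, dy + r, dx + r] is the log-likelihood-ratio vote
                 that cue k casts for a part at offset (dy, dx) from the cue,
                 where r = S // 2.
    Returns an (H, W) array of summed votes.
    """
    num_cues, height, width = cue_maps.shape
    size = llr_kernels.shape[1]
    r = size // 2
    # Pad so that kernels centred near the border do not fall outside the map.
    padded = np.zeros((height + 2 * r, width + 2 * r))
    for k in range(num_cues):
        ys, xs = np.nonzero(cue_maps[k])
        for y, x in zip(ys, xs):
            # Stamp the cue's spatial vote pattern, centred on the firing location.
            padded[y:y + size, x:x + size] += llr_kernels[k]
    # Crop back to the original spatial extent.
    return padded[r:r + height, r:r + width]


# Toy example: two cues that both support a part at position (2, 2).
cues = np.zeros((2, 5, 5))
cues[0, 2, 2] = 1          # cue 0 fires on the part itself
cues[1, 2, 3] = 1          # cue 1 fires one pixel to the right of the part
kernels = np.zeros((2, 3, 3))
kernels[0, 1, 1] = 1.0     # cue 0 votes at offset (0, 0)
kernels[1, 1, 0] = 2.0     # cue 1 votes at offset (0, -1)

full_scores = vote_for_part(cues, kernels)

# Simulate occlusion by removing cue 1; cue 0 alone still points at (2, 2).
occluded = cues.copy()
occluded[1] = 0
occluded_scores = vote_for_part(occluded, kernels)
```

In the toy example the score at the part location drops when one cue is hidden, but the peak stays in the same place, which is the robustness property the abstract attributes to voting.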
Steps toward Parallel Intelligence
The origin of artificial intelligence is investigated, based on which the concepts of hybrid intelligence and parallel intelligence are presented. The paradigm shift in intelligence indicates the "new normal" of cyber-social-physical systems (CPSS), in which system behaviors are guided by Merton's Laws. Thus, ACP-based parallel intelligence, consisting of Artificial societies, Computational experiments, and Parallel execution, is introduced to bridge the big modeling gap in CPSS.
A Fixed-Point Model for Pancreas Segmentation in Abdominal CT Scans
Deep neural networks have been widely adopted for automatic organ
segmentation from abdominal CT scans. However, the segmentation accuracy of
some small organs (e.g., the pancreas) is sometimes unsatisfactory,
arguably because deep networks are easily disrupted by the complex and variable
background regions, which occupy a large fraction of the input volume. In this
paper, we formulate this problem as a fixed-point model which uses a
predicted segmentation mask to shrink the input region. This is motivated by
the fact that a smaller input region often leads to more accurate segmentation.
In the training process, we use the ground-truth annotation to generate
accurate input regions and optimize network weights. In the testing stage, we
fix the network parameters and update the segmentation results in an iterative
manner. We evaluate our approach on the NIH pancreas segmentation dataset, and
outperform the state-of-the-art by more than 4%, measured by the average
Dice-Sørensen Coefficient (DSC). In addition, we report 62.43% DSC in the
worst case, which guarantees the reliability of our approach in clinical
applications.
Comment: Accepted to MICCAI 2017 (8 pages, 3 figures)
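The test-time procedure in this abstract, fixing the network and repeatedly using the predicted mask to shrink the input region, can be sketched roughly as follows. This is a simplified 2D illustration, not the authors' code: the `segment` callable stands in for the trained network, and the margin, the iteration cap, and the stopping rule (Dice-Sørensen Coefficient between consecutive masks above a threshold) are assumptions chosen for the example.

```python
import numpy as np


def dice(a, b):
    """Dice-Sørensen Coefficient: 2 * |A and B| / (|A| + |B|)."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * np.logical_and(a, b).sum() / denom


def bounding_box(mask, margin):
    """Bounding box of the foreground, padded by `margin` and clipped to the image."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return 0, mask.shape[0], 0, mask.shape[1]  # fall back to the full image
    y0 = max(int(ys.min()) - margin, 0)
    y1 = min(int(ys.max()) + 1 + margin, mask.shape[0])
    x0 = max(int(xs.min()) - margin, 0)
    x1 = min(int(xs.max()) + 1 + margin, mask.shape[1])
    return y0, y1, x0, x1


def fixed_point_segment(image, segment, max_iters=10, tol=0.95, margin=2):
    """Iteratively re-segment a region cropped around the previous prediction.

    `segment` is a hypothetical stand-in for a trained network: it maps an
    image region to a boolean mask of the same shape.
    """
    mask = segment(image)  # coarse pass on the whole input
    for _ in range(max_iters):
        y0, y1, x0, x1 = bounding_box(mask, margin)
        new_mask = np.zeros_like(mask)
        new_mask[y0:y1, x0:x1] = segment(image[y0:y1, x0:x1])
        converged = dice(mask, new_mask) >= tol  # assumed stopping rule
        mask = new_mask
        if converged:
            break
    return mask


# Toy stand-in for a network: foreground is anything brighter than the region mean.
def toy_segment(region):
    return region > region.mean()


image = np.zeros((10, 10))
image[3:6, 3:6] = 1.0   # the "organ"
image[0, 0] = 0.1       # a faint background distractor

coarse = toy_segment(image)
refined = fixed_point_segment(image, toy_segment)
```

In this toy case the coarse pass picks up the faint distractor, while the cropped passes discard it, loosely mirroring the abstract's observation that a smaller input region tends to give a more accurate segmentation.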