67,705 research outputs found
Ball-Scale Based Hierarchical Multi-Object Recognition in 3D Medical Images
This paper investigates, using prior shape models and the concept of ball
scale (b-scale), ways of automatically recognizing objects in 3D images without
performing elaborate searches or optimization. That is, the goal is to place
the model in a single shot close to the right pose (position, orientation, and
scale) in a given image so that the model boundaries fall in the close vicinity
of object boundaries in the image. This is achieved via the following set of
key ideas: (a) A semi-automatic way of constructing a multi-object shape model
assembly. (b) A novel strategy of encoding, via b-scale, the pose relationship
between objects in the training images and their intensity patterns captured in
b-scale images. (c) A hierarchical mechanism of positioning the model, in a
one-shot way, in a given image from a knowledge of the learnt pose relationship
and the b-scale image of the given image to be segmented. The evaluation
results on a set of 20 routine clinical abdominal female and male CT data sets
indicate the following: (1) Incorporating a large number of objects improves
the recognition accuracy dramatically. (2) The recognition algorithm can be
thought as a hierarchical framework such that quick replacement of the model
assembly is defined as coarse recognition and delineation itself is known as
finest recognition. (3) Scale yields useful information about the relationship
between the model assembly and any given image such that the recognition
results in a placement of the model close to the actual pose without doing any
elaborate searches or optimization. (4) Effective object recognition can make
delineation most accurate.Comment: This paper was published and presented in SPIE Medical Imaging 201
Detecting Visual Relationships with Deep Relational Networks
Relationships among objects play a crucial role in image understanding.
Despite the great success of deep learning techniques in recognizing individual
objects, reasoning about the relationships among objects remains a challenging
task. Previous methods often treat this as a classification problem,
considering each type of relationship (e.g. "ride") or each distinct visual
phrase (e.g. "person-ride-horse") as a category. Such approaches are faced with
significant difficulties caused by the high diversity of visual appearance for
each kind of relationships or the large number of distinct visual phrases. We
propose an integrated framework to tackle this problem. At the heart of this
framework is the Deep Relational Network, a novel formulation designed
specifically for exploiting the statistical dependencies between objects and
their relationships. On two large datasets, the proposed method achieves
substantial improvement over state-of-the-art.Comment: To be appeared in CVPR 2017 as an oral pape
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which puts in evidence the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypothesis assumed and thus, the constraints imposed on the type of video
that each technique is able to address. Expliciting the hypothesis and
constraints makes the framework particularly useful to select a method, given
an application. Another advantage of the proposed organization is that it
allows categorizing newest approaches seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion in the end of
the paper, where we also present the main open issues in the area.Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
table
On the Importance of Visual Context for Data Augmentation in Scene Understanding
Performing data augmentation for learning deep neural networks is known to be
important for training visual recognition systems. By artificially increasing
the number of training examples, it helps reducing overfitting and improves
generalization. While simple image transformations can already improve
predictive performance in most vision tasks, larger gains can be obtained by
leveraging task-specific prior knowledge. In this work, we consider object
detection, semantic and instance segmentation and augment the training images
by blending objects in existing scenes, using instance segmentation
annotations. We observe that randomly pasting objects on images hurts the
performance, unless the object is placed in the right context. To resolve this
issue, we propose an explicit context model by using a convolutional neural
network, which predicts whether an image region is suitable for placing a given
object or not. In our experiments, we show that our approach is able to improve
object detection, semantic and instance segmentation on the PASCAL VOC12 and
COCO datasets, with significant gains in a limited annotation scenario, i.e.
when only one category is annotated. We also show that the method is not
limited to datasets that come with expensive pixel-wise instance annotations
and can be used when only bounding boxes are available, by employing
weakly-supervised learning for instance masks approximation.Comment: Updated the experimental section. arXiv admin note: substantial text
overlap with arXiv:1807.0742
- …