Holistic, Instance-Level Human Parsing
Object parsing -- the task of decomposing an object into its semantic parts
-- has traditionally been formulated as a category-level segmentation problem.
Consequently, when there are multiple objects in an image, current methods
cannot count the number of objects in the scene, nor can they determine which
part belongs to which object. We address this problem by segmenting the parts
of objects at the instance level, such that each pixel in the image is assigned
a part label, as well as the identity of the object it belongs to. Moreover, we
show how this approach benefits us in obtaining segmentations at coarser
granularities as well. Our proposed network is trained end-to-end given
detections, and begins with a category-level segmentation module. Thereafter, a
differentiable Conditional Random Field, defined over a variable number of
instances for every input image, reasons about the identity of each part by
associating it with a human detection. In contrast to other approaches, our
method can handle the varying number of people in each image and our holistic
network produces state-of-the-art results in instance-level part and human
segmentation, together with competitive results in category-level part
segmentation, all achieved by a single forward-pass through our neural network. Comment: Poster at BMVC 201
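The idea of giving every pixel both a part label and a person identity can be illustrated with a minimal sketch. This is not the paper's differentiable CRF: it is a greedy stand-in that assumes each foreground pixel belongs to the first detection box that contains it. The names `assign_instances` and `count_instances`, the box format, and the toy data are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1), with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def assign_instances(part_map: List[List[int]], boxes: List[Box]) -> List[List[int]]:
    """Give each non-background pixel the 1-based index of the first box containing it.

    part_map holds category-level part labels (0 = background).
    Pixels outside every box, and background pixels, get instance id 0.
    This greedy rule is an assumption for illustration, not the paper's CRF.
    """
    height = len(part_map)
    width = len(part_map[0]) if height else 0
    instance_map = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if part_map[y][x] == 0:
                continue  # background carries no identity
            for index, (x0, y0, x1, y1) in enumerate(boxes, start=1):
                if x0 <= x < x1 and y0 <= y < y1:
                    instance_map[y][x] = index
                    break
    return instance_map


def count_instances(instance_map: List[List[int]]) -> int:
    """Count distinct non-zero instance ids, i.e. the number of people found."""
    return len({value for row in instance_map for value in row if value != 0})


# Toy example: two people, each with parts labelled 1 (e.g. head) or 2 (e.g. leg).
parts = [[1, 1, 0, 2, 2],
         [1, 0, 0, 0, 2]]
detections = [(0, 0, 2, 2), (3, 0, 5, 2)]
instances = assign_instances(parts, detections)
```

Pairing `parts[y][x]` with `instances[y][x]` gives the (part, person) label per pixel, and dropping the part label yields the coarser instance-level human segmentation mentioned in the abstract.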
Adaptive Temporal Encoding Network for Video Instance-level Human Parsing
Beyond the existing single-person and multiple-person human parsing tasks in
static images, this paper makes the first attempt to investigate a more
realistic video instance-level human parsing that simultaneously segments out
each person instance and parses each instance into more fine-grained parts
(e.g., head, leg, dress). We introduce a novel Adaptive Temporal Encoding
Network (ATEN) that alternately performs temporal encoding among key frames
and flow-guided feature propagation from other consecutive frames between two
key frames. Specifically, ATEN first incorporates a Parsing-RCNN to produce the
instance-level parsing result for each key frame, which integrates both the
global human parsing and instance-level human segmentation into a unified
model. To balance between accuracy and efficiency, the flow-guided feature
propagation is used to directly parse consecutive frames according to their
identified temporal consistency with key frames. On the other hand, ATEN
leverages convolutional gated recurrent units (convGRU) to exploit temporal
changes over a series of key frames, which are further used to facilitate the
frame-level instance-level parsing. By alternately performing direct feature
propagation between consistent frames and temporal encoding among key
frames, ATEN achieves a good balance between frame-level accuracy and time
efficiency, a common and crucial problem in video object segmentation
research. To demonstrate the superiority of our ATEN, extensive experiments are
conducted on the most popular video segmentation benchmark (DAVIS) and a newly
collected Video Instance-level Parsing (VIP) dataset, which is the first video
instance-level human parsing dataset comprised of 404 sequences and over 20k
frames with instance-level and pixel-wise annotations. Comment: To appear in ACM MM 2018. Code link:
https://github.com/HCPLab-SYSU/ATEN. Dataset link: http://sysu-hcp.net/li
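The alternation between key-frame parsing and propagation to the frames in between can be sketched as a simple schedule. This is only an illustration under stated assumptions: a fixed key-frame interval, and propagation from the nearest preceding key frame. The function `frame_schedule` and the parameter `key_interval` are hypothetical and are not taken from the ATEN code.

```python
from typing import List, Tuple


def frame_schedule(num_frames: int, key_interval: int) -> List[Tuple[int, str, int]]:
    """Return (frame_index, mode, source_key_frame) for every frame.

    mode is "key" for frames parsed directly (and fed to temporal encoding),
    or "propagate" for frames whose features would be warped from a key frame.
    Assumes a fixed interval and the nearest preceding key frame as source.
    """
    if key_interval < 1:
        raise ValueError("key_interval must be at least 1")
    plan = []
    for t in range(num_frames):
        key = (t // key_interval) * key_interval  # nearest preceding key frame
        mode = "key" if t == key else "propagate"
        plan.append((t, mode, key))
    return plan


plan = frame_schedule(num_frames=5, key_interval=3)
```

A larger `key_interval` means fewer full parsing passes and more propagated frames, which is the accuracy versus efficiency trade-off the abstract describes.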
Improving Facial Attribute Prediction using Semantic Segmentation
Attributes are semantically meaningful characteristics whose applicability
widely crosses category boundaries. They are particularly important in
describing and recognizing concepts where no explicit training example is
given, e.g., zero-shot learning. Additionally, since attributes are
human describable, they can be used for efficient human-computer interaction.
In this paper, we propose to employ semantic segmentation to improve facial
attribute prediction. The core idea lies in the fact that many facial
attributes describe local properties. In other words, the probability that an
attribute appears in a face image is far from uniform across the spatial
domain. We build our facial attribute prediction model jointly with a deep
semantic segmentation network. This harnesses the localization cues learned by
the semantic segmentation to guide the attention of the attribute prediction to
the regions where different attributes naturally show up. As a result of this
approach, in addition to recognition, we are able to localize the attributes,
despite merely having access to image level labels (weak supervision) during
training. We evaluate our proposed method on the CelebA and LFWA datasets and
achieve results superior to the prior art. Furthermore, we show that in the
reverse problem, semantic face parsing improves when facial attributes are
available. This reaffirms the need to jointly model these two interconnected
tasks
Iterated learning and grounding: from holistic to compositional languages
This paper presents a new computational model for studying the origins and evolution of compositional languages grounded through the interaction between agents and their environment. The model is based on previous work on adaptive grounding of lexicons and the iterated learning model. Although the model is still in a developmental phase, the first results show that a compositional language can emerge in which the structure reflects regularities present in the population's environment
Encouraging versatile thinking in algebra using the computer
In this article we formulate and analyse some of the obstacles to understanding the notion of a variable, and the use and meaning of algebraic notation, and report empirical evidence to support the hypothesis that an approach using the computer will be more successful in overcoming these obstacles. The computer approach is formulated within a wider framework of versatile thinking in which global, holistic processing complements local, sequential processing. This is done through a combination of programming in BASIC, physical activities which simulate computer storage and manipulation of variables, and specific software which evaluates expressions in standard mathematical notation. The software is designed to enable the user to explore examples and non-examples of a concept, in this case equivalent and non-equivalent expressions. We call such a piece of software a generic organizer because it offers examples and non-examples which may be seen not just in specific terms, but as typical, or generic, examples of the algebraic processes, assisting the pupil in the difficult task of abstracting the more general concept which they represent. Empirical evidence from several related studies shows that such an approach significantly improves the understanding of higher order concepts in algebra, and that any initial loss in manipulative facility through lack of practice is more than made up at a later stage
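Exploring equivalent and non-equivalent expressions by evaluating them can be sketched as follows. This is a modern illustration in Python, not the BASIC software the article describes; the function `agree_on_samples`, the tolerance, and the sample values are assumptions. Agreement on a handful of values suggests equivalence but does not prove it, whereas a single disagreement does show non-equivalence.

```python
from typing import Callable, Iterable


def agree_on_samples(
    expr_a: Callable[[float], float],
    expr_b: Callable[[float], float],
    samples: Iterable[float],
    tolerance: float = 1e-9,
) -> bool:
    """Return True if both expressions give the same value at every sample.

    The expressions are passed as functions of one variable x.
    The tolerance absorbs floating-point rounding and is an assumed value.
    """
    return all(abs(expr_a(x) - expr_b(x)) <= tolerance for x in samples)


values = [-2, 0, 1, 5]

# An example of the concept: 2(x + 3) and 2x + 6 agree at every sample.
equivalent = agree_on_samples(lambda x: 2 * (x + 3), lambda x: 2 * x + 6, values)

# A non-example: 2(x + 3) and 2x + 3 differ, a common expansion slip.
not_equivalent = agree_on_samples(lambda x: 2 * (x + 3), lambda x: 2 * x + 3, values)
```

Trying several values for `x` rather than one mirrors the article's point that examples and non-examples are meant to be read as generic instances of the algebraic process, not as isolated calculations.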