Markerless Motion Capture in the Crowd
This work uses crowdsourcing to obtain motion capture data from video
recordings. The data is obtained by information workers who click repeatedly to
indicate body configurations in the frames of a video, resulting in a model of
2D structure over time. We discuss techniques to optimize the tracking task and
strategies for maximizing accuracy and efficiency. We show visualizations of a
variety of motions captured with our pipeline, then apply reconstruction
techniques to derive 3D structure.
Comment: Presented at Collective Intelligence conference, 2012
(arXiv:1204.2991)
Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
This paper proposes a new hybrid architecture that consists of a deep
Convolutional Network and a Markov Random Field. We show how this architecture
is successfully applied to the challenging problem of articulated human pose
estimation in monocular images. The architecture can exploit structural domain
constraints such as geometric relationships between body joint locations. We
show that joint training of these two model paradigms improves performance and
allows us to significantly outperform existing state-of-the-art techniques.
MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation
In this work, we propose a novel and efficient method for articulated human
pose estimation in videos using a convolutional network architecture, which
incorporates both color and motion features. We propose a new human body pose
dataset, FLIC-motion, that extends the FLIC dataset with additional motion
features. We apply our architecture to this dataset and report significantly
better performance than current state-of-the-art pose detection systems.
Learning Human Pose Estimation Features with Convolutional Networks
This paper introduces a new architecture for human pose estimation using a
multi-layer convolutional network architecture and a modified learning
technique that learns low-level features and higher-level weak spatial models.
Unconstrained human pose estimation is one of the hardest problems in computer
vision, and our new architecture and learning schema show significant
improvement over the current state-of-the-art results. The main contribution of
this paper is showing, for the first time, that a specific variation of deep
learning is able to outperform all existing traditional architectures on this
task. The paper also discusses several lessons learned while researching
alternatives, most notably, that it is possible to learn strong low-level
feature detectors on features that might even just cover a few pixels in the
image. Higher-level spatial models improve the overall result somewhat, but to
a much lesser extent than expected. Many researchers previously argued that
kinematic structure and top-down information are crucial for this domain, but
with our purely bottom-up, weak spatial model, we could improve on other more
complicated architectures that currently produce the best results. This mirrors
what many other researchers, like those in speech recognition, object
recognition, and other domains, have experienced.
Efficient Object Localization Using Convolutional Networks
Recent state-of-the-art performance on human-body pose estimation has been
achieved with Deep Convolutional Networks (ConvNets). Traditional ConvNet
architectures include pooling and sub-sampling layers which reduce
computational requirements, introduce invariance and prevent over-training.
These benefits of pooling come at the cost of reduced localization accuracy. We
introduce a novel architecture which includes an efficient `position
refinement' model that is trained to estimate the joint offset location within
a small region of the image. This refinement model is jointly trained in
cascade with a state-of-the-art ConvNet model to achieve improved accuracy in
human joint location estimation. We show that the variance of our detector
approaches the variance of human annotations on the FLIC dataset and
outperforms all existing approaches on the MPII-human-pose dataset.
Comment: 8 pages with 1 page of citations
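The position-refinement idea in the abstract above can be illustrated with a minimal sketch. It assumes a coarse heatmap produced at a fixed stride and a hypothetical `offset` pair standing in for the output of the refinement model; the function names, the stride of 4, and the toy values are illustrative assumptions, not details from the paper.

```python
def coarse_argmax(heatmap):
    """Return (row, col) of the largest value in a 2D list."""
    best = (0, 0)
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > heatmap[best[0]][best[1]]:
                best = (r, c)
    return best


def refine_joint(heatmap, offset, stride=4):
    """Map the coarse peak to image pixels, then add a sub-cell offset.

    `offset` is a hypothetical (dy, dx) in pixels, standing in for what a
    refinement model would predict in the small region around the peak.
    """
    r, c = coarse_argmax(heatmap)
    # Centre of the coarse cell in full-resolution pixel coordinates.
    y = r * stride + stride / 2
    x = c * stride + stride / 2
    return (y + offset[0], x + offset[1])


# Toy 3x3 heatmap whose peak is the centre cell.
heatmap = [
    [0.0, 0.1, 0.0],
    [0.2, 0.9, 0.3],
    [0.0, 0.1, 0.0],
]
joint = refine_joint(heatmap, offset=(-1.0, 1.5), stride=4)
```

Pooling limits the coarse estimate to one cell per `stride` pixels; the offset term is what recovers the precision lost inside that cell.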
Social acceptance of classified versus non-classified students
The purpose of this study was to examine the social acceptance status of classified students versus non-classified students. Another purpose was to identify reasons why students perceive someone as having lower social status. A total of 95 students completed a rating scale and were surveyed for a nomination scale. Out of the 95 students, 27 were classified as learning disabled (21 boys, 11 girls). The scales and surveys allowed all the students to rate one another on peer ratings of liking and disliking and social acceptance. Students who were classified rated within the top 50% of all students as being accepted and chosen as friends by other students. The students' main reason for choosing their friends was that the person they chose was nice to them. The findings highlight the importance of mainstreaming students and keeping labels to a minimum for continued success and for improving self-esteem.
Towards Accurate Multi-person Pose Estimation in the Wild
We propose a method for multi-person detection and 2-D pose estimation that
achieves state-of-the-art results on the challenging COCO keypoints task. It is a
simple, yet powerful, top-down approach consisting of two stages.
In the first stage, we predict the location and scale of boxes which are
likely to contain people; for this we use the Faster RCNN detector. In the
second stage, we estimate the keypoints of the person potentially contained in
each proposed bounding box. For each keypoint type we predict dense heatmaps
and offsets using a fully convolutional ResNet. To combine these outputs we
introduce a novel aggregation procedure to obtain highly localized keypoint
predictions. We also use a novel form of keypoint-based Non-Maximum-Suppression
(NMS), instead of the cruder box-level NMS, and a novel form of keypoint-based
confidence score estimation, instead of box-level scoring.
Trained on COCO data alone, our final system achieves average precision of
0.649 on the COCO test-dev set and 0.643 on the test-standard set, outperforming
the winner of the 2016 COCO keypoints challenge and other recent
state-of-the-art methods. Further, by using additional in-house labeled data we
obtain an even higher average precision of 0.685 on the test-dev set and 0.673
on the test-standard set, more than 5% absolute improvement compared to the
previous best performing method on the same dataset.
Comment: Paper describing an improved version of the G-RMI entry to the 2016
COCO keypoints challenge (http://image-net.org/challenges/ilsvrc+coco2016).
Camera ready version to appear in the Proceedings of CVPR 201
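Keypoint-based NMS, as opposed to box-level NMS, can be sketched roughly as follows. The abstract does not specify the similarity measure, so the Gaussian keypoint similarity, the `scale` and `threshold` values, and the function names below are all assumptions made for illustration rather than the authors' procedure.

```python
import math


def keypoint_similarity(pose_a, pose_b, scale=10.0):
    """Mean Gaussian similarity between corresponding keypoints.

    Returns 1.0 for identical poses and approaches 0.0 as they diverge.
    The Gaussian form and `scale` are illustrative assumptions.
    """
    total = 0.0
    for (xa, ya), (xb, yb) in zip(pose_a, pose_b):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2
        total += math.exp(-d2 / (2 * scale ** 2))
    return total / len(pose_a)


def keypoint_nms(detections, threshold=0.5):
    """Greedy NMS over poses: keep the best-scoring pose, drop near-duplicates.

    `detections` is a list of (score, keypoints) pairs.
    """
    kept = []
    for score, pose in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(keypoint_similarity(pose, k[1]) < threshold for k in kept):
            kept.append((score, pose))
    return kept


detections = [
    (0.9, [(10, 10), (20, 20)]),      # person A
    (0.8, [(11, 10), (20, 21)]),      # near-duplicate of person A
    (0.7, [(100, 100), (120, 120)]),  # a different person
]
kept = keypoint_nms(detections)
```

Comparing poses rather than boxes means two people with heavily overlapping boxes but different joint positions can both survive suppression.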
INoD: Injected Noise Discriminator for Self-Supervised Representation Learning in Agricultural Fields
Perception datasets for agriculture are limited both in quantity and
diversity, which hinders effective training of supervised learning approaches.
Self-supervised learning techniques alleviate this problem; however, existing
methods are not optimized for dense prediction tasks in agricultural domains,
which results in degraded performance. In this work, we address this limitation
with our proposed Injected Noise Discriminator (INoD) which exploits principles
of feature replacement and dataset discrimination for self-supervised
representation learning. INoD interleaves feature maps from two disjoint
datasets during their convolutional encoding and predicts the dataset
affiliation of the resultant feature map as a pretext task. Our approach
enables the network to learn unequivocal representations of objects seen in one
dataset while observing them in conjunction with similar features from the
disjoint dataset. This allows the network to reason about higher-level
semantics of the entailed objects, thus improving its performance on various
downstream tasks. Additionally, we introduce the novel Fraunhofer Potato 2022
dataset consisting of over 16,800 images for object detection in potato fields.
Extensive evaluations of our proposed INoD pretraining strategy for the tasks
of object detection, semantic segmentation, and instance segmentation on the
Sugar Beets 2016 and our potato dataset demonstrate that it achieves
state-of-the-art performance.
Comment: 8 pages, 7 figures
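The pretext task described in the abstract above, interleaving feature maps from two disjoint datasets and predicting where each feature came from, might look roughly like this toy sketch. The per-cell injection granularity, the `inject_prob` parameter, and the function name are assumptions for illustration; the abstract does not state how the interleaving is performed.

```python
import random


def interleave_features(map_a, map_b, inject_prob=0.3, seed=0):
    """Replace random cells of map_a with the matching cells of map_b.

    Returns the mixed map and a binary mask (1 = cell came from map_b).
    The mask is the target a network would predict in the
    dataset-affiliation pretext task.
    """
    rng = random.Random(seed)
    mixed, mask = [], []
    for row_a, row_b in zip(map_a, map_b):
        mixed_row, mask_row = [], []
        for a, b in zip(row_a, row_b):
            injected = rng.random() < inject_prob
            mixed_row.append(b if injected else a)
            mask_row.append(1 if injected else 0)
        mixed.append(mixed_row)
        mask.append(mask_row)
    return mixed, mask


# Toy single-channel feature maps standing in for two disjoint datasets.
features_a = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
features_b = [[-1.0, -1.0, -1.0], [-1.0, -1.0, -1.0]]
mixed, mask = interleave_features(features_a, features_b)
```

Because the mask is produced by the mixing step itself, the supervision signal requires no manual labels.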