Toward a Taxonomy and Computational Models of Abnormalities in Images
The human visual system can spot an abnormal image and reason about what
makes it strange. This task has not received enough attention in computer
vision. In this paper, we study various types of atypicality in images more
comprehensively than previous work. We propose a new dataset of
abnormal images showing a wide range of atypicalities. We design human subject
experiments to discover a coarse taxonomy of the reasons for abnormality. Our
experiments reveal three major categories of abnormality: object-centric,
scene-centric, and contextual. Based on this taxonomy, we propose a
comprehensive computational model that can predict all of these types of
abnormality in images and outperforms prior work on abnormality recognition.
Comment: To appear in the Thirtieth AAAI Conference on Artificial Intelligence
(AAAI 2016)
Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events
Along with the development of modern smart cities, human-centric video
analysis has been encountering the challenge of analyzing diverse and complex
events in real scenes. A complex event involves dense crowds and anomalous or
collective behaviors. However, limited by the scale of existing video datasets,
few human analysis approaches have reported their performance on such complex
events. To this end, we present a new large-scale dataset, named
Human-in-Events or HiEve (Human-centric video analysis in complex Events), for
the understanding of human motions, poses, and actions in a variety of
realistic events, especially crowded and complex events. It contains a record
number of poses (>1M), the largest number of action instances (>56k) under
complex events, and one of the largest collections of long-duration
trajectories (average trajectory length >480 frames). Based on this dataset,
we present an enhanced pose estimation baseline that uses action information
to guide the learning of more powerful 2D pose features. We demonstrate that
the proposed method boosts the performance of existing pose estimation
pipelines on our HiEve dataset.
Furthermore, we conduct extensive experiments to benchmark recent video
analysis approaches together with our baseline methods, demonstrating that
HiEve is a challenging dataset for human-centric video analysis. We expect that
the dataset will advance the development of cutting-edge techniques in
human-centric analysis and the understanding of complex events. The dataset is
available at http://humaninevents.org.
Comment: Dataset for Large-scale Human-centric Video Analysis in Complex
Events (http://humaninevents.org)
Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
Humans have long been recorded in a variety of forms since antiquity. For
example, sculptures and paintings were the primary media for depicting human
beings before the invention of cameras. However, most current human-centric
computer vision tasks like human pose estimation and human image generation
focus exclusively on natural images in the real world. Artificial humans, such
as those in sculptures, paintings, and cartoons, are commonly neglected, causing
existing models to fail in these scenarios. As an abstraction of life, art
incorporates humans in both natural and artificial scenes. We take advantage of
this and introduce the Human-Art dataset to bridge related tasks in natural and
artificial scenarios. Specifically, Human-Art contains 50k high-quality images
with over 123k person instances from 5 natural and 15 artificial scenarios,
which are annotated with bounding boxes, keypoints, self-contact points, and
text information for humans represented in both 2D and 3D. It is, therefore,
comprehensive and versatile for various downstream tasks. We also provide a
rich set of baseline results and detailed analyses for related tasks, including
human detection, 2D and 3D human pose estimation, image generation, and motion
transfer. We hope that Human-Art, as a challenging dataset, can provide insights
for relevant research and open up new research questions.
Comment: CVPR202
Humans need not label more humans: Occlusion Copy & Paste for Occluded Human Instance Segmentation
Modern object detection and instance segmentation networks stumble when
picking out humans in crowded or highly occluded scenes. Yet, these are often
scenarios where we require our detectors to work well. Many works have
approached this problem with model-centric improvements. While they have been
shown to work to some extent, these supervised methods still need sufficient
relevant examples (i.e. occluded humans) during training for the improvements
to be maximised. In our work, we propose a simple yet effective data-centric
approach, Occlusion Copy & Paste, to introduce occluded examples to models
during training; we tailor the general copy & paste augmentation approach to
tackle the difficult problem of same-class occlusion. It improves instance
segmentation performance in occluded scenarios for "free", simply by leveraging
existing large-scale datasets, with no additional data or manual labelling
needed. In a principled study, we examine whether various proposed add-ons to
the copy & paste augmentation indeed contribute to better performance. Our
Occlusion Copy & Paste augmentation is easily compatible with any model: by
simply applying it to a recent generic instance segmentation model, with no
architectural changes designed to handle occlusion, we achieve
state-of-the-art instance segmentation performance on the very challenging
OCHuman dataset. Source code is available at
https://github.com/levan92/occlusion-copy-paste.
Comment: 13 pages, 5 figures, BMVC 202
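The core idea of pasting a human instance onto another image so that it occludes humans already there can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (their code is in the repository above). The function `occlusion_copy_paste` and its arguments are hypothetical. The sketch assumes binary per-instance masks and a paste location that lies fully inside the target image, and it omits the add-ons studied in the paper. The step it shows is the one that matters for same-class occlusion: after the paste, the masks of existing instances are trimmed by the pasted region so that they reflect the new occlusion.

```python
import numpy as np


def occlusion_copy_paste(target_img, target_masks, source_img, source_mask, offset):
    """Hypothetical sketch: paste one human instance onto a target image.

    target_img:   (H, W, 3) uint8 image that receives the pasted person.
    target_masks: list of (H, W) bool masks for instances already in target_img.
    source_img:   (h, w, 3) uint8 crop containing the person to paste.
    source_mask:  (h, w) bool mask of that person within the crop.
    offset:       (row, col) of the crop's top-left corner in the target;
                  assumed (not checked by the paper) to keep the crop in bounds.
    Returns the augmented image and the updated list of masks.
    """
    H, W = target_img.shape[:2]
    h, w = source_mask.shape
    r, c = offset
    if r < 0 or c < 0 or r + h > H or c + w > W:
        raise ValueError("pasted crop must lie inside the target image")

    # Full-size mask of the pasted person, aligned with the target image.
    pasted = np.zeros((H, W), dtype=bool)
    pasted[r:r + h, c:c + w] = source_mask

    # Copy only the masked source pixels; `region` is a view into out_img,
    # so assigning through it writes into the augmented image.
    out_img = target_img.copy()
    region = out_img[r:r + h, c:c + w]
    region[source_mask] = source_img[source_mask]

    # Existing instances lose the pixels now covered by the pasted person,
    # which turns the paste into a same-class occlusion example.
    new_masks = [m & ~pasted for m in target_masks]
    new_masks.append(pasted)
    return out_img, new_masks


if __name__ == "__main__":
    target = np.zeros((4, 4, 3), dtype=np.uint8)
    person = np.zeros((4, 4), dtype=bool)
    person[:, :2] = True                      # existing person: left two columns
    src = np.full((2, 2, 3), 255, dtype=np.uint8)
    src_mask = np.ones((2, 2), dtype=bool)    # pasted person: a 2x2 block
    img, masks = occlusion_copy_paste(target, [person], src, src_mask, (1, 1))
    print(masks[0].sum(), masks[1].sum())     # prints "6 4"
```

In a full training pipeline, one would typically also recompute bounding boxes from the trimmed masks and drop instances that become fully hidden. Those details are left out of this sketch.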