Search CORE

2,491 research outputs found

Toward a Taxonomy and Computational Models of Abnormalities in Images

Author: Elgammal Ahmed
Farhadi Ali
Feldman Jacob
Saleh Babak
Publication venue
Publication date: 04/12/2015
Field of study

The human visual system can spot an abnormal image, and reason about what makes it strange. This task has not received enough attention in computer vision. In this paper we study various types of atypicalities in images in a more comprehensive way than has been done before. We propose a new dataset of abnormal images showing a wide range of atypicalities. We design human subject experiments to discover a coarse taxonomy of the reasons for abnormality. Our experiments reveal three major categories of abnormality: object-centric, scene-centric, and contextual. Based on this taxonomy, we propose a comprehensive computational model that can predict all different types of abnormality in images and outperform prior arts in abnormality recognition.Comment: To appear in the Thirtieth AAAI Conference on Artificial Intelligence (AAAI 2016

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events

Author: Li Yuxi
Lin Weiyao
Liu Huabin
Liu Shizhan
Qi Guo-Jun
Qian Rui
Sebe Nicu
Wang Tao
Xiong Hongkai
Xu Ning
Publication venue
Publication date: 14/03/2021
Field of study

Along with the development of modern smart cities, human-centric video analysis has been encountering the challenge of analyzing diverse and complex events in real scenes. A complex event relates to dense crowds, anomalous, or collective behaviors. However, limited by the scale of existing video datasets, few human analysis approaches have reported their performance on such complex events. To this end, we present a new large-scale dataset, named Human-in-Events or HiEve (Human-centric video analysis in complex Events), for the understanding of human motions, poses, and actions in a variety of realistic events, especially in crowd and complex events. It contains a record number of poses (>1M), the largest number of action instances (>56k) under complex events, as well as one of the largest numbers of trajectories lasting for longer time (with an average trajectory length of >480 frames). Based on this dataset, we present an enhanced pose estimation baseline by utilizing the potential of action information to guide the learning of more powerful 2D pose features. We demonstrate that the proposed method is able to boost the performance of existing pose estimation pipelines on our HiEve dataset. Furthermore, we conduct extensive experiments to benchmark recent video analysis approaches together with our baseline methods, demonstrating that HiEve is a challenging dataset for human-centric video analysis. We expect that the dataset will advance the development of cutting-edge techniques in human-centric analysis and the understanding of complex events. The dataset is available at http://humaninevents.orgComment: Dataset for Large-scale Human-centric Video Analysis in Complex Events (http://humaninevents.org

arXiv.org e-Print Archive

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

Author: Ju Xuan
Wang Jianan
Xu Qiang
Zeng Ailing
Zhang Lei
Publication venue
Publication date: 05/04/2023
Field of study

Humans have long been recorded in a variety of forms since antiquity. For example, sculptures and paintings were the primary media for depicting human beings before the invention of cameras. However, most current human-centric computer vision tasks like human pose estimation and human image generation focus exclusively on natural images in the real world. Artificial humans, such as those in sculptures, paintings, and cartoons, are commonly neglected, making existing models fail in these scenarios. As an abstraction of life, art incorporates humans in both natural and artificial scenes. We take advantage of it and introduce the Human-Art dataset to bridge related tasks in natural and artificial scenarios. Specifically, Human-Art contains 50k high-quality images with over 123k person instances from 5 natural and 15 artificial scenarios, which are annotated with bounding boxes, keypoints, self-contact points, and text information for humans represented in both 2D and 3D. It is, therefore, comprehensive and versatile for various downstream tasks. We also provide a rich set of baseline results and detailed analyses for related tasks, including human detection, 2D and 3D human pose estimation, image generation, and motion transfer. As a challenging dataset, we hope Human-Art can provide insights for relevant research and open up new research questions.Comment: CVPR202

arXiv.org e-Print Archive

Humans need not label more humans: Occlusion Copy & Paste for Occluded Human Instance Segmentation

Author: Huang Dezhao
Hur Minhoe
Ling Evan
Publication venue
Publication date: 07/10/2022
Field of study

Modern object detection and instance segmentation networks stumble when picking out humans in crowded or highly occluded scenes. Yet, these are often scenarios where we require our detectors to work well. Many works have approached this problem with model-centric improvements. While they have been shown to work to some extent, these supervised methods still need sufficient relevant examples (i.e. occluded humans) during training for the improvements to be maximised. In our work, we propose a simple yet effective data-centric approach, Occlusion Copy & Paste, to introduce occluded examples to models during training - we tailor the general copy & paste augmentation approach to tackle the difficult problem of same-class occlusion. It improves instance segmentation performance on occluded scenarios for "free" just by leveraging on existing large-scale datasets, without additional data or manual labelling needed. In a principled study, we show whether various proposed add-ons to the copy & paste augmentation indeed contribute to better performance. Our Occlusion Copy & Paste augmentation is easily interoperable with any models: by simply applying it to a recent generic instance segmentation model without explicit model architectural design to tackle occlusion, we achieve state-of-the-art instance segmentation performance on the very challenging OCHuman dataset. Source code is available at https://github.com/levan92/occlusion-copy-paste.Comment: 13 pages, 5 figures, BMVC 202

arXiv.org e-Print Archive

MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?

Author: Aljosa Osep
Gianluca Maugeri
Guillem Braso
Laura Leal-Taixe
Matteo Fabbri
Orcun Cetintas
Riccardo Gasparini
Rita Cucchiara
Simone Calderara
Publication venue
Publication date: 01/01/2021
Field of study

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia