Human Motion Trajectory Prediction: A Survey
With growing numbers of intelligent autonomous systems in human environments,
the ability of such systems to perceive, understand and anticipate human
behavior becomes increasingly important. Specifically, predicting future
positions of dynamic agents and planning that accounts for such predictions are key
tasks for self-driving vehicles, service robots and advanced surveillance
systems. This paper provides a survey of human motion trajectory prediction. We
review, analyze and structure a large selection of work from different
communities and propose a taxonomy that categorizes existing methods based on
the motion modeling approach and level of contextual information used. We
provide an overview of the existing datasets and performance metrics. We
discuss limitations of the state of the art and outline directions for further
research.
Comment: Submitted to the International Journal of Robotics Research (IJRR), 37 pages.
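The survey itself does not prescribe a method, but the task it covers is easy to illustrate: given a short observed track, extrapolate future positions. Below is a minimal Python sketch (all names illustrative, not from the survey) of the constant-velocity baseline commonly used as a reference point in this literature.

# Constant-velocity baseline: extrapolate the last observed displacement.
import numpy as np

def constant_velocity_predict(observed: np.ndarray, horizon: int) -> np.ndarray:
    """Predict future 2D positions from an observed track.

    observed: array of shape (T, 2), past positions at fixed time steps.
    horizon:  number of future steps to predict.
    Returns an array of shape (horizon, 2).
    """
    velocity = observed[-1] - observed[-2]            # last-step displacement
    steps = np.arange(1, horizon + 1).reshape(-1, 1)
    return observed[-1] + steps * velocity            # linear extrapolation

# Example: a pedestrian moving 0.4 m per step along the x-axis.
track = np.array([[0.0, 0.0], [0.4, 0.0], [0.8, 0.0]])
print(constant_velocity_predict(track, horizon=3))    # [[1.2 0.] [1.6 0.] [2. 0.]]

Learned predictors in the surveyed literature are typically evaluated by how much they improve on baselines of this kind.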
Visual discomfort and variations in chromaticity in art and nature
Funding: SH was supported by a NARSAD Young Investigator Grant from the Brain and Behavior Research Foundation (26282), an R15 AREA award from the National Institute of Mental Health (122935), an NSF EPSCoR grant (1632849) on which SH is a co-investigator, and the NIH COBRE PG20GM103650. OP was partially funded by a Leverhulme grant (RPG-2019-096) to Julie M. Harris and a Research Incentive Grant from the Carnegie Trust (RIG009298).
Visual discomfort is related to the statistical regularity of visual images. The contribution of luminance contrast to visual discomfort is well understood: it can be framed in terms of a theory of efficient coding of natural stimuli and linked to metabolic demand. Although colour is important in our interaction with nature, the effect of colour on visual discomfort has received less attention. In this study, we build on the established association between visual discomfort and differences in chromaticity across space. We average the local differences in chromaticity in an image and show that this average is a good predictor of visual discomfort from the image; it accounts for part of the variance left unexplained by variations in luminance. We show that the local chromaticity difference in uncomfortable stimuli is high compared to that typical of natural scenes, except in particular infrequent conditions such as the arrangement of colourful fruits against foliage. Overall, our study discloses a new link between visual ecology and discomfort, whereby discomfort arises when adaptive perceptual mechanisms are overstimulated by specific classes of stimuli rarely found in nature.
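The abstract does not specify the colour space or the neighbourhood over which chromaticity differences are taken, but the predictor it describes can be sketched minimally as follows (Python; rg-chromaticity and horizontal/vertical neighbour differences are assumptions).

# Mean local chromaticity difference: the image statistic the study uses
# to predict visual discomfort. Colour space and neighbourhood are assumed.
import numpy as np

def mean_local_chromaticity_difference(rgb: np.ndarray) -> float:
    """rgb: float array of shape (H, W, 3) with values in [0, 1]."""
    intensity = rgb.sum(axis=2, keepdims=True) + 1e-8
    chroma = rgb[..., :2] / intensity   # (r, g) chromaticity, shape (H, W, 2)
    # Euclidean chromaticity difference between neighbouring pixels.
    dx = np.linalg.norm(np.diff(chroma, axis=1), axis=-1)
    dy = np.linalg.norm(np.diff(chroma, axis=0), axis=-1)
    return float(np.concatenate([dx.ravel(), dy.ravel()]).mean())

# On the paper's account, higher values should predict greater discomfort.
print(mean_local_chromaticity_difference(np.random.rand(64, 64, 3)))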
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and, consequently, the constraints imposed on the types
of video that each technique is able to address. Making these hypotheses and
constraints explicit makes the framework particularly useful for selecting a
method for a given application. Another advantage of the proposed organization
is that it allows the newest approaches to be categorized seamlessly alongside
traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area.
Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables.
Object-Centric Image Generation from Layouts
Despite recent impressive results on single-object and single-domain image
generation, the generation of complex scenes with multiple objects remains
challenging. In this paper, we start with the idea that a model must be able to
understand individual objects and relationships between objects in order to
generate complex scenes well. Our layout-to-image-generation method, which we
call Object-Centric Generative Adversarial Network (or OC-GAN), relies on a
novel Scene-Graph Similarity Module (SGSM). The SGSM learns representations of
the spatial relationships between objects in the scene, which lead to our
model's improved layout fidelity. We also propose changes to the conditioning
mechanism of the generator that enhance its object instance-awareness. Apart
from improving image quality, our contributions mitigate two failure modes in
previous approaches: (1) spurious objects being generated without corresponding
bounding boxes in the layout, and (2) overlapping bounding boxes in the layout
leading to merged objects in images. Extensive quantitative evaluation and
ablation studies demonstrate the impact of our contributions, with our model
outperforming previous state-of-the-art approaches on both the COCO-Stuff and
Visual Genome datasets. Finally, we address an important limitation of
evaluation metrics used in previous works by introducing SceneFID, an
object-centric adaptation of the popular Fréchet Inception Distance metric
that is better suited for multi-object images.
Comment: AAAI 2021.
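The abstract defines SceneFID only at a high level: Fréchet Inception Distance computed over object crops rather than whole images, so that per-object quality is measured directly. A minimal sketch of that idea follows (Python; the box format, the resizing choice, and the compute_fid placeholder are assumptions, with the FID itself delegated to any standard implementation).

# SceneFID idea: crop each object given in the layout and compare real
# and generated crop distributions with a standard FID implementation.
import numpy as np

def crop_objects(image: np.ndarray, boxes, size: int = 64):
    """image: (H, W, 3); boxes: iterable of (x0, y0, x1, y1) pixel coords.
    Returns a list of (size, size, 3) crops, nearest-neighbour resized."""
    crops = []
    for x0, y0, x1, y1 in boxes:
        crop = image[y0:y1, x0:x1]
        if crop.size == 0:
            continue
        ys = np.linspace(0, crop.shape[0] - 1, size).astype(int)
        xs = np.linspace(0, crop.shape[1] - 1, size).astype(int)
        crops.append(crop[ys][:, xs])   # nearest-neighbour resize to size x size
    return crops

# Schematic evaluation (compute_fid is a placeholder, not a real API):
#   real_crops = crops from ground-truth images, using the layout boxes
#   fake_crops = crops from generated images, using the same boxes
#   scene_fid  = compute_fid(real_crops, fake_crops)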
Detection of abandoned objects in crowded environments
With concerns about terrorism and global security on the rise, it has become vital to have in place efficient threat detection systems that will identify potentially dangerous situations and alert the authorities to take appropriate action. Of particular relevance is the case of abandoned objects in highly crowded areas. This thesis describes a general framework that recognizes the event of someone leaving an object unattended in forbidden areas. Our approach involves the recognition of four sub-events that characterize the activity of interest. When an unaccompanied object is found, the system analyzes its history to determine its most likely owner(s). Through subsequent frames, the system keeps a lookout for the owner, whose presence in or disappearance from the scene defines the status of the object and determines the appropriate course of action. The system was successfully implemented and tested on several standardized datasets.
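The abstract gives only the outline of the decision logic. Below is a minimal Python sketch of how the owner-based status determination could look; the state names and the threshold are assumed for illustration, not taken from the thesis.

# Status of a static object, decided from observations of its likely owner.
from enum import Enum, auto

class ObjectStatus(Enum):
    ATTENDED = auto()     # owner is visible in the scene
    UNATTENDED = auto()   # owner briefly out of view
    ABANDONED = auto()    # owner absent for too long -> raise an alert

def update_status(owner_visible: bool, frames_owner_absent: int,
                  abandon_threshold: int = 150) -> ObjectStatus:
    """Classify an unaccompanied object's status.

    frames_owner_absent: consecutive frames without the owner in view.
    abandon_threshold:   frames of absence before declaring abandonment
                         (about 5 s at 30 fps; an assumed value).
    """
    if owner_visible:
        return ObjectStatus.ATTENDED   # simplification: visible == attending
    if frames_owner_absent < abandon_threshold:
        return ObjectStatus.UNATTENDED
    return ObjectStatus.ABANDONED      # appropriate course of action: alert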
Sea Ice Classification of SAR Imagery Based on Convolutional Neural Networks
We explore new and existing convolutional neural network (CNN) architectures for sea ice classification using Sentinel-1 (S1) synthetic aperture radar (SAR) data by investigating two key challenges: binary sea ice versus open-water classification, and multi-class sea ice type classification. The analysis of sea ice in SAR images is challenging because of thermal noise effects and ambiguities in the radar backscatter that arise under certain conditions, including complex scattering from sea ice surfaces. We use manually annotated SAR images containing various sea ice types to construct a dataset for our Deep Learning (DL) analysis. To avoid contamination between classes, we use a combination of near-simultaneous SAR images from S1 and fine-resolution cloud-free optical data from Sentinel-2 (S2). For the classification, we use data augmentation to adjust for the imbalance of sea ice type classes in the training data. The SAR images are divided into small patches, which are processed one at a time. We demonstrate that the combination of data augmentation and training of a proposed modified Visual Geometry Group 16-layer (VGG-16) network, trained from scratch, significantly improves the classification performance compared to the original VGG-16 model and an ad hoc CNN model. The experimental results show both qualitatively and quantitatively that our models produce accurate classification results.
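The abstract does not detail the authors' modifications to VGG-16, but the patch-wise classification setup can be sketched as follows (PyTorch; the depth, the 32x32 single-channel patch size, and the class count are assumptions for illustration).

# A reduced VGG-style CNN that classifies one small SAR patch at a time.
import torch
import torch.nn as nn

def vgg_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 convolutions followed by 2x2 max-pooling, VGG-style."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class SeaIcePatchCNN(nn.Module):
    def __init__(self, num_classes: int = 4):  # e.g. water + 3 ice types (assumed)
        super().__init__()
        self.features = nn.Sequential(
            vgg_block(1, 32), vgg_block(32, 64), vgg_block(64, 128),
        )
        self.classifier = nn.Linear(128 * 4 * 4, num_classes)  # 32x32 input -> 4x4

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.features(x)                  # (N, 128, 4, 4)
        return self.classifier(feats.flatten(1))  # class logits per patch

# A batch of 8 single-channel 32x32 SAR patches.
logits = SeaIcePatchCNN()(torch.randn(8, 1, 32, 32))
print(logits.shape)  # torch.Size([8, 4])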