
    Human Motion Trajectory Prediction: A Survey

    With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important. Specifically, predicting future positions of dynamic agents and planning considering such predictions are key tasks for self-driving vehicles, service robots and advanced surveillance systems. This paper provides a survey of human motion trajectory prediction. We review, analyze and structure a large selection of work from different communities and propose a taxonomy that categorizes existing methods based on the motion modeling approach and level of contextual information used. We provide an overview of the existing datasets and performance metrics. We discuss limitations of the state of the art and outline directions for further research. Comment: Submitted to the International Journal of Robotics Research (IJRR), 37 pages
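
    As a minimal sketch of how a trajectory predictor is typically scored, assuming the widely used average and final displacement errors (ADE/FDE) as the metrics and a constant-velocity extrapolation as the motion model (neither is taken from the survey itself), the following Python predicts a short track and measures its error. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def constant_velocity(observed: Sequence[Point], horizon: int) -> List[Point]:
    """Extrapolate the last observed step for `horizon` future steps."""
    if len(observed) < 2:
        raise ValueError("need at least two observed positions")
    (x0, y0), (x1, y1) = observed[-2], observed[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement per time step
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


def displacement_errors(pred: Sequence[Point], truth: Sequence[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE): mean and final Euclidean error over the horizon."""
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


observed = [(0.0, 0.0), (1.0, 0.0)]           # walking along x at 1 unit/step
truth = [(2.0, 0.0), (3.0, 1.0), (4.0, 2.0)]  # the person starts veering off
pred = constant_velocity(observed, horizon=3)
print(pred)                              # [(2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
print(displacement_errors(pred, truth))  # (1.0, 2.0)
```

    A constant-velocity extrapolation is a common sanity baseline; a learned predictor would be scored with the same two numbers on the same held-out tracks.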

    Recurrent Scene Parsing with Perspective Understanding in the Loop

    Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth), so that small details are preserved for distant objects while larger receptive fields are used for those nearby. The depth gating signal is provided by stereo disparity or estimated directly from monocular input. We integrate this depth-aware gating into a recurrent convolutional neural network to perform semantic segmentation. Our recurrent module iteratively refines the segmentation results, leveraging the depth and semantic predictions from the previous iterations. Through extensive experiments on four popular large-scale RGB-D datasets, we demonstrate that this approach achieves competitive semantic segmentation performance with a substantially more compact model. We carry out extensive analysis of this architecture, including variants that operate on monocular RGB but use depth as side-information during training, unsupervised gating as a generic attentional mechanism, and multi-resolution gating. We find that gated pooling for joint semantic segmentation and depth yields state-of-the-art results for quantitative monocular depth estimation.
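
    A toy sketch of the gating idea, assuming a hard, depth-thresholded choice among a few stride-1 max-pooling window sizes on a plain 2-D grid. In the paper the gate sits inside a convolutional network and comes from disparity or a monocular depth estimate; the thresholds, window sizes and function names below are hypothetical illustration values, not the authors' implementation.

```python
from typing import List, Sequence

Grid = List[List[float]]


def max_pool_same(x: Grid, k: int) -> Grid:
    """Stride-1 k x k max pooling (k odd); windows are clipped at the borders,
    so the output has the same height and width as the input."""
    if k < 1 or k % 2 == 0:
        raise ValueError("window size must be a positive odd integer")
    h, w, r = len(x), len(x[0]), k // 2
    return [
        [
            max(
                x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def depth_gated_pool(
    x: Grid, depth: Grid, sizes: Sequence[int], thresholds: Sequence[float]
) -> Grid:
    """Pick, per pixel, the pooled value whose window size suits its depth.

    sizes runs from the largest window to the smallest; thresholds are
    increasing depth boundaries with len(thresholds) == len(sizes) - 1.
    Near pixels (small depth) get large windows, far pixels small ones.
    """
    if len(thresholds) != len(sizes) - 1:
        raise ValueError("need exactly one threshold between consecutive sizes")
    pooled = [max_pool_same(x, k) for k in sizes]
    out = []
    for i, row in enumerate(depth):
        out_row = []
        for j, d in enumerate(row):
            level = sum(d >= t for t in thresholds)  # 0 = nearest depth bin
            out_row.append(pooled[level][i][j])
        out.append(out_row)
    return out


# A single strong response in the centre; the left column is near, the rest far.
x = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
depth = [[1.0, 10.0, 10.0], [1.0, 10.0, 10.0], [1.0, 10.0, 10.0]]
print(depth_gated_pool(x, depth, sizes=(3, 1), thresholds=(5.0,)))
# [[9, 0, 0], [9, 9, 0], [9, 0, 0]]
```

    The hard threshold keeps the example readable; a soft, learned gate would blend the pooled maps per pixel instead of selecting exactly one.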

    Relation Networks for Object Detection

    Although it has long been believed that modeling relations between objects would help object recognition, there has been no evidence that the idea works in the deep learning era. All state-of-the-art object detection systems still rely on recognizing object instances individually, without exploiting their relations during learning. This work proposes an object relation module. It processes a set of objects simultaneously through interaction between their appearance features and geometry, thus allowing modeling of their relations. It is lightweight and in-place. It does not require additional supervision and is easy to embed in existing networks. It is shown to be effective at improving the object recognition and duplicate removal steps in the modern object detection pipeline. It verifies the efficacy of modeling object relations in CNN-based detection. It gives rise to the first fully end-to-end object detector.
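
    A simplified sketch of relation-style attention among detected objects, assuming identity key/query/value projections and a hypothetical hand-written geometric weight (a Gaussian falloff in normalized box-center offset) in place of a learned geometry term. Each object's feature is updated with a weighted sum over all objects, the weights being proportional to the geometric weight times the exponentiated appearance similarity; the result is added back to the input so input and output shapes match, in the spirit of the "in-place" property. This is an illustration, not the paper's exact formulation.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def geometry_weight(box_m: Box, box_n: Box) -> float:
    """Hypothetical geometric weight in [0, 1]: Gaussian falloff in the
    centre offset of box m, normalised by the size of box n."""
    dx = (box_m[0] - box_n[0]) / box_n[2]
    dy = (box_m[1] - box_n[1]) / box_n[3]
    return math.exp(-(dx * dx + dy * dy))


def relation_features(
    feats: Sequence[Sequence[float]], boxes: Sequence[Box]
) -> List[List[float]]:
    """Update every object's feature with information from all objects.

    Weights for object n are proportional to
    geometry_weight(m, n) * exp(dot(f_m, f_n) / sqrt(d)), normalised over m.
    Identity projections stand in for learned ones (an assumption).
    """
    num, d = len(feats), len(feats[0])
    out = []
    for n in range(num):
        logits = [
            sum(a * b for a, b in zip(feats[m], feats[n])) / math.sqrt(d)
            for m in range(num)
        ]
        top = max(logits)  # subtract the max for numerical stability
        raw = [
            geometry_weight(boxes[m], boxes[n]) * math.exp(logit - top)
            for m, logit in enumerate(logits)
        ]
        total = sum(raw)  # normaliser, assumed non-zero in this sketch
        weights = [r / total for r in raw]
        relation = [sum(weights[m] * feats[m][k] for m in range(num)) for k in range(d)]
        out.append([f + r for f, r in zip(feats[n], relation)])  # residual add
    return out


feats = [[1.0, 0.0], [0.0, 1.0]]
far_apart = [(0.0, 0.0, 1.0, 1.0), (100.0, 0.0, 1.0, 1.0)]
print(relation_features(feats, far_apart))  # [[2.0, 0.0], [0.0, 2.0]]
```

    With the two boxes far apart, the geometric weight between them underflows to zero and each object only attends to itself; placing both boxes at the same location would instead mix part of each object's feature into the other.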

    Plot-based urbanism and urban morphometrics : measuring the evolution of blocks, street fronts and plots in cities

    Generative urban design has always been conceived as a creation-centered process, i.e. a process mainly concerned with the creation phase of a spatial transformation. We argue that, though the way we create a space is important, how that space evolves in time is far more important when it comes to providing livable places endowed with identity and a sense of attachment. In this paper we present this idea and its major consequences for urban design under the title of “Plot-Based Urbanism”. We will argue, however, that in order for a place to be adaptable in time, the right structure must be provided “by design” from the outset. We conceive urban design as the activity aimed at designing that structure. The force that shapes (and has always shaped) the adaptability in time of livable urban places is the restless activity of ordinary people going about their own ordinary business, a kind of participation in the common good that has hardly been acknowledged as such and that we term “informal participation”. Investigating which spatial components belong to the spatial structure and how they relate to each other is of crucial importance for urban design, and that is the scope of our research. In this paper a methodology to represent and measure form-related properties of streets, blocks, plots and buildings in cities is presented. Several dozen urban blocks of different historic formation in Milan (IT) and Glasgow (UK) are surveyed and analyzed. Effort is made to identify those spatial properties that are shared by clusters of cases in history and therefore constitute the set of spatial relationships that determine the morphological identity of places. To do so, we investigate the analogy that links the evolution of urban form as a cultural construct to that of living organisms, outlining a conceptual framework of reference for the further investigation of “the DNA of places”.
In this sense, we identify the year 1950 as the nominal watershed that marks the first “speciation” in urban history, and we find that location/centrality, scale and street permeability are the main drivers of that transition towards the entirely new urban forms of contemporary cities.

    DeepSignals: Predicting Intent of Drivers Through Visual Signals

    Detecting the intention of drivers is an essential task in self-driving, necessary to anticipate sudden events like lane changes and stops. Turn signals and emergency flashers communicate such intentions, providing seconds of potentially critical reaction time. In this paper, we propose to detect these signals in video sequences by using a deep neural network that reasons about both spatial and temporal information. Our experiments on more than a million frames show high per-frame accuracy in very challenging scenarios. Comment: To be presented at the IEEE International Conference on Robotics and Automation (ICRA), 201