10,418 research outputs found

    Review on Computer Vision Techniques in Emergency Situation

    Full text link
    In emergency situations, actions that save lives and limit the impact of hazards are crucial. In order to act, situational awareness is needed to decide what to do. Geolocalized photos and video of the situations as they evolve can be crucial in better understanding them and making decisions faster. Cameras are almost everywhere these days, either in terms of smartphones, installed CCTV cameras, UAVs or others. However, this poses challenges in big data and information overflow. Moreover, most of the time there are no disasters at any given location, so humans aiming to detect sudden situations may not be as alert as needed at any point in time. Consequently, computer vision tools can be an excellent decision support. The number of emergencies where computer vision tools has been considered or used is very wide, and there is a great overlap across related emergency research. Researchers tend to focus on state-of-the-art systems that cover the same emergency as they are studying, obviating important research in other fields. In order to unveil this overlap, the survey is divided along four main axes: the types of emergencies that have been studied in computer vision, the objective that the algorithms can address, the type of hardware needed and the algorithms used. Therefore, this review provides a broad overview of the progress of computer vision covering all sorts of emergencies.Comment: 25 page

    Affordances Provide a Fundamental Categorization Principle for Visual Scenes

    Full text link
    How do we know that a kitchen is a kitchen by looking? Relatively little is known about how we conceptualize and categorize different visual environments. Traditional models of visual perception posit that scene categorization is achieved through the recognition of a scene's objects, yet these models cannot account for the mounting evidence that human observers are relatively insensitive to the local details in an image. Psychologists have long theorized that the affordances, or actionable possibilities of a stimulus are pivotal to its perception. To what extent are scene categories created from similar affordances? Using a large-scale experiment using hundreds of scene categories, we show that the activities afforded by a visual scene provide a fundamental categorization principle. Affordance-based similarity explained the majority of the structure in the human scene categorization patterns, outperforming alternative similarities based on objects or visual features. We all models were combined, affordances provided the majority of the predictive power in the combined model, and nearly half of the total explained variance is captured only by affordances. These results challenge many existing models of high-level visual perception, and provide immediately testable hypotheses for the functional organization of the human perceptual system

    Spatially Constrained Location Prior for Scene Parsing

    Full text link
    Semantic context is an important and useful cue for scene parsing in complicated natural images with a substantial amount of variations in objects and the environment. This paper proposes Spatially Constrained Location Prior (SCLP) for effective modelling of global and local semantic context in the scene in terms of inter-class spatial relationships. Unlike existing studies focusing on either relative or absolute location prior of objects, the SCLP effectively incorporates both relative and absolute location priors by calculating object co-occurrence frequencies in spatially constrained image blocks. The SCLP is general and can be used in conjunction with various visual feature-based prediction models, such as Artificial Neural Networks and Support Vector Machine (SVM), to enforce spatial contextual constraints on class labels. Using SVM classifiers and a linear regression model, we demonstrate that the incorporation of SCLP achieves superior performance compared to the state-of-the-art methods on the Stanford background and SIFT Flow datasets.Comment: authors' pre-print version of a article published in IJCNN 201

    Context-Aware Query Selection for Active Learning in Event Recognition

    Full text link
    Activity recognition is a challenging problem with many practical applications. In addition to the visual features, recent approaches have benefited from the use of context, e.g., inter-relationships among the activities and objects. However, these approaches require data to be labeled, entirely available beforehand, and not designed to be updated continuously, which make them unsuitable for surveillance applications. In contrast, we propose a continuous-learning framework for context-aware activity recognition from unlabeled video, which has two distinct advantages over existing methods. First, it employs a novel active-learning technique that not only exploits the informativeness of the individual activities but also utilizes their contextual information during query selection; this leads to significant reduction in expensive manual annotation effort. Second, the learned models can be adapted online as more data is available. We formulate a conditional random field model that encodes the context and devise an information-theoretic approach that utilizes entropy and mutual information of the nodes to compute the set of most informative queries, which are labeled by a human. These labels are combined with graphical inference techniques for incremental updates. We provide a theoretical formulation of the active learning framework with an analytic solution. Experiments on six challenging datasets demonstrate that our framework achieves superior performance with significantly less manual labeling.Comment: To appear in Transactions of Pattern Pattern Analysis and Machine Intelligence (T-PAMI

    SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences

    Full text link
    Semantic scene understanding is important for various applications. In particular, self-driving cars need a fine-grained understanding of the surfaces and objects in their vicinity. Light detection and ranging (LiDAR) provides precise geometric information about the environment and is thus a part of the sensor suites of almost all self-driving cars. Despite the relevance of semantic scene understanding for this application, there is a lack of a large dataset for this task which is based on an automotive LiDAR. In this paper, we introduce a large dataset to propel research on laser-based semantic segmentation. We annotated all sequences of the KITTI Vision Odometry Benchmark and provide dense point-wise annotations for the complete 360o360^{o} field-of-view of the employed automotive LiDAR. We propose three benchmark tasks based on this dataset: (i) semantic segmentation of point clouds using a single scan, (ii) semantic segmentation using multiple past scans, and (iii) semantic scene completion, which requires to anticipate the semantic scene in the future. We provide baseline experiments and show that there is a need for more sophisticated models to efficiently tackle these tasks. Our dataset opens the door for the development of more advanced methods, but also provides plentiful data to investigate new research directions.Comment: ICCV2019. See teaser video at http://bit.ly/SemanticKITTI-tease

    Deep Learning Markov Random Field for Semantic Segmentation

    Full text link
    Semantic segmentation tasks can be well modeled by Markov Random Field (MRF). This paper addresses semantic segmentation by incorporating high-order relations and mixture of label contexts into MRF. Unlike previous works that optimized MRFs using iterative algorithm, we solve MRF by proposing a Convolutional Neural Network (CNN), namely Deep Parsing Network (DPN), which enables deterministic end-to-end computation in a single forward pass. Specifically, DPN extends a contemporary CNN to model unary terms and additional layers are devised to approximate the mean field (MF) algorithm for pairwise terms. It has several appealing properties. First, different from the recent works that required many iterations of MF during back-propagation, DPN is able to achieve high performance by approximating one iteration of MF. Second, DPN represents various types of pairwise terms, making many existing models as its special cases. Furthermore, pairwise terms in DPN provide a unified framework to encode rich contextual information in high-dimensional data, such as images and videos. Third, DPN makes MF easier to be parallelized and speeded up, thus enabling efficient inference. DPN is thoroughly evaluated on standard semantic image/video segmentation benchmarks, where a single DPN model yields state-of-the-art segmentation accuracies on PASCAL VOC 2012, Cityscapes dataset and CamVid dataset.Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2017. Extended version of our previous ICCV 2015 paper (arXiv:1509.02634

    Joint Attention in Driver-Pedestrian Interaction: from Theory to Practice

    Full text link
    Today, one of the major challenges that autonomous vehicles are facing is the ability to drive in urban environments. Such a task requires communication between autonomous vehicles and other road users in order to resolve various traffic ambiguities. The interaction between road users is a form of negotiation in which the parties involved have to share their attention regarding a common objective or a goal (e.g. crossing an intersection), and coordinate their actions in order to accomplish it. In this literature review we aim to address the interaction problem between pedestrians and drivers (or vehicles) from joint attention point of view. More specifically, we will discuss the theoretical background behind joint attention, its application to traffic interaction and practical approaches to implementing joint attention for autonomous vehicles

    ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data

    Full text link
    Scene understanding of high resolution aerial images is of great importance for the task of automated monitoring in various remote sensing applications. Due to the large within-class and small between-class variance in pixel values of objects of interest, this remains a challenging task. In recent years, deep convolutional neural networks have started being used in remote sensing applications and demonstrate state of the art performance for pixel level classification of objects. \textcolor{black}{Here we propose a reliable framework for performant results for the task of semantic segmentation of monotemporal very high resolution aerial images. Our framework consists of a novel deep learning architecture, ResUNet-a, and a novel loss function based on the Dice loss. ResUNet-a uses a UNet encoder/decoder backbone, in combination with residual connections, atrous convolutions, pyramid scene parsing pooling and multi-tasking inference. ResUNet-a infers sequentially the boundary of the objects, the distance transform of the segmentation mask, the segmentation mask and a colored reconstruction of the input. Each of the tasks is conditioned on the inference of the previous ones, thus establishing a conditioned relationship between the various tasks, as this is described through the architecture's computation graph. We analyse the performance of several flavours of the Generalized Dice loss for semantic segmentation, and we introduce a novel variant loss function for semantic segmentation of objects that has excellent convergence properties and behaves well even under the presence of highly imbalanced classes.} The performance of our modeling framework is evaluated on the ISPRS 2D Potsdam dataset. Results show state-of-the-art performance with an average F1 score of 92.9\% over all classes for our best model.Comment: Accepted for publication to the ISPRS Journal of Photogrammetry and Remote Sensin

    Towards Automated Cadastral Boundary Delineation from UAV Data

    Full text link
    Unmanned aerial vehicles (UAV) are evolving as an alternative tool to acquire land tenure data. UAVs can capture geospatial data at high quality and resolution in a cost-effective, transparent and flexible manner, from which visible land parcel boundaries, i.e., cadastral boundaries are delineable. This delineation is to no extent automated, even though physical objects automatically retrievable through image analysis methods mark a large portion of cadastral boundaries. This study proposes (i) a workflow that automatically extracts candidate cadastral boundaries from UAV orthoimages and (ii) a tool for their semi-automatic processing to delineate final cadastral boundaries. The workflow consists of two state-of-the-art computer vision methods, namely gPb contour detection and SLIC superpixels that are transferred to remote sensing in this study. The tool combines the two methods, allows a semi-automatic final delineation and is implemented as a publicly available QGIS plugin. The approach does not yet aim to provide a comparable alternative to manual cadastral mapping procedures. However, the methodological development of the tool towards this goal is developed in this paper. A study with 13 volunteers investigates the design and implementation of the approach and gathers initial qualitative as well as quantitate results. The study revealed points for improvement, which are prioritized based on the study results and which will be addressed in future work.Comment: Report on current state (August 2017) of PhD work of first author. Further info: https://its4land.com/automate-it-wp5

    Learning to Detect Vehicles by Clustering Appearance Patterns

    Full text link
    This paper studies efficient means for dealing with intra-category diversity in object detection. Strategies for occlusion and orientation handling are explored by learning an ensemble of detection models from visual and geometrical clusters of object instances. An AdaBoost detection scheme is employed with pixel lookup features for fast detection. The analysis provides insight into the design of a robust vehicle detection system, showing promise in terms of detection performance and orientation estimation accuracy.Comment: Preprint version of our T-ITS 2015 pape
    corecore