    Characterizing and Dissecting Human Perception of Scene Complexity

    Humans can effortlessly assess the complexity of the visual stimuli they encounter. However, our understanding of how we do this, and of the factors that shape our perception of scene complexity, remains unclear, especially for the natural scenes in which we are constantly immersed. We introduce several new datasets to further understanding of human perception of scene complexity. Our first dataset (VISC-C) contains 800 scenes and 800 corresponding two-dimensional complexity annotations gathered from human observers, allowing exploration of how complexity perception varies across a scene. Our second dataset (VISC-CI) consists of inverted scenes (reflected about the horizontal axis) with corresponding complexity maps collected from human observers. Inverting images in this fashion disrupts the semantic characteristics of a scene as perceived by humans, and hence allows analysis of the impact of semantics on perceptual complexity. We analysed perceptual complexity from both a single-score and a two-dimensional perspective, by evaluating a set of calculable and observable perceptual features grounded in psychological research (clutter, symmetry, entropy and openness). We examined these factors' relationship to complexity via hierarchical regression analyses, tested the efficacy of various neural models against our datasets, and validated our perceptual features against a large and varied complexity dataset consisting of nearly 5000 images. Our results indicate that both global image properties and semantic features are important for complexity perception. We further verified this by combining the identified perceptual features with the output of a neural network predictor capable of extracting semantics, and found that we could explain more of the human variance in complexity than with low-level measures alone. Finally, we dissected our best-performing prediction network, determining that artificial neurons learn to extract both global image properties and semantic details from scenes for complexity prediction. Based on our experimental results, we propose the "dual information" framework of complexity perception, hypothesising that humans rely on both low-level image features and high-level semantic content to evaluate the complexity of images.
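
    A minimal sketch of the hierarchical regression step this abstract describes: regress complexity ratings first on the four low-level perceptual features alone, then add a semantic predictor and compare the explained variance. All data below is synthetic, and the column layout and `semantic` predictor are illustrative assumptions, not the study's actual pipeline.

```python
# Hierarchical (stepwise) regression sketch: low-level features first,
# then semantics, comparing explained variance. All data is synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 800  # e.g. one row per VISC-C scene
X_lowlevel = rng.normal(size=(n, 4))   # stand-ins: clutter, symmetry, entropy, openness
semantic = rng.normal(size=(n, 1))     # stand-in: output of a semantic predictor
y = X_lowlevel @ np.array([0.5, -0.2, 0.4, 0.1]) + 0.6 * semantic[:, 0] \
    + rng.normal(scale=0.5, size=n)    # synthetic complexity ratings

# Step 1: low-level perceptual features only.
m1 = sm.OLS(y, sm.add_constant(X_lowlevel)).fit()

# Step 2: add the semantic predictor and measure the gain in explained variance.
m2 = sm.OLS(y, sm.add_constant(np.hstack([X_lowlevel, semantic]))).fit()

print(f"R^2 low-level only: {m1.rsquared:.3f}")
print(f"R^2 with semantics: {m2.rsquared:.3f}")
print(f"Delta R^2:          {m2.rsquared - m1.rsquared:.3f}")
```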

    The Cityscapes Dataset for Semantic Urban Scene Understanding

    Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes comprises a large, diverse set of stereo video sequences recorded in the streets of 50 different cities. 5000 of these images have high-quality pixel-level annotations; 20000 additional images have coarse annotations to enable methods that leverage large volumes of weakly-labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark.
    Comment: Includes supplemental material
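
    For context on how pixel-level labeling benchmarks like this are scored, here is a minimal sketch of the standard per-class intersection-over-union (IoU) metric; the 19-class setting and ignore index of 255 are assumptions modeled on common Cityscapes conventions.

```python
# Per-class IoU / mean-IoU scoring sketch for pixel-level semantic labeling.
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in prediction or ground truth."""
    valid = gt != ignore_index          # mask out unlabeled pixels
    pred, gt = pred[valid], gt[valid]
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return np.mean(ious)

# Usage with dummy 2D label maps (19 classes, Cityscapes-style):
pred = np.random.randint(0, 19, size=(512, 1024))
gt = np.random.randint(0, 19, size=(512, 1024))
print(f"mIoU: {mean_iou(pred, gt, num_classes=19):.3f}")
```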

    Depth CNNs for RGB-D scene recognition: learning from scratch better than transferring from RGB-CNNs

    Scene recognition with RGB images has been extensively studied and has reached remarkable recognition levels, thanks to convolutional neural networks (CNNs) and large scene datasets. In contrast, current RGB-D scene data is much more limited, so methods often leverage large RGB datasets by transferring pretrained RGB CNN models and fine-tuning on the target RGB-D dataset. However, we show that this approach has the limitation of hardly reaching the bottom layers, which are key to learning modality-specific features. In contrast, we focus on the bottom layers and propose an alternative strategy to learn depth features, combining local weakly supervised training from patches with subsequent global fine-tuning on images. This strategy is capable of learning very discriminative depth-specific features with limited depth images, without resorting to Places-CNN. In addition, we propose a modified CNN architecture to further match the complexity of the model to the amount of data available. For RGB-D scene recognition, depth and RGB features are combined by projecting them into a common space and further learning a multilayer classifier, which is jointly optimized in an end-to-end network. Our framework achieves state-of-the-art accuracy on NYU2 and SUN RGB-D for both depth-only and combined RGB-D data.
    Comment: AAAI Conference on Artificial Intelligence 2017
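
    A rough PyTorch sketch of the two-stage strategy the abstract outlines: train the bottom convolutional layers on weakly labeled depth patches first, then fine-tune the whole network globally on full images. The layer sizes, label counts and dummy tensors are placeholders, not the paper's architecture.

```python
# Two-stage depth-feature learning sketch: patch pretraining, then global fine-tuning.
import torch
import torch.nn as nn

features = nn.Sequential(                       # bottom, modality-specific layers
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
)

# Stage 1: weakly supervised training on small depth patches with a throwaway head.
patch_head = nn.Linear(64, 10)                  # hypothetical coarse patch labels
opt = torch.optim.SGD(list(features.parameters()) + list(patch_head.parameters()), lr=0.01)
patches = torch.randn(16, 1, 32, 32)            # dummy depth patches
weak_labels = torch.randint(0, 10, (16,))
loss = nn.functional.cross_entropy(patch_head(features(patches).flatten(1)), weak_labels)
opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: global fine-tuning on full images with the scene-recognition head.
scene_head = nn.Linear(64, 19)                  # hypothetical scene categories
opt = torch.optim.SGD(list(features.parameters()) + list(scene_head.parameters()), lr=0.001)
images = torch.randn(4, 1, 224, 224)            # dummy full depth images
scene_labels = torch.randint(0, 19, (4,))
loss = nn.functional.cross_entropy(scene_head(features(images).flatten(1)), scene_labels)
opt.zero_grad(); loss.backward(); opt.step()
```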

    Physics Inspired Optimization on Semantic Transfer Features: An Alternative Method for Room Layout Estimation

    In this paper, we propose an alternative method to estimate the room layouts of cluttered indoor scenes. This method enjoys the benefits of two novel techniques. The first is semantic transfer (ST), which is: (1) a formulation to integrate the relationship between scene clutter and room layout into convolutional neural networks; (2) an architecture that can be trained end-to-end; (3) a practical strategy to initialize weights for very deep networks under an unbalanced training data distribution. ST allows us to extract highly robust features under various circumstances, and to address the computational redundancy hidden in these features we develop a principled and efficient inference scheme named physics inspired optimization (PIO). The basic idea of PIO is to formulate phenomena observed in ST features as mechanics concepts. Evaluations on the public datasets LSUN and Hedau show that the proposed method is more accurate than state-of-the-art methods.
    Comment: To appear in CVPR 2017. Project Page: https://sites.google.com/view/st-pio
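
    The abstract gives no formulas for PIO, so the following is only a loose illustration of the general idea of casting inference as mechanics: a layout hypothesis is treated as a damped particle pushed by forces derived from an energy function. The quadratic `energy` and all constants are hypothetical stand-ins, not the paper's actual formulation.

```python
# Mechanics-style inference sketch: damped particle descending an energy landscape.
import numpy as np

def energy(x):
    """Hypothetical energy: low where the layout hypothesis fits the features."""
    return np.sum((x - np.array([0.3, 0.7])) ** 2)

def grad(f, x, eps=1e-5):
    """Central-difference numerical gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x); d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

x = np.array([0.9, 0.1])     # initial layout parameters
v = np.zeros_like(x)         # velocity
dt, friction = 0.1, 1.0
for _ in range(200):
    force = -grad(energy, x) - friction * v   # driving force plus damping
    v += dt * force
    x += dt * v
print(f"converged layout parameters: {x}")    # approaches the energy minimum
```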

    ICNet for Real-Time Semantic Segmentation on High-Resolution Images

    We focus on the challenging task of real-time semantic segmentation in this paper. It has many practical applications, yet poses the fundamental difficulty of reducing a large portion of the computation required for pixel-wise label inference. We propose an image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address this challenge. We provide an in-depth analysis of our framework and introduce the cascade feature fusion unit to quickly achieve high-quality segmentation. Our system yields real-time inference on a single GPU card, with decent-quality results evaluated on challenging datasets like Cityscapes, CamVid and COCO-Stuff.
    Comment: ECCV 2018
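
    A minimal PyTorch sketch of a cascade-feature-fusion-style unit in the spirit of the one described: the coarse branch is upsampled and passed through a dilated convolution, the fine branch through a 1x1 projection, and the two are summed. Channel counts and hyperparameters are assumptions, not ICNet's exact configuration.

```python
# Cascade-feature-fusion-style unit: fuse a coarse branch into a fine branch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadeFeatureFusion(nn.Module):
    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        self.low_conv = nn.Conv2d(low_ch, out_ch, 3, padding=2, dilation=2, bias=False)
        self.low_bn = nn.BatchNorm2d(out_ch)
        self.high_conv = nn.Conv2d(high_ch, out_ch, 1, bias=False)
        self.high_bn = nn.BatchNorm2d(out_ch)

    def forward(self, x_low, x_high):
        # Upsample the coarse branch to the fine branch's spatial size.
        x_low = F.interpolate(x_low, size=x_high.shape[2:], mode="bilinear",
                              align_corners=False)
        return F.relu(self.low_bn(self.low_conv(x_low)) +
                      self.high_bn(self.high_conv(x_high)))

# Usage with dummy multi-resolution feature maps:
fuse = CascadeFeatureFusion(low_ch=256, high_ch=128, out_ch=128)
out = fuse(torch.randn(1, 256, 16, 32), torch.randn(1, 128, 64, 128))
print(out.shape)  # torch.Size([1, 128, 64, 128])
```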

    Advanced content-based semantic scene analysis and information retrieval: the SCHEMA project

    The aim of the SCHEMA Network of Excellence is to bring together a critical mass of universities, research centers, industrial partners and end users in order to design a reference system for content-based semantic scene analysis, interpretation and understanding. Relevant research areas include: content-based multimedia analysis and automatic annotation of semantic multimedia content, combined textual and multimedia information retrieval, the semantic web, the MPEG-7 and MPEG-21 standards, user interfaces and human factors. In this paper, recent advances in content-based analysis, indexing and retrieval of digital media within the SCHEMA Network are presented. These advances will be integrated into the SCHEMA module-based, expandable reference system.
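
    As a toy illustration of content-based retrieval of the kind the SCHEMA reference system targets, the sketch below uses a global color histogram as a stand-in descriptor (real MPEG-7 descriptors are far richer) and ranks an indexed collection by cosine similarity to a query. All names and data here are illustrative.

```python
# Content-based retrieval sketch: global descriptor extraction + similarity ranking.
import numpy as np

def descriptor(image, bins=8):
    """Normalised joint RGB histogram as a simple global content descriptor."""
    hist, _ = np.histogramdd(image.reshape(-1, 3), bins=(bins,) * 3,
                             range=[(0, 256)] * 3)
    hist = hist.ravel()
    return hist / hist.sum()

def retrieve(query_desc, index, top_k=5):
    """Rank indexed descriptors by cosine similarity to the query descriptor."""
    sims = {name: np.dot(query_desc, d) /
                  (np.linalg.norm(query_desc) * np.linalg.norm(d) + 1e-12)
            for name, d in index.items()}
    return sorted(sims.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Usage with dummy images:
rng = np.random.default_rng(0)
index = {f"img_{i}": descriptor(rng.integers(0, 256, (64, 64, 3)))
         for i in range(20)}
query = descriptor(rng.integers(0, 256, (64, 64, 3)))
print(retrieve(query, index))
```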