
    Learning-based depth estimation from 2D images using GIST and saliency

    Although there has been a significant proliferation of 3D displays in the last decade, the availability of 3D content is still scant compared to the volume of 2D data. To fill this gap, automatic 2D to 3D conversion algorithms are needed. In this paper, we present an automatic approach, inspired by machine learning principles, for estimating the depth of a 2D image. The depth of a query image is inferred from a dataset of color and depth images by searching this repository for images that are photometrically similar to the query. We measure the photometric similarity between two images by comparing their GIST descriptors. Since not all regions in the query image require the same visual attention, we give more weight in the GIST-descriptor comparison to regions with high saliency. Subsequently, we fuse the depths of the most similar images and adaptively filter the result to obtain a depth estimate. Our experimental results indicate that the proposed algorithm outperforms other state-of-the-art approaches on the commonly used Kinect-NYU dataset.
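    A minimal sketch of the retrieval-and-fusion idea described above: per-region GIST distances weighted by the query's saliency, followed by depth fusion over the nearest neighbours. The helpers `compute_gist` and `compute_saliency`, the grid size, and the median fusion are illustrative assumptions, not the paper's exact implementation (which also filters the fused depth adaptively).

```python
import numpy as np

def saliency_weighted_gist_distance(query_img, candidate_img,
                                    compute_gist, compute_saliency,
                                    grid=(4, 4)):
    """Per-cell GIST distances, weighted by the saliency of the query cell.
    compute_gist / compute_saliency are stand-ins for whichever descriptor
    and saliency implementations are actually used."""
    h, w = query_img.shape[:2]
    sal = compute_saliency(query_img)            # H x W map, higher = more salient
    rows, cols = grid
    dist, weight_sum = 0.0, 0.0
    for i in range(rows):
        for j in range(cols):
            ys = slice(i * h // rows, (i + 1) * h // rows)
            xs = slice(j * w // cols, (j + 1) * w // cols)
            g_q = np.asarray(compute_gist(query_img[ys, xs]))
            g_c = np.asarray(compute_gist(candidate_img[ys, xs]))
            w_ij = float(sal[ys, xs].mean())     # salient regions count more
            dist += w_ij * np.linalg.norm(g_q - g_c)
            weight_sum += w_ij
    return dist / max(weight_sum, 1e-8)

def estimate_depth(query_img, dataset, k=7, **descriptors):
    """dataset: list of (color_image, depth_map) pairs with known depth."""
    ranked = sorted(dataset, key=lambda pair: saliency_weighted_gist_distance(
        query_img, pair[0], **descriptors))
    # Fuse the depths of the k most photometrically similar images
    # (simple median fusion here).
    return np.median(np.stack([d for _, d in ranked[:k]]), axis=0)
```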

    Automatic depth estimation from single 2D image via transfer learning approach

    Nowadays, depth estimation from a single 2D image is a prominent task due to its numerous applications such as 2D to 3D image/video conversion, robot vision, and self-driving cars. This research proposes a novel automatic technique for estimating the depth of a single 2D image via transfer learning with a pre-trained deep learning model. This is a challenging problem, as a single 2D image carries no explicit depth cues. To tackle this, the pool of available images for which the depth is known is exploited, following the hypothesis that color images with similar semantics are likely to have similar depths. Along these lines, the depth of the input image is predicted from the depth maps of semantically similar images in the dataset, retrieved using high-level features of a pre-trained deep learning model followed by a classifier (i.e., K-Nearest Neighbor). Afterward, a cross bilateral filter is applied to remove spurious depth variations in the depth map. To demonstrate the quality of the presented approach, experiments have been conducted on two publicly available benchmark datasets, NYU (v2) and Make3D. The results indicate that the proposed approach outperforms state-of-the-art methods.
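    A sketch of the transfer-learning retrieval step, assuming a torchvision ResNet-50 as the pre-trained backbone and a scikit-learn nearest-neighbour search; the specific backbone, averaging scheme, and hyperparameters are assumptions, not the paper's exact configuration.

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.neighbors import NearestNeighbors

# Pre-trained backbone used as a fixed feature extractor (transfer learning).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()        # keep the 2048-d pooled features
backbone.eval()

preprocess = T.Compose([T.ToTensor(), T.Resize((224, 224)),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

@torch.no_grad()
def features(img):
    return backbone(preprocess(img).unsqueeze(0)).squeeze(0).numpy()

def predict_depth(query_img, train_imgs, train_depths, k=5):
    """Average the depth maps of the k semantically nearest training images."""
    feats = np.stack([features(im) for im in train_imgs])
    knn = NearestNeighbors(n_neighbors=k).fit(feats)
    _, idx = knn.kneighbors(features(query_img)[None])
    prior = np.mean([train_depths[i] for i in idx[0]], axis=0)
    # A cross/joint bilateral filter guided by the query image (e.g.
    # cv2.ximgproc.jointBilateralFilter from opencv-contrib) can then
    # smooth this prior while respecting edges, as the abstract describes.
    return prior
```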

    An automatic cluster-based approach for depth estimation of single 2D images

    In this paper, the problem of depth estimation from a single 2D image is considered. This is an important problem due to its various applications in industry. Previous learning-based methods rest on the key assumption that color images with photometric resemblance are likely to present similar depth structure. However, these methods search the whole dataset to find corresponding images using handcrafted features, which is a cumbersome and inefficient process. To overcome this, we propose a clustering-based algorithm for depth estimation of a single 2D image using transfer learning. To realize this, images are categorized into clusters using the K-means clustering algorithm, with features extracted by a pre-trained deep learning model, i.e., ResNet-50. After clustering, an efficient feature-vector replacement step is embedded to speed up the process without compromising accuracy. Then, images with a structure similar to the input image are retrieved from the best-matched cluster based on their correlation values. The retrieved candidate depth images are used to initialize the prior depth of the query image via a weighted-correlation-average (WCA). Finally, the estimated depth is refined by removing variations with a cross bilateral filter. To evaluate the performance of the proposed algorithm, experiments are conducted on two benchmark datasets, NYU v2 and Make3D.
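    A sketch of the cluster-then-retrieve idea, assuming ResNet-50 feature vectors are already available for the training images: K-means restricts the search to one cluster, and the depth prior is a correlation-weighted average of the retrieved depth maps. Cluster count, correlation measure, and k are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_clusters(train_feats, n_clusters=8):
    """Offline step: cluster training images by their (e.g. ResNet-50)
    features so retrieval only searches the best-matching cluster."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit(train_feats)

def wca_depth_prior(query_feat, km, train_feats, train_depths, k=5):
    """Weighted-correlation-average (WCA) over the k best-correlated
    images in the query's cluster."""
    cluster = km.predict(query_feat[None])[0]
    members = np.where(km.labels_ == cluster)[0]
    # Pearson correlation between the query feature and each cluster member.
    corr = np.array([np.corrcoef(query_feat, train_feats[i])[0, 1]
                     for i in members])
    order = np.argsort(corr)[::-1][:k]
    top, w = members[order], np.clip(corr[order], 0.0, None)
    w = w / (w.sum() + 1e-8)
    # Correlation-weighted average of the retrieved candidate depth maps;
    # a cross bilateral filter would then refine this prior.
    return np.tensordot(w, np.stack([train_depths[i] for i in top]), axes=1)
```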

    Visual working memory in immersive visualization: a change detection experiment and an image-computable model

    Visual working memory (VWM) is a cognitive mechanism essential for interacting with the environment and accomplishing ongoing tasks, as it allows fast processing of visual inputs at the expense of the amount of information that can be stored. A better understanding of its functioning would benefit research fields such as simulation and training in immersive Virtual Reality, information visualization, and computer graphics. The current work focuses on the design and implementation of a paradigm for evaluating VWM in immersive visualization and of a novel image-based computational model that mimics human behavioral data on VWM. We evaluated VWM while varying four conditions: set size, spatial layout, visual angle (VA) subtending the stimulus presentation space, and observation time. We adopted a full factorial design and analysed participants' performance in the change detection experiment. The analysis of hit rates and false alarm rates confirms a VWM capacity limit of around 7 ± 2 items, in line with the literature based on 2D videos and images. Only VA and observation time influence performance (p < 0.0001). Indeed, as the VA is enlarged, participants need more time to obtain a complete overview of the presented stimuli. Moreover, we show that our model agrees closely with the human data, r > 0.88 (p < 0.05).
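    For readers unfamiliar with change-detection analysis, the sketch below shows two standard ways hit and false alarm rates are turned into summary measures: Cowan's K capacity estimate and signal-detection sensitivity (d'). The abstract does not state which estimator the authors used, so these formulas are given only as common choices.

```python
from scipy.stats import norm

def cowan_k(hit_rate, fa_rate, set_size):
    """Cowan's K capacity estimate for a single-probe change-detection task:
    K = set_size * (hit_rate - false_alarm_rate)."""
    return set_size * (hit_rate - fa_rate)

def d_prime(hit_rate, fa_rate, eps=1e-4):
    """Signal-detection sensitivity; rates are clipped away from 0 and 1
    so the inverse normal is finite."""
    h = min(max(hit_rate, eps), 1 - eps)
    f = min(max(fa_rate, eps), 1 - eps)
    return norm.ppf(h) - norm.ppf(f)

# Example: 85% hits and 20% false alarms with 8 items on screen.
print(cowan_k(0.85, 0.20, set_size=8))   # ~5.2 items
print(d_prime(0.85, 0.20))               # ~1.88
```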

    How much of driving is pre-attentive?

    Driving a car in an urban setting is an extremely difficult problem, incorporating a large number of complex visual tasks; yet this problem is solved daily by most adults with little apparent effort. This paper proposes a novel vision-based approach to autonomous driving that can predict and even anticipate a driver's behavior in real time, using pre-attentive vision only. Experiments on three large datasets totaling over 200 000 frames show that our pre-attentive model can (1) detect a wide range of driving-critical context such as crossroads, city center, and road type; more surprisingly, it can also (2) detect the driver's actions (over 80% of braking and turning actions) and (3) estimate the driver's steering angle accurately. Additionally, our model is consistent with human data. First, the best steering prediction is obtained for a perception-to-action delay consistent with psychological experiments; importantly, this prediction can be made before the driver's action. Second, the regions of the visual field used by the computational model strongly correlate with the driver's gaze locations, significantly outperforming many saliency measures and comparable to state-of-the-art approaches. Funding: European Commission's Seventh Framework Programme (FP7/2007-2013).
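    A minimal sketch of the anticipation idea: regress the steering angle at time t + delay from pre-attentive (gist-like) frame features at time t. The ridge regressor, the feature representation, and the delay value are assumptions for illustration; only the existence of a perception-to-action delay and the feature-to-action mapping come from the abstract.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_steering_model(frame_features, steering, delay_frames=15):
    """Fit a linear map from pre-attentive features at time t to the
    steering angle at time t + delay_frames, so the prediction is
    available before the driver acts."""
    X = np.asarray(frame_features)[:-delay_frames]
    y = np.asarray(steering)[delay_frames:]
    return Ridge(alpha=1.0).fit(X, y)

# Usage sketch (gist_features and steering_angles are assumed to be
# precomputed, one row / value per video frame):
# model = fit_steering_model(gist_features, steering_angles, delay_frames=15)
# anticipated = model.predict(gist_features[t][None])  # steering at t + delay
```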

    Depth-aware neural style transfer

    Neural style transfer has recently received significant attention and demonstrated impressive results. An efficient solution proposed by Johnson et al. trains feed-forward convolutional neural networks by defining and optimizing perceptual loss functions. Such methods are typically based on high-level features extracted from pre-trained neural networks, where the loss function contains two components: a style loss and a content loss. However, these pre-trained networks were originally designed for object recognition, so the high-level features often focus on the primary target and neglect other details. As a result, when input images contain multiple objects, potentially at different depths, the resulting images are often unsatisfactory: the image layout is destroyed and the boundaries between foreground and background, as well as between different objects, become obscured. We observe that the depth map effectively reflects the spatial distribution in an image, and that preserving the depth map of the content image after stylization helps produce an image that preserves its semantic content. In this paper, we introduce a novel approach for neural style transfer that integrates depth preservation as an additional loss, preserving the overall image layout while performing style transfer.
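    A sketch of how such a depth-preservation term could slot into a Johnson-style perceptual objective: compare the depth predicted for the stylized output against the depth predicted for the content image. The `depth_net` estimator and the loss weights are assumptions; the paper's exact depth network and weighting are not specified here.

```python
import torch
import torch.nn.functional as F

def depth_preservation_loss(stylized, content, depth_net):
    """Penalize changes in the predicted depth map after stylization.
    depth_net is a frozen single-image depth estimator; gradients flow
    only through the stylized image."""
    with torch.no_grad():
        d_content = depth_net(content)
    d_stylized = depth_net(stylized)
    return F.mse_loss(d_stylized, d_content)

# Combined objective (weights are illustrative):
#   loss = w_content * content_loss(stylized, content)
#        + w_style   * style_loss(stylized, style_image)
#        + w_depth   * depth_preservation_loss(stylized, content, depth_net)
```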