12 research outputs found

    Gaussian Processes with Context-Supported Priors for Active Object Localization

    We devise an algorithm using a Bayesian optimization framework in conjunction with contextual visual data for the efficient localization of objects in still images. Recent research has demonstrated substantial progress in object localization and related tasks for computer vision. However, many current state-of-the-art object localization procedures still suffer from inaccuracy and inefficiency, in addition to failing to provide a principled and interpretable system amenable to high-level vision tasks. We address these issues with the current research. Our method encompasses an active search procedure that uses contextual data to generate initial bounding-box proposals for a target object. We train a convolutional neural network to approximate an offset distance from the target object. Next, we use a Gaussian Process to model this offset response signal over the search space of the target. We then employ a Bayesian active search for accurate localization of the target. In experiments, we compare our approach to a state-of-the-art bounding-box regression method for a challenging pedestrian localization task. Our method exhibits a substantial improvement over this baseline regression method.
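
    As an illustration of the general pattern (fit a GP to a spatial response signal, then actively query the most promising location), a minimal Python sketch follows. This is a reconstruction under stated assumptions, not the authors' implementation: offset_response is a hypothetical stand-in for the CNN's predicted offset distance, and the lower-confidence-bound acquisition rule is our choice.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        # Hypothetical stand-in for the CNN that predicts the offset
        # distance from a candidate location to the target object.
        def offset_response(xy):
            target = np.array([320.0, 240.0])   # unknown to the searcher
            return np.linalg.norm(xy - target, axis=-1)

        rng = np.random.default_rng(0)
        candidates = rng.uniform([0, 0], [640, 480], size=(2000, 2))  # search space
        X = rng.uniform([0, 0], [640, 480], size=(5, 2))              # initial proposals
        y = offset_response(X)

        gp = GaussianProcessRegressor(kernel=RBF(length_scale=100.0), normalize_y=True)
        for _ in range(20):                     # Bayesian active search loop
            gp.fit(X, y)
            mu, sigma = gp.predict(candidates, return_std=True)
            acq = mu - 1.96 * sigma             # lower confidence bound: seek small offsets
            x_next = candidates[np.argmin(acq)]
            X = np.vstack([X, x_next])
            y = np.append(y, offset_response(x_next))

        print("estimated target location:", X[np.argmin(y)])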

    Modified Capsule Neural Network (Mod-CapsNet) for indoor home scene recognition

    In this paper, a Modified Capsule Neural Network (Mod-CapsNet) with a pooling layer but without the squash function is used for the recognition of indoor home scenes represented in grayscale. This Mod-CapsNet produced an accuracy of 70%, compared to the 17.2% accuracy produced by a standard CapsNet. Since there is a lack of larger datasets related to indoor home scenes, obtaining better accuracy with smaller datasets is also one of the important aims of the paper. The numbers of images used for training and testing are 20,000 and 5,000 respectively, all of dimension 128×128. The analysis shows that, in the indoor home scene recognition task, the combination of capsules without a squash function and with max-pooling layers works better than capsules with convolutional layers. Indoor home scenes were chosen specifically to analyse the capsules' performance on datasets whose images have similarities but are, nonetheless, quite different; for example, tables may be present in both living rooms and dining rooms even though these are quite different rooms.
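
    A minimal PyTorch sketch of the modified primary-capsule stage described above, assuming grayscale 128×128 inputs; the layer sizes and capsule dimension are illustrative guesses, not the paper's configuration.

        import torch
        import torch.nn as nn

        class ModCapsBlock(nn.Module):
            """Primary capsules with max-pooling and no squash non-linearity."""
            def __init__(self, caps_dim=8, n_maps=16):
                super().__init__()
                self.conv = nn.Conv2d(1, 64, kernel_size=9, stride=1)
                self.pool = nn.MaxPool2d(2)     # pooling in place of squash/stride tricks
                self.caps = nn.Conv2d(64, n_maps * caps_dim, kernel_size=9, stride=2)
                self.caps_dim = caps_dim

            def forward(self, x):
                x = torch.relu(self.conv(x))
                x = self.pool(x)
                x = self.caps(x)
                b = x.size(0)
                # reshape to capsule vectors; note: no squash function applied
                return x.view(b, -1, self.caps_dim)

        caps = ModCapsBlock()
        out = caps(torch.randn(2, 1, 128, 128))   # batch of grayscale 128x128 images
        print(out.shape)                          # (2, n_capsules, caps_dim)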

    An Object Co-occurrence Assisted Hierarchical Model for Scene Understanding

    Hierarchical methods have been widely explored for object recognition, which is a critical component of scene understanding. However, few existing works are able to model contextual information (e.g., object co-occurrence) explicitly within a single coherent framework for scene understanding. Towards this goal, in this paper we propose a novel three-level (superpixel-level, object-level and scene-level) hierarchical model to address the scene categorization problem. Our proposed model is a coherent probabilistic graphical model that captures object co-occurrence information for scene understanding with a probabilistic chain structure. The efficacy of the proposed model is demonstrated by experiments on the LabelMe dataset.
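
    To make the role of co-occurrence concrete, the toy sketch below scores scene categories from detected object labels with a naive-Bayes rule over an assumed co-occurrence table; it illustrates the cue the model exploits, not the three-level graphical model itself. All names and counts are illustrative.

        import numpy as np

        objects = ["table", "chair", "sofa", "bed"]
        scenes = ["living_room", "dining_room", "bedroom"]

        # Assumed object-scene co-occurrence counts (illustrative only)
        counts = np.array([[30, 40, 50, 2],     # living_room
                           [60, 70, 5, 1],      # dining_room
                           [5, 10, 2, 80]])     # bedroom
        p_obj_given_scene = counts / counts.sum(axis=1, keepdims=True)

        def scene_posterior(detected):
            """Naive-Bayes style scene score from detected object labels."""
            idx = [objects.index(o) for o in detected]
            log_lik = np.log(p_obj_given_scene[:, idx]).sum(axis=1)
            post = np.exp(log_lik - log_lik.max())
            return post / post.sum()

        print(dict(zip(scenes, scene_posterior(["table", "chair"]))))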

    Indoor home scene recognition using capsule neural networks

    This paper presents the use of a class of Deep Neural Networks for recognizing indoor home scenes so as to aid Intelligent Assistive Systems (IAS) in performing indoor services to assist elderly or infirm people. Identifying the exact indoor location is important so that objects associated with particular tasks can be located speedily and efficiently, irrespective of position or orientation. In this way, IAS developed for providing services may become more efficient in accomplishing designated tasks satisfactorily. Many Convolutional Neural Networks (CNNs) have been developed for outdoor scene classification and also for interior (not necessarily indoor home) scene classification. However, to date, there are no CNNs which are trained, validated and tested on indoor home scene datasets, as there appears to be an absence of sufficiently large databases of home scenes. Nonetheless, it is important to train systems which are meant to operate within home environments on the correct relevant data. To counteract this problem, it is proposed that a different type of network is used, one which is not very deep (i.e., does not have too many layers) but which can attain sufficiently high classification accuracy using smaller training datasets. A type of neural network likely to help achieve this is a Capsule Neural Network (CapsNet). In this paper, 20,000 indoor home scene images were used for training the CapsNet and 5,000 images were used for testing it. The validation accuracy achieved is 71% and the testing accuracy achieved is 70%.
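
    The training regime described (a 20,000/5,000 train/test split with a held-out validation measure) follows the standard pattern sketched below, scaled down and with a placeholder classifier standing in for the CapsNet; none of this is the paper's code.

        import torch
        from torch.utils.data import DataLoader, TensorDataset, random_split

        # Scaled-down placeholder for the paper's 20,000 train / 5,000 test images
        images = torch.randn(2_500, 1, 128, 128)
        labels = torch.randint(0, 5, (2_500,))
        train_set, test_set = random_split(TensorDataset(images, labels), [2_000, 500])

        model = torch.nn.Sequential(            # stand-in for the CapsNet itself
            torch.nn.Flatten(), torch.nn.Linear(128 * 128, 5))
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = torch.nn.CrossEntropyLoss()

        for epoch in range(2):                  # training loop
            for x, y in DataLoader(train_set, batch_size=64, shuffle=True):
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

        correct = sum((model(x).argmax(1) == y).sum().item()
                      for x, y in DataLoader(test_set, batch_size=256))
        print("test accuracy:", correct / len(test_set))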

    Detecting 3D geometric boundaries of indoor scenes under varying lighting

    The goal of this research is to identify 3D geometric boundaries in a set of 2D photographs of a static indoor scene under unknown, changing lighting conditions. A 3D geometric boundary is a contour located at a 3D depth discontinuity or a discontinuity in the surface normal. These boundaries can be used effectively for reasoning about the 3D layout of a scene. To distinguish 3D geometric boundaries from 2D texture edges, we analyze the illumination subspace of local appearance at each image location. In indoor time-lapse photography and surveillance video, we frequently see images that are lit by unknown combinations of uncalibrated light sources. We introduce an algorithm for semi-binary non-negative matrix factorization (SBNMF) to decompose such images into a set of lighting basis images, each of which shows the scene lit by a single light source. These basis images provide a natural, succinct representation of the scene, enabling tasks such as scene editing (e.g., relighting) and shadow edge identification.
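
    A numpy sketch of the semi-binary factorization idea, under the reading that each frame is a non-negative combination of lighting basis images with binary on/off coefficients; the update scheme here (multiplicative updates for the basis, exhaustive re-estimation of the binary codes) is an illustrative simplification, not the authors' algorithm.

        import numpy as np

        rng = np.random.default_rng(1)
        V = rng.random((4096, 50))      # columns: vectorized frames of the scene
        k = 3                           # assumed number of light sources

        W = rng.random((4096, k))       # lighting basis images (non-negative)
        H = (rng.random((k, 50)) > 0.5).astype(float)   # binary on/off per frame

        for _ in range(100):
            # multiplicative update keeps the basis non-negative
            W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
            # re-estimate binary activations per column, exhaustive over 2^k codes
            codes = ((np.arange(2 ** k)[:, None] >> np.arange(k)) & 1).astype(float)
            errs = ((V[:, :, None] - (W @ codes.T)[:, None, :]) ** 2).sum(axis=0)
            H = codes[errs.argmin(axis=1)].T

        print("reconstruction error:", np.linalg.norm(V - W @ H))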

    Construction of a dual-task model for indoor scene recognition and semantic segmentation based on point clouds

    Indoor scene recognition remains a challenging problem in the fields of artificial intelligence and computer vision due to the complexity, similarity, and spatial variability of indoor scenes. Existing research is mainly based on 2D data, which lacks 3D information about the scene and cannot accurately identify scenes with frequent changes in lighting, shading, layout, etc. Moreover, existing research usually focuses on the global features of the scene, which cannot represent indoor scenes with cluttered objects and complex spatial layouts. To solve the above problems, this paper proposes a dual-task model for indoor scene recognition and semantic segmentation based on point cloud data. The model expands the data loading method by giving the dataset loader the ability to return multi-dimensional labels, and then realizes the dual-task model of scene recognition and semantic segmentation by fine-tuning PointNet++, setting task-state control parameters, and adding a common feature layer. Finally, to address the problem that the similarity of indoor scenes leads to wrong scene recognition results, rules relating scenes and elements are constructed to correct the scene recognition results. The experimental results show that, with the assistance of scene-element rules, the overall accuracy of scene recognition with the proposed method is 82.4% and the overall accuracy of semantic segmentation is 98.9%, outperforming the comparison models and providing a new method for the cognition of indoor scenes based on 3D point clouds.
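
    The dual-task arrangement (a common feature layer feeding a scene-recognition head and a per-point segmentation head, gated by a task-control parameter) can be sketched as below; the toy trunk does not reproduce PointNet++, and the class counts are assumptions.

        import torch
        import torch.nn as nn

        class DualTaskNet(nn.Module):
            """Shared point features feeding scene and segmentation heads."""
            def __init__(self, n_scenes=5, n_classes=13, feat=64):
                super().__init__()
                self.shared = nn.Sequential(          # common feature layer (toy
                    nn.Conv1d(3, feat, 1), nn.ReLU()) # trunk, not PointNet++)
                self.scene_head = nn.Linear(feat, n_scenes)
                self.seg_head = nn.Conv1d(feat, n_classes, 1)

            def forward(self, pts, task="both"):      # task-control parameter
                f = self.shared(pts)                  # (B, feat, N)
                scene = (self.scene_head(f.max(dim=2).values)
                         if task in ("scene", "both") else None)
                seg = self.seg_head(f) if task in ("seg", "both") else None
                return scene, seg

        net = DualTaskNet()
        scene_logits, seg_logits = net(torch.randn(2, 3, 1024))
        print(scene_logits.shape, seg_logits.shape)   # (2, 5) and (2, 13, 1024)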

    Development and Adaptation of Robotic Vision in the Real-World: the Challenge of Door Detection

    Mobile service robots are increasingly prevalent in human-centric, real-world domains, operating autonomously in unconstrained indoor environments. In such a context, robotic vision plays a central role in enabling service robots to perceive high-level environmental features from visual observations. Although data-driven approaches based on deep learning have pushed the boundaries of vision systems, applying these techniques to real-world robotic scenarios presents unique methodological challenges. Traditional models fail to represent the challenging perception constraints typical of service robots and must be adapted to the specific environment where the robots finally operate. We propose a method leveraging photorealistic simulations that balances data quality and acquisition costs for synthesizing visual datasets, from the robot's perspective, used to train deep architectures. We then show the benefits of qualifying a general detector for the target domain in which the robot is deployed, including the trade-off between the effort of obtaining new examples from such a setting and the performance gain. In our extensive experimental campaign, we focus on the door detection task (namely recognizing the presence and traversability of doorways), which, in dynamic settings, is useful for inferring the topology of the map. Our findings are validated in a real-world robot deployment, comparing prominent deep-learning models and demonstrating the effectiveness of our approach in practical settings.
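
    Qualifying a general detector for a deployment domain typically means re-heading a pretrained model and fine-tuning it on examples from that domain. The sketch below uses torchvision's Faster R-CNN as the general detector, which is our illustrative choice rather than necessarily the authors'; the three door classes are likewise assumptions.

        import torch
        import torchvision
        from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

        # Start from a general-purpose pretrained detector
        model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

        # Re-head for the door task: background, closed door, open (traversable) door
        num_classes = 3
        in_feats = model.roi_heads.box_predictor.cls_score.in_features
        model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)

        # One illustrative fine-tuning step on a synthetic robot-viewpoint example
        images = [torch.rand(3, 480, 640)]
        targets = [{"boxes": torch.tensor([[100., 50., 220., 400.]]),
                    "labels": torch.tensor([2])}]
        model.train()
        losses = model(images, targets)     # dict of detection losses
        sum(losses.values()).backward()
        print({k: float(v) for k, v in losses.items()})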

    Development of a system for the interpretation and prediction of traffic situations using Deep Learning

    Semantic scene understanding is a key aspect of multiple artificial intelligence applications, from Intelligent Transportation Systems to robotics. In this Final Project, we design, develop and evaluate a system that, based on the semantic segmentation of images obtained through a convolutional neural network, allows us to carry out the different tasks comprising scene understanding: classification, object detection and the aforementioned semantic segmentation, in a simple yet efficient manner. In addition, we propose a solution focused on intelligent vehicles which, using semantic segmentation, allows us to estimate the speed at which the vehicle must be driven. To this end, we have built a new database on which to evaluate this challenging new problem. The results confirm that it is possible and beneficial to rely on semantic segmentation to successfully perform the different tasks.
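
    One plausible way to realize the proposed segmentation-to-speed step is to summarize the segmentation mask into per-class pixel proportions and regress a speed from them; the classes, the Ridge regressor, and the placeholder data below are illustrative assumptions, not the thesis' design.

        import numpy as np
        from sklearn.linear_model import Ridge

        CLASSES = ["road", "car", "pedestrian", "building", "vegetation"]

        def mask_to_features(mask, n_classes=len(CLASSES)):
            """Fraction of pixels per semantic class, used as the feature vector."""
            return np.bincount(mask.ravel(), minlength=n_classes) / mask.size

        rng = np.random.default_rng(0)
        masks = [rng.integers(0, len(CLASSES), size=(120, 160)) for _ in range(200)]
        speeds = rng.uniform(10, 120, size=200)     # placeholder ground-truth km/h

        X = np.stack([mask_to_features(m) for m in masks])
        reg = Ridge(alpha=1.0).fit(X, speeds)
        print("predicted speed:", reg.predict(X[:1])[0], "km/h")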

    Visual sequence-based place recognition for changing conditions and varied viewpoints

    Correctly identifying previously visited locations is essential for robotic place recognition and localisation. This thesis presents training-free solutions to vision-based place recognition under changing environmental conditions and camera viewpoints. Using vision as the primary sensor, the proposed approaches combine image segmentation and rescaling techniques over sequences of visual imagery to enable successful place recognition in a range of challenging environments where prior techniques have failed.
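
    The training-free, sequence-based matching idea can be illustrated in a SeqSLAM-like style: compare down-scaled, normalized images and score short aligned sequences rather than single frames. The toy numpy sketch below uses synthetic traverses and assumes roughly equal velocity between passes; it is not the thesis' method.

        import numpy as np

        def normalize(img, eps=1e-6):
            """Zero-mean, unit-variance normalization of a down-scaled grayscale image."""
            return (img - img.mean()) / (img.std() + eps)

        rng = np.random.default_rng(2)
        route_a = [normalize(rng.random((32, 64))) for _ in range(100)]      # reference pass
        route_b = [a + 0.3 * rng.standard_normal(a.shape) for a in route_a]  # changed conditions

        # Frame-to-frame difference matrix between the two traverses
        D = np.array([[np.abs(a - b).mean() for a in route_a] for b in route_b])

        def sequence_match(D, i, length=10):
            """Best reference index for query i, scored over an aligned sequence."""
            scores = [D[i:i + length, j:j + length].trace()
                      for j in range(D.shape[1] - length)]
            return int(np.argmin(scores))

        print("query 40 matches reference", sequence_match(D, 40))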