
    Learning Robust Object Recognition Using Composed Scenes from Generative Models

    Full text link
    Recurrent feedback connections in the mammalian visual system have been hypothesized to play a role in synthesizing input within the theoretical framework of analysis by synthesis. Comparing the internally synthesized representation with that of the input provides a validation mechanism during perceptual inference and learning. Inspired by these ideas, we propose that the synthesis machinery can compose new, unobserved images by imagination to train the network itself, increasing the robustness of the system in novel scenarios. As a proof of concept, we investigated whether images composed by imagination could help an object recognition system deal with occlusion, which is challenging for current state-of-the-art deep convolutional neural networks. We fine-tuned a network on images containing objects in various occlusion scenarios, which are imagined or self-generated through a deep generator network. Trained on imagined occluded scenarios under the object persistence constraint, our network discovered more subtle and localized image features that the original network had neglected for object classification, obtaining better separability of different object classes in the feature space. This leads to a significant improvement in object recognition under occlusion for our network relative to the original network trained only on un-occluded images. Beyond the practical benefits for object recognition under occlusion, this work demonstrates that self-generated composition of visual scenes through the synthesis loop, combined with the object persistence constraint, can give neural networks opportunities to discover new relevant patterns in the data and become more flexible in dealing with novel situations. Comment: Accepted by the 14th Conference on Computer and Robot Vision
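    A minimal PyTorch sketch may make the training idea concrete: composite occluded views of each image (here a random patch stands in for the paper's deep generator network), then fine-tune with a classification loss plus a term that pulls the occluded and clean outputs together, a simple stand-in for the object persistence constraint. All names and weights below are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F
import torchvision

def compose_occluded(images, occluders, max_frac=0.4):
    """Paste a resized occluder patch at a random location in each image."""
    b, _, h, w = images.shape
    out = images.clone()
    for i in range(b):
        ph = int(h * max_frac * torch.rand(1).item()) + 1
        pw = int(w * max_frac * torch.rand(1).item()) + 1
        y = int(torch.randint(0, h - ph + 1, (1,)))
        x = int(torch.randint(0, w - pw + 1, (1,)))
        patch = F.interpolate(occluders[i:i + 1], size=(ph, pw),
                              mode="bilinear", align_corners=False)
        out[i, :, y:y + ph, x:x + pw] = patch[0]
    return out

backbone = torchvision.models.resnet18(num_classes=100)
optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-3, momentum=0.9)
lambda_persist = 0.1  # weight of the persistence term (an illustrative value)

def fine_tune_step(images, labels, occluders):
    occluded = compose_occluded(images, occluders)
    logits_clean = backbone(images)
    logits_occluded = backbone(occluded)
    # Classify both views; additionally pull the occluded view's outputs
    # toward the clean view's (persistence applied to logits for brevity;
    # the paper works with internal feature representations).
    loss = (F.cross_entropy(logits_clean, labels)
            + F.cross_entropy(logits_occluded, labels)
            + lambda_persist * F.mse_loss(logits_occluded, logits_clean))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```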

    Restoration of Partially Occluded Shapes of Faces using Neural Networks

    Get PDF
    One of the major difficulties encountered in the development of face image processing algorithms is the possible presence of occlusions that hide part of the face images to be processed. Typical examples of facial occlusions include sunglasses, beards, hats and scarves. In our work we address the problem of restoring the overall shape of faces given only the shape representation of a small part of the face. In the experiments described in this paper, the shape of a face is defined by a series of landmarks located on the face outline and on the outlines of the different facial features. We describe the use of a number of methods: a method that utilizes a Hopfield neural network, a method that uses a Multi-Layer Perceptron (MLP) neural network, a novel technique that combines the Hopfield and MLP networks, and a method based on associative search. We present comparative experiments in order to assess the performance of the four methods mentioned above. According to the experimental results, it is possible to recover the overall shape of a face with reasonable accuracy even when a substantial part of its shape is not visible. The techniques presented could form the basis for developing face image processing systems capable of dealing with occluded faces.
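    As a toy illustration of the MLP variant, the sketch below learns to map visible landmarks (occluded coordinates zeroed and flagged by a visibility mask) to the full landmark vector. The synthetic data, layer sizes and masking scheme are assumptions for illustration, not the paper's setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
num_faces, num_landmarks = 500, 68

# Stand-in shape data: real experiments would use annotated face landmarks.
full_shapes = rng.normal(size=(num_faces, num_landmarks * 2))

def occlude(shapes, frac=0.4):
    """Zero a contiguous run of landmarks and append a visibility mask."""
    x = shapes.copy()
    mask = np.ones((shapes.shape[0], num_landmarks))
    n = int(frac * num_landmarks)
    for i in range(shapes.shape[0]):
        start = rng.integers(0, num_landmarks - n)
        x[i, 2 * start:2 * (start + n)] = 0.0  # (x, y) pairs are interleaved
        mask[i, start:start + n] = 0.0
    return np.hstack([x, mask])

# Learn occluded-shape -> full-shape; multi-output regression is supported.
model = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=500)
model.fit(occlude(full_shapes), full_shapes)

restored = model.predict(occlude(full_shapes[:5]))  # full 68-point shapes
```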

    Evolutionary Robot Vision for People Tracking Based on Local Clustering

    Get PDF
    This paper discusses the role of evolutionary computation in visual perception for partner robots. The search process of evolutionary computation has many analogies with human visual search. First, we discuss the analogies between evolutionary search and human visual search. Next, we propose the concept of evolutionary robot vision and a human tracking method based on it. Finally, we present experimental results of human tracking and discuss the effectiveness of our proposed method.
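    A hedged sketch of the evolutionary-search idea: a population of candidate target positions evolves under a fitness that measures how well the image patch at each candidate matches a person template (plain sum-of-squared-differences here). Parameter names and values are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(frame, template, pos):
    """Negative sum of squared differences at candidate position (y, x)."""
    th, tw = template.shape
    y, x = pos
    patch = frame[y:y + th, x:x + tw]
    if patch.shape != template.shape:
        return -np.inf
    return -float(np.sum((patch - template) ** 2))

def evolve(frame, template, pop_size=50, generations=30, sigma=8.0):
    h, w = frame.shape
    th, tw = template.shape
    # Population of candidate (y, x) positions, initialized uniformly.
    pop = np.column_stack([rng.integers(0, h - th, pop_size),
                           rng.integers(0, w - tw, pop_size)])
    for _ in range(generations):
        scores = np.array([fitness(frame, template, p) for p in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]  # keep best half
        children = parents + rng.normal(0.0, sigma, parents.shape)  # mutate
        children = np.clip(children.round().astype(int),
                           [0, 0], [h - th, w - tw])
        pop = np.vstack([parents, children])
    scores = np.array([fitness(frame, template, p) for p in pop])
    return pop[int(np.argmax(scores))]  # best estimate of the target position
```

    Run once per frame, with the next frame's population seeded around the previous best, this becomes a tracking loop; the paper's local clustering would additionally group the population so that multiple people can be followed.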

    Robotic Cameraman for Augmented Reality based Broadcast and Demonstration

    Get PDF
    In recent years, a number of large enterprises have gradually begun to use various Augmented Reality technologies to improve how audiences view their products. Among these, creating an immersive virtual interactive scene through projection has received extensive attention; this technique is known as projection SAR, short for projection spatial augmented reality. However, because existing projection-SAR systems are immobile and have a limited working range, they are difficult to adopt in everyday settings. This thesis therefore proposes a technically feasible optimization scheme so that projection SAR can be practically applied to AR broadcasting and demonstrations. Building on the three main techniques required by state-of-the-art projection-SAR applications, this thesis presents a novel mobile projection-SAR cameraman for AR broadcasting and demonstration. Firstly, by combining a CNN scene-parsing model with multiple contour extractors, the proposed contour-extraction pipeline can detect optimal contour information even in non-HD or blurred images. This algorithm reduces the dependency on high-quality visual sensors and addresses the problem of low contour-extraction accuracy in motion-blurred images. Secondly, a plane-based visual mapping algorithm is introduced to overcome the difficulties of visual mapping in low-texture scenarios. Finally, a complete process for designing the projection-SAR cameraman robot is introduced. This part solves three main problems in mobile projection-SAR applications: (i) a new method for marking contours on the projection model is proposed to replace the model-rendering process; by combining contour features and geometric features, users can easily identify objects on a colourless model; (ii) a camera initial-pose estimation method is developed based on visual tracking algorithms, which registers the start pose of the robot to the whole scene in Unity3D; (iii) a novel data transmission approach is introduced to establish a link between the external robot and the robot in the Unity3D simulation workspace, allowing the robotic cameraman to simulate its trajectory in Unity3D and project the correct virtual content. The proposed mobile projection-SAR system contributes to both the academic value and the practicality of existing projection-SAR techniques. First, it solves the problem of limited working range: when running in a large indoor scene, it can follow the user and project dynamic interactive virtual content automatically instead of requiring additional visual sensors. Second, it creates a more immersive experience for the audience, since it supports more body gestures from the user and richer virtual-real interaction. Lastly, a mobile system requires no up-front infrastructure, is cheaper, and offers the public an innovative option for indoor broadcasting and exhibitions.
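    A rough sketch of the shape of the contour-extraction step described above: a semantic mask (taken here as a given input standing in for the CNN scene-parsing model) restricts the image region, several OpenCV extractors run in parallel, and the best-scoring contour is kept even on blurred input. This illustrates the pipeline's structure, not the thesis implementation; the scoring rule is an assumption.

```python
import cv2

def extract_best_contour(image_bgr, object_mask):
    """object_mask: uint8 mask from a scene-parsing model (taken as given)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.bitwise_and(gray, gray, mask=object_mask)

    candidates = []
    # Extractor 1: Canny edges, effective on sharp, well-exposed frames.
    edges = cv2.Canny(gray, 50, 150)
    # Extractor 2: adaptive threshold, more tolerant of blur and low contrast.
    thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 11, 2)
    for binary in (edges, thresh):
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        candidates.extend(contours)

    # Keep the highest-scoring candidate; enclosed area is a crude score, and
    # a real system would use a learned or geometric quality measure.
    return max(candidates, key=cv2.contourArea) if candidates else None
```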

    Human Detection and Gesture Recognition Based on Ambient Intelligence

    Get PDF

    Towards Large-Scale Small Object Detection: Survey and Benchmarks

    Full text link
    With the rise of deep convolutional neural networks, object detection has achieved prominent advances in past years. However, such prosperity cannot mask the unsatisfactory state of Small Object Detection (SOD), one of the notoriously challenging tasks in computer vision, owing to the poor visual appearance and noisy representation caused by the intrinsic structure of small targets. In addition, the lack of a large-scale dataset for benchmarking small object detection methods remains a bottleneck. In this paper, we first conduct a thorough review of small object detection. Then, to catalyze the development of SOD, we construct two large-scale Small Object Detection dAtasets (SODA), SODA-D and SODA-A, which focus on the Driving and Aerial scenarios, respectively. SODA-D includes 24,828 high-quality traffic images and 278,433 instances of nine categories. For SODA-A, we harvest 2,513 high-resolution aerial images and annotate 872,069 instances over nine classes. To our knowledge, the proposed datasets are the first attempt at large-scale benchmarks with a vast collection of exhaustively annotated instances tailored for multi-category SOD. Finally, we evaluate the performance of mainstream methods on SODA. We expect the released benchmarks to facilitate the development of SOD and spawn more breakthroughs in this field. Datasets and codes are available at: \url{https://shaunyuan22.github.io/SODA}

    A dynamic neural field approach to natural and efficient human-robot collaboration

    Get PDF
    A major challenge in modern robotics is the design of autonomous robots that are able to cooperate with people in their daily tasks in a human-like way. We address the challenge of natural human-robot interactions by using the theoretical framework of dynamic neural fields (DNFs) to develop processing architectures that are based on neuro-cognitive mechanisms supporting human joint action. By explaining the emergence of self-stabilized activity in neuronal populations, dynamic field theory provides a systematic way to endow a robot with crucial cognitive functions such as working memory, prediction and decision making. The DNF architecture for joint action is organized as a large-scale network of reciprocally connected neuronal populations that encode in their firing patterns specific motor behaviors, action goals, contextual cues and shared task knowledge. Ultimately, it implements a context-dependent mapping from observed actions of the human onto adequate complementary behaviors that takes into account the inferred goal of the co-actor. We present results of flexible and fluent human-robot cooperation in a task in which the team has to assemble a toy object from its components. The present research was conducted in the context of the fp6-IST2 EU-IP Project JAST (proj. nr. 003747) and partly financed by the FCT grants POCI/V.5/A0119/2005 and CONC-REEQ/17/2001. We would like to thank Luis Louro, Emanuel Sousa, Flora Ferreira, Eliana Costa e Silva, Rui Silva and Toni Machado for their assistance during the robotic experiment.
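    The building block of such architectures is the Amari field equation, tau * du(x,t)/dt = -u(x,t) + h + S(x,t) + integral of w(x - x') f(u(x')) dx', with a sigmoid output f and a "Mexican hat" interaction kernel w. The minimal numerical sketch below, with purely illustrative parameters rather than those of the JAST architecture, shows the self-stabilized activity the abstract refers to.

```python
import numpy as np

# Field discretization and dynamics parameters (illustrative values).
n, dx, dt, tau, h = 200, 0.1, 0.05, 1.0, -1.0
x = np.arange(n) * dx

def kernel(d, a_exc=4.0, s_exc=0.6, a_inh=1.5, s_inh=1.8):
    """Lateral interaction: local excitation, broader inhibition."""
    return (a_exc * np.exp(-d**2 / (2 * s_exc**2))
            - a_inh * np.exp(-d**2 / (2 * s_inh**2)))

w = kernel(x[:, None] - x[None, :])              # interaction matrix
f = lambda u: 1.0 / (1.0 + np.exp(-10.0 * u))    # steep sigmoid output

u = np.full(n, h)                                # field starts at resting level
stimulus = 4.0 * np.exp(-(x - 10.0)**2 / (2 * 0.5**2))  # localized input

for step in range(2000):
    s = stimulus if step < 1000 else 0.0         # input removed halfway through
    u = u + dt * (-u + h + s + dx * (w @ f(u))) / tau

# With these settings a self-stabilized bump of activity outlives the input,
# which is the field-theoretic mechanism behind working memory in the model.
print("peak activation after input removal:", u.max())
```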