1,297 research outputs found

    Human robot interaction in a crowded environment

    No full text
    Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision based human robot interaction is a major component of HRI, with which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications as difficulties may arise from people‟s movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting the navigation commands. To this end, it is necessary to associate the gesture to the correct person and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we have proposed a practical framework for addressing the above issues. It attempts to achieve a coarse level understanding about a given environment before engaging in active communication. This includes recognizing human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate if people present are engaged with each other or their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb or, if an individual is receptive to the robot‟s interaction, it may approach the person. Finally, if the user is moving in the environment, it can analyse further to understand if any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. For improving system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7]

    PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects

    Full text link
    International audienceIn this paper, we present a novel algorithm for fast tracking of generic objects in videos. The algorithm uses two components: a detector that makes use of the generalised Hough transform with pixel-based descriptors, and a probabilistic segmentation method based on global models for foreground and background. These components are used for tracking in a combined way, and they adapt each other in a co-training manner. Through effective model adaptation and segmentation, the algorithm is able to track objects that undergo rigid and non-rigid deformations and considerable shape and appearance variations. The proposed tracking method has been thoroughly evaluated on challenging standard videos, and outperforms state-of-theart tracking methods designed for the same task. Finally, the proposed models allow for an extremely efficient implementation, and thus tracking is very fast

    Improved foreground detection via block-based classifier cascade with probabilistic decision integration

    Get PDF
    Background subtraction is a fundamental low-level processing task in numerous computer vision applications. The vast majority of algorithms process images on a pixel-by-pixel basis, where an independent decision is made for each pixel. A general limitation of such processing is that rich contextual information is not taken into account. We propose a block-based method capable of dealing with noise, illumination variations, and dynamic backgrounds, while still obtaining smooth contours of foreground objects. Specifically, image sequences are analyzed on an overlapping block-by-block basis. A low-dimensional texture descriptor obtained from each block is passed through an adaptive classifier cascade, where each stage handles a distinct problem. A probabilistic foreground mask generation approach then exploits block overlaps to integrate interim block-level decisions into final pixel-level foreground segmentation. Unlike many pixel-based methods, ad-hoc postprocessing of foreground masks is not required. Experiments on the difficult Wallflower and I2R datasets show that the proposed approach obtains on average better results (both qualitatively and quantitatively) than several prominent methods. We furthermore propose the use of tracking performance as an unbiased approach for assessing the practical usefulness of foreground segmentation methods, and show that the proposed approach leads to considerable improvements in tracking accuracy on the CAVIAR dataset

    Dynamical models and machine learning for supervised segmentation

    Get PDF
    This thesis is concerned with the problem of how to outline regions of interest in medical images, when the boundaries are weak or ambiguous and the region shapes are irregular. The focus on machine learning and interactivity leads to a common theme of the need to balance conflicting requirements. First, any machine learning method must strike a balance between how much it can learn and how well it generalises. Second, interactive methods must balance minimal user demand with maximal user control. To address the problem of weak boundaries,methods of supervised texture classification are investigated that do not use explicit texture features. These methods enable prior knowledge about the image to benefit any segmentation framework. A chosen dynamic contour model, based on probabilistic boundary tracking, combines these image priors with efficient modes of interaction. We show the benefits of the texture classifiers over intensity and gradient-based image models, in both classification and boundary extraction. To address the problem of irregular region shape, we devise a new type of statistical shape model (SSM) that does not use explicit boundary features or assume high-level similarity between region shapes. First, the models are used for shape discrimination, to constrain any segmentation framework by way of regularisation. Second, the SSMs are used for shape generation, allowing probabilistic segmentation frameworks to draw shapes from a prior distribution. The generative models also include novel methods to constrain shape generation according to information from both the image and user interactions. The shape models are first evaluated in terms of discrimination capability, and shown to out-perform other shape descriptors. Experiments also show that the shape models can benefit a standard type of segmentation algorithm by providing shape regularisers. We finally show how to exploit the shape models in supervised segmentation frameworks, and evaluate their benefits in user trials

    Gaussian belief propagation for real-time decentralised inference

    Get PDF
    For embodied agents to interact intelligently with their surroundings, they require perception systems that construct persistent 3D representations of their environments. These representations must be rich; capturing 3D geometry, semantics, physical properties, affordances and much more. Constructing the environment representation from sensory observations is done via Bayesian probabilistic inference and in practical systems, inference must take place within the power, compactness and simplicity constraints of real products. Efficient inference within these constraints however remains computationally challenging and current systems often require heavy computational resources while delivering a fraction of the desired capabilities. Decentralised algorithms based on local message passing with in-place processing and storage offer a promising solution to current inference bottlenecks. They are well suited to take advantage of recent rapid developments in distributed asynchronous processing hardware to achieve efficient, scalable and low-power performance. In this thesis, we argue for Gaussian belief propagation (GBP) as a strong algorithmic framework for distributed, generic and incremental probabilistic estimation. GBP operates by passing messages between the nodes on a factor graph and can converge with arbitrary asynchronous message schedules. We envisage the factor graph being the fundamental master environment representation, and GBP the flexible inference tool to compute local in-place probabilistic estimates. In large real-time systems, GBP will act as the `glue' between specialised modules, with attention based processing bringing about local convergence in the graph in a just-in-time manner. This thesis contains several technical and theoretical contributions in the application of GBP to practical real-time inference problems in vision and robotics. Additionally, we implement GBP on novel graph processor hardware and demonstrate breakthrough speeds for bundle adjustment problems. Lastly, we present a prototype system for incrementally creating hierarchical abstract scene graphs by combining neural networks and probabilistic inference via GBP.Open Acces

    Temporally Coherent General Dynamic Scene Reconstruction

    Get PDF
    Existing techniques for dynamic scene reconstruction from multiple wide-baseline cameras primarily focus on reconstruction in controlled environments, with fixed calibrated cameras and strong prior constraints. This paper introduces a general approach to obtain a 4D representation of complex dynamic scenes from multi-view wide-baseline static or moving cameras without prior knowledge of the scene structure, appearance, or illumination. Contributions of the work are: An automatic method for initial coarse reconstruction to initialize joint estimation; Sparse-to-dense temporal correspondence integrated with joint multi-view segmentation and reconstruction to introduce temporal coherence; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes by introducing shape constraint. Comparison with state-of-the-art approaches on a variety of complex indoor and outdoor scenes, demonstrates improved accuracy in both multi-view segmentation and dense reconstruction. This paper demonstrates unsupervised reconstruction of complete temporally coherent 4D scene models with improved non-rigid object segmentation and shape reconstruction and its application to free-viewpoint rendering and virtual reality.Comment: Submitted to IJCV 2019. arXiv admin note: substantial text overlap with arXiv:1603.0338

    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Full text link
    Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not been thoroughly evaluated yet is deformable face tracking "in-the-wild". Until now, the performance has mainly been assessed qualitatively by visually assessing the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model free tracking plus generic facial landmark localisation, as well as (c) hybrid approaches using state-of-the-art face detection, model free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic.Comment: E. Antonakos and P. Snape contributed equally and have joint second authorshi

    Going beyond semantic image segmentation, towards holistic scene understanding, with associative hierarchical random fields

    Get PDF
    In this thesis we exploit the generality and expressive power of the Associative Hierarchical Random Field (AHRF) graphical model to take its use beyond that of semantic image segmentation, into object-classes, towards a framework for holistic scene understanding. We provide a working definition for the holistic approach to scene understanding, which allows for the integration of existing, disparate, applications into an unifying ensemble. We believe that modelling such an ensemble as an AHRF is both a principled and pragmatic solution. We present a hierarchy that shows several methods for fusing applications together with the AHRF graphical model. Each of the three; feature, potential and energy, layers subsumes its predecessor in generality and together give rise to many options for integration. With applications on street scenes we demonstrate an implementation of each layer. The first layer application joins appearance and geometric features. For our second layer we implement a things and stuff co-junction using higher order AHRF potentials for object detectors, with the goal of answering the classic questions: What? Where? and How many? A holistic approach to recognition-and-reconstruction is realised within our third layer by linking two energy based formulations of both applications. Each application is evaluated qualitatively and quantitatively. In all cases our holistic approach shows improvement over baseline methods
    • …
    corecore