
    A multi-modal person perception framework for socially interactive mobile service robots

    In order to meet the increasing demands of mobile service robot applications, a dedicated perception module is an essential requirement for interaction with users in real-world scenarios. In particular, multi-sensor fusion and human re-identification are recognized as active research fronts. With this paper we contribute to the topic and present a modular detection and tracking system that models the position and additional properties of persons in the surroundings of a mobile robot. The proposed system introduces a probability-based data association method that can incorporate, besides position, face- and color-based appearance features in order to re-identify persons when tracking is interrupted. The system combines the results of various state-of-the-art image-based detection systems for person recognition, person identification and attribute estimation. This allows a stable estimate of a mobile robot's user, even in complex, cluttered environments with long-lasting occlusions. In our benchmark, we introduce a new measure for tracking consistency and show the improvements achieved when face- and appearance-based re-identification are combined. The tracking system was applied in a real-world application with a mobile rehabilitation assistant robot in a public hospital. The estimated states of persons are used for user-centered navigation behaviors, e.g., guiding or approaching a person, but also for realizing socially acceptable navigation in public environments.
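The probability-based data association described above can be illustrated with a minimal sketch: each (track, detection) pair gets a joint score from a position likelihood, a color-histogram similarity, and an optional face-similarity term. All names, weights, and thresholds here are illustrative assumptions, not the paper's actual model.

```python
import math

def position_likelihood(track_pos, det_pos, sigma=0.5):
    """Gaussian likelihood of a detection given a track's predicted position."""
    d2 = sum((t - d) ** 2 for t, d in zip(track_pos, det_pos))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def histogram_similarity(h1, h2):
    """Histogram intersection of two normalized color histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def associate(tracks, detections):
    """Greedy assignment of each detection to its most likely track.

    tracks:      list of {"pos": (x, y), "hist": [...]}
    detections:  list of {"pos": (x, y), "hist": [...],
                          optional "face_sim": {track_index: similarity}}
    """
    assignments = {}
    for j, det in enumerate(detections):
        scores = []
        for i, trk in enumerate(tracks):
            p = position_likelihood(trk["pos"], det["pos"])
            a = histogram_similarity(trk["hist"], det["hist"])
            f = det.get("face_sim", {}).get(i, 0.5)  # 0.5 = face not observed
            scores.append((p * a * f, i))
        best_score, best_track = max(scores)
        # Below a minimum joint score, start a new track instead (None).
        assignments[j] = best_track if best_score > 1e-3 else None
    return assignments
```

The multiplicative combination means a detection far from a track's predicted position is effectively vetoed even when its appearance matches, which is what allows re-identification to take over only after a track interruption.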

    Automatic object classification for surveillance videos.

    The recent popularity of surveillance video systems, especially in urban scenarios, demands the development of visual techniques for monitoring purposes. A primary step towards intelligent surveillance video systems consists of automatic object classification, which still remains an open research problem and the keystone for the development of more specific applications. Typically, object representation is based on inherent visual features. However, psychological studies have demonstrated that human beings can routinely categorise objects according to their behaviour. The gap between the features automatically extracted by a computer, such as appearance-based features, and the concepts unconsciously perceived by human beings but unattainable for machines, such as behaviour features, is commonly known as the semantic gap. Consequently, this thesis proposes to narrow the semantic gap and bring together machine and human understanding towards object classification. Thus, a Surveillance Media Management framework is proposed to automatically detect and classify objects by analysing the physical properties inherent in their appearance (machine understanding) and the behaviour patterns which require a higher level of understanding (human understanding). Finally, a probabilistic multimodal fusion algorithm bridges the gap, performing an automatic classification that considers both machine and human understanding. The performance of the proposed Surveillance Media Management framework has been thoroughly evaluated on outdoor surveillance datasets. The experiments conducted demonstrate that the combination of machine and human understanding substantially enhances object classification performance. Finally, the inclusion of human reasoning and understanding provides the essential information to bridge the semantic gap towards smart surveillance video systems.
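A probabilistic fusion of an appearance-based and a behaviour-based classifier, as described above, can be sketched as a naive-Bayes combination: per-class likelihoods from the two modalities are multiplied with a prior and renormalized. The class names and numbers below are illustrative assumptions, not the thesis's model.

```python
def fuse(appearance_lik, behaviour_lik, priors):
    """Combine per-class likelihoods from two modalities, assuming
    conditional independence, into normalized class posteriors."""
    joint = {c: priors[c] * appearance_lik[c] * behaviour_lik[c]
             for c in priors}
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()}

# Example: appearance weakly suggests "person", behaviour strongly does.
posterior = fuse(
    appearance_lik={"person": 0.7, "vehicle": 0.3},
    behaviour_lik={"person": 0.9, "vehicle": 0.2},
    priors={"person": 0.5, "vehicle": 0.5},
)
```

The independence assumption is what lets either modality compensate for the other: a behaviour cue (e.g. a trajectory pattern) can override an ambiguous appearance score, which is the intuition behind combining machine and human understanding.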

    A Methodology for Extracting Human Bodies from Still Images

    Monitoring and surveillance of humans is one of the most prominent applications today and is expected to be part of many future aspects of our lives, for safety reasons, assisted living and many others. Many efforts have been made towards automatic and robust solutions, but the general problem is very challenging and still remains open. In this PhD dissertation we examine the problem from many perspectives. First, we study the performance of a hardware architecture designed for large-scale surveillance systems. Then, we focus on the general problem of human activity recognition, present an extensive survey of methodologies that deal with this subject and propose a maturity metric to evaluate them. Image segmentation is one of the most popular algorithms for image processing found in the field, and we propose a blind metric to evaluate segmentation results with respect to the activity in local regions. Finally, we propose a fully automatic system for segmenting and extracting human bodies from challenging single images, which is the main contribution of the dissertation. Our methodology is a novel bottom-up approach relying mostly on anthropometric constraints and is facilitated by our research in the fields of face, skin and hand detection. Experimental results and comparison with state-of-the-art methodologies demonstrate the success of our approach.
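The skin-detection step that such a body-extraction pipeline builds on can be illustrated with a toy rule-based classifier in RGB space. The thresholds below follow a commonly cited explicit RGB skin rule for uniform daylight illumination; they are an assumption for illustration, not the dissertation's actual detector.

```python
def is_skin(r, g, b):
    """Explicit RGB skin rule (daylight illumination): a pixel is skin-like
    if it is bright enough, saturated enough, and red-dominant."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)

def skin_mask(pixels):
    """Binary mask over an iterable of (r, g, b) tuples."""
    return [is_skin(*p) for p in pixels]
```

Rules like this are fast and training-free but illumination-sensitive, which is why practical systems typically combine them with face detection and anthropometric constraints, as the dissertation does.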

    Proceedings of the 2009 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory

    The joint workshop of the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB, Karlsruhe, and the Vision and Fusion Laboratory (Institute for Anthropomatics, Karlsruhe Institute of Technology (KIT)) has been organized annually since 2005 with the aim of reporting on the latest research and development findings of the doctoral students of both institutions. This book provides a collection of 16 technical reports on the research results presented at the 2009 workshop.

    Tracking interacting targets in multi-modal sensors

    Object tracking is one of the fundamental tasks in various applications such as surveillance, sports, video conferencing and activity recognition. Factors such as occlusions, illumination changes and the limited field of observance of the sensor make tracking a challenging task. To overcome these challenges, the focus of this thesis is on using multiple modalities, such as audio and video, for multi-target, multi-modal tracking. In particular, this thesis presents contributions to four related research topics, namely, pre-processing of input signals to reduce noise, multi-modal tracking, simultaneous detection and tracking, and interaction recognition. To improve the performance of detection algorithms, especially in the presence of noise, this thesis investigates filtering of the input data through spatio-temporal feature analysis as well as through frequency band analysis. The pre-processed data from multiple modalities is then fused within Particle Filtering (PF). To further minimise the discrepancy between the real and the estimated positions, we propose a strategy that associates the hypotheses and the measurements with a real target, using Weighted Probabilistic Data Association (WPDA). Since the filtering involved in the detection process reduces the available information and is inapplicable to low signal-to-noise-ratio data, we investigate simultaneous detection and tracking approaches and propose a multi-target track-before-detect Particle Filter (MT-TBD-PF). The proposed MT-TBD-PF algorithm bypasses the detection step and performs tracking on the raw signal. Finally, we apply the proposed multi-modal tracking to recognise interactions between targets in regions within, as well as outside, the cameras' fields of view. The efficiency of the proposed approaches is demonstrated on large uni-modal, multi-modal and multi-sensor scenarios from real-world detection, tracking and event recognition datasets and through participation in evaluation campaigns.
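The particle-filtering machinery underlying the tracking described above can be sketched with a minimal 1-D bootstrap filter: predict by diffusing particles, reweight by a measurement likelihood, then resample. The Gaussian motion and measurement models here are simplifying assumptions, not the thesis's audio-visual models.

```python
import math
import random

def particle_filter_step(particles, weights, measurement,
                         motion_std=0.1, meas_std=0.5):
    """One predict/update/resample cycle of a 1-D bootstrap particle filter."""
    # 1. Predict: diffuse each particle with Gaussian motion noise.
    particles = [p + random.gauss(0.0, motion_std) for p in particles]
    # 2. Update: reweight by the Gaussian measurement likelihood.
    weights = [w * math.exp(-(p - measurement) ** 2 / (2 * meas_std ** 2))
               for p, w in zip(particles, weights)]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    # 3. Resample: draw a new particle set proportional to the weights.
    particles = random.choices(particles, weights=weights, k=len(particles))
    weights = [1.0 / len(particles)] * len(particles)
    return particles, weights
```

Track-before-detect variants such as MT-TBD-PF keep this structure but evaluate the likelihood directly on the raw signal intensity instead of on thresholded detections, which preserves information at low signal-to-noise ratios.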

    Maritime Augmented Reality with A Priori Knowledge from Sea Charts

    The main objective of this thesis is to provide a concept to augment maritime sea chart information into the camera view of the user. The benefit is simpler navigation due to the offered 3D information and its overlay onto the real 3D environment. In the maritime context special conditions hold. The sensor technologies have to be reliable in the environment of a ship's ferrous construction. The augmentation of the objects has to be very precise due to the far distances of observable objects on the sea surface. Furthermore, the approach has to be reliable under a wide range of light conditions. For a practical solution, the system has to be mobile, light-weight and real-time capable. To achieve this goal, the requirements are set, and the possible measurement units and the database structure are presented. First, the requirements are analyzed and a suitable system is designed. By the combination of proper sensor techniques, the local position and orientation of the user can be estimated. To verify the concept, several prototypes with exchangeable units have been evaluated. This first concept is based on a marker-based approach, which leads to some drawbacks. To overcome the drawbacks, the second aspect is the improvement of the system and the analysis of markerless approaches. One possible strategy is presented. The approach uses the statistical technique of Bayesian networks to vote for single objects in the environment. By this procedure it is shown that, due to the a priori information, the underlying sea chart system provides the most benefit. The analysis of the markerless approach shows that the sea chart structure has to be adapted to the new requirements of interactive 3D augmentation scenes. After the analysis of the chart data concept, an approach for the optimization of the charts is presented, building up an object-to-object topology within the chart data and connecting it to the Bayesian object detection approach.
Finally, several evaluations show the performance of the implemented evaluation application.
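The idea of using chart information as a priori knowledge for object recognition can be reduced to a single Bayes update: the sea chart supplies a prior over which objects are plausible in the current viewing direction, and image evidence reweights it. The object names and probabilities below are illustrative assumptions, not the thesis's Bayesian network.

```python
def chart_posterior(chart_prior, detector_likelihood):
    """Bayes update: P(object | image) is proportional to
    P(image | object) * P(object | chart)."""
    unnorm = {obj: chart_prior[obj] * detector_likelihood.get(obj, 0.0)
              for obj in chart_prior}
    z = sum(unnorm.values())
    # If the image supports no charted object, fall back to the chart prior.
    return {obj: v / z for obj, v in unnorm.items()} if z > 0 else chart_prior

# Example: the chart favours a buoy here, but the image strongly
# resembles a lighthouse; the evidence overturns the prior.
posterior = chart_posterior({"buoy": 0.7, "lighthouse": 0.3},
                            {"buoy": 0.2, "lighthouse": 0.9})
```

Restricting the hypothesis space to charted objects is what makes the markerless approach tractable: the detector only has to discriminate among a handful of locally plausible candidates rather than recognize arbitrary objects.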

    Face Recognition: An Engineering Approach

    In computer vision, face recognition is the process of labeling a face as recognized or unrecognized. The process is based on a pipeline that goes through collection, detection, pre-processing, and recognition stages. The focus of this study is on the last stage of the pipeline, with the assumption that images have already been collected and pre-processed. Conventional solutions to face recognition use the entire facial image as the input to their algorithms. We present a different approach where the input to the recognition algorithm is an individual segment of the face, such as the left eye, the right eye, the nose, or the mouth. Two separate experiments are conducted on the AT&T database of faces [1]. In the first experiment, the entire image is used to run the Eigenface, Fisherface, and local binary pattern algorithms. For each run, the accuracy and error rate of the results are tabulated and analyzed. In the second experiment, extracted facial feature segments are used as the input to the same algorithms. The output from each algorithm is subsequently labeled and placed in the appropriate feature class. Our analysis shows how the granularity of collected data for each segmented class can be leveraged to obtain an improved accuracy rate over the full-face approach.
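The Eigenface stage of such a pipeline can be sketched compactly: PCA on flattened image vectors (whole faces or individual segments such as eye crops), followed by nearest-neighbour matching in the reduced eigenspace. The data here is synthetic toy vectors standing in for the AT&T images; dimensions and labels are illustrative assumptions.

```python
import numpy as np

def train_eigenfaces(images, n_components=2):
    """PCA on flattened training images.

    Returns the mean image, the top principal axes (the "eigenfaces"),
    and the training projections in eigenspace."""
    X = np.asarray(images, dtype=float)              # one row per image
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:n_components]
    return mean, basis, Xc @ basis.T

def predict(mean, basis, train_proj, labels, image):
    """Classify by nearest neighbour in eigenspace."""
    q = (np.asarray(image, dtype=float) - mean) @ basis.T
    dists = np.linalg.norm(train_proj - q, axis=1)
    return labels[int(np.argmin(dists))]
```

Running this per facial segment and voting across the resulting per-segment labels is one plausible reading of the second experiment's setup; the segment-level models are smaller, which is where the granularity advantage the study reports would come from.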