
    Human robot interaction in a crowded environment

    Human-Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered laborious, unsafe, or repetitive. Vision-based human-robot interaction is a major component of HRI, in which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications, as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting navigation commands. To this end, it is necessary to associate the gesture with the correct person, and automatic reasoning is required to extract the most probable location of the person who initiated the gesture. In this thesis, we propose a practical framework for addressing the above issues. It attempts to achieve a coarse-level understanding of a given environment before engaging in active communication. This includes recognising when a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate whether the people present are engaged with each other or with their surrounding environment. The basic task is to detect and reason about the environmental context and the different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realise it is best not to disturb them; if an individual is receptive to the robot's interaction, it may approach that person. Finally, if the user is moving in the environment, the robot can analyse further to determine whether any assistance can be offered. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine their potential intentions. To improve system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7].
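
    As a rough illustration of how multiple visual cues might be fused in a Bayesian framework to locate the commanding person, here is a minimal Python sketch; it assumes naive-Bayes independence between the cues, and the cue values and function name are invented for illustration rather than taken from the thesis's actual network:

    import numpy as np

    def commanding_person_posterior(cue_likelihoods, prior=None):
        """Fuse per-person cue likelihoods under a naive-Bayes assumption.

        cue_likelihoods: (n_persons, n_cues) array, where entry [i, c] is
        P(observation of cue c | person i initiated the gesture).
        Returns the posterior probability that each person is the commander.
        """
        cue_likelihoods = np.asarray(cue_likelihoods, dtype=float)
        n_persons = cue_likelihoods.shape[0]
        if prior is None:
            prior = np.full(n_persons, 1.0 / n_persons)  # uniform prior
        joint = prior * np.prod(cue_likelihoods, axis=1)  # independence assumption
        return joint / joint.sum()  # normalise over the candidates

    # Three people in the scene; cues: face visibility, raised-hand score,
    # body orientation towards the robot (all values invented).
    cues = np.array([[0.9, 0.8, 0.7],
                     [0.4, 0.1, 0.5],
                     [0.6, 0.2, 0.3]])
    print(commanding_person_posterior(cues))  # the first person dominates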

    Human Pose Tracking from Monocular Image Sequences

    This thesis proposes several novel approaches for improving the performance of an automatic 2D human pose tracking system: a multi-scale strategy, mid-level spatial dependencies that constrain the relations among multiple body parts, additional constraints between symmetric body parts, and left/right confusion correction by a head orientation estimator. These approaches are combined into a complete human pose tracking system. The experimental results demonstrate that all the proposed approaches yield significant improvements in accuracy and efficiency.
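
    As a loose illustration of how a constraint between symmetric body parts could be scored, the sketch below compares the appearance of two candidate limb patches; the grey-level histogram feature and chi-squared cost are assumptions made for illustration, not the thesis's formulation:

    import numpy as np

    def symmetric_part_cost(patch_left, patch_right, bins=16):
        """Appearance cost between candidate symmetric parts (e.g. the two
        lower legs): similar-looking patches support the left/right pairing."""
        h_l, _ = np.histogram(patch_left, bins=bins, range=(0, 256), density=True)
        h_r, _ = np.histogram(patch_right, bins=bins, range=(0, 256), density=True)
        eps = 1e-9  # avoid division by zero on empty bins
        return 0.5 * np.sum((h_l - h_r) ** 2 / (h_l + h_r + eps))  # chi-squared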

    Tracking people across disjoint camera views by an illumination-tolerant appearance representation

    Tracking single individuals as they move across disjoint camera views is a challenging task, since their appearance may vary significantly between views. Major changes in appearance are due to different and varying illumination conditions and the deformable geometry of people. These effects are hard to estimate and take into account in real-life applications. Thus, in this paper we propose an illumination-tolerant appearance representation, which is capable of coping with the typical illumination changes occurring in surveillance scenarios. The appearance representation is based on an online k-means colour clustering algorithm, a data-adaptive intensity transformation and the incremental use of frames. A similarity measurement is also introduced to compare the appearance representations of any two arbitrary individuals. Post-matching integration of the matching decisions along the individuals' tracks is performed in order to improve the reliability and robustness of matching. Once matching is available for any two views of a single individual, tracking across disjoint cameras follows straightforwardly. Experimental results from a real surveillance camera network show the effectiveness of the proposed method.
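
    A minimal Python sketch of a k-means colour-clustering appearance signature and a similarity measure between two such signatures, using scikit-learn; note that batch k-means stands in here for the paper's online variant, and the weighted nearest-centroid distance is an illustrative choice rather than the paper's actual measure:

    import numpy as np
    from sklearn.cluster import KMeans

    def colour_signature(pixels_rgb, k=5):
        """Compact appearance signature: k colour centroids plus the fraction
        of the person's foreground pixels assigned to each centroid."""
        km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(pixels_rgb)
        weights = np.bincount(km.labels_, minlength=k) / len(km.labels_)
        return km.cluster_centers_, weights

    def signature_distance(sig_a, sig_b):
        """Symmetrised, weight-averaged nearest-centroid distance."""
        def one_way(sig_x, sig_y):
            cx, wx = sig_x
            cy, _ = sig_y
            d = np.linalg.norm(cx[:, None, :] - cy[None, :, :], axis=2)
            return np.sum(wx * d.min(axis=1))  # each centroid to its best match
        return 0.5 * (one_way(sig_a, sig_b) + one_way(sig_b, sig_a))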

    Video object tracking : contributions to object description and performance assessment

    Doctoral thesis in Electrical and Computer Engineering. Universidade do Porto, Faculdade de Engenharia. 201

    Pedestrian detection and tracking using stereo vision techniques

    Automated pedestrian detection, counting and tracking have received significant attention from the computer vision community of late. Many of the person detection techniques described so far in the literature work well in controlled environments, such as laboratory settings with a small number of people. This allows various assumptions to be made that simplify this complex problem. The performance of these techniques, however, tends to deteriorate when presented with unconstrained environments where pedestrian appearances, numbers, orientations, movements, occlusions and lighting conditions violate these convenient assumptions. Recently, 3D stereo information has been proposed as a technique to overcome some of these issues and to guide pedestrian detection. This thesis presents such an approach, whereby after obtaining robust 3D information via a novel disparity estimation technique, pedestrian detection is performed via a 3D point clustering process within a region-growing framework. This clustering process avoids hard thresholds by using biometrically inspired constraints and a number of plan-view statistics. This pedestrian detection technique requires no external training and is able to robustly handle challenging real-world unconstrained environments from various camera positions and orientations. In addition, this thesis presents a continuous detect-and-track approach, with additional kinematic constraints and explicit occlusion analysis, to obtain robust temporal tracking of pedestrians over time. These approaches are experimentally validated using challenging datasets consisting of both synthetic data and real-world sequences gathered from a number of environments. In each case, the techniques are evaluated using both 2D and 3D ground-truth methodologies.
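
    A simplified Python sketch of plan-view clustering of stereo 3D points into pedestrian candidates; note that where the thesis avoids hard thresholds via biometric constraints and plan-view statistics, this illustration falls back on fixed bounds (the cell size, min_height and max_width values are assumptions):

    import numpy as np
    from scipy import ndimage

    def plan_view_pedestrians(points_xzy, cell=0.05, min_height=1.2, max_width=0.8):
        """Cluster metric 3D stereo points into pedestrian candidates.

        points_xzy: (n, 3) array with ground-plane coordinates (x, z) and
        height above the ground (y), all in metres."""
        x, z, y = points_xzy[:, 0], points_xzy[:, 1], points_xzy[:, 2]
        ix = ((x - x.min()) / cell).astype(int)
        iz = ((z - z.min()) / cell).astype(int)
        occupancy = np.zeros((ix.max() + 1, iz.max() + 1))
        height = np.zeros_like(occupancy)
        np.add.at(occupancy, (ix, iz), 1)            # plan-view point counts
        np.maximum.at(height, (ix, iz), y)           # plan-view height map
        labels, n = ndimage.label(occupancy > 0)     # grow connected regions
        candidates = []
        for lbl in range(1, n + 1):
            cells = np.argwhere(labels == lbl)
            extent = (cells.max(0) - cells.min(0) + 1) * cell
            tall_enough = height[labels == lbl].max() >= min_height
            narrow_enough = extent.max() <= max_width
            if tall_enough and narrow_enough:        # biometric plausibility
                candidates.append(cells.mean(0) * cell)  # centroid, metres
        return candidates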

    Comparison between gaze and moving objects in videos for smooth pursuit eye movement evaluation

    When the eyes follow moving objects in videos, the resulting eye movement is called smooth pursuit. To evaluate the relationship between eye tracking data and the moving objects, the objects in the videos need to be detected and tracked. In the first part of this thesis, a method for detecting and tracking moving objects in videos is developed. The method mainly consists of a modified version of the Gaussian mixture model, a tracking-feature-point method, a modified version of the mean shift algorithm, Matlab's bwlabel function, and a set of newly developed methods. The performance of the method is highest when the background is static and the objects differ in colour from the background. The false detection rate increases when the video environment becomes more dynamic and complex. In the second part of this thesis, the distance between the point of gaze and the moving object's centre point is calculated. The eyes may not always follow the centre of an object, but rather some other part of it. Therefore, the method gives more satisfactory results when the objects are small.

    Evaluation of smooth pursuit movements: a comparison between eye movements and moving objects in video sequences. Popular science summary of the degree project: Andrea Åkerström. A research area that has grown considerably in recent years is eye tracking: a technique for studying eye movements. The technique has proven interesting for studies of, for example, visual systems, psychology, and human-computer interaction. An eye tracking system measures the movements of the eyes so that the points the eye looks at can be estimated. Previously, most eye tracking studies were based on still images, but lately interest in studying video sequences has grown as well. The type of movement the eye performs when following a moving object is called smooth pursuit. One of the difficulties in evaluating the relationship between eye tracking data and the moving objects in videos is that the objects must either be measured out manually or an intelligent system must be developed for automatic evaluation. What makes the process of detecting and tracking moving objects in videos complex is that different video sequences can present many kinds of difficult scenarios that the method must cope with. For example, the background of a video may be dynamic, there may be disturbances such as rain or snow, or the camera may shake or move. The purpose of this work consists of two parts. The first part, which has also been the largest, was to develop a method that can detect and track moving objects in different types of video sequences, based on methods from previous research. The second part was to attempt an automatic evaluation of the smooth pursuit eye movement, using the detected and tracked objects in the video sequences together with existing gaze data. To develop the method, different methods from previous research were combined. All methods developed in this area have different advantages and drawbacks, and work better or worse for different types of video scenarios. The goal for the method in this work has been to find a combination of methods that, by compensating for each other's strengths and weaknesses, gives as good a detection as possible for different types of video sequences. The method is largely built from three components: a modified version of the Gaussian mixture model, tracking of feature points, and a modified version of the mean shift algorithm.

    The Gaussian mixture model is used to detect pixels in the video that belong to objects in motion. It builds dynamic models of the background and detects pixels that deviate from these background models. This is a widely used method that can handle complex backgrounds with periodic noise, but it also often produces false detections and cannot handle camera movements. To handle camera movements, the tracking-feature-point method is used, compensating for this shortcoming of the Gaussian mixture model. It extracts feature points from the video frames and uses them to estimate camera translations; however, it only computes the translations the camera makes and does not take camera rotation into account. The mean shift algorithm is a method for computing a moving object's new position in a subsequent frame. In this work, only parts of the algorithm were used, to determine which object detections in different frames represent the same object: by building models of the objects in each frame and comparing them, the method can decide which detections should be classified as the same object. The method developed in this work gave the best results when the background was static and the object's colour differed from the background. When the background became more dynamic and complex, the number of false detections increased, and for some video sequences the method failed to detect whole objects. The second part of the work was to use the results of the method to evaluate eye tracking data. The automatic evaluation of the smooth pursuit eye movement gives a measure of how well the eye can follow moving objects. To this end, the distance between the point the eye looks at and the centre of the detected object is measured. The automatic evaluation gave the best results when the objects were small. For larger objects, the eye does not necessarily follow the object's centre point but rather some other part of the object, and the method can therefore give misleading results in these cases. This work has not resulted in a finished method, and there are many areas for improvement. For example, estimating the camera's rotations would improve the results. The evaluation of how well the eye follows moving objects could also be developed further by computing the contours of the objects; in that way, the distance between the gaze points and the object's area could be determined. Both eye tracking and the detection and tracking of moving objects in videos are active research areas today, and there is still much to develop. The purpose of this work has been to develop a more general method that can work for different types of video sequences.
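
    As a rough sketch of the core of such a pipeline, assuming OpenCV: its MOG2 background subtractor is one Gaussian-mixture background model, and connectedComponentsWithStats plays the role Matlab's bwlabel plays here. The file name and all thresholds are illustrative, and the feature-point camera compensation and mean shift association steps are omitted:

    import cv2

    cap = cv2.VideoCapture("sequence.mp4")  # hypothetical input video
    # Gaussian-mixture background model; parameters are illustrative,
    # not the thesis's tuned values.
    subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)               # foreground mask
        mask = cv2.medianBlur(mask, 5)               # suppress speckle noise
        _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadows
        # Connected-component labelling of the foreground blobs.
        n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
        for i in range(1, n):                        # label 0 is the background
            if stats[i, cv2.CC_STAT_AREA] > 300:     # ignore tiny blobs
                x, y, w, h, _ = stats[i]
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("moving objects", frame)
        if cv2.waitKey(30) == 27:                    # Esc quits
            break
    cap.release()
    cv2.destroyAllWindows()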

    Parametric tracking with spatial extraction across an array of cameras

    Video surveillance is a rapidly growing area that has been fuelled by increasing concerns over security and safety in both public and private areas. With heightened security concerns, the utilization of video surveillance systems spread over a large area is becoming the norm. Surveillance of a large area requires a number of cameras to be deployed, which presents problems for human operators: the need to monitor numerous screens makes an operator less effective in observing or tracking groups or targets of interest. In such situations, the application of computer systems can prove highly effective in assisting human operators. The overall aim of this thesis was to investigate different methods for tracking a target across an array of cameras. This required a set of parameters to be identified that could be passed between cameras as the target moved in and out of their fields of view. Initial investigations focussed on identifying the most effective colour space to use. A normalized cross correlation method was used initially with a reference image to track the target of interest. A second method investigated the use of histogram similarity in tracking targets; in this instance a reference target's histogram, or pixel distribution, was used as a means for tracking. Finally, an experimental method was developed that used the relationships between the colour regions that make up a whole target, such as the vector and colour differences between regions, as a means for tracking. This method was tested on both single-camera and multiple-camera configurations and shown to be effective. In addition to the experimental tracking method investigated, additional data can be extracted to estimate a spatial map of a target as the target of interest is tracked across an array of cameras. For each method investigated, the experimental results are presented in this thesis, demonstrating that minimal data exchange can be used to track a target across an array of cameras. In addition to tracking a target, the spatial position of the target of interest could be estimated as it moves across the array.
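
    A minimal Python sketch of the histogram-similarity idea, assuming OpenCV: a compact hue-saturation histogram is the only "parameter" a camera would need to hand to the next one, and a Bhattacharyya distance scores candidate re-acquisitions in the new view. The function names and bin counts are illustrative, not the thesis's choices:

    import cv2

    def hue_sat_histogram(patch_bgr):
        """The compact parameter exchanged between cameras: an L1-normalised
        hue-saturation histogram of the target patch."""
        hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [30, 32], [0, 180, 0, 256])
        return hist / (hist.sum() + 1e-9)

    def match_score(reference_hist, candidate_patch):
        """Lower Bhattacharyya distance means a more likely re-acquisition."""
        return cv2.compareHist(reference_hist, hue_sat_histogram(candidate_patch),
                               cv2.HISTCMP_BHATTACHARYYA)

    The normalized cross correlation variant could be sketched analogously with cv2.matchTemplate in TM_CCORR_NORMED mode, matching a reference image of the target against each new frame.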

    Resilient Infrastructure and Building Security
