A distributed camera system for multi-resolution surveillance
We describe an architecture for a multi-camera, multi-resolution surveillance system. The aim is to support a set of distributed static and pan-tilt-zoom (PTZ) cameras and visual tracking algorithms, together with a central supervisor unit. Each camera (and possibly pan-tilt device) has a dedicated process and processor.
Asynchronous interprocess communications and archiving of data are achieved in a simple and effective way via a central repository, implemented using an SQL database.
Visual tracking data from static views are stored dynamically into tables in the database via client calls to the SQL server. A supervisor process running on the SQL server determines whether active zoom cameras should be dispatched to observe a particular target; this command is effected by writing demands into another database table.
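The repository-based message passing described above can be sketched as follows. This is a minimal illustration with an invented schema (the abstract does not specify the actual tables), using an in-memory SQLite database in place of the SQL server:

```python
import sqlite3

# In-memory stand-in for the central SQL repository (hypothetical schema;
# the actual tables used by the system are not given in the abstract).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE observations (camera_id TEXT, target_id INTEGER, "
           "x REAL, y REAL, ts REAL)")
db.execute("CREATE TABLE ptz_demands (camera_id TEXT, target_id INTEGER, "
           "pan REAL, tilt REAL, zoom REAL)")

def tracker_report(camera_id, target_id, x, y, ts):
    """A static-camera tracker asynchronously writes its detections."""
    db.execute("INSERT INTO observations VALUES (?, ?, ?, ?, ?)",
               (camera_id, target_id, x, y, ts))
    db.commit()

def supervisor_dispatch():
    """The supervisor polls observations and dispatches a PTZ camera by
    writing a demand row into another table, which PTZ clients poll."""
    row = db.execute("SELECT camera_id, target_id, x, y FROM observations "
                     "ORDER BY ts DESC LIMIT 1").fetchone()
    if row is not None:
        _, target_id, x, y = row
        # Toy mapping from image position to a pan/tilt/zoom demand.
        db.execute("INSERT INTO ptz_demands VALUES (?, ?, ?, ?, ?)",
                   ("ptz0", target_id, x * 0.1, y * 0.1, 2.0))
        db.commit()

tracker_report("static0", 7, 320.0, 240.0, 0.0)
supervisor_dispatch()
demand = db.execute("SELECT * FROM ptz_demands").fetchone()
print(demand)  # ('ptz0', 7, 32.0, 24.0, 2.0)
```

Because every message is a database row, communication is asynchronous by construction and every exchange is archived as a side effect.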
We show results from a real implementation of the system comprising one static camera overviewing the environment under consideration and a PTZ camera operating under closed-loop velocity control, which uses a fast and robust level-set-based region tracker. Experiments demonstrate the effectiveness of our approach and its applicability to multi-camera systems for intelligent surveillance.
Cognitive visual tracking and camera control
Cognitive visual tracking is the process of observing and understanding the behaviour of a moving person. This paper presents an efficient solution to extract, in real time, high-level information from an observed scene, and to generate the most appropriate commands for a set of pan-tilt-zoom (PTZ) cameras in a surveillance scenario. Such a high-level feedback control loop, which is the main novelty of our work, serves to reduce uncertainties in the observed scene and to maximize the amount of information extracted from it. It is implemented with a distributed camera system using SQL tables as virtual communication channels, and Situation Graph Trees for knowledge representation, inference and high-level camera control. A set of experiments in a surveillance scenario shows the effectiveness of our approach and its potential for real applications of cognitive vision.
Tracking and modeling focus of attention in meetings [online]
Abstract
This thesis addresses the problem of tracking the focus of attention of people. In particular, a system to track the focus of attention of participants in meetings is developed. Obtaining knowledge about a person's focus of attention is an important step towards a better understanding of what people do, how and with what or whom they interact, or to what they refer. In meetings, focus of attention can be used to disambiguate the addressees of speech acts, to analyze interaction and for indexing of meeting transcripts. Tracking a user's focus of attention also greatly contributes to the improvement of human-computer interfaces, since it can be used to build interfaces and environments that become aware of what the user is paying attention to or with what or whom he is interacting.
The direction in which people look, i.e., their gaze, is closely related to their focus of attention. In this thesis, we estimate a subject's focus of attention based on his or her head orientation. While the direction in which someone looks is determined by head orientation and eye gaze, relevant literature suggests that head orientation alone is a sufficient cue for the detection of someone's direction of attention during social interaction. We present experimental results from a user study and from several recorded meetings that support this hypothesis.
We have developed a Bayesian approach to model at whom or what someone is looking based on his or her head orientation. To estimate head orientations in meetings, the participants' faces are automatically tracked in the view of a panoramic camera, and neural networks are used to estimate their head orientations from preprocessed images of their faces. Using this approach, the focus of attention target of subjects could be correctly identified 73% of the time in a number of evaluation meetings with four participants.
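The Bayesian modelling step, inferring at whom someone is looking from an observed head orientation, can be illustrated with a toy numeric sketch. The target angles, the Gaussian likelihood and its spread below are invented for illustration and are not the thesis's actual parameters:

```python
import math

# Hypothetical attention targets and the head pan angle (degrees) at which
# one participant would face each of them; values are illustrative only.
targets = {"person_A": -40.0, "person_B": 0.0, "person_C": 40.0}
sigma = 15.0  # assumed spread of head orientation around each target

def posterior(observed_pan, prior=None):
    """P(target | head pan) via Bayes' rule with Gaussian likelihoods."""
    prior = prior or {t: 1.0 / len(targets) for t in targets}
    like = {t: math.exp(-0.5 * ((observed_pan - mu) / sigma) ** 2)
            for t, mu in targets.items()}
    evidence = sum(like[t] * prior[t] for t in targets)
    return {t: like[t] * prior[t] / evidence for t in targets}

post = posterior(-30.0)           # observed head pan of -30 degrees
best = max(post, key=post.get)
print(best)  # person_A
```

A non-uniform prior is the natural place to fold in other cues, such as who is currently speaking.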
In addition, we have investigated whether a person's focus of attention can be predicted from other cues. Our results show that focus of attention is correlated with who is speaking in a meeting, and that it is possible to predict a person's focus of attention based on the information of who is talking or was talking before a given moment.
We have trained neural networks to predict at whom a person is looking, based on information about who was speaking. Using this approach we were able to predict who is looking at whom with 63% accuracy on the evaluation meetings using only information about who was speaking. We show that by using both head orientation and speaker information to estimate a person's focus, the accuracy of focus detection can be improved compared to just using one of the modalities for focus estimation.
To demonstrate the generality of our approach, we have built a prototype system to demonstrate focus-aware interaction with a household robot and other smart appliances in a room using the developed components for focus of attention tracking. In the demonstration environment, a subject could interact with a simulated household robot, a speech-enabled VCR or with other people in the room, and the recipient of the subject's speech was disambiguated based on the user's direction of attention.
Summary
This thesis addresses the automatic estimation and tracking of the focus of attention of people in meetings.
Determining people's focus of attention is very important for understanding and automatically analyzing meeting records. It can reveal, for example, who addressed whom at a particular moment, or who was listening to whom. Automatic estimation of the focus of attention can furthermore be used to improve human-computer interfaces.
An important cue for the direction in which a person directs his or her attention is the person's head orientation. A method for estimating people's head orientations was therefore developed. Artificial neural networks are used, which receive preprocessed images of a person's head as input and compute an estimate of the head orientation as output. With the trained networks, a mean error of nine to ten degrees was achieved for the estimation of horizontal and vertical head orientation on images of new persons, i.e. persons whose images were not contained in the training set.
Furthermore, a probabilistic approach for determining attention targets is presented. A Bayesian approach is used to determine the a-posteriori probabilities of different attention targets given the observed head orientations of a person. The developed approaches were evaluated on several meetings with four to five participants.
A further contribution of this work is an investigation of how far the gaze direction of meeting participants can be predicted based on who is currently speaking. A method was developed to estimate a person's focus with neural networks based on a short history of speaker constellations.
We show that by combining the image-based and the speaker-based estimates of the focus of attention, a clearly improved estimate can be achieved.
Overall, this work presents for the first time a system for automatically tracking the attention of people in a meeting room.
The developed approaches and methods can also be used to determine people's attention in other domains, in particular for controlling computerized, interactive environments. This is demonstrated with an example application.
Watching People: Algorithms to Study Human Motion and Activities
Human motion analysis is currently one of the most active research topics in Computer Vision, and it is receiving increasing attention from both the industrial and scientific communities.
The growing interest in human motion analysis is motivated by the increasing number of promising applications, ranging from surveillance, human-computer interaction and virtual reality to healthcare, sports, computer games and video conferencing, just to name a few.
The aim of this thesis is to give an overview of the various tasks involved in visual motion analysis of the human body and to present the issues and possible solutions related to them.
In this thesis, visual motion analysis is categorized into three major areas related to the interpretation of human motion: tracking of human motion using a virtual pan-tilt-zoom (vPTZ) camera, recognition of human actions, and segmentation of human behaviors.
In the field of human motion tracking, a virtual environment for PTZ cameras (vPTZ) is presented to overcome the mechanical limitations of physical PTZ cameras. The vPTZ is built on equirectangular images acquired by 360° cameras, and it allows not only the development of pedestrian tracking algorithms but also the comparison of their performance. On the basis of this virtual environment, three novel pedestrian tracking algorithms for 360° cameras were developed, two of which adopt a tracking-by-detection approach while the last adopts a Bayesian approach.
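At the heart of such a virtual PTZ is the mapping from a pan/tilt viewing direction to pixel coordinates in the equirectangular panorama. A minimal sketch of that mapping, assuming the standard equirectangular convention (this is not code from the thesis):

```python
def equirect_pixel(pan_deg, tilt_deg, width, height):
    """Map a viewing direction (pan in [-180, 180], tilt in [-90, 90])
    to pixel coordinates in an equirectangular panorama, where pan is
    linear in x (longitude) and tilt is linear in y (latitude)."""
    x = (pan_deg + 180.0) / 360.0 * (width - 1)
    y = (90.0 - tilt_deg) / 180.0 * (height - 1)
    return x, y

# A virtual PTZ view is rendered by sampling such pixels across the
# virtual camera's field of view; zooming narrows that field of view
# with no mechanical motion at all.
print(equirect_pixel(0.0, 0.0, 3840, 1920))  # image centre: (1919.5, 959.5)
```

Because the pan/tilt/zoom state is purely virtual, the same recorded panorama can be replayed under different camera-control policies, which is what makes algorithm comparison possible.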
The action recognition problem is addressed by an algorithm that represents actions in terms of multinomial distributions of frequent sequential patterns of different length. Frequent sequential patterns are series of data descriptors that occur many times in the data. The proposed method learns a codebook of frequent sequential patterns by means of an apriori-like algorithm. An action is then represented with a Bag-of-Frequent-Sequential-Patterns approach.
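As a toy illustration of this representation (using contiguous subsequences for simplicity and invented symbols and thresholds; the paper's frequent sequential patterns need not be contiguous):

```python
from collections import Counter

def contiguous_patterns(seq, max_len):
    """Enumerate all contiguous subsequences of length 1..max_len."""
    return [tuple(seq[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(seq) - n + 1)]

def mine_codebook(sequences, min_support=2, max_len=3):
    """Keep patterns of different lengths that occur at least
    min_support times across the training data -- an apriori-style
    frequency filter standing in for the paper's mining algorithm."""
    counts = Counter(p for s in sequences
                     for p in contiguous_patterns(s, max_len))
    return sorted(p for p, c in counts.items() if c >= min_support)

def bag_of_patterns(seq, codebook, max_len=3):
    """Represent an action as a histogram over the learned codebook."""
    counts = Counter(contiguous_patterns(seq, max_len))
    return [counts[p] for p in codebook]

# Toy "actions" as sequences of frame descriptors.
train = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
codebook = mine_codebook(train)
vec = bag_of_patterns(["a", "b", "c"], codebook)
```

Normalizing `vec` to sum to one yields the multinomial distribution over patterns that the method uses as the action descriptor.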
In the last part of this thesis, a methodology to semi-automatically annotate behavioral data given a small set of manually annotated data is presented. The resulting methodology is not only effective in the semi-automated annotation task but can also be used in the presence of abnormal behaviors, as demonstrated empirically by testing the system on data collected from children affected by neuro-developmental disorders.
Real-Time, Multiple Pan/Tilt/Zoom Computer Vision Tracking and 3D Positioning System for Unmanned Aerial System Metrology
The study of structural characteristics of Unmanned Aerial Systems (UASs) continues to be an important field of research for developing state-of-the-art nano/micro systems. Development of a metrology system using computer vision (CV) tracking and 3D point extraction would provide an avenue for supporting these theoretical developments. This work provides a portable, scalable system capable of real-time tracking, zooming, and 3D position estimation of a UAS using multiple cameras. Current state-of-the-art photogrammetry systems use retro-reflective markers or single-point lasers to obtain object poses and/or positions over time. Using a CV pan/tilt/zoom (PTZ) system has the potential to circumvent their limitations. The system developed in this paper exploits parallel processing and the GPU for CV tracking, using optical flow and known camera motion, in order to capture a moving object using two PTU cameras. The parallel-processing technique developed in this work is versatile, allowing other CV methods to be tested with a PTZ system using known camera motion. Utilizing known camera poses, the object's 3D position is estimated, and focal lengths are estimated so that the object fills the image to a desired extent. This system is tested against truth data obtained using an industrial system.
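The 3D positioning step, intersecting viewing rays from two cameras with known poses, can be sketched as a standard two-ray midpoint triangulation. The camera centres and directions below are invented, and the paper's actual estimator is not specified in the abstract:

```python
import math

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between two rays x = c + t*d,
    where c is a camera centre and d a viewing direction (derived in
    practice from pan/tilt angles and the tracked pixel offset)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    d1 = [a / math.sqrt(dot(d1, d1)) for a in d1]
    d2 = [a / math.sqrt(dot(d2, d2)) for a in d2]
    r = [b - a for a, b in zip(c1, c2)]
    # Normal equations for t1, t2 minimising |(c1 + t1*d1) - (c2 + t2*d2)|^2.
    a11, a12 = dot(d1, d1), -dot(d1, d2)
    a21, a22 = dot(d1, d2), -dot(d2, d2)
    b1, b2 = dot(r, d1), dot(r, d2)
    det = a11 * a22 - a12 * a21
    t1 = (b1 * a22 - a12 * b2) / det
    t2 = (a11 * b2 - b1 * a21) / det
    p1 = [c + t1 * d for c, d in zip(c1, d1)]
    p2 = [c + t2 * d for c, d in zip(c2, d2)]
    return [(u + v) / 2 for u, v in zip(p1, p2)]

# Two cameras 10 m apart on the ground, both sighting a target at (0, 0, 10).
p = triangulate_midpoint([-5.0, 0.0, 0.0], [5.0, 0.0, 10.0],
                         [5.0, 0.0, 0.0], [-5.0, 0.0, 10.0])
print(p)  # ~ [0.0, 0.0, 10.0]
```

With noisy measurements the two rays no longer intersect, and the length of the residual segment gives a simple per-frame quality measure for the estimate.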
Automatic Generation of Video Summaries for Historical Films
A video summary is a sequence of video clips extracted from a longer video. Much shorter than the original, the summary preserves its essential messages. In the project ECHO (European Chronicles On-line) a system was developed to store and manage large collections of historical films for the preservation of cultural heritage. At the University of Mannheim we have developed the video summarization component of the ECHO system. In this paper we discuss the particular challenges the historical film material poses, and how we have designed new video processing algorithms and modified existing ones to cope with noisy black-and-white films. We also report empirical results from the use of our summarization tool at the four major European national video archives.