
    Computer Vision Systems, Second International Workshop, ICVS 2001 Vancouver, Canada, July 7-8, 2001, Proceedings


    Hand Gesture Interaction with Human-Computer

    Hand gestures are an important modality for human-computer interaction. Compared to many existing interfaces, hand gestures have the advantages of being easy to use, natural, and intuitive. Successful applications of hand gesture recognition include computer game control, human-robot interaction, and sign language recognition, to name a few. Vision-based recognition systems can give computers the capability of understanding and responding to hand gestures. The paper gives an overview of the field of hand gesture interaction with computers, and describes the early stages of a project on gestural command sets, an issue that has often been neglected. Currently, we have built a first prototype for exploring the use of pie and marking menus in gesture-based interaction. The purpose is to study whether such menus, with practice, could support the development of autonomous gestural command sets. The scenario is remote control of home appliances, such as TV sets and DVD players, which in the future could be extended to the more general scenario of ubiquitous computing in everyday situations. Some early observations are reported, mainly concerning problems with user fatigue and precision of gestures. Future work is discussed, such as introducing flow menus for reducing fatigue, and control menus for continuous control functions. The computer vision algorithms will also have to be developed further.
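
    As a rough illustration of how a gestural command could be resolved against a pie menu, the sketch below maps a tracked hand position to one of N menu slices around the menu centre. The coordinate convention, the number of slices, and the dead-zone radius are assumptions made for illustration, not details taken from the prototype described in the paper.

```python
import math

def select_pie_slice(center, hand_pos, num_slices=8, dead_zone=0.05):
    """Map a hand position to a pie-menu slice index, or None inside the dead zone.

    center, hand_pos: (x, y) in normalized image coordinates (y grows downward).
    """
    dx = hand_pos[0] - center[0]
    dy = hand_pos[1] - center[1]
    if math.hypot(dx, dy) < dead_zone:
        return None  # hand still near the menu centre: no selection yet
    # Angle measured clockwise from "up", wrapped to [0, 2*pi)
    angle = math.atan2(dx, -dy) % (2 * math.pi)
    slice_width = 2 * math.pi / num_slices
    # Offset by half a slice so slice 0 is centred on "up"
    return int(((angle + slice_width / 2) % (2 * math.pi)) // slice_width)

# Example: a hand detected up and to the right of the menu centre
print(select_pie_slice((0.5, 0.5), (0.6, 0.4)))  # -> 1 (the up-right slice of 8)
```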

    Visual Attention Mechanism for a Social Robot

    This paper describes a visual perception system for a social robot. The central part of this system is an artificial attention mechanism that discriminates the most relevant information from all the visual information perceived by the robot. It is composed of three stages. At the preattentive stage, the concept of saliency is implemented based on ‘proto-objects’ [37]. From these objects, different saliency maps are generated. Then, the semiattentive stage identifies and tracks significant items according to the tasks to accomplish. This tracking process allows the implementation of ‘inhibition of return’. Finally, the attentive stage fixes the field of attention on the most relevant object depending on the behaviours to carry out. Three behaviours have been implemented and tested, which allow the robot to detect visual landmarks in an initially unknown environment, and to recognize and capture the upper-body motion of people interested in interacting with it.
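
    A minimal sketch of the three-stage structure described above is given below: a pre-attentive combination of saliency maps, a semi-attentive suppression of recently attended locations (inhibition of return), and an attentive selection of the fixation point. The map sizes, the combination rule (a simple average), and the inhibition radius are placeholder assumptions rather than details of the robot's actual implementation.

```python
import numpy as np

def preattentive(saliency_maps):
    """Combine per-feature saliency maps into a single map (simple average)."""
    return np.mean(np.stack(saliency_maps), axis=0)

def semiattentive(saliency, inhibited, radius=15):
    """Suppress recently attended locations (inhibition of return)."""
    out = saliency.copy()
    h, w = out.shape
    for (y, x) in inhibited:
        y0, y1 = max(0, y - radius), min(h, y + radius)
        x0, x1 = max(0, x - radius), min(w, x + radius)
        out[y0:y1, x0:x1] = 0.0
    return out

def attentive(saliency):
    """Fix attention on the most salient remaining location."""
    return np.unravel_index(np.argmax(saliency), saliency.shape)

# One attention cycle over two hypothetical feature maps
maps = [np.random.rand(120, 160), np.random.rand(120, 160)]
inhibited = [(60, 80)]                      # a previously attended point
focus = attentive(semiattentive(preattentive(maps), inhibited))
print(focus)                                # (row, col) of the new fixation point
```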

    Visual Concept Detection in Images and Videos

    The rapid proliferation of digital images and videos makes content-based search in multimedia databases increasingly important. A prerequisite for effective image and video search is to analyze and index media content automatically. Current approaches in the field of image and video retrieval focus on semantic concepts serving as an intermediate description to bridge the “semantic gap” between the data representation and the human interpretation. Due to the large complexity and variability in the appearance of visual concepts, the detection of arbitrary concepts represents a very challenging task. In this thesis, the following aspects of visual concept detection systems are addressed.

    First, enhanced local descriptors for mid-level feature coding are presented. Based on the observation that scale-invariant feature transform (SIFT) descriptors with different spatial extents yield large performance differences, a novel concept detection system is proposed that combines feature representations for different spatial extents using multiple kernel learning (MKL). A multi-modal video concept detection system is presented that relies on Bag-of-Words representations for visual and, in particular, for audio features. Furthermore, a method for the SIFT-based integration of color information, called color moment SIFT, is introduced. Comparative experimental results demonstrate the superior performance of the proposed systems on the Mediamill and VOC Challenges.

    Second, an approach is presented that systematically utilizes the results of object detectors. Novel object-based features are generated from object detection results using different pooling strategies. For videos, detection results are assembled into object sequences, and a shot-based confidence score as well as further features, such as position, frame coverage or movement, are computed for each object class. These features are used as additional input for the support vector machine (SVM)-based concept classifiers, so that other related concepts can also profit from object-based features. Extensive experiments on the Mediamill, VOC and TRECVid Challenges show significant improvements in retrieval performance not only for the object classes, but in particular also for a large number of indirectly related concepts. Moreover, it has been demonstrated that a few object-based features are beneficial for a large number of concept classes. On the VOC Challenge, the additional use of object-based features led to a superior performance of 63.8% mean average precision (AP) for the image classification task. Furthermore, the generalization capabilities of concept models are investigated. It is shown that different source and target domains lead to a severe loss in concept detection performance; in these cross-domain settings, object-based features achieve a significant performance improvement. Since it is inefficient to run a large number of single-class object detectors, it is additionally demonstrated how a concurrent multi-class object detection system can be constructed to speed up the detection of many object classes in images.

    Third, a novel, purely web-supervised learning approach for modeling heterogeneous concept classes in images is proposed. Tags and annotations of multimedia data on the WWW are rich sources of information that can be employed for learning visual concepts. The presented approach is aimed at continuous long-term learning of appearance models and at improving these models periodically. For this purpose, several components have been developed: a crawling component, a multi-modal clustering component for spam detection and subclass identification, a novel learning component called “random savanna”, a validation component, an updating component, and a scalability manager. Only a single word describing the visual concept is required to initiate the learning process. Experimental results demonstrate the capabilities of the individual components.

    Finally, a generic concept detection system is applied to support interdisciplinary research efforts in the fields of psychology and media science. The psychological research question addressed is whether and how playing computer games with violent content may induce aggression. Therefore, novel semantic concepts, most notably “violence”, are detected in computer game videos to gain insights into the interrelationship between violent game events and the brain activity of a player. Experimental results demonstrate the excellent performance of the proposed automatic concept detection approach for such interdisciplinary research.
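
    To make the object-based features described above more concrete, the sketch below max-pools per-class detector confidences into a small feature vector and concatenates it with a Bag-of-Words representation before training an SVM concept classifier. The class names, the max-pooling rule, and the toy data are illustrative assumptions; this is not the exact pipeline used in the thesis.

```python
import numpy as np
from sklearn.svm import SVC

def object_based_features(detections, object_classes):
    """Max-pool detector confidences per object class into one feature vector.

    detections: list of (class_name, confidence) pairs for one image or shot.
    """
    feats = np.zeros(len(object_classes))
    for name, conf in detections:
        idx = object_classes.index(name)
        feats[idx] = max(feats[idx], conf)
    return feats

object_classes = ["person", "car", "dog"]          # placeholder detector classes
# Placeholder training data: Bag-of-Words vector + pooled object features per image
bow = np.random.rand(4, 50)
obj = np.stack([object_based_features([("person", 0.9)], object_classes),
                object_based_features([("car", 0.7)], object_classes),
                object_based_features([], object_classes),
                object_based_features([("dog", 0.8), ("person", 0.4)], object_classes)])
X = np.hstack([bow, obj])                          # combined input for the concept classifier
y = np.array([1, 0, 0, 1])                         # concept present / absent
clf = SVC(kernel="rbf").fit(X, y)
print(clf.decision_function(X))                    # signed concept scores (positive = present)
```

    One pooling rule per class (here, the maximum over detections) yields only one added dimension per object class, which keeps the object-based part of the feature vector small compared to the Bag-of-Words part.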

    Vector Disparity Sensor with Vergence Control for Active Vision Systems

    This paper presents an architecture for computing vector disparity for active vision systems as used in robotics applications. The control of the vergence angle of a binocular system allows us to efficiently explore dynamic environments, but requires a generalization of the disparity computation with respect to a static camera setup, where the disparity is strictly 1-D after image rectification. The interaction between vision and motor control allows us to develop an active sensor that achieves high accuracy of the disparity computation around the fixation point and a fast reaction time for the vergence control. In this contribution, we address the development of a real-time architecture for vector disparity computation using an FPGA device. We implement the disparity unit and the control module for vergence, version, and tilt to determine the fixation point. In addition, two different on-chip alternatives for the vector disparity engine are discussed, based on the luminance (gradient-based) and phase information of the binocular images. The multiscale versions of these engines are able to estimate the vector disparity at up to 32 fps on VGA-resolution images with very good accuracy, as shown using benchmark sequences with known ground truth. The performance of the presented approaches in terms of frame rate, resource utilization, and accuracy is discussed. On the basis of these results, our study indicates that the gradient-based approach offers the best trade-off for integration with the active vision system.
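
    As a rough illustration of the gradient-based (luminance) alternative, the sketch below estimates a 2-D vector disparity per pixel with a Lucas-Kanade-style least-squares fit on local image gradients. It is a plain, single-scale CPU reference in Python; the window size and conditioning threshold are assumptions, and it does not reflect the multiscale FPGA implementation described in the paper.

```python
import numpy as np

def gradient_vector_disparity(left, right, window=7):
    """Per-pixel 2-D disparity from a least-squares fit of the linearized
    brightness-constancy equation inside a local window."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    Iy, Ix = np.gradient(left)             # spatial gradients of the left image
    It = right - left                      # inter-ocular intensity difference
    h, w = left.shape
    r = window // 2
    disp = np.zeros((h, w, 2))             # (dx, dy) per pixel
    for y in range(r, h - r):
        for x in range(r, w - r):
            ix = Ix[y - r:y + r + 1, x - r:x + r + 1].ravel()
            iy = Iy[y - r:y + r + 1, x - r:x + r + 1].ravel()
            it = It[y - r:y + r + 1, x - r:x + r + 1].ravel()
            A = np.stack([ix, iy], axis=1)
            ATA = A.T @ A
            if np.linalg.det(ATA) > 1e-6:  # skip textureless, ill-conditioned windows
                disp[y, x] = -np.linalg.solve(ATA, A.T @ it)
    return disp

# Toy usage: a smooth synthetic pair where the right view is shifted by one pixel in x
yy, xx = np.mgrid[0:60, 0:80].astype(np.float64)
left_img = np.sin(xx / 6.0) + np.cos(yy / 8.0)
right_img = np.sin((xx - 1.0) / 6.0) + np.cos(yy / 8.0)
print(gradient_vector_disparity(left_img, right_img)[30, 40])  # approximately (1, 0)
```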

    ‘Implicit Creation’ – Non-Programmer Conceptual Models for Authoring in Interactive Digital Storytelling

    Interactive Digital Storytelling (IDS) constitutes a research field that emerged from several areas of art, creation and computer science. It investigates technologies and possible artefacts that allow ‘highly-interactive’ experiences of digital worlds with compelling stories. However, the situation for story creators approaching ‘highly-interactive’ storytelling is complex. There is a gap between the available technology, which requires programming and prior knowledge in Artificial Intelligence, and established models of storytelling, which are too linear to have the potential to be highly interactive. This thesis reports on research that lays the ground for bridging this gap, leading to novel creation philosophies in future work. A design research process has been pursued, which centred on the suggestion of conceptual models explaining a) process structures of interdisciplinary development, b) interactive story structures including the user of the interactive story system, and c) the positioning of human authors within semi-automated creative processes. By means of ‘implicit creation’, storytelling and the modelling of simulated worlds are reconciled. The conceptual models are informed by an exhaustive literature review of established neighbouring disciplines. These are a) creative principles in different storytelling domains, such as screenwriting, video game writing, role-playing and improvisational theatre, b) narratological studies of story grammars and structures, and c) principles of designing interactive systems, in the areas of basic HCI design and models, discourse analysis in conversational systems, as well as game and simulation design. In a case study of artefact building, the initial models have been put into practice, evaluated and extended. These artefacts are a) a conceived authoring tool (‘Scenejo’) for the creation of digital conversational stories, and b) a serious game (‘The Killer Phrase Game’) developed as an application case. The study demonstrates how, starting out from linear storytelling, iterative steps of ‘implicit creation’ can lead to more variability and interactivity in the designed interactive story. In the concrete case, the steps included the abstraction of dialogues into conditional actions and the creation of a dynamic world model of the conversation. This process and artefact can be used as a model illustrating non-programmer approaches to ‘implicit creation’ in a learning process. The research demonstrates that the field of Interactive Digital Storytelling still has to be advanced further before general creative principles can be fully established, which is a long-term endeavour dependent upon environmental factors and further technological developments. The gap is not yet closed, but it can be better explained. The research results build groundwork for the education of prospective authors. Concluding the thesis, IDS-specific creative principles have been proposed for evaluation in future work.
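
    To illustrate what the ‘abstraction of dialogues into conditional actions’ over a dynamic world model could look like in the simplest case, the sketch below represents each authored line as a precondition, an utterance, and an effect on a shared state. The state variables and dialogue lines are invented for illustration and are not taken from Scenejo or The Killer Phrase Game.

```python
# Each authored line becomes a conditional action: a precondition on the world
# state, the utterance itself, and an effect that updates the state.
world = {"tension": 0, "topic": "opening"}   # hypothetical dynamic world model

actions = [
    {"condition": lambda s: s["topic"] == "opening",
     "utterance": "Let's hear your proposal.",
     "effect": lambda s: s.update(topic="proposal")},
    {"condition": lambda s: s["topic"] == "proposal" and s["tension"] < 3,
     "utterance": "That will never work!",          # a 'killer phrase'
     "effect": lambda s: s.update(tension=s["tension"] + 1)},
]

def step(state, actions):
    """Fire the first action whose precondition holds in the current state."""
    for action in actions:
        if action["condition"](state):
            print(action["utterance"])
            action["effect"](state)
            return True
    return False

# Run the conversation until no action fires or the tension limit is reached
while step(world, actions) and world["tension"] < 3:
    pass
```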

    Languages of games and play: A systematic mapping study

    Digital games are a powerful means for creating enticing, beautiful, educational, and often highly addictive interactive experiences that impact the lives of billions of players worldwide. We explore what informs the design and construction of good games in order to learn how to speed up game development. In particular, we study to what extent languages, notations, patterns, and tools can offer experts the theoretical foundations, systematic techniques, and practical solutions they need to raise their productivity and improve the quality of games and play. Despite the growing number of publications on this topic, there is currently no overview describing the state of the art that relates research areas, goals, and applications. As a result, efforts and successes are often one-off, lessons learned go overlooked, language reuse remains minimal, and opportunities for collaboration and synergy are lost. We present a systematic map that identifies relevant publications and gives an overview of research areas and publication venues. In addition, we categorize research perspectives along common objectives, techniques, and approaches, illustrated by summaries of selected languages. Finally, we distill challenges and opportunities for future research and development.