4 research outputs found

    Vision systems with the human in the loop

    The emerging cognitive vision paradigm deals with vision systems that apply machine learning and automatic reasoning in order to learn from what they perceive. Cognitive vision systems can rate the relevance and consistency of newly acquired knowledge, can adapt to their environment, and thus exhibit high robustness. This contribution presents vision systems that aim at flexibility and robustness. One is tailored for content-based image retrieval; the others are cognitive vision systems that constitute prototypes of visual active memories, which evaluate, gather, and integrate contextual knowledge for visual analysis. All three systems are designed to interact with human users. After discussing adaptive content-based image retrieval and object and action recognition in an office environment, we raise the issue of assessing cognitive systems. Experiences from psychologically evaluated human-machine interactions are reported, and the promising potential of psychologically based usability experiments is stressed.
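    The abstract above mentions adaptive content-based image retrieval driven by user feedback. The sketch below is not the authors' system; it is a minimal Rocchio-style query-refinement loop in Python, assuming images are represented by fixed-length feature vectors, to illustrate how relevance feedback can adapt a retrieval query.

        # Minimal sketch (not the authors' method): Rocchio-style query refinement
        # over image feature vectors, a common way to adapt CBIR to relevance feedback.
        import numpy as np

        def refine_query(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
            """Move the query toward relevant examples and away from irrelevant ones."""
            q = alpha * query
            if len(relevant):
                q += beta * np.mean(relevant, axis=0)
            if len(irrelevant):
                q -= gamma * np.mean(irrelevant, axis=0)
            return q

        def rank_images(query, features):
            """Rank database images by cosine similarity to the (refined) query."""
            sims = features @ query / (np.linalg.norm(features, axis=1) * np.linalg.norm(query) + 1e-12)
            return np.argsort(-sims)

        # Toy usage: 100 images described by 64-dimensional feature vectors.
        rng = np.random.default_rng(0)
        features = rng.normal(size=(100, 64))
        query = rng.normal(size=64)
        order = rank_images(query, features)
        query = refine_query(query, features[order[:3]], features[order[-3:]])
        order = rank_images(query, features)  # results for the next feedback round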

    Combining Speech and Haptics for Intuitive and Efficient Navigation through Image Databases

    Käster T, Pfeiffer M, Bauckhage C, Sagerer G. Combining Speech and Haptics for Intuitive and Efficient Navigation through Image Databases. In: Proc. International Conference on Multimodal Interfaces (ICMI'03). ACM; 2003: 180-187.
    Given the size of today's professional image databases, the standard approach to object- or theme-related image retrieval is to interactively navigate through the content. But as most users of such databases are designers or artists who do not have a technical background, navigation interfaces must be intuitive to use and easy to learn. This paper reports on efforts towards this goal. We present a system for intuitive image retrieval that features different modalities for interaction. Apart from conventional input devices like mouse or keyboard, it is also possible to use speech or haptic gesture to indicate what kind of images one is looking for. Seeing a selection of images on the screen, the user provides relevance feedback to narrow the choice of motifs presented next. This is done either by scoring whole images or by choosing certain image regions. In order to derive consistent reactions from multimodal user input, asynchronous integration of modalities and probabilistic reasoning based on Bayesian networks are applied. After addressing technical details, we discuss a series of usability experiments which we conducted to examine the impact of multimodal input facilities on interactive image retrieval. The results indicate that users appreciate multimodality: while we observed a slight decrease in task performance, measures of contentment exceeded those for conventional input devices.
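    The paper describes deriving consistent reactions from asynchronous speech and haptic input via probabilistic reasoning with Bayesian networks. The Python fragment below is only an illustrative stand-in, assuming a small set of hypothetical motif categories and invented evidence scores: it fuses per-modality likelihoods by naive Bayesian combination, a much simpler scheme than the networks used in the paper.

        # Illustrative only: naive Bayesian fusion of asynchronous multimodal evidence
        # over hypothetical motif categories (the paper's Bayesian networks are richer).
        import numpy as np

        CATEGORIES = ["beach", "city", "forest"]       # hypothetical categories

        def normalize(p):
            p = np.asarray(p, dtype=float)
            return p / p.sum()

        def fuse(prior, *likelihoods):
            """Posterior over categories, updated once per arriving modality."""
            post = normalize(prior)
            for lik in likelihoods:                    # modalities may arrive at different times
                post = normalize(post * np.asarray(lik, dtype=float))
            return post

        prior = [1 / 3, 1 / 3, 1 / 3]
        speech_evidence = [0.7, 0.2, 0.1]              # e.g. the user asked for "sea" motifs
        haptic_evidence = [0.5, 0.1, 0.4]              # e.g. regions the user pointed at

        posterior = fuse(prior, speech_evidence, haptic_evidence)
        print(dict(zip(CATEGORIES, posterior.round(3))))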

    Flexible photo retrieval (FlexPhoReS): a prototype for multimodal personal digital photo retrieval

    Digital photo technology is developing rapidly and is motivating more people to build large personal collections of digital photos. However, effective and fast retrieval of digital photos is not always easy, especially when collections grow into the thousands. The World Wide Web (WWW) is one of the platforms that allows digital photo users to publish a collection of photos in a centralised and organised way. Users typically find their photos by searching or browsing using a keyboard and mouse. Alternative user interfaces are also being developed, such as graphical user interfaces with speech (S/GUI) and other multimodal user interfaces, which offer more flexibility to users. The aim of this research was to design and evaluate a flexible user interface for a web-based personal digital photo retrieval system. A model of a flexible photo retrieval system (FlexPhoReS) was developed based on a review of the literature and a small-scale user study. A prototype, based on the model, was built using MATLAB and WWW technology. FlexPhoReS is a web-based personal digital photo retrieval prototype that enables digital photo users to accomplish photo retrieval tasks (browsing, keyword searching, and visual example searching (CBIR)) using either mouse and keyboard input modalities or mouse and speech input modalities. An evaluation with 20 digital photo users was conducted using usability testing methods. The results showed a significant difference in search performance between the mouse and keyboard input modalities and the mouse and speech input modalities: on average, search time was reduced by 37.31% when using mouse and speech input. Participants were also significantly more satisfied with mouse and speech input modalities than with mouse and keyboard input modalities, although they felt that the two were complementary. This research demonstrated that the prototype was successful in providing a flexible model of the photo retrieval process by offering alternative input modalities through a multimodal user interface in the World Wide Web environment.
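    As a worked illustration of the kind of comparison reported above, the snippet below uses invented per-participant timings (not the study's data) to show how a mean search-time reduction and a paired significance test could be computed; the 37.31% figure comes from the thesis, not from these numbers.

        # Hypothetical example with invented timings (not the study's data): computing
        # a percentage reduction in mean search time and a paired t-test.
        import numpy as np
        from scipy.stats import ttest_rel

        keyboard_mouse = np.array([95, 110, 102, 88, 120, 99, 105, 93])  # seconds
        speech_mouse = np.array([60, 72, 65, 58, 80, 61, 70, 59])        # seconds

        reduction = 100 * (keyboard_mouse.mean() - speech_mouse.mean()) / keyboard_mouse.mean()
        t_stat, p_value = ttest_rel(keyboard_mouse, speech_mouse)        # paired comparison

        print(f"mean search-time reduction: {reduction:.2f}%")
        print(f"paired t-test: t={t_stat:.2f}, p={p_value:.4f}")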