
    Attentive monitoring of multiple video streams driven by a Bayesian foraging strategy

    This paper considers the problem of deploying attention to subsets of video streams in order to collate the most relevant data and information of interest related to a given task. We formalize this monitoring problem as a foraging problem. We propose a probabilistic framework to model an observer's attentive behavior as the behavior of a forager. The forager, moment to moment, focuses its attention on the most informative stream/camera, detects interesting objects or activities, or switches to a more profitable stream. The proposed approach is suitable for multi-stream video summarization. It can also serve as a preliminary step for more sophisticated video surveillance, e.g. activity and behavior analysis. Experimental results achieved on the UCR Videoweb Activities Dataset, a publicly available dataset, are presented to illustrate the utility of the proposed technique. Comment: Accepted to IEEE Transactions on Image Processing

    Off-line Foveated Compression and Scene Perception: An Eye-Tracking Approach

    With the continued growth of digital services offering storage and communication of pictorial information, the need to efficiently represent this information has become increasingly important, both from an information theoretic and a perceptual point of view. There has been recent interest in designing systems for efficient representation and compression of image and video data that take the features of the human visual system into account. One part of this thesis investigates whether knowledge about viewers' gaze positions as measured by an eye-tracker can be used to improve compression efficiency of digital video; regions not directly looked at by a number of previewers are lowpass filtered. This type of video manipulation is called off-line foveation. The amount of compression due to off-line foveation is assessed along with how it affects new viewers' gazing behavior as well as subjective quality. We found additional bitrate savings of up to 50% (average 20%) due to off-line foveation prior to compression, without decreasing the subjective quality. In off-line foveation, it would be of great benefit to algorithmically predict where viewers look without having to perform eye-tracking measurements. In the first part of this thesis, new experimental paradigms combined with eye-tracking are used to understand the mechanisms behind gaze control during scene perception, thus investigating the prerequisites for such algorithms. Eye movements are recorded from observers viewing contrast-manipulated images depicting natural scenes under a neutral task. We report that image semantics, rather than the physical image content itself, largely dictates where people choose to look. Together with recent work on gaze prediction in video, the results in this thesis give only moderate support for successful applicability of algorithmic gaze prediction for off-line foveated video compression.

    Content-prioritised video coding for British Sign Language communication.

    Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications, which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to meet the conflicting requirements for high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. The research community benefits from a new approach to video coding optimisation and a better understanding of the communication needs of deaf people.

    How Laminar Frontal Cortex and Basal Ganglia Circuits Interact to Control Planned and Reactive Saccades

    The basal ganglia and frontal cortex together allow animals to learn adaptive responses that acquire rewards when prepotent reflexive responses are insufficient. Anatomical studies show a rich pattern of interactions between the basal ganglia and distinct frontal cortical layers. Analysis of the laminar circuitry of the frontal cortex, together with its interactions with the basal ganglia, motor thalamus, superior colliculus, and inferotemporal and parietal cortices, provides new insight into how these brain regions interact to learn and perform complexly conditioned behaviors. A neural model whose cortical component represents the frontal eye fields captures these interacting circuits. Simulations of the neural model illustrate how it provides a functional explanation of the dynamics of 17 physiologically identified cell types found in these areas. The model predicts how action planning or priming (in cortical layers III and VI) is dissociated from execution (in layer V), how a cue may serve either as a movement target or as a discriminative cue to move elsewhere, and how the basal ganglia help choose among competing actions. The model simulates neurophysiological, anatomical, and behavioral data about how monkeys perform saccadic eye movement tasks, including fixation; single saccade, overlap, gap, and memory-guided saccades; anti-saccades; and parallel search among distractors. Defense Advanced Research Projects Agency and the Office of Naval Research (N00014-95-1-0409, N00014-92-J-1309, N00014-95-1-0657); National Science Foundation (IRI-97-20333)