
    Event detection in field sports video using audio-visual features and a support vector machine

    In this paper, we propose a novel audio-visual feature-based framework for event detection in broadcast video of multiple different field sports. Features indicating significant events are selected and robust detectors built. These features are rooted in characteristics common to all genres of field sports. The evidence gathered by the feature detectors is combined by means of a support vector machine, which infers the occurrence of an event based on a model generated during a training phase. The system is tested generically across multiple genres of field sports including soccer, rugby, hockey, and Gaelic football, and the results suggest that high event retrieval and content rejection statistics are achievable.
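    The fusion step described above, in which scores from independent feature detectors are combined by an SVM trained on labelled shots, can be sketched as follows. The detector names and training values are illustrative assumptions, not the paper's actual feature set or data:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical per-shot confidence scores from three feature detectors
# (e.g. crowd-noise level, scoreboard activity, close-up ratio).
X_train = np.array([
    [0.9, 0.8, 0.7],   # shots containing an event
    [0.8, 0.9, 0.6],
    [0.7, 0.7, 0.9],
    [0.1, 0.2, 0.3],   # ordinary play
    [0.2, 0.1, 0.2],
    [0.3, 0.2, 0.1],
])
y_train = np.array([1, 1, 1, 0, 0, 0])  # 1 = event, 0 = no event

# The SVM learns a decision boundary over the combined evidence during a
# training phase, then infers events on unseen shots.
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

new_shot = np.array([[0.85, 0.75, 0.80]])  # strong evidence on all detectors
print(clf.predict(new_shot))
```

    Because the SVM operates on detector outputs rather than raw pixels or audio, the same fusion model can be trained per genre or generically across genres, as in the paper's cross-sport evaluation.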

    A contrast-sensitive reversible visible image watermarking technique

    A reversible (also called lossless, distortion-free, or invertible) visible watermarking scheme is proposed for applications in which the visible watermark is expected to combat copyright piracy but can be removed to losslessly recover the original image. We transparently reveal the watermark image by overlapping it on a user-specified region of the host image, adaptively adjusting the pixel values beneath the watermark according to human-visual-system-based scaling factors. To achieve reversibility, a reconstruction/recovery packet, which is used to restore the watermarked area, is reversibly inserted into the non-visibly-watermarked region. The packet is built from the difference between the original image and an approximate version of it, rather than its visibly watermarked version, so as to reduce the packet's overhead. To generate the approximation, we develop a simple prediction technique that uses the unaltered neighboring pixels as auxiliary information. The recovery packet is uniquely encoded before hiding so that the original watermark pattern can be reconstructed from the encoded packet; in this way, image recovery is carried out without needing the watermark itself. In addition, our method applies data compression to further reduce the recovery packet size and improve embedding capacity. The experimental results demonstrate the superiority of the proposed scheme over existing methods.
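    A minimal sketch of the embed/restore round trip: a watermark is blended into a user-specified region and the pixel differences are kept as a recovery packet. In the actual scheme the scaling factors are contrast-adaptive (HVS-based) and the packet is derived from a predicted approximation, compressed, and hidden reversibly elsewhere in the image; here a fixed alpha and a raw difference packet stand in for those steps:

```python
import numpy as np

def embed_visible(host, mark, region, alpha=0.3):
    """Blend `mark` into `host` at `region` (top-left row, col).
    `alpha` stands in for the HVS-based, contrast-adaptive scaling factor
    of the actual scheme. Returns the watermarked image plus a recovery
    packet (here simply the raw pixel differences)."""
    r0, c0 = region
    h, w = mark.shape
    out = host.astype(np.int32).copy()
    patch = out[r0:r0 + h, c0:c0 + w].copy()
    marked = np.round((1 - alpha) * patch + alpha * mark).astype(np.int32)
    out[r0:r0 + h, c0:c0 + w] = marked
    recovery = patch - marked  # what must be added back for lossless recovery
    return out, recovery

def restore(watermarked, recovery, region):
    """Losslessly recover the original region from the recovery packet;
    note that the watermark itself is not needed, as in the paper."""
    r0, c0 = region
    h, w = recovery.shape
    out = watermarked.copy()
    out[r0:r0 + h, c0:c0 + w] += recovery
    return out

rng = np.random.default_rng(1)
host = rng.integers(0, 256, (8, 8))
mark = rng.integers(0, 256, (4, 4))
wm, rec = embed_visible(host, mark, (2, 2))
recovered = restore(wm, rec, (2, 2))
```

    Storing the difference against a *predicted* approximation, as the paper does, shrinks the packet because the prediction residual compresses far better than raw pixel differences.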

    Segmentation of the face and hands in sign language video sequences using color and motion cues

    Copyright © 2004 IEEE. We present a hand and face segmentation methodology using color and motion cues for the content-based representation of sign language video sequences. The methodology consists of three stages: skin-color segmentation; change detection; and face and hand segmentation mask generation. In skin-color segmentation, a universal color model is derived and image pixels are classified as skin or non-skin based on their Mahalanobis distance. We derive a segmentation threshold for the classifier. The aim of change detection is to localize moving objects in a video sequence. The change detection technique is based on the F-test and block-based motion estimation. Finally, the results from skin-color segmentation and change detection are analyzed to segment the face and hands. The performance of the algorithm is illustrated by simulations carried out on standard test sequences. (Authors: Nariman Habili, Cheng Chew Lim, and Alireza Moini)
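    The skin-color stage, classifying each pixel by its Mahalanobis distance to a Gaussian skin model, can be sketched as follows. The chrominance values and the fixed threshold are illustrative; the paper derives its segmentation threshold analytically:

```python
import numpy as np

def fit_skin_model(skin_pixels):
    """Fit a Gaussian skin-color model from training pixels
    (an N x 2 array of chrominance values)."""
    mean = skin_pixels.mean(axis=0)
    cov = np.cov(skin_pixels, rowvar=False)
    return mean, np.linalg.inv(cov)

def classify_skin(pixels, mean, inv_cov, threshold=6.0):
    """Label a pixel as skin when its squared Mahalanobis distance
    to the skin model falls below the threshold."""
    diff = pixels - mean
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return d2 < threshold

# Toy example: a synthetic skin-tone cluster in a chrominance plane.
rng = np.random.default_rng(0)
skin = rng.normal([0.45, 0.35], 0.01, (200, 2))
mean, inv_cov = fit_skin_model(skin)
labels = classify_skin(np.array([[0.45, 0.35], [0.90, 0.10]]), mean, inv_cov)
```

    Using the inverse covariance rather than plain Euclidean distance lets the classifier follow the elongated shape of the skin-tone cluster in chrominance space.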

    Construction and Evaluation of an Ultra Low Latency Frameless Renderer for VR.

    © 2016 IEEE. Latency, the delay between a user's action and the response to that action, is known to be detrimental to virtual reality. Latency is typically considered a discrete value characterising a delay that is constant in time and space, but this characterisation is incomplete. Latency changes across the display during scan-out, and how it does so depends on the rendering approach used. In this study, we present an ultra-low-latency real-time ray-casting renderer for virtual reality, implemented on an FPGA. Our renderer has a latency of 1 ms from tracker to pixel. Its frameless nature means that the region of the display with the lowest latency immediately follows the scan beam. This is in contrast to frame-based systems, such as those using typical GPUs, for which latency increases as scan-out proceeds. Using a series of high- and low-speed videos of our system in use, we confirm its latency of 1 ms. We examine how the renderer performs when driving a traditional sequential scan-out display on a readily available HMD, the Oculus Rift DK2, and contrast this with an equivalent apparatus built using a GPU. Using captured human head motion and a set of image quality measures, we assess the ability of these systems to faithfully recreate the stimuli of an ideal virtual reality system: one with a zero-latency tracker, renderer, and display running at 1 kHz. Finally, we examine the results of these quality measures and how each rendering approach is affected by velocity of movement and display persistence. We find that our system, with a lower average latency, more faithfully draws what the ideal virtual reality system would. Further, we find that low display persistence lowers the sensitivity of both systems to velocity, but much more so for ours.
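    The latency asymmetry described above can be illustrated with a toy model. The numbers (refresh rate, base latency, beam lag) are illustrative assumptions, not measurements from the paper:

```python
def frame_based_latency_ms(row, rows=1080, refresh_hz=90, base_ms=10.0):
    """Frame-based pipeline: every pixel of a frame is rendered from a single
    tracker sample, so pixel age grows as scan-out proceeds down the display."""
    scan_out_ms = 1000.0 / refresh_hz
    return base_ms + (row / rows) * scan_out_ms

def frameless_latency_ms(row, beam_lag_ms=1.0):
    """Frameless ray caster: each pixel is produced just before the scan beam
    reaches it, so latency is roughly constant across the display."""
    return beam_lag_ms

# The bottom of a frame-based display is one full scan-out older than the top;
# the frameless renderer keeps every pixel about 1 ms old.
for row in (0, 540, 1079):
    print(row, frame_based_latency_ms(row), frameless_latency_ms(row))
```

    This is why the paper reports latency as a function of display position rather than a single number: the two approaches differ not just in average delay but in how delay is distributed over the screen.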

    Studies in ambient intelligent lighting

    The revolution in lighting we are arguably experiencing is led by technical developments in solid-state lighting technology. The improved lifetime, efficiency, and environmentally friendly raw materials make LEDs the main contender for the light source of the future. The core of the change is, however, not in the basic technology but in the way users interact with it and the way the quality of the produced effect on the environment is judged. With this newfound freedom, users can shift their focus from the confines of the technology to the expression of their needs, regardless of the details of the lighting system. Identifying user needs, creating an effective language to communicate them to the system, translating them into control signals that fulfill them, and defining the means to measure the quality of the produced result are the subject of a new multidisciplinary area of study, Ambient Intelligent Lighting. This thesis describes a series of studies in that field, divided into two parts. The first part demonstrates how, by adopting a user-centric design philosophy, traditional control paradigms can be superseded by novel, so-called effect-driven controls. Chapter 3 describes an algorithm that, using statistical methods and image processing, generates a set of colors from a term or set of terms. The algorithm uses Internet image search engines (Google Images, Flickr) to acquire a set of images representing a term and subsequently extracts representative colors from the set; an estimate of the quality of the extracted color set is also computed. Based on this algorithm, a system that automatically enriches music with lyrics-based images and lighting was built and is described. Chapter 4 proposes a novel effect-driven control algorithm that gives users an easy, natural, and system-agnostic means of creating a spatial light distribution. Using an emerging technology, visible light communication, and an intuitive effect definition, a real-time interactive light design system was developed. Usability studies on a virtual prototype of the system demonstrated the perceived ease of use and increased efficiency of an effect-driven approach. In chapter 5, natural temporal light transitions are modeled and reproduced using stochastic models. From an example video of a natural light effect, a Markov model of the transitions between colors of a single light source representing the effect is learned. The model is a compact, easily reproduced, and, as the user studies show, recognizable representation of the original light effect.

    The second part of the thesis studies the perceived quality of one of the unique capabilities of LEDs: chromatic temporal transitions. Using psychophysical methods, existing spatial models of human color vision were found to be unsuitable for predicting the visibility of temporal artifacts caused by digital controls. The chapters in this part demonstrate new perceptual effects and take the first steps towards a temporal model of human color vision. Chapter 6 studies the perceived smoothness of digital light transitions. The studies presented demonstrate that the visibility of digital steps in a temporal transition depends on the frequency of change, chromaticity, intensity, and direction of change of the transition. Furthermore, a clear link between the visibility of digital steps and flicker visibility is demonstrated. Finally, a new exponential law for the dependence of the threshold speed of smooth transitions on the frequency of change is hypothesized and confirmed in subsequent experiments. Chapter 7 studies the discrimination and preference of different color transitions between two colors. Due to memory effects, the discrimination threshold for complete transitions was shown to be larger than that for two single colors. Two linear transitions in different color spaces were shown to be significantly preferred over a set of other, curved, transitions. Chapter 8 studies chromatic and achromatic flicker visibility in the periphery. A complex change of both the absolute visibility thresholds for different frequencies and the critical flicker frequency is observed. Finally, an increase in the absolute visibility thresholds caused by adding a mental task in central vision is demonstrated.
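    The chapter 5 idea, learning a Markov chain of color transitions from an observed effect and replaying it, can be sketched as follows. The named color states and the example sequence are invented purely for illustration:

```python
import random
from collections import defaultdict

def learn_transitions(sequence):
    """Count first-order transitions between quantized color states
    observed in an example light effect."""
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(sequence, sequence[1:]):
        counts[cur][nxt] += 1
    return counts

def reproduce(counts, start, length, seed=0):
    """Replay the effect by sampling a walk through the learned chain,
    weighting each step by the observed transition frequencies."""
    rng = random.Random(seed)
    state, out = start, [start]
    for _ in range(length - 1):
        nxt_states = list(counts[state])
        weights = [counts[state][s] for s in nxt_states]
        state = rng.choices(nxt_states, weights=weights)[0]
        out.append(state)
    return out

# Example: a simple two-color effect learned from an observed sequence.
observed = ["blue", "teal", "blue", "teal", "blue"]
model = learn_transitions(observed)
print(reproduce(model, "blue", 5))
```

    The transition table is the compact, easy-to-reproduce representation the abstract mentions: it stores only state-to-state statistics rather than the source video itself.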

    Reflectance Transformation Imaging (RTI) System for Ancient Documentary Artefacts

    This tutorial summarises our uses of reflectance transformation imaging in archaeological contexts. It introduces the UK AHRC-funded project Reflectance Transformation Imaging for Ancient Documentary Artefacts and demonstrates imaging methodologies.

    Image quality assessment: utility, beauty, appearance


    A zerotree wavelet video coder


    Computer vision methods for unconstrained gesture recognition in the context of sign language annotation

    Get PDF
    This PhD thesis concerns the study of computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign Language (SL) is a visual-gestural language developed by deaf communities. Continuous SL consists of a sequence of signs performed one after another, involving manual and non-manual features that convey information simultaneously. Even though standard signs are defined in dictionaries, their realisation shows great variability depending on context, and signs are often linked by movement epenthesis, the meaningless transitional gesture between signs. This variability and the co-articulation effect make automatic SL processing challenging, so numerous annotated video corpora are needed to study the language and to train machine learning methods. Annotation of SL video corpora is generally performed manually by linguists or SL experts; it is error-prone, hard to reproduce, and extremely time-consuming, and the quality of the result depends on the annotator's knowledge of SL. Combining the annotator's expertise with automatic image processing facilitates the task, increasing robustness and reducing the time required. The goal of this research is the study and development of image processing techniques to assist the annotation of SL video corpora: body-part tracking, hand segmentation, temporal segmentation, and gloss recognition. In this thesis we address the problem of gloss annotation, beginning with the detection of the limits corresponding to the beginning and end of each sign. This annotation method requires several low-level steps to segment the signs and extract motion and hand-shape features. First, we propose a particle-filter-based approach for tracking the hands and face that is robust to occlusions. Then, a hand segmentation algorithm is developed to extract the hand region even when the hand lies in front of the face. Motion features are used to produce an initial temporal segmentation of the signs, which is then refined using hand-shape features; these allow segmentation limits detected in the middle of a sign to be removed. Once the signs are segmented, visual features are extracted for their recognition in terms of glosses using phonological models of signs. We evaluated our algorithms on international corpora to show their advantages and limitations. The evaluation demonstrates the robustness of our methods with respect to high dynamics and the large number of occlusions between body parts. The resulting annotation is independent of the annotator and represents a significant gain in consistency.
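    As a toy illustration of the motion-based temporal segmentation step, candidate sign boundaries can be hypothesized where hand speed drops. The trajectory and threshold below are invented for the example, and the thesis additionally filters such candidates using hand-shape features:

```python
import numpy as np

def segment_signs(hand_positions, speed_thresh=2.0):
    """Hypothesize sign boundaries at frames where the tracked hand's speed
    falls below a threshold (i.e. the hand pauses between signs). A simplified
    stand-in for the thesis's motion-based temporal segmentation."""
    pos = np.asarray(hand_positions, dtype=float)
    speed = np.linalg.norm(np.diff(pos, axis=0), axis=1)  # per-frame speed
    slow = speed < speed_thresh
    # Report the frame where motion first drops (fast -> slow transitions).
    return [i + 1 for i in range(1, len(slow)) if slow[i] and not slow[i - 1]]

# Toy 2D hand trajectory: fast motion, a pause (frames 5-7), fast motion again.
trajectory = [[0, 0], [5, 0], [10, 0], [15, 0], [20, 0],
              [20, 0], [20, 0], [25, 0], [30, 0]]
print(segment_signs(trajectory))
```

    In the full pipeline, boundaries that fall in the middle of a sign, where the hand merely slows without changing shape, are discarded using the hand-shape features, which is why motion alone is only the first pass.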