281 research outputs found

    Use of morphological filters in detection of flashes and other light events

    Final degree project carried out in collaboration with the Thomson Corporate Research Lab. In a collaboration agreement between the UPC and the Thomson Corporate Research Lab, represented by Joan Llach, the objective of this project is to detect local and global flash light events of different intensities in video sequences. Thomson has shown interest in using this kind of information to enable the application of techniques that exploit the characteristics of such events, with the expectation of improving overall encoding efficiency. This study presents a broad definition of flash light events and proposes the design and implementation of a flash detector in two steps: a first step of rough detection that uses morphological filters in both the spatial and temporal domains, and a second processing step that produces a much more refined result. In the first stage of the project the objectives were defined and a demo of the application of morphological filters for the detector was presented. The second stage included a six-month internship at the Thomson Corporate Research Lab in Princeton (NJ, USA), where the result-refinement process was developed and implemented.
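
    As an illustration of the general idea (not the two-step detector developed in the project), the following Python sketch applies a grey-scale morphological opening along the time axis of a per-frame brightness signal and flags frames whose brightness spikes above the opened baseline; OpenCV and SciPy are assumed to be available, and the window size and threshold are arbitrary choices.

```python
# Sketch: flag flash-like frames by comparing a per-frame brightness signal
# against its grey-scale morphological opening along the time axis.
# Illustrative only; not the two-step detector described in the project.
import cv2
import numpy as np
from scipy.ndimage import grey_opening

def flash_frames(video_path, window=5, threshold=20.0):
    cap = cv2.VideoCapture(video_path)
    brightness = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        brightness.append(gray.mean())
    cap.release()

    brightness = np.asarray(brightness)
    # Temporal opening suppresses short bright spikes; the residue highlights them.
    opened = grey_opening(brightness, size=window)
    residue = brightness - opened
    return np.nonzero(residue > threshold)[0]  # indices of candidate flash frames
```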

    Crowd Scene Analysis in Video Surveillance

    There is increasing interest in crowd scene analysis in video surveillance due to the ubiquitous deployment of video surveillance systems in public places with a high density of objects, amid growing concern about public security and safety. A comprehensive crowd scene analysis approach is required not only to recognize crowd events and detect abnormal events, but also to update the underlying learning model in an online, real-time fashion. To this end, a set of approaches for Crowd Event Recognition (CER) and Abnormal Event Detection (AED) are developed in this thesis. To address the curse of dimensionality, we propose a video manifold learning method for crowd event analysis. A novel feature descriptor is proposed to encode regional optical flow features of video frames, where adaptive quantization and binarization of the feature code are employed to improve the discriminative power of crowd motion patterns. Using the feature code as input, a linear dimensionality reduction algorithm that preserves both the intrinsic spatial and temporal properties is proposed, and the generated low-dimensional video manifolds are used for CER and AED. Moreover, we introduce a framework for AED by integrating a novel incremental and decremental One-Class Support Vector Machine (OCSVM) with a sliding buffer. It not only updates the model in an online fashion with low computational cost, but also adapts to concept drift by discarding obsolete patterns. Furthermore, the framework has been improved by introducing Multiple Incremental and Decremental Learning (MIDL), kernel fusion, and multiple target tracking, which leads to more accurate and faster AED. In addition, we develop a framework for another video content analysis task, i.e., shot boundary detection. Specifically, instead of directly assessing the pairwise difference between consecutive frames over time, we propose to evaluate a divergence measure between two OCSVM classifiers trained on two successive frame sets, which is more robust to noise and to gradual transitions such as fade-in and fade-out. To speed up the processing procedure, the two OCSVM classifiers are updated online using the MIDL scheme proposed for AED. Extensive experiments on five benchmark datasets validate the effectiveness and efficiency of our approaches in comparison with the state of the art.
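
    A minimal sketch of the sliding-buffer idea, assuming pre-computed per-frame feature vectors (such as the regional optical flow codes described above): a standard One-Class SVM is simply refitted on the most recent window, which approximates, but does not implement, the incremental/decremental OCSVM and MIDL machinery described in the thesis.

```python
# Sketch: sliding-buffer one-class SVM for abnormal event detection.
# Refits on the most recent window rather than using the thesis's
# incremental/decremental OCSVM; feature extraction is assumed done elsewhere.
from collections import deque
import numpy as np
from sklearn.svm import OneClassSVM

class SlidingBufferAED:
    def __init__(self, buffer_size=200, nu=0.05, gamma="scale"):
        self.buffer = deque(maxlen=buffer_size)   # obsolete patterns fall off the left
        self.model = OneClassSVM(nu=nu, gamma=gamma)

    def update(self, feature_vec):
        """Add one frame descriptor and refit on the current buffer."""
        self.buffer.append(feature_vec)
        if len(self.buffer) >= 10:                # wait for a few normal samples
            self.model.fit(np.vstack(self.buffer))

    def is_abnormal(self, feature_vec):
        """Return True if the descriptor falls outside the learned region (call after fitting)."""
        return self.model.predict(np.asarray(feature_vec).reshape(1, -1))[0] == -1
```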

    Robust and efficient techniques for automatic video segmentation.

    by Lam Cheung Fai. Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. Includes bibliographical references (leaves 174-179). Abstract also in Chinese. Table of contents:
    Chapter 1, Introduction: Problem Definition; Motivation; Problems (Illumination Changes and Motions in Videos; Variations in Video Scene Characteristics; High Complexity of Algorithms; Heterogeneous Approaches to Video Segmentation); Objectives and Approaches; Organization of the Thesis.
    Chapter 2, Related Work: Algorithms for Uncompressed Videos (Pixel-based Method; Histogram-based Method; Motion-based Algorithms; Color-ratio Based Algorithms); Algorithms for Compressed Videos (Algorithms based on JPEG Image Sequences; Algorithms based on MPEG Videos; Algorithms based on VQ Compressed Videos); Frame Difference Analysis Methods (Scene Cut Detection; Gradual Transition Detection); Speedup Techniques; Other Approaches.
    Chapter 3, Analysis and Enhancement of Existing Algorithms: Introduction; Video Segmentation Algorithms (Frame Difference Metrics; Frame Difference Analysis Methods); Analysis of Feature Extraction Algorithms (Pair-wise pixel comparison; Color histogram comparison; Pair-wise block-based comparison of DCT coefficients; Pair-wise pixel comparison of DC-images); Analysis of Scene Change Detection Methods (Global Threshold Method; Sliding Window Method); Enhancements and Modifications (Histogram Equalization; DD Method; LA Method; Modification for pair-wise pixel comparison; Modification for pair-wise DCT block comparison); Conclusion.
    Chapter 4, Color Difference Histogram: Introduction; Color Difference Histogram (Definition of Color Difference Histogram; Sparse Distribution of CDH; Resolution of CDH; CDH-based Inter-frame Similarity Measure; Computational Cost and Discriminating Power; Suitability in Scene Change Detection); Insensitivity to Illumination Changes (Sensitivity of CDH; Comparison with other feature extraction algorithms); Orientation and Motion Invariant (Camera Movements; Object Motion; Comparison with other feature extraction algorithms); Performance of Scene Cut Detection; Time Complexity Comparison; Extension to DCT-compressed Images (Performance of scene cut detection); Conclusion.
    Chapter 5, Scene Change Detection: Introduction; Previous Approaches (Scene Cut Detection; Gradual Transition Detection); DD Method (Detecting Scene Cuts; Detecting 1-frame Transitions; Detecting Gradual Transitions); Local Thresholding; Experimental Results (Performance of CDH+DD and CDH+DL; Performance of DD on other features); Conclusion.
    Chapter 6, Motion Vector Based Approach: Introduction; Previous Approaches; MPEG-I Video Stream Format; Derivation of Frame Differences from Motion Vector Counts (Types of Frame Pairs; Conditions for Scene Changes; Frame Difference Measure); Experiment (Performance of MV; Performance Enhancement; Limitations); Conclusion.
    Chapter 7, Conclusion and Future Work: Contributions; Future Work; Conclusion.
    Bibliography. Appendix A: Sample Videos. Appendix B: List of Abbreviations.
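
    For orientation, here is a minimal Python sketch of the kind of histogram-difference, global-threshold scene cut baseline analysed in the thesis; the CDH and DD methods themselves are not reproduced. OpenCV is assumed for decoding, and the bin count and threshold are arbitrary choices.

```python
# Sketch: histogram-difference scene cut detection with a global threshold,
# a baseline of the kind analysed in the thesis (not the CDH/DD methods).
import cv2
import numpy as np

def detect_cuts(video_path, bins=16, threshold=0.4):
    cap = cv2.VideoCapture(video_path)
    prev_hist, cuts, idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None,
                            [bins] * 3, [0, 256] * 3).flatten()
        hist /= hist.sum() + 1e-9  # normalise so the difference is scale-free
        if prev_hist is not None:
            diff = 0.5 * np.abs(hist - prev_hist).sum()  # L1 distance in [0, 1]
            if diff > threshold:
                cuts.append(idx)  # frame index where a cut is declared
        prev_hist, idx = hist, idx + 1
    cap.release()
    return cuts
```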

    Visually tracked flashlights as interaction devices

    This thesis examines the feasibility, development and deployment of visually tracked flashlights as interaction devices. Flashlights are cheap, robust and fun. Most people, from adults to children of an early age, are familiar with flashlights and can use them to search for, select and illuminate objects and features of interest. Flashlights are available in many shapes, sizes, weights and mountings. Flashlights are particularly appropriate to situations where visitors explore dark places such as the caves, tunnels, cellars and dungeons that can be found in museums, theme parks and other visitor attractions. Techniques are developed by which the location and identity of flashlight projections are recovered from the image sequence supplied by a fixed camera monitoring a target surface. The information recovered is used to trigger audiovisual events in response to users' actions. Early trials with three prototype systems, each built using existing techniques in computer vision, show flashlight interfaces to be feasible both technically and from a usability point of view. Novel methods are developed which allow extraction of descriptions of flashlight projections that are independent of the reflectance of the underlying physical surface. Those descriptions are used to locate and recognise individual flashlights and support a multi-user interface technology. The methods developed form the basis of Enlighten, a software product marketed by the University of Nottingham spin-off company Visible Interactions Ltd. Enlighten is currently in daily use at four sites across the UK. Two patents have been filed (UK Patent Publication Number GB2411957 and US Patent Application Number 10/540,498). The UK patent has been granted, and the US application is under review.
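
    A minimal sketch of the first step such a system might take, locating bright projections in a camera frame by thresholding and connected-component labelling; it does not implement the reflectance-independent descriptors or flashlight identification developed in the thesis. OpenCV is assumed, and the brightness threshold and minimum area are arbitrary choices.

```python
# Sketch: locate bright flashlight projections in a camera frame by
# thresholding and connected-component labelling. Illustrative only; it does
# not implement the reflectance-independent descriptors from the thesis.
import cv2
import numpy as np

def find_spots(frame_bgr, brightness_thresh=220, min_area=50):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, brightness_thresh, 255, cv2.THRESH_BINARY)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    spots = []
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            spots.append(tuple(centroids[i]))  # (x, y) centre of each projection
    return spots
```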

    Detection and elimination of rock face vegetation from terrestrial LIDAR data using the virtual articulating conical probe algorithm

    A common use of terrestrial lidar is to conduct studies involving change detection of natural or engineered surfaces. Change detection involves many technical steps beyond the initial data acquisition: data structuring, registration, and elimination of data artifacts such as parallax errors, near-field obstructions, and vegetation. Of these, vegetation detection and elimination with terrestrial lidar scanning (TLS) presents a completely different set of issues when compared to vegetation elimination from aerial lidar scanning (ALS). With ALS, the ground footprint of the lidar laser beam is very large, and the data acquisition hardware supports multi-return waveforms. Also, the underlying surface topography is relatively smooth compared to the overlying vegetation, which has a high spatial frequency. On the other hand, with most TLS systems, the width of the lidar laser beam is very small, and the data acquisition hardware supports only first-return signals. For the case where vegetation is covering a rock face, the underlying rock surface is not smooth, because rock joints and sharp block edges have a high spatial frequency very similar to that of the overlying vegetation. Traditional ALS approaches to eliminating vegetation take advantage of the contrast in spatial frequency between the underlying ground surface and the overlying vegetation. When the ALS approach is used on vegetated rock faces, the algorithm, as expected, eliminates the vegetation, but it also digitally erodes the sharp corners of the underlying rock. A new method that analyzes the slope of a surface along with relative depth and contiguity information is proposed as a way of differentiating high-spatial-frequency vegetative cover from similarly high-spatial-frequency rock surfaces. This method, named the Virtual Articulating Conical Probe (VACP) algorithm, offers a solution for detection and elimination of rock face vegetation from TLS point cloud data while not affecting the geometry of the underlying rock surface. Such a tool could prove invaluable to the geotechnical engineer for quantifying rates of vertical-face rock loss that impact civil infrastructure safety. --Abstract, page iii
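
    As a rough illustration of geometry-based vegetation filtering, and explicitly not the VACP algorithm, the following sketch flags points that sit far off a locally fitted plane; NumPy and SciPy are assumed, and the neighbourhood size and residual threshold are arbitrary choices.

```python
# Sketch: flag likely vegetation points in a TLS point cloud by their residual
# from a locally fitted plane. A generic roughness filter for illustration only;
# it is not the Virtual Articulating Conical Probe (VACP) algorithm.
import numpy as np
from scipy.spatial import cKDTree

def vegetation_candidates(points, k=20, residual_thresh=0.05):
    """points: (N, 3) array of x, y, z coordinates in metres."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    flags = np.zeros(len(points), dtype=bool)
    for i, neighbours in enumerate(idx):
        centre = points[neighbours].mean(axis=0)
        local = points[neighbours] - centre
        # The direction of least variance approximates the local surface normal.
        _, _, vt = np.linalg.svd(local, full_matrices=False)
        normal = vt[-1]
        residual = abs(np.dot(points[i] - centre, normal))
        flags[i] = residual > residual_thresh  # far off the local plane: likely vegetation
    return flags
```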

    High dynamic range imaging for the detection of motion.

    High dynamic range imaging involves imaging at a bit depth higher than the typical 8-12 bits offered by standard video equipment. We propose a method of imaging a scene at high dynamic range, 14+ bits, to detect motion correlated with changes in the measured optical signal. Features within a scene, namely edges, can be tracked through a time sequence and produce a modulation in light levels associated with the edge moving across a region being sampled by the detector. The modulation in the signal is analyzed, and a model is proposed that allows for an absolute measurement of the displacement of an edge. In addition, turbulence present in the received optical path produces a modulation in the received signal that can be directly related to the various turbulent eddy sizes. These features, present in the low-frequency portion of the spectrum, are correlated to specific values for a relative measurement of the turbulence intensity. In some cases a single-element sensor is used for a measurement at a single point. Video technology is also utilized to produce simultaneous measurements across the entire scene. Several applications are explored and the results discussed. Key applications include: the use of this technique to analyze the motions of bridges for the assessment of structural health, noncontact methods of measuring the blood pulse waveform and respiration rate of individuals, and the imaging of turbulence, including clear air turbulence, for relative values of intensity. Resonant frequencies of bridges can be measured with this technique, as can eddies formed from turbulent flow.
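
    A minimal sketch of the frequency-analysis step, assuming a per-frame mean intensity has already been extracted for a region of interest: the dominant modulation frequency is read off the FFT of that signal. It is not the calibrated displacement model proposed in the thesis.

```python
# Sketch: estimate a dominant vibration frequency from the mean intensity of a
# fixed region of interest over time, in the spirit of the edge-modulation idea;
# not the calibrated displacement model developed in the thesis.
import numpy as np

def dominant_frequency(roi_means, frame_rate):
    """roi_means: 1-D array of mean ROI intensity per frame."""
    signal = roi_means - np.mean(roi_means)          # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / frame_rate)
    return freqs[np.argmax(spectrum[1:]) + 1]        # skip the zero-frequency bin

# Example: a 2 Hz oscillation sampled at 60 frames per second.
t = np.arange(0, 10, 1 / 60)
print(dominant_frequency(100 + 3 * np.sin(2 * np.pi * 2 * t), 60))  # ~2.0
```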

    Feedback-Based Gameplay Metrics and Gameplay Performance Segmentation: An audio-visual approach for assessing player experience.

    Gameplay metrics is an approach that is growing in popularity amongst the game studies research community for its capacity to assess players' engagement with game systems. Yet little has been done to date to quantify players' responses to the feedback that games employ to convey information to players, i.e., their audio-visual streams. The present thesis introduces a novel approach to player experience assessment, termed feedback-based gameplay metrics, which seeks to gather gameplay metrics from the audio-visual feedback streams presented to the player during play. So far, gameplay metrics - quantitative data about a game state and the player's interaction with the game system - have been logged directly via the game's source code. The need to utilise source code restricts the range of games that researchers can analyse. By using computer science algorithms for audio-visual processing, so far not employed for processing gameplay footage, the present thesis seeks to extract similar metrics from the audio-visual streams, thus circumventing the need for access to source code, whilst also proposing a method that focuses on describing the way gameplay information is broadcast to the player during play. In order to operationalise feedback-based gameplay metrics, the present thesis introduces the concept of gameplay performance segmentation, which describes how coherent segments of play can be identified and extracted from lengthy game play sessions. Moreover, in order both to contextualise the method for processing metrics and to provide a conceptual framework for analysing the results of a feedback-based gameplay metric segmentation, a multi-layered architecture based on five gameplay concepts (system, game world instance, spatial-temporal, degree of freedom and interaction) is also introduced. Finally, based on data gathered from game play sessions with participants, the present thesis discusses the validity of feedback-based gameplay metrics, gameplay performance segmentation and the multi-layered architecture. A software system has also been specifically developed to produce gameplay summaries based on feedback-based gameplay metrics, and examples of summaries (based on several games) are presented and analysed. The present thesis also demonstrates that feedback-based gameplay metrics can be analysed conjointly with other forms of data (such as biometry) in order to build a more complete picture of the game play experience. Feedback-based gameplay metrics constitute a post-processing approach that allows the researcher or analyst to explore the data however they wish and as many times as they wish. The method is also able to process any audio-visual file, and can therefore process material from a range of audio-visual sources. This novel methodology brings together game studies and computer science by extending the range of games that can be researched and by providing a viable solution that accounts for the exact way players experience games.
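
    As one concrete example of a feedback-based cue, and not the thesis's segmentation framework, the following sketch splits a gameplay recording's audio track wherever its short-term RMS energy jumps sharply; SciPy's WAV reader is assumed, and the window length and jump ratio are arbitrary choices.

```python
# Sketch: split a gameplay recording's audio track into segments wherever the
# short-term RMS energy jumps, as one simple feedback-based segmentation cue.
# Illustrative only; not the multi-layered segmentation framework of the thesis.
import numpy as np
from scipy.io import wavfile

def segment_by_audio(wav_path, window_s=0.5, jump_ratio=3.0):
    rate, samples = wavfile.read(wav_path)
    if samples.ndim > 1:
        samples = samples.mean(axis=1)        # mix stereo down to mono
    win = int(rate * window_s)
    n_windows = len(samples) // win
    frames = samples[: n_windows * win].reshape(n_windows, win).astype(np.float64)
    rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-9
    # A segment boundary is declared where energy jumps sharply between windows.
    boundaries = np.nonzero(rms[1:] / rms[:-1] > jump_ratio)[0] + 1
    return boundaries * window_s              # boundary times in seconds
```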

    An object-based approach to retrieval of image and video content

    Promising new directions have been opened up for content-based visual retrieval in recent years. Object-based retrieval, which allows users to manipulate video objects as part of their searching and browsing interaction, is one of these. This thesis forms part of a larger stream of research that investigates visual objects as a possible approach to advancing the use of semantics in content-based visual retrieval. The notion of using objects in video retrieval has been seen as desirable for some years, but only very recently has technology started to allow even very basic object-location functions on video. The main hurdles to greater use of objects in video retrieval are the overhead of object segmentation on large amounts of video and the issue of whether objects can actually be used efficiently for multimedia retrieval. Despite this, there are already some examples of work which supports retrieval based on video objects. This thesis investigates an object-based approach to content-based visual retrieval. The main research contributions of this work are a study of shot boundary detection on compressed-domain video, where a fast detection approach is proposed and evaluated, and a study on the use of objects in interactive image retrieval. An object-based retrieval framework is developed in order to investigate object-based retrieval on a corpus of natural images and video. This framework contains the entire processing chain required to analyse, index and interactively retrieve images and video via object-to-object matching. The experimental results indicate that object-based searching consistently outperforms image-based search using low-level features. This result goes some way towards validating the approach of allowing users to select objects as a basis for searching video archives when the information need makes it appropriate.
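
    A toy sketch of object-to-object matching, assuming object masks are already available from a segmentation step: colour histograms are computed inside each mask and database objects are ranked by L1 distance to the query. It stands in for, but does not reproduce, the retrieval framework developed in the thesis.

```python
# Sketch: rank database objects against a query object by comparing colour
# histograms computed inside each object's mask. A toy stand-in for the
# object-to-object matching chain in the thesis, not its actual framework.
import numpy as np

def object_histogram(image_rgb, mask, bins=8):
    """image_rgb: (H, W, 3) uint8 image; mask: (H, W) bool selecting the object."""
    pixels = image_rgb[mask]
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    hist = hist.flatten()
    return hist / (hist.sum() + 1e-9)

def rank_objects(query_feat, database_feats):
    """Return database indices sorted from most to least similar (L1 distance)."""
    dists = [np.abs(query_feat - f).sum() for f in database_feats]
    return np.argsort(dists)
```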

    Show and Tell: Photography, Film and Literary Naturalism in Late Nineteenth Century America.

    This dissertation traces the vexing complexity of the relationship between naturalist literature of the American 1890s and emerging technologies of visual representation: although photographic representation was frequently invoked as naturalism’s ideal model, the novels I consider are better characterized as skeptical interrogators, rather than as eager imitators, of photographic accuracy. In Frank Norris’s McTeague, for example, narrative shifts between intimacy and distance mirror the promise and threat epitomized by the practice of documentary photographers like Jacob Riis. Riis’s photography represented an ideal of accuracy, but McTeague’s narrator ultimately rejects it: the threat of becoming complicit in reprehensible acts, or of becoming indistinguishable from their perpetrators, proves too great a risk for the reward of producing a photographically accurate representation. Stephen Crane’s The Monster offers a similarly skeptical assessment of the potential of narrative mediation, one that I read through a comparative consideration of Crane’s novella and the practice of the moving picture lecturers of the 1890s. I argue that these lecturers—poised between a didactic culture of genteel entertainments and an emerging culture of film that was more interested in thrilling than in edifying its audience—embodied the kind of paradoxical mediation that complicates any moral or ethical interpretation of The Monster. In Henry James’s What Maisie Knew and Kate Chopin’s The Awakening, photography and film model representational modes that seem better suited to the hapless characters of these novels than to their knowledgeable narrators. What Maisie Knew articulates the possibility of a naturalist brute mastering her own story. In The Awakening, Edna Pontellier represents the unique accuracy of the embodied, subjective experience of individual subjects, an experience figured by the novel’s soundscape of music and noise. This soundscape functions much like those of early films, in which the sounds of so-called “silent” films often indexed a reality that exceeded the limits of the images playing on the screen. My dissertation urges us to take naturalism seriously, as an insightful commentator on a social world in which everything, it seems, can be made readily available for our viewing pleasure. Ph.D. English Language & Literature, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/91445/1/berkleya_1.pd

    Augmented Reality Ultrasound Guidance in Anesthesiology

    Real-time ultrasound has become a mainstay in many image-guided interventions and is increasingly popular in several percutaneous procedures in anesthesiology. One of the main constraints of ultrasound-guided needle interventions is identifying and distinguishing the needle tip from the needle shaft in the image. Augmented reality (AR) environments have been employed to address challenges surrounding surgical tool visualization, navigation, and positioning in many image-guided interventions. The motivation behind this work was to explore the feasibility and utility of such visualization techniques in anesthesiology to address some of the specific limitations of ultrasound-guided needle interventions. This thesis brings together the goals, guidelines, and best development practices of functional AR ultrasound image guidance (AR-UIG) systems, examines the general structure of such systems suitable for applications in anesthesiology, and provides a series of recommendations for their development. The main components of such systems, including ultrasound calibration and system interface design, as well as applications of AR-UIG systems for quantitative skill assessment, were also examined in this thesis. The effects of ultrasound image reconstruction techniques, as well as phantom material and geometry, on ultrasound calibration were investigated. Ultrasound calibration error was reduced by 10% with synthetic transmit aperture imaging compared with B-mode ultrasound. Phantom properties were shown to have a significant effect on calibration error, an effect that depends on the ultrasound beamforming technique. This finding has the potential to alter how calibration phantoms are designed, taking the ultrasound imaging technique into account. Performance of an AR-UIG system tailored to central line insertions was evaluated in novice and expert user studies. While the system outperformed ultrasound-only guidance with novice users, it did not significantly affect the performance of experienced operators. Although the extensive experience of the users with ultrasound may have affected the results, certain aspects of the AR-UIG system contributed to the lackluster outcomes, which were analyzed via a thorough critique of the design decisions. The application of an AR-UIG system in quantitative skill assessment was investigated, and the first quantitative analysis of needle tip localization error in ultrasound in a simulated central line procedure, performed by experienced operators, is presented. Most participants did not closely follow the needle tip in ultrasound, resulting in 42% unsuccessful needle placements and a 33% complication rate. Compared to successful trials, unsuccessful procedures featured a significantly greater (p=0.04) needle-tip to image-plane distance. Professional experience with ultrasound does not necessarily lead to expert-level performance. Along with deliberate practice, quantitative skill assessment may reinforce clinical best practices in ultrasound-guided needle insertions. Based on the development guidelines, an AR-UIG system was developed to address the challenges in ultrasound-guided epidural injections. For improved needle positioning, this system integrated an A-mode ultrasound signal obtained from a transducer housed at the tip of the needle. Improved needle navigation was achieved via enhanced visualization of the needle in an AR environment, in which B-mode and A-mode ultrasound data were incorporated. The technical feasibility of the AR-UIG system was evaluated in a preliminary user study. The results suggested that the AR-UIG system has the potential to outperform ultrasound-only guidance.
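
    For reference, the needle-tip to image-plane distance reported in the skill-assessment study is a standard point-to-plane computation; the sketch below assumes the tip position and the plane's point and normal are already expressed in a common tracking coordinate frame.

```python
# Sketch: distance from a tracked needle tip to the ultrasound image plane,
# the quantity reported in the skill-assessment study. The plane is given by a
# point and a normal in the same (assumed) tracking coordinate frame as the tip.
import numpy as np

def tip_to_plane_distance(tip, plane_point, plane_normal):
    """All arguments are 3-vectors; returns the distance in the tracker's units."""
    n = np.asarray(plane_normal, dtype=float)
    n /= np.linalg.norm(n)
    return abs(np.dot(np.asarray(tip, dtype=float) - np.asarray(plane_point, dtype=float), n))

# Example: a tip 4 mm off a plane through the origin with normal along z.
print(tip_to_plane_distance([10.0, -2.0, 4.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # 4.0
```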