
    Event Detection in Eye-Tracking Data for Use in Applications with Dynamic Stimuli

    This doctoral thesis has signal processing of eye-tracking data as its main theme. An eye-tracker is a tool for estimating the point at which a person is looking. Automatic algorithms for classifying different types of eye movements, so-called events, form the basis for relating eye-tracking data to cognitive processes during, e.g., reading a text or watching a movie. The problems with currently available algorithms are that few can handle event detection during dynamic stimuli and that there is no standardized procedure for evaluating them. This thesis comprises an introduction and four papers describing methods for detecting the most common types of eye movements in eye-tracking data, namely fixations, saccades, and smooth pursuit movements, together with strategies for evaluating such methods. In addition to these eye movements, post-saccadic oscillations (PSOs) are considered as an event. The eye-tracking data in this thesis are recorded using both high- and low-speed eye-trackers.

    The first paper presents a method for detecting saccades and PSOs. Saccades are detected using the acceleration signal and three specialized criteria based on directional information. To detect PSOs, the interval after each saccade is modeled and the model parameters are used to determine whether a PSO is present. The algorithm was evaluated by comparing its detection results to manual annotations and to those of the most recent PSO detection algorithm. The results show that the algorithm is in good agreement with the annotations and performs better than the compared algorithm.

    In the second paper, a method for separating fixations from smooth pursuit movements is proposed. In the intervals between detected saccades/PSOs, the algorithm uses different spatial scales of the position signal to distinguish the two types of eye movements. The algorithm is evaluated by computing five performance measures covering both general and detailed aspects of the discrimination performance. Its performance is compared to that of a velocity- and dispersion-based algorithm (I-VDT), an algorithm based on principal component analysis (I-PCA), and manual annotations by two experts. The results show that the proposed algorithm performs considerably better than the compared algorithms.

    In the third paper, a method based on eye-tracking signals from both eyes is proposed for improved separation of fixations and smooth pursuit movements. The method uses directional clustering of the eye-tracking signals combined with binary filters that take both temporal and spatial aspects of the signal into account. Its performance is evaluated using a novel evaluation strategy based on automatically detected moving objects in the video stimuli. The results show that using binocular information to separate fixations from smooth pursuit movements is advantageous in static stimuli, without impairing the algorithm's ability to detect smooth pursuit movements in video and moving-dot stimuli.

    The first three papers are based on eye-tracking signals recorded with a stationary eye-tracker, while the fourth paper uses signals recorded with a mobile eye-tracker. In mobile eye-tracking, the user is allowed to move the head and body, which affects the recorded data. The fourth paper therefore proposes a method for compensating for head movements using an inertial measurement unit (IMU), combined with an event detector for lower-sampling-rate data. Event detection is performed by combining information from the eye-tracking signals with information about objects extracted from the scene video of the mobile eye-tracker. The results show that introducing head-movement compensation and information about detected objects in the scene video improves classification.

    In summary, this thesis proposes an entire methodological framework for robust event detection that performs better than previous methods when analyzing eye-tracking signals recorded during dynamic stimuli, and also provides a methodology for performance evaluation of event detection algorithms.
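
    As a point of reference for what such event detectors do, the sketch below implements a plain velocity-threshold saccade labeler. This is a generic I-VT-style baseline, not the thesis's acceleration- and direction-based algorithm; the sampling rate, threshold value, and synthetic trace are illustrative assumptions.

```python
# Minimal velocity-threshold saccade detection (a generic I-VT-style
# baseline; NOT the thesis's acceleration/direction-based method).
import numpy as np

def detect_saccades(x, y, fs=500.0, vel_threshold=30.0):
    """Mark samples whose angular velocity exceeds vel_threshold (deg/s).
    x, y are gaze positions in degrees; fs is the sampling rate in Hz
    (both defaults are assumptions for this sketch)."""
    dt = 1.0 / fs
    vx = np.gradient(x, dt)            # horizontal velocity, deg/s
    vy = np.gradient(y, dt)            # vertical velocity, deg/s
    return np.hypot(vx, vy) > vel_threshold

# Synthetic 1-second trace with one 10-degree jump halfway through.
t = np.arange(0, 1, 1 / 500.0)
x = np.where(t < 0.5, 0.0, 10.0) + 0.01 * np.random.randn(t.size)
y = np.zeros_like(x)
print("saccade sample indices:", np.flatnonzero(detect_saccades(x, y)))
```

    A real detector of the kind the thesis develops would additionally adapt its thresholds to noise, handle smooth pursuit, and model the post-saccadic interval explicitly.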

    Text Extraction From Natural Scene: Methodology And Application

    With the popularity of the Internet and smart mobile devices, there is an increasing demand for techniques and applications for image/video-based analytics and information retrieval. Most of these applications can benefit from text information extraction in natural scenes. However, scene text extraction is a challenging problem due to the cluttered backgrounds of natural scenes and the varied patterns of scene text itself. To address these challenges, this dissertation proposes a framework for scene text extraction, divided into two components: detection and recognition. Scene text detection finds the regions containing text in camera-captured images/videos. Text layout analysis based on gradient and color analysis extracts candidate text strings from the cluttered background; text structural analysis then designs effective structural features for distinguishing text from non-text outliers among these candidates. Scene text recognition transforms image-based text in the detected regions into readable text codes. The most basic and significant step in text recognition is scene text character (STC) prediction, a multi-class classification over a set of character categories. We design robust and discriminative feature representations for STC structure by integrating multiple feature descriptors, coding/pooling schemes, and learning models. Experimental results on benchmark datasets demonstrate the effectiveness and robustness of the proposed framework, which outperforms previously published methods. The framework is applied to four scenarios: 1) reading print labels on grocery packages for hand-held object recognition; 2) combining with car detection to localize license plates in camera-captured natural scene images; 3) reading indicative signage for assisted navigation in indoor environments; and 4) combining with object tracking to perform scene text extraction in video-based natural scenes. The proposed prototype systems and associated evaluation results show that the framework is able to solve the challenges in real applications.
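
    As one hedged illustration of the detection stage, the sketch below extracts coarse text-region candidates with OpenCV's MSER detector and a crude geometric filter. It stands in for, and is not, the dissertation's gradient- and color-based layout analysis; the size and aspect-ratio thresholds are assumptions.

```python
# Candidate text-region extraction via MSER (a common stand-in for the
# dissertation's own gradient/color layout analysis, which is not
# reproduced here).
import cv2

def text_region_candidates(image_path):
    img = cv2.imread(image_path)
    if img is None:
        raise IOError("could not read " + image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    regions, _ = cv2.MSER_create().detectRegions(gray)
    boxes = []
    for pts in regions:
        x, y, w, h = cv2.boundingRect(pts.reshape(-1, 1, 2))
        # Crude non-text rejection; thresholds are illustrative guesses.
        if h >= 8 and 0.1 < w / float(h) < 10.0:
            boxes.append((x, y, w, h))
    return boxes
```

    A full pipeline along the dissertation's lines would then score each candidate with structural text/non-text features before passing surviving regions to character recognition.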

    Selected techniques for vehicle tracking and assessment in wide area motion imagery

    Tracking of vehicles in wide area motion imagery (WAMI) is very important for civilian and military surveillance. Tracking in a dataset characterized by very large-format video with an extremely wide field of view (covering a few to tens of square miles), very minimal ground resolution (images taken at about 4000-5000 ft above ground), and low frame rates (1-10 frames/sec) is a very challenging task. This research describes some of the techniques and approaches taken towards developing a low-frame-rate automatic and assisted vehicle tracking system, and also develops a performance evaluation system for low-frame-rate trackers. One approach taken on this challenging dataset is extracting roads from the images using the geo-registered property of the data. This makes the Bayesian car detection algorithms run considerably faster and more efficiently, and the car tracking algorithms can use this a priori knowledge of the roads. The CamShift-based car tracking algorithm has been further modified and improved, customizing it to track cars better in this dataset. The performance evaluation system developed in this research can be used to measure the tracker's improvement as it advances over the coming years, and also for parameter tuning. It supports testing tracker performance using two approaches, one based on gaps and one based on tracklets; both frameworks are built using information-theoretic and non-information-theoretic measures.
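
    Since the abstract names CamShift as the base tracker, a minimal OpenCV CamShift loop is sketched below to show the general mechanism: a colour histogram of the initial vehicle box is back-projected into each frame, and the search window is re-centred on the resulting probability mass. The clip name and initial box are placeholder assumptions, and none of the thesis's WAMI-specific customizations are included.

```python
# Minimal CamShift tracking loop (generic OpenCV usage; the thesis's
# WAMI-specific modifications are not reproduced here).
import cv2

cap = cv2.VideoCapture("wami_clip.avi")        # hypothetical input clip
ok, frame = cap.read()
assert ok, "could not read first frame"

x, y, w, h = 300, 200, 40, 20                  # assumed initial car box
hsv_roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([hsv_roi], [0], None, [16], [0, 180])  # hue histogram
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
track_window = (x, y, w, h)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    _, track_window = cv2.CamShift(backproj, track_window, term)
    print("track window:", track_window)
```

    At 1-10 frames/sec a vehicle can move far between frames, which is why the thesis constrains the search with road extraction rather than relying on window overlap alone.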

    High-level feature detection from video in TRECVid: a 5-year retrospective of achievements

    Successful and effective content-based access to digital video requires fast, accurate and scalable methods to determine the video content automatically. A variety of contemporary approaches to this rely on text taken from speech within the video, on matching one video frame against others using low-level characteristics like colour, texture, or shapes, or on determining and matching objects appearing within the video. Possibly the most important technique, however, is one which determines the presence or absence of a high-level or semantic feature within a video clip or shot. By utilizing dozens, hundreds or even thousands of such semantic features, we can support many kinds of content-based video navigation. Critically, however, this depends on being able to determine whether each feature is or is not present in a video clip. The last 5 years have seen much progress in the development of techniques to determine the presence of semantic features within video. This progress can be tracked in the annual TRECVid benchmarking activity, where dozens of research groups measure the effectiveness of their techniques on common data using an open, metrics-based approach. In this chapter we summarise the work done on the TRECVid high-level feature task, showing the progress made year on year. This provides a fairly comprehensive statement of where the state of the art stands on this important task, not just for one research group or one approach, but across the spectrum. We then use this past and ongoing work as a basis for highlighting the trends that are emerging in this area, and the questions which remain to be addressed before we can achieve large-scale, fast and reliable high-level feature detection on video.
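
    TRECVid feature-detection runs are scored over ranked shot lists with precision-oriented measures; the snippet below computes standard non-interpolated average precision for a single feature, as a hedged illustration of that style of scoring rather than TRECVid's exact protocol (which involves pooled relevance judgments).

```python
# Average precision of one ranked shot list against a relevant-shot set
# (a standard AP computation; TRECVid's exact pooling/judging protocol
# is not reproduced here).
def average_precision(ranked_shots, relevant):
    relevant = set(relevant)
    hits, ap_sum = 0, 0.0
    for rank, shot in enumerate(ranked_shots, start=1):
        if shot in relevant:
            hits += 1
            ap_sum += hits / rank      # precision at each relevant hit
    return ap_sum / len(relevant) if relevant else 0.0

# Relevant shots found at ranks 2 and 4: AP = (1/2 + 2/4) / 2 = 0.5
print(average_precision(["s3", "s1", "s7", "s2"], {"s1", "s2"}))
```

    Averaging this quantity over all benchmark features yields a single per-run score, which is what makes year-on-year progress comparable across groups.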

    Evaluation campaigns and TRECVid

    The TREC Video Retrieval Evaluation (TRECVid) is an international benchmarking activity that encourages research in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005, and in 2006 it will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots within a video corpus, automatic detection of a variety of semantic and low-level video features, shot boundary detection, and the detection of story boundaries in broadcast TV news. This paper gives an introduction to information retrieval (IR) evaluation from both a user and a system perspective, highlighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation benchmarking campaign, which allows us to discuss whether such campaigns are a good thing or a bad thing. There are arguments for and against these campaigns; we present some of them in the paper, concluding that on balance they have had a very positive impact on research progress.

    Large scale evaluations of multimedia information retrieval: the TRECVid experience

    Information Retrieval is a supporting technique which underpins a broad range of content-based applications including retrieval, filtering, summarisation, browsing, classification, clustering, automatic linking, and others. Multimedia information retrieval (MMIR) represents those applications when applied to multimedia information such as image, video, music, etc. In this presentation and extended abstract we are primarily concerned with MMIR as applied to information in digital video format. We begin with a brief overview of large-scale evaluations of IR tasks in areas such as text, image and music, to illustrate that this phenomenon is not restricted to MMIR on video. The main contribution, however, is a set of pointers to, and a summarisation of, the work done as part of TRECVid, the annual benchmarking exercise for video retrieval tasks.