305 research outputs found

    Discriminatively Trained Latent Ordinal Model for Video Classification

    Full text link
    We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (eg. onset and offset phase for "smile", running and jumping for "highjump"). The proposed model is inspired by the recent works on Multiple Instance Learning and latent SVM/HCRF -- it extends such frameworks to model the ordinal aspect in the videos, approximately. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method.Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150

    Dimensionality reduction and sparse representations in computer vision

    Get PDF
    The proliferation of camera equipped devices, such as netbooks, smartphones and game stations, has led to a significant increase in the production of visual content. This visual information could be used for understanding the environment and offering a natural interface between the users and their surroundings. However, the massive amounts of data and the high computational cost associated with them, encumbers the transfer of sophisticated vision algorithms to real life systems, especially ones that exhibit resource limitations such as restrictions in available memory, processing power and bandwidth. One approach for tackling these issues is to generate compact and descriptive representations of image data by exploiting inherent redundancies. We propose the investigation of dimensionality reduction and sparse representations in order to accomplish this task. In dimensionality reduction, the aim is to reduce the dimensions of the space where image data reside in order to allow resource constrained systems to handle them and, ideally, provide a more insightful description. This goal is achieved by exploiting the inherent redundancies that many classes of images, such as faces under different illumination conditions and objects from different viewpoints, exhibit. We explore the description of natural images by low dimensional non-linear models called image manifolds and investigate the performance of computer vision tasks such as recognition and classification using these low dimensional models. In addition to dimensionality reduction, we study a novel approach in representing images as a sparse linear combination of dictionary examples. We investigate how sparse image representations can be used for a variety of tasks including low level image modeling and higher level semantic information extraction. Using tools from dimensionality reduction and sparse representation, we propose the application of these methods in three hierarchical image layers, namely low-level features, mid-level structures and high-level attributes. Low level features are image descriptors that can be extracted directly from the raw image pixels and include pixel intensities, histograms, and gradients. In the first part of this work, we explore how various techniques in dimensionality reduction, ranging from traditional image compression to the recently proposed Random Projections method, affect the performance of computer vision algorithms such as face detection and face recognition. In addition, we discuss a method that is able to increase the spatial resolution of a single image, without using any training examples, according to the sparse representations framework. In the second part, we explore mid-level structures, including image manifolds and sparse models, produced by abstracting information from low-level features and offer compact modeling of high dimensional data. We propose novel techniques for generating more descriptive image representations and investigate their application in face recognition and object tracking. In the third part of this work, we propose the investigation of a novel framework for representing the semantic contents of images. This framework employs high level semantic attributes that aim to bridge the gap between the visual information of an image and its textual description by utilizing low level features and mid level structures. This innovative paradigm offers revolutionary possibilities including recognizing the category of an object from purely textual information without providing any explicit visual example

    Pedestrian Detection and Tracking in Urban Context Using a Mono-camera

    Get PDF
    Jalakäijate tuvastus ja jälgimine on üks tähtsamaid aspekte edasijõudnud sõitja abisüsteemides. Need süsteemid aitavad vältida ohtlikke olukordi, juhendades sõitjaid ja hoiatades ettetulevate riskide eest. Jalakäijate tuvastuse ja jälgimise põhiideed on tuvastada jalakäijad siis, kui nad on turvalises tsoonis ja ennustada nende asukohta ja suunda. Selle lõputöö eesmärk on uurida võimalikke meetodeid ja arendada nende põhjal hea algoritm jalakäijate tuvastuseks ja jälgimiseks.Selles lõputöös arendatud lahendus keskendub jalakäija täpsele tuvastamisele ja jälgimisele. Süsteemi täpsuse hindamiseks on saadud tulemusi võrreldud olemasolevate lahendustega.Pedestrian detection and tracking are one of the important aspects in Advanced Driver Assistance Systems. These systems help to avoid dangerous situations, by guiding drivers and warning them about the upcoming risks. The main ideas of pedestrian detection and tracking are to detect pedestrians, while they are in the secure zone, and predict their position and direction.The goal of this thesis is to examine possible methods and based on these, to develop a good pedestrian detection and tracking algorithm. The solution developed in this thesis, focuses on accurately detecting and tracking a pedestrian. In order to estimate the accuracy of the system, obtained results will be compared to the existing solutions

    Detection of Motorcycles in Urban Traffic Using Video Analysis: A Review

    Get PDF
    Motorcycles are Vulnerable Road Users (VRU) and as such, in addition to bicycles and pedestrians, they are the traffic actors most affected by accidents in urban areas. Automatic video processing for urban surveillance cameras has the potential to effectively detect and track these road users. The present review focuses on algorithms used for detection and tracking of motorcycles, using the surveillance infrastructure provided by CCTV cameras. Given the importance of results achieved by Deep Learning theory in the field of computer vision, the use of such techniques for detection and tracking of motorcycles is also reviewed. The paper ends by describing the performance measures generally used, publicly available datasets (introducing the Urban Motorbike Dataset (UMD) with quantitative evaluation results for different detectors), discussing the challenges ahead and presenting a set of conclusions with proposed future work in this evolving area

    Object Tracking

    Get PDF
    Object tracking consists in estimation of trajectory of moving objects in the sequence of images. Automation of the computer object tracking is a difficult task. Dynamics of multiple parameters changes representing features and motion of the objects, and temporary partial or full occlusion of the tracked objects have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both, state of the art of object tracking methods and also the new trends in research are described in this book. Fourteen chapters are split into two sections. Section 1 presents new theoretical ideas whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph it constitutes a consisted knowledge in the field of computer object tracking. The intention of editor was to follow up the very quick progress in the developing of methods as well as extension of the application

    Robust real-time tracking in smart camera networks

    Get PDF

    Text Extraction From Natural Scene: Methodology And Application

    Full text link
    With the popularity of the Internet and the smart mobile device, there is an increasing demand for the techniques and applications of image/video-based analytics and information retrieval. Most of these applications can benefit from text information extraction in natural scene. However, scene text extraction is a challenging problem to be solved, due to cluttered background of natural scene and multiple patterns of scene text itself. To solve these problems, this dissertation proposes a framework of scene text extraction. Scene text extraction in our framework is divided into two components, detection and recognition. Scene text detection is to find out the regions containing text from camera captured images/videos. Text layout analysis based on gradient and color analysis is performed to extract candidates of text strings from cluttered background in natural scene. Then text structural analysis is performed to design effective text structural features for distinguishing text from non-text outliers among the candidates of text strings. Scene text recognition is to transform image-based text in detected regions into readable text codes. The most basic and significant step in text recognition is scene text character (STC) prediction, which is multi-class classification among a set of text character categories. We design robust and discriminative feature representations for STC structure, by integrating multiple feature descriptors, coding/pooling schemes, and learning models. Experimental results in benchmark datasets demonstrate the effectiveness and robustness of our proposed framework, which obtains better performance than previously published methods. Our proposed scene text extraction framework is applied to 4 scenarios, 1) reading print labels in grocery package for hand-held object recognition; 2) combining with car detection to localize license plate in camera captured natural scene image; 3) reading indicative signage for assistant navigation in indoor environments; and 4) combining with object tracking to perform scene text extraction in video-based natural scene. The proposed prototype systems and associated evaluation results show that our framework is able to solve the challenges in real applications

    System Designs for Diabetic Foot Ulcer Image Assessment

    Get PDF
    For individuals with type 2 diabetes, diabetic foot ulcers represent a significant health issue and the wound care cost is quite high. Currently, clinicians and nurses mainly base their wound assessment on visual examination of wound size and the status of the wound tissue. This method is potentially inaccurate for wound assessment and requires extra clinical workload. In view of the prevalence of smartphones with high resolution digital camera, assessing wound healing by analyzing of real-time images using the significant computational power of today’s mobile devices is an attractive approach for managing foot ulcers. Alternatively, the smartphone may be used just for image capture and wireless transfer to a PC or laptop for image processing. To achieve accurate foot ulcer image assessment, we have developed and tested a novel automatic wound image analysis system which accomplishes the following conditions: 1) design of an easy-to-use image capture system which makes the image capture process comfortable for the patient and provides well-controlled image capture conditions; 2) synthesis of efficient and accurate algorithms for real-time wound boundary determination to measure the wound area size; 3) development of a quantitative method to assess the wound healing status based on a foot ulcer image sequence for a given patient and 4) design of a wound image assessment and management system that can be used both in the patient’s home and clinical environment in a tele-medicine fashion. In our work, the wound image is captured by the camera on the smartphone while the patient’s foot is held in place by an image capture box, which is specially design to aid patients in photographing ulcers occurring on the sole of their feet. The experimental results prove that our image capture system guarantees consistent illumination and a fixed distance between the foot and camera. These properties greatly reduce the complexity of the subsequent wound recognition and assessment. The most significant contribution of our work is the development of five different wound boundary determination approaches based on different computer vision algorithms. The first approach employs the level set algorithm to determine the wound boundary directly based on a manually set initial curve. The second and third approaches are the mean-shift segmentation based methods augmented by foot outline detection and analysis. These two approaches have been shown to be efficient to implement (especially on smartphones), prior-knowledge independent and able to provide reasonably accurate wound segmentation results given a set of well-tuned parameters. However, this method suffers from the lack of self-adaptivity due to the fact that it is not based on machine learning. Consequently, a two-stage Support Vector Machine (SVM) binary classifier based wound recognition approach is developed and implemented. This approach consists of three major steps 1) unsupervised super-pixel segmentation, 2) feature descriptor extraction for each super-pixel and 3) supervised classifier based wound boundary determination. The experimental results show that this approach provides promising performance (sensitivity: 73.3%, specificity: 95.6%) when dealing with foot ulcer images captured with our image capture box. In the third approach, we further relax the image capture constraints and generalize the application of our wound recognition system by applying the conditional random field (CRF) based model to solve the wound boundary determination. The key modules in this approach are the TextonBoost based potential learning at different scales and efficient CRF model inference to find the optimal labeling. Finally, the standard K-means clustering algorithm is applied to the determined wound area for color based wound tissue classification. To train the models used in the last two approaches, as well as to evaluate all three methods, we have collected about 100 wound images at the wound clinic in UMass Medical School by tracking 15 patients for a 2-year period, following an IRB approved protocol. The wound recognition results were compared with the ground truth generated by combining clinical labeling from three experienced clinicians. Specificity and sensitivity based measures indicate that the CRF based approach is the most reliable method despite its implementation complexity and computational demands. In addition, sample images of Moulage wound simulations are also used to increase the evaluation flexibility. The advantages and disadvantages of three approaches are described. Another important contribution of this work has been development of a healing score based mechanism for quantitative wound healing status assessment. The wound size and color composition measurements were converted to a score number ranging from 0-10, which indicates the healing trend based on comparisons of subsequent images to an initial foot ulcer image. By comparing the result of the healing score algorithm to the healing scores determined by experienced clinicians, we assess the clinical validity of our healing score algorithm. The level of agreement of our healing score with the three assessing clinicians was quantified by using the Kripendorff’s Alpha Coefficient (KAC). Finally, a collaborative wound image management system between the PC and smartphone was designed and successfully applied in the wound clinic for patients’ wound tracking purpose. This system is proven to be applicable in clinical environment and capable of providing interactive foot ulcer care in a telemedicine fashion
    corecore