47 research outputs found

    Visual Monitoring of Driver and Passenger Control Panel Interactions

    Get PDF

    Incorporating Boltzmann Machine Priors for Semantic Labeling in Images and Videos

    Get PDF
    Semantic labeling is the task of assigning category labels to regions in an image. For example, a scene may consist of regions corresponding to categories such as sky, water, and ground, or parts of a face such as eyes, nose, and mouth. Semantic labeling is an important mid-level vision task for grouping and organizing image regions into coherent parts. Labeling these regions allows us to better understand the scene itself as well as properties of the objects in the scene, such as their parts, location, and interaction within the scene. Typical approaches for this task include the conditional random field (CRF), which is well-suited to modeling local interactions among adjacent image regions. However the CRF is limited in dealing with complex, global (long-range) interactions between regions in an image, and between frames in a video. This thesis presents approaches to modeling long-range interactions within images and videos, for use in semantic labeling. In order to model these long-range interactions, we incorporate priors based on the restricted Boltzmann machine (RBM). The RBM is a generative model which has demonstrated the ability to learn the shape of an object and the CRBM is a temporal extension which can learn the motion of an object. Although the CRF is a good baseline labeler, we show how the RBM and CRBM can be added to the architecture to model both the global object shape within an image and the temporal dependencies of the object from previous frames in a video. We demonstrate the labeling performance of our models for the parts of complex face images from the Labeled Faces in the Wild database (for images) and the YouTube Faces Database (for videos). Our hybrid models produce results that are both quantitatively and qualitatively better than the baseline CRF alone for both images and videos

    IMPROVING EFFICIENCY AND SCALABILITY IN VISUAL SURVEILLANCE APPLICATIONS

    Get PDF
    We present four contributions to visual surveillance: (a) an action recognition method based on the characteristics of human motion in image space; (b) a study of the strengths of five regression techniques for monocular pose estimation that highlights the advantages of kernel PLS; (c) a learning-based method for detecting objects carried by humans requiring minimal annotation; (d) an interactive video segmentation system that reduces supervision by using occlusion and long term spatio-temporal structure information. We propose a representation for human actions that is based solely on motion information and that leverages the characteristics of human movement in the image space. The representation is best suited to visual surveillance settings in which the actions of interest are highly constrained, but also works on more general problems if the actions are ballistic in nature. Our computationally efficient representation achieves good recognition performance on both a commonly used action recognition dataset and on a dataset we collected to simulate a checkout counter. We study discriminative methods for 3D human pose estimation from single images, which build a map from image features to pose. The main difficulty with these methods is the insufficiency of training data due to the high dimensionality of the pose space. However, real datasets can be augmented with data from character animation software, so the scalability of existing approaches becomes important. We argue that Kernel Partial Least Squares approximates Gaussian Process regression robustly, enabling the use of larger datasets, and we show in experiments that kPLS outperforms two state-of-the-art methods based on GP. The high variability in the appearance of carried objects suggests using their relation to the human silhouette to detect them. We adopt a generate-and-test approach that produces candidate regions from protrusion, color contrast and occlusion boundary cues and then filters them with a kernel SVM classifier on context features. Our method exceeds state of the art accuracy and has good generalization capability. We also propose a Multiple Instance Learning framework for the classifier that reduces annotation effort by two orders of magnitude while maintaining comparable accuracy. Finally, we present an interactive video segmentation system that trades off a small amount of segmentation quality for significantly less supervision than necessary in systems in the literature. While applications like video editing could not directly use the output of our system, reasoning about the trajectories of objects in a scene or learning coarse appearance models is still possible. The unsupervised segmentation component at the base of our system effectively employs occlusion boundary cues and achieves competitive results on an unsupervised segmentation dataset. On videos used to evaluate interactive methods, our system requires less interaction time than others, does not rely on appearance information and can extract multiple objects at the same time

    Biometric Systems

    Get PDF
    Because of the accelerating progress in biometrics research and the latest nation-state threats to security, this book's publication is not only timely but also much needed. This volume contains seventeen peer-reviewed chapters reporting the state of the art in biometrics research: security issues, signature verification, fingerprint identification, wrist vascular biometrics, ear detection, face detection and identification (including a new survey of face recognition), person re-identification, electrocardiogram (ECT) recognition, and several multi-modal systems. This book will be a valuable resource for graduate students, engineers, and researchers interested in understanding and investigating this important field of study

    Ensemble Graph Neural Networks on Melanoma and Cervical Cancer Screening Datasets using SLIC Superpixels

    Get PDF
    Graph neural networks (GNNs) have become the standard procedure to deal with graphstructured data and data in non-Euclidean spaces. Since 2017, numerous researchers have been using GNN models in their experiments. However, despite GNNs’ recent rapid growth, there are not yet many real-world applications that benefit from these models. In this thesis, we use GNNs as image classifiers. To improve the efficiency and reduce the complexity of the models, we first generate graphs from the images by creating superpixels of the images and use them as our graph nodes instead of individual pixels. Then, we define edges on these nodes based on the distance of each superpixel to their closest ones. We propose two ensemble frameworks containing a pre-trained ResNet18 and two Graph Neural Network (GNN) models (GAT and GIN). We call these frameworks GATRes and GATGIN, respectively. We test these frameworks on two real-world medical applications: Cervical Cancer Screening and Melanoma detection. Cervical cancer is among the top four common cancer in women worldwide. Cervical cancer can be easily prevented if caught in its pre-cancerous stage. Determining the appropriate treatment method depends on patients’ physiological differences. A treatment that works effectively for one woman may obscure future cancerous growth in another woman due to differences in the type of their cervix. In this thesis, we experiment with multiple GNNs on this dataset to distinguish the cervix types and examine if these models enhance detection performance and accuracy. The other problem we consider is Melanoma skin cancer. Melanoma is the most lethal skin cancer. If melanoma gets diagnosed in the early stages, the patients’ survival rate will increase significantly. This research applies GNN models on the Melanoma dataset to discriminate between melanoma and benign skin lesions. We show that our models’ sensitivity and accuracy outperform the individual models in our classification tasks. Our GATRes model also outperforms the accuracies achieved by previously published papers on the Cervical Cancer Screening dataset

    Human Pose Estimation from Monocular Images : a Comprehensive Survey

    Get PDF
    Human pose estimation refers to the estimation of the location of body parts and how they are connected in an image. Human pose estimation from monocular images has wide applications (e.g., image indexing). Several surveys on human pose estimation can be found in the literature, but they focus on a certain category; for example, model-based approaches or human motion analysis, etc. As far as we know, an overall review of this problem domain has yet to be provided. Furthermore, recent advancements based on deep learning have brought novel algorithms for this problem. In this paper, a comprehensive survey of human pose estimation from monocular images is carried out including milestone works and recent advancements. Based on one standard pipeline for the solution of computer vision problems, this survey splits the problema into several modules: feature extraction and description, human body models, and modelin methods. Problem modeling methods are approached based on two means of categorization in this survey. One way to categorize includes top-down and bottom-up methods, and another way includes generative and discriminative methods. Considering the fact that one direct application of human pose estimation is to provide initialization for automatic video surveillance, there are additional sections for motion-related methods in all modules: motion features, motion models, and motion-based methods. Finally, the paper also collects 26 publicly available data sets for validation and provides error measurement methods that are frequently used
    corecore