24 research outputs found

    NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

    Research on depth-based human activity analysis has achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, a realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and propose a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. [The dataset is available at: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp] Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

    Video-based Bed Monitoring


    Using Radio Frequency and Motion Sensing to Improve Camera Sensor Systems

    Camera-based sensor systems have advanced significantly in recent years. This advancement results from a combination of improvements in camera CMOS (complementary metal-oxide-semiconductor) hardware technology and new computer vision (CV) algorithms that can better process the rich information captured. As the world becomes more connected and digitized through increased deployment of various sensors, cameras have become a cost-effective solution with the advantages of small sensor size, intuitive sensing results, rich visual information, and neural-network friendliness. The increased deployment and advantages of camera-based sensor systems have fueled applications such as surveillance, object detection, person re-identification, scene reconstruction, visual tracking, pose estimation, and localization. However, camera-based sensor systems have fundamental limitations such as extreme power consumption, privacy intrusiveness, and an inability to see through obstacles or operate in other non-ideal visual conditions such as darkness, smoke, and fog. In this dissertation, we aim to improve the capability and performance of camera-based sensor systems by utilizing additional sensing modalities such as commodity WiFi and mmWave (millimeter wave) radios, and ultra-low-power and low-cost sensors such as inertial measurement units (IMU). In particular, we set out to study three problems: (1) power and storage consumption of continuous-vision wearable cameras, (2) human presence detection, localization, and re-identification in both indoor and outdoor spaces, and (3) augmenting the sensing capability of camera-based systems in non-ideal situations. We propose to use an ultra-low-power, low-cost IMU sensor, along with readily available camera information, to solve the first problem. WiFi devices will be utilized in the second problem, where our goal is to reduce the hardware deployment cost and leverage existing WiFi infrastructure as much as possible. Finally, we will use a low-cost, off-the-shelf mmWave radar to extend the sensing capability of a camera in non-ideal visual sensing situations. Doctor of Philosophy

    Handbook of Vascular Biometrics

    This open access handbook provides the first comprehensive overview of biometrics exploiting the shape of human blood vessels for biometric recognition, i.e. vascular biometrics, including finger vein recognition, hand/palm vein recognition, retina recognition, and sclera recognition. After an introductory chapter summarizing the state of the art in, and availability of, commercial systems and open datasets/open source software, individual chapters focus on specific aspects of one of the biometric modalities, including questions of usability, security, and privacy. The book features contributions from both academia and major industrial manufacturers.

    Adaptive Methods for Robust Document Image Understanding

    A vast amount of digital document material is continuously being produced as part of major digitization efforts around the world. In this context, generic and efficient automatic solutions for document image understanding represent a pressing necessity. We propose a generic framework for document image understanding systems, usable for practically any document type available in digital form. Following the introduced workflow, we shift our attention to each of the following processing stages in turn: quality assurance, image enhancement, color reduction and binarization, skew and orientation detection, page segmentation and logical layout analysis. We review the state of the art in each area, identify current deficiencies, point out promising directions and give specific guidelines for future investigation. We address some of the identified issues by means of novel algorithmic solutions, putting special focus on generality, computational efficiency and the exploitation of all available sources of information. More specifically, we introduce the following original methods: a fully automatic detection of color reference targets in digitized material, accurate foreground extraction from color historical documents, font enhancement for hot metal typeset prints, a theoretically optimal solution for the document binarization problem from both the computational complexity and the threshold selection points of view, a layout-independent skew and orientation detection, a robust and versatile page segmentation method, a semi-automatic front page detection algorithm and a complete framework for article segmentation in periodical publications. The proposed methods are experimentally evaluated on large datasets consisting of real-life heterogeneous document scans. The obtained results show that a document understanding system combining these modules is able to robustly process a wide variety of documents with good overall accuracy.