7 research outputs found

    CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild

    Full text link
    Non-intrusive, real-time analysis of the dynamics of the eye region allows us to monitor humans' visual attention allocation and estimate their mental state during the performance of real-world tasks, which can potentially benefit a wide range of human-computer interaction (HCI) applications. While commercial eye-tracking devices have been frequently employed, the difficulty of customizing these devices places unnecessary constraints on the exploration of more efficient, end-to-end models of eye dynamics. In this work, we propose CLERA, a unified model for Cognitive Load and Eye Region Analysis, which achieves precise keypoint detection and spatiotemporal tracking in a joint-learning framework. Our method demonstrates significant efficiency and outperforms prior work on tasks including cognitive load estimation, eye landmark detection, and blink estimation. We also introduce a large-scale dataset of 30k human faces with joint pupil, eye-openness, and landmark annotation, which aims to support future HCI research on human factors and eye-related analysis.
    Comment: ACM Transactions on Computer-Human Interaction
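    The joint-learning framework the abstract describes is a multi-task setup: one shared feature extractor feeds separate heads for eye-region keypoints and cognitive-load estimation. Below is a minimal, hypothetical sketch of that pattern in PyTorch; the class name, layer sizes, and head structure are invented for illustration and are not the authors' CLERA architecture.

    import torch
    import torch.nn as nn

    class JointEyeModel(nn.Module):
        # Hypothetical multi-task sketch: shared backbone, two task heads.
        def __init__(self, num_keypoints=6):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.keypoint_head = nn.Linear(64, num_keypoints * 2)  # (x, y) per landmark
            self.load_head = nn.Linear(64, 1)                      # scalar cognitive-load score

        def forward(self, x):
            feat = self.backbone(x)                 # features shared by both tasks
            return self.keypoint_head(feat), self.load_head(feat)

    model = JointEyeModel()
    kpts, load = model(torch.randn(1, 3, 64, 64))   # one 64x64 eye-region crop
    print(kpts.shape, load.shape)                   # torch.Size([1, 12]) torch.Size([1, 1])

    Training both heads against a weighted sum of their losses is what lets the shared features serve both tasks at once.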

    Deep Learning Assisted Intelligent Visual and Vehicle Tracking Systems

    Get PDF
    Sensor fusion and tracking is the ability to bring together measurements from multiple sensors, past and present, to estimate the current state of a system. The resulting state estimate is more accurate than any direct sensor measurement because it balances the state prediction from an assumed motion model against the noisy sensor measurement. Systems can then use the information provided by the sensor fusion and tracking process to support more intelligent actions and achieve autonomy, as in an autonomous vehicle.

    In the past, widely used sensor data were structured and could be fed directly into a tracking system, e.g., distance, temperature, acceleration, and force, and the measurements' uncertainty could be estimated from experiments. Today, however, sensors such as cameras and LiDAR generate large amounts of unstructured data, which bring new challenges to the fusion and tracking system. Traditional algorithms cannot use these unstructured data directly; they first need another method or process to "understand" them. For example, to track a particular person in a video sequence, a system must first determine where the person is, a task traditional tracking methods cannot perform, and the measurement model for unstructured data is usually difficult to construct. Deep learning techniques provide promising solutions to this type of problem: a deep learning method can learn from and understand unstructured data to accomplish tasks such as object detection in images, object localization in LiDAR point clouds, and driver behavior prediction from current traffic conditions. Deep-learning architectures such as deep neural networks, deep belief networks, recurrent neural networks, and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, and machine translation, where they have produced results comparable with human expert performance. How to incorporate information obtained via deep learning into our tracking system is one of the topics of this dissertation.

    Another challenging task is using learning methods to improve a tracking filter's performance. A tracking system has many manually tuned parameters that affect tracking performance, e.g., the process noise covariance and measurement noise covariance in a Kalman Filter (KF). These parameters have traditionally been estimated by running the tracking algorithm several times and selecting the values that give the best performance. How to learn the system parameters automatically from data, and how to use machine learning techniques to provide useful information directly to the tracking system, are critical to the proposed tracking system.

    The proposed research on the intelligent tracking system has two objectives. The first is to make a visual tracking filter smart enough to understand unstructured data sources. The second is to apply learning algorithms to improve a tracking filter's performance. The goal is to develop an intelligent tracking system that can understand unstructured data and use the data to improve itself.
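    Since the abstract singles out the process and measurement noise covariances of a Kalman Filter as the parameters to be learned, a minimal sketch may help fix ideas. The constant-velocity model and the values of Q and R below are illustrative placeholders, not values from the dissertation:

    import numpy as np

    # 1-D constant-velocity Kalman filter; state = [position, velocity].
    dt = 1.0
    F = np.array([[1.0, dt], [0.0, 1.0]])   # assumed motion model
    H = np.array([[1.0, 0.0]])              # we measure position only
    Q = np.eye(2) * 1e-3                    # process noise covariance (hand-tuned here)
    R = np.array([[0.5]])                   # measurement noise covariance (hand-tuned here)

    x = np.zeros((2, 1))                    # state estimate
    P = np.eye(2)                           # state covariance

    def kf_step(x, P, z):
        # Predict from the motion model, then correct with the noisy measurement.
        x_pred = F @ x
        P_pred = F @ P @ F.T + Q
        K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)  # Kalman gain
        x_new = x_pred + K @ (z - H @ x_pred)
        P_new = (np.eye(2) - K @ H) @ P_pred
        return x_new, P_new

    for z in [1.1, 2.0, 2.9, 4.2]:          # noisy position measurements
        x, P = kf_step(x, P, np.array([[z]]))
    print(x.ravel())                        # estimated [position, velocity]

    The balance the abstract mentions is visible in the Kalman gain K: a large R (distrust the sensor) pulls the estimate toward the motion-model prediction, while a large Q does the opposite, which is exactly why learning Q and R from data instead of hand-tuning them is attractive.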

    Detecting Rotated Objects as Gaussian Distributions and Its 3-D Generalization

    Full text link
    Existing detection methods commonly use a parameterized bounding box (BBox) to model and detect (horizontal) objects, with an additional rotation angle parameter for rotated objects. We argue that such a mechanism has fundamental limitations in building an effective regression loss for rotation detection, especially for high-precision detection with high IoU (e.g., 0.75). Instead, we propose to model rotated objects as Gaussian distributions. A direct advantage is that our new regression loss, based on the distance between two Gaussians, e.g., the Kullback-Leibler Divergence (KLD), aligns well with the actual detection performance metric, which is not well addressed in existing methods. Moreover, the two bottlenecks, i.e., the boundary discontinuity and square-like problems, also disappear. We also propose an efficient Gaussian metric-based label assignment strategy to further boost performance. Interestingly, by analyzing the gradients of the BBox parameters under our Gaussian-based KLD loss, we show that these parameters are dynamically updated with interpretable physical meaning, which helps explain the effectiveness of our approach, especially for high-precision detection. We extend our approach from 2-D to 3-D with a tailored algorithm design to handle heading estimation, and experimental results on twelve public datasets (2-D/3-D, aerial/text/face images) with various base detectors show its superiority.
    Comment: 19 pages, 11 figures, 16 tables, accepted by TPAMI 2022. Journal extension for GWD (ICML'21) and KLD (NeurIPS'21). arXiv admin note: text overlap with arXiv:2101.1195
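    To make the Gaussian modeling concrete: under the convention common in this line of work, a rotated box (cx, cy, w, h, θ) maps to a 2-D Gaussian with mean (cx, cy) and covariance R(θ) diag(w²/4, h²/4) R(θ)^T, and the KLD between two such Gaussians has a closed form. The sketch below follows that convention; the paper's exact parameterization and loss normalization may differ.

    import numpy as np

    def box_to_gaussian(cx, cy, w, h, theta):
        # Rotated box -> 2-D Gaussian: mean at the box center, covariance
        # from the half-extents rotated by theta (a common convention).
        mu = np.array([cx, cy])
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        S = np.diag([w**2 / 4.0, h**2 / 4.0])
        return mu, R @ S @ R.T

    def gaussian_kld(mu1, S1, mu2, S2):
        # Closed-form KL(N1 || N2) between two 2-D Gaussians.
        S2_inv = np.linalg.inv(S2)
        d = mu2 - mu1
        return 0.5 * (np.trace(S2_inv @ S1) + d @ S2_inv @ d - 2
                      + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

    mu_p, S_p = box_to_gaussian(0.0, 0.0, 4.0, 2.0, 0.0)
    mu_t, S_t = box_to_gaussian(0.5, 0.0, 4.0, 2.0, np.pi / 12)
    print(gaussian_kld(mu_p, S_p, mu_t, S_t))  # small for near-identical boxes

    One can also read off why the square-like problem disappears: when w == h the covariance is isotropic, so the Gaussian, and hence the loss, is unaffected by the ambiguous angle.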