9 research outputs found

    Visual focus of attention estimation using eye center localization

    Estimating a person's visual focus of attention (VFOA) plays a crucial role in practical systems such as human-robot interaction. Extracting VFOA cues is challenging because gaze directionality is difficult to recognize. In this paper, we propose an improved integrodifferential approach that represents gaze by efficiently and accurately localizing the eye center in low-resolution images. The proposed method exploits both the drastic intensity change between the iris and the sclera and the grayscale intensity of the eye center. An optimized number of kernels is convolved with the original eye-region image, and the eye center is located by searching for the maximum ratio derivative of neighboring curve magnitudes in the convolution image. Experimental results confirm that the algorithm outperforms state-of-the-art methods in terms of computational cost, accuracy, and robustness to illumination changes.
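    The following Python/NumPy sketch illustrates the integrodifferential idea the paper builds on: score each candidate pixel by the intensity ratio between neighbouring circles (large where the dark iris meets the bright sclera) plus a dark-centre term. It is a brute-force simplification; the paper's optimized kernel convolution is not reproduced, and all names are illustrative.

```python
import numpy as np

def circle_mean_intensity(img, cx, cy, r, n_samples=64):
    """Mean grayscale intensity sampled along a circle centred at (cx, cy)."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    xs = np.clip(np.round(cx + r * np.cos(theta)).astype(int), 0, img.shape[1] - 1)
    ys = np.clip(np.round(cy + r * np.sin(theta)).astype(int), 0, img.shape[0] - 1)
    return float(img[ys, xs].mean())

def locate_eye_center(img, radii=range(3, 15)):
    """Brute-force search (the paper uses an efficient kernel formulation)."""
    best_center, best_score = (0, 0), -np.inf
    for cy in range(img.shape[0]):
        for cx in range(img.shape[1]):
            # "Curve magnitudes": one mean boundary intensity per radius.
            curve = np.array([circle_mean_intensity(img, cx, cy, r) for r in radii])
            # Ratio derivative between neighbouring radii peaks where the
            # dark iris meets the bright sclera; a dark centre is favoured too.
            score = (curve[1:] / (curve[:-1] + 1e-6)).max() - img[cy, cx] / 255.0
            if score > best_score:
                best_score, best_center = score, (cx, cy)
    return best_center
```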

    Independent dual graph attention convolutional network for skeleton-based action recognition

    Graph convolutional networks (GCNs) have been widely adopted in skeleton-based action recognition, achieving impressive results. However, the convolution operations in GCNs do not make full use of the original input data, which restricts their ability to accurately capture correlations within the skeleton. To address this issue, this study introduces an independent dual graph attention convolutional network (IDGAN). Specifically, IDGAN additionally incorporates an instinctive attention module that leverages self-attention to capture the correlations among the joints in the original input skeleton. In addition, two independent convolutional operations process the two self-attention modules, respectively, to further refine the relationships between skeleton joints. Extensive experiments on several publicly available datasets show that IDGAN outperforms most state-of-the-art algorithms.
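    As a rough sketch of the self-attention component described above, the PyTorch module below computes pairwise joint correlations directly from the raw input joints; layer names and dimensions are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class JointSelfAttention(nn.Module):
    """Self-attention over skeleton joints, applied to the original input."""
    def __init__(self, in_channels, embed_dim=64):
        super().__init__()
        self.query = nn.Linear(in_channels, embed_dim)
        self.key = nn.Linear(in_channels, embed_dim)
        self.value = nn.Linear(in_channels, embed_dim)
        self.scale = embed_dim ** -0.5

    def forward(self, x):
        # x: (batch, joints, channels), e.g. 3D coordinates per joint.
        q, k, v = self.query(x), self.key(x), self.value(x)
        # attn[b, i, j]: learned correlation between joints i and j.
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

# Usage: 25 joints with 3D coordinates (an NTU RGB+D-style layout).
out = JointSelfAttention(3)(torch.randn(8, 25, 3))   # -> (8, 25, 64)
```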

    Multi-stage adaptive regression for online activity recognition

    Online activity recognition, which aims to detect and recognize activities instantly from a continuous video stream, is a key technology in human-robot interaction. However, the partial activity observation problem, caused mainly by incomplete sequence acquisition, makes it highly challenging. This paper proposes a novel approach, named Multi-stage Adaptive Regression (MAR), for online activity recognition, with the main focus on addressing the partial observation problem. Specifically, the MAR framework assembles overlapped activity observations to improve robustness against arbitrary activity segments. Multiple score functions, each corresponding to a specific performance stage, are then learned collaboratively via an adaptive label strategy to enhance the discrimination of similar partial activities. Moreover, the Online Human Interaction (OHI) database is constructed to evaluate online activity recognition in human interaction scenarios. Extensive experimental evaluations on the Multi-Modal Action Detection (MAD) database and the OHI database show that the MAR method outperforms state-of-the-art approaches.
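    A schematic sketch of the multi-stage idea, under assumed names and a plain linear scorer: the stream is cut into overlapped segments, and each segment is scored by the function matched to its progress stage.

```python
import numpy as np

def overlapped_segments(stream, win=30, stride=10):
    """Assemble overlapped observations from a feature stream of shape (T, D)."""
    return [stream[s:s + win] for s in range(0, len(stream) - win + 1, stride)]

class MultiStageScorer:
    def __init__(self, n_stages, n_classes, dim, rng=np.random):
        # One score function (here simply a linear map) per performance stage.
        self.W = rng.randn(n_stages, n_classes, dim) * 0.01

    def predict(self, segment, progress):
        # Pick the score function for the stage this partial observation is in;
        # `progress` in [0, 1] is the observed fraction of the activity.
        stage = min(int(progress * len(self.W)), len(self.W) - 1)
        feat = segment.mean(axis=0)             # simple pooled segment feature
        return (self.W[stage] @ feat).argmax()  # most confident activity class
```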

    A neural refinement network for single image view synthesis

    Recent years have seen increasing interest in single image view synthesis. However, it remains a challenging task due to the lack of comprehensive colour and depth information from different views. In this paper, we propose a novel view synthesis approach that incorporates a Neural Image Refinement Network (NIRN) and generates both depth and colour images for the target view in an end-to-end manner. The appearance of the colour image benefits greatly from the generated depth image, as it provides an intermediate projection relationship for the object in the 3D world. Since directly applying geometric projection mapping results in empty regions and/or distortions, our approach embeds a novel refinement network into the view synthesis pipeline for improved performance. Experimental results on three publicly available datasets demonstrate that NIRN outperforms other state-of-the-art view synthesis methods.
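    The sketch below shows why direct geometric projection leaves empty regions: forward-warping source pixels with a depth map leaves some target pixels unmapped, producing a hole mask that a refinement network could then fill. Intrinsics, pose, and occlusion handling (no z-buffer) are simplified assumptions.

```python
import numpy as np

def forward_warp(color, depth, K, R, t):
    """Warp a source view to a target camera; returns the warped image and a
    validity mask whose False entries are the empty regions to be refined.
    Assumes positive depth everywhere."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3)
    # Back-project to 3D with the depth map, then project into the target view.
    pts = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)
    proj = K @ (R @ pts + t.reshape(3, 1))
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    out = np.zeros_like(color)
    mask = np.zeros((h, w), dtype=bool)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out[v[ok], u[ok]] = color.reshape(-1, color.shape[-1])[ok]
    mask[v[ok], u[ok]] = True
    return out, mask
```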

    Human following for mobile robots

    Human following is an essential function in many robotic systems. Most existing human following algorithms are based on human tracking algorithms. In practical scenarios, however, the human subject can easily disappear due to occlusions and quick movements. To solve the occlusion problem, this paper proposes a classification-based human following framework. After using a pretrained MobileNetV2 model to detect human subjects, the robot automatically trains a classification model to identify the target person. Finally, the robot is controlled by rule-based motion commands to follow the target human. Experimental results in several practical scenarios demonstrate the effectiveness of the algorithm.
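    A minimal sketch of what such rule-based motion commands could look like, with illustrative gains and thresholds: steer to keep the detected target centred, and drive forward until its bounding box is large enough (i.e. the robot is close enough).

```python
def follow_command(box, frame_w, frame_h, target_area_frac=0.15):
    """box = (x, y, w, h): target person's bounding box in pixels.
    Returns (linear, angular) velocity commands."""
    x, y, w, h = box
    # Normalised horizontal offset of the box centre, in [-1, 1].
    err = (x + w / 2.0 - frame_w / 2.0) / (frame_w / 2.0)
    angular = -0.5 * err                      # turn toward the target
    area_frac = (w * h) / float(frame_w * frame_h)
    linear = 0.3 if area_frac < target_area_frac else 0.0   # stop when close
    return linear, angular
```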

    Walking motion real-time detection method based on walking stick, IoT, COPOD and improved LightGBM

    Real-time walking behavior monitoring is essential for ensuring the safety and improving the physical condition of people with mobility difficulties. In this paper, we propose a real-time walking motion detection system based on an intelligent walking stick, a mobile phone, and a multi-label imbalanced classification method combining focal loss and LightGBM (MFGBoost). Internet of Things (IoT) technology is used for communication between the walking stick and the mobile phone, and MFGBoost is embedded into a Raspberry Pi to classify human motions. MFGBoost is scalable, and other boosting models, such as XGBoost, can also serve as its base classifier. An improved derivation of the multi-classification focal loss function is proposed, which is the key to combining multi-classification focal loss with boosting algorithms. We further propose a novel denoising method based on a window matrix and the COPOD algorithm (W-OD): the window matrix extracts data features and smooths noise, while COPOD outputs the noise level of the model. A weighted loss function is designed to adjust the model's attention to different samples based on the W-OD algorithm. We evaluate the latest classification models from multiple perspectives on multiple benchmark datasets and demonstrate that MFGBoost and W-OD-MFGBoost improve classification performance and decision-making efficiency. Experiments on human motion datasets show that W-OD-MFGBoost achieves more than 97 percent classification accuracy.
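    The key ingredient, a multi-class focal loss usable as a boosting objective, can be sketched as follows. The gradient is analytic; the Hessian here is a finite-difference stand-in, since the paper's improved second-order derivation is not reproduced, and all function names are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def focal_grad(z, y, gamma=2.0):
    """Gradient of FL = -(1 - p_t)**gamma * log(p_t) w.r.t. logits z.
    z: (n, K) raw scores, y: (n,) integer class labels."""
    n, K = z.shape
    p = softmax(z)
    pt = p[np.arange(n), y]                        # probability of true class
    dfl_dpt = gamma * (1 - pt) ** (gamma - 1) * np.log(pt) \
              - (1 - pt) ** gamma / pt             # d FL / d p_t
    dpt_dz = pt[:, None] * (np.eye(K)[y] - p)      # d p_t / d z_k (softmax)
    return dfl_dpt[:, None] * dpt_dz               # chain rule -> (n, K)

def focal_hess(z, y, gamma=2.0, eps=1e-4):
    """Diagonal Hessian via central differences (a stand-in for the paper's
    improved analytic second-order derivation)."""
    h = np.zeros_like(z, dtype=float)
    for k in range(z.shape[1]):
        zp, zm = z.astype(float).copy(), z.astype(float).copy()
        zp[:, k] += eps
        zm[:, k] -= eps
        h[:, k] = (focal_grad(zp, y, gamma) - focal_grad(zm, y, gamma))[:, k] / (2 * eps)
    return h

# Flattened grad/hess pairs like these are what LightGBM's custom-objective
# hook consumes (exact shape conventions vary between its native and sklearn APIs).
```

    With gamma = 0 the gradient reduces to the usual softmax cross-entropy gradient p - onehot(y), which is a quick sanity check on the derivation.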

    An adaptive multi-class imbalanced classification framework based on ensemble methods and deep network

    Data imbalance is one of the most difficult problems in machine learning, and improved ensemble learning models are a promising way to mitigate it. In this paper, an improved multi-class imbalanced data classification framework is proposed by combining Focal Loss with a boosting model (FL-Boosting). By resolving the confusion in the second-order derivation of Focal Loss in traditional ensemble learning models, the proposed model achieves more efficient and accurate classification of imbalanced data. More specifically, a Highly Adaptive Focal Loss (HAFL) is proposed to ensure that the model maintains lasting attention to minority samples; combined with a boosting model, it yields HAFL-Boosting for better performance. The framework is scalable and adapts to typical ensemble learning algorithms such as LightGBM, XGBoost, and CatBoost. In addition, to apply the proposed framework to deep models, a two-stage classification method combining ConvNeXt with the improved boosting model is proposed, which improves recognition of high-dimensional imbalanced data. We evaluate HAFL-Boosting and the two-stage class-imbalance classification method through ablation and benchmark experiments, which demonstrate clear improvements on several evaluation indexes. Comparative experiments with the latest classification models show that the proposed methods achieve leading performance from multiple perspectives.
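    A schematic sketch of the two-stage idea, assuming a torchvision ConvNeXt backbone and a stock LightGBM classifier (the HAFL objective itself is omitted): deep features are extracted first, then a boosting model handles the imbalanced decision.

```python
import torch
import torchvision.models as models
import lightgbm as lgb

# Stage one: a ConvNeXt backbone as a fixed feature extractor
# (pass pretrained weights in practice; None keeps the sketch offline).
backbone = models.convnext_tiny(weights=None)
backbone.classifier = torch.nn.Identity()   # drop the classification head
backbone.eval()

@torch.no_grad()
def extract_features(images):               # images: (n, 3, 224, 224)
    return backbone(images).flatten(1).numpy()

# Stage two: boosting on the extracted features; a custom focal-style
# objective could be plugged in here in place of the default.
X = extract_features(torch.randn(32, 3, 224, 224))
y = torch.randint(0, 5, (32,)).numpy()
clf = lgb.LGBMClassifier(objective="multiclass", n_estimators=20)
clf.fit(X, y)
```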

    Learning a 3D gaze estimator with adaptive weighted strategy

    As a method of predicting a target's attention distribution, gaze estimation plays an important role in human-computer interaction. In this paper, we learn a 3D gaze estimator with an adaptive weighted strategy to obtain the mapping from complete face images to the gaze vector. We select both eyes, the complete face, and their fused features as input to the regression model of the gaze estimator. Considering that different areas of the face contribute differently to the gaze estimation result under free head movement, we design a new learning strategy for the regression net. To substantially improve the efficiency of the regression model, we propose a weighted network that adaptively adjusts the learning strategy of the regression net. Experimental results on the MPIIGaze and EyeDiap datasets demonstrate that our method achieves superior performance compared with other state-of-the-art 3D gaze estimation methods.
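    As an illustration of the adaptive weighting, the PyTorch sketch below fuses left-eye, right-eye, and face streams with learned per-stream weights before regressing the 3D gaze vector; all layer sizes and module names are assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveWeightedGaze(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.left = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.right = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.face = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        # Weight net: one adaptive weight per stream, from the fused features.
        self.weight_net = nn.Sequential(nn.Linear(3 * feat_dim, 3), nn.Softmax(dim=-1))
        self.regressor = nn.Linear(feat_dim, 3)   # 3D gaze vector

    def forward(self, left_eye, right_eye, face):
        f = [self.left(left_eye), self.right(right_eye), self.face(face)]
        w = self.weight_net(torch.cat(f, dim=-1))            # (n, 3) weights
        fused = sum(w[:, i:i + 1] * f[i] for i in range(3))  # weighted fusion
        return self.regressor(fused)

# Usage with MPIIGaze-style eye patches and a face crop (sizes illustrative).
model = AdaptiveWeightedGaze()
gaze = model(torch.randn(4, 1, 36, 60), torch.randn(4, 1, 36, 60),
             torch.randn(4, 3, 224, 224))                    # -> (4, 3)
```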

    The DREAM Dataset: Supporting a data-driven study of autism spectrum disorder and robot enhanced therapy

    We present a dataset of behavioral data recorded from 61 children diagnosed with Autism Spectrum Disorder (ASD). The data was collected during a large-scale evaluation of Robot Enhanced Therapy (RET). The dataset covers over 3,000 therapy sessions and more than 300 hours of therapy. Half of the children interacted with the social robot NAO, supervised by a therapist; the other half, constituting a control group, interacted directly with a therapist. Both groups followed the Applied Behavior Analysis (ABA) protocol. Each session was recorded with three RGB cameras and two RGBD (Kinect) cameras, providing detailed information about the children's behavior during therapy. This public release of the dataset comprises body motion, head position and orientation, and eye gaze variables, all specified as 3D data in a joint frame of reference. In addition, metadata including participant age, gender, and autism diagnosis (ADOS) variables are included. We release this data in the hope of supporting further data-driven studies towards improved therapy methods as well as a better understanding of ASD in general.