
    DeePoint: Pointing Recognition and Direction Estimation From A Fixed View

    In this paper, we realize automatic visual recognition and direction estimation of pointing. We introduce the first neural pointing understanding method, based on two key contributions. The first is a first-of-its-kind large-scale dataset for pointing recognition and direction estimation, which we refer to as the DP Dataset. The DP Dataset consists of more than 2 million frames of over 33 people pointing in various styles, annotated for each frame with pointing timings and 3D directions. The second is DeePoint, a novel deep network model for joint recognition and 3D direction estimation of pointing. DeePoint is a Transformer-based network that fully leverages the spatio-temporal coordination of the body parts, not just the hands. Through extensive experiments, we demonstrate the accuracy and efficiency of DeePoint. We believe the DP Dataset and DeePoint will serve as a sound foundation for visual human intention understanding.
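    As a rough illustration of the joint formulation described above, the sketch below shows a Transformer encoder over spatio-temporal body-joint tokens that outputs a pointing probability and a unit 3D direction. It is not the DeePoint architecture; the joint count, feature sizes, and head design are illustrative assumptions.

```python
# Minimal sketch (not the DeePoint architecture): a Transformer encoder over
# spatio-temporal body-joint tokens that jointly predicts a pointing
# probability and a unit 3D pointing direction. Joint count, dimensions,
# and head design are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointingTransformer(nn.Module):
    def __init__(self, num_joints=17, joint_dim=2, d_model=128, nhead=4, num_layers=4):
        super().__init__()
        self.embed = nn.Linear(joint_dim, d_model)          # per-joint 2D keypoint -> token
        self.pos = nn.Parameter(torch.zeros(1, 1, num_joints, d_model))  # joint-identity embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.cls_head = nn.Linear(d_model, 1)                # pointing / not pointing
        self.dir_head = nn.Linear(d_model, 3)                # 3D direction

    def forward(self, joints):
        # joints: (batch, time, num_joints, joint_dim)
        b, t, j, _ = joints.shape
        x = self.embed(joints) + self.pos                    # (b, t, j, d)
        x = x.reshape(b, t * j, -1)                          # flatten spatio-temporal tokens
        x = self.encoder(x).mean(dim=1)                      # pooled sequence feature
        prob = torch.sigmoid(self.cls_head(x)).squeeze(-1)   # pointing probability
        direction = F.normalize(self.dir_head(x), dim=-1)    # unit 3D direction
        return prob, direction

# Example: a 16-frame window of 17 2D keypoints per frame.
model = PointingTransformer()
prob, direction = model(torch.randn(2, 16, 17, 2))
```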

    Attribute-Aware Loss Function for Accurate Semantic Segmentation Considering the Pedestrian Orientations

    Numerous applications such as autonomous driving, satellite imagery sensing, and biomedical imaging use computer vision as an important tool for perception tasks. Intelligent Transportation Systems (ITS) require precise recognition and localization of scene elements in sensor data. Semantic segmentation is one of the computer vision methods intended to perform such tasks. However, existing semantic segmentation tasks label each pixel with a single object class. Recognizing object attributes, e.g., pedestrian orientation, would be more informative and would support better scene understanding. Thus, we propose a method that performs semantic segmentation and pedestrian attribute recognition simultaneously. We introduce an attribute-aware loss function that can be applied to an arbitrary base model. Furthermore, a re-annotation of the existing Cityscapes dataset enriches the ground-truth labels with pedestrian-orientation attributes. We implement the proposed method and compare the experimental results with others. The attribute-aware semantic segmentation outperforms baseline methods in both the traditional object segmentation task and the expanded attribute detection task.
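    The sketch below illustrates one way an attribute-aware loss of the kind described above could be structured: a standard per-pixel segmentation cross-entropy plus an orientation-attribute term evaluated only on pedestrian pixels. It is not the paper's exact formulation; the class index and weighting factor are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact formulation): a segmentation loss
# augmented with an attribute term that is only evaluated on pedestrian
# pixels. Class counts and the weighting factor are illustrative assumptions.
import torch
import torch.nn.functional as F

def attribute_aware_loss(seg_logits, attr_logits, seg_target, attr_target,
                         pedestrian_class=11, attr_weight=0.5):
    # seg_logits:  (B, num_classes, H, W)   semantic-segmentation logits
    # attr_logits: (B, num_attrs,   H, W)   orientation-attribute logits
    # seg_target:  (B, H, W)                per-pixel class labels
    # attr_target: (B, H, W)                per-pixel orientation labels
    seg_loss = F.cross_entropy(seg_logits, seg_target)

    # The attribute loss is restricted to pixels labeled as pedestrian.
    mask = (seg_target == pedestrian_class)
    if mask.any():
        attr_logits_flat = attr_logits.permute(0, 2, 3, 1)[mask]  # (N, num_attrs)
        attr_target_flat = attr_target[mask]                      # (N,)
        attr_loss = F.cross_entropy(attr_logits_flat, attr_target_flat)
    else:
        attr_loss = seg_logits.new_zeros(())

    return seg_loss + attr_weight * attr_loss
```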

    Predicting reliable H2 column density maps from molecular line data using machine learning

    The total mass estimate of molecular clouds suffers from the uncertainty in the H2-CO conversion factor, the so-called X_CO factor, which is used to convert the 12CO (1-0) integrated intensity to the H2 column density. We demonstrate machine learning's ability to predict the H2 column density from the 12CO, 13CO, and C18O (1-0) data sets of four star-forming molecular clouds: Orion A, Orion B, Aquila, and M17. When the training is performed on a subset of each cloud, the overall distribution of the predicted column density is consistent with that of the Herschel column density. The total column densities predicted and observed agree within 10%, suggesting that the machine learning prediction provides a reasonable total mass estimate of each cloud. However, the distribution of the column density for values ≳ 2 × 10^22 cm^-2, which corresponds to the dense gas, could not be predicted well. This indicates that molecular line observations tracing the dense gas are required for the training. We also found a significant difference between the predicted and observed column density when the model was trained on a different cloud. This highlights the presence of different X_CO factors between the clouds, and further training on various clouds is required to correct for these variations. We also demonstrated that this method can predict the column density toward areas not observed by Herschel if the molecular line and column density maps are available for a small portion and the molecular line data are available for the larger areas. Comment: Accepted for publication in MNRAS.
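    For context, the conventional conversion that the paper aims to improve on is N(H2) = X_CO × W(12CO), with X_CO ≈ 2 × 10^20 cm^-2 (K km s^-1)^-1 commonly adopted for the Galaxy. The sketch below contrasts that conversion with an illustrative per-pixel regression from the three CO lines; the regressor choice and placeholder data are assumptions, not the authors' setup.

```python
# Sketch of the standard X_CO conversion, next to an illustrative (not the
# authors') regression setup that predicts N(H2) from 12CO/13CO/C18O
# integrated intensities. Placeholder data and the regressor are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X_CO = 2.0e20  # cm^-2 (K km s^-1)^-1, a commonly adopted Galactic value

def column_density_from_xco(w_12co):
    # N(H2) = X_CO * W(12CO): the conversion whose cloud-to-cloud variation
    # motivates the machine-learning approach.
    return X_CO * w_12co

# Illustrative ML alternative: learn N(H2) per pixel from three CO lines,
# training on pixels where a Herschel-derived column density is available.
w_lines = np.random.rand(10000, 3)          # W(12CO), W(13CO), W(C18O) per pixel
n_h2_herschel = 1e21 * w_lines.sum(axis=1)  # placeholder "ground truth"

model = RandomForestRegressor(n_estimators=100)
model.fit(w_lines, n_h2_herschel)
n_h2_pred = model.predict(w_lines)
```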

    Distance determination of molecular clouds in the 1st quadrant of the Galactic plane using deep learning: I. Method and Results

    Machine learning has been successfully applied in varied fields, but whether it is a viable tool for determining the distance to molecular clouds in the Galaxy is an open question. In the Galaxy, the kinematic distance is commonly employed as the distance to a molecular cloud. However, for the inner Galaxy, two different solutions, the "Near" solution and the "Far" solution, can be derived simultaneously. We attempted to construct a two-class ("Near" or "Far") inference model using a Convolutional Neural Network (CNN), a form of deep learning well suited to capturing spatial features. In this study, we used the CO dataset toward the 1st quadrant of the Galactic plane obtained with the Nobeyama 45-m radio telescope (l = 62-10 degrees, |b| < 1 degree). In the model, we applied the three-dimensional distribution (position-position-velocity) of the 12CO (J=1-0) emission as the main input. The dataset with "Near" or "Far" annotations was made from the HII region catalog of the infrared astronomy satellite WISE to train the model. As a result, we could construct a CNN model with a 76% accuracy rate on the training dataset. Using the model, we determined the distances to molecular clouds identified by the CLUMPFIND algorithm. We found that the masses of the molecular clouds with distances of < 8.15 kpc identified in the 12CO data follow a power-law distribution with an index of about -2.3 in the mass range of M > 10^3 Msun. Also, the detailed molecular gas distribution of the Galaxy as seen from the Galactic North pole was determined. Comment: 29 pages, 12 figures.
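    A minimal sketch of the kind of two-class model described above is given below: a small 3D CNN that takes a position-position-velocity cube crop around a cloud and outputs "Near"/"Far" logits. The cube size, channel widths, and depth are illustrative assumptions, not the paper's model.

```python
# Minimal sketch (not the paper's model): a small 3D CNN that classifies a
# position-position-velocity (PPV) 12CO cube crop as "Near" or "Far".
# Cube size, channel widths, and depth are illustrative assumptions.
import torch
import torch.nn as nn

class NearFarCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, 2)  # logits for the "Near" / "Far" classes

    def forward(self, cube):
        # cube: (batch, 1, velocity, latitude, longitude)
        x = self.features(cube).flatten(1)
        return self.classifier(x)

# Example: a batch of two 32x32x32 PPV crops around catalogued clouds.
logits = NearFarCNN()(torch.randn(2, 1, 32, 32, 32))
```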

    MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results

    Small Object Detection (SOD) is an important machine vision topic because (i) a variety of real-world applications require object detection for distant objects and (ii) SOD is a challenging task due to the noisy, blurred, and less-informative image appearances of small objects. This paper proposes a new SOD dataset consisting of 39,070 images including 137,121 bird instances, called the Small Object Detection for Spotting Birds (SOD4SB) dataset. The details of the challenge with the SOD4SB dataset are introduced in this paper. In total, 223 participants joined this challenge. This paper briefly introduces the award-winning methods. The dataset, the baseline code, and the website for evaluation on the public test set are publicly available. Comment: This paper is included in the proceedings of the 18th International Conference on Machine Vision Applications (MVA2023). It will be officially published at a later date. Project page: https://www.mva-org.jp/mva2023/challeng
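    As a small illustration of what "small object" means in practice, the snippet below counts instances under the common COCO area threshold (32 × 32 pixels) in a COCO-style annotation file. The SOD4SB annotation format and the file name used here are assumptions.

```python
# Illustrative sketch: counting "small" object instances in a COCO-style
# annotation dict, using the common COCO threshold (area < 32*32 pixels).
# The SOD4SB dataset's actual format may differ; the file name is hypothetical.
import json

SMALL_AREA = 32 * 32  # COCO "small object" area threshold in pixels

def count_small_instances(annotation_path):
    with open(annotation_path) as f:
        coco = json.load(f)
    # Each COCO-style annotation carries an "area" field for its instance.
    small = [a for a in coco["annotations"] if a["area"] < SMALL_AREA]
    return len(small), len(coco["annotations"])

# Hypothetical usage:
# n_small, n_total = count_small_instances("sod4sb_train.json")
```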

    Background Image Estimation for Fixed Cameras in Dynamically Changing Environments

    Doctoral dissertation, Doctor of Informatics (Kou No. 17244), Kyoto University, Graduate School of Informatics, Department of Intelligence Science and Technology. Examining committee: Prof. Michihiko Minoh (chief examiner), Prof. Takashi Matsuyama, Prof. Yuichi Nakamura.

    Future Pose Prediction from 3D Human Skeleton Sequence with Surrounding Situation

    Human pose prediction is vital for robot applications such as human–robot interaction and autonomous control of robots. Recent prediction methods often use deep learning and are based on a 3D human skeleton sequence to predict future poses. Even if the starting motions of 3D human skeleton sequences are very similar, their future poses can vary widely. This makes it difficult to predict future poses from a given human skeleton sequence alone. Meanwhile, when carefully observing human motions, we find that they are often affected by objects or other people around the target person. We consider the presence of surrounding objects to be an important clue for the prediction. This paper proposes a method for predicting the future skeleton sequence by incorporating the surrounding situation into the prediction model. The proposed method uses a feature of an image around the target person as the surrounding information. We confirmed the performance improvement of the proposed method through evaluations on publicly available datasets. As a result, the prediction accuracy was improved for object-related and human-related motions.
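    The sketch below shows one plausible way to fuse a skeleton-sequence encoding with an image feature of the surroundings, as described above: a recurrent encoder over past poses concatenated with the surrounding-image feature, decoded into future poses. It is not the paper's architecture; joint counts, horizons, and feature sizes are illustrative assumptions.

```python
# Minimal sketch (not the paper's architecture): fusing a skeleton-sequence
# encoding with an image feature of the target person's surroundings to
# predict future poses. Joint counts, horizons, and feature sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

class PosePredictor(nn.Module):
    def __init__(self, num_joints=17, hidden=128, img_feat_dim=512, future_frames=10):
        super().__init__()
        self.future_frames = future_frames
        self.num_joints = num_joints
        self.skeleton_encoder = nn.GRU(num_joints * 3, hidden, batch_first=True)
        self.fuse = nn.Linear(hidden + img_feat_dim, hidden)
        self.decoder = nn.Linear(hidden, future_frames * num_joints * 3)

    def forward(self, skeleton_seq, surround_feat):
        # skeleton_seq:  (batch, time, num_joints * 3)  past 3D poses
        # surround_feat: (batch, img_feat_dim)           feature of the image around the person
        _, h = self.skeleton_encoder(skeleton_seq)
        fused = torch.relu(self.fuse(torch.cat([h[-1], surround_feat], dim=-1)))
        out = self.decoder(fused)
        return out.view(-1, self.future_frames, self.num_joints, 3)

# Example: 25 past frames of 17 3D joints plus a 512-d surrounding-image feature.
pred = PosePredictor()(torch.randn(2, 25, 17 * 3), torch.randn(2, 512))
```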

    SDOF-Tracker: Fast and Accurate Multiple Human Tracking by Skipped-Detection and Optical-Flow

    Multiple human tracking is a fundamental problem for scene understanding. Although both accuracy and speed are required in real-world applications, recent tracking methods based on deep learning have focused on accuracy and require substantial running time. This study aims to improve running speed by performing human detection only at a certain frame interval, because detection accounts for most of the running time. The question is how to maintain accuracy while skipping human detection. In this paper, we propose a method that complements the detection results with optical flow, based on the fact that a person's appearance changes little between adjacent frames. To maintain the tracking accuracy, we introduce robust interest point selection within human regions and a tracking termination metric calculated from the distribution of the interest points. On the MOT20 dataset in the MOTChallenge, the proposed SDOF-Tracker achieved the best performance in terms of total running speed while maintaining the MOTA metric. Our code is available at https://github.com/hitottiez/sdof-tracker
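    The snippet below is a simplified illustration of the core idea, not the released SDOF-Tracker code: between detector frames, a person box is propagated by tracking interest points inside it with sparse optical flow and shifting the box by the median point displacement.

```python
# Simplified illustration of the idea (not the released SDOF-Tracker code):
# run the detector only every K frames and, in between, propagate each person
# box by tracking interest points inside it with sparse optical flow.
import cv2
import numpy as np

def propagate_box_with_flow(prev_gray, curr_gray, box):
    # prev_gray, curr_gray: consecutive grayscale (uint8) frames.
    # box: (x, y, w, h) of a person region in the previous frame.
    x, y, w, h = [int(v) for v in box]
    mask = np.zeros_like(prev_gray)
    mask[y:y + h, x:x + w] = 255
    # Interest points restricted to the person region.
    points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                     qualityLevel=0.01, minDistance=3, mask=mask)
    if points is None:
        return box  # no trackable points; keep the previous box
    new_points, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, points, None)
    good_old = points[status.flatten() == 1].reshape(-1, 2)
    good_new = new_points[status.flatten() == 1].reshape(-1, 2)
    if len(good_new) == 0:
        return box
    # Shift the box by the median point displacement (robust to outliers).
    dx, dy = np.median(good_new - good_old, axis=0)
    return (x + dx, y + dy, w, h)
```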

    Machine Learning-Based Interpretable Modeling for Subjective Emotional Dynamics Sensing Using Facial EMG

    Understanding the association between subjective emotional experiences and physiological signals is of practical and theoretical significance. Previous psychophysiological studies have shown a linear relationship between dynamic emotional valence experiences and facial electromyography (EMG) activities. However, whether and how subjective emotional valence dynamics relate to facial EMG changes nonlinearly remains unknown. To investigate this issue, we re-analyzed the data of two previous studies that measured dynamic valence ratings and facial EMG of the corrugator supercilii and zygomaticus major muscles from 50 participants who viewed emotional film clips. We employed multiple linear regression analyses and two nonlinear machine learning (ML) models: random forest and long short-term memory. In cross-validation, these ML models outperformed linear regression in terms of the mean squared error and correlation coefficient. Interpretation of the random forest model using the SHapley Additive exPlanation (SHAP) tool revealed nonlinear and interactive associations between several EMG features and subjective valence dynamics. These findings suggest that nonlinear ML models can fit the relationship between subjective emotional valence dynamics and facial EMG better than conventional linear models, and they highlight a nonlinear and complex relationship. The findings encourage emotion sensing using facial EMG and offer insight into the subjective–physiological association.
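    The sketch below illustrates the analysis pattern described above with placeholder data: fit a random forest regressor on EMG-derived features, evaluate it with cross-validation, and interpret it with the SHAP tool. The features and data are synthetic stand-ins, not the study's measurements.

```python
# Illustrative sketch of the analysis pattern described above (not the
# study's actual features or data): fit a random forest on facial-EMG
# features to predict dynamic valence ratings, then inspect it with SHAP.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Placeholder data: EMG-derived features (e.g., corrugator and zygomaticus
# activity over several lags) and continuous valence ratings.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)

model = RandomForestRegressor(n_estimators=200, random_state=0)
print(cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error"))

model.fit(X, y)
explainer = shap.TreeExplainer(model)        # SHapley Additive exPlanation values
shap_values = explainer.shap_values(X[:100])
# shap.summary_plot(shap_values, X[:100])    # visualizes feature contributions
```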