25 research outputs found
DeePoint: Pointing Recognition and Direction Estimation From A Fixed View
In this paper, we realize automatic visual recognition and direction estimation of pointing. We introduce the first neural pointing understanding method, based on two key contributions. The first is the DP Dataset, a first-of-its-kind large-scale dataset for pointing recognition and direction estimation. The DP Dataset consists of more than 2 million frames of 33 people pointing in various styles, annotated for each frame with pointing timings and 3D directions. The second is DeePoint, a novel deep network model for joint recognition and 3D direction estimation of pointing. DeePoint is a Transformer-based network that fully leverages the spatio-temporal coordination of the body parts, not just the hands. Through extensive experiments, we demonstrate the accuracy and efficiency of DeePoint. We believe the DP Dataset and DeePoint will serve as a sound foundation for visual human intention understanding.
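As a concrete illustration of the joint formulation, the following is a minimal sketch, not the authors' implementation: a Transformer that treats each body joint in each frame as a token and predicts a pointing/not-pointing label together with a 3D direction. The joint count, layer sizes, and two-head design are assumptions.

import torch
import torch.nn as nn

# Illustrative sketch only: a Transformer over per-frame body-joint tokens
# with two heads (pointing classification, 3D direction). All dimensions
# are assumed, not taken from the DeePoint paper.
class PointingTransformer(nn.Module):
    def __init__(self, n_joints=17, joint_dim=2, d_model=128, n_frames=16):
        super().__init__()
        self.embed = nn.Linear(joint_dim, d_model)   # one token per joint per frame
        self.pos = nn.Parameter(torch.zeros(n_frames * n_joints, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.cls_head = nn.Linear(d_model, 2)        # pointing vs. not pointing
        self.dir_head = nn.Linear(d_model, 3)        # 3D pointing direction

    def forward(self, joints):                       # joints: (B, T, J, joint_dim)
        B, T, J, D = joints.shape
        x = self.embed(joints.reshape(B, T * J, D)) + self.pos
        x = self.encoder(x).mean(dim=1)              # pool spatio-temporal tokens
        direction = self.dir_head(x)
        direction = direction / (direction.norm(dim=-1, keepdim=True) + 1e-8)
        return self.cls_head(x), direction           # logits, unit direction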
Attribute-Aware Loss Function for Accurate Semantic Segmentation Considering the Pedestrian Orientations
Numerous applications such as autonomous driving, satellite imagery sensing, and biomedical imaging use computer vision as an important tool for perception tasks. Intelligent Transportation Systems (ITS) require precise recognition and localization of scenes in sensor data. Semantic segmentation is one of the computer vision methods intended for such tasks. However, existing semantic segmentation tasks label each pixel with a single object class. Recognizing object attributes, e.g., pedestrian orientation, would be more informative and support better scene understanding. Thus, we propose a method that performs semantic segmentation and pedestrian attribute recognition simultaneously. We introduce an attribute-aware loss function that can be applied to an arbitrary base model. Furthermore, a re-annotation of the existing Cityscapes dataset enriches the ground-truth labels with pedestrian orientation attributes. We implement the proposed method and compare the experimental results with those of other methods. The attribute-aware semantic segmentation outperforms baseline methods both in the traditional object segmentation task and in the expanded attribute detection task.
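To make the loss design concrete, below is a minimal sketch of an attribute-aware loss that adds an orientation term on pedestrian pixels to the usual per-pixel cross-entropy. The tensor shapes, the pedestrian label id, and the 0.5 weighting are assumptions for illustration, not the paper's exact formulation.

import torch
import torch.nn.functional as F

# Illustrative sketch: standard per-pixel class cross-entropy plus an
# orientation (attribute) term evaluated only on pedestrian pixels.
# pedestrian_id and attr_weight are assumed values.
def attribute_aware_loss(class_logits, attr_logits, class_gt, attr_gt,
                         pedestrian_id=11, attr_weight=0.5):
    # class_logits: (B, C, H, W); attr_logits: (B, A, H, W)
    # class_gt, attr_gt: (B, H, W) integer label maps
    seg_loss = F.cross_entropy(class_logits, class_gt)
    mask = class_gt == pedestrian_id             # supervise attributes only here
    if mask.any():
        attr_loss = F.cross_entropy(
            attr_logits.permute(0, 2, 3, 1)[mask],   # (N_pedestrian_pixels, A)
            attr_gt[mask])
        return seg_loss + attr_weight * attr_loss
    return seg_loss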
Predicting reliable H2 column density maps from molecular line data using machine learning
The total mass estimate of molecular clouds suffers from the uncertainty in the H2-CO conversion factor, the so-called X factor, which is used to convert the 12CO (1-0) integrated intensity to the H2 column density. We demonstrate machine learning's ability to predict the H2 column density from the 12CO, 13CO, and C18O (1-0) data set of four star-forming molecular clouds: Orion A, Orion B, Aquila, and M17. When the training is performed on a subset of each cloud, the overall distribution of the predicted column density is consistent with that of the Herschel column density. The predicted and observed total column densities are consistent within 10%, suggesting that the machine learning prediction provides a reasonable total mass estimate of each cloud. However, the distribution of the column density at high values, which correspond to the dense gas, could not be predicted well. This indicates that molecular line observations tracing the dense gas are required for the training. We also found a significant difference between the predicted and observed column density when the model was trained on data from different clouds. This highlights the presence of different X factors between the clouds, and further training on various clouds is required to correct for these variations. We also demonstrated that this method can predict the column density toward areas not observed by Herschel if the molecular line and column density maps are available for a small portion and the molecular line data are available for the larger areas. Comment: Accepted for publication in MNRAS
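The prediction setup lends itself to a compact sketch. The example below is a hedged illustration with synthetic stand-in data and a random forest regressor (the paper's actual model may differ): it maps per-pixel 12CO/13CO/C18O integrated intensities to an H2 column density estimate.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for per-pixel line intensities and column density;
# real inputs would come from the observed maps.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(10_000, 3))          # [12CO, 13CO, C18O] intensities
y = (X @ np.array([1.0, 2.0, 5.0])) ** 0.8   # stand-in H2 column density

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestRegressor(n_estimators=200).fit(X_train, y_train)
print("R^2 on held-out pixels:", model.score(X_test, y_test))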
Distance determination of molecular clouds in the 1st quadrant of the Galactic plane using deep learning: I. Method and Results
Machine learning has been successfully applied in various fields, but whether it is a viable tool for determining the distance to molecular clouds in the Galaxy is an open question. In the Galaxy, the kinematic distance is commonly employed as the distance to a molecular cloud. However, for the inner Galaxy, two different solutions, the "Near" solution and the "Far" solution, can be derived simultaneously. We attempted to construct a two-class ("Near" or "Far") inference model using a Convolutional Neural Network (CNN), a form of deep learning that can capture spatial features well. In this study, we used the CO dataset toward the 1st quadrant of the Galactic plane obtained with the Nobeyama 45-m radio telescope (l = 62-10 degrees, |b| < 1 degree). In the model, we applied the three-dimensional (position-position-velocity) distribution of the 12CO (J=1-0) emission as the main input. The dataset with "Near" or "Far" annotations was made from the HII region catalog of the infrared astronomy satellite WISE to train the model. As a result, we constructed a CNN model with a 76% accuracy rate on the training dataset. Using the model, we determined the distance to molecular clouds identified by the CLUMPFIND algorithm. We found that the mass of the molecular clouds at distances < 8.15 kpc identified in the 12CO data follows a power-law distribution with an index of about -2.3 in the mass range of M > 10^3 Msun. We also determined the detailed molecular gas distribution of the Galaxy as seen from the Galactic North Pole. Comment: 29 pages, 12 figures
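For concreteness, here is a minimal sketch of a two-class ("Near"/"Far") 3D CNN over a position-position-velocity cube; the cube size and layer widths are assumptions rather than the paper's architecture.

import torch
import torch.nn as nn

# Illustrative sketch: a small 3D CNN that classifies a PPV cube as
# "Near" or "Far". Layer widths and input size are assumed.
class NearFarCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, 2)    # "Near" vs. "Far"

    def forward(self, cube):                  # cube: (B, 1, V, Y, X)
        return self.classifier(self.features(cube).flatten(1))

# Usage on a dummy 32x64x64 (velocity x lat x lon) cube:
logits = NearFarCNN()(torch.randn(2, 1, 32, 64, 64))
print(logits.shape)                           # torch.Size([2, 2])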
MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results
Small Object Detection (SOD) is an important machine vision topic because (i)
a variety of real-world applications require object detection for distant
objects and (ii) SOD is a challenging task due to the noisy, blurred, and
less-informative image appearances of small objects. This paper proposes a new
SOD dataset consisting of 39,070 images including 137,121 bird instances, which
is called the Small Object Detection for Spotting Birds (SOD4SB) dataset. The
details of the challenge based on the SOD4SB dataset are introduced in this paper. In total, 223 participants joined the challenge. This paper briefly introduces the award-winning methods. The dataset, the baseline code, and the website for evaluation on the public test set are publicly available. Comment: This paper is included in the proceedings of the 18th International Conference on Machine Vision Applications (MVA2023). It will be officially published at a later date. Project page: https://www.mva-org.jp/mva2023/challeng
Background Image Estimation for Fixed Cameras in Dynamically Changing Environments
Doctoral dissertation (Doctor of Informatics), Kyoto University, Graduate School of Informatics, Department of Intelligence Science and Technology. Examination committee: Prof. Michihiko Minoh (chair), Prof. Takashi Matsuyama, Prof. Yuichi Nakamura.
Future Pose Prediction from 3D Human Skeleton Sequence with Surrounding Situation
Human pose prediction is vital for robot applications such as human–robot interaction and autonomous robot control. Recent prediction methods often use deep learning and predict future poses from a 3D human skeleton sequence. However, even when the starting motions of two skeleton sequences are very similar, their future poses can diverge widely, which makes it difficult to predict future poses from a skeleton sequence alone. Meanwhile, careful observation shows that human motion is often affected by objects or other people around the target person, so we consider the surrounding situation an important clue for prediction. This paper proposes a method for predicting the future skeleton sequence by incorporating the surrounding situation into the prediction model, using a feature of the image around the target person as the surrounding information. We confirmed the performance improvement of the proposed method through evaluations on publicly available datasets; prediction accuracy improved for object-related and human-related motions.
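One plausible way to wire the surrounding-image feature into a skeleton predictor is sketched below, assuming a GRU encoder-decoder and simple concatenation fusion; the paper's actual architecture may differ.

import torch
import torch.nn as nn

# Illustrative sketch: encode past poses, fuse with an image feature of
# the person's surroundings, and decode future poses. All module choices
# and sizes are assumptions.
class PosePredictor(nn.Module):
    def __init__(self, n_joints=17, img_feat_dim=512, hidden=256, horizon=10):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.GRU(n_joints * 3, hidden, batch_first=True)
        self.fuse = nn.Linear(hidden + img_feat_dim, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_joints * 3)

    def forward(self, skeletons, img_feat):
        # skeletons: (B, T, J*3) past poses; img_feat: (B, img_feat_dim)
        _, h = self.encoder(skeletons)                    # h: (1, B, hidden)
        ctx = torch.relu(self.fuse(torch.cat([h[-1], img_feat], dim=-1)))
        steps = ctx.unsqueeze(1).expand(-1, self.horizon, -1)
        dec, _ = self.decoder(steps)                      # fused context each step
        return self.out(dec)                              # (B, horizon, J*3)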
SDOF-Tracker: Fast and Accurate Multiple Human Tracking by Skipped-Detection and Optical-Flow
Multiple human tracking is a fundamental problem for scene understanding.
Although both accuracy and speed are required in real-world applications,
recent tracking methods based on deep learning have focused on accuracy and
require substantial running time. This study aims to improve running speed by
performing human detection at a certain frame interval because it accounts for
most of the running time. The question is how to maintain accuracy while
skipping human detection. In this paper, we propose a method that complements
the detection results with optical flow, based on the fact that a person's appearance changes little between adjacent frames. To maintain tracking accuracy, we introduce robust interest-point selection within human regions and a tracking termination metric calculated from the distribution of the
interest points. On the MOT20 dataset in the MOTChallenge, the proposed
SDOF-Tracker achieved the best performance in terms of the total running speed
while maintaining the MOTA metric. Our code is available at
https://github.com/hitottiez/sdof-tracker
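The skipped-detection idea itself can be sketched with standard OpenCV calls: run the detector every k-th frame and carry each person box with Lucas-Kanade optical flow in between. The interest-point selection and median-shift update below are simplifications of the paper's method, not its implementation.

import cv2
import numpy as np

def propagate_box(prev_gray, next_gray, box):
    """Shift box = (x, y, w, h) by the median optical flow inside it.
    prev_gray/next_gray are uint8 grayscale frames. Simplified sketch."""
    x, y, w, h = box
    mask = np.zeros_like(prev_gray)
    mask[y:y + h, x:x + w] = 255                 # select points inside the box
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                  qualityLevel=0.01, minDistance=3, mask=mask)
    if pts is None:
        return box                               # candidate for track termination
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    good = status.ravel() == 1
    if not good.any():
        return box
    dx, dy = np.median((nxt[good] - pts[good]).reshape(-1, 2), axis=0)
    return (int(x + dx), int(y + dy), w, h)      # box carried to the next frame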
Machine Learning-Based Interpretable Modeling for Subjective Emotional Dynamics Sensing Using Facial EMG
Understanding the association between subjective emotional experiences and physiological signals is of practical and theoretical significance. Previous psychophysiological studies have shown a linear relationship between dynamic emotional valence experiences and facial electromyography (EMG) activity. However, whether and how subjective emotional valence dynamics relate nonlinearly to facial EMG changes remains unknown. To investigate this issue, we re-analyzed data from two previous studies that measured dynamic valence ratings and facial EMG of the corrugator supercilii and zygomaticus major muscles from 50 participants who viewed emotional film clips. We employed multiple linear regression analyses and two nonlinear machine learning (ML) models: random forest and long short-term memory. In cross-validation, these ML models outperformed linear regression in terms of mean squared error and correlation coefficient. Interpretation of the random forest model using the SHapley Additive exPlanations (SHAP) tool revealed nonlinear and interactive associations between several EMG features and subjective valence dynamics. These findings suggest that nonlinear ML models can fit the relationship between subjective emotional valence dynamics and facial EMG better than conventional linear models, and they highlight a nonlinear and complex relationship. The findings encourage emotion sensing using facial EMG and offer insight into the subjective-physiological association.
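As an illustration of this kind of analysis, the sketch below fits a random forest from two synthetic EMG-style features to a nonlinearly generated valence signal and inspects it with SHAP; the data and features are stand-ins, not the study's.

import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins for EMG features and valence ratings; the real
# analysis would use features derived from corrugator/zygomaticus EMG.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 2))              # [corrugator, zygomaticus]
y = -0.8 * X[:, 0] + 0.5 * np.tanh(X[:, 1])  # nonlinear stand-in valence

model = RandomForestRegressor(n_estimators=200).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)
print(shap_values.shape)                     # per-sample, per-feature effects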