Relaxed Spatio-Temporal Deep Feature Aggregation for Real-Fake Expression Prediction
Frame-level visual features are generally aggregated in time with
techniques such as LSTM, Fisher Vectors, and NetVLAD to produce a robust
video-level representation. We here introduce a learnable aggregation technique
whose primary objective is to retain short-time temporal structure between
frame-level features and their spatial interdependencies in the representation.
It can also be easily adapted to cases where training samples are very scarce.
We evaluate the method on a real-fake expression prediction
dataset to demonstrate its superiority. Our method obtains a score of 65% on the
test dataset in the official MAP evaluation, only one misclassified decision
short of the best reported result in the ChaLearn Challenge (i.e., 66.7%).
Lastly, we believe that this method can be extended to different problems such
as action/event recognition in the future.
Comment: Submitted to International Conference on Computer Vision Workshop
Fusing deep learned and hand-crafted features of appearance, shape, and dynamics for automatic pain estimation
Automatic continuous-time, continuous-value assessment of a patient's pain from face video is highly sought after by the medical profession. Despite recent advances in deep learning that attain impressive results in many domains, pain estimation risks not benefiting from them because data sets of considerable size are difficult to obtain. In this work we propose a combination of hand-crafted and deep-learned features that makes the most of deep learning techniques in small-sample settings. Encoding shape, appearance, and dynamics, our method significantly outperforms the current state of the art, attaining an RMSE of less than 1 point on a 16-level pain scale, whilst simultaneously scoring a 67.3% Pearson correlation coefficient between our predicted pain level time series and the ground truth.
- …