10,749 research outputs found

    Learning to Extract Motion from Videos in Convolutional Neural Networks

    Full text link
    This paper shows how to extract dense optical flow from videos with a convolutional neural network (CNN). The proposed model constitutes a potential building block for deeper architectures that use motion without resorting to an external algorithm, e.g., for recognition in videos. We derive our network architecture from signal processing principles to provide the desired invariances to image contrast, phase, and texture. We constrain weights within the network to enforce strict rotation invariance and to substantially reduce the number of parameters to learn. We demonstrate end-to-end training on only 8 sequences of the Middlebury dataset, orders of magnitude less data than competing CNN-based motion estimation methods require, and obtain performance comparable to classical methods on the Middlebury benchmark. Importantly, our method outputs a distributed representation of motion that can represent multiple, transparent motions and dynamic textures. Our contributions on network design and rotation invariance offer insights that are not specific to motion estimation.
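    As a loose illustration of such a building block, the sketch below maps two stacked grayscale frames to a per-pixel distribution over discretized velocities, the kind of distributed motion representation described above. The layer sizes, the 81-bin velocity grid, and the omission of the paper's rotation-invariant weight constraints are all simplifying assumptions.

        # Minimal sketch, not the paper's architecture: a CNN mapping a frame
        # pair to a per-pixel softmax over a discrete grid of candidate
        # velocities, so several motions can keep separate modes alive.
        import torch
        import torch.nn as nn

        class MotionCNN(nn.Module):
            def __init__(self, n_velocities=81):  # e.g. a 9x9 grid of (u, v) bins
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(2, 32, 7, padding=3),  # two stacked grayscale frames
                    nn.ReLU(),
                    nn.Conv2d(32, 64, 7, padding=3),
                    nn.ReLU(),
                )
                self.head = nn.Conv2d(64, n_velocities, 1)  # per-pixel velocity logits

            def forward(self, frame_pair):  # (B, 2, H, W)
                logits = self.head(self.features(frame_pair))
                return torch.softmax(logits, dim=1)  # distribution over velocity bins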

    XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera

    Full text link
    We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates successfully in generic scenes that may contain occlusions by objects and by other people. Our method operates in subsequent stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long- and short-range skip connections to improve the information flow, allowing for a drastically faster network without compromising accuracy. In the second stage, a fully connected neural network turns the possibly partial (on account of occlusion) 2D pose and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile them and enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work, which does not produce joint-angle results for a coherent skeleton in real time in multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input while achieving state-of-the-art accuracy, which we demonstrate on a range of challenging real-world scenes. Comment: To appear in ACM Transactions on Graphics (SIGGRAPH) 2020.
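    The staged design reads naturally as a per-frame pipeline. The stubs below sketch that control flow only; the function names, signatures, and return values are placeholders, not the authors' implementation.

        # Schematic sketch of the three-stage flow described above (stubbed).
        def stage1_detect(frame):
            """Per-frame CNN (SelecSLS Net in the paper): 2D/3D pose features
            plus identity assignments for every visible joint."""
            return []  # one feature record per detected person

        def stage2_complete(person_features):
            """Fully connected network: completes possibly partial (occluded)
            features into a full per-person 3D pose estimate."""
            return person_features

        def stage3_fit(poses, previous_skeletons):
            """Space-time skeletal model fit: reconciles 2D/3D evidence and
            enforces temporal coherence, yielding joint angles."""
            return poses

        def track(video_frames):
            skeletons = []
            for frame in video_frames:
                people = stage1_detect(frame)
                poses = [stage2_complete(p) for p in people]
                skeletons.append(stage3_fit(poses, skeletons))
            return skeletons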

    RGB-D datasets using Microsoft Kinect or similar sensors: a survey

    Get PDF
    RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle, and scale. With the invention of the low-cost Microsoft Kinect sensor, initially designed for gaming but later a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, and these are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications, including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset and compare the popularity and difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.
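    Part of what makes RGB-D data convenient is that, given the camera intrinsics, every depth pixel back-projects to a 3D point. A minimal sketch with illustrative Kinect-class intrinsics (fx = fy = 525 for a 640x480 depth map); a real sensor needs its calibrated values.

        # Back-project a depth image to a point cloud via the pinhole model.
        import numpy as np

        def depth_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
            """depth: (H, W) array in meters; returns (H, W, 3) XYZ points."""
            h, w = depth.shape
            u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
            x = (u - cx) * depth / fx
            y = (v - cy) * depth / fy
            return np.stack([x, y, depth], axis=-1)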

    XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera

    No full text
    We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates in generic scenes and is robust to difficult occlusions by both other people and objects. Our method operates in subsequent stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long- and short-range skip connections to improve the information flow, allowing for a drastically faster network without compromising accuracy. In the second stage, a fully connected neural network turns the possibly partial (on account of occlusion) 2D pose and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile them and enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work, which neither extracted global body positions nor produced joint-angle results for a coherent skeleton in real time in multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input while achieving state-of-the-art accuracy, which we demonstrate on a range of challenging real-world scenes.

    Gait Recognition from Motion Capture Data

    Full text link
    Gait recognition from motion capture data, as a pattern classification discipline, can be improved by the use of machine learning. This paper contributes to the state of the art with a statistical approach for extracting robust gait features directly from raw data through a modification of Linear Discriminant Analysis with the Maximum Margin Criterion. Experiments on the CMU MoCap database show that the suggested method outperforms thirteen relevant methods based on geometric features, as well as a method that learns the features through a combination of Principal Component Analysis and Linear Discriminant Analysis. The methods are evaluated in terms of the distribution of biometric templates in their respective feature spaces, expressed through a number of class-separability coefficients and classification metrics. Results also indicate a high portability of the learned features; that is, we can learn in which aspects of walking people generally differ and extract those as general gait features. Recognizing people without the need for group-specific features is convenient, as particular people might not always provide annotated learning data. As a contribution to reproducible research, our evaluation framework and database have been made publicly available. This research makes motion capture technology directly applicable to human recognition. Comment: Preprint. Full paper accepted at ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), special issue on Representation, Analysis and Recognition of 3D Humans. 18 pages. arXiv admin note: substantial text overlap with arXiv:1701.00995, arXiv:1609.04392, arXiv:1609.0693
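    The Maximum Margin Criterion modification admits a compact sketch: instead of maximizing tr(Sw^-1 Sb) as classical LDA does, take the top eigenvectors of Sb - Sw, which avoids inverting the within-class scatter. The generic version below assumes gait samples already flattened into fixed-length vectors; it is an illustration, not the paper's exact formulation.

        # Learn a Maximum Margin Criterion projection from labeled samples.
        import numpy as np

        def mmc_features(X, y, n_dims):
            """X: (n_samples, n_feats); y: (n_samples,) labels.
            Returns an (n_feats, n_dims) projection matrix."""
            mean_all = X.mean(axis=0)
            Sb = np.zeros((X.shape[1], X.shape[1]))  # between-class scatter
            Sw = np.zeros_like(Sb)                   # within-class scatter
            for c in np.unique(y):
                Xc = X[y == c]
                mc = Xc.mean(axis=0)
                Sb += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
                Sw += (Xc - mc).T @ (Xc - mc)
            vals, vecs = np.linalg.eigh(Sb - Sw)     # symmetric eigenproblem
            return vecs[:, np.argsort(vals)[::-1][:n_dims]]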

    Anti-spoofing using challenge-response user interaction

    Get PDF
    2D facial identification has attracted a great amount of attention over the past years owing to several advantages, including practicality and simple requirements. However, without the capability to distinguish a real user from an impersonator, a face identification system becomes ineffective and vulnerable to spoof attacks. With the rapid evolution of smart portable devices, more advanced kinds of attacks have been developed, especially replayed-video spoofing attempts, which are becoming more difficult to recognize. Consequently, several studies have investigated the types of vulnerabilities a face biometric system might encounter and proposed various successful anti-spoofing algorithms. Unlike spoofing detection for passive or motionless authentication methods, which has been studied in depth, anti-spoofing for interactive user verification methods has been far less thoroughly examined, despite its promise as a robust spoofing prevention approach. This study aims, first, to compare the performance of existing spoofing detection techniques on passive and interactive authentication methods using a more balanced collected dataset and, second, to propose a fusion scheme that combines texture analysis with interaction in order to improve the accuracy of spoofing detection.
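    A minimal sketch of the fusion idea, assuming score-level combination: a texture-analysis liveness score and a challenge-response interaction score are blended with a weight and thresholded. The weight and threshold below are illustrative placeholders, not values from the study.

        def fused_liveness(texture_score, interaction_score, w=0.5, threshold=0.5):
            """Both scores are assumed to lie in [0, 1]; returns True if the
            authentication session is judged live."""
            combined = w * texture_score + (1.0 - w) * interaction_score
            return combined >= threshold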

    ์‹ ์ฒด ์ž„๋ฒ ๋”ฉ์„ ํ™œ์šฉํ•œ ์˜คํ† ์ธ์ฝ”๋” ๊ธฐ๋ฐ˜ ์ปดํ“จํ„ฐ ๋น„์ „ ๋ชจํ˜•์˜ ์„ฑ๋Šฅ ๊ฐœ์„ 

    Get PDF
    Ph.D. thesis, Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, 2021.8. Advisor: Jonghun Park. Deep learning models have dominated the field of computer vision, achieving state-of-the-art performance in various tasks. In particular, with recent increases in images and videos of people posted on social media, research on computer vision tasks for analyzing human visual information is being used in various ways. This thesis addresses two human-related computer vision tasks: fashion style classification and motion similarity measurement. In real-world fashion style classification problems, the number of samples collected for each style class varies with the fashion trend at the time of data collection, resulting in class imbalance. To cope with this imbalance, generalized few-shot learning is employed, in which both minority and majority classes are used for learning and evaluation. Additionally, two modalities, foreground images cropped to show only the body and fashion-item parts, and fashion attribute information, are reflected in the fashion image embedding through a variational autoencoder. The K-fashion dataset, collected from a Korean fashion shopping mall, is used for model training and evaluation. Motion similarity measurement is used as a sub-module in tasks such as action recognition, anomaly detection, and person re-identification; however, it has attracted less attention than those tasks because the same motion can be represented differently depending on the performer's body structure and the camera angle. The lack of public datasets for model training and evaluation also makes research challenging. We therefore propose an artificial dataset for model training and use an autoencoder architecture to learn motion embeddings disentangled from body-structure and camera-angle attributes. The autoencoder is designed to generate a motion embedding for each body part, so that motion similarity can be measured per body part. Furthermore, motion speed is synchronized by matching patches performing similar motions using dynamic time warping. The similarity score dataset for evaluation was collected through a crowdsourcing platform using videos from NTU RGB+D 120, an action recognition dataset. When the proposed models were verified on their respective evaluation datasets, both outperformed the baselines. In the fashion style classification problem, the proposed model showed the most balanced performance among all models, without bias toward either the minority or the majority classes. In the motion similarity measurement experiments, the correlation coefficient of the proposed model with the human-measured similarity scores was higher than that of the baselines.
    Contents: 1. Introduction (background and motivation; research contributions; thesis outline). 2. Literature review (fashion style classification: machine learning and deep learning approaches, class imbalance, variational autoencoders; human motion similarity: measuring similarity between two people, human body embedding, similarity datasets, triplet and quadruplet losses, dynamic time warping). 3. Fashion style classification (the K-fashion dataset; multimodal variational inference with CADA-VAE, multimodal feature generation, and classifier training with cyclic oversampling; experimental results and qualitative analysis; effectiveness of cyclic oversampling). 4. Motion similarity measurement (the synthetic SARA dataset and NTU RGB+D 120 similarity annotations; the body part embedding model and similarity measurement; experimental results and visualization of motion latent clusters; applications to dancing videos and tuning similarity scores to human perception). 5. Conclusions (summary and contributions; limitations and future research). Appendices: NTU RGB+D 120 similarity annotations (data collection, AMT score analysis); data cleansing of NTU RGB+D 120 skeletal data; motion sequence generation using Mixamo. Bibliography. Abstract in Korean.

    Learning and Acting in Peripersonal Space: Moving, Reaching, and Grasping

    Get PDF
    The young infant explores its body, its sensorimotor system, and the immediately accessible parts of its environment, over the course of a few months creating a model of peripersonal space useful for reaching and grasping objects around it. Drawing on constraints from the empirical literature on infant behavior, we present a preliminary computational model of this learning process, implemented and evaluated on a physical robot. The learning agent explores the relationship between the configuration space of the arm, sensing joint angles through proprioception, and its visual perceptions of the hand and grippers. The resulting knowledge is represented as the peripersonal space (PPS) graph, where nodes represent states of the arm, edges represent safe movements, and paths represent safe trajectories from one pose to another. In our model, the learning process is driven by intrinsic motivation. When repeatedly performing an action, the agent learns the typical result, but also detects unusual outcomes, and is motivated to learn how to make those unusual results reliable. Arm motions typically leave the static background unchanged, but occasionally bump an object, changing its static position. The reach action is learned as a reliable way to bump and move an object in the environment. Similarly, once a reliable reach action is learned, it typically makes a quasi-static change in the environment, moving an object from one static position to another. The unusual outcome is that the object is accidentally grasped (thanks to the innate Palmar reflex), and thereafter moves dynamically with the hand. Learning to make grasps reliable is more complex than for reaches, but we demonstrate significant progress. Our current results are steps toward autonomous sensorimotor learning of motion, reaching, and grasping in peripersonal space, based on unguided exploration and intrinsic motivation. Comment: 35 pages, 13 figures.
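    Since the PPS graph stores arm configurations as nodes and safely executed movements as edges, reaching reduces to a path search. A minimal sketch, assuming the graph is a plain adjacency mapping over configuration ids rather than the authors' data structures:

        # Breadth-first search for a safe trajectory through the PPS graph.
        from collections import deque

        def safe_path(graph, start, goal):
            """graph: dict mapping a node to an iterable of neighbors connected
            by known-safe movements; returns a node path or None."""
            frontier, parents = deque([start]), {start: None}
            while frontier:
                node = frontier.popleft()
                if node == goal:
                    path = []
                    while node is not None:
                        path.append(node)
                        node = parents[node]
                    return path[::-1]  # reverse so the path runs start -> goal
                for nxt in graph.get(node, ()):
                    if nxt not in parents:
                        parents[nxt] = node
                        frontier.append(nxt)
            return None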

    On the evolution of flow topology in turbulent Rayleigh-Bรฉnard convection

    Get PDF
    Copyright 2016 AIP Publishing. This article may be downloaded for personal use only. Any other use requires prior permission of the author and AIP Publishing. Small-scale dynamics is the spirit of turbulence physics. It implicates many attributes of flow topology evolution, coherent structures, hairpin vorticity dynamics, and the mechanism of the kinetic energy cascade. In this work, several dynamical aspects of the small-scale motions have been numerically studied in the framework of Rayleigh-Bénard convection (RBC). To do so, direct numerical simulations have been carried out at two Rayleigh numbers, Ra = 10^8 and 10^10, inside an air-filled rectangular cell of unit aspect ratio and a span-wise open-ended distance of π. As a main feature, the average rate of the invariants of the velocity gradient tensor (Q_G, R_G) has displayed the so-called…
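    For reference, with an incompressible (traceless) velocity gradient tensor A_ij = du_i/dx_j, the invariants mentioned above reduce to Q = -tr(A^2)/2 and R = -det(A). A small numpy sketch for a single spatial point; the DNS post-processing behind the paper's statistics is of course far more involved.

        # Second and third invariants of a traceless velocity gradient tensor.
        import numpy as np

        def qr_invariants(A):
            """A: (3, 3) velocity gradient tensor at one point (tr(A) ~ 0)."""
            Q = -0.5 * np.trace(A @ A)
            R = -np.linalg.det(A)
            return Q, R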

    Automatic visual detection of human behavior: a review from 2000 to 2014

    Get PDF
    Due to advances in information technology (e.g., digital video cameras, ubiquitous sensors), the automatic detection of human behaviors from video is a very recent research topic. In this paper, we perform a systematic review of the recent literature on this topic, from 2000 to 2014, covering a selection of 193 papers searched from six major scientific publishers. The selected papers were classified into three main subjects: detection techniques, datasets, and applications. The detection techniques were divided into four categories (initialization, tracking, pose estimation, and recognition). The list of datasets includes eight examples (e.g., Hollywood action). Finally, several application areas were identified, including human detection, abnormal activity detection, action recognition, player modeling, and pedestrian detection. Our analysis provides a road map to guide future research in designing automatic visual human behavior detection systems. This work is funded by the Portuguese Foundation for Science and Technology (FCT, Fundação para a Ciência e a Tecnologia) under research grant SFRH/BD/84939/2012.
    • โ€ฆ
    corecore