4,048 research outputs found

    μΌλ°˜ν™”λœ 4차원 λ™μž‘ νŠΉμ§•μ„ μ΄μš©ν•œ μ‹œμ„ κ°μ— λ¬΄κ΄€ν•œ 행동 인식

    Get PDF
    Thesis (Ph.D.) -- Seoul National University, Graduate School, Department of Electrical and Computer Engineering, August 2014. Advisor: μ΅œμ§„μ˜.
In this thesis, we propose a method to recognize human actions and their orientations independently of viewpoint using generalized 4D [x,y,z,t] motion features. Conventional action recognition methods assume that the camera view is fixed and that people stand facing the camera. In real-life scenarios, however, cameras are installed at various positions and people orient themselves arbitrarily, so videos can be captured from many views depending on the camera position and the person's orientation. To recognize human actions and their orientations under this difficult scenario, we focus on a view-invariant action recognition method that can handle test videos from any arbitrary view. For this purpose, we develop 4D space-time interest points (4D-STIPs, [x,y,z,t]) from 3D space (3D-S, [x,y,z]) volumes reconstructed from images of a finite number of different views. Since the 3D-S volumes and the 4D-STIPs are constructed from volumetric information, features for an arbitrary 2D space (2D-S, [x,y]) viewpoint can be generated by projecting the 3D-S volumes and 4D-STIPs onto the corresponding test image planes. With these projected features, we construct motion history images (MHIs) and non-motion history images (NMHIs), which encode the moving and non-moving parts of an action, respectively. Since MHIs cannot guarantee good performance when the moving parts of different actions show similar patterns, we propose NMHIs and combine them with MHIs to add information from the stationary parts of the body to the description of each action class. To reduce the dimension of MHIs and NMHIs, we apply class-augmented principal component analysis (CA-PCA), which uses class information for dimension reduction. Since the action label is used to reduce the feature dimension, we obtain principal axes that separate the actions well. After dimension reduction, the final features are trained with the support vector data description (SVDD) method and tested with support vector domain density description (SVDDD). For recognition of action orientation, the feature dimension is reduced using the orientation label; the reduced features are likewise trained with SVDD and tested with SVDDD. The proposed 4D-STIPs can be applied to view-invariant recognition of actions and their orientations, and our experiments verify that they represent the properties of each action compactly. To simulate arbitrary test views, as in real applications, we build a new test dataset that is entirely different from the training dataset: we train action models on the multi-view IXMAS dataset and test on the SNU dataset. Experimental results show that the proposed method generalizes better and outperforms state-of-the-art methods, especially when the classifier is trained with insufficient information about the test views. For recognition of action orientation, we experiment with the SNU dataset captured from 5 different orientations and confirm good recognition performance.
Recognition of action orientation can also help in analyzing a video by providing information about the interactions between people.
Contents:
1 Introduction: 1.1 Motivations; 1.2 Contents of the research (1.2.1 Generalized 4D motion features, 1.2.2 View invariant action recognition, 1.2.3 Recognition of action orientation)
2 Generalized 4D Motion Features: 2.1 Introduction; 2.2 Preliminaries (2.2.1 Harris corner detector, 2.2.2 3D space-time interest points, 2.2.3 3D reconstruction); 2.3 Proposed method (2.3.1 Modified 3D space-time interest points, 2.3.2 4D space-time interest points); 2.4 Experimental results; 2.5 Concluding remarks
3 View Invariant Action Recognition: 3.1 Introduction; 3.2 Preliminaries (3.2.1 Motion history images, 3.2.2 Class-augmented principal component analysis, 3.2.3 Support vector data description, 3.2.4 Support vector domain density description); 3.3 Proposed method (3.3.1 Silhouettes, 3.3.2 Space-time interest points, 3.3.3 Motion history images and Non-motion history images, 3.3.4 Training and Testing); 3.4 Experimental results; 3.5 Concluding remarks
4 Recognition of Action Orientation: 4.1 Introduction; 4.2 Proposed method (4.2.1 Training and Testing); 4.3 Experimental results; 4.4 Concluding remarks
5 Conclusions
Bibliography
Abstract in Korean
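
    As a rough illustration of the MHI/NMHI idea described in this abstract, the NumPy sketch below builds both images from a stack of binary silhouettes. The decay scheme, normalization, and the exact NMHI definition are assumptions made here for illustration, not the thesis's formulation.

```python
import numpy as np

def motion_history_images(silhouettes, tau=None):
    """Compute an MHI and an (assumed) NMHI from binary silhouettes of shape (T, H, W).
    The MHI brightens pixels that moved recently; the NMHI accumulates body pixels
    that stayed still -- one plausible reading of the thesis, not its exact code."""
    sil = np.asarray(silhouettes, dtype=np.float32)
    T = sil.shape[0]
    tau = T if tau is None else tau
    mhi = np.zeros(sil.shape[1:], dtype=np.float32)
    nmhi = np.zeros_like(mhi)
    for t in range(1, T):
        moving = np.abs(sil[t] - sil[t - 1]) > 0            # pixels that changed between frames
        static = (sil[t] > 0) & ~moving                      # body pixels that did not move
        mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))  # recency-weighted motion
        nmhi = np.where(static, np.minimum(nmhi + 1, tau), nmhi)  # dwell time of still parts
    return mhi / tau, nmhi / tau                             # normalize to [0, 1]

# Usage sketch: concatenate both images into one feature vector before CA-PCA.
# feats = np.concatenate([m.ravel() for m in motion_history_images(sils)])
```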

    Multimodal Multipart Learning for Action Recognition in Depth Videos

    Full text link
    The articulated and complex nature of human actions makes the task of action recognition difficult. One approach to handling this complexity is to divide it into the kinetics of body parts and to analyze actions based on these partial descriptors. We propose a joint sparse regression based learning method which utilizes structured sparsity to model each action as a combination of multimodal features from a sparse set of body parts. To represent the dynamics and appearance of parts, we employ a heterogeneous set of depth- and skeleton-based features. The proper structure of the multimodal multipart features is formulated into the learning framework via the proposed hierarchical mixed norm, which regularizes the structured features of each part and applies sparsity between them, in favor of group feature selection. Our experimental results demonstrate the effectiveness of the proposed learning method, which outperforms other methods on all three tested datasets and saturates one of them by achieving perfect accuracy.
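
    The structured-sparsity mechanism can be made concrete with a much simpler stand-in: an l2,1 (group lasso) regularized regression solved by proximal gradient, where each group collects all modalities of one body part. This is a hedged sketch of the underlying idea, not the paper's hierarchical mixed norm; the group layout, step size, and regularization weight are illustrative.

```python
import numpy as np

def group_l21_prox(W, groups, lam):
    """Proximal operator of the l2,1 mixed norm: shrinks whole feature groups
    (e.g., all modalities of one body part) toward zero, yielding part-level sparsity."""
    W = W.copy()
    for g in groups:                                   # g: row indices of one body part
        norm = np.linalg.norm(W[g])
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        W[g] *= scale                                  # block soft-thresholding
    return W

def fit_group_sparse(X, Y, groups, lam=0.1, lr=1e-3, iters=500):
    """Proximal gradient for min ||XW - Y||_F^2 + lam * sum_g ||W_g||_2 (a sketch;
    the fixed step size is assumed small enough for the given data)."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(iters):
        grad = X.T @ (X @ W - Y)                       # least-squares gradient
        W = group_l21_prox(W - lr * grad, groups, lr * lam)
    return W
```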

    Dual Projection and Selfduality in Three Dimensions

    Get PDF
    We discuss the notion of duality and selfduality in the context of the dual projection operation that creates an internal space of potentials. Contrary to the prevailing algebraic or group theoretical methods, this technique is applicable to both even and odd dimensions. The role of parity in the kernel of the Gauss law in determining the dimensional dependence is clarified. We derive the appropriate invariant actions and discuss the symmetry groups and their proper generators. In particular, the novel concept of duality symmetry and selfduality in Maxwell theory in (2+1) dimensions is analysed in detail. The corresponding action is a 3D version of the familiar duality symmetric electromagnetic theory in 4D. Finally, the duality symmetric actions in the different dimensions constructed here manifest both the SO(2) and Z_2 symmetries, contrary to conventional results.
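
    For background only, the "familiar duality symmetric electromagnetic theory in 4D" referenced above is usually written in a two-potential (Schwarz-Sen) form, sketched schematically below up to sign and index conventions; the 3D action actually constructed in the paper is not reproduced here.

```latex
% Schwarz-Sen duality-symmetric Maxwell action in 4D (schematic, temporal gauge):
\begin{align}
  S &= \frac{1}{2}\int d^4x \left( \epsilon_{ab}\, B^{a}_{\;i}\,\dot{A}^{b}_{\;i}
       \;-\; B^{a}_{\;i} B^{a}_{\;i} \right),
  \qquad B^{a}_{\;i} \equiv \epsilon_{ijk}\,\partial_j A^{a}_{\;k}, \quad a,b = 1,2, \\
  \delta A^{a}_{\;i} &= \epsilon^{ab} A^{b}_{\;i}
  \quad \text{(the continuous SO(2) duality rotation between the two potentials).}
\end{align}
```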

    Recognition of human activities and expressions in video sequences using shape context descriptor

    Get PDF
    The recognition of objects and classes of objects is of importance in the field of computer vision due to its applicability in areas such as video surveillance, medical imaging, and retrieval of images and videos from large databases on the Internet. Effective recognition of object classes is still a challenge in vision; hence, there is much interest in improving recognition rates to keep up with the rising demands of the fields where these techniques are applied. This thesis investigates the recognition of activities and expressions in video sequences using a new descriptor called the spatiotemporal shape context. The shape context is a well-known algorithm that describes the shape of an object based upon the mutual distribution of points on the contour of the object; however, it falls short when the distinctive property of an object is not just its shape but also its movement across frames in a video sequence. Since actions and expressions tend to have a motion component that enhances the capability of distinguishing them, the shape-based information from the shape context alone proves insufficient. This thesis proposes new 3D and 4D spatiotemporal shape context descriptors that incorporate motion changes across frames into the original shape context. Classification results on actions and expressions demonstrate that the spatiotemporal shape context is better than the original shape context at recognizing classes in the activity and expression domains.
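
    As background, the original 2D shape context that this thesis extends can be computed in a few lines: for each contour point, build a log-polar histogram of the relative positions of all other points. The bin counts and radius range below are common defaults used here as assumptions; the proposed 3D/4D spatiotemporal variants would add temporal bins on top of this.

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar shape context histograms for 2D contour points of shape (N, 2)."""
    pts = np.asarray(points, dtype=np.float64)
    diff = pts[None, :, :] - pts[:, None, :]           # offset from point i to point j
    dist = np.linalg.norm(diff, axis=2)
    mean_d = dist[dist > 0].mean()                      # normalize by mean distance (scale invariance)
    r = dist / mean_d
    theta = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)
    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_r + 1)
    r_bin = np.digitize(r, r_edges) - 1                 # log-distance bin
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    hists = np.zeros((len(pts), n_r, n_theta))
    for i in range(len(pts)):
        for j in range(len(pts)):
            if i != j and 0 <= r_bin[i, j] < n_r:        # ignore out-of-range radii
                hists[i, r_bin[i, j], t_bin[i, j]] += 1
    return hists.reshape(len(pts), -1)                   # one descriptor row per contour point
```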

    View-invariant action recognition

    Full text link
    Human action recognition is an important problem in computer vision. It has a wide range of applications in surveillance, human-computer interaction, augmented reality, video indexing, and retrieval. The varying pattern of spatio-temporal appearance generated by human action is key for identifying the performed action. Much research has explored these dynamics of spatio-temporal appearance to learn visual representations of human actions. However, most research in action recognition focuses on a few common viewpoints, and these approaches do not perform well when the viewpoint changes. Human actions are performed in a 3-dimensional environment and are projected to a 2-dimensional space when captured as a video from a given viewpoint, so an action has a different spatio-temporal appearance from different viewpoints. Research in view-invariant action recognition addresses this problem and focuses on recognizing human actions from unseen viewpoints.

    DeepHuMS: Deep Human Motion Signature for 3D Skeletal Sequences

    Full text link
    3D Human Motion Indexing and Retrieval is an interesting problem due to the rise of several data-driven applications aimed at analyzing and/or re-utilizing 3D human skeletal data, such as data-driven animation, analysis of sports biomechanics, and human surveillance. Spatio-temporal articulations of humans, noisy or missing data, different speeds of the same motion, and similar factors make it challenging, and several existing state-of-the-art methods use hand-crafted features along with optimization-based or histogram-based comparisons to perform retrieval. Further, they demonstrate it only for very small datasets and few classes. We make a case for a learned representation that should recognize the motion as well as enforce a discriminative ranking. To that end, we propose a 3D human motion descriptor learned using a deep network. Our learned embedding is generalizable and applicable to real-world data, addressing the aforementioned challenges, and further enables sub-motion searching in its embedding space using another network. Our model exploits inter-class similarity using trajectory cues and performs far better in a self-supervised setting. State-of-the-art results on all these fronts are shown on two large-scale 3D human motion datasets, NTU RGB+D and HDM05.
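
    A learned motion signature with discriminative ranking, as described here, is commonly trained as a sequence encoder plus a triplet loss. The PyTorch sketch below is a generic stand-in, not the DeepHuMS architecture; the layer sizes, input dimension (25 joints x 3 coordinates), and margin are assumptions.

```python
import torch
import torch.nn as nn

class MotionSignatureNet(nn.Module):
    """Hypothetical encoder: map a skeletal sequence (B, T, J*3) to a fixed-length,
    unit-norm motion signature usable for retrieval by nearest-neighbor search."""
    def __init__(self, in_dim=75, hidden=256, embed=128):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, embed)

    def forward(self, seq):
        _, h = self.rnn(seq)                        # final hidden state summarizes the motion
        z = self.head(h[-1])
        return nn.functional.normalize(z, dim=1)    # unit-norm signature

# Discriminative ranking via a triplet margin loss: pull same-action sequences
# together, push different actions apart in the embedding space.
loss_fn = nn.TripletMarginLoss(margin=0.2)
# loss = loss_fn(net(anchor), net(positive), net(negative))
```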