340 research outputs found

    Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields

    Full text link
    This work presents a first evaluation of using spatio-temporal receptive fields from a recently proposed time-causal spatio-temporal scale-space framework as primitives for video analysis. We propose a new family of video descriptors based on regional statistics of spatio-temporal receptive field responses and evaluate this approach on the problem of dynamic texture recognition. Our approach generalises a previously used method, based on joint histograms of receptive field responses, from the spatial to the spatio-temporal domain and from object recognition to dynamic texture recognition. The time-recursive formulation enables computationally efficient time-causal recognition. The experimental evaluation demonstrates competitive performance compared to the state of the art. In particular, binary versions of our dynamic texture descriptors achieve improved performance compared to a large range of similar methods using different primitives, either handcrafted or learned from data. Further, our qualitative and quantitative investigation into parameter choices and the use of different sets of receptive fields highlights the robustness and flexibility of our approach. Together, these results support the descriptive power of this family of time-causal spatio-temporal receptive fields, validate our approach for dynamic texture recognition, and point towards the possibility of designing a range of video analysis methods based on these new time-causal spatio-temporal primitives. Comment: 29 pages, 16 figures
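    As a rough illustration of the joint-histogram idea described above (not the authors' implementation, which uses time-causal, time-recursive receptive fields rather than symmetric Gaussian smoothing over time), the sketch below quantises standard Gaussian-derivative responses over a clip and counts their joint occurrences; the function names, the choice of first-order derivatives, and the bin count are assumptions for illustration.

    # Minimal sketch: joint histograms of spatio-temporal derivative responses
    # as a video descriptor. Ordinary Gaussian derivatives over (t, y, x) stand
    # in for the paper's time-causal receptive fields.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def receptive_field_responses(video, sigma=(1.0, 2.0, 2.0)):
        """video: float array of shape (T, H, W). Returns first-order
        derivative responses along t, y and x at a single scale."""
        orders = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # d/dt, d/dy, d/dx
        return [gaussian_filter(video, sigma=sigma, order=o) for o in orders]

    def joint_histogram_descriptor(video, bins=5):
        """Quantise each response channel into `bins` levels and count joint
        occurrences over all voxels (the regional-statistics idea)."""
        responses = receptive_field_responses(video)
        codes = []
        for r in responses:
            edges = np.quantile(r, np.linspace(0, 1, bins + 1)[1:-1])
            codes.append(np.digitize(r, edges))          # values in 0..bins-1
        codes = np.stack(codes, axis=-1).reshape(-1, len(responses))
        flat = np.ravel_multi_index(codes.T, (bins,) * len(responses))
        hist = np.bincount(flat, minlength=bins ** len(responses)).astype(float)
        return hist / hist.sum()

    # Example: descriptor for a random 40-frame clip.
    descriptor = joint_histogram_descriptor(np.random.rand(40, 64, 64))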

    Convolutional Neural Network on Three Orthogonal Planes for Dynamic Texture Classification

    Get PDF
    Dynamic Textures (DTs) are sequences of images of moving scenes that exhibit certain stationarity properties in time, such as smoke, vegetation and fire. The analysis of DTs is important for recognition, segmentation, synthesis and retrieval in a range of applications including surveillance, medical imaging and remote sensing. Deep learning methods have shown impressive results and are now the new state of the art for a wide range of computer vision tasks, including image and video recognition and segmentation. In particular, Convolutional Neural Networks (CNNs) have recently proven to be well suited for texture analysis, with a design similar to a filter bank approach. In this paper, we develop a new approach to DT analysis based on a CNN method applied on three orthogonal planes xy, xt and yt. We train CNNs on spatial frames and temporal slices extracted from the DT sequences and combine their outputs to obtain a competitive DT classifier. Our results on a wide range of commonly used DT classification benchmark datasets prove the robustness of our approach. A significant improvement over the state of the art is shown on the larger datasets. Comment: 19 pages, 10 figures
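    A minimal sketch of the three-orthogonal-planes idea follows (placeholder names, not the authors' code): cut xy frames, xt slices and yt slices out of a clip and fuse per-plane classifier scores by averaging; the per-plane classifiers are assumed to be supplied separately (e.g. CNNs fine-tuned on each plane).

    # Extract xy, xt and yt slice stacks from a clip and average the
    # per-plane class probabilities (simple late fusion).
    import numpy as np

    def plane_stacks(video):
        """video: array of shape (T, H, W). Returns image stacks for the
        xy (spatial) plane and the xt and yt (temporal-slice) planes."""
        return {
            "xy": video,                            # T images of shape (H, W)
            "xt": np.transpose(video, (1, 0, 2)),   # H images of shape (T, W)
            "yt": np.transpose(video, (2, 0, 1)),   # W images of shape (T, H)
        }

    def fuse_predictions(video, classifiers):
        """classifiers: dict mapping plane name -> callable returning class
        probabilities for a stack of 2-D slices (e.g. a CNN per plane)."""
        scores = [classifiers[name](stack) for name, stack in plane_stacks(video).items()]
        return np.mean(scores, axis=0)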

    ΠΠ›Π“ΠžΠ Π˜Π’Πœ ΠΠΠΠ›Π˜Π—Π Π”Π˜ΠΠΠœΠ˜Π§Π•Π‘ΠšΠ˜Π₯ Π’Π•ΠšΠ‘Π’Π£Π 

    Get PDF
    Recognizing dynamic patterns based on visual processing is significant for many applications, such as remote monitoring for the prevention of natural disasters (e.g. forest fires), various types of surveillance (e.g. traffic monitoring), background subtraction in challenging environments (e.g. outdoor scenes with vegetation), homeland security, and scientific studies of animal behavior. In the context of surveillance, recognizing dynamic patterns helps isolate activities of interest (e.g. fire) from a distracting background (e.g. wind-blown vegetation and changes in scene illumination). Finding an object of interest against a dynamic background is often difficult because the background and the target share similar texture or motion features, which motivates a dynamic texture classification algorithm for separating objects of interest from such backgrounds. Methods: pattern recognition, computer vision. Results: This paper presents a video-based image processing algorithm for sequences that typically contain a cluttered background and objects with dynamic behavior, such as water, fog, flame, or textiles in the wind. Based on spatio-temporal features, four categories are formulated, by type of motion (periodic or chaotic) and by type of object of interest (natural or artificial), and the dynamic texture recognition algorithm assigns image objects to one of these categories. Motion, color, fractal, and Laws energy features are extracted and ELBP histograms are built for dynamic texture categorization. Classification is based on a boosted random forest. Practical relevance: Experimental results show that the proposed method is feasible and effective for video-based dynamic texture categorization. The average classification accuracy over all video images is 95.2%.
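    As a hedged sketch of one ingredient of the pipeline above, the code below computes Laws texture-energy features per frame and trains a forest classifier; the original also uses motion, color, fractal and ELBP features and a boosted random forest, for which a plain RandomForestClassifier stands in here, and all names and parameters are illustrative assumptions.

    # Laws texture-energy features per frame, averaged over a clip, fed to a
    # random forest (stand-in for the boosted random forest in the paper).
    import numpy as np
    from scipy.ndimage import convolve
    from sklearn.ensemble import RandomForestClassifier

    L5, E5, S5 = [1, 4, 6, 4, 1], [-1, -2, 0, 2, 1], [-1, 0, 2, 0, -1]  # classic Laws 1-D kernels

    def laws_energy(frame):
        """frame: 2-D grayscale array. Mean absolute response for the
        separable Laws masks built from outer products of the 1-D kernels."""
        feats = []
        for a in (L5, E5, S5):
            for b in (L5, E5, S5):
                mask = np.outer(a, b).astype(float)
                feats.append(np.mean(np.abs(convolve(frame, mask))))
        return np.array(feats)

    def video_features(video):
        """Average the per-frame Laws energies over the clip (T, H, W)."""
        return np.mean([laws_energy(f) for f in video], axis=0)

    # Hypothetical usage with labelled clips:
    # X = np.stack([video_features(v) for v in clips]); y = labels
    # clf = RandomForestClassifier(n_estimators=200).fit(X, y)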

    Semantic Model Vectors for Complex Video Event Recognition

    Full text link

    Unleashing the Power of VGG16: Advancements in Facial Emotion Recognition

    Get PDF
    In facial emotion detection, researchers are actively exploring effective methods to identify and understand facial expressions. This study introduces a novel mechanism for emotion identification using diverse facial photos captured under varying lighting conditions. A meticulously pre-processed dataset ensures data consistency and quality. Leveraging deep learning architectures, the study uses feature extraction techniques to capture subtle emotive cues and builds an emotion classification model based on convolutional neural networks (CNNs). The proposed methodology achieves 97% accuracy on the validation set, outperforming previous methods in accuracy and robustness. Challenges such as lighting variations, head posture, and occlusions are acknowledged, and multimodal approaches incorporating additional modalities such as auditory or physiological data are suggested for further improvement. The outcomes of this research have wide-ranging implications for affective computing, human-computer interaction, and mental health diagnosis, advancing the field of facial emotion identification and paving the way for technology capable of understanding and responding to human emotions across diverse domains.
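    A hedged sketch of the kind of transfer-learning setup the abstract describes (not the authors' code): VGG16 as a frozen feature extractor with a small emotion-classification head. The class count, input size and head layers are assumptions.

    # VGG16 backbone with a small classification head for facial emotions.
    import tensorflow as tf

    def build_emotion_model(num_classes=7, input_shape=(224, 224, 3)):
        base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                           input_shape=input_shape)
        base.trainable = False                      # keep ImageNet features fixed
        x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
        x = tf.keras.layers.Dense(256, activation="relu")(x)
        out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
        model = tf.keras.Model(base.input, out)
        model.compile(optimizer="adam", loss="categorical_crossentropy",
                      metrics=["accuracy"])
        return model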

    Video texture analysis based on HEVC encoding statistics

    Get PDF

    Discriminatively Trained Latent Ordinal Model for Video Classification

    Full text link
    We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (e.g., onset and offset phases for "smile", running and jumping for "highjump"). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF; it extends such frameworks to approximately model the ordinal aspect of the videos. We obtain consistent improvements over relevant competitive baselines on four challenging, publicly available video-based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations, and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method. Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150
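    To make the ordinal-sub-event idea concrete, here is a small illustrative sketch (not the paper's trained model): given per-segment scores for K sub-event templates, it picks one segment per template so that the chosen segments appear in temporal order, maximising the total score by dynamic programming; the scoring function itself is assumed to be learned elsewhere.

    # Best temporally ordered assignment of K sub-event templates to T segments.
    import numpy as np

    def best_ordered_assignment(scores):
        """scores: array of shape (K, T), scores[k, t] = response of sub-event
        template k on temporal segment t. Returns (best total score, segments)."""
        K, T = scores.shape
        dp = np.full((K, T), -np.inf)     # dp[k, t]: best score with template k at segment t
        back = np.zeros((K, T), dtype=int)
        dp[0] = scores[0]
        for k in range(1, K):
            for t in range(k, T):
                j = int(np.argmax(dp[k - 1][:t]))   # best strictly earlier segment
                dp[k, t] = scores[k, t] + dp[k - 1][j]
                back[k, t] = j
        t = int(np.argmax(dp[-1]))
        path = [t]
        for k in range(K - 1, 0, -1):
            t = back[k, t]
            path.append(t)
        return float(dp[-1].max()), path[::-1]

    # Example with 3 templates over 8 segments.
    score, segments = best_ordered_assignment(np.random.rand(3, 8))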

    Efficient Human Activity Recognition in Large Image and Video Databases

    Get PDF
    Vision-based human action recognition has attracted considerable interest in recent research for its applications to video surveillance, content-based search, healthcare, and interactive games. Most existing research deals with building informative feature descriptors, designing efficient and robust algorithms, proposing versatile and challenging datasets, and fusing multiple modalities. Often, these approaches build on certain conventions such as the use of motion cues to determine video descriptors, application of off-the-shelf classifiers, and single-factor classification of videos. In this thesis, we deal with important but overlooked issues such as efficiency, simplicity, and scalability of human activity recognition in different application scenarios: controlled video environments (e.g. indoor surveillance), unconstrained videos (e.g. YouTube), depth or skeletal data (e.g. captured by Kinect), and person images (e.g. Flickr). In particular, we are interested in answering questions like (a) is it possible to efficiently recognize human actions in controlled videos without temporal cues? (b) given that large-scale unconstrained video data are often of high dimension low sample size (HDLSS) nature, how can human actions be efficiently recognized in such data? (c) considering the rich 3D motion information available from depth or motion capture sensors, is it possible to recognize both the actions and the actors using only the motion dynamics of the underlying activities? and (d) can motion information from monocular videos be used for automatically determining saliency regions for recognizing actions in still images?