133 research outputs found

    Video anomaly detection and localization by local motion based joint video representation and OCELM

    Nowadays, human-based video analysis is becoming increasingly exhausting due to the ubiquitous use of surveillance cameras and the explosive growth of video data. This paper proposes a novel approach to detect and localize video anomalies automatically. For video feature extraction, video volumes are jointly represented by two novel local motion based video descriptors, SL-HOF and ULGP-OF. The SL-HOF descriptor captures the spatial distribution of 3D local regions’ motion in the spatio-temporal cuboid extracted from video, which implicitly reflects the structural information of the foreground and depicts foreground motion more precisely than the standard HOF descriptor. To locate the video foreground more accurately, we propose a new Robust PCA based foreground localization scheme. The ULGP-OF descriptor, which seamlessly combines the classic 2D texture descriptor LGP with optical flow, is proposed to describe the motion statistics of local region texture in the areas located by the foreground localization scheme. Both SL-HOF and ULGP-OF are shown to be more discriminative than existing video descriptors for anomaly detection. To model the features of normal video events, we introduce the newly-emergent one-class Extreme Learning Machine (OCELM) as the data description algorithm. With a tremendous reduction in training time, OCELM can yield comparable or better performance than existing algorithms such as the classic OCSVM, which makes our approach easier to update and more applicable to fast learning from rapidly generated surveillance data. The proposed approach is tested on the UCSD ped1, ped2 and UMN datasets, and experimental results show that it achieves state-of-the-art results in both the video anomaly detection and localization tasks. This work was supported by the National Natural Science Foundation of China (Project nos. 60970034, 61170287, 61232016).
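The closed-form training that gives OCELM its speed advantage over OCSVM can be illustrated in a few lines. This is a hypothetical minimal sketch, not the paper's implementation: the hidden size, tanh activation, and the map-normal-samples-to-one data description are assumptions for illustration only.

```python
# Illustrative one-class ELM sketch: the hidden layer is random and fixed,
# and only the output weights are solved in closed form (hence fast training).
import numpy as np

rng = np.random.default_rng(0)

def train_ocelm(X, n_hidden=50):
    """Fit output weights that map normal samples to a target of 1."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                      # random feature mapping
    beta = np.linalg.pinv(H) @ np.ones(len(X))  # closed-form least squares
    return W, b, beta

def anomaly_score(x, W, b, beta):
    """Distance of the network output from the normal-class target 1."""
    return abs(np.tanh(x @ W + b) @ beta - 1.0)

# Train on 'normal' points clustered near the origin.
X_normal = rng.normal(0.0, 0.1, size=(200, 4))
W, b, beta = train_ocelm(X_normal)

low = anomaly_score(rng.normal(0.0, 0.1, size=4), W, b, beta)   # in-distribution
high = anomaly_score(np.full(4, 5.0), W, b, beta)               # far-away point
```

At test time a threshold on the score separates normal from anomalous events; a far-away point will typically score higher than an in-distribution one.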

    Human Motion Trajectory Prediction: A Survey

    With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important. Specifically, predicting the future positions of dynamic agents, and planning that accounts for such predictions, are key tasks for self-driving vehicles, service robots and advanced surveillance systems. This paper provides a survey of human motion trajectory prediction. We review, analyze and structure a large selection of work from different communities and propose a taxonomy that categorizes existing methods based on the motion modeling approach and the level of contextual information used. We provide an overview of existing datasets and performance metrics. We discuss limitations of the state of the art and outline directions for further research. Comment: Submitted to the International Journal of Robotics Research (IJRR), 37 pages.
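The simplest physics-based motion model that such taxonomies typically cover, and that learned predictors are usually compared against, is constant-velocity extrapolation. A minimal sketch (the function name and interface are invented for illustration):

```python
# Constant-velocity baseline: extrapolate future positions from the
# last observed displacement of the tracked agent.
def predict_constant_velocity(track, horizon):
    """track: list of (x, y) observations; returns `horizon` future points."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    vx, vy = x1 - x0, y1 - y0              # last observed velocity
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]

print(predict_constant_velocity([(0, 0), (1, 1), (2, 2)], 3))
# → [(3, 3), (4, 4), (5, 5)]
```

Despite its simplicity, this baseline is a common yardstick because pedestrian motion is near-linear over short horizons.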

    Non-iterative and Fast Deep Learning: Multilayer Extreme Learning Machines

    In the past decade, deep learning techniques have powered many aspects of our daily life and drawn ever-increasing research interest. However, conventional deep learning approaches, such as the deep belief network (DBN), restricted Boltzmann machine (RBM), and convolutional neural network (CNN), suffer from a time-consuming training process due to the fine-tuning of a large number of parameters and the complicated hierarchical structure. Furthermore, this complication makes it difficult to theoretically analyze and prove the universal approximation capability of these conventional deep learning approaches. To tackle these issues, multilayer extreme learning machines (ML-ELM) were proposed, which accelerate the development of deep learning. Compared with conventional deep learning, ML-ELMs are non-iterative and fast due to the random feature mapping mechanism. In this paper, we perform a thorough review of the development of ML-ELMs, including the stacked ELM autoencoder (ELM-AE), residual ELM, and local receptive field based ELM (ELM-LRF), and review their applications. In addition, we discuss the connection between random neural networks and conventional deep learning.
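The stacked ELM-AE idea can be sketched briefly. In this hypothetical illustration, each layer draws a random feature mapping, solves the decoder weights in closed form, and reuses their transpose as the encoder of the next stacked layer; the dimensions and tanh activation are assumptions, not a configuration from the paper.

```python
# ELM autoencoder sketch: random hidden layer, closed-form decoder,
# and the decoder's transpose reused as the learned encoder. Stacking
# these layers yields a non-iterative deep representation (ML-ELM).
import numpy as np

rng = np.random.default_rng(0)

def elm_ae_layer(X, n_hidden):
    W = rng.normal(size=(X.shape[1], n_hidden))
    H = np.tanh(X @ W)               # random feature mapping
    beta = np.linalg.pinv(H) @ X     # closed-form decoder weights
    return np.tanh(X @ beta.T)       # encode with the learned weights

X = rng.normal(size=(100, 16))
h1 = elm_ae_layer(X, 8)    # first stacked layer: 16 -> 8
h2 = elm_ae_layer(h1, 4)   # second stacked layer: 8 -> 4
print(h1.shape, h2.shape)  # → (100, 8) (100, 4)
```

No gradient descent is involved at any layer, which is the sense in which ML-ELM training is non-iterative.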

    Simple and Complex Human Action Recognition in Constrained and Unconstrained Videos

    Human action recognition plays a crucial role in visual learning applications such as video understanding and surveillance, video retrieval, human-computer interaction, and autonomous driving systems. A variety of methodologies have been proposed for human action recognition by developing low-level features along with bag-of-visual-word models. However, much less research has been performed on the combination of the pre-processing, encoding and classification stages. This dissertation focuses on enhancing action recognition performance via ensemble learning, a hybrid classifier, hierarchical feature representation, and key action perception methodologies. Action variation is one of the crucial challenges in video analysis and action recognition. We address this problem by proposing a hybrid classifier (HC) to discriminate actions which contain similar forms of motion features, such as walking, running, and jogging. Aside from that, we show and prove that the fusion of various appearance-based and motion features can boost both simple and complex action recognition performance. The next part of the dissertation introduces pooled-feature representation (PFR), which is derived from a double phase encoding framework (DPE). Considering that a given unconstrained video is composed of a sequence of simple frames, the first phase of DPE generates temporal sub-volumes from the video and represents them individually using the proposed improved rank pooling (IRP) method. The second phase constructs the pool of features by fusing the represented vectors from the first phase. The pool is compressed and then encoded to provide the video-parts vector (VPV). The DPE framework allows distilling the video representation and hierarchically extracting new information. Compared with recent video encoding approaches, VPV can preserve higher-level information through standard encoding of low-level features in two phases. Furthermore, the encoded vectors from both phases of DPE are fused, along with a compression stage, to develop PFR.
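The ranking idea that rank pooling (which IRP builds on) relies on can be sketched as follows. This is an illustrative stand-in, not the dissertation's IRP: a least-squares fit of the frame index replaces the ranking machine, and the fitted parameter vector serves as the descriptor of a temporal sub-volume.

```python
# Rank-pooling sketch: fit a linear function whose scores increase with
# frame order, and use its parameter vector as the video representation.
import numpy as np

def rank_pool(frames):
    """frames: (T, D) array of per-frame features -> (D,) video descriptor."""
    T = len(frames)
    t = np.arange(1, T + 1, dtype=float)          # temporal-order targets
    w, *_ = np.linalg.lstsq(frames, t, rcond=None)
    return w                                      # encodes frame ordering

rng = np.random.default_rng(0)
frames = rng.normal(size=(30, 5))   # 30 frames, 5-dim features (toy data)
w = rank_pool(frames)
print(w.shape)  # → (5,)
```

Because the parameters must rank frames in temporal order, the descriptor captures the evolution of appearance over time rather than a mere average of frames.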

    Nonlinear Parametric and Neural Network Modelling for Medical Image Classification

    System identification and artificial neural networks (ANN) are families of algorithms, used in systems engineering and machine learning respectively, that use structure detection and learning strategies to build models of complex systems from input-output data. These models play an essential role in science and engineering because they fill the gap in those cases where we know the input-output behaviour of a system but have no mathematical model with which to understand and predict its changes, or even prevent threats. In this context, nonlinear approximation of systems is very popular nowadays, since it better describes complex instances. On the other hand, digital image processing is an area of systems engineering that is expanding the analysis dimension in a variety of real-life problems while becoming more attractive and affordable over time. Medicine has made the most of it by supporting important human decision-making processes through computer-aided diagnosis (CAD) systems. This thesis presents three different frameworks for breast cancer detection, with approaches ranging from nonlinear system identification, through nonlinear system identification coupled with simple neural networks, to multilayer neural networks. In particular, the nonlinear system identification approaches, namely the Nonlinear AutoRegressive with eXogenous inputs (NARX) model and the MultiScales Radial Basis Function (MSRBF) neural network, appear for the first time in image processing. Alongside these contributions, the thesis presents the Multilayer-Fuzzy Extreme Learning Machine (ML-FELM) neural network for faster training and more accurate image classification. A central research aim is to take advantage of nonlinear system identification and multilayer neural networks to enhance the feature extraction process while bolstering the classification in CAD systems.
    In the case of multilayer neural networks, the extraction is carried out through stacked autoencoders, a bottleneck network architecture that promotes a data transformation between layers. In the case of nonlinear system identification, the goal is to add flexible models capable of capturing distinctive features from digital images that simpler approaches might miss. The purpose of detecting nonlinearities in digital images is complementary to that of linear models, since the goal is to extract features in greater depth, such that both linear and nonlinear elements can be captured. This aim is relevant because, according to previous work cited in the first chapter, not all spatial relationships existing in digital images can be explained appropriately with linear dependencies. Experimental results show that the methodologies based on system identification produced reliable image models with customised mathematical structure. The models came to include nonlinearities in different proportions, depending upon the case under examination. The information about nonlinearity and model structure was used as part of the whole image model. It was found that, in some instances, the models for different clinical classes in the breast cancer detection problem presented a particular structure. For example, NARX models of the malignant class showed a higher nonlinearity percentage and depended more on exogenous inputs compared to other classes. Regarding classification performance, comparisons of the three new CAD systems with existing methods had variable results. As for the NARX model, its performance was superior in three cases but was surpassed in two. However, the comparison must be taken with caution, since different databases were used. The MSRBF model was better in 5 out of 6 cases and had superior specificity in all instances, surpassing the closest competing model by 3.5%. The ML-FELM model was the best in 6 out of 6 cases, although it was surpassed in accuracy by 0.6% in one case and in specificity by 0.22% in another.
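The NARX structure underlying the first framework can be illustrated with its linear special case (ARX), fitted in closed form: the current output is regressed on lagged outputs and lagged exogenous inputs. This is a sketch only; the thesis's models also include nonlinear terms chosen by structure detection, and the lag orders and synthetic system here are assumptions for illustration.

```python
# ARX sketch (linear special case of NARX):
#   y[k] = a1*y[k-1] + ... + a_ny*y[k-ny] + b1*u[k-1] + ... + b_nu*u[k-nu]
import numpy as np

def fit_arx(y, u, ny=2, nu=2):
    """Least-squares estimate of the ARX parameters from I/O data."""
    start = max(ny, nu)
    rows = [np.r_[y[k - ny:k][::-1], u[k - nu:k][::-1]]   # lagged regressors
            for k in range(start, len(y))]
    theta, *_ = np.linalg.lstsq(np.array(rows), y[start:], rcond=None)
    return theta

# Synthetic system: y[k] = 0.5*y[k-1] + 0.3*u[k-1]  (no noise)
rng = np.random.default_rng(0)
u = rng.normal(size=300)
y = np.zeros(300)
for k in range(1, 300):
    y[k] = 0.5 * y[k - 1] + 0.3 * u[k - 1]

theta = fit_arx(y, u, ny=1, nu=1)
print(np.round(theta, 3))  # → approximately [0.5, 0.3]
```

On noise-free data the estimator recovers the true coefficients exactly; with real images the regressors are pixel neighbourhoods and the residual structure guides the nonlinearity analysis described above.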

    Proceedings of the 2015 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory

    This book is a collection of proceedings of the talks given at the 2015 annual joint workshop of Fraunhofer IOSB and the Vision and Fusion Laboratory (IES) by the doctoral students of both institutions. The topics of the individual contributions range from computer vision, optical metrology, and world modelling to data fusion and human-machine interaction.

    Incremental learning algorithms and applications

    Incremental learning refers to learning from streaming data, which arrive over time, with limited memory resources and, ideally, without sacrificing model accuracy. This setting fits different application scenarios where lifelong learning is relevant, e.g. due to changing environments, and it offers an elegant scheme for big data processing by means of its sequential treatment. In this contribution, we formalise the concept of incremental learning, discuss particular challenges which arise in this setting, and give an overview of popular approaches, their theoretical foundations, and applications which have emerged in recent years.
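The setting can be made concrete with a minimal sketch: an online perceptron that consumes a stream one example at a time and keeps only its weight vector in memory, never the stream itself. The data and learning rate are invented for illustration.

```python
# Incremental (online) perceptron: constant memory, one-pass updates.
def perceptron_update(w, x, label, lr=0.1):
    """Update weights on a single (x, label) pair, label in {-1, +1}."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    if label * score <= 0:                      # misclassified -> adjust
        w = [wi + lr * label * xi for wi, xi in zip(w, x)]
    return w

# Stream of linearly separable points: label = sign of the first coordinate.
stream = [((1.0, 0.2), 1), ((-1.0, 0.1), -1),
          ((0.8, -0.3), 1), ((-0.9, -0.2), -1)] * 10
w = [0.0, 0.0]
for x, label in stream:                 # examples arrive one at a time
    w = perceptron_update(w, x, label)

print(w[0] > 0)  # → True (first coordinate dominates the decision)
```

Memory usage is independent of stream length, which is exactly the constraint that distinguishes incremental from batch learning; handling the non-stationary (concept-drift) case requires the more elaborate mechanisms the survey reviews.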

    Multimedia security and privacy protection in the internet of things: research developments and challenges

    With the rapid growth of the internet of things (IoT), huge amounts of multimedia data are being generated from and/or exchanged through various IoT devices, systems and applications. The security and privacy of multimedia data have, however, emerged as key challenges with the potential to impact the successful deployment of IoT devices in some data-sensitive applications. In this paper, we conduct a comprehensive survey on multimedia data security and privacy protection in the IoT. First, we classify multimedia data into different types and security levels according to application areas. Then, we analyse and discuss existing multimedia data protection schemes in the IoT, including traditional techniques (e.g., cryptography and watermarking) and emerging technologies (e.g., blockchain and federated learning). Based on this detailed analysis of the research development of IoT-related multimedia security and privacy protection, we point out some open challenges and provide future research directions, aiming to advance study in the relevant fields and to help researchers gain a deeper understanding of the state of the art in multimedia data protection in the IoT.
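One of the traditional techniques the survey covers, watermarking, can be illustrated with a minimal least-significant-bit (LSB) sketch. This toy example operates on a flat list of 8-bit pixel values and is not a scheme from the paper; real schemes add robustness against compression and tampering.

```python
# LSB watermarking sketch: hide one watermark bit in the lowest bit of
# each 8-bit pixel value, changing each pixel by at most 1 intensity level.
def embed(pixels, bits):
    """Replace the LSB of each pixel (0-255) with a watermark bit."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def extract(pixels):
    """Recover the watermark from the LSBs."""
    return [p & 1 for p in pixels]

pixels = [120, 37, 255, 0, 88]
bits = [1, 0, 1, 1, 0]
marked = embed(pixels, bits)
print(extract(marked))                                  # → [1, 0, 1, 1, 0]
print(max(abs(m - p) for m, p in zip(marked, pixels)))  # → 1
```

The imperceptibility comes from the bounded per-pixel change; the fragility of the LSB plane is also why such watermarks are easily destroyed, motivating the more robust and the cryptographic approaches the survey analyses.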