63 research outputs found

    Automatic object classification for surveillance videos.

    PhD thesis. The recent popularity of surveillance video systems, especially in urban scenarios, demands the development of visual techniques for monitoring purposes. A primary step towards intelligent surveillance video systems is automatic object classification, which remains an open research problem and is the keystone for the development of more specific applications. Typically, object representation is based on inherent visual features. However, psychological studies have demonstrated that human beings routinely categorise objects according to their behaviour. The gap between the features automatically extracted by a computer, such as appearance-based features, and the concepts unconsciously perceived by human beings but unattainable for machines, such as behaviour features, is commonly known as the semantic gap. Consequently, this thesis proposes to narrow the semantic gap and bring together machine and human understanding for object classification. A Surveillance Media Management framework is proposed to automatically detect and classify objects by analysing the physical properties inherent in their appearance (machine understanding) and the behaviour patterns, which require a higher level of understanding (human understanding). A probabilistic multimodal fusion algorithm then bridges the gap by performing automatic classification that considers both machine and human understanding. The performance of the proposed Surveillance Media Management framework has been thoroughly evaluated on outdoor surveillance datasets. The experiments demonstrated that the combination of machine and human understanding substantially enhanced object classification performance. Finally, the inclusion of human reasoning and understanding provides the essential information to bridge the semantic gap towards smart surveillance video systems.
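As a rough illustration of what a probabilistic fusion of appearance and behaviour evidence can look like, the sketch below combines two per-class posteriors with the product rule. This is not the thesis's algorithm: it assumes the two modalities are conditionally independent given the class, and the function name, class labels and probabilities are all hypothetical.

```python
def fuse_posteriors(appearance, behaviour, prior=None):
    """Fuse two per-class posterior dicts with the product rule.

    Assumption (not from the source): the modalities are conditionally
    independent given the class, so
    p(c | a, b) is proportional to p(c | a) * p(c | b) / p(c).
    """
    classes = sorted(appearance)
    if prior is None:
        # Uniform class prior when none is supplied.
        prior = {c: 1.0 / len(classes) for c in classes}
    scores = {c: appearance[c] * behaviour[c] / prior[c] for c in classes}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical classifier outputs for one tracked object.
appearance = {"person": 0.5, "car": 0.4, "bicycle": 0.1}
behaviour = {"person": 0.2, "car": 0.7, "bicycle": 0.1}

fused = fuse_posteriors(appearance, behaviour)
best = max(fused, key=fused.get)
print(best, round(fused[best], 3))
```

In this toy case the appearance cue alone slightly favours "person", while the behaviour cue shifts the fused decision to "car", which is the kind of complementary effect the abstract attributes to combining both sources.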

    The Understanding of Human Activities by Computer Vision Techniques

    This thesis proposes new methodologies for learning human activities and classifying them into categories. Although this topic has been widely studied by the computer vision community, important difficulties remain unsolved. First, we found that the literature on computer vision techniques for learning human activities from few training sequences is scarce and also reports poor results [1][2]. However, this kind of learning is a crucial tool in several scenarios. For instance, a newly deployed recognition system needs a long time to acquire new training sequences, so training with few examples can accelerate its operational launch. The detection of abnormal behaviours, examples of which are hard to obtain, can also benefit from these techniques. Some solutions exist based on cross-domain techniques or invariant features, but they omit information from the target scenario, which reduces noise in the system and improves the results when taken into account, and examples of abnormal activities remain hard to obtain. Systems trained with scarce information face two main problems: on the one hand, the training process may suffer from numerical instabilities when estimating the model parameters; on the other hand, the model lacks representative information coming from diverse activities. We have addressed these problems by proposing novel methods for learning human activities from only one example, which is known as one-shot learning. Our proposals are based on generative systems derived from Hidden Markov Models [3][4], since each activity class must be learned from only one example. In addition, we have increased the diversity of information in the models by transferring information from sources external to the scenario [5]. This thesis explains several proposals and shows how they achieve state-of-the-art results on three public datasets [6][7][8].
    The second difficulty we have addressed is the recognition of activities in unconstrained scenarios. In this case, the training and evaluation scenarios need not coincide, so the noise reduction mentioned above is not applicable. This means that any labelled example can be used for training, regardless of its source scenario. This freedom allows videos to be extracted from any source, avoiding the restriction on the number of training examples. With enough training examples, both generative and discriminative methods can be used. At the time of writing this thesis, the state of the art obtains its best results with discriminative methods; however, most proposals do not consider the long-term temporal information of the activities [9]. This information can be crucial for distinguishing between activities where the order of sub-actions is decisive, and can help in other situations [10]. To that end, we have designed a system that includes this information in a Support Vector Machine. In addition, the system allows some flexibility in the alignment of the sequences being compared, a very useful feature when the segmentation of the activities is not perfect. Using this system, we have obtained state-of-the-art results on four challenging datasets with unconstrained scenarios [11][12][13][14].
    The work carried out in this thesis has led to three articles in first-quartile journals [15][16][17], two already published and one submitted. In addition, eight articles have been published in international conferences and one in a national conference [18][19][20][21][22][23][24][25][26].
    [1] Seo, H. J. and Milanfar, P. Action recognition from one example. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5):867–882. (2011)
    [2] Yang, Y., Saleemi, I., and Shah, M. Discovering motion primitives for unsupervised grouping and one-shot learning of human actions, gestures, and expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(7):1635–1648. (2013)
    [3] Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286. (1989)
    [4] Bishop, C. M. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA. (2006)
    [5] Cook, D., Feuz, K., and Krishnan, N. Transfer learning for activity recognition: a survey. Knowledge and Information Systems, pages 1–20. (2013)
    [6] Schuldt, C., Laptev, I., and Caputo, B. Recognizing human actions: a local SVM approach. In International Conference on Pattern Recognition (ICPR). (2004)
    [7] Weinland, D., Ronfard, R., and Boyer, E. Free viewpoint action recognition using motion history volumes. Computer Vision and Image Understanding, 104(2-3):249–257. (2006)
    [8] Gorelick, L., Blank, M., Shechtman, E., Irani, M., and Basri, R. Actions as space-time shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12):2247–2253. (2007)
    [9] Wang, H. and Schmid, C. Action recognition with improved trajectories. In IEEE International Conference on Computer Vision (ICCV). (2013)
    [10] Choi, J., Wang, Z., Lee, S.-C., and Jeon, W. J. A spatio-temporal pyramid matching for video retrieval. Computer Vision and Image Understanding, 117(6):660–669. (2013)
    [11] Oh, S., Hoogs, A., Perera, A., Cuntoor, N., Chen, C.-C., Lee, J. T., Mukherjee, S., Aggarwal, J. K., Lee, H., Davis, L., Swears, E., Wang, X., Ji, Q., Reddy, K., Shah, M., Vondrick, C., Pirsiavash, H., Ramanan, D., Yuen, J., Torralba, A., Song, B., Fong, A., Roy-Chowdhury, A., and Desai, M. A large-scale benchmark dataset for event recognition in surveillance video. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3153–3160. (2011)
    [12] Niebles, J. C., Chen, C.-W., and Fei-Fei, L. Modeling temporal structure of decomposable motion segments for activity classification. In European Conference on Computer Vision (ECCV), pages 392–405. (2010)
    [13] Reddy, K. K. and Shah, M. Recognizing 50 human action categories of web videos. Machine Vision and Applications, 24(5):971–981. (2013)
    [14] Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., and Serre, T. HMDB: a large video database for human motion recognition. In IEEE International Conference on Computer Vision (ICCV). (2011)
    [15] Rodriguez, M., Orrite, C., Medrano, C., and Makris, D. One-shot learning of human activity with an MAP adapted GMM and simplex-HMM. IEEE Transactions on Cybernetics, PP(99):1–12. (2016)
    [16] Rodriguez, M., Orrite, C., Medrano, C., and Makris, D. A time flexible kernel framework for video-based activity recognition. Image and Vision Computing, 48–49:26–36. (2016)
    [17] Rodriguez, M., Orrite, C., Medrano, C., and Makris, D. Extended Study for One-shot Learning of Human Activity by a Simplex-HMM. IEEE Transactions on Cybernetics. (submitted)
    [18] Orrite, C., Rodriguez, M., and Medrano, C. One-shot learning of temporal sequences using a distance dependent Chinese Restaurant Process. In Proceedings of the 23rd International Conference Pattern Recognition, ICPR. (December 2016)
    [19] Rodriguez, M., Medrano, C., Herrero, E., and Orrite, C. Spectral Clustering Using Friendship Path Similarity. In Proceedings of the 7th Iberian Conference, IbPRIA. (June 2015)
    [20] Orrite, C., Soler, J., Rodriguez, M., Herrero, E., and Casas, R. Image-based location recognition and scenario modelling. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications, VISAPP. (March 2015)
    [21] Castán, D., Rodríguez, M., Ortega, A., Orrite, C., and Lleida, E. Vivolab and cvlab - MediaEval 2014: Violent scenes detection affect task. In Working Notes Proceedings of the MediaEval. (October 2014)
    [22] Orrite, C., Rodriguez, M., Herrero, E., Rogez, G., and Velastin, S. A. Automatic segmentation and recognition of human actions in monocular sequences. In Proceedings of the 22nd International Conference Pattern Recognition, ICPR. (August 2014)
    [23] Rodriguez, M., Medrano, C., Herrero, E., and Orrite, C. Transfer learning of human poses for action recognition. In 4th International Workshop of Human Behavior Understanding (HBU). (October 2013)
    [24] Rodriguez, M., Orrite, C., and Medrano, C. Human action recognition with limited labelled data. In Actas del III Workshop de Reconocimiento de Formas y Analisis de Imagenes, WSRFAI. (September 2013)
    [25] Orrite, C., Monforte, P., Rodriguez, M., and Herrero, E. Human Action Recognition under Partial Occlusions. In Proceedings of the 6th Iberian Conference, IbPRIA. (June 2013)
    [26] Orrite, C., Rodriguez, M., and Montañes, M. One sequence learning of human actions. In 2nd International Workshop of Human Behavior Understanding (HBU). (November 2011)
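To make the generative, one-model-per-class idea in this abstract concrete, the sketch below scores an observation sequence under several Hidden Markov Models with the forward algorithm and picks the class whose model gives the highest likelihood. It is a plain discrete HMM with hand-set toy parameters, not the thesis's one-shot learning method; the class names, parameter values and helper names are hypothetical.

```python
import math


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """log p(obs | model) for a discrete HMM, computed in the log domain."""
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][o])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def classify(obs, models):
    """Return the name of the model with the highest sequence likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Toy models: (start probabilities, transition matrix, emission matrix).
# "wave" tends to alternate between symbols 0 and 1; "stand" mostly emits 0.
models = {
    "wave": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], [[0.9, 0.1], [0.1, 0.9]]),
    "stand": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
}

print(classify([0, 1, 0, 1, 0, 1], models))
print(classify([0, 0, 0, 0, 0, 0], models))
```

Working in the log domain is one common way to limit the numerical problems that long sequences cause; the instabilities the abstract mentions concern parameter estimation from scarce data, which this sketch does not address.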

    Deep Learning for Crowd Anomaly Detection

    Today, public areas across the globe are monitored by an increasing number of surveillance cameras. This widespread usage produces an ever-growing volume of data that cannot realistically be examined in real time. Therefore, efforts to understand crowd dynamics have given rise to automatic systems for the detection of anomalies in crowds. This thesis explores the methods used across the literature for this purpose, with a focus on those that apply dense optical flow in a feature extraction stage to the crowd anomaly detection problem. To this end, five different deep learning architectures are trained using optical flow maps estimated by three deep learning-based techniques. More specifically, a 2D convolutional network, a 3D convolutional network, an LSTM-based convolutional recurrent network, a pre-trained variant of the latter, and a ConvLSTM-based autoencoder are trained using both regular frames and optical flow maps estimated by LiteFlowNet3, RAFT, and GMA on the UCSD Pedestrian 1 dataset. The experimental results show that, while prone to overfitting, the use of optical flow maps may improve the performance of supervised spatio-temporal architectures.

    Unusual event detection in real-world surveillance applications

    Given the near-ubiquity of CCTV, there is significant ongoing research effort to apply image and video analysis methods together with machine learning techniques towards autonomous analysis of such data sources. However, traditional approaches to scene understanding remain dependent on training based on human annotations that need to be provided for every camera sensor. In this thesis, we propose an unusual event detection and classification approach which is applicable to real-world visual monitoring applications. The goal is to infer the usual behaviours in the scene and to judge the normality of the scene on the basis of the model created. The first requirement for the system is that it should not demand annotated data for training. Annotation of the data is a laborious task, and it is not feasible in practice to annotate video data for each camera as an initial stage of event detection. Furthermore, even obtaining training examples for the unusual event class is challenging due to the rarity of such events in video data. Another requirement for the system is online generation of results. In surveillance applications, it is essential to generate real-time results to allow a swift response by a security operator to prevent harmful consequences of unusual and antisocial events. The online learning capabilities also mean that the model can be continuously updated to accommodate natural changes in the environment. The third requirement for the system is the ability to run the process indefinitely. These requirements are necessary for real-world surveillance applications, and approaches that conform to them need to be investigated. This thesis investigates unusual event detection methods that conform to real-world requirements through theoretical and experimental study of machine learning and computer vision algorithms.
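The three requirements listed in this abstract (no annotations, online results, indefinite operation) can be illustrated with a very small sketch: a running mean and variance of a scalar scene feature, updated with Welford's algorithm, with a z-score test for unusual values. This is only an assumed stand-in and not the method studied in the thesis; the class name, the feature, the threshold and the warm-up length are hypothetical.

```python
import math


class OnlineNormalityModel:
    """Running mean/variance of one scalar feature (Welford's algorithm).

    Memory use is constant, so the model can run indefinitely, and it needs
    no labels: whatever is observed most of the time defines "usual".
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean
        self.threshold = threshold  # z-score above which a value is unusual
        self.warmup = warmup  # observations required before flagging

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_unusual(self, x):
        if self.n < self.warmup:
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return x != self.mean
        return abs(x - self.mean) / std > self.threshold

    def observe(self, x):
        """Score the new value first, then fold it into the model."""
        flagged = self.is_unusual(x)
        self.update(x)
        return flagged


model = OnlineNormalityModel()
# Hypothetical per-frame motion magnitudes cycling through 0.9, 1.0, 1.1.
for i in range(30):
    model.observe(1.0 + 0.1 * ((i % 3) - 1))

print(model.is_unusual(1.05), model.is_unusual(5.0))
```

Because every observation is folded into the model, including flagged ones, the notion of "usual" drifts with the scene; whether unusual samples should be excluded from the update is a design choice this sketch leaves open.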

    Non-Gaussian data modeling with hidden Markov models

    In 2015, 2.5 quintillion bytes of data were generated daily worldwide, of which 90% were unstructured data that do not follow any pre-defined model. These data can be found in a great variety of formats, including text, images, audio tracks, and videos. With appropriate techniques, this massive amount of data is a goldmine from which one can extract a variety of meaningful embedded information. Among those techniques, machine learning algorithms allow multiple processing possibilities, from compact data representation, to data clustering, classification, analysis, and synthesis, to the detection of outliers. Data modeling is the first step for performing any of these tasks, and the accuracy and reliability of this initial step are thus crucial for subsequently building up a complete data processing framework. The principal motivation behind my work is the over-use of the Gaussian assumption for data modeling in the literature. Though this assumption is probably the best to make when no information about the data to be modeled is available, in most cases studying a few data properties would make other distributions a better assumption. In this thesis, I focus on proportional data, which are most commonly known in the form of histograms and which naturally arise in a number of situations, such as in bag-of-words methods. These data are non-Gaussian, and modeling them with distributions belonging to the Dirichlet family, which share common properties, is expected to be more accurate. The models I focus on are hidden Markov models, well known for their ability to handle dynamic, ordered, multivariate data easily. They have been shown to be very effective in numerous fields and applications over the last 30 years, and have become a cornerstone of speech processing in particular. Despite their extensive use in almost all computer vision areas, they are still mainly suited for Gaussian data modeling.
Here I theoretically derive different approaches for learning hidden Markov models based on mixtures of Dirichlet, generalized Dirichlet, and Beta-Liouville distributions, and on mixed data, and for applying them to real-world situations. Expectation-Maximization and variational learning approaches are studied and compared over several data sets, specifically for the task of detecting and localizing unusual events. Hybrid HMMs are proposed to model mixed data with the goal of detecting changes in satellite images corrupted by different types of noise. Finally, several parametric distances for comparing Dirichlet and generalized Dirichlet-based HMMs are proposed and extensively tested to assess their robustness. My experimental results show situations in which such models are worth using, and also reveal their strengths and limitations.
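As a small illustration of why the Dirichlet family suits proportional data such as normalised histograms, the sketch below evaluates the standard Dirichlet log-density, log p(x | alpha) = log Gamma(sum_d alpha_d) - sum_d log Gamma(alpha_d) + sum_d (alpha_d - 1) log x_d, for a vector on the probability simplex. It covers only a single Dirichlet component, not the mixture-based HMM emissions studied in the thesis; the function names and the smoothing constant are hypothetical.

```python
import math


def normalise_histogram(counts, eps=1e-6):
    """Turn raw bin counts into strictly positive proportions summing to 1.

    The small eps (an assumption of this sketch) avoids log(0) for empty bins.
    """
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def dirichlet_log_pdf(x, alpha):
    """Log-density of a Dirichlet(alpha) distribution at the point x."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(v <= 0 for v in x) or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must be strictly positive and sum to 1")
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))


# A hypothetical 3-bin bag-of-words histogram.
x = normalise_histogram([3, 0, 1])

# The same histogram scored under two hypothetical parameter settings:
# one concentrated on the first bin, one concentrated on the last bin.
print(dirichlet_log_pdf(x, [8.0, 1.0, 3.0]))
print(dirichlet_log_pdf(x, [1.0, 1.0, 8.0]))
```

Unlike a Gaussian, this density is defined only on the simplex, so it never assigns probability mass to negative proportions or to vectors that do not sum to one.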

    Video based detection of normal and anomalous behaviour of individuals

    This PhD research has proposed novel computer vision and machine learning algorithms for the problem of video-based detection of anomalous events involving individuals. Several varieties of Hidden Markov Models were designed to model the temporal and spatial causalities of crowd behaviour. A Markov Random Field on top of a Gaussian Mixture Model is proposed to incorporate spatial context information during classification. Discriminative conditional random field methods are also proposed. Novel features are proposed to extract motion and appearance information. Most of the proposed approaches comprehensively outperformed other techniques on publicly available datasets at the time the resulting publications appeared.

    Intelligent Transportation Related Complex Systems and Sensors

    Built around innovative services related to different modes of transport and traffic management, intelligent transport systems (ITS) are being widely adopted worldwide to improve the efficiency and safety of the transportation system. They enable users to be better informed and to make safer, more coordinated, and smarter decisions on the use of transport networks. Current ITSs are complex systems, made up of several components or sub-systems characterized by time-dependent interactions among themselves. Some examples of these transportation-related complex systems include road traffic sensors, autonomous/automated cars, smart cities, smart sensors, virtual sensors, traffic control systems, smart roads, logistics systems, smart mobility systems, and many others that are emerging from niche areas. The efficient operation of these complex systems requires: i) efficient solutions to the issues of the sensors and actuators used to capture and control the physical parameters of these systems, as well as the quality of the data collected from them; ii) tackling complexities using simulations and analytical modelling techniques; and iii) applying optimization techniques to improve the performance of these systems. This collection includes twenty-four papers, which cover scientific concepts, frameworks, architectures and various other ideas on analytics, trends and applications of transportation-related data.

    Semantic Spaces for Video Analysis of Behaviour

    PhD thesis. There is ever-growing interest from the computer vision community in human behaviour analysis based on visual sensors. This interest generally includes: (1) behaviour recognition: given a video clip or specific spatio-temporal volume of interest, classify it into one or more of a set of pre-defined categories; (2) behaviour retrieval: given a video or textual description as a query, search for video clips with related behaviour; (3) behaviour summarisation: given a number of video clips, summarise representative and distinct behaviours. Although countless efforts have been dedicated to the problems mentioned above, few works have attempted to analyse human behaviours in a semantic space. In this thesis, we define semantic spaces as a collection of high-dimensional Euclidean spaces in which semantically meaningful events, e.g. an individual word, phrase or visual event, can be represented as vectors or distributions, which are referred to as semantic representations. Within a semantic space, texts and visual events can be quantitatively compared by inner product, distance and divergence. The introduction of semantic spaces brings many benefits for visual analysis. For example, discovering semantic representations for visual data can facilitate semantically meaningful video summarisation, retrieval and anomaly detection. A semantic space can also seamlessly bridge categories and datasets that are conventionally treated independently. This encourages the sharing of data and knowledge across categories and even datasets to improve recognition performance and reduce labelling effort. Moreover, a semantic space can generalise a learned model beyond known classes, which is usually referred to as zero-shot learning. Nevertheless, discovering such a semantic space is non-trivial because: (1) a semantic space is hard to define manually. Humans always have a good sense of the semantic relatedness between visual and textual instances.
However, a measurable and finite semantic space can be difficult to construct with limited manual supervision. As a result, the semantic space is instead constructed from data and learned in an unsupervised manner; (2) it is hard to build a universal semantic space, i.e. the space is always context dependent, so it is important to build the semantic space upon selected data such that it is always meaningful within the context. Even with a well-constructed semantic space, challenges remain, including: (3) how to represent visual instances in the semantic space; and (4) how to mitigate the misalignment of visual feature and semantic spaces across categories and even datasets when knowledge or data are generalised. This thesis tackles the above challenges by exploiting data from different sources and building contextual semantic spaces with which data and knowledge can be transferred and shared to facilitate general video behaviour analysis. To demonstrate the efficacy of semantic spaces for behaviour analysis, we focus on real-world problems including surveillance behaviour analysis, zero-shot human action recognition and zero-shot crowd behaviour recognition, with techniques specifically tailored to the nature of each problem. Firstly, for video surveillance scenes, we propose to discover semantic representations from the visual data in an unsupervised manner, motivated by the wide availability of unlabelled visual data in surveillance systems. By representing visual instances in the semantic space, data and annotations can be generalised to new events and even new surveillance scenes. Specifically, to detect abnormal events, this thesis studies a geometrical alignment between the semantic representations of events across scenes. Semantic actions can thus be transferred to new scenes, and abnormal events can be detected in an unsupervised way.
To model multiple surveillance scenes simultaneously, we show how to learn a shared semantic representation across a group of semantically related scenes through a multi-layer clustering of scenes. With multi-scene modelling, we show how to improve surveillance tasks including scene activity profiling and understanding, cross-scene query-by-example, behaviour classification, and video summarisation. Secondly, to avoid extremely costly and ambiguous video annotation, we investigate how to generalise recognition models learned from known categories to novel ones, which is often termed zero-shot learning. To exploit the limited human supervision, e.g. category names, we construct the semantic space via a word-vector representation trained on a large textual corpus in an unsupervised manner. The representation of a visual instance in the semantic space is obtained by learning a visual-to-semantic mapping. We observe that blindly applying the mapping learned from known categories to novel categories can cause bias and degrade performance, a problem termed domain shift. To solve this problem, we employ techniques including semi-supervised learning, self-training, hubness correction, multi-task learning and domain adaptation. Combined, these methods achieve state-of-the-art performance in the zero-shot human action recognition task. Lastly, we study the possibility of re-using known and manually labelled semantic crowd attributes to recognise rare and unknown crowd behaviours. This task is termed zero-shot crowd behaviour recognition. Crucially, we point out that, given the multi-labelled nature of semantic crowd attributes, zero-shot recognition can be improved by exploiting the co-occurrence between attributes. To summarise, this thesis studies methods for analysing video behaviours and demonstrates that exploring semantic spaces for video analysis is advantageous and, more importantly, enables multi-scene analysis and zero-shot learning beyond conventional learning strategies.
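The zero-shot pipeline outlined in this abstract (map visual features into a semantic space, then compare with word vectors of category names) can be sketched in a few lines. The example assumes a linear visual-to-semantic mapping that has already been learned and uses cosine similarity for the comparison; the mapping matrix, the two-dimensional "word vectors" and the class names are all made up for illustration and do not come from the thesis. None of the domain-shift remedies the abstract lists are included.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def project(features, mapping):
    """Apply a linear visual-to-semantic mapping (one matrix row per semantic dimension)."""
    return [sum(w * f for w, f in zip(row, features)) for row in mapping]


def zero_shot_classify(features, mapping, class_vectors):
    """Pick the unseen class whose name embedding is closest to the projected video."""
    z = project(features, mapping)
    return max(class_vectors, key=lambda name: cosine(z, class_vectors[name]))


# Hypothetical mapping, assumed to have been learned on known categories only.
mapping = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
]

# Hypothetical word vectors for categories never seen during training.
unseen_classes = {
    "running": [1.0, 0.1],
    "swimming": [0.1, 1.0],
}

video_features = [0.9, 0.2, 0.1]
predicted = zero_shot_classify(video_features, mapping, unseen_classes)
print(predicted)
```

Only the category names (through their word vectors) are needed for the new classes, which is what lets recognition extend to categories without any labelled video.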

    A Deep Learning Approach for Spatiotemporal-Data-Driven Traffic State Estimation

    The past decade witnessed rapid developments in traffic data sensing technologies in the form of roadside detector hardware, vehicle on-board units, and pedestrian wearable devices. The growing magnitude and complexity of the available traffic data have fueled the demand for data-driven models that can handle large-scale inputs. In the recent past, deep-learning-powered algorithms have become the state of the art for various data-driven applications. In this research, three applications of deep learning algorithms for traffic state estimation were investigated. Firstly, network-wide traffic parameter estimation was explored. An attention-based multi-encoder-decoder (Att-MED) neural network architecture was proposed and trained to predict freeway traffic speed up to 60 minutes ahead. Att-MED was designed to encode multiple traffic input sequences: short-term, daily, and weekly cyclic behavior. The proposed network produced an average prediction accuracy of 97.5%, which was superior to the compared baseline models. In addition to improving the output performance, the model's attention weights enhanced the model's interpretability. This research additionally explored the utility of low-penetration connected probe-vehicle data for network-wide traffic parameter estimation and prediction on freeways. A novel sequence-to-sequence recurrent graph network (Seq2Seq GCN-LSTM) was designed. It was then trained to estimate and predict traffic volume and speed for a 60-minute future time horizon. The proposed methodology generated volume and speed predictions with an average accuracy of 90.5% and 96.6%, respectively, outperforming the investigated baseline models. The proposed method demonstrated robustness against perturbations caused by the probe vehicle fleet's low penetration rate. Secondly, the application of deep learning for road weather detection using roadside CCTVs was investigated.
A Vision Transformer (ViT) was trained for simultaneous rain and road surface condition classification. Next, a Spatial Self-Attention (SSA) network was designed to consume the individual detection results, interpret the spatial context, and modify the collective detection output accordingly. The sequential module improved the accuracy of the stand-alone Vision Transformer as measured by the F1-score, raising the total accuracy for both tasks to 96.71% and 98.07%, respectively. Thirdly, a real-time video-based traffic incident detection algorithm was developed to enhance the utilization of the existing roadside CCTV network. The methodology automatically identified the main road regions in video scenes and investigated static vehicles around those areas. The developed algorithm was evaluated using a dataset of roadside videos. The incidents were detected with 85.71% sensitivity and an 11.10% false alarm rate, with an average delay of 27.53 seconds. In general, the research proposed in this dissertation maximizes the utility of pre-existing traffic infrastructure and emerging probe traffic data. It additionally demonstrates the capability of deep learning algorithms to model complex spatiotemporal traffic data. This research illustrates that advances in the deep learning field continue to have high applicability potential in the traffic state estimation domain.

    Carried baggage detection and recognition in video surveillance with foreground segmentation

    Security cameras installed in public spaces or in private organizations continuously record video data with the aim of detecting and preventing crime. For that reason, video content analysis applications, either for real-time (i.e. analytic) or post-event (i.e. forensic) analysis, have gained high interest in recent years. In this thesis, the primary focus is on two key aspects of video analysis: reliable moving object segmentation, and carried object detection and identification. A novel moving object segmentation scheme based on background subtraction is presented. The scheme relies on background modelling based on multi-directional gradient and phase congruency. As a post-processing step, the detected foreground contours are refined by classifying the edge segments as belonging to either the foreground or the background. In addition, a contour completion technique based on anisotropic diffusion is introduced for the first time in this area. The proposed method targets cast shadow removal, invariance to gradual illumination change, and closed contour extraction. A state-of-the-art carried object detection method is employed as a benchmark algorithm. This method includes silhouette analysis by comparing human temporal templates with unencumbered human models. The implementation aspects of the algorithm are improved by automatically estimating the viewing direction of the pedestrian and are extended by a carried luggage identification module. As the temporal template is a frequency template and the information that it provides is not sufficient, a colour temporal template is introduced. The standard steps followed by the state-of-the-art algorithm are approached from a different perspective, extended by colour information, resulting in more accurate carried object segmentation. The experiments conducted in this research show that the proposed closed foreground segmentation technique attains all the aforementioned goals.
The incremental improvements applied to the state-of-the-art carried object detection algorithm revealed the full potential of the scheme. The experiments demonstrate that the proposed carried object detection algorithm can outperform the state-of-the-art method.