1,349 research outputs found

    Convolutional Neural Network on Three Orthogonal Planes for Dynamic Texture Classification

    Dynamic Textures (DTs) are sequences of images of moving scenes, such as smoke, vegetation and fire, that exhibit certain stationarity properties in time. The analysis of DTs is important for recognition, segmentation, synthesis and retrieval in a range of applications including surveillance, medical imaging and remote sensing. Deep learning methods have shown impressive results and are now the state of the art for a wide range of computer vision tasks, including image and video recognition and segmentation. In particular, Convolutional Neural Networks (CNNs) have recently proven to be well suited for texture analysis, with a design similar to a filter bank approach. In this paper, we develop a new approach to DT analysis based on a CNN method applied on three orthogonal planes xy, xt and yt. We train CNNs on spatial frames and temporal slices extracted from the DT sequences and combine their outputs to obtain a competitive DT classifier. Our results on a wide range of commonly used DT classification benchmark datasets prove the robustness of our approach. Significant improvement over the state of the art is shown on the larger datasets. Comment: 19 pages, 10 figures
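The core idea above, analysing a video volume along its xy, xt and yt planes, amounts to slicing a (T, H, W) array three ways. A minimal sketch (illustrative indexing only, not the authors' code):

```python
import numpy as np

def orthogonal_planes(video, t, y, x):
    """Extract the three orthogonal planes of a (T, H, W) grayscale video:
    xy is a spatial frame; xt and yt are temporal slices fed to the
    plane-specific CNNs in the approach described above."""
    xy = video[t, :, :]   # spatial frame at time t      -> shape (H, W)
    xt = video[:, y, :]   # image row y traced over time -> shape (T, W)
    yt = video[:, :, x]   # image col x traced over time -> shape (T, H)
    return xy, xt, yt
```

In practice many (t, y, x) positions would be sampled per sequence and the per-plane CNN outputs combined into a single classifier, as the abstract describes.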

    Review of Person Re-identification Techniques

    Person re-identification across different surveillance cameras with disjoint fields of view has become one of the most interesting and challenging subjects in the area of intelligent video surveillance. Although several methods have been developed and proposed, certain limitations and unresolved issues remain. In all of the existing re-identification approaches, feature vectors are extracted from segmented still images or video frames. Different similarity or dissimilarity measures have been applied to these vectors. Some methods have used simple constant metrics, whereas others have utilised models to obtain optimised metrics. Some have created models based on local colour or texture information, and others have built models based on the gait of people. In general, the main objective of all these approaches is to achieve a higher accuracy rate and lower computational costs. This study summarises several developments in the recent literature and discusses the various available methods used in person re-identification. Specifically, their advantages and disadvantages are mentioned and compared. Comment: Published 201
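The contrast the survey draws between "simple constant metrics" and "optimised metrics" can be made concrete: a fixed Euclidean distance versus a Mahalanobis-style distance under a learned matrix M. A sketch under those assumptions (M here is a placeholder for whatever a metric-learning method would produce):

```python
import numpy as np

def euclidean_dist(a, b):
    """Constant metric: plain Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(a - b))

def learned_metric_dist(a, b, M):
    """Optimised metric: distance under a positive semi-definite matrix M,
    the form used by Mahalanobis-style metric-learning re-ID methods."""
    d = a - b
    return float(np.sqrt(d @ M @ d))
```

With M set to the identity matrix the learned metric reduces to the Euclidean one, which is a useful sanity check.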

    Moving cast shadows detection methods for video surveillance applications

    Moving cast shadows are a major concern for a broad range of vision-based surveillance applications because they severely complicate the object classification task. Several shadow detection methods have been reported in the literature in recent years. They are mainly divided into two domains: one usually works with static images, whereas the other uses image sequences, namely video content. Although both cases can be analysed analogously, they differ in their fields of application. In the first case, shadow detection methods can be exploited to obtain additional geometric and semantic cues about the shape and position of the casting object ('shape from shadows') as well as the localization of the light source. In the second case, the main purpose is usually change detection, scene matching or surveillance (usually in a background subtraction context). Shadows can in fact negatively alter the shape and color of the target object and therefore degrade the performance of scene analysis and interpretation in many applications. This chapter mainly reviews shadow detection methods, as well as their taxonomies, related to the second case, thus targeting shadows associated with moving objects (moving shadows). Peer Reviewed
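In the background-subtraction setting the chapter focuses on, a common family of detectors flags a foreground pixel as shadow when it darkens the background by a bounded ratio on the brightness channel. A minimal sketch of that ratio test; the thresholds `alpha` and `beta` are illustrative values, not taken from any specific method in the review:

```python
import numpy as np

def shadow_mask(frame_v, bg_v, alpha=0.4, beta=0.9):
    """Chromaticity-style moving-shadow test on the value (brightness)
    channel: a pixel is a shadow candidate if its brightness is a
    fraction of the background brightness within (alpha, beta).
    Pixels darker than alpha*bg are treated as true foreground."""
    ratio = frame_v / np.maximum(bg_v, 1e-6)  # avoid division by zero
    return (ratio >= alpha) & (ratio <= beta)
```

Real methods add hue/saturation constraints and spatial post-processing, but this captures the core assumption: a cast shadow attenuates brightness without replacing the surface.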

    Novel Texture-based Probabilistic Object Recognition and Tracking Techniques for Food Intake Analysis and Traffic Monitoring

    More complex image understanding algorithms are increasingly practical in a host of emerging applications. Object tracking has value in surveillance and data farming, and object recognition has applications in surveillance, data management, and industrial automation. In this work we introduce an object recognition application in automated nutritional intake analysis and a tracking application intended for surveillance in low-quality videos. Automated food recognition is useful for personal health applications as well as nutritional studies used to improve public health or inform lawmakers. We introduce a complete, end-to-end system for automated food intake measurement. Images taken by a digital camera are analyzed, plates and food are located, food type is determined by a neural network, the distance and angle of the food are determined and 3D volume estimated, the results are cross-referenced with a nutritional database, and before and after meal photos are compared to determine nutritional intake. We compare against contemporary systems and provide detailed experimental results of our system's performance. Our tracking systems consider the problem of car and human tracking on potentially very low-quality surveillance videos, from fixed cameras or high-flying unmanned aerial vehicles (UAVs). Our agile framework switches among different simple trackers to find the most applicable tracker based on the object and video properties. Our MAPTrack is an evolution of the agile tracker that uses soft switching to optimize between multiple pertinent trackers, and tracks objects based on motion, appearance, and positional data. In both cases we provide comparisons against trackers intended for similar applications, i.e., trackers that stress robustness in bad conditions, with competitive results.
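The "soft switching" idea in the tracking work, blending several simple trackers rather than hard-selecting one, can be illustrated as a confidence-weighted average of position estimates. A sketch only; MAPTrack's actual weighting scheme is not specified in this abstract:

```python
import numpy as np

def soft_switch(predictions, confidences):
    """Blend the (x, y) position estimates of several simple trackers,
    weighting each tracker by its normalized confidence score."""
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    # weighted sum over trackers -> single fused (x, y) estimate
    return (np.asarray(predictions, dtype=float) * w[:, None]).sum(axis=0)
```

Hard switching is the special case where one tracker's confidence dominates and the others' weights go to zero.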

    Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions

    3D action recognition has broad applications in human-computer interaction and intelligent surveillance. However, recognizing similar actions remains challenging, since previous literature fails to capture motion and shape cues effectively from noisy depth data. In this paper, we propose a novel two-layer Bag-of-Visual-Words (BoVW) model, which suppresses noise disturbances and jointly encodes both motion and shape cues. First, background clutter is removed by a background modeling method designed for depth data. Then, motion and shape cues are jointly used to generate robust and distinctive spatial-temporal interest points (STIPs): motion-based STIPs and shape-based STIPs. In the first layer of our model, a multi-scale 3D local steering kernel (M3DLSK) descriptor is proposed to describe local appearances of cuboids around motion-based STIPs. In the second layer, a spatial-temporal vector (STV) descriptor is proposed to describe the spatial-temporal distributions of shape-based STIPs. Using the BoVW model, motion and shape cues are combined to form a fused action representation. Our model performs favorably compared with common STIP detection and description methods. Thorough experiments verify that our model is effective in distinguishing similar actions and robust to background clutter, partial occlusions and pepper noise.
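The BoVW encoding that both layers of the model rely on reduces to nearest-codeword assignment plus a normalized histogram. A minimal sketch of that step (hard assignment; the descriptors and codebook here are toy placeholders, not the M3DLSK/STV features):

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Bag-of-Visual-Words encoding: assign each local descriptor to its
    nearest codeword (Euclidean distance) and return the normalized
    histogram of codeword occurrences as the sequence representation."""
    # pairwise distances, shape (n_descriptors, n_codewords)
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)                       # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                           # L1-normalize
```

In the paper's two-layer setup, one such representation would be built per cue (motion, shape) and the two fused into the final action descriptor.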

    Data-driven model development in environmental geography - Methodological advancements and scientific applications

    Capturing spatially continuous data and spatio-temporal dynamics is a core research focus of environmental geography. This goal requires modelling methods that allow spatio-temporal statements to be derived from limited field data. The complexity of environmental systems calls for modelling strategies that can account for arbitrary relationships between a large number of potential predictors. This requirement demands a paradigm shift from parametric towards non-parametric, data-driven model development, a shift further reinforced by the increasing availability of geodata. In this context, machine learning methods have proven to be an important tool for capturing patterns in non-linear and complex systems. The growing popularity of machine learning in scientific journals and the development of convenient software packages increasingly create the false impression that these methods are easy to apply; in reality, their complexity can only be controlled through comprehensive methodological expertise. This problem applies in particular to geodata, which exhibit special characteristics, above all spatial dependence, that set them apart from "ordinary" data, yet these characteristics have so far been largely ignored in machine learning applications. This thesis investigates the potential and the sensitivity of machine learning in environmental geography. In this context, a series of machine learning applications spanning a broad spectrum of environmental geography was published. The individual contributions are united by the overarching hypothesis that data-driven modelling strategies only yield an information gain and robust spatio-temporal results if the characteristics of geographic data are taken into account.
Beyond this overarching methodological focus, each application aims to deliver new domain insights in its respective research field through adequately applied methods. In the course of this work, a variety of relevant environmental monitoring products were developed. The results make clear that both deep domain expertise and methodological expertise are indispensable to advance the field of data-driven environmental geography. The thesis demonstrates for the first time the relevance of spatial overfitting in geographic learning applications and lays out its effects on model results. To counter this problem, a new model development method adapted to geodata is introduced, which yields clearly improved results. Finally, this thesis should be understood as an appeal to think beyond standard applications of machine learning, as it proves that applying standard procedures to geodata leads to severe overfitting and misinterpretation of results. Only when the properties of geographic data are taken into account does machine learning offer a powerful tool for delivering scientifically reliable results for environmental geography.
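The spatial-overfitting argument above hinges on how models are validated: random splits leak spatially autocorrelated samples between train and test folds. A common remedy is leave-location-out cross-validation, sketched here with plain numpy (the grouping variable is assumed to encode spatial blocks or sampling locations):

```python
import numpy as np

def spatial_cv_splits(groups):
    """Leave-location-out cross-validation: hold out all samples from one
    spatial group at a time, so each test fold is spatially independent
    of its training data. Yields (train_indices, test_indices) pairs."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        test = np.where(groups == g)[0]   # every sample from this location
        train = np.where(groups != g)[0]  # everything else
        yield train, test
```

Comparing model skill under random versus spatial splits is exactly the kind of check the thesis argues exposes inflated performance estimates on geodata.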

    Automatic object classification for surveillance videos.

    PhD thesis. The recent popularity of surveillance video systems, especially in urban scenarios, demands the development of visual techniques for monitoring purposes. A primary step towards intelligent surveillance video systems consists of automatic object classification, which still remains an open research problem and the keystone for the development of more specific applications. Typically, object representation is based on inherent visual features. However, psychological studies have demonstrated that human beings can routinely categorise objects according to their behaviour. The gap between the features automatically extracted by a computer, such as appearance-based features, and the concepts unconsciously perceived by human beings but unattainable for machines, such as behaviour features, is commonly known as the semantic gap. Consequently, this thesis proposes to narrow the semantic gap and bring together machine and human understanding for object classification. A Surveillance Media Management framework is proposed to automatically detect and classify objects by analysing both the physical properties inherent in their appearance (machine understanding) and the behaviour patterns that require a higher level of understanding (human understanding). Finally, a probabilistic multimodal fusion algorithm bridges the gap, performing an automatic classification that considers both machine and human understanding. The performance of the proposed Surveillance Media Management framework has been thoroughly evaluated on outdoor surveillance datasets. The experiments conducted demonstrated that the combination of machine and human understanding substantially enhanced the object classification performance. Finally, the inclusion of human reasoning and understanding provides the essential information to bridge the semantic gap towards smart surveillance video systems.
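One simple form a probabilistic fusion of the two modalities could take is a weighted geometric mean of the appearance-based and behaviour-based class posteriors. This is a generic sketch, not the thesis' actual fusion algorithm:

```python
import numpy as np

def fuse_posteriors(p_appearance, p_behaviour, w=0.5):
    """Fuse two class-posterior vectors (appearance cues vs. behaviour
    cues) with a weighted geometric mean, then renormalize so the
    result is again a probability distribution over classes."""
    p = (np.asarray(p_appearance) ** w) * (np.asarray(p_behaviour) ** (1 - w))
    return p / p.sum()
```

The weight `w` trades off trust between the two modalities; `w=0.5` treats machine and human understanding as equally reliable.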