Convolutional Neural Network on Three Orthogonal Planes for Dynamic Texture Classification
Dynamic Textures (DTs) are sequences of images of moving scenes that exhibit
certain stationarity properties in time such as smoke, vegetation and fire. The
analysis of DT is important for recognition, segmentation, synthesis or
retrieval for a range of applications including surveillance, medical imaging
and remote sensing. Deep learning methods have shown impressive results and are
now the new state of the art for a wide range of computer vision tasks
including image and video recognition and segmentation. In particular,
Convolutional Neural Networks (CNNs) have recently proven to be well suited for
texture analysis with a design similar to a filter bank approach. In this
paper, we develop a new approach to DT analysis based on a CNN method applied
on three orthogonal planes xy, xt and yt. We train CNNs on spatial frames
and temporal slices extracted from the DT sequences and combine their outputs
to obtain a competitive DT classifier. Our results on a wide range of commonly
used DT classification benchmark datasets prove the robustness of our approach.
Significant improvement of the state of the art is shown on the larger
datasets.
Comment: 19 pages, 10 figures
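The three-orthogonal-planes idea above can be illustrated by how the slices are extracted from a video tensor. The sketch below is an assumption about the preprocessing step (the slice positions and tensor layout are illustrative, not taken from the paper):

```python
import numpy as np

def orthogonal_slices(video):
    """Extract one example slice per orthogonal plane from a video
    tensor shaped (t, y, x): a spatial xy frame, a temporal xt slice
    (fixed row), and a temporal yt slice (fixed column). In practice
    many such slices would be sampled and fed to per-plane CNNs."""
    t, h, w = video.shape
    xy = video[t // 2, :, :]   # spatial frame at the middle time step
    xt = video[:, h // 2, :]   # temporal slice at the middle row
    yt = video[:, :, w // 2]   # temporal slice at the middle column
    return xy, xt, yt

video = np.random.rand(16, 64, 48)  # 16 frames of 64x48 pixels
xy, xt, yt = orthogonal_slices(video)
print(xy.shape, xt.shape, yt.shape)  # (64, 48) (16, 48) (16, 64)
```

The per-plane CNN outputs would then be combined (e.g. by averaging class scores) to form the final DT classifier.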
Review of Person Re-identification Techniques
Person re-identification across different surveillance cameras with disjoint
fields of view has become one of the most interesting and challenging subjects
in the area of intelligent video surveillance. Although several methods have
been developed and proposed, certain limitations and unresolved issues remain.
In all of the existing re-identification approaches, feature vectors are
extracted from segmented still images or video frames. Different similarity or
dissimilarity measures have been applied to these vectors. Some methods have
used simple constant metrics, whereas others have utilised models to obtain
optimised metrics. Some have created models based on local colour or texture
information, and others have built models based on the gait of people. In
general, the main objective of all these approaches is to achieve a
higher accuracy rate and lower computational costs. This study summarises
several developments in recent literature and discusses the various available
methods used in person re-identification. Specifically, their advantages and
disadvantages are mentioned and compared.
Comment: Published 201
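The review contrasts simple constant metrics with learned, optimised metrics applied to feature vectors. A minimal sketch of that distinction, with hypothetical feature vectors and a hand-picked weight matrix (not taken from any surveyed method):

```python
import numpy as np

def euclidean(a, b):
    """Plain constant metric: unweighted Euclidean distance."""
    return float(np.linalg.norm(a - b))

def mahalanobis(a, b, M):
    """Learned-metric form: distance under a positive-definite
    matrix M, which metric-learning methods would optimise from
    labelled re-identification pairs."""
    d = a - b
    return float(np.sqrt(d @ M @ d))

a = np.array([0.2, 0.8, 0.1])      # feature vector from camera 1
b = np.array([0.3, 0.6, 0.2])      # feature vector from camera 2
M = np.diag([1.0, 4.0, 1.0])       # illustrative: second feature weighted up
print(euclidean(a, b))
print(mahalanobis(a, b, M))
```

Metric learning replaces the fixed identity weighting with a matrix tuned so that distances between images of the same person shrink and distances between different people grow.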
Moving cast shadows detection methods for video surveillance applications
Moving cast shadows are a major performance concern for a broad range of vision-based surveillance applications because they significantly complicate the object classification task. Several shadow detection methods have been reported in the literature in recent years. They are mainly divided into two domains: one usually works with static images, whereas the other works with image sequences, namely video content. Although both cases can be analysed analogously, they differ in their fields of application. In the first case, shadow detection methods can be exploited to obtain additional geometric and semantic cues about the shape and position of the casting object ('shape from shadows') as well as the localization of the light source. In the second case, the main purpose is usually change detection, scene matching or surveillance (typically in a background subtraction context). Shadows can in fact negatively alter the shape and colour of the target object and therefore degrade the performance of scene analysis and interpretation in many applications. This chapter mainly reviews shadow detection methods, as well as their taxonomies, related to the second case, thus targeting those shadows which are associated with moving objects (moving shadows).
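In the background-subtraction context mentioned above, a widely used family of shadow tests exploits the fact that a cast shadow darkens a pixel's intensity within a bounded ratio while leaving its chromaticity nearly unchanged. The sketch below is a generic heuristic of this kind, not a specific method from the chapter; the thresholds alpha, beta and tau are illustrative:

```python
import numpy as np

def is_shadow(pixel, bg, alpha=0.4, beta=0.9, tau=0.1):
    """Chromaticity-based moving-shadow test: a foreground pixel is
    flagged as shadow if it is darker than the background model by a
    bounded ratio (alpha..beta) while its normalized colour stays
    within tau of the background's."""
    p, b = np.asarray(pixel, float), np.asarray(bg, float)
    lum_p, lum_b = p.sum(), b.sum()
    if lum_b == 0:
        return False
    ratio = lum_p / lum_b                     # darkening ratio
    chroma_diff = np.abs(p / max(lum_p, 1e-6) - b / lum_b).max()
    return bool(alpha <= ratio <= beta and chroma_diff <= tau)

print(is_shadow([40, 42, 38], [80, 84, 76]))   # darker, same chromaticity
print(is_shadow([200, 20, 20], [80, 84, 76]))  # genuinely different colour
```

Pixels passing the test are removed from the foreground mask before object classification, which is precisely where moving shadows would otherwise distort object shape.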
Novel Texture-based Probabilistic Object Recognition and Tracking Techniques for Food Intake Analysis and Traffic Monitoring
More complex image understanding algorithms are increasingly practical in a host of emerging applications. Object tracking has value in surveillance and data farming; and object recognition has applications in surveillance, data management, and industrial automation. In this work we introduce an object recognition application in automated nutritional intake analysis and a tracking application intended for surveillance in low quality videos. Automated food recognition is useful for personal health applications as well as nutritional studies used to improve public health or inform lawmakers. We introduce a complete, end-to-end system for automated food intake measurement. Images taken by a digital camera are analyzed, plates and food are located, food type is determined by a neural network, the distance and angle of the food are determined and 3D volume estimated, the results are cross-referenced with a nutritional database, and before and after meal photos are compared to determine nutritional intake. We compare against contemporary systems and provide detailed experimental results of our system's performance. Our tracking systems consider the problem of car and human tracking in potentially very low quality surveillance videos, from a fixed camera or a high-flying unmanned aerial vehicle (UAV). Our agile framework switches among different simple trackers to find the most applicable tracker based on the object and video properties. Our MAPTrack is an evolution of the agile tracker that uses soft switching to optimize between multiple pertinent trackers, and tracks objects based on motion, appearance, and positional data. In both cases we provide comparisons against trackers intended for similar applications, i.e., trackers that stress robustness in bad conditions, with competitive results.
Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions
3D action recognition has broad applications in human-computer interaction
and intelligent surveillance. However, recognizing similar actions remains
challenging since previous literature fails to capture motion and shape cues
effectively from noisy depth data. In this paper, we propose a novel two-layer
Bag-of-Visual-Words (BoVW) model, which suppresses the noise disturbances and
jointly encodes both motion and shape cues. First, background clutter is
removed by a background modeling method that is designed for depth data. Then,
motion and shape cues are jointly used to generate robust and distinctive
spatial-temporal interest points (STIPs): motion-based STIPs and shape-based
STIPs. In the first layer of our model, a multi-scale 3D local steering kernel
(M3DLSK) descriptor is proposed to describe local appearances of cuboids around
motion-based STIPs. In the second layer, a spatial-temporal vector (STV)
descriptor is proposed to describe the spatial-temporal distributions of
shape-based STIPs. Using the Bag-of-Visual-Words (BoVW) model, motion and shape
cues are combined to form a fused action representation. Our model performs
favorably compared with common STIP detection and description methods. Thorough
experiments verify that our model is effective in distinguishing similar
actions and robust to background clutter, partial occlusions and pepper noise.
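The BoVW encoding at the core of the model above can be sketched concretely: local descriptors are assigned to their nearest visual word and counted into a normalized histogram, and the motion and shape histograms are concatenated into one representation. Descriptor sizes, codebook sizes and the random data below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codebook word
    (hard vector quantization) and return an L1-normalized
    word-count histogram."""
    # Pairwise distances: (n_desc, n_words)
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

rng = np.random.default_rng(0)
motion_desc = rng.random((50, 8))  # stand-in for M3DLSK-style local descriptors
shape_desc = rng.random((30, 4))   # stand-in for STV-style distribution descriptors
motion_cb = rng.random((16, 8))    # codebooks would be learned by clustering
shape_cb = rng.random((16, 4))

# Fused action representation: concatenation of the two histograms
fused = np.concatenate([bovw_histogram(motion_desc, motion_cb),
                        bovw_histogram(shape_desc, shape_cb)])
print(fused.shape)  # (32,)
```

A classifier (e.g. an SVM) would then be trained on such fused histograms, one per action sequence.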
Data-driven model development in environmental geography - Methodological advancements and scientific applications
Capturing spatially continuous data and spatio-temporal dynamics is a core research focus of environmental geography. This goal requires modelling methods that make it possible to derive spatio-temporal statements from limited field data. The complexity of environmental systems in turn demands modelling strategies that can account for arbitrary relationships among a multitude of potential predictors. This requirement calls for a paradigm shift from parametric towards non-parametric, data-driven model development, a shift further reinforced by the increasing availability of geodata.
In this context, machine learning methods have proven to be an important tool for capturing patterns in non-linear and complex systems. The growing popularity of machine learning in scientific journals and the development of convenient software packages increasingly create the false impression that these methods are easy to apply. In reality, their complexity can only be controlled in detail through comprehensive methodological expertise.
This problem applies in particular to geodata, which exhibit special characteristics, above all spatial dependence, that set them apart from "ordinary" data, a fact that has so far been largely ignored in machine learning applications.
This thesis addresses the potential and the sensitivity of machine learning in environmental geography. In this context, a series of machine learning applications across a broad spectrum of environmental geography has been published. The individual contributions share the overarching hypothesis that data-driven modelling strategies only lead to an information gain and to robust spatio-temporal results if the characteristics of geographic data are taken into account. Beyond this overarching methodological focus, each application aims to deliver new domain insights in its respective field of research through adequately applied methods.
Within the scope of this thesis, a variety of relevant environmental monitoring products were developed. The results illustrate that both strong domain expertise and methodological expertise are indispensable for advancing the field of data-driven environmental geography. The thesis demonstrates for the first time the relevance of spatial overfitting in geographic learning applications and describes its effects on model results. To counter this problem, a new model development method adapted to geodata is introduced, yielding clearly improved results.
In conclusion, this thesis should be understood as an appeal to think beyond standard applications of machine learning, as it proves that applying standard procedures to geodata leads to severe overfitting and misinterpretation of the results. Only when the properties of geographic data are taken into account does machine learning offer a powerful tool for delivering scientifically reliable results for environmental geography.
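The spatial-overfitting argument above rests on how models are validated: random train/test splits let spatially autocorrelated neighbours leak between folds. A common countermeasure is to hold out entire locations at a time. The sketch below is a generic leave-location-out splitter, not the thesis's specific method; the site labels are invented for illustration:

```python
import numpy as np

def leave_location_out_splits(locations):
    """Yield (train_idx, test_idx) pairs where each fold holds out
    all samples from one spatial location, so spatially
    autocorrelated neighbours never leak into the training set."""
    locations = np.asarray(locations)
    for loc in np.unique(locations):
        test = np.where(locations == loc)[0]
        train = np.where(locations != loc)[0]
        yield train, test

# Five samples taken at three hypothetical field sites
locations = ["siteA", "siteA", "siteB", "siteB", "siteC"]
for train, test in leave_location_out_splits(locations):
    print(list(train), list(test))
```

Performance estimated this way reflects prediction to genuinely unseen locations, which is usually the question a spatial model is meant to answer.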
Automatic object classification for surveillance videos.
The recent popularity of surveillance video systems, especially in urban
scenarios, demands the development of visual techniques for monitoring purposes.
A primary step towards intelligent surveillance video systems consists of automatic
object classification, which still remains an open research problem and the keystone
for the development of more specific applications.
Typically, object representation is based on the inherent visual features. However,
psychological studies have demonstrated that human beings can routinely categorise
objects according to their behaviour. The gap between the features a computer
can automatically extract, such as appearance-based features, and the concepts
that human beings perceive effortlessly but that remain unattainable for
machines, such as behaviour features, is commonly known as the semantic gap.
Consequently, this thesis proposes to narrow the semantic gap
and bring together machine and human understanding towards object classification.
Thus, a Surveillance Media Management framework is proposed to automatically detect and
classify objects by analysing the physical properties inherent in their appearance
(machine understanding) and the behaviour patterns which require a higher level of
understanding (human understanding). Finally, a probabilistic multimodal fusion
algorithm bridges the gap performing an automatic classification considering both
machine and human understanding.
The performance of the proposed Surveillance Media Management framework
has been thoroughly evaluated on outdoor surveillance datasets. The experiments
conducted demonstrated that the combination of machine and human understanding
substantially enhanced the object classification performance. Finally, the inclusion
of human reasoning and understanding provides the essential information to bridge
the semantic gap towards smart surveillance video systems.
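The probabilistic multimodal fusion described above can be illustrated with a generic sketch: per-class posteriors from the appearance (machine) and behaviour (human-level) models are combined and renormalized. This weighted product-of-experts form is an assumption for illustration, not the thesis's actual fusion algorithm, and the class names and probabilities are invented:

```python
import numpy as np

def fuse_posteriors(p_appearance, p_behaviour, w=0.5):
    """Fuse two per-class posterior vectors with a weighted
    log-linear (product-of-experts) combination and renormalize
    so the result is again a probability distribution."""
    fused = np.power(p_appearance, w) * np.power(p_behaviour, 1.0 - w)
    return fused / fused.sum()

classes = ["person", "car", "bicycle"]
p_app = np.array([0.5, 0.3, 0.2])   # machine understanding (appearance)
p_beh = np.array([0.7, 0.1, 0.2])   # human-level cue (behaviour pattern)
fused = fuse_posteriors(p_app, p_beh)
print(classes[int(fused.argmax())])  # person
```

The weight w lets the system lean on whichever modality is more reliable for a given scene, which is the point of fusing machine and human understanding rather than relying on either alone.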