Detect the unexpected: a science for surveillance
Purpose – The purpose of this paper is to outline a strategy for research development focused on the neglected role of visual perception in real-life tasks such as policing surveillance and command-and-control settings. Approach – The scale of the surveillance task in the modern control room is expanding as technology increases input capacity at an accelerating rate. The authors review recent literature highlighting the difficulties that apply to modern surveillance and give examples of how poor detection of the unexpected can be, and how surprising this deficit can be. Perceptual phenomena such as change blindness are linked to the perceptual processes undertaken by law-enforcement personnel. Findings – A scientific programme is outlined for how detection deficits can best be addressed in the context of a multidisciplinary collaborative agenda between researchers and practitioners. The development of a cognitive research field specifically examining the occurrence of perceptual "failures" provides an opportunity for policing agencies to relate laboratory findings in psychology to their own fields of day-to-day enquiry. Originality/value – The paper shows, with examples, where interdisciplinary research may best be focussed on evaluating practical solutions and on generating useable guidelines on procedure and practice. It also argues that these processes should be investigated in real and simulated context-specific studies to confirm the validity of the findings in these new applied scenarios.
Web-based Geographical Visualization of Container Itineraries
Around 90% of the world's cargo is transported in maritime containers, but only around 2% of containers are physically inspected. This opens the possibility for illicit activities. A viable solution is to control containerized cargo through information-based risk analysis. Container route-based analysis has been considered a key factor in identifying potentially suspicious consignments. An essential part of itinerary analysis is the geographical visualization of the itinerary. In the present paper, we present initial work on the realization of a web-based system for interactive geographical visualization of container itineraries. JRC.G.4-Maritime affair
Knowledge Distillation for Action Anticipation via Label Smoothing
The human capability to anticipate the near future from visual observations and
non-verbal cues is essential for developing intelligent systems that need to
interact with people. Several research areas, such as human-robot interaction
(HRI), assisted living or autonomous driving need to foresee future events to
avoid crashes or help people. Egocentric scenarios are classic examples where
action anticipation is applied due to their numerous applications. Such a
challenging task demands capturing and modeling the domain's hidden structure to
reduce prediction uncertainty. Since multiple actions may equally occur in the
future, we treat action anticipation as a multi-label problem with missing
labels extending the concept of label smoothing. This idea resembles the
knowledge distillation process since useful information is injected into the
model during training. We implement a multi-modal framework based on long
short-term memory (LSTM) networks to summarize past observations and make
predictions at different time steps. We perform extensive experiments on the
EPIC-Kitchens and EGTEA Gaze+ datasets, which include more than 2500 and 100 action
classes, respectively. The experiments show that label smoothing systematically
improves the performance of state-of-the-art models for action anticipation. Comment: Accepted to ICPR 202
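To make the label-smoothing idea in this abstract concrete, the following is a minimal sketch of generic label smoothing, not the authors' implementation. The function names (`smooth_labels`, `cross_entropy`), the mixing weight `alpha`, and the optional `prior` argument are assumptions introduced for illustration. With a uniform prior this is standard label smoothing; a non-uniform prior stands in, hypothetically, for injecting knowledge about which other actions are plausible.

```python
import math


def smooth_labels(target_index, num_classes, alpha=0.1, prior=None):
    """Return a soft target: (1 - alpha) * one_hot + alpha * prior.

    `prior` is a hypothetical distribution over classes; when omitted,
    a uniform distribution is used (standard label smoothing).
    """
    if prior is None:
        prior = [1.0 / num_classes] * num_classes
    one_hot = [1.0 if k == target_index else 0.0 for k in range(num_classes)]
    return [(1.0 - alpha) * h + alpha * p for h, p in zip(one_hot, prior)]


def cross_entropy(soft_target, logits):
    """Cross-entropy between a soft target and softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(soft_target, logits))


# Example: 4 action classes, ground-truth class 1, 20% of the mass smoothed.
soft = smooth_labels(1, 4, alpha=0.2)
loss = cross_entropy(soft, [0.5, 2.0, -1.0, 0.0])
```

Because the soft target keeps some probability mass on the non-annotated classes, the loss no longer pushes the model toward assigning them zero probability, which matches the abstract's view that several future actions may be equally plausible.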
Advances in crowd analysis for urban applications through urban event detection
The recent expansion of pervasive computing technology has contributed novel means to pursue human activities in urban space. The urban dynamics unveiled by these means generate an enormous amount of data. These data are mainly supplied by portable and radio-frequency devices, transportation systems, video surveillance, satellites, unmanned aerial vehicles, and social networking services. This has opened a new avenue of opportunities to understand and predict urban dynamics in detail, and to plan various real-time services and applications in response. Over the last decade, certain aspects of the crowd, e.g., mobility, sentiment, size estimation, and behavior, have been analyzed in detail and the outcomes have been reported. This paper presents an extensive survey of the data sources used for different urban applications, the state of the art in urban data generation techniques, and the associated processing methods, in order to demonstrate their merits and capabilities. Then, available open-access crowd data sets for urban event detection are provided along with relevant application programming interfaces. In addition, an outlook is provided on a support system for urban applications that fuses data from all the available pervasive technology sources. Finally, some open challenges and promising research directions are outlined.
Crowd analysis using local neighborhood coherence (Análise de multidões usando coerência de vizinhança local)
A large number of crowd analysis methods using computer vision have been developed in recent years. This dissertation presents an approach that explores characteristics inherent to human crowds (proxemics and neighborhood relationships) with the purpose of extracting crowd features and using them for crowd flow estimation and for anomaly detection and localization. Given the optical flow produced by any method, the proposed approach compares the similarity of each flow vector and its neighborhood using the Mahalanobis distance, which can be obtained efficiently using integral images. This similarity value is then used either to filter the original optical flow or to extract features that describe the crowd behavior at different resolutions, depending on the radius of the personal space selected in the analysis. To show that the extracted features are indeed relevant, we tested several classifiers in the context of abnormality detection. More precisely, we used Recurrent Neural Networks, Dense Neural Networks, Support Vector Machines, Random Forest and Extremely Random Trees. The two developed approaches (crowd flow estimation and abnormality detection) were tested on publicly available datasets involving crowded human scenarios and compared with state-of-the-art methods.
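The neighborhood comparison described in this abstract can be illustrated with a small sketch. It computes the Mahalanobis distance between one 2-D flow vector and the distribution of its neighboring flow vectors. This is only a direct, per-vector illustration under stated assumptions: the function name `mahalanobis_to_neighborhood`, the regularizer `eps`, and the toy data are hypothetical, and the integral-image acceleration mentioned in the abstract is not reproduced here.

```python
import math


def mahalanobis_to_neighborhood(vector, neighbors, eps=1e-6):
    """Mahalanobis distance of a 2-D flow vector to its neighbors.

    `neighbors` is a list of (u, v) flow vectors; `eps` is added to the
    covariance diagonal so the 2x2 matrix is always invertible.
    """
    n = len(neighbors)
    mean_u = sum(u for u, _ in neighbors) / n
    mean_v = sum(v for _, v in neighbors) / n

    # Entries of the (regularized) 2x2 covariance matrix.
    s_uu = sum((u - mean_u) ** 2 for u, _ in neighbors) / n + eps
    s_vv = sum((v - mean_v) ** 2 for _, v in neighbors) / n + eps
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in neighbors) / n
    det = s_uu * s_vv - s_uv * s_uv

    du = vector[0] - mean_u
    dv = vector[1] - mean_v
    # d^T S^{-1} d, written out with the closed-form 2x2 inverse.
    d2 = (s_vv * du * du - 2.0 * s_uv * du * dv + s_uu * dv * dv) / det
    return math.sqrt(max(d2, 0.0))


# Toy neighborhood moving roughly to the right.
neighbors = [(1.0, 0.0), (1.1, 0.1), (0.9, -0.1), (1.0, 0.05)]
coherent = mahalanobis_to_neighborhood((1.0, 0.0), neighbors)
outlier = mahalanobis_to_neighborhood((-1.0, 0.5), neighbors)
```

A vector that agrees with its neighborhood yields a small distance, while one moving against the local flow yields a large one, so the value can serve as a filtering threshold or as a feature.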
Automatic object classification for surveillance videos.
PhD. The recent popularity of surveillance video systems, especially those located in urban
scenarios, demands the development of visual techniques for monitoring purposes.
A primary step towards intelligent surveillance video systems consists of automatic
object classification, which still remains an open research problem and the keystone
for the development of more specific applications.
Typically, object representation is based on the inherent visual features. However,
psychological studies have demonstrated that human beings can routinely categorise
objects according to their behaviour. The existing gap in the understanding
between the features automatically extracted by a computer, such as appearance-based
features, and the concepts unconsciously perceived by human beings but
unattainable for machines, or the behaviour features, is most commonly known
as the semantic gap. Consequently, this thesis proposes to narrow the semantic gap
and bring together machine and human understanding towards object classification.
Thus, a Surveillance Media Management is proposed to automatically detect and
classify objects by analysing the physical properties inherent in their appearance
(machine understanding) and the behaviour patterns which require a higher level of
understanding (human understanding). Finally, a probabilistic multimodal fusion
algorithm bridges the gap by performing an automatic classification considering both
machine and human understanding.
The performance of the proposed Surveillance Media Management framework
has been thoroughly evaluated on outdoor surveillance datasets. The experiments
conducted demonstrated that the combination of machine and human understanding
substantially enhanced the object classification performance. Finally, the inclusion
of human reasoning and understanding provides the essential information to bridge
the semantic gap towards smart surveillance video systems.
Automated camera ranking and selection using video content and scene context
PhD. When observing a scene with multiple cameras, an important problem to solve is to automatically
identify "what camera feed should be shown and when?" The answer to this question is of interest
for a number of applications and scenarios ranging from sports to surveillance. In this thesis we
present a framework for the ranking of each video frame and camera across time and the camera
network, respectively. This ranking is then used for automated video production. In the first stage
information from each camera view and from the objects in it is extracted and represented in a way
that allows for object- and frame-ranking. First, objects are detected and ranked within and across
camera views. This ranking takes into account both visible and contextual information related to
the object. Then content ranking is performed based on the objects in the view and camera-network
level information. We propose two novel techniques for content ranking namely: Routing Based
Ranking (RBR) and Multivariate Gaussian based Ranking (MVG). In RBR we use a rule based
framework where weighted fusion of object and frame level information takes place while in MVG
the rank is estimated as a multivariate Gaussian distribution. Through experimental and subjective
validation we demonstrate that the proposed content ranking strategies allow the identification of
the best camera at each time.
The second part of the thesis focuses on the automatic generation of N-to-1 videos based on the
ranked content. We demonstrate that in such production settings it is undesirable to have frequent
inter-camera switching. This motivates the need for a compromise between selecting the best
camera most of the time and minimising inter-camera switching. We also demonstrate that
state-of-the-art techniques for this task are inadequate and fail in dynamic scenes. We propose three
novel methods for automated camera selection. The first method (gof) performs a joint optimization
of a cost function that depends on both the view quality and inter-camera switching so that a
pleasing best-view video sequence can be composed. The other two methods (dbn and util) incorporate
the selection decision into the ranking strategy. In dbn we model the best-camera selection
as a state sequence via Directed Acyclic Graphs (DAG) designed as a Dynamic Bayesian Network
(DBN), which encodes the contextual knowledge about the camera network and employs the past
information to minimize the inter-camera switches. In comparison, util utilizes the past as well
as the future information in a Partially Observable Markov Decision Process (POMDP) where the
camera-selection at a certain time is influenced by the past information and its repercussions in
the future. The performance of the proposed approach is demonstrated on multiple real and synthetic
multi-camera setups. We compare the proposed architectures with various baseline methods
with encouraging results. The performance of the proposed approaches is also validated through
extensive subjective testing.
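The trade-off this abstract describes, between showing the best view and avoiding frequent inter-camera switches, can be sketched as a small dynamic program. This is a generic illustration and not one of the thesis's three methods, whose details are not given here. The function name `select_cameras`, the per-frame `quality` scores, and the fixed `switch_cost` penalty are assumptions made for the example.

```python
def select_cameras(quality, switch_cost):
    """Pick one camera per frame, maximizing total quality minus switch penalties.

    `quality[t][c]` is a hypothetical view-quality score for camera c at
    frame t; `switch_cost` is subtracted each time the chosen camera changes.
    """
    num_frames = len(quality)
    num_cams = len(quality[0])

    best = list(quality[0])  # best achievable score ending in each camera
    back = []                # back-pointers, one list per frame after the first

    for t in range(1, num_frames):
        new_best, pointers = [], []
        for c in range(num_cams):
            # Best previous camera, accounting for the cost of switching to c.
            prev = max(
                range(num_cams),
                key=lambda p: best[p] - (switch_cost if p != c else 0.0),
            )
            score = best[prev] - (switch_cost if prev != c else 0.0)
            new_best.append(score + quality[t][c])
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Trace the best sequence back from the final frame.
    cam = max(range(num_cams), key=lambda c: best[c])
    path = [cam]
    for pointers in reversed(back):
        cam = pointers[cam]
        path.append(cam)
    path.reverse()
    return path


scores = [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]
greedy_like = select_cameras(scores, switch_cost=0.0)  # follows the best view
stable = select_cameras(scores, switch_cost=1.0)       # avoids the brief switch
```

With no penalty the sequence simply follows the highest-scoring camera in each frame; raising `switch_cost` suppresses short-lived switches at the expense of occasionally showing a lower-ranked view.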