14,700 research outputs found
Mesh-SORT: Simple and effective location-wise tracker with lost management strategies
Multi-Object Tracking (MOT) has gained extensive attention in recent years
due to its potential applications in traffic and pedestrian detection. We note
that tracking by detection may suffer from errors generated by noise detectors,
such as an imprecise bounding box before the occlusions, and observed that in
most tracking scenarios, objects tend to move and lost within specific
locations. To counter this, we present a novel tracker to deal with the bad
detector and occlusions. Firstly, we proposed a location-wise sub-region
recognition method which equally divided the frame, which we called mesh. Then
we proposed corresponding location-wise loss management strategies and
different matching strategies. The resulting Mesh-SORT, ablation studies
demonstrate its effectiveness and made 3% fragmentation 7.2% ID switches drop
and 0.4% MOTA improvement compared to the baseline on MOT17 datasets. Finally,
we analyze its limitation on the specific scene and discussed what future works
can be extended.Comment: 14 pages 18 fig
Virtual Reality Games for Motor Rehabilitation
This paper presents a fuzzy logic based method to track user satisfaction without the need for devices to monitor users physiological conditions. User satisfaction is the key to any product’s acceptance; computer applications and video games provide a unique opportunity to provide a tailored environment for each user to better suit their needs. We have implemented a non-adaptive fuzzy logic model of emotion, based on the emotional component of the Fuzzy Logic Adaptive Model of Emotion (FLAME) proposed by El-Nasr, to estimate player emotion in UnrealTournament 2004. In this paper we describe the implementation of this system and present the results of one of several play tests. Our research contradicts the current literature that suggests physiological measurements are needed. We show that it is possible to use a software only method to estimate user emotion
Spatial and temporal background modelling of non-stationary visual scenes
PhDThe prevalence of electronic imaging systems in everyday life has become increasingly apparent
in recent years. Applications are to be found in medical scanning, automated manufacture, and
perhaps most significantly, surveillance. Metropolitan areas, shopping malls, and road traffic
management all employ and benefit from an unprecedented quantity of video cameras for monitoring
purposes. But the high cost and limited effectiveness of employing humans as the final
link in the monitoring chain has driven scientists to seek solutions based on machine vision techniques.
Whilst the field of machine vision has enjoyed consistent rapid development in the last
20 years, some of the most fundamental issues still remain to be solved in a satisfactory manner.
Central to a great many vision applications is the concept of segmentation, and in particular,
most practical systems perform background subtraction as one of the first stages of video
processing. This involves separation of ‘interesting foreground’ from the less informative but
persistent background. But the definition of what is ‘interesting’ is somewhat subjective, and
liable to be application specific. Furthermore, the background may be interpreted as including
the visual appearance of normal activity of any agents present in the scene, human or otherwise.
Thus a background model might be called upon to absorb lighting changes, moving trees and
foliage, or normal traffic flow and pedestrian activity, in order to effect what might be termed in
‘biologically-inspired’ vision as pre-attentive selection. This challenge is one of the Holy Grails
of the computer vision field, and consequently the subject has received considerable attention.
This thesis sets out to address some of the limitations of contemporary methods of background
segmentation by investigating methods of inducing local mutual support amongst pixels
in three starkly contrasting paradigms: (1) locality in the spatial domain, (2) locality in the shortterm
time domain, and (3) locality in the domain of cyclic repetition frequency.
Conventional per pixel models, such as those based on Gaussian Mixture Models, offer no
spatial support between adjacent pixels at all. At the other extreme, eigenspace models impose
a structure in which every image pixel bears the same relation to every other pixel. But Markov
Random Fields permit definition of arbitrary local cliques by construction of a suitable graph, and
3
are used here to facilitate a novel structure capable of exploiting probabilistic local cooccurrence
of adjacent Local Binary Patterns. The result is a method exhibiting strong sensitivity to multiple
learned local pattern hypotheses, whilst relying solely on monochrome image data.
Many background models enforce temporal consistency constraints on a pixel in attempt to
confirm background membership before being accepted as part of the model, and typically some
control over this process is exercised by a learning rate parameter. But in busy scenes, a true
background pixel may be visible for a relatively small fraction of the time and in a temporally
fragmented fashion, thus hindering such background acquisition. However, support in terms of
temporal locality may still be achieved by using Combinatorial Optimization to derive shortterm
background estimates which induce a similar consistency, but are considerably more robust
to disturbance. A novel technique is presented here in which the short-term estimates act as
‘pre-filtered’ data from which a far more compact eigen-background may be constructed.
Many scenes entail elements exhibiting repetitive periodic behaviour. Some road junctions
employing traffic signals are among these, yet little is to be found amongst the literature regarding
the explicit modelling of such periodic processes in a scene. Previous work focussing on gait
recognition has demonstrated approaches based on recurrence of self-similarity by which local
periodicity may be identified. The present work harnesses and extends this method in order
to characterize scenes displaying multiple distinct periodicities by building a spatio-temporal
model. The model may then be used to highlight abnormality in scene activity. Furthermore, a
Phase Locked Loop technique with a novel phase detector is detailed, enabling such a model to
maintain correct synchronization with scene activity in spite of noise and drift of periodicity.
This thesis contends that these three approaches are all manifestations of the same broad
underlying concept: local support in each of the space, time and frequency domains, and furthermore,
that the support can be harnessed practically, as will be demonstrated experimentally
Change blindness: eradication of gestalt strategies
Arrays of eight, texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43149–164]. Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference seen in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored and retrieved from a pre-attentional store during this task
Generalized Video Anomaly Event Detection: Systematic Taxonomy and Comparison of Deep Models
Video Anomaly Detection (VAD) serves as a pivotal technology in the
intelligent surveillance systems, enabling the temporal or spatial
identification of anomalous events within videos. While existing reviews
predominantly concentrate on conventional unsupervised methods, they often
overlook the emergence of weakly-supervised and fully-unsupervised approaches.
To address this gap, this survey extends the conventional scope of VAD beyond
unsupervised methods, encompassing a broader spectrum termed Generalized Video
Anomaly Event Detection (GVAED). By skillfully incorporating recent
advancements rooted in diverse assumptions and learning frameworks, this survey
introduces an intuitive taxonomy that seamlessly navigates through
unsupervised, weakly-supervised, supervised and fully-unsupervised VAD
methodologies, elucidating the distinctions and interconnections within these
research trajectories. In addition, this survey facilitates prospective
researchers by assembling a compilation of research resources, including public
datasets, available codebases, programming tools, and pertinent literature.
Furthermore, this survey quantitatively assesses model performance, delves into
research challenges and directions, and outlines potential avenues for future
exploration.Comment: Accepted by ACM Computing Surveys. For more information, please see
our project page: https://github.com/fudanyliu/GVAE
Analysis domain model for shared virtual environments
The field of shared virtual environments, which also
encompasses online games and social 3D environments, has a
system landscape consisting of multiple solutions that share great functional overlap. However, there is little system interoperability between the different solutions. A shared virtual environment has an associated problem domain that is highly complex raising difficult challenges to the development process, starting with the architectural design of the underlying system. This paper has two main contributions. The first contribution is a broad domain analysis of shared virtual environments, which enables developers to have a better understanding of the whole rather than the part(s). The second contribution is a reference domain model for discussing and describing solutions - the Analysis Domain Model
Efficient Min-cost Flow Tracking with Bounded Memory and Computation
This thesis is a contribution to solving multi-target tracking in an optimal fashion for real-time demanding computer vision applications. We introduce a challenging benchmark, recorded with our autonomous driving platform AnnieWAY. Three main challenges of tracking are addressed: Solving the data association (min-cost flow) problem faster than standard solvers, extending this approach to an online setting, and making it real-time capable by a tight approximation of the optimal solution
- …