419 research outputs found
Automatic vehicle tracking and recognition from aerial image sequences
This paper addresses the problem of automated vehicle tracking and
recognition from aerial image sequences. Motivated by its successes in the
existing literature, we focus on the use of linear appearance subspaces to
describe multi-view object appearance, and highlight the challenges involved
in their application as part of a practical system. A working solution which
includes steps for data extraction and normalization is described. In
experiments on real-world data the proposed methodology achieved promising
results, with a high correct recognition rate and few, meaningful errors
(type II errors whereby genuinely similar targets are occasionally confused
with one another). Directions for future research and possible improvements
of the proposed method are discussed.
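The linear appearance subspace matching the abstract refers to can be illustrated with a short sketch. This is not the authors' system; it only shows, under simple assumptions, how a class's appearance vectors can be summarised by a PCA basis and a query matched by reconstruction error (function names, dimensions, and the matching rule are illustrative):

```python
# Hedged sketch of matching with a linear appearance subspace: each class
# is summarised by the mean and top principal axes of its appearance
# vectors, and a query is scored by how well that basis reconstructs it.
import numpy as np

def fit_subspace(samples, dim=2):
    """samples: (n, d) array of appearance vectors.
    Returns the class mean and the top-`dim` principal axes."""
    mean = samples.mean(axis=0)
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:dim]

def reconstruction_error(x, mean, basis):
    """Distance from x to the affine subspace (mean, basis)."""
    coeffs = basis @ (x - mean)
    return float(np.linalg.norm((x - mean) - basis.T @ coeffs))
```

A query would be assigned to whichever class subspace reconstructs it with the smallest error.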
Information and knowing when to forget it
In this paper we propose several novel approaches for incorporating forgetting mechanisms into sequential prediction based machine learning algorithms. The broad premise of our work, supported and motivated in part by recent findings stemming from neurology research on the development of human brains, is that knowledge acquisition and forgetting are complementary processes, and that learning can (perhaps unintuitively) benefit from the latter too. We demonstrate that if forgetting is implemented in a purposeful and data driven manner, there are a number of benefits which can be gained from discarding information. The framework we introduce is a general one and can be used with any baseline predictor of choice; hence in this sense it is best described as a meta-algorithm. The method we describe was developed through a series of steps which increase the adaptability of the model, while remaining data driven. We first discuss a weakly adaptive forgetting process which we term passive forgetting. A fully adaptive framework, which we term active forgetting, was developed by enveloping a passive forgetting process with a monitoring, self-aware module which detects contextual changes and makes a statistically informed choice as to when the model parameters should be abruptly rather than gradually updated. The effectiveness of the proposed meta-framework was demonstrated on a real world data set concerned with a challenge of major practical importance: that of predicting currency exchange rates. Our approach was shown to be highly effective, reducing prediction errors by nearly 40%.
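The passive/active forgetting distinction can be sketched in a few lines. The class below is an illustrative stand-in, not the authors' method: passive forgetting is modelled as an exponentially weighted update, and active forgetting as an abrupt reset when the current prediction error falls far outside the recent error regime (the decay rate and the reset rule are assumptions):

```python
# Minimal sketch of passive vs active forgetting around a running estimate.
# All names, constants, and the change-detection rule are illustrative.

class ForgettingMean:
    """Tracks an estimate that exponentially discounts old data (passive
    forgetting) and abruptly resets on a detected context change (active
    forgetting)."""

    def __init__(self, decay=0.9, reset_threshold=3.0):
        self.decay = decay                    # weight kept per step
        self.reset_threshold = reset_threshold
        self.estimate = None
        self.residuals = []                   # recent prediction errors

    def update(self, x):
        if self.estimate is None:
            self.estimate = x
            return self.estimate
        residual = abs(x - self.estimate)
        self.residuals = (self.residuals + [residual])[-20:]
        mean_r = sum(self.residuals) / len(self.residuals)
        if len(self.residuals) >= 5 and residual > self.reset_threshold * mean_r:
            # Active forgetting: abrupt reset on a contextual change.
            self.estimate = x
            self.residuals = []
        else:
            # Passive forgetting: gradual, exponentially weighted update.
            self.estimate = self.decay * self.estimate + (1 - self.decay) * x
        return self.estimate
```

Any baseline predictor could take the place of the running mean, which is what makes the idea a meta-algorithm.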
Towards computer vision based ancient coin recognition in the wild — automatic reliable image preprocessing and normalization
As an attractive area of application in the sphere of cultural heritage, in recent years automatic analysis of ancient coins has been attracting an increasing amount of research attention from the computer vision community. Recent work has demonstrated that the existing state of the art performs extremely poorly when applied on images acquired in realistic conditions. One of the reasons behind this lies in the (often implicit) assumptions made by many of the proposed algorithms — a lack of background clutter, and a uniform scale, orientation, and translation of coins across different images. These assumptions are not satisfied by default and before any further progress in the realm of more complex analysis is made, a robust method capable of preprocessing and normalizing images of coins acquired ‘in the wild’ is needed. In this paper we introduce an algorithm capable of localizing and accurately segmenting out a coin from a cluttered image acquired by an amateur collector. Specifically, we propose a two-stage approach which first uses a simple shape hypothesis to localize the coin roughly and then arrives at the final, accurate result by refining this initial estimate using a statistical model learnt from large amounts of data. Our results on data collected ‘in the wild’ demonstrate excellent accuracy even when the proposed algorithm is applied on highly challenging images.
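The two-stage localize-then-refine structure can be illustrated with a toy sketch. Note that the refinement below substitutes a simple outlier-trimming step for the learnt statistical model the paper actually uses, so it should be read only as a structural analogy, with all names and the trimming constant assumed:

```python
# Illustrative two-stage coin localization: stage one fits a rough circle
# hypothesis to foreground pixels, stage two refits after discarding
# pixels far from that hypothesis (a stand-in for the learnt refinement).
import math

def fit_circle(pixels):
    """Rough hypothesis: centroid plus mean distance to the centroid."""
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    r = sum(math.hypot(x - cx, y - cy) for x, y in pixels) / len(pixels)
    return cx, cy, r

def localize_coin(pixels, trim=1.5):
    cx, cy, r = fit_circle(pixels)                 # stage 1: rough estimate
    kept = [(x, y) for x, y in pixels
            if math.hypot(x - cx, y - cy) <= trim * r]
    return fit_circle(kept)                        # stage 2: refined estimate
```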
Motion Segment Decomposition of RGB-D Sequences for Human Behavior Understanding
In this paper, we propose a framework for analyzing and understanding human behavior from depth videos. The proposed solution first employs shape analysis of the human pose across time to decompose the full motion into short temporal segments representing elementary motions. Then, each segment is characterized by human motion and depth appearance around hand joints to describe the change in pose of the body and the interaction with objects. Finally, the sequence of temporal segments is modeled through a Dynamic Naive Bayes classifier, which captures the dynamics of elementary motions characterizing human behavior. Experiments on four challenging datasets evaluate the potential of the proposed approach in different contexts, including gesture or activity recognition and online activity detection. Competitive results in comparison with state of the art methods are reported.
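In a much simplified form, decomposing a motion into elementary segments can be thought of as splitting a motion-energy signal at its local minima. The rule below is an illustrative stand-in for the shape-analysis decomposition the paper actually performs; the minimum-length constraint is an assumption:

```python
# Toy decomposition of a motion-energy sequence into elementary segments:
# cut at local minima of energy, keeping segments of a minimum length.

def segment_motion(energy, min_len=2):
    """Returns (start, end) index pairs covering the whole sequence."""
    cuts = [0]
    for i in range(1, len(energy) - 1):
        if energy[i] < energy[i - 1] and energy[i] <= energy[i + 1]:
            if i - cuts[-1] >= min_len:   # avoid degenerate segments
                cuts.append(i)
    cuts.append(len(energy))
    return [(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]
```

Each resulting segment would then be described by its own features and fed, in order, to a sequence classifier.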
Deep adaptive anomaly detection using an active learning framework
Anomaly detection is the process of finding unusual events in a given dataset. Anomaly detection is often performed on datasets with a fixed set of predefined features. As a result of this, if the normal features bear a close resemblance to the anomalous features, most anomaly detection algorithms exhibit poor performance. This work seeks to answer the question: can we deform these features so as to make the anomalies stand out, and hence improve the anomaly detection outcome? We employ a Deep Learning and an Active Learning framework to learn features for anomaly detection. In Active Learning, an Oracle (usually a domain expert) labels a small amount of data over a series of training rounds. The deep neural network is trained after each round to incorporate the feedback from the Oracle into the model. Results on the MNIST, CIFAR-10 and Galaxy Zoo datasets show that our algorithm, Ahunt, significantly outperforms other anomaly detection algorithms used on a fixed, static set of features. Ahunt can therefore overcome a poor choice of features that happen to be suboptimal for detecting anomalies in the data, learning more appropriate features. We also explore the role of the loss function and Active Learning query strategy, showing these are important, especially when there is a significant variation in the anomalies.
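The active-learning loop described above can be sketched schematically. The detector, query rule, and oracle below are illustrative stand-ins, not the Ahunt implementation: anomaly scores are distances to known-normal examples, the oracle is queried on the highest-scoring points, and its feedback updates the normal model:

```python
# Hedged sketch of one active-learning anomaly-detection round.
# Everything here is a toy stand-in for the deep model in the abstract.

def anomaly_scores(data, normals):
    """Score each point by its distance to the nearest known-normal one."""
    return [min(abs(x - n) for n in normals) for x in data]

def active_round(data, normals, oracle, budget=1):
    """Query the oracle on the most anomalous-looking points and fold
    its feedback back into the normal model."""
    scores = anomaly_scores(data, normals)
    order = sorted(range(len(data)), key=lambda i: -scores[i])
    found = []
    for i in order[:budget]:
        if oracle(data[i]):          # oracle confirms an anomaly
            found.append(data[i])
        else:                        # false alarm: learn it as normal
            normals.append(data[i])
    return found, normals
```

In the real system the feedback would also retrain the feature extractor, which is what lets the learned features deform to make anomalies stand out.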
Diagnosis prediction from electronic health records (EHR) using the binary diagnosis history vector representation
Large amounts of rich, heterogeneous information nowadays routinely collected by health care providers across the world possess remarkable potential for the extraction of novel medical data and the assessment of different practices in real-world conditions. Specifically in this work our goal is to use Electronic Health Records (EHRs) to predict progression patterns of future diagnoses of ailments for a particular patient, given the patient’s present diagnostic history. Following the highly promising results of a recently proposed approach which introduced the diagnosis history vector representation of a patient’s diagnostic record, we introduce a series of improvements to the model and conduct thorough experiments that demonstrate its scalability, accuracy, and practicability in the clinical context. We show that the model is able to capture well the interaction between a large number of ailments which correspond to the most frequent diagnoses, show how the original learning framework can be adapted to increase its prediction specificity, and describe a principled, probabilistic method for incorporating explicit, human clinical knowledge to overcome semantic limitations of the raw EHR data.
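A binary diagnosis history vector, in its simplest reading, assigns one bit per diagnosis code. The helper below is an illustrative encoding under that reading; the code list is hypothetical and the paper's representation may differ in detail:

```python
# Illustrative binary diagnosis history vector: one bit per code in a
# fixed vocabulary, set if the code appears anywhere in the patient's
# diagnostic history. Codes shown are hypothetical ICD-style labels.

def history_vector(patient_codes, vocabulary):
    """Returns a 0/1 list aligned with `vocabulary`."""
    seen = set(patient_codes)
    return [1 if code in seen else 0 for code in vocabulary]

vocab = ["I10", "E11", "J45", "K21"]       # hypothetical code vocabulary
vec = history_vector(["E11", "I10", "E11"], vocab)
# vec == [1, 1, 0, 0]: repeats collapse to a single set bit
```

A predictor would then map such vectors to probabilities over future diagnoses.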
FROM VISUAL SALIENCY TO VIDEO BEHAVIOUR UNDERSTANDING
In a world of ever increasing amounts of video data, we are forced to abandon traditional
methods of scene interpretation by fully manual means. Under such circumstances, some form
of automation is highly desirable but this can be a very open ended issue with high complexity.
Dealing with such large amounts of data is a non-trivial task that requires efficient selective
extraction of parts of a scene which have the potential to develop a higher semantic meaning,
alone, or in combination with others. In particular, the types of video data that are in
need of automated analysis tend to be outdoor scenes with high levels of activity generated
from either foreground or background. Such dynamic scenes add considerable complexity
to the problem since we cannot rely on motion energy alone to detect regions of interest.
Furthermore, the behaviour of these regions of motion can differ greatly, while still being
highly dependent, both spatially and temporally on the movement of other objects within
the scene. Modelling these dependencies, whilst eliminating as much redundancy from the
feature extraction process as possible are the challenges addressed by this thesis.
In the first half, finding the right mechanism to extract and represent meaningful features
from dynamic scenes with no prior knowledge is investigated. Meaningful or salient information
is treated as the parts of a scene that stand out or seem unusual or interesting to
us. The novelty of the work is that it is able to select salient scales in both space and time
in which a particular spatio-temporal volume is considered interesting relative to the rest of
the scene. By quantifying the temporal saliency values of regions of motion, it is possible to
consider their importance in terms of both the long and short-term. Variations in entropy
over spatio-temporal scales are used to select a context-dependent measure of the local scene
dynamics. A method of quantifying temporal saliency is devised based on the variation of
the entropy of the intensity distribution in a spatio-temporal volume over increasing scales.
Entropy is used over traditional filter methods since the stability or predictability of the intensity
distribution over scales of a local spatio-temporal region can be defined more robustly
relative to the context of its neighbourhood, even for regions exhibiting high intensity variation
due to being extremely textured. Results show that it is possible to extract both locally
salient features and globally salient temporal features from contrasting scenarios.
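The entropy-over-scales idea described above can be sketched in one dimension: compute the entropy of the intensity histogram inside windows of increasing size around a point, and treat large variation of entropy across scales as saliency. The binning and the variance-based score below are illustrative choices, not the thesis's exact formulation:

```python
# Toy 1-D version of entropy-based scale saliency: entropy of the local
# intensity histogram at several window sizes, scored by its variance
# across scales. Bin count, scales, and score are assumptions.
import math
from collections import Counter

def entropy(values, bins=8):
    """Shannon entropy of a histogram of values in [0, 1)."""
    counts = Counter(int(v * bins) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def scale_saliency(signal, centre, scales=(1, 2, 4, 8)):
    """Variance of local entropy across increasing window half-widths."""
    ents = []
    for s in scales:
        window = signal[max(0, centre - s): centre + s + 1]
        ents.append(entropy(window))
    mean = sum(ents) / len(ents)
    return sum((e - mean) ** 2 for e in ents) / len(ents)
```

A flat region scores zero at every scale, while textured regions whose statistics change with window size score higher.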
In the second part of the thesis, focus will shift towards binding these spatio-temporally
salient features together so that some semantic meaning can be inferred from their interaction.
Interaction in this sense, refers to any form of temporally correlated behaviour between
any salient regions of motion in a scene. Feature binding as a mechanism for interactive
behaviour understanding is particularly important if we consider that regions of interest may
not be treated as particularly significant individually, but represent much more semantically
when considered in combination. Temporally correlated behaviour is identified and classified
using accumulated co-occurrences of salient features at two levels. Firstly, co-occurrences are
accumulated for spatio-temporally proximate salient features to form a local representation.
Then, at the next level, the co-occurrences of these locally spatio-temporally bound features
are accumulated again in order to discover unusual behaviour in the scene. The novelty of
this work is that there are no assumptions made about whether interacting regions should be
spatially proximate. Furthermore, no prior knowledge of the scene topology is used. Results
show that it is possible to detect unusual interactions between regions of motion, which can
visually infer higher levels of semantics.
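The first level of the accumulation described above can be sketched as counting pairs of salient features whose occurrences fall within a temporal window. The event format and window size are illustrative assumptions; the thesis's scheme additionally accumulates co-occurrences of the bound features at a second level:

```python
# Illustrative first-level co-occurrence accumulation over salient
# feature events, with no spatial-proximity assumption.
from collections import Counter
from itertools import combinations

def cooccurrences(events, window=2):
    """events: list of (time, feature_label) pairs. Counts unordered
    pairs of distinct features occurring within `window` time steps."""
    counts = Counter()
    for (t1, f1), (t2, f2) in combinations(events, 2):
        if abs(t1 - t2) <= window and f1 != f2:
            counts[frozenset((f1, f2))] += 1
    return counts
```

Unusual behaviour would then show up as pair counts that deviate from the accumulated statistics.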
In the final part of the thesis, a more specific investigation of human behaviour is addressed
through classification and detection of interactions between two human subjects. Here, further
modifications are made to the feature extraction process in order to quantify the spatio-temporal
saliency of a region of motion. These features are then grouped to find the people
in the scene. Then, a loose pose distribution model is extracted for each person for finding
salient correlations between poses of two interacting people using canonical correlation
analysis. These canonical factors can be formed into trajectories and used for classification.
Levenshtein distance is then used to categorise the features. The novelty of the work is that
the interactions do not have to be spatially connected or proximate for them to be recognised.
Furthermore, the data used is outdoors and cluttered with non-stationary background. Results
show that co-occurrence techniques have the potential to provide a more generalised,
compact, and meaningful representation of dynamic interactive scene behaviour.
Funded by EPSRC, part-funded by QinetiQ Ltd; a travel grant was also contributed by RAEng.
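The Levenshtein distance used in the final stage to categorise the canonical-factor trajectories is a standard dynamic programme; a minimal implementation over arbitrary symbol sequences is:

```python
# Standard Levenshtein (edit) distance via dynamic programming, kept to
# two rolling rows of the DP table.

def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    turning sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]
```

Because it operates on sequences of symbols, trajectories of different lengths can be compared directly, which suits variable-duration interactions.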
Staircases as contextual cues that help minimize energetic costs
Staircase climbs are habitually avoided, and staircase steepness is overestimated. Visual impressions of
staircase slant reliably precede each taxing climb and may act as salient, visual cues, prompting behaviour
that supports an ‘economy of action’ (Proffitt, 2006). The thesis adapted the contextual cueing paradigm
with natural scenes (cf. Brockmole & Henderson, 2006b) to test for search and learning biases by scene
content with staircases. For this, target letters, L and T, were placed near and far from staircases, and in
scenes without staircases (three stimulus categories). Eighteen scenes were repeated across blocks, six of
each stimulus category. Response latencies and eye movements were recorded.
Chapter three investigated search biases in initial eye movements in response to the first presentation of
novel, natural scenes of the three stimulus categories. Findings support the notion that early eye
movements were biased towards the incidental scene content of staircases in 36 novel real-world scenes
(N = 118); this bias was magnified for staircases with more steps, independent of target locations. Chapter
two investigated contextual cueing by content of 18 natural scenes, six of each category, repeated across
eight blocks (N = 64); for 27 of these participants, target locations were changed relative to staircase
location in the ninth block. Steeper learning slopes across the eight repetitions were observed for targets
located near staircases compared to the other stimulus categories. Interruptions to learning, due to
changes in target locations in the ninth block, were a function of the distance to staircase location pre and
post changes, consistent with the observed differential learning. Interruptions were equally strong within
and between two nine-block learning sessions (N = 40) that were separated by a 24-hour break. This
additional finding is obtained from a subsequent contextual cueing study, presented in chapter four, and
speaks to a major involvement of episodic memory in the learning reported in this thesis. In sum, the
findings highlight a capacity of staircase percepts to bias initial visual search, and to facilitate short- and
longer-term associative learning near staircases. Overall, the results suggest staircases may be salient
stimuli for cognitive processes that manage energetic resources.