419 research outputs found

    Automatic vehicle tracking and recognition from aerial image sequences

    This paper addresses the problem of automated vehicle tracking and recognition from aerial image sequences. Motivated by its successes in the existing literature, we focus on the use of linear appearance subspaces to describe multi-view object appearance and highlight the challenges involved in their application as part of a practical system. A working solution, which includes steps for data extraction and normalization, is described. In experiments on real-world data the proposed methodology achieved promising results, with a high correct recognition rate and few, meaningful errors (type II errors whereby genuinely similar targets are sometimes confused with one another). Directions for future research and possible improvements of the proposed method are discussed.
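    As a rough illustration of the appearance-subspace idea mentioned above (not the paper's actual pipeline), the sketch below models each target class by a PCA subspace fitted to its multi-view appearance vectors and recognises a query by the smallest reconstruction error; the function names and number of components are illustrative assumptions.

```python
# Hypothetical sketch of subspace-based multi-view recognition: each class is
# modelled by a linear (PCA) appearance subspace, and a query appearance vector
# is assigned to the class whose subspace reconstructs it best.
import numpy as np

def fit_subspace(appearances, n_components=10):
    """appearances: (n_samples, n_features) array of normalized, flattened vehicle crops."""
    mean = appearances.mean(axis=0)
    centred = appearances - mean
    # principal directions via SVD of the centred data
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components]            # (n_components, n_features)
    return mean, basis

def reconstruction_error(x, mean, basis):
    """Distance from x to the affine subspace defined by (mean, span(basis))."""
    centred = x - mean
    projection = basis.T @ (basis @ centred)
    return np.linalg.norm(centred - projection)

def recognise(x, class_models):
    """class_models: dict label -> (mean, basis); returns the label with smallest error."""
    return min(class_models, key=lambda label: reconstruction_error(x, *class_models[label]))
```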

    Information and knowing when to forget it

    In this paper we propose several novel approaches for incorporating forgetting mechanisms into sequential prediction based machine learning algorithms. The broad premise of our work, supported and motivated in part by recent findings stemming from neurology research on the development of human brains, is that knowledge acquisition and forgetting are complementary processes, and that learning can (perhaps unintuitively) benefit from the latter too. We demonstrate that if forgetting is implemented in a purposeful and data driven manner, there are a number of benefits to be gained from discarding information. The framework we introduce is a general one and can be used with any baseline predictor of choice; in this sense it is best described as a meta-algorithm. The method was developed through a series of steps which increase the adaptability of the model while remaining data driven. We first describe a weakly adaptive forgetting process which we term passive forgetting. A fully adaptive framework, which we term active forgetting, is then developed by enveloping a passive forgetting process with a monitoring, self-aware module which detects contextual changes and makes a statistically informed choice of when the model parameters should be updated abruptly rather than gradually. The effectiveness of the proposed meta-framework is demonstrated on a real-world data set concerned with a challenge of major practical importance: that of predicting currency exchange rates. Our approach is shown to be highly effective, reducing prediction errors by nearly 40%.
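    A minimal sketch of the passive/active forgetting distinction described above, assuming a simple exponentially forgetting mean as the baseline predictor; the window size, z-score threshold, and class names are illustrative choices rather than the paper's actual configuration.

```python
# Passive forgetting: gradual exponential down-weighting of old observations.
# Active forgetting: a monitoring module watches recent errors and, when an
# error is statistically surprising, abruptly resets the model parameters.
import numpy as np

class ForgettingMeanPredictor:
    """Baseline predictor: exponentially forgetting running mean (passive forgetting)."""
    def __init__(self, forgetting_factor=0.95):
        self.lam = forgetting_factor
        self.value = None

    def predict(self):
        return self.value

    def update(self, y):
        self.value = y if self.value is None else self.lam * self.value + (1 - self.lam) * y

class ActiveForgettingWrapper:
    """Wraps a baseline predictor; resets it abruptly when a context change is detected."""
    def __init__(self, base, window=30, z_threshold=3.0):
        self.base, self.window, self.z = base, window, z_threshold
        self.errors = []

    def step(self, y):
        pred = self.base.predict()
        if pred is not None:
            err = abs(y - pred)
            if len(self.errors) >= self.window:
                mu, sigma = np.mean(self.errors), np.std(self.errors) + 1e-9
                if (err - mu) / sigma > self.z:   # statistically surprising error
                    self.base.value = None        # abrupt rather than gradual update
                    self.errors.clear()
            self.errors.append(err)
            self.errors = self.errors[-self.window:]
        self.base.update(y)
        return pred
```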

    Towards computer vision based ancient coin recognition in the wild — automatic reliable image preprocessing and normalization

    As an attractive area of application in the sphere of cultural heritage, automatic analysis of ancient coins has in recent years been attracting an increasing amount of research attention from the computer vision community. Recent work has demonstrated that the existing state of the art performs extremely poorly when applied to images acquired in realistic conditions. One of the reasons for this lies in the (often implicit) assumptions made by many of the proposed algorithms: a lack of background clutter, and a uniform scale, orientation, and translation of coins across different images. These assumptions are not satisfied by default, and before any further progress in the realm of more complex analysis is made, a robust method capable of preprocessing and normalizing images of coins acquired ‘in the wild’ is needed. In this paper we introduce an algorithm capable of localizing and accurately segmenting out a coin from a cluttered image acquired by an amateur collector. Specifically, we propose a two-stage approach which first uses a simple shape hypothesis to localize the coin roughly, and then arrives at the final, accurate result by refining this initial estimate using a statistical model learnt from large amounts of data. Our results on data collected ‘in the wild’ demonstrate excellent accuracy even when the proposed algorithm is applied to highly challenging images.
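    The two-stage structure described above might look roughly like the sketch below, where a Hough circle transform supplies the rough shape hypothesis and a generic GrabCut refinement stands in for the authors' statistical model learnt from data; all parameters are illustrative assumptions, not the paper's method.

```python
# Stage 1: rough localization via a circular-shape hypothesis.
# Stage 2: refinement of the initial estimate into an accurate segmentation
# (GrabCut used here purely as a placeholder for the learnt statistical model).
import cv2
import numpy as np

def localize_coin(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2,
                               minDist=gray.shape[0] // 2,
                               param1=100, param2=40,
                               minRadius=gray.shape[0] // 8,
                               maxRadius=gray.shape[0] // 2)
    if circles is None:
        return None
    x, y, r = [int(v) for v in np.round(circles[0, 0])]

    h, w = gray.shape
    x0, y0 = max(x - r, 0), max(y - r, 0)
    rect = (x0, y0, min(2 * r, w - x0 - 1), min(2 * r, h - y0 - 1))
    mask = np.zeros(gray.shape, np.uint8)
    bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    coin_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype(np.uint8)
    return (x, y, r), coin_mask
```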

    Motion Segment Decomposition of RGB-D Sequences for Human Behavior Understanding

    In this paper, we propose a framework for analyzing and understanding human behavior from depth videos. The proposed solution first employs shape analysis of the human pose across time to decompose the full motion into short temporal segments representing elementary motions. Then, each segment is characterized by human motion and depth appearance around the hand joints to describe the change in pose of the body and the interaction with objects. Finally, the sequence of temporal segments is modeled through a Dynamic Naive Bayes classifier, which captures the dynamics of the elementary motions characterizing human behavior. Experiments on four challenging datasets evaluate the potential of the proposed approach in different contexts, including gesture or activity recognition and online activity detection. Competitive results in comparison with state-of-the-art methods are reported.
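    A minimal sketch of the first step described above, namely decomposing a skeleton sequence into short temporal segments at local minima of overall motion energy; the energy measure, the minimum segment length, and the function names are assumptions, not the authors' exact formulation.

```python
# Decompose a 3D joint-position sequence into "elementary motion" segments by
# cutting at frames where the total motion energy reaches a local minimum.
import numpy as np

def motion_energy(joints):
    """joints: (T, J, 3) array of 3D joint positions over T frames."""
    velocity = np.diff(joints, axis=0)                    # (T-1, J, 3)
    return np.linalg.norm(velocity, axis=2).sum(axis=1)   # total joint speed per frame

def decompose_into_segments(joints, min_length=10):
    energy = motion_energy(joints)
    # candidate cut points: frames where the energy is a local minimum
    cuts = [t for t in range(1, len(energy) - 1)
            if energy[t] < energy[t - 1] and energy[t] <= energy[t + 1]]
    boundaries = [0] + cuts + [len(joints) - 1]
    segments, start = [], 0
    for b in boundaries[1:]:
        if b - start >= min_length:                       # drop overly short segments
            segments.append((start, b))
            start = b
    return segments                                       # list of (start_frame, end_frame)
```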

    Deep adaptive anomaly detection using an active learning framework

    Anomaly detection is the process of finding unusual events in a given dataset. It is often performed on datasets with a fixed set of predefined features; as a result, if the normal features bear a close resemblance to the anomalous features, most anomaly detection algorithms exhibit poor performance. This work seeks to answer the question: can we deform these features so as to make the anomalies stand out and hence improve the anomaly detection outcome? We employ a Deep Learning and an Active Learning framework to learn features for anomaly detection. In Active Learning, an Oracle (usually a domain expert) labels a small amount of data over a series of training rounds, and the deep neural network is retrained after each round to incorporate the feedback from the Oracle into the model. Results on the MNIST, CIFAR-10 and Galaxy Zoo datasets show that our algorithm, Ahunt, significantly outperforms other anomaly detection algorithms used on a fixed, static set of features. Ahunt can therefore overcome a poor choice of features that happen to be suboptimal for detecting anomalies in the data, by learning more appropriate features. We also explore the role of the loss function and the Active Learning query strategy, showing that these are important, especially when there is significant variation in the anomalies.
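    The oracle-in-the-loop training rounds described above could be organised roughly as in the sketch below; this is a generic active-learning loop with a top-score query strategy, not the actual Ahunt implementation, and the model and oracle interfaces are assumed.

```python
# Generic active-learning loop for anomaly detection: score the pool, query the
# oracle on the most anomalous examples, retrain with the accumulated labels.
import numpy as np

def active_anomaly_rounds(model, pool, oracle, n_rounds=5, queries_per_round=10):
    """model: object exposing fit(X, y) and anomaly_score(X);
    oracle: callable index -> label (e.g. a domain expert's judgement)."""
    labelled_X, labelled_y = [], []
    for _ in range(n_rounds):
        scores = model.anomaly_score(pool)                   # higher = more anomalous
        query_idx = np.argsort(scores)[-queries_per_round:]  # query strategy: top scores
        for i in query_idx:
            labelled_X.append(pool[i])
            labelled_y.append(oracle(i))                     # oracle feedback
        model.fit(np.stack(labelled_X), np.array(labelled_y))  # retrain after each round
    return model
```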

    Diagnosis prediction from electronic health records (EHR) using the binary diagnosis history vector representation

    The large amounts of rich, heterogeneous information nowadays routinely collected by health care providers across the world possess remarkable potential for the extraction of novel medical data and the assessment of different practices in real-world conditions. Specifically, in this work our goal is to use Electronic Health Records (EHRs) to predict progression patterns of future diagnoses of ailments for a particular patient, given the patient’s present diagnostic history. Following the highly promising results of a recently proposed approach which introduced the diagnosis history vector representation of a patient’s diagnostic record, we introduce a series of improvements to the model and conduct thorough experiments that demonstrate its scalability, accuracy, and practicability in the clinical context. We show that the model captures well the interactions between a large number of ailments corresponding to the most frequent diagnoses, show how the original learning framework can be adapted to increase its prediction specificity, and describe a principled, probabilistic method for incorporating explicit human clinical knowledge to overcome the semantic limitations of the raw EHR data.
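    A small sketch of the binary diagnosis history vector representation named above: each patient's diagnostic history is mapped to a fixed-length 0/1 vector indexed by diagnosis code, which a downstream predictor can then consume; the code vocabulary shown is hypothetical.

```python
# Build a binary diagnosis history vector: 1 if the patient has ever received
# the corresponding diagnosis code, 0 otherwise (presence, not count).
import numpy as np

def history_vector(patient_codes, code_index):
    """patient_codes: iterable of diagnosis codes observed so far;
    code_index: dict mapping code -> position in the vector."""
    v = np.zeros(len(code_index), dtype=np.uint8)
    for code in patient_codes:
        if code in code_index:
            v[code_index[code]] = 1
    return v

# Example with a tiny hypothetical vocabulary of frequent codes
code_index = {"I10": 0, "E11": 1, "J45": 2, "K21": 3}
x = history_vector({"I10", "E11"}, code_index)   # -> array([1, 1, 0, 0], dtype=uint8)
```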

    From Visual Saliency to Video Behaviour Understanding

    In a world of ever increasing amounts of video data, we are forced to abandon traditional methods of scene interpretation by fully manual means. Under such circumstances some form of automation is highly desirable, but this can be a very open ended issue of high complexity. Dealing with such large amounts of data is a non-trivial task that requires efficient, selective extraction of the parts of a scene which have the potential to develop a higher semantic meaning, alone or in combination with others. In particular, the types of video data that are in need of automated analysis tend to be outdoor scenes with high levels of activity generated by either foreground or background. Such dynamic scenes add considerable complexity to the problem, since we cannot rely on motion energy alone to detect regions of interest. Furthermore, the behaviour of these regions of motion can differ greatly, while still being highly dependent, both spatially and temporally, on the movement of other objects within the scene. Modelling these dependencies, whilst eliminating as much redundancy from the feature extraction process as possible, are the challenges addressed by this thesis.

    In the first half, finding the right mechanism to extract and represent meaningful features from dynamic scenes with no prior knowledge is investigated. Meaningful or salient information is treated as the parts of a scene that stand out or seem unusual or interesting to us. The novelty of the work is that it is able to select salient scales in both space and time at which a particular spatio-temporal volume is considered interesting relative to the rest of the scene. By quantifying the temporal saliency values of regions of motion, it is possible to consider their importance in both the long and the short term. Variations in entropy over spatio-temporal scales are used to select a context dependent measure of the local scene dynamics. A method of quantifying temporal saliency is devised based on the variation of the entropy of the intensity distribution in a spatio-temporal volume over increasing scales. Entropy is used in preference to traditional filter methods since the stability or predictability of the intensity distribution over scales of a local spatio-temporal region can be defined more robustly relative to the context of its neighbourhood, even for regions exhibiting high intensity variation due to being extremely textured. Results show that it is possible to extract both locally salient features and globally salient temporal features from contrasting scenarios.

    In the second part of the thesis, the focus shifts towards binding these spatio-temporally salient features together so that some semantic meaning can be inferred from their interaction. Interaction in this sense refers to any form of temporally correlated behaviour between any salient regions of motion in a scene. Feature binding as a mechanism for interactive behaviour understanding is particularly important if we consider that regions of interest may not be treated as particularly significant individually, but represent much more semantically when considered in combination. Temporally correlated behaviour is identified and classified using accumulated co-occurrences of salient features at two levels. Firstly, co-occurrences are accumulated for spatio-temporally proximate salient features to form a local representation. Then, at the next level, the co-occurrences of these locally spatio-temporally bound features are accumulated again in order to discover unusual behaviour in the scene. The novelty of this work is that no assumptions are made about whether interacting regions should be spatially proximate. Furthermore, no prior knowledge of the scene topology is used. Results show that it is possible to detect unusual interactions between regions of motion, which can visually infer higher levels of semantics.

    In the final part of the thesis, a more specific investigation of human behaviour is addressed through the classification and detection of interactions between two human subjects. Here, further modifications are made to the feature extraction process in order to quantify the spatio-temporal saliency of a region of motion. These features are then grouped to find the people in the scene. Next, a loose pose distribution model is extracted for each person, and salient correlations between the poses of two interacting people are found using canonical correlation analysis. These canonical factors can be formed into trajectories and used for classification, with the Levenshtein distance used to categorise the features. The novelty of the work is that the interactions do not have to be spatially connected or proximate in order to be recognised. Furthermore, the data used are outdoor scenes cluttered with a non-stationary background. Results show that co-occurrence techniques have the potential to provide a more generalised, compact, and meaningful representation of dynamic interactive scene behaviour.

    Funded by EPSRC, part-funded by QinetiQ Ltd; a travel grant was also contributed by RAEng.
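    A rough sketch of the entropy-over-scales saliency measure described in the first part of the thesis: the entropy of the local intensity histogram is computed for spatio-temporal volumes of increasing scale, and its variation across scales serves as a saliency score; the scales, bin count, and function names are illustrative assumptions.

```python
# Entropy of the binned intensity distribution in a local spatio-temporal volume,
# evaluated over increasing scales; high variation across scales marks a
# candidate salient spatio-temporal point.
import numpy as np

def local_entropy(volume, bins=32):
    hist, _ = np.histogram(volume, bins=bins, range=(0, 255))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())   # Shannon entropy of the intensity histogram

def entropy_saliency(video, x, y, t, scales=(2, 4, 8, 16)):
    """video: (T, H, W) grayscale array; returns the variation of entropy over scales."""
    entropies = []
    for s in scales:
        t0, t1 = max(t - s, 0), min(t + s + 1, video.shape[0])
        y0, y1 = max(y - s, 0), min(y + s + 1, video.shape[1])
        x0, x1 = max(x - s, 0), min(x + s + 1, video.shape[2])
        entropies.append(local_entropy(video[t0:t1, y0:y1, x0:x1]))
    return float(np.var(entropies))
```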

    Staircases as contextual cues that help minimize energetic costs

    Staircase climbs are habitually avoided, and staircase steepness is overestimated. Visual impressions of staircase slant reliably precede each taxing climb and may act as salient visual cues, prompting behaviour that supports an ‘economy of action’ (Proffitt, 2006). The thesis adapted the contextual cueing paradigm with natural scenes (cf. Brockmole & Henderson, 2006b) to test for search and learning biases driven by staircase scene content. For this, target letters, L and T, were placed near and far from staircases, and in scenes without staircases (three stimulus categories). Eighteen scenes were repeated across blocks, six of each stimulus category. Response latencies and eye movements were recorded. Chapter three investigated search biases in initial eye movements in response to the first presentation of novel, natural scenes of the three stimulus categories. Findings support the notion that early eye movements were biased towards the incidental scene content of staircases in 36 novel real-world scenes (N = 118); this bias was magnified for staircases with more steps, independent of target locations. Chapter two investigated contextual cueing by the content of 18 natural scenes, six of each category, repeated across eight blocks (N = 64); for 27 of these participants, target locations were changed relative to the staircase location in the ninth block. Steeper learning slopes across the eight repetitions were observed for targets located near staircases compared to the other stimulus categories. Interruptions to learning, due to changes in target locations in the ninth block, were a function of the distance to the staircase location pre- and post-change, consistent with the observed differential learning. Interruptions were equally strong within and between two nine-block learning sessions (N = 40) that were separated by a 24-hour break. This additional finding comes from a subsequent contextual cueing study, presented in Chapter four, and speaks to a major involvement of episodic memory in the learning reported in this thesis. In sum, the findings highlight a capacity of staircase percepts to bias initial visual search, and to facilitate short- and longer-term associative learning near staircases. Overall, the results suggest staircases may be salient stimuli for cognitive processes that manage energetic resources.