
    Explicit Edge Inconsistency Evaluation Model for Color-Guided Depth Map Enhancement

    © 2016 IEEE. Color-guided depth enhancement refines depth maps under the assumption that depth edges and color edges at corresponding locations are consistent. Among methods for such low-level vision tasks, the Markov random field (MRF), including its variants, is one of the major approaches and has dominated this area for several years. However, the assumption above does not always hold. To tackle the problem, state-of-the-art solutions adjust the weighting coefficient inside the smoothness term of the MRF model. These methods lack an explicit evaluation model to quantitatively measure the inconsistency between the depth edge map and the color edge map, so they cannot adaptively control the strength of the guidance from the color image for depth enhancement, leading to various defects such as texture-copy artifacts and blurred depth edges. In this paper, we propose a quantitative measurement of such inconsistency and explicitly embed it into the smoothness term. The proposed method demonstrates promising experimental results compared with the benchmark and state-of-the-art methods on the Middlebury, ToF-Mark, and NYU data sets.
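For orientation, a commonly used form of the color-guided MRF energy can be written as below. This is a generic sketch, not the formulation of the paper: the symbols $D$ (enhanced depth), $D^0$ (input depth), $I$ (color image), $\lambda$, $\sigma$, and the Gaussian color weight $w_{ij}$ are all assumptions introduced here for illustration. The abstract's "weighting coefficient inside the smoothness term" corresponds to $w_{ij}$ in this notation.

```latex
E(D) = \sum_{i} \left( D_i - D_i^{0} \right)^2
     + \lambda \sum_{i} \sum_{j \in \mathcal{N}(i)} w_{ij} \left( D_i - D_j \right)^2,
\qquad
w_{ij} = \exp\!\left( -\frac{\lVert I_i - I_j \rVert^2}{2\sigma^2} \right)
```

When a color edge has no matching depth edge, $w_{ij}$ becomes small and smoothing is relaxed where it should not be, which is one way texture-copy artifacts can arise.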

    Person re-Identification over distributed spaces and time

    PhD thesis. Replicating the human visual system and the cognitive abilities that the brain uses to process the information it receives is an area of substantial scientific interest. With the prevalence of video surveillance cameras, a portion of this scientific drive has gone into providing useful automated counterparts to human operators. A prominent task in visual surveillance is that of matching people between disjoint camera views, or re-identification. This allows operators to locate people of interest and to track people across cameras, and can be used as a precursory step to multi-camera activity analysis. However, due to the contrasting conditions between camera views and their effects on the appearance of people, re-identification is a non-trivial task. This thesis proposes solutions for reducing the visual ambiguity in observations of people between camera views. This thesis first looks at a method for mitigating the effects of differing lighting conditions between camera views on the appearance of people. It builds on work modelling inter-camera illumination based on known pairs of images. A Cumulative Brightness Transfer Function (CBTF) is proposed to estimate the mapping of colour brightness values based on limited training samples. Unlike previous methods that use a mean-based representation for a set of training samples, the cumulative nature of the CBTF retains colour information from underrepresented samples in the training set. Additionally, the bi-directionality of the mapping function is explored to maximise re-identification accuracy by ensuring samples are accurately mapped between cameras. Secondly, an extension is proposed to the CBTF framework that addresses the issue of changing lighting conditions within a single camera. As the CBTF requires manually labelled training samples, it is limited to static lighting conditions and is less effective if the lighting changes. 
This Adaptive CBTF (A-CBTF) differs from previous approaches that either do not consider lighting change over time or rely on camera transition time information to update. By utilising contextual information drawn from the background in each camera view, an estimation of the lighting change within a single camera can be made. This background lighting model allows the mapping of colour information back to the original training conditions and thus removes the need for retraining. Thirdly, a novel reformulation of re-identification as a ranking problem is proposed. Previous methods use a score based on a direct distance measure of set features to form a correct/incorrect match result. Rather than offering an operator a single outcome, the ranking paradigm gives the operator a ranked list of possible matches and allows them to make the final decision. By utilising a Support Vector Machine (SVM) ranking method, a weighting on the appearance features can be learned that capitalises on the fact that not all image features are equally important to re-identification. Additionally, an Ensemble-RankSVM is proposed to address scalability issues by separating the training samples into smaller subsets and boosting the trained models. Finally, the thesis looks at a practical application of the ranking paradigm in a real-world setting. The system encompasses both the re-identification stage and the precursory extraction and tracking stages to form an aid for CCTV operators. Segmentation and detection are combined to extract relevant information from the video, while several matching techniques are combined with temporal priors to form a more comprehensive overall matching criterion. The effectiveness of the proposed approaches is tested on datasets obtained from a variety of challenging environments including offices, apartment buildings, airports and outdoor public spaces.
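The cumulative mapping idea behind the CBTF can be sketched as follows. This is a minimal illustration, not the thesis implementation. It assumes that brightness values from all training samples of each camera are pooled into one histogram, and that a level in camera A maps to the first level in camera B whose cumulative frequency is at least as large. The function names and the small `levels` value used in testing are hypothetical.

```python
from itertools import accumulate


def cumulative_histogram(samples, levels=256):
    """Normalised cumulative histogram pooled over all training samples.

    `samples` is a list of pixel lists; each pixel is an int in [0, levels).
    Pooling (rather than averaging per-sample mappings) keeps the
    contribution of rare brightness values.
    """
    counts = [0] * levels
    for pixels in samples:
        for value in pixels:
            counts[value] += 1
    total = sum(counts)
    return [c / total for c in accumulate(counts)]


def cumulative_btf(samples_a, samples_b, levels=256):
    """Map each brightness level of camera A to a level of camera B.

    Level b in A maps to the smallest level in B whose cumulative
    frequency reaches that of b in A (histogram matching).
    """
    cum_a = cumulative_histogram(samples_a, levels)
    cum_b = cumulative_histogram(samples_b, levels)
    mapping = []
    j = 0
    for b in range(levels):
        # cum_a is non-decreasing, so j never needs to move backwards.
        while j < levels - 1 and cum_b[j] < cum_a[b]:
            j += 1
        mapping.append(j)
    return mapping
```

In practice the function would be applied once per colour channel, and swapping the two arguments gives the reverse mapping that the bi-directional variant relies on.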

    Component-based synthesis of motion planning algorithms

    Combinatory Logic Synthesis (CLS) generates data or runnable programs according to formal type specifications. Synthesis results are composed based on a user-specified repository of components, which brings several advantages for representing spaces of high variability. This work suggests strategies to manage the resulting variations by proposing a domain-specific brute-force search and a machine learning-based optimization procedure. The brute-force search involves the iterative generation and evaluation of machining strategies. In contrast, machine learning optimization uses statistical models to enable the exploration of the design space. The approaches involve synthesizing programs and meta-programs that manipulate, run, and evaluate programs. The methodologies are applied to the domain of motion planning algorithms, and they include the configuration of programs belonging to different algorithmic families. The study of the domain led to the identification of variability points and possible variations. Proof-of-concept repositories represent these variability points and incorporate them into their semantic structure. The selected algorithmic families involve specific computation steps or data structures, and corresponding software components represent possible variations. Experimental results demonstrate that CLS enables synthesis-driven domain-specific optimization procedures to solve complex problems by exploring spaces of high variability.

    Physical Activity Recognition and Identification System

    Background: It is well-established that physical activity is beneficial to health. It is less known how the characteristics of physical activity impact health independently of total amount. This is due to the inability to measure these characteristics in an objective way that can be applied to large population groups. Accelerometry allows for objective monitoring of physical activity but is currently unable to identify type of physical activity accurately. Methods: This thesis details the creation of an activity classifier that can identify type from accelerometer data. The current research in activity classification was reviewed and methodological challenges were identified. The main challenge was the inability of classifiers to generalise to unseen data. Creating methods to mitigate this lack of generalisation represents the bulk of this thesis. Using the review, a classification pipeline was synthesised, representing the sequence of steps that all activity classifiers use:
    1. Determination of device location and setting (Chapter 4)
    2. Pre-processing (Chapter 5)
    3. Segmenting into windows (Chapter 6)
    4. Extracting features (Chapters 7, 8)
    5. Creating the classifier (Chapter 9)
    6. Post-processing (Chapter 5)
    For each of these steps, methods were created and tested that allowed for a high level of generalisability without sacrificing overall performance. Results: The work in this thesis results in an activity classifier that had a good ability to generalise to unseen data. The classifier achieved an F1-score of 0.916 and 0.826 on data similar to its training data, which is statistically equivalent to the performance of current state of the art models (0.898, 0.765). On data dissimilar to its training data, the classifier achieved a significantly higher performance than current state of the art methods (0.759, 0.897 versus 0.352, 0.415). 
This shows that the classifier created in this work has a significantly greater ability to generalise to unseen data than current methods. Conclusion: This thesis details the creation of an activity classifier with an improved ability to generalise to unseen data, thus allowing for identification of activity type from acceleration data. This should allow for more detailed investigation into the specific health effects of activity type in large population studies utilising accelerometers.
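The windowing, feature-extraction, and scoring steps of such a pipeline can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis method: the fixed-length overlapping windows, the mean and standard-deviation features, and all function names are hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def segment(signal, window, step):
    """Split a 1-D signal into fixed-length windows, advancing by `step`."""
    return [signal[i:i + window]
            for i in range(0, len(signal) - window + 1, step)]


def extract_features(win):
    """Two simple per-window features: mean and population std deviation."""
    return (mean(win), pstdev(win))


def f1_score(y_true, y_pred, positive):
    """F1 for one class: 2*TP / (2*TP + FP + FN); 0.0 if undefined."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

A classifier trained on the feature tuples would sit between `extract_features` and `f1_score`. Evaluating it on recordings from a different study or device placement is one way to probe the generalisation gap the abstract reports.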

    Modeling DNN as human learner

    In previous experiments, human listeners demonstrated that they had the ability to adapt to unheard, ambiguous phonemes after some initial, relatively short exposures. At the same time, previous work in the speech community has shown that pre-trained deep neural network-based (DNN) ASR systems, like humans, also have the ability to adapt to unseen, ambiguous phonemes after retuning their parameters on a relatively small set. In the first part of this thesis, the time-course of phoneme category adaptation in a DNN is investigated in more detail. By retuning the DNNs with more and more tokens with ambiguous sounds and comparing classification accuracy of the ambiguous phonemes in a held-out test across the time-course, we found that DNNs, like human listeners, also demonstrated fast adaptation: the accuracy curves were step-like in almost all cases, showing very little adaptation after seeing only one (out of ten) training bins. However, unlike our experimental setup mentioned above, in a typical lexically guided perceptual learning experiment, listeners are trained with individual words instead of individual phones, and thus to truly model such a scenario, we would require a model that could take the context of a whole utterance into account. Traditional speech recognition systems accomplish this through the use of hidden Markov models (HMM) and weighted finite-state transducer (WFST) decoding. In recent years, bidirectional long short-term memory (Bi-LSTM) trained under the connectionist temporal classification (CTC) criterion has also attracted much attention. In the second part of this thesis, previous experiments on ambiguous phoneme recognition were carried out again on a new Bi-LSTM model, and phonetic transcriptions of words ending with ambiguous phonemes were used as training targets, instead of individual sounds that consisted of a single phoneme. 
We found that despite the vastly different architecture, the new model showed highly similar behavior in terms of classification rate over the time course of incremental retuning. This indicated that neural network-based models could also adapt quickly to ambiguous phonemes in a continuous context. In the last part of this thesis, our pre-trained Dutch Bi-LSTM from the previous part was treated as a Dutch second language learner and was asked to transcribe English utterances in a self-adaptation scheme. In other words, we used the Dutch model to generate phonetic transcriptions directly and retuned the model on the transcriptions it generated, although ground truth transcriptions were used to choose a subset of all self-labeled transcriptions. Self-adaptation is of interest as a model of human second language learning, but also has great practical engineering value, e.g., it could be used to adapt speech recognition to a low-resource language. We investigated two ways to improve the adaptation scheme: the first was multi-task learning with articulatory feature detection while training the model on Dutch and during self-labeled adaptation, and the second was letting the model first adapt to isolated short words before feeding it longer utterances.

    Control of walking behavior by horizontal optic flow detectors in Drosophila


    Improving Deep Reinforcement Learning Using Graph Convolution and Visual Domain Transfer

    Recent developments in Deep Reinforcement Learning (DRL) have shown tremendous progress in robotics control, Atari games, board games such as Go, etc. However, model-free DRL still has limited use cases due to its poor sampling efficiency and poor generalization across a variety of tasks. In this thesis, two particular drawbacks of DRL are investigated: 1) the poor generalization abilities of model-free DRL, specifically how to generalize an agent's policy to unseen environments and how to generalize task performance to different data representations (e.g. image based or graph based); and 2) the reality gap issue in DRL, that is, how to effectively transfer a policy learned in a simulator to the real world. This thesis makes several novel contributions to the field of DRL, which are outlined sequentially in the following. Among these contributions is the generalized value iteration network (GVIN) algorithm, which is an end-to-end neural network planning module extending the work of Value Iteration Networks (VIN). GVIN emulates the value iteration algorithm by using a novel graph convolution operator, which enables GVIN to learn and plan on irregular spatial graphs. Additionally, this thesis proposes three novel, differentiable kernels as graph convolution operators and shows that the embedding-based kernel achieves the best performance. Furthermore, an improvement upon traditional n-step Q-learning that stabilizes training for VIN and GVIN is demonstrated. Additionally, the equivalence between GVIN and graph neural networks is outlined, and it is shown that GVIN can be further extended to address both control and inference problems. The final subject under the graph domain that is studied in this thesis is graph embeddings. Specifically, this work studies a general graph embedding framework, GEM-F, that unifies most of the previous graph embedding algorithms. 
Based on the contributions made during the analysis of GEM-F, a novel algorithm called WarpMap is proposed, which outperforms DeepWalk and node2vec in unsupervised learning settings. The aforementioned reality gap in DRL prevents a significant portion of research from reaching real-world settings. The latter part of this work studies and analyzes domain transfer techniques in an effort to bridge this gap. Typically, domain transfer in RL consists of representation transfer and policy transfer. In this work, the focus is on representation transfer for vision-based applications. More specifically, the feature representation is aligned from the source domain to the target domain in an unsupervised fashion. In this approach, a linear mapping function is considered to fuse modules that are trained in different domains. Two improved adversarial learning methods are proposed to enhance the training quality of the mapping function. Finally, the thesis demonstrates the effectiveness of domain alignment among different weather conditions in the CARLA autonomous driving simulator.
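The value iteration algorithm that GVIN is said to emulate can be sketched in its classical tabular form on a small graph. This is a plain, non-learned illustration and not the GVIN architecture. The deterministic moves along edges, the reward received on entering a node, and the function and parameter names are all assumptions made for the example.

```python
def value_iteration(neighbors, reward, gamma=0.9, sweeps=50):
    """Synchronous value iteration on a graph with deterministic moves.

    neighbors[s] lists the nodes reachable from s in one step, and
    reward[t] is the reward for entering node t. Nodes without
    neighbors are treated as terminal and keep value 0.
    """
    values = {s: 0.0 for s in neighbors}
    for _ in range(sweeps):
        # Bellman backup: best one-step reward plus discounted next value.
        # The comprehension reads the previous sweep's values throughout.
        values = {
            s: max((reward[t] + gamma * values[t] for t in nbrs), default=0.0)
            for s, nbrs in neighbors.items()
        }
    return values
```

On an irregular graph the set of neighbours differs from node to node. As the abstract describes it, GVIN handles this case by emulating the backup with a learned graph convolution operator.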