    Prior knowledge-based deep learning method for indoor object recognition and application

    Indoor object recognition is a key task for indoor navigation by mobile robots. Although previous work has produced impressive results in recognizing known and familiar objects, research on indoor object recognition for robots is still insufficient. To improve detection precision, this study proposes a prior knowledge-based deep learning method that enables a robot to recognize indoor objects on sight. First, we combine the public Indoor dataset with a private dataset of frames of videos (FoVs) to train a convolutional neural network (CNN). Second, mean images, which serve as a form of colour knowledge, are generated for every class in the Indoor dataset; the distance between each mean image and the input image yields the class weight vector. Scene knowledge, consisting of the frequencies with which objects occur in a scene, is then employed as a second source of prior knowledge to determine the scene weight vector. Finally, when a detection request is issued, these two vectors are multiplied with the vector of classification probabilities produced by the deep model to obtain a decision vector for classification. Experiments show that detection precision can be improved by employing the prior colour and scene knowledge. In addition, we applied the method to object recognition in a video, and the results show its potential for robot vision.
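
    The fusion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are flattened lists of floats, the distance is Euclidean, and both the mapping from distance to class weight (1 / (1 + d), normalized) and the add-one smoothing of scene frequencies are assumptions, since the abstract does not specify them. The class names and numbers in the toy example are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def mean_image(images: Sequence[Sequence[float]]) -> List[float]:
    """Pixel-wise mean of equally sized, flattened images from one class."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def class_weights(image: Sequence[float], means: Sequence[Sequence[float]]) -> List[float]:
    """Colour knowledge: the closer a class mean image, the larger its weight.

    ASSUMPTION: Euclidean distance mapped by 1 / (1 + d), then normalized;
    the abstract does not say how distance becomes a weight.
    """
    raw = [1.0 / (1.0 + math.dist(image, m)) for m in means]
    total = sum(raw)
    return [r / total for r in raw]


def scene_weights(counts: Dict[str, int], classes: Sequence[str]) -> List[float]:
    """Scene knowledge: occurrence frequency of each class in the current scene.

    ASSUMPTION: add-one smoothing so unseen classes are not ruled out entirely.
    """
    total = sum(counts.get(c, 0) for c in classes) + len(classes)
    return [(counts.get(c, 0) + 1) / total for c in classes]


def decide(cnn_probs: Sequence[float], w_class: Sequence[float], w_scene: Sequence[float]) -> int:
    """Element-wise product of the three vectors; return the winning class index."""
    decision = [p * a * b for p, a, b in zip(cnn_probs, w_class, w_scene)]
    return max(range(len(decision)), key=decision.__getitem__)


# Toy example with 3-"pixel" images (hypothetical values).
classes = ["chair", "door", "table"]
means = [
    mean_image([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),  # chair
    mean_image([[0.0, 1.0, 0.0]]),                   # door
    mean_image([[0.0, 0.0, 1.0]]),                   # table
]
x = [0.85, 0.15, 0.0]
cnn_probs = [0.40, 0.45, 0.15]  # the CNN alone would pick "door"
w_c = class_weights(x, means)
w_s = scene_weights({"chair": 5, "door": 1, "table": 4}, classes)
print(classes[decide(cnn_probs, w_c, w_s)])  # -> chair
```

    In this toy example the CNN probabilities alone favour "door", while the product with the colour and scene weights selects "chair".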

    Robust gait recognition under variable covariate conditions

    PhD
    Gait is a weak biometric compared to face, fingerprint or iris because it is easily affected by various conditions. These are known as covariate conditions and include clothing, carrying, speed, shoes and view, among others. Under variable covariate conditions, gait recognition is a hard problem that remains unsolved, and no working system has been reported. In this thesis, a novel gait representation, the Gait Flow Image (GFI), is proposed to extract more discriminative information from a gait sequence. GFI captures the relative motion of body parts in different directions in separate motion descriptors. Compared to existing model-free gait representations, GFI is more discriminative and more robust to changes in covariate conditions. Gait recognition approaches are evaluated without assuming cooperative subjects, i.e. both the gallery and the probe sets consist of gait sequences under different and unknown covariate conditions. The results indicate that the performance of existing approaches drops drastically under this more realistic set-up. It is argued that selecting gait features that are invariant to changes in covariate conditions is the key to developing a gait recognition system that does not require subject cooperation. To this end, the Gait Entropy Image (GEnI) is proposed to perform automatic feature selection on each pair of gallery and probe gait sequences. Moreover, an Adaptive Component and Discriminant Analysis is formulated that seamlessly integrates the feature selection method with subspace analysis for fast and robust recognition. Among the various factors that affect gait recognition performance, change in viewpoint poses the biggest problem and is treated separately. A novel approach to this problem is proposed that uses the Gait Flow Image in a cross-view gait recognition framework in which the view angle of the probe gait sequence is unknown.
    A Gaussian Process classification technique is formulated to estimate the view angle of each probe gait sequence. To measure the similarity of gait sequences across view angles, the correlation between gait sequences from different views is modelled using Canonical Correlation Analysis, and the correlation strength is used as the similarity measure. This differs from existing approaches, which reconstruct gait features in different views through 2D view transformation or 3D calibration. Without explicit reconstruction, the proposed method can cope with feature mismatch across views and is more robust against feature noise.
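
    The abstract does not define the Gait Entropy Image, so the following is a sketch under an assumption: each GEnI pixel is taken to be the Shannon entropy (in bits) of that pixel's binary foreground/background value across the silhouette frames of one gait cycle. Pixels that never change (static body parts or background) get zero entropy, while pixels that alternate, such as those swept by swinging limbs, approach one bit. The subsequent feature selection and the Adaptive Component and Discriminant Analysis are not shown, and the frames in the example are hypothetical.

```python
import math
from typing import List, Sequence

Silhouette = Sequence[Sequence[int]]  # binary frame: 1 = body, 0 = background


def gait_entropy_image(frames: Sequence[Silhouette]) -> List[List[float]]:
    """Per-pixel Shannon entropy (bits) of the binary silhouette value over one gait cycle.

    ASSUMPTION: this entropy definition is assumed; the abstract only names GEnI.
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    geni = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Fraction of frames in which pixel (r, c) belongs to the body.
            p_fg = sum(frame[r][c] for frame in frames) / n
            h = 0.0
            for p in (p_fg, 1.0 - p_fg):
                if p > 0.0:  # 0 * log(0) is taken as 0
                    h -= p * math.log2(p)
            geni[r][c] = h
    return geni


# Hypothetical 1 x 3 silhouettes over a 4-frame cycle:
# pixel 0 is always body (e.g. torso), pixel 1 alternates (e.g. a swinging leg),
# pixel 2 is always background.
frames = [[[1, 1, 0]], [[1, 0, 0]], [[1, 1, 0]], [[1, 0, 0]]]
print(gait_entropy_image(frames))  # -> [[0.0, 1.0, 0.0]]
```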

    Contributions to hand gesture recognition on the basis of range data

    Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior. Date of defence: October 2012.

    Computer integrated documentation

    The main technical issues of the Computer Integrated Documentation (CID) project are presented. The problem of automating document management and maintenance is analyzed from both an artificial intelligence viewpoint and a human factors viewpoint. Possible technologies for CID are reviewed: conventional approaches to indexing and information retrieval, hypertext, and knowledge-based systems. Particular effort was made to provide an appropriate representation for contextual knowledge. This representation is used to generate context for hypertext links, so indexing in CID is context-sensitive. The implementation of the current version of CID is described; it includes a hypertext database, a knowledge-based management and maintenance system, and a user interface. A series of theoretical considerations is also presented, such as navigation in hyperspace, acquisition of indexing knowledge, generation and maintenance of large documentation, and relation to other work.

    On the synthesis and processing of high quality audio signals by parallel computers

    This work concerns the application of new computer architectures to the creation and manipulation of high-quality audio-bandwidth signals. The configuration of both the hardware and the software in such systems is considered in three major sections, which present increasing levels of algorithmic concurrency. In the first section, the programs described are distributed as identical copies across an array of processing elements; these programs run autonomously, generating data independently, but with control parameters peculiar to each copy. This type of concurrency is referred to as isonomic. The central section presents a structure that distributes tasks across an arbitrary network of processors; the flow of control in such a program is quasi-indeterminate, and is controlled on a demand basis by the rate of completion of the slave tasks and by their irregular interaction with the master. Whilst that interaction is, in principle, deterministic, it is also data-dependent; the dynamic nature of task allocation demands that no a priori knowledge of the rate of task completion be required. This type of concurrency is called dianomic. Finally, an architecture is described that supports a very high level of algorithmic concurrency. The programs that make efficient use of such a machine are designed not by considering flow of control, but by considering flow of data. Each atomic algorithmic unit is made as simple as possible, which results in the extensive distribution of a program over very many processing elements. Programs designed by considering only the optimum data exchange routes are said to exhibit systolic concurrency. Often neglected in the study of system design are the provisions necessary for practical implementations.
    In fulfilment of this study, it was intended to provide users with useful application programs; the target group is electroacoustic composers, who use digital signal processing techniques in the context of musical composition. Some of the algorithms used in this field are highly complex, often requiring more processing per sample than is currently available even from very powerful computers. Consequently, applications tend to operate not in 'real time' (where the output of a system responds to its input apparently instantaneously), but by manipulating sounds recorded digitally on a mass storage device. The first two sections adopt existing public-domain software and seek to increase its speed of execution significantly by parallel techniques, with minimal compromise of functionality and ease of use. The programs chosen are the general-purpose direct synthesis program CSOUND, from M.I.T., and a stand-alone phase vocoder system from the C.D.P. In each case, the desired aim is achieved: execution speed is increased by two orders of magnitude over the systems currently used by composers. This requires substantial restructuring of the programs and careful consideration of the best computer architectures on which to run them concurrently. The third section examines the rationale behind the use of computers in music, and begins with the implementation of a sophisticated electronic musical instrument capable of a degree of expression at least equal to that of its acoustic counterparts. It appears that flexible control of such an instrument demands greater computing resources than the sound synthesis part does.
    A machine has been constructed to enable the real-time 'gestural capture' of performance information; the structure of this computer, which has one hundred and sixty high-performance microprocessors running in parallel, is expounded, and the systolic programming techniques required to take advantage of such an array are illustrated in the Occam programming language.
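
    The demand-driven (dianomic) task distribution described above can be illustrated with a small processor-farm sketch. It substitutes Python threads and queues for the original multiprocessor network and Occam, and the sine-tone rendering task, block size and sample rate are hypothetical stand-ins for real synthesis work. Workers (the abstract's slave tasks) take a new task only after finishing the previous one, so results reach the master in an order that depends on timing, and the master reassembles them by block index. Because of Python's global interpreter lock, this shows the control structure rather than a real speedup.

```python
import math
import queue
import threading
from typing import Dict, List, Optional, Tuple

SAMPLE_RATE = 44100  # hypothetical
BLOCK_SIZE = 256     # samples per task, hypothetical


def render_block(index: int, freq: float) -> List[float]:
    """Hypothetical task: synthesize one block of a sine tone."""
    start = index * BLOCK_SIZE
    return [math.sin(2.0 * math.pi * freq * (start + i) / SAMPLE_RATE) for i in range(BLOCK_SIZE)]


def worker(tasks: "queue.Queue[Optional[int]]",
           results: "queue.Queue[Tuple[int, List[float]]]",
           freq: float) -> None:
    """Pull the next task only when idle (demand-driven) and report the result."""
    while True:
        index = tasks.get()
        if index is None:  # sentinel: no more work
            return
        results.put((index, render_block(index, freq)))


def farm(n_blocks: int, n_workers: int, freq: float) -> List[float]:
    """Master: queue tasks, collect results in completion order, reassemble in order."""
    tasks: "queue.Queue[Optional[int]]" = queue.Queue()
    results: "queue.Queue[Tuple[int, List[float]]]" = queue.Queue()
    for i in range(n_blocks):
        tasks.put(i)
    for _ in range(n_workers):
        tasks.put(None)  # one sentinel per worker
    threads = [threading.Thread(target=worker, args=(tasks, results, freq))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    done: Dict[int, List[float]] = {}
    for _ in range(n_blocks):
        index, samples = results.get()  # arrival order depends on timing
        done[index] = samples
    for t in threads:
        t.join()
    return [s for i in range(n_blocks) for s in done[i]]


signal = farm(n_blocks=8, n_workers=3, freq=440.0)
print(len(signal))  # -> 2048
```

    The master never needs to know in advance how long each task takes, which matches the abstract's point that dynamic allocation requires no a priori knowledge of task completion rates.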

    From Pixels to Spikes: Efficient Multimodal Learning in the Presence of Domain Shift

    Computer vision aims to provide computers with a conceptual understanding of images or video by learning a high-level representation. This representation is typically derived from the pixel domain (i.e., RGB channels) for tasks such as image classification or action recognition. In this thesis, we explore how RGB inputs can be either pre-processed or supplemented with other compressed visual modalities in order to improve the accuracy-complexity tradeoff for various computer vision tasks. Beginning with RGB-domain data only, we propose a multi-level, Voronoi-based spatial partitioning of images, with each partition processed individually by a convolutional neural network (CNN), to improve the scale invariance of the embedding. We combine this with a novel and efficient approach for optimal bit allocation within the quantized cell representations. We evaluate this proposal on the content-based image retrieval task, which consists of finding images in a dataset that are similar to a given query. We then move to the more challenging domain of action recognition, where a video sequence is classified according to the action it contains. In this case, we demonstrate how the RGB modality can be supplemented with a flow modality comprising motion vectors extracted directly from the video codec. The motion vectors (MVs) are used both as input to a CNN and as an activity sensor that enables selective macroblock (MB) decoding of RGB frames instead of full-frame decoding. We independently train two CNNs on RGB and MV correspondences and then fuse their scores during inference, demonstrating faster end-to-end processing and classification accuracy competitive with recent work. To explore more efficient sensing modalities, we replace the MV stream with a neuromorphic vision sensing (NVS) stream for action recognition.
    NVS hardware mimics the biological retina and operates at substantially lower power and significantly higher sampling rates than conventional active pixel sensing (APS) cameras. Due to the lack of training data in this domain, we generate emulated NVS frames directly from consecutive RGB frames and use them to train a teacher-student framework that additionally leverages the abundance of optical flow training data. In the final part of this thesis, we introduce a novel unsupervised domain adaptation method to further minimize the domain shift between the emulated (source) and real (target) NVS data domains.
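
    The abstract states that the RGB and MV networks are trained independently and that their scores are fused at inference time, but not how the fusion is done. A common choice, assumed here, is a weighted average of the two streams' softmax probabilities; the weight `w_rgb` and the logits in the example are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits: Sequence[float], mv_logits: Sequence[float],
                w_rgb: float = 0.5) -> List[float]:
    """ASSUMPTION: weighted average of per-stream class probabilities."""
    p_rgb = softmax(rgb_logits)
    p_mv = softmax(mv_logits)
    return [w_rgb * a + (1.0 - w_rgb) * b for a, b in zip(p_rgb, p_mv)]


def predict(scores: Sequence[float]) -> int:
    """Index of the highest-scoring action class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical logits for three action classes from each stream.
rgb_logits = [2.0, 1.0, 0.0]  # RGB stream favours class 0
mv_logits = [0.0, 3.0, 0.0]   # MV stream favours class 1
print(predict(late_fusion(rgb_logits, mv_logits)))             # -> 1
print(predict(late_fusion(rgb_logits, mv_logits, w_rgb=0.9)))  # -> 0
```

    Moving `w_rgb` toward the RGB stream flips the decision in this example, which is why such a weight would typically be chosen on validation data.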
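
    The abstract does not say how emulated NVS frames are produced from consecutive RGB frames. One simple scheme, assumed here, converts both frames to log luminance (ITU-R BT.601 luma weights) and emits a positive or negative event wherever the per-pixel change meets a contrast threshold, loosely mirroring how an NVS pixel responds to log-intensity changes. The threshold value is hypothetical, and practical emulators also model event timing and sensor noise, which this sketch omits.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # RGB in [0, 1]
Frame = Sequence[Sequence[Pixel]]   # H x W


def log_luma(frame: Frame, eps: float = 1e-3) -> List[List[float]]:
    """Log luminance per pixel; eps avoids log(0) on black pixels."""
    return [[math.log(0.299 * r + 0.587 * g + 0.114 * b + eps) for (r, g, b) in row]
            for row in frame]


def emulate_events(prev: Frame, curr: Frame, threshold: float = 0.2) -> List[List[int]]:
    """Event frame: +1 (ON) / -1 (OFF) where |delta log I| >= threshold, else 0.

    ASSUMPTION: simple frame-difference emulation; the threshold is hypothetical.
    """
    events = []
    for row_prev, row_curr in zip(log_luma(prev), log_luma(curr)):
        row = []
        for a, b in zip(row_prev, row_curr):
            delta = b - a
            if delta >= threshold:
                row.append(1)
            elif delta <= -threshold:
                row.append(-1)
            else:
                row.append(0)
        events.append(row)
    return events


# Hypothetical 1 x 3 frames: one pixel brightens, one is unchanged, one darkens.
prev = [[(0.2, 0.2, 0.2), (0.5, 0.5, 0.5), (0.8, 0.8, 0.8)]]
curr = [[(0.4, 0.4, 0.4), (0.5, 0.5, 0.5), (0.4, 0.4, 0.4)]]
print(emulate_events(prev, curr))  # -> [[1, 0, -1]]
```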