
    Automatic annotation for weakly supervised learning of detectors

    Object detection in images and action detection in videos are among the most widely studied computer vision problems, with applications in consumer photography, surveillance, and automatic media tagging. Typically, these standard detectors are fully supervised, that is, they require a large body of training data in which the locations of the objects/actions in images/videos have been manually annotated. With the emergence of digital media and the rise of high-speed internet, raw images and video are available at little to no cost. However, the manual annotation of object and action locations remains tedious, slow, and expensive. As a result there has been great interest in training detectors with weak supervision, where only the presence or absence of the object/action in an image/video is needed, not its location. This thesis presents approaches for weakly supervised learning of object/action detectors, with a focus on automatically annotating object and action locations in images/videos using only binary weak labels indicating the presence or absence of the object/action. First, a framework for weakly supervised learning of object detectors in images is presented. In the proposed approach, a variation of the multiple instance learning (MIL) technique for automatically annotating object locations in weakly labelled data is presented which, unlike existing approaches, uses inter-class and intra-class cue fusion to obtain the initial annotation. The initial annotation is then used to start an iterative process in which standard object detectors refine the location annotation. Finally, to ensure that the iterative training of detectors does not drift from the object of interest, a scheme for detecting model drift is also presented. Furthermore, unlike most other methods, our weakly supervised approach is evaluated on data without manual pose (object orientation) annotation.
    Second, an analysis of the initial annotation of objects, using inter-class and intra-class cues, is carried out. From the analysis, a new method based on negative mining (NegMine) is presented for the initial annotation of both object and action data. The NegMine-based approach is a much simpler formulation that uses only an inter-class measure and requires no complex combinatorial optimisation, yet it can still match or outperform existing approaches, including the previously presented inter-intra class cue fusion approach. Furthermore, NegMine can be fused with existing approaches to boost their performance. Finally, the thesis takes a step back and looks at the use of generic object detectors as prior knowledge in weakly supervised learning of object detectors. These generic object detectors are typically based on sampling saliency maps that indicate whether a pixel belongs to the background or foreground. A new approach to generating saliency maps is presented that, unlike existing approaches, looks beyond the current image of interest and into images similar to it. We show that our generic object proposal method can be used by itself to annotate the weakly labelled object data with surprisingly high accuracy.

    An Evaluation of Deep CNN Baselines for Scene-Independent Person Re-Identification

    In recent years, a variety of proposed methods based on deep convolutional neural networks (CNNs) have improved the state of the art for large-scale person re-identification (ReID). While a large number of optimizations and network improvements have been proposed, there has been relatively little evaluation of the influence of training data and baseline network architecture. In particular, it is usually assumed either that networks are trained on labeled data from the deployment location (scene-dependent), or else adapted with unlabeled data, both of which complicate system deployment. In this paper, we investigate the feasibility of achieving scene-independent person ReID by forming a large composite dataset for training. We present an in-depth comparison of several CNN baseline architectures for both scene-dependent and scene-independent ReID, across a range of training dataset sizes. We show that scene-independent ReID can produce leading-edge results, competitive with unsupervised domain adaptation techniques. Finally, we introduce a new dataset for comparing within-camera and across-camera person ReID. Comment: To be published in the 2018 15th Conference on Computer and Robot Vision (CRV)

    Image Down-Scaler Using the Box Filter Algorithm

    One of the indispensable aspects of digital image processing is the requirement for varied image resolutions, and scaling is the means of achieving them. Two important applications of scaling are good pictorial quality for human interpretation and the processing of digital images for storage, transmission, and representation for autonomous machine perception. This paper focuses on the transmission application. If the size of an image is reduced, it occupies less space in the communication medium, reducing the bandwidth requirement; server storage and the processing power needed for the image are also greatly reduced. The standard for digital television transmission over terrestrial, cable, and satellite networks is defined by the Advanced Television Systems Committee (ATSC), with either 704 × 480 or 640 × 480 pixel resolutions, at 24, 30, or 60 progressive frames per second. This paper proposes a monochrome and colored image down-scaling core with memory banks for accessing the image pixels. A 704 × 480 pixel resolution image was used. The core has minimum complexity, was developed in a Hardware Description Language (HDL), and has been benchmarked in various ASIC technologies.
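    Box-filter down-scaling replaces each non-overlapping block of pixels with its mean. As a rough software sketch of the algorithm (not the paper's HDL core, whose memory-bank design is not reproduced here), a NumPy version could look like this; the function name and the crop-to-tile behaviour are illustrative assumptions:

    ```python
    import numpy as np

    def box_downscale(img, factor):
        """Down-scale by replacing each non-overlapping factor x factor
        block of pixels with its mean (the box filter)."""
        h, w = img.shape[:2]
        h, w = h - h % factor, w - w % factor   # crop so blocks tile exactly
        img = img[:h, :w]
        if img.ndim == 2:                       # monochrome image
            blocks = img.reshape(h // factor, factor, w // factor, factor)
            return blocks.mean(axis=(1, 3))
        # colored image: keep the channel axis intact
        blocks = img.reshape(h // factor, factor, w // factor, factor, img.shape[2])
        return blocks.mean(axis=(1, 3))
    ```

    With `factor=2`, the 704 × 480 ATSC frame mentioned above maps to 352 × 240.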

    Efficient Deep Feature Learning and Extraction via StochasticNets

    Deep neural networks are a powerful tool for feature learning and extraction given their ability to model high-level abstractions in highly complex data. One area worth exploring in feature learning and extraction using deep neural networks is efficient neural connectivity formation for faster feature learning and extraction. Motivated by findings of stochastic synaptic connectivity formation in the brain, as well as the brain's uncanny ability to efficiently represent information, we propose the efficient learning and extraction of features via StochasticNets, where sparsely-connected deep neural networks can be formed via stochastic connectivity between neurons. To evaluate the feasibility of such a deep neural network architecture for feature learning and extraction, we train deep convolutional StochasticNets to learn abstract features using the CIFAR-10 dataset, and extract the learned features from images to perform classification on the SVHN and STL-10 datasets. Experimental results show that features learned using deep convolutional StochasticNets, with fewer neural connections than conventional deep convolutional neural networks, can allow for better or comparable classification accuracy than conventional deep neural networks: relative test error decrease of ~4.5% for classification on the STL-10 dataset and ~1% for classification on the SVHN dataset. Furthermore, it was shown that the deep features extracted using deep convolutional StochasticNets can provide comparable classification accuracy even when only 10% of the training data is used for feature learning. Finally, it was also shown that significant gains in feature extraction speed can be achieved in embedded applications using StochasticNets. As such, StochasticNets allow for faster feature learning and extraction while facilitating better or comparable accuracy. Comment: 10 pages. arXiv admin note: substantial text overlap with arXiv:1508.0546
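    The core idea of stochastic connectivity formation can be sketched in a few lines: each candidate synapse between two layers is realised independently with some probability, giving a sparsely-connected layer. The sketch below is a minimal illustration of that principle, not the paper's exact formation process; the probability `p_connect`, the initialisation scaling, and the function names are assumptions:

    ```python
    import numpy as np

    rng = np.random.default_rng(42)

    def stochastic_dense(n_in, n_out, p_connect=0.5):
        """Form a sparsely-connected layer: each candidate synapse is
        realised independently with probability p_connect."""
        mask = (rng.random((n_in, n_out)) < p_connect).astype(float)
        # scale initialisation by the expected fan-in of the sparse layer
        weights = rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / (n_in * p_connect))
        return weights * mask, mask

    def forward(x, sparse_w):
        """ReLU layer using only the realised connections."""
        return np.maximum(x @ sparse_w, 0.0)
    ```

    With `p_connect=0.3`, the layer keeps roughly 30% of the connections of a conventional dense layer, which is the source of the speed gains the abstract reports.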

    UVM Verification of an SPI Master Core

    In today’s world, more and more functionalities in the form of IP cores are integrated into a single chip or SOC. System-level verification of such large SOCs has become complex. The modern trend is to provide pre-designed IP cores with companion Verification IP. These Verification IPs are independent, scalable, and reusable verification components. The SystemVerilog language is based on object-oriented principles and is the most promising language for developing a complete verification environment with functional coverage, constrained random testing, and assertions. The Universal Verification Methodology (UVM), written in SystemVerilog, is a base class library of reusable verification components. This paper discusses a UVM-based environment for testing a Wishbone-compliant SPI master controller core. A multi-layer testbench was developed, consisting of a Wishbone bus functional model, SPI slave model, driver, scoreboard, coverage analysis, and assertions, developed using various features of SystemVerilog and the UVM library. Later, constrained random testing, with vectors driven into the DUT for higher functional coverage, is discussed. The verification results show the effectiveness and feasibility of the proposed verification environment.

    On challenges in training recurrent neural networks

    In a multi-step prediction problem, the prediction at each time step can depend on the input at any of the previous time steps, far in the past. Modelling such long-term dependencies is one of the fundamental problems in machine learning. In theory, Recurrent Neural Networks (RNNs) can model any long-term dependency. In practice, they can only model short-term dependencies, due to the problem of vanishing and exploding gradients. This thesis explores the problem of vanishing gradients in recurrent neural networks and proposes novel solutions for it. Chapter 3 explores the idea of using external memory to store the hidden states of a Long Short-Term Memory (LSTM) network. By making the read and write operations of the external memory discrete, the proposed architecture reduces the rate at which gradients vanish in an LSTM. These discrete operations also enable the network to create dynamic skip connections across time. Chapter 4 attempts to characterize all the sources of vanishing gradients in a recurrent neural network and proposes a new recurrent architecture which has significantly better gradient flow than state-of-the-art recurrent architectures. The proposed Non-saturating Recurrent Units (NRUs) have no saturating activation functions and use additive cell updates instead of multiplicative cell updates. Chapter 5 discusses the challenges of using recurrent neural networks in the context of lifelong learning, where the network is expected to learn a series of tasks over its lifetime. The dependencies in lifelong learning are not just within a task, but also across tasks. 
This chapter discusses the two fundamental problems in lifelong learning: (i) catastrophic forgetting of old tasks, and (ii) network capacity saturation. Further, it proposes a solution to both of these problems while training a recurrent neural network.
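    The vanishing-gradient phenomenon the thesis addresses can be shown numerically: in a tanh RNN, the gradient through time is a product of per-step Jacobians diag(1 − h²)·W, so its norm shrinks roughly exponentially when W is contractive. A minimal illustration follows; the hidden size, sequence length, and weight scaling are arbitrary assumptions chosen to make the decay visible, not values from the thesis:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    n, T = 32, 50
    # Recurrent weight matrix scaled to be contractive (largest singular
    # value below 1), the regime in which gradients vanish.
    W = rng.standard_normal((n, n)) * 0.25 / np.sqrt(n)

    h = np.tanh(W @ rng.standard_normal(n))
    grad = np.eye(n)                       # accumulated Jacobian d h_t / d h_0
    norms = []
    for _ in range(T):
        h = np.tanh(W @ h)
        J = np.diag(1.0 - h ** 2) @ W      # one-step Jacobian d h_t / d h_{t-1}
        grad = J @ grad
        norms.append(np.linalg.norm(grad))

    # norms decays roughly exponentially with the number of time steps,
    # so an input 50 steps back has almost no influence on the loss gradient.
    ```

    Architectures like the LSTM variant of Chapter 3 and the NRUs of Chapter 4 are designed to slow or avoid exactly this decay.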
