
    Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

    Full text link
    Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years). However, humans are often reminded of past memories or mental states which are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. Based on this principle, we study a novel algorithm which only back-propagates through a few of these temporal skip connections, realized by a learned attention mechanism that associates current states with relevant past states. We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention.
    Comment: To appear as a Spotlight presentation at NIPS 2018
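    The core mechanism described above, sparse temporal skip connections selected by a learned attention over past hidden states, can be sketched in a few lines. The following is a minimal illustration of the idea, not the authors' code: past states are detached from the graph by default, and only the top-k attended states keep their gradient paths, so backpropagation traverses a handful of skip connections rather than the whole history. The module names, sizes, and the GRU cell are assumptions for illustration.

```python
# Minimal sketch of sparse attentive backtracking (illustrative, not official).
import torch
import torch.nn as nn

class SparseBacktrackRNN(nn.Module):
    def __init__(self, input_size, hidden_size, topk=3):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.query = nn.Linear(hidden_size, hidden_size)
        self.topk = topk

    def forward(self, x):  # x: (seq_len, batch, input_size)
        h = x.new_zeros(x.size(1), self.cell.hidden_size)
        memory, outputs = [], []
        for t in range(x.size(0)):
            h = self.cell(x[t], h)
            if memory:
                keys = torch.stack(memory)               # (t, batch, hidden)
                scores = (self.query(h) * keys).sum(-1)  # (t, batch)
                k = min(self.topk, len(memory))
                top_idx = scores.topk(k, dim=0).indices
                # Detach every past state except the top-k attended ones, so
                # gradients flow back only through the sparse skip connections.
                mask = torch.zeros_like(scores, dtype=torch.bool)
                mask.scatter_(0, top_idx, True)
                keys = torch.where(mask.unsqueeze(-1), keys, keys.detach())
                weights = torch.softmax(scores, dim=0).unsqueeze(-1)
                h = h + (weights * keys).sum(0)          # summary of "reminded" states
            memory.append(h)
            outputs.append(h)
        return torch.stack(outputs)
```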

    Visual object tracking

    Full text link
    University of Technology Sydney, Faculty of Engineering and Information Technology.
    Visual object tracking is a critical task in many computer-vision-related applications, such as surveillance and robotics. Given the tracking target in the first frame of a video, the tracker predicts the location and the shape of the target in the following frames. Despite the significant research effort that has been dedicated to this area for several years, the field remains challenging due to a number of issues, such as occlusion, shape variation and drifting, all of which adversely affect the performance of a tracking algorithm. This research focuses on incorporating spatial and temporal context to tackle these issues and develop robust trackers. The spatial context is what surrounds a given object; the temporal context is what has been observed in the recent past at the same location. In particular, by considering the relationship between the target and its surroundings, spatial context information helps the tracker better distinguish the target from the background, especially under scale change, shape variation, occlusion, and background clutter. Meanwhile, temporal contextual cues are beneficial for building a stable appearance representation of the target, which makes the tracker robust against occlusion and drifting. In this regard, we develop effective methods that take advantage of the spatial and temporal context to improve tracking algorithms.
    Our proposed methods benefit three mainstream tracking frameworks: the template-based generative tracking framework, the pixel-wise tracking framework, and the tracking-by-detection framework. For the template-based generative framework, a novel template-based tracker is proposed that enhances the existing appearance model of the target by introducing mask templates; the mask templates store the temporal context, represented by frame differences at various time scales, while the other templates encode the spatial context (a sketch of such mask templates follows below). Next, using pixel-wise analytic tools, which provide richer detail and naturally accommodate tracking tasks, a finer and more accurate tracker is proposed; it uses two convolutional neural networks to capture both the spatial and the temporal context. Lastly, for a visual tracker with a tracking-by-detection strategy, we propose an effective and efficient module that improves the quality of the candidate windows sampled to identify the target; by utilizing the context around the object, the module refines the location and dimensions of each candidate window, helping the tracker better focus on the target object.
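    As a rough illustration of the mask-template idea, one can compute frame differences at several time scales and threshold them into binary motion masks; everything below (scales, thresholds, data) is an assumption of mine, not the thesis implementation.

```python
# Minimal sketch of multi-scale frame-difference "mask templates" (illustrative).
import numpy as np

def mask_templates(frames, scales=(1, 4, 16)):
    """frames: (T, H, W) grayscale stack; returns one binary mask per time scale."""
    t = len(frames) - 1  # index of the current frame
    masks = []
    for s in scales:
        past = frames[max(t - s, 0)].astype(np.float32)
        diff = np.abs(frames[t].astype(np.float32) - past)
        # Keep pixels whose change is well above the frame's typical variation.
        masks.append((diff > diff.mean() + 2 * diff.std()).astype(np.float32))
    return masks

frames = np.random.rand(32, 64, 64).astype(np.float32)  # stand-in video clip
templates = mask_templates(frames)                       # short/medium/long-term motion
```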

    3-D distributed memory polynomial behavioral model for concurrent dual-band envelope tracking power amplifier linearization

    Get PDF
    This paper presents a new 3-D behavioral model to compensate for the nonlinear distortion arising in concurrent dual-band (DB) envelope tracking (ET) power amplifiers (PAs). The advantage of the proposed 3-D distributed memory polynomial (3D-DMP) behavioral model, in comparison to previously published behavioral models used for concurrent dual-band envelope tracking PA linearization, is that it requires a smaller number of coefficients to achieve the same linearity performance, which reduces the overall identification and adaptation computational complexity. The proposed 3D-DMP digital predistorter (DPD) is tested under different ET supply modulation techniques. Moreover, further model-order reduction of the 3D-DMP DPD is achieved by applying the principal component analysis (PCA) technique. Experimental results are shown considering a concurrent DB transmission of a WCDMA signal at 1.75 GHz and a 10-MHz-bandwidth LTE signal at 2.1 GHz. The performance of the proposed 3D-DMP DPD is evaluated in terms of linearity, drain power efficiency, and computational complexity.
    Peer reviewed. Postprint (author's final draft).
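    For readers unfamiliar with memory polynomial models, the sketch below shows a generic single-band memory polynomial and its least-squares coefficient identification; the paper's 3D-DMP extends this idea to two bands plus the envelope supply, so the orders, signal lengths, and the toy "PA" here are purely illustrative assumptions.

```python
# Minimal sketch of a single-band memory polynomial model (illustrative only).
import numpy as np

def mp_basis(x, K=5, M=3):
    """Memory polynomial regressors: x[n-m] * |x[n-m]|^(k-1), k=1..K, m=0..M-1."""
    N = len(x)
    cols = []
    for m in range(M):
        xm = np.concatenate([np.zeros(m, dtype=complex), x[:N - m]])
        for k in range(1, K + 1):
            cols.append(xm * np.abs(xm) ** (k - 1))
    return np.column_stack(cols)  # (N, K*M) regression matrix

# Identify coefficients from input/output records of the amplifier.
x = (np.random.randn(4096) + 1j * np.random.randn(4096)) / np.sqrt(2)
y = x + 0.05 * x * np.abs(x) ** 2          # toy mildly nonlinear "PA" response
Phi = mp_basis(x)
coeffs, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_model = Phi @ coeffs                      # behavioral-model output
```

    The number of coefficients (K*M here) is exactly what the 3D-DMP structure is designed to keep small once a third dimension (the supply envelope) enters the model.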

    Attentive monitoring of multiple video streams driven by a Bayesian foraging strategy

    Full text link
    In this paper we shall consider the problem of deploying attention to subsets of the video streams for collating the most relevant data and information of interest related to a given task. We formalize this monitoring problem as a foraging problem. We propose a probabilistic framework to model the observer's attentive behavior as the behavior of a forager. The forager, moment to moment, focuses its attention on the most informative stream/camera, detects interesting objects or activities, or switches to a more profitable stream. The approach proposed here is suitable for multi-stream video summarization. Meanwhile, it can serve as a preliminary step for more sophisticated video surveillance, e.g., activity and behavior analysis. Experimental results achieved on the UCR Videoweb Activities Dataset, a publicly available dataset, are presented to illustrate the utility of the proposed technique.
    Comment: Accepted to IEEE Transactions on Image Processing
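    The foraging analogy can be made concrete with a toy loop, a minimal sketch and not the paper's model: the forager stays on its current stream while its instantaneous "information intake" beats the running long-run average, and otherwise switches to the stream with the highest expected gain. All rates and update constants below are invented for illustration.

```python
# Toy forager-style stream selection (illustrative, not the paper's framework).
import numpy as np

rng = np.random.default_rng(0)
rates = np.array([0.2, 0.8, 0.5])        # true per-stream event rates (hidden)
counts = np.ones(3); views = np.ones(3)  # Bayesian-style sufficient statistics
current, avg_gain = 0, 0.5

for step in range(200):
    gain = float(rng.random() < rates[current])   # observed something useful?
    counts[current] += gain; views[current] += 1
    avg_gain = 0.99 * avg_gain + 0.01 * gain      # running long-run intake
    posterior_mean = counts / views               # expected gain per stream
    if posterior_mean[current] < avg_gain:        # current "patch" unprofitable
        current = int(np.argmax(posterior_mean))  # switch to the best stream

print("forager settled on stream", current)
```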

    Optical Gaze Tracking with Spatially-Sparse Single-Pixel Detectors

    Get PDF
    Gaze tracking is an essential component of next-generation displays for virtual reality and augmented reality applications. Traditional camera-based gaze trackers used in next-generation displays are known to be lacking in one or more of the following metrics: power consumption, cost, computational complexity, estimation accuracy, latency, and form factor. We propose the use of discrete photodiodes and light-emitting diodes (LEDs) as an alternative to traditional camera-based gaze tracking approaches while taking all of these metrics into consideration. We begin by developing a rendering-based simulation framework for understanding the relationship between light sources and a virtual model eyeball. Findings from this framework are used for the placement of LEDs and photodiodes. Our first prototype uses a neural network to obtain an average error rate of 2.67° at 400 Hz while demanding only 16 mW. By simplifying the implementation to use only LEDs, duplexed as light transceivers, and a more minimal machine-learning model, namely a lightweight supervised Gaussian process regression algorithm, we show that our second prototype is capable of an average error rate of 1.57° at 250 Hz using 800 mW.
    Comment: 10 pages, 8 figures, published in IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2020
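    The regression stage of the second prototype, as described above, maps a small vector of sensor readings to gaze angles with a Gaussian process. A minimal sketch follows; the sensor count, kernel, noise level, and the random stand-in data are my assumptions, not the paper's configuration.

```python
# Minimal sketch: Gaussian process regression from sensor readings to gaze.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

n_sensors = 8
X_train = np.random.rand(500, n_sensors)     # stand-in LED/photodiode intensities
y_train = np.random.rand(500, 2) * 30 - 15   # stand-in gaze angles in degrees

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
gpr.fit(X_train, y_train)

x_now = np.random.rand(1, n_sensors)                 # one new sensor reading
gaze, std = gpr.predict(x_now, return_std=True)      # predicted gaze + uncertainty
```

    A GP is attractive here because it is cheap to evaluate on a handful of input dimensions and returns an uncertainty estimate alongside each gaze prediction.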

    2D Fast Vessel Visualization Using a Vessel Wall Mask Guiding Fine Vessel Detection

    Get PDF
    The paper addresses the fine retinal-vessel detection issue that is faced in diagnostic applications and aims at assisting in better recognizing fine vessel anomalies in 2D. Our innovation lies in separating the key visual features vessels exhibit in order to make the diagnosis of eventual retinopathologies easier. This allows focusing on vessel segments which present fine changes detectable at different sampling scales. We advocate that these changes can be addressed as subsequent stages of the same vessel detection procedure. We first carry out an initial estimate of the basic vessel-wall network, define the main wall body, and then try to approach the ridges and branches of the vasculature using fine detection. Fine vessel screening looks into local structural inconsistencies in vessel properties, into noise, or into unexpected intensity variations observed inside pre-known vessel-body areas. The vessels are first modelled sufficiently, though not precisely, by their walls with a tubular model structure that is the result of an initial segmentation. This provides a chart of likely Vessel Wall Pixels (VWPs), yielding a form of a likelihood vessel map based mainly on gradient-filter intensity and spatial-arrangement parameters (e.g., linear consistency). Specific vessel parameters (centerline, width, location, fall-away rate, main orientation) are post-computed by convolving the image with a set of pre-tuned spatial filters called Matched Filters (MFs). These are easily computed as Gaussian-like 2D forms that use a limited range of sub-optimal parameters adjusted to the dominant vessel characteristics obtained by Spatial Grey Level Difference statistics, limiting the search to vessel widths of 16, 32, and 64 pixels. Sparse pixels are effectively eliminated by applying a limited-range Hough Transform (HT) or region growing. The major benefits are a limited parameter range, a post-convolution search space reduced to the masked regions alone (almost 2% of the 2D volume), and a good speed versus accuracy trade-off. Results show the potential of our approach in terms of detection time, ROC analysis, and accuracy of vessel pixel (VP) detection.
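    The matched-filter stage described above can be sketched as a bank of Gaussian-profile line kernels applied at several orientations, keeping the strongest response per pixel. The kernel size, sigma, angle step, and test image below are assumptions for illustration, not the paper's tuned parameters.

```python
# Minimal sketch of an oriented Gaussian matched-filter bank (illustrative).
import numpy as np
from scipy.ndimage import rotate, convolve

def gaussian_line_kernel(length=15, sigma=2.0):
    """Vertical dark-line matched filter: Gaussian cross-section, zero mean."""
    x = np.arange(length) - length // 2
    profile = -np.exp(-x**2 / (2 * sigma**2))  # vessels are darker than background
    kernel = np.tile(profile, (length, 1))     # extend along the line direction
    return kernel - kernel.mean()              # zero mean: flat regions respond 0

def matched_filter_response(image, angles=range(0, 180, 15)):
    base = gaussian_line_kernel()
    responses = [convolve(image, rotate(base, a, reshape=False)) for a in angles]
    return np.max(responses, axis=0)           # strongest orientation per pixel

img = np.random.rand(128, 128).astype(np.float32)  # stand-in retinal patch
vesselness = matched_filter_response(img)
```

    Restricting this convolution to the VWP-masked regions, as the paper proposes, is what keeps the post-convolution search space near 2% of the image.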