2,508 research outputs found
Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding
Learning long-term dependencies in extended temporal sequences requires
credit assignment to events far back in the past. The most common method for
training recurrent neural networks, back-propagation through time (BPTT),
requires credit information to be propagated backwards through every single
step of the forward computation, potentially over thousands or millions of time
steps. This becomes computationally expensive or even infeasible when used with
long sequences. Importantly, biological brains are unlikely to perform such
detailed reverse replay over very long sequences of internal states (consider
days, months, or years). However, humans are often reminded of past memories or
mental states which are associated with the current mental state. We consider
the hypothesis that such memory associations between past and present could be
used for credit assignment through arbitrarily long sequences, propagating the
credit assigned to the current state to the associated past state. Based on
this principle, we study a novel algorithm which only back-propagates through a
few of these temporal skip connections, realized by a learned attention
mechanism that associates current states with relevant past states. We
demonstrate in experiments that our method matches or outperforms regular BPTT
and truncated BPTT in tasks involving particularly long-term dependencies, but
without requiring the biologically implausible backward replay through the
whole history of states. Additionally, we demonstrate that the proposed method
transfers to longer sequences significantly better than LSTMs trained with BPTT
and LSTMs trained with full self-attention. Comment: To appear as a Spotlight presentation at NIPS 201
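The sparse attention step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dot-product scoring, the function names `sparse_attention` and `summarize`, and the parameter `k` are assumptions made for illustration (the abstract only states that the attention mechanism is learned and that only a few past states are selected).

```python
import math

def sparse_attention(query, past_states, k):
    """Pick the k past states most similar to the current state.

    Scoring is a plain dot product (an assumption; the paper learns its
    attention). Returns the chosen time indices and their softmax weights.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in past_states]
    # Keep the k highest-scoring time steps; every other step gets no weight,
    # so gradients would flow only through these few skip connections.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    top.sort()  # report the selected steps in temporal order
    # Numerically stable softmax over the selected scores only.
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

def summarize(past_states, indices, weights):
    """Weighted sum of the selected past states (the 'reminded' summary)."""
    dim = len(past_states[0])
    return [sum(w * past_states[i][d] for i, w in zip(indices, weights))
            for d in range(dim)]

if __name__ == "__main__":
    history = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.5]]
    idx, w = sparse_attention([1.0, 0.0], history, k=2)
    print(idx, w, summarize(history, idx, w))
```

Restricting the softmax to the selected steps is one simple way to make the attention sparse; other sparsification schemes would fit the same description.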
Visual object tracking
University of Technology Sydney. Faculty of Engineering and Information Technology. Visual object tracking is a critical task in many computer-vision-related applications, such as surveillance and robotics. If the tracking target is provided in the first frame of a video, the tracker will predict the location and the shape of the target in the following frames. Despite the significant research effort that has been dedicated to this area for several years, this field remains challenging due to a number of issues, such as occlusion, shape variation and drifting, all of which adversely affect the performance of a tracking algorithm.
This research focuses on incorporating the spatial and temporal context to tackle the challenging issues related to developing robust trackers. The spatial context is what surrounds a given object and the temporal context is what has been observed in the recent past at the same location. In particular, by considering the relationship between the target and its surroundings, the spatial context information helps the tracker to better distinguish the target from the background, especially when it suffers from scale change, shape variation, occlusion, and background clutter. Meanwhile, the temporal contextual cues are beneficial for building a stable appearance representation for the target, which enables the tracker to be robust against occlusion and drifting.
In this regard, we attempt to develop effective methods that take advantage of the spatial and temporal context to improve the tracking algorithms. Our proposed methods can benefit three kinds of mainstream tracking frameworks, namely the template-based generative tracking framework, the pixel-wise tracking framework and the tracking-by-detection framework. For the template-based generative tracking framework, a novel template-based tracker is proposed that enhances the existing appearance model of the target by introducing mask templates. In particular, mask templates store the temporal context represented by the frame difference at various time scales, and other templates encode the spatial context. Next, for the pixel-wise tracking framework, a finer and more accurate tracker is proposed that uses pixel-wise analysis, which provides richer detail and suits tracking tasks naturally. It makes use of two convolutional neural networks to capture both the spatial and temporal context. Lastly, for a visual tracker with a tracking-by-detection strategy, we propose an effective and efficient module that can improve the quality of the candidate windows sampled to identify the target. By utilizing the context around the object, our proposed module is able to refine the location and dimension of each candidate window, thus helping the tracker better focus on the target object.
3-D distributed memory polynomial behavioral model for concurrent dual-band envelope tracking power amplifier linearization
© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This paper presents a new 3-D behavioral model to compensate for the nonlinear distortion arising in concurrent dual-band (DB) Envelope Tracking (ET) Power Amplifiers (PAs). The advantage of the proposed 3-D distributed memory polynomial (3D-DMP) behavioral model, in comparison to the already published behavioral models used for concurrent dual-band envelope tracking PA linearization, is that it requires a smaller number of coefficients to achieve the same linearity performance, which reduces the overall identification and adaptation computational complexity. The proposed 3D-DMP digital predistorter (DPD) is tested under different ET supply modulation techniques. Moreover, further model order reduction of the 3D-DMP DPD is achieved by applying the principal component analysis (PCA) technique. Experimental results are shown considering a concurrent DB transmission of a WCDMA signal at 1.75 GHz and a 10-MHz bandwidth LTE signal at 2.1 GHz. The performance of the proposed 3D-DMP DPD is evaluated in terms of linearity, drain power efficiency, and computational complexity. Peer Reviewed. Postprint (author's final draft)
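For readers unfamiliar with memory polynomial behavioral models, the following is a minimal sketch of a generic single-band memory polynomial, y[n] = sum over m and p of a[m][p] * x[n-m] * |x[n-m]|^p. It is not the 3D-DMP model from the paper, whose exact structure the abstract does not give; the function name `memory_polynomial` and the coefficient layout are assumptions chosen for illustration.

```python
def memory_polynomial(x, coeffs):
    """Evaluate a generic single-band memory polynomial.

    coeffs[m][p] multiplies x[n - m] * |x[n - m]| ** p, so the model has
    len(coeffs) memory taps and len(coeffs[0]) nonlinear orders per tap.
    Samples before the start of the signal are treated as zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0j
        for m, row in enumerate(coeffs):
            if n - m < 0:
                continue  # no sample available yet for this delay
            s = x[n - m]
            for p, a in enumerate(row):
                acc += a * s * abs(s) ** p
        y.append(acc)
    return y

if __name__ == "__main__":
    signal = [1 + 0j, 2 + 0j, 0.5j]
    taps = [[1.0, 0.1], [0.5, 0.0]]  # 2 delays x 2 orders = 4 coefficients
    print(memory_polynomial(signal, taps))
```

The coefficient count here is the number of delays times the number of orders. Reducing that count for a given linearity target is the kind of saving the abstract attributes to the 3D-DMP structure.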
Attentive monitoring of multiple video streams driven by a Bayesian foraging strategy
In this paper we shall consider the problem of deploying attention to subsets
of the video streams for collating the most relevant data and information of
interest related to a given task. We formalize this monitoring problem as a
foraging problem. We propose a probabilistic framework to model observer's
attentive behavior as the behavior of a forager. The forager, moment to moment,
focuses its attention on the most informative stream/camera, detects
interesting objects or activities, or switches to a more profitable stream. The
approach proposed here is suitable to be exploited for multi-stream video
summarization. Meanwhile, it can serve as a preliminary step for more
sophisticated video surveillance, e.g. activity and behavior analysis.
Experimental results achieved on the UCR Videoweb Activities Dataset, a
publicly available dataset, are presented to illustrate the utility of the
proposed technique. Comment: Accepted to IEEE Transactions on Image Processing
Optical Gaze Tracking with Spatially-Sparse Single-Pixel Detectors
Gaze tracking is an essential component of next generation displays for
virtual reality and augmented reality applications. Traditional camera-based
gaze trackers used in next generation displays are known to fall short on one
or more of the following metrics: power consumption, cost, computational
complexity, estimation accuracy, latency, and form-factor. We propose the use
of discrete photodiodes and light-emitting diodes (LEDs) as an alternative to
traditional camera-based gaze tracking approaches while taking all of these
metrics into consideration. We begin by developing a rendering-based simulation
framework for understanding the relationship between light sources and a
virtual model eyeball. Findings from this framework are used for the placement
of LEDs and photodiodes. Our first prototype uses a neural network to obtain an
average error rate of 2.67° at 400 Hz while demanding only 16 mW. By
simplifying the implementation to use only LEDs, duplexed as light
transceivers, and a more minimal machine learning model, namely a lightweight
supervised Gaussian process regression algorithm, we show that our second
prototype is capable of an average error rate of 1.57° at 250 Hz using 800
mW. Comment: 10 pages, 8 figures, published in IEEE International Symposium on
Mixed and Augmented Reality (ISMAR) 202
2D Fast Vessel Visualization Using a Vessel Wall Mask Guiding Fine Vessel Detection
The paper addresses the problem of detecting fine retinal vessels, which arises in diagnostic applications, and aims to assist in better recognizing fine vessel anomalies in 2D. Our innovation lies in separating the key visual features that vessels exhibit, so that possible retinopathologies become easier to diagnose. This allows focusing on vessel segments which present fine changes detectable at different sampling scales. We advocate that these changes can be addressed as subsequent stages of the same vessel detection procedure. We first carry out an initial estimate of the basic vessel-wall network, define the main wall-body, and then try to approach the ridges and branches of the vasculature using fine detection. Fine vessel screening looks into local structural inconsistencies in vessel properties, into noise, or into unexpected intensity variations observed inside previously identified vessel-body areas. The vessels are first modelled sufficiently, but not precisely, by their walls with a tubular model structure that results from an initial segmentation. This provides a chart of likely Vessel Wall Pixels (VWPs), yielding a form of likelihood vessel map based mainly on the gradient filter's intensity and spatial arrangement parameters (e.g., linear consistency). Specific vessel parameters (centerline, width, location, fall-away rate, main orientation) are post-computed by convolving the image with a set of pre-tuned spatial filters called Matched Filters (MFs). These are easily computed as Gaussian-like 2D forms that use a limited range of sub-optimal parameters adjusted to the dominant vessel characteristics obtained from Spatial Grey Level Difference statistics, limiting the search to vessel widths of 16, 32, and 64 pixels. Sparse pixels are effectively eliminated by applying a limited-range Hough Transform (HT) or region growing. The major benefits are a limited parameter range, a post-convolution search space reduced to the masked regions only, which represent almost 2% of the 2D volume, and a good speed versus accuracy trade-off. Results show the potential of our approach in terms of detection time, ROC analysis, and accuracy of vessel pixel (VP) detection.