Rhythmic Representations: Learning Periodic Patterns for Scalable Place Recognition at a Sub-Linear Storage Cost
Robotic and animal mapping systems share many challenges and characteristics:
they must function in a wide variety of environmental conditions, enable the
robot or animal to navigate effectively to find food or shelter, and be
computationally tractable from both a speed and storage perspective. With
regard to map storage, the mammalian brain appears to take a diametrically
opposed approach to all current robotic mapping systems. Where robotic mapping
systems attempt to solve the data association problem to minimise
representational aliasing, neurons in the brain intentionally break data
association by encoding large (potentially unlimited) numbers of places with a
single neuron. In this paper, we propose a novel method based on supervised
learning techniques that seeks out regularly repeating visual patterns in the
environment with mutually complementary co-prime frequencies, and an encoding
scheme that enables storage requirements to grow sub-linearly with the size of
the environment being mapped. To improve robustness in challenging real-world
environments while maintaining storage growth sub-linearity, we incorporate
both multi-exemplar learning and data augmentation techniques. Using large
benchmark robotic mapping datasets, we demonstrate the combined system
achieving high-performance place recognition with sub-linear storage
requirements, and characterize the performance-storage growth trade-off curve.
The work serves as the first robotic mapping system with sub-linear storage
scaling properties, as well as the first large-scale demonstration in
real-world environments of one of the proposed memory benefits of these
neurons.
Comment: Pre-print of article that will appear in the IEEE Robotics and
Automation Letters
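The storage argument behind co-prime frequencies can be illustrated with a small sketch. It is not the authors' learned method: it assumes an idealised case in which each repeating pattern fires exactly once per period, so a place index is identified by its remainders modulo a set of pairwise co-prime periods. The function names (`pairwise_coprime`, `encode`, `decode`) and the periods 5, 7 and 9 are hypothetical choices for illustration only.

```python
from itertools import combinations
from math import gcd, prod


def pairwise_coprime(periods):
    """Return True if no two periods share a common factor above 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(periods, 2))


def encode(place_index, periods):
    """Represent a place by its phase within each periodic pattern."""
    return tuple(place_index % p for p in periods)


def decode(code, periods):
    """Recover the place index by brute-force search (illustration only)."""
    for candidate in range(prod(periods)):
        if encode(candidate, periods) == code:
            return candidate
    raise ValueError("code does not correspond to any place")


# Hypothetical periods, chosen only because they are pairwise co-prime.
periods = (5, 7, 9)
capacity = prod(periods)   # number of distinct places that can be told apart
storage = sum(periods)     # number of phase slots that must be stored

print(f"{storage} stored slots distinguish {capacity} places")
```

Under these assumptions the number of distinguishable places grows with the product of the periods while the stored slots grow only with their sum, which is the sense in which storage can grow sub-linearly with environment size.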
Linguistically-driven framework for computationally efficient and scalable sign recognition
We introduce a new general framework for sign recognition from monocular video using limited quantities of annotated data. The novelty of the hybrid framework we describe here is that we exploit state-of-the-art learning methods while also incorporating features based on what we know about the linguistic composition of lexical signs. In particular, we analyze hand shape, orientation, location, and motion trajectories, and then use CRFs to combine this linguistically significant information for purposes of sign recognition. Our robust modeling and recognition of these sub-components of sign production allow an efficient parameterization of the sign recognition problem as compared with purely data-driven methods. This parameterization enables a scalable and extendable time-series learning approach that advances the state of the art in sign recognition, as shown by the results reported here for recognition of isolated, citation-form, lexical signs from American Sign Language (ASL).
On the impact of the GOP size in a temporal H.264/AVC-to-SVC transcoder in baseline and main profile
Scalable video coding is a recent extension of the advanced video coding H.264/AVC standard developed jointly by ISO/IEC and ITU-T, which allows the bitstream to be adapted easily by dropping parts of it, called layers. This adaptation makes it possible for a single bitstream to meet the requirements for reliable delivery of video to diverse clients over heterogeneous networks using temporal, spatial or quality scalability, combined or separately. Since the scalable video coding design requires scalability to be provided at the encoder side, existing content cannot benefit from it. Efficient techniques for converting content without scalability to a scalable format are therefore desirable. In this paper, an approach for temporal scalability transcoding from H.264/AVC to scalable video coding in baseline and main profile is presented and the impact of the GOP size is analyzed. Independently of the GOP size chosen, time savings of around 63% for baseline profile and 60% for main profile are achieved while maintaining the coding efficiency.
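How dropping temporal layers adapts a bitstream can be sketched as follows. This is not the transcoder described in the abstract: it assumes a dyadic temporal hierarchy with a power-of-two GOP size, which is a common arrangement but is not stated in the source, and the helper names `temporal_layer` and `drop_layers` are hypothetical.

```python
def temporal_layer(frame_index, gop_size):
    """Temporal layer of a frame, assuming a dyadic hierarchy.

    Layer 0 holds the frames at GOP boundaries; each higher layer
    doubles the frame rate.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    levels = gop_size.bit_length() - 1          # log2(gop_size)
    position = frame_index % gop_size
    if position == 0:
        return 0
    # Count trailing zero bits of the position inside the GOP.
    trailing_zeros = (position & -position).bit_length() - 1
    return levels - trailing_zeros


def drop_layers(frame_indices, gop_size, max_layer):
    """Keep only the frames whose temporal layer is at most max_layer."""
    return [i for i in frame_indices
            if temporal_layer(i, gop_size) <= max_layer]


# With a GOP of 8, keeping layers 0 and 1 leaves every fourth frame.
print(drop_layers(range(9), gop_size=8, max_layer=1))
```

Under this assumption, a larger GOP yields more temporal layers and therefore more frame rates that a receiver can select from a single bitstream.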