
Present and Future of SLAM in Extreme Underground Environments

This paper reports on the state of the art in underground SLAM by discussing different SLAM strategies and results across six teams that participated in the three-year-long SubT competition. In particular, the paper has four main goals. First, we review the algorithms, architectures, and systems adopted by the teams; particular emphasis is put on lidar-centric SLAM solutions (the go-to approach for virtually all teams in the competition), heterogeneous multi-robot operation (including both aerial and ground robots), and real-world underground operation (from the presence of obscurants to the need to handle tight computational constraints). We do not shy away from discussing the dirty details behind the different SubT SLAM systems, which are often omitted from technical papers. Second, we discuss the maturity of the field by highlighting what is possible with the current SLAM systems and what we believe is within reach with some good systems engineering. Third, we outline what we believe are fundamental open problems that are likely to require further research to break through. Finally, we provide a list of open-source SLAM implementations and datasets that have been produced during the SubT challenge and related efforts, and constitute a useful resource for researchers and practitioners. Comment: 21 pages including references. This survey paper is submitted to IEEE Transactions on Robotics for pre-approva

arXiv.org e-Print Archive

Content-prioritised video coding for British Sign Language communication.

Author: Muir Laura Joy
Publication date: 31/10/2007

Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to achieve the conflicting requirements for high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. 
The research community benefits from a new approach to video coding optimisation and a better understanding of the communication needs of deaf people.

Open Access Institutional Repository at Robert Gordon University


Automatic sound synthesizer programming: techniques and applications

Author: Yee-King Matthew John
Publication date: 26/09/2011

The aim of this thesis is to investigate techniques for, and applications of, automatic sound synthesizer programming. An automatic sound synthesizer programmer is a system which removes the requirement for the user to explicitly specify parameter settings for a sound synthesis algorithm. Two forms of these systems are discussed in this thesis: tone matching programmers and synthesis space explorers. A tone matching programmer takes as its input a sound synthesis algorithm and a desired target sound. At its output it produces a configuration for the sound synthesis algorithm which causes it to emit a sound similar to the target. The techniques for achieving this that are investigated are genetic algorithms, neural networks, hill climbers and data-driven approaches. A synthesis space explorer provides a user with a representation of the space of possible sounds that a synthesizer can produce and allows them to interactively explore this space. The applications of automatic sound synthesizer programming that are investigated include studio tools, an autonomous musical agent and a self-reprogramming drum machine. The research employs several methodologies: the development of novel software frameworks and tools, the examination of existing software at the source code and performance levels, and user trials of the tools and software. The main contributions made are: a method for visualisation of sound synthesis space and low dimensional control of sound synthesizers; a general purpose framework for the deployment and testing of sound synthesis and optimisation algorithms in the SuperCollider language sclang; a comparison of a variety of optimisation techniques for sound synthesizer programming; an analysis of sound synthesizer error surfaces; a general purpose sound synthesizer programmer compatible with industry standard tools; an automatic improviser which passes a loose equivalent of the Turing test for Jazz musicians, i.e. being half of a man-machine duet which was rated as one of the best sessions of 2009 on the BBC's 'Jazz on 3' programme.
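As a rough illustration of the tone matching idea described in this abstract, the sketch below runs a simple hill climber against a toy two-parameter additive synthesizer. It is not taken from the thesis: the synthesizer, the squared-error measure, and all names (`synth`, `error`, `hill_climb`) are assumptions made for this example, and the thesis itself works with SuperCollider rather than Python.

```python
import math
import random

def synth(params, n=64):
    """Toy additive synthesizer (hypothetical): two sine partials
    whose amplitudes are params[0] and params[1]."""
    a1, a2 = params
    return [a1 * math.sin(2 * math.pi * k / n) + a2 * math.sin(4 * math.pi * k / n)
            for k in range(n)]

def error(params, target):
    """Sum of squared sample differences between the synth output and the target."""
    return sum((s - t) ** 2 for s, t in zip(synth(params), target))

def hill_climb(target, steps=2000, step_size=0.05, seed=0):
    """Keep a single candidate and accept a random perturbation only if it lowers the error."""
    rng = random.Random(seed)
    best = [rng.random(), rng.random()]
    best_err = error(best, target)
    for _ in range(steps):
        # Perturb each parameter slightly and clamp it to the valid range [0, 1].
        cand = [min(1.0, max(0.0, p + rng.uniform(-step_size, step_size))) for p in best]
        cand_err = error(cand, target)
        if cand_err < best_err:
            best, best_err = cand, cand_err
    return best, best_err

# The "target sound" is produced by the same synth with known settings,
# so we can check how close the recovered parameters are.
target_params = [0.8, 0.3]
target = synth(target_params)
found, found_err = hill_climb(target)
```

The error surface of this toy synth is a smooth bowl, so a hill climber is enough here; real synthesizer error surfaces can be far less well behaved, which is one reason the abstract lists several alternative optimisation techniques.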

Sussex Research Online

Extraction and representation of semantic information in digital media

Author: Martens Gaëtan
Publication venue: Ghent University. Faculty of Engineering
Publication date: 01/01/2011

Ghent University Academic Bibliography

Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

Author: Fernandez-Chaves David
Gonzalez-Jimenez Javier
Matez-Bandera Jose Luis
Monroy Javier
Petkov Nicolai
Ruiz-Sarmiento Jose Raul
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, to propose regions of interest in which to find objects, and recursive Bayesian filtering, to integrate observations over time. The proposal is evaluated on six virtual, indoor environments, accounting for the detection of nine object classes over a total of approximately 7k frames. Results show that our proposal improves the recall and the F1-score by a factor of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).
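To make the phrase "recursive Bayesian filtering to integrate observations over time" concrete, here is a minimal sketch of a discrete Bayes update over object classes, together with the Shannon entropy of the resulting belief. It is only an illustration under stated assumptions: the three-class setup, the per-frame scores, and the function names are invented for this example and are not the paper's implementation or data.

```python
import math

def bayes_update(prior, likelihood):
    """One recursive Bayesian step: posterior is proportional to likelihood times prior."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

def entropy(dist):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Hypothetical three-class example; the scores below are made up for illustration.
belief = [1 / 3, 1 / 3, 1 / 3]   # uniform prior over the object classes
frame_scores = [                 # per-frame detector scores for the same tracked object
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.2, 0.1],
]

entropies = [entropy(belief)]
for scores in frame_scores:
    # The posterior from the previous frame becomes the prior for the next one.
    belief = bayes_update(belief, scores)
    entropies.append(entropy(belief))
```

When successive frames agree on a class, the belief concentrates on it and the categorization entropy falls, which is the kind of effect the abstract reports as an entropy reduction.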

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Multimodal Adversarial Learning

Author: Osahor Uche
Publication venue: 'West Virginia University Libraries'
Publication date: 01/01/2022

Deep Convolutional Neural Networks (DCNN) have proven to be an exceptional tool for object recognition, generative modelling, and multi-modal learning in various computer vision applications. However, recent findings have shown that such state-of-the-art models can be easily deceived by inserting slight imperceptible perturbations to key pixels in the input. A good target detection system can accurately identify targets by localizing their coordinates on the input image of interest. This is ideally achieved by labeling each pixel in an image as a background or a potential target pixel. However, prior research still confirms that such state-of-the-art target models are susceptible to adversarial attacks. In the case of generative models, facial sketches drawn by artists, mostly used by law enforcement agencies, depend on the ability of the artist to clearly replicate all the key facial features that aid in capturing the true identity of a subject. Recent works have attempted to synthesize these sketches into plausible visual images to improve visual recognition and identification. However, synthesizing photo-realistic images from sketches proves to be an even more challenging task, especially for sensitive applications such as suspect identification. However, hybrid discriminators, which perform attribute classification of multiple target attributes, and a quality-guided encoder that minimizes the perceptual dissimilarity of the latent space embedding of the synthesized and real image at different layers in the network have been shown to be powerful tools towards better multi-modal learning techniques. In general, our overall approach was aimed at improving target detection systems and the visual appeal of synthesized images while incorporating multiple attribute assignment to the generator without compromising the identity of the synthesized image.
We synthesized sketches using an XDOG filter for the CelebA, Multi-modal and CelebA-HQ datasets and from an auxiliary generator trained on sketches from the CUHK, IIT-D and FERET datasets. Our results overall for different model applications are impressive compared to the current state of the art.

The Research Repository @ WVU (West Virginia University)

Interactively skimming recorded speech

Author: Arons Barry Michael
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1994

Thesis (Ph. D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1994. Includes bibliographical references (p. 143-156). Barry Michael Arons. Ph.D.

DSpace@MIT

Investigating the build-up of precedence effect using reflection masking

Author: Buchholz Jörg
Hartcher-O'Brien Jessica
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/2006

The auditory processing level involved in the build‐up of precedence [Freyman et al., J. Acoust. Soc. Am. 90, 874–884 (1991)] has been investigated here by employing reflection masked threshold (RMT) techniques. Given that RMT techniques are generally assumed to address lower levels of the auditory signal processing, such an approach represents a bottom‐up approach to the buildup of precedence. Three conditioner configurations measuring a possible buildup of reflection suppression were compared to the baseline RMT for four reflection delays ranging from 2.5–15 ms. No buildup of reflection suppression was observed for any of the conditioner configurations. Buildup of template (decrease in RMT for two of the conditioners), on the other hand, was found to be delay dependent. For five of six listeners, with reflection delay=2.5 and 15 ms, RMT decreased relative to the baseline. For 5‐ and 10‐ms delay, no change in threshold was observed. It is concluded that the low‐level auditory processing involved in RMT is not sufficient to realize a buildup of reflection suppression. This confirms suggestions that higher level processing is involved in PE buildup. The observed enhancement of reflection detection (RMT) may contribute to active suppression at higher processing levels

Online Research Database In Technology

MPG.PuRe


The role of sensory history and stimulus context in human time perception. Adaptive and integrative distortions of perceived duration

Author: Fulcher Corinne
Publication venue: Faculty of Life Sciences
Publication date: 01/01/2017

This thesis documents a series of experiments designed to investigate the mechanisms subserving sub-second duration processing in humans. Firstly, duration aftereffects were generated by adapting to consistent duration information. If duration aftereffects represent encoding by neurons selective for both stimulus duration and non-temporal stimulus features, adapt-test changes in these features should prevent duration aftereffect generation. Stimulus characteristics were chosen which selectively target differing stages of the visual processing hierarchy. The duration aftereffect showed robust interocular transfer and could be generated using a stimulus whose duration was defined by stimuli invisible to monocular mechanisms, ruling out a pre-cortical locus. The aftereffects transferred across luminance-defined visual orientation and facial identity. Conversely, the duration encoding mechanism was selective for changes in the contrast-defined envelope size of a Gabor and showed broad spatial selectivity which scaled proportionally with adapting stimulus size. These findings are consistent with a second stage visual spatial mechanism that pools input across proportionally smaller, spatially abutting filters. A final series of experiments investigated the pattern of interaction between concurrently presented cross-modal durations. When duration discrepancies were small, multisensory judgements were biased towards the modality with higher precision. However, when duration discrepancies were large, perceived duration was compressed by both longer and shorter durations from the opposite modality, irrespective of unimodal temporal reliability. Taken together, these experiments provide support for a duration encoding mechanism that is tied to mid-level visual spatial processing. Following this localised encoding, supramodal mechanisms then dictate the combination of duration information across the senses

Bradford Scholars

Sensitivity to interaural timing differences within the envelopes of acoustic waveforms

Author: Greenberg DL
Publication venue: UCL (University College London)
Publication date: 28/04/2014

Interaural-timing-differences (ITDs) are a cue for sound-source localisation and can be conveyed in the temporal-fine-structure (TFS) of low-frequency tones or in the envelope of high-frequency, amplitude-modulated sounds such as sinusoidally amplitude-modulated (SAM) and transposed-tones. Sensitivity to these cues has been measured in human psychophysical experiments, which have revealed that the transposed-tone elicits just-noticeable-differences (JNDs) in ITDs that are equivalent to those of low-frequency pure-tones when the modulation frequency is below 512-Hz. At modulation frequencies above 512-Hz performance rapidly declines for the transposed-tone while sensitivity to ITDs in pure-tones is robust until around 1200-Hz. Furthermore, transposed-tones elicit JNDs smaller than SAM tones. In the present study, ITD JNDs are assessed psychophysically for pure-tones and transposed-tones using off-midline reference locations. The results demonstrate that frequency, whether the ITD is conveyed in the TFS or the envelope, and location, all have a significant effect on human ITD JNDs and suggest that a difference exists in how ITDs are coded neuronally when conveyed by either high- or low-frequency sounds. ITD-sensitive neurons located within several brainstem nuclei display a high degree of phase-locking to both the TFS of low-frequency pure-tones and the envelopes of SAM and transposed-tones. Echoing the psychophysical findings, phase-locking to the waveform envelope at low modulation frequencies is equivalent to that of low-frequency pure-tones, while declining at high rates of modulation to a lesser degree for transposed-tones than SAM tones. In order to assess factors critical to the localisation of high-frequency sounds, a series of electrophysiology experiments were conducted.
Recordings were made from single neurons within the inferior colliculus of the guinea pig in response to ITDs conveyed by 18 unique envelope shapes to evaluate how the envelope segments (Pause, Attack, Sustain and Decay) each affect ITD JNDs. Amplitude-modulations with envelope shapes comprising relatively long Pause but short Attack durations have been found to elicit the greatest ITD discrimination of high-frequency sounds.

UCL Discovery