55 research outputs found

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before the applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to offloading from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry-save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object-based shape encoding, a novel energy-efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.

    Neural Radiance Fields: Past, Present, and Future

    Modeling and interpreting 3D environments and surroundings have driven research in 3D Computer Vision, Computer Graphics, and Machine Learning. The paper by Mildenhall et al. on NeRFs (Neural Radiance Fields) led to a boom in Computer Graphics, Robotics, and Computer Vision, and the possible scope of high-resolution, low-storage Augmented Reality and Virtual Reality-based 3D models has gained traction from researchers, with more than 1000 preprints related to NeRFs published. This paper serves as a bridge for people starting to study these fields, building from the basics of Mathematics, Geometry, Computer Vision, and Computer Graphics to the difficulties encountered in Implicit Representations at the intersection of all these disciplines. This survey provides the history of rendering, Implicit Learning, and NeRFs, the progression of research on NeRFs, and the potential applications and implications of NeRFs in today's world. In doing so, this survey categorizes all the NeRF-related research in terms of the datasets used, objective functions, applications solved, and evaluation criteria for these applications. Comment: 413 pages, 9 figures, 277 citations

    Metadata-driven multimedia access

    With the growing ubiquity and mobility of multimedia-enabled devices, universal multimedia access (UMA) is emerging as one of the important components for the next generation of multimedia applications. The basic concept underlying UMA is universal or seamless access to multimedia content, by automatic selection and adaptation of content based on the user's environment. UMA promises an integration of these different perspectives into a new class of content-adaptive applications that could allow users to access multimedia content without concern for specific coding formats, terminal capabilities, or network conditions. We discuss methods that support UMA and the tools provided by MPEG-7 to achieve this. We also discuss the inclusion of metadata in JPEG 2000 encoded images. We present these methods in the typical order in which they may be used in an actual application. Therefore, we first discuss the (personalized) selection of desired content from all available content, followed by the organization of related variations of a single piece of content. Then, we discuss segmentation and summarization of audio-video (AV) content, and finally, transcoding of AV content.

    Proceedings of the Augmented VIsual Display (AVID) Research Workshop

    The papers, abstracts, and presentations were given at a three-day workshop focused on sensor modeling and simulation, and image enhancement, processing, and fusion. The technical sessions emphasized how sensor technology can be used to create visual imagery adequate for aircraft control and operations. Participants from industry, government, and academic laboratories contributed to panels on Sensor Systems, Sensor Modeling, Sensor Fusion, Image Processing (Computer and Human Vision), and Image Evaluation and Metrics.

    Optical Camera Communications: Principles, Modulations, Potential and Challenges

    Optical wireless communications (OWC) are emerging as cost-effective and practical alternatives to congested radio frequency-based wireless technologies. As part of OWC, optical camera communications (OCC) have become very attractive, considering recent developments in cameras and the use of fitted cameras in smart devices. OCC together with visible light communications (VLC) is considered within the framework of the IEEE 802.15.7m standardization. OCCs based on both organic and inorganic light sources as well as cameras are being considered for low-rate transmissions and localization in indoor as well as outdoor short-range applications. This paper introduces the underlying principles of OCC and gives a comprehensive overview of this emerging technology, including recent standardization activities in OCC. It also outlines key technical issues such as mobility, coverage, interference, and performance enhancement. Future research directions and open issues are also presented.

    Scalable video compression with optimized visual performance and random accessibility

    This thesis is concerned with maximizing the coding efficiency, random accessibility and visual performance of scalable compressed video. The unifying theme behind this work is the use of finely embedded localized coding structures, which govern the extent to which these goals may be jointly achieved. The first part focuses on scalable volumetric image compression. We investigate 3D transform and coding techniques which exploit inter-slice statistical redundancies without compromising slice accessibility. Our study shows that the motion-compensated temporal discrete wavelet transform (MC-TDWT) practically achieves an upper bound to the compression efficiency of slice transforms. From a video coding perspective, we find that most of the coding gain is attributed to offsetting the learning penalty in adaptive arithmetic coding through 3D code-block extension, rather than inter-frame context modelling. The second aspect of this thesis examines random accessibility. Accessibility refers to the ease with which a region of interest is accessed (subband samples needed for reconstruction are retrieved) from a compressed video bitstream, subject to spatiotemporal code-block constraints. We investigate the fundamental implications of motion compensation for random access efficiency and the compression performance of scalable interactive video. We demonstrate that inclusion of motion compensation operators within the lifting steps of a temporal subband transform incurs a random access penalty which depends on the characteristics of the motion field. The final aspect of this thesis aims to minimize the perceptual impact of visible distortion in scalable reconstructed video. We present a visual optimization strategy based on distortion scaling which raises the distortion-length slope of perceptually significant samples. 
    This alters the codestream embedding order during post-compression rate-distortion optimization, thus allowing visually sensitive sites to be encoded with higher fidelity at a given bit-rate. For visual sensitivity analysis, we propose a contrast perception model that incorporates an adaptive masking slope. This versatile feature provides a context which models perceptual significance. It enables scene structures that otherwise suffer significant degradation to be preserved at lower bit-rates. The novelty in our approach derives from a set of "perceptual mappings" which account for quantization noise shaping effects induced by motion-compensated temporal synthesis. The proposed technique reduces wavelet compression artefacts and improves the perceptual quality of video.

    Cross-Layer Techniques for Efficient Medium Access in Wi-Fi Networks

    IEEE 802.11 (Wi-Fi) wireless networks share the wireless medium using a Carrier Sense Multiple Access (CSMA) Medium Access Control (MAC) protocol. The MAC protocol is a central determiner of Wi-Fi networks’ efficiency–the fraction of the capacity available in the physical layer that Wi-Fi-equipped hosts can use in practice. The MAC protocol’s design is intended to allow senders to share the wireless medium fairly while still allowing high utilisation. This thesis develops techniques that allow Wi-Fi senders to send more data using fewer medium acquisitions, reducing the overhead of idle periods, and thus improving end-to-end goodput. Our techniques address the problems we identify with Wi-Fi’s status quo. Today’s commodity Linux Wi-Fi/IP software stack and Wi-Fi cards waste medium acquisitions as they fail to queue enough packets that would allow for effective sending of multiple frames per wireless medium acquisition. In addition, for bi-directional protocols such as TCP, TCP data and TCP ACKs contend for the wireless channel, wasting medium acquisitions (and thus capacity). Finally, the probing mechanism used for bit-rate adaptation in Wi-Fi networks increases channel acquisition overhead. We describe the design and implementation of Aggregate Aware Queueing (AAQ), a fair queueing discipline, that coordinates scheduling of frame transmission with the aggregation layer in the Wi-Fi stack, allowing more frames per channel acquisition. Furthermore, we describe Hierarchical Acknowledgments (HACK) and Transmission Control Protocol Acknowledgment Optimisation (TAO), techniques that reduce channel acquisitions for TCP flows, further improving goodput. Finally, we design and implement Aggregate Aware Rate Control (AARC), a bit-rate adaptation algorithm that reduces channel acquisition overheads incurred by the probing mechanism common in today’s commodity Wi-Fi systems. 
    We implement our techniques on real Wi-Fi hardware to demonstrate their practicality, and measure their performance on real testbeds, using off-the-shelf commodity Wi-Fi hardware where possible, and software-defined radio hardware for those techniques that require modification of the Wi-Fi implementation unachievable on commodity hardware. The techniques described in this thesis offer up to 2x aggregate goodput improvement compared to the stock Linux Wi-Fi stack.

    Super Resolution of Wavelet-Encoded Images and Videos

    In this dissertation, we address the multiframe super resolution reconstruction problem for wavelet-encoded images and videos. The goal of multiframe super resolution is to obtain one or more high resolution images by fusing a sequence of degraded or aliased low resolution images of the same scene. Since the low resolution images may be unaligned, a registration step is required before super resolution reconstruction. Therefore, we first explore in-band (i.e. in the wavelet domain) image registration, then investigate super resolution. Our motivation for analyzing the image registration and super resolution problems in the wavelet domain is the growing trend in wavelet-encoded imaging, and wavelet encoding for image/video compression. Due to the drawbacks of the widely used discrete cosine transform in image and video compression, a considerable amount of literature is devoted to wavelet-based methods. However, since wavelets are shift-variant, existing methods cannot utilize wavelet subbands efficiently. In order to overcome this drawback, we establish and explore the direct relationship between the subbands under a translational shift, for image registration and super resolution. We then employ our devised in-band methodology in a motion-compensated video compression framework to demonstrate the effective usage of wavelet subbands. Super resolution can also be used as a post-processing step in video compression in order to decrease the size of the video files to be compressed, with downsampling added as a pre-processing step. Therefore, we present a video compression scheme that utilizes super resolution to reconstruct the high frequency information lost during downsampling. In addition, super resolution is a crucial post-processing step for satellite imagery, because it is hard to update imaging devices after a satellite is launched. Thus, we also demonstrate the usage of our devised methods in enhancing the resolution of pansharpened multispectral images.

    Quality of service based distributed control of wireless networks
