
    Enhanced Statistical Modelling For Variable Bit Rate Video Traffic Generated From Scalable Video Codec

    Designing an effective, high-performance network requires accurate characterization and modelling of the network traffic sources. This work involves the analysis and modelling of Variable Bit Rate (VBR) video traffic, which underpins protocol design and efficient network utilization for video transmission. In this context, an Enhanced Discrete Autoregressive (EDAR(1)) model is proposed for VBR video traffic encoded with a Scalable Video Codec (SVC). The EDAR(1) model accurately generates synthetic video sequences that closely resemble actual video traffic. The model is validated by comparing simulated and original traces using a graphical measure (the Quantile-Quantile plot) and statistical measures (Kolmogorov-Smirnov, Sum of Squared Errors, and Relative Efficiency), as well as cross-validation.
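    The abstract does not reproduce the model's equations, but the classical DAR(1) process that EDAR(1) builds on is simple to state: X_t = V_t X_{t-1} + (1 - V_t) Y_t, where V_t is Bernoulli(rho) and Y_t is drawn from the fitted marginal frame-size distribution. The Python sketch below generates a synthetic trace this way and applies one of the cited validation statistics, the two-sample Kolmogorov-Smirnov test; the trace file name and the rho value are hypothetical, and the thesis's enhancements over plain DAR(1) are not reproduced.

```python
# Minimal sketch of a DAR(1)-style VBR trace generator (illustrative only;
# the EDAR(1) enhancements from the thesis are not reproduced here).
import numpy as np
from scipy import stats

def dar1_trace(frame_sizes, rho, n, rng=None):
    """Generate n synthetic frame sizes from a DAR(1) process.

    frame_sizes : empirical frame sizes, used as the marginal distribution
    rho         : one-lag autocorrelation (probability of repeating a value)
    """
    rng = rng or np.random.default_rng()
    trace = np.empty(n)
    trace[0] = rng.choice(frame_sizes)
    for t in range(1, n):
        if rng.random() < rho:            # V_t = 1: repeat the previous value
            trace[t] = trace[t - 1]
        else:                             # V_t = 0: fresh draw from the marginal
            trace[t] = rng.choice(frame_sizes)
    return trace

# Validate the synthetic trace against the original, as the thesis does:
# a two-sample Kolmogorov-Smirnov test on the frame-size distributions.
original = np.loadtxt("svc_frame_sizes.txt")   # hypothetical trace file
synthetic = dar1_trace(original, rho=0.9, n=len(original))
ks_stat, p_value = stats.ks_2samp(original, synthetic)
print(f"KS statistic: {ks_stat:.4f}, p-value: {p_value:.4f}")
```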

    Energy Optimization of Signal Processing and Its Applications to Video Decoding

    Consumer electronics today offer more and more features (video, audio, GPS, Internet) and connectivity options (multi-radio systems with WiFi, Bluetooth, UMTS, HSPA, LTE-Advanced, ...). The power demand of these devices is growing, especially for the digital part and the processing chip in particular. To support this ever-increasing computing demand, processor architectures have evolved with multicore processors, graphics processors (GPU), and other dedicated hardware accelerators. The evolution of battery technology, however, is much slower, and the autonomy of embedded systems is therefore under great pressure. Among the new functionalities supported by mobile devices, video services take a prominent place. Indeed, recent analyses show that they will represent 70% of mobile Internet traffic by 2016. Accompanying this growth, new technologies are emerging to enable new services and applications. Among them, HEVC (High Efficiency Video Coding) doubles the data compression ratio while maintaining a subjective quality equivalent to that of its predecessor, the H.264 standard. In a digital circuit, the total power consumption consists of static power and dynamic power. Most modern hardware architectures implement mechanisms to control the power consumption of the system. Dynamic Voltage and Frequency Scaling (DVFS) mainly reduces the dynamic power of the circuit; this technique adapts the processing power of the processor (and therefore its consumption) to the actual load required by the application. To control the static power, Dynamic Power Management (DPM, or sleep modes) shuts down the supply voltages of specific areas of the chip. In this thesis, we first present a model of the energy consumed by the circuit that integrates the DPM and DVFS modes. This model is generalized to multicore integrated circuits and incorporated into a rapid prototyping tool, so that the optimal operating point of a circuit, i.e. the operating frequency and the number of active cores, can be identified. Secondly, the HEVC application is integrated onto a multicore architecture coupled with a sophisticated DVFS mechanism. We show that this application can be implemented efficiently on general-purpose processors (GPP) while minimizing the power consumption. Finally, to obtain further energy gains, we propose a modified HEVC decoder that can lower its consumption even further according to the locally available energy budget, trading decoding quality for energy.
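    As a rough illustration of the kind of energy model described above, the sketch below combines a DVFS term (dynamic power scaling with voltage squared and frequency) and a DPM term (sleep power during idle time), then sweeps frequency and core count to find the operating point that minimizes energy under a frame deadline. All constants, the linear voltage-frequency relation, and the perfect-parallelism assumption are illustrative, not taken from the thesis.

```python
# Toy search for an optimal (frequency, core-count) operating point under a
# per-frame deadline, combining DVFS (dynamic power) and DPM (sleep modes).
# All constants are hypothetical, not measurements from the thesis.
import itertools

C_EFF = 1e-9        # effective switched capacitance (F), hypothetical
P_STATIC = 0.10     # static power per active core (W), hypothetical
P_SLEEP = 0.01      # power of a sleeping core (W), hypothetical

def voltage(f_hz):
    """Assume supply voltage scales roughly linearly with frequency."""
    return 0.6 + 0.4 * (f_hz / 2.0e9)

def energy(workload_cycles, f_hz, cores, deadline_s, total_cores=4):
    """Energy to finish the workload on `cores` cores at frequency f_hz.
    Returns None if the deadline is missed. Assumes perfect parallelism."""
    t_active = workload_cycles / (cores * f_hz)
    if t_active > deadline_s:
        return None
    p_dyn = C_EFF * voltage(f_hz) ** 2 * f_hz * cores   # P_dyn = C * V^2 * f
    p_stat = P_STATIC * cores
    t_idle = deadline_s - t_active
    p_idle = P_SLEEP * total_cores                      # DPM: chip asleep when idle
    return (p_dyn + p_stat) * t_active + p_idle * t_idle

frequencies = [0.5e9, 1.0e9, 1.5e9, 2.0e9]
best = min(
    ((f, k, energy(2e7, f, k, deadline_s=1 / 30))
     for f, k in itertools.product(frequencies, range(1, 5))),
    key=lambda x: x[2] if x[2] is not None else float("inf"),
)
print(f"best: f={best[0]/1e9:.1f} GHz, cores={best[1]}, E={best[2]*1e3:.2f} mJ")
```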

    Remote Sensing Data Compression

    A huge amount of data is acquired nowadays by the various remote sensing systems installed on satellites, aircraft, and UAVs. The acquired data then have to be transferred to image processing centres, stored, and/or delivered to customers. When resources are restricted, data compression is strongly desired or necessary. A wide diversity of coding methods can be used, depending on the requirements and their priority. In addition, the types and properties of images differ a lot; thus, practical implementation aspects have to be taken into account. The Special Issue paper collection on which this book is based touches on all of the aforementioned items to some degree, giving the reader an opportunity to learn about recent developments and research directions in the field of image compression. In particular, lossless and near-lossless compression of multi- and hyperspectral images remains a current topic, since such images constitute extremely large data arrays rich in information that can be retrieved for various applications. Another important aspect is the impact of lossy compression on image classification and segmentation, where a reasonable compromise between the characteristics of compression and the final tasks of data processing has to be achieved. The problems of data transmission from UAV-based acquisition platforms, as well as the use of FPGAs and neural networks, have also become very important. Finally, compressive sensing approaches have been applied to remote sensing image processing with positive outcomes. We hope that readers will find our book useful and interesting.
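    As a toy illustration of the lossless multi- and hyperspectral compression theme, the sketch below predicts each spectral band from the previous one and entropy-codes the residuals. This is not drawn from any chapter of the book; real coders (e.g. the adaptive predictors of the CCSDS-123 family) are far more elaborate.

```python
# Minimal sketch of lossless inter-band predictive compression for a
# multispectral cube: predict each band from the previous one, then
# entropy-code the residuals with a generic compressor.
import zlib
import numpy as np

def compress_cube(cube):
    """cube: (bands, rows, cols) int16 array. Returns compressed bytes."""
    cube = np.asarray(cube, dtype=np.int16)
    residuals = cube.copy()
    residuals[1:] -= cube[:-1]          # inter-band difference predictor
    return zlib.compress(residuals.tobytes(), level=9)

def decompress_cube(blob, shape):
    residuals = np.frombuffer(zlib.decompress(blob), dtype=np.int16).reshape(shape)
    return np.cumsum(residuals, axis=0, dtype=np.int16)   # undo the predictor

rng = np.random.default_rng(0)
base = rng.integers(0, 1024, size=(1, 64, 64), dtype=np.int16)
# Synthetic correlated bands: each band is the previous one plus small noise.
noise = rng.integers(-8, 8, size=(9, 64, 64), dtype=np.int16)
cube = np.cumsum(np.concatenate([base, noise]), axis=0, dtype=np.int16)

blob = compress_cube(cube)
assert np.array_equal(decompress_cube(blob, cube.shape), cube)   # lossless
print(f"raw {cube.nbytes} B -> compressed {len(blob)} B")
```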

    From Pixels to Spikes: Efficient Multimodal Learning in the Presence of Domain Shift

    Computer vision aims to provide computers with a conceptual understanding of images or video by learning a high-level representation. This representation is typically derived from the pixel domain (i.e., RGB channels) for tasks such as image classification or action recognition. In this thesis, we explore how RGB inputs can either be pre-processed or supplemented with other compressed visual modalities in order to improve the accuracy-complexity tradeoff for various computer vision tasks. Beginning with RGB-domain data only, we propose a multi-level, Voronoi-based spatial partitioning of images, whose cells are individually processed by a convolutional neural network (CNN), to improve the scale invariance of the embedding. We combine this with a novel and efficient approach for optimal bit allocation within the quantized cell representations. We evaluate this proposal on the content-based image retrieval task, which consists of finding images in a dataset that are similar to a given query. We then move to the more challenging domain of action recognition, where a video sequence is classified according to its constituent action. Here, we demonstrate how the RGB modality can be supplemented with a flow modality comprising motion vectors extracted directly from the video codec. The motion vectors (MVs) are used both as input to a CNN and as an activity sensor that enables selective macroblock (MB) decoding of RGB frames instead of full-frame decoding. We independently train two CNNs on RGB and MV correspondences and then fuse their scores during inference, demonstrating faster end-to-end processing and classification accuracy competitive with recent work. To explore the use of more efficient sensing modalities, we replace the MV stream with a neuromorphic vision sensing (NVS) stream for action recognition. NVS hardware mimics the biological retina and operates with substantially lower power and at significantly higher sampling rates than conventional active pixel sensing (APS) cameras. Due to the lack of training data in this domain, we generate emulated NVS frames directly from consecutive RGB frames and use these to train a teacher-student framework that additionally leverages the abundance of optical flow training data. In the final part of this thesis, we introduce a novel unsupervised domain adaptation method for further minimizing the domain shift between the emulated (source) and real (target) NVS data domains.
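    A common way to emulate NVS data from ordinary video, in the spirit of the emulation step described above (the thesis's exact emulator may differ), is to threshold log-intensity differences between consecutive frames, producing per-pixel ON/OFF event polarities:

```python
# Minimal sketch of emulating neuromorphic (NVS) event frames from a pair of
# consecutive RGB frames by thresholding log-intensity differences.
import numpy as np

def emulate_nvs_frame(rgb_prev, rgb_curr, threshold=0.15, eps=1e-3):
    """Return an event frame in {-1, 0, +1} (OFF, no event, ON).

    rgb_prev, rgb_curr : (H, W, 3) uint8 RGB frames
    threshold          : log-intensity change needed to fire an event
    """
    # Convert to luminance, then to the log domain (as the retina roughly does).
    weights = np.array([0.299, 0.587, 0.114])
    log_prev = np.log(rgb_prev.astype(np.float32) @ weights + eps)
    log_curr = np.log(rgb_curr.astype(np.float32) @ weights + eps)
    diff = log_curr - log_prev
    events = np.zeros(diff.shape, dtype=np.int8)
    events[diff > threshold] = 1     # ON events: brightness increased
    events[diff < -threshold] = -1   # OFF events: brightness decreased
    return events

rng = np.random.default_rng(0)
f0 = rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8)
f1 = np.roll(f0, shift=2, axis=1)    # simulate small horizontal motion
ev = emulate_nvs_frame(f0, f1)
print(f"ON events: {(ev == 1).sum()}, OFF events: {(ev == -1).sum()}")
```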

    Adaptive Streaming: From Bitrate Maximization to Rate-Distortion Optimization

    The fundamental conflict between the increasing consumer demand for better Quality-of-Experience (QoE) and the limited supply of network resources has become a significant challenge for modern video delivery systems. State-of-the-art adaptive bitrate (ABR) streaming algorithms are dedicated to draining the available bandwidth in the hope of improving viewers' QoE, resulting in inefficient use of network resources. In this thesis, we develop an alternative design paradigm, namely rate-distortion optimized streaming (RDOS), to balance the contrasting demands of video consumers and service providers. Distinct from the traditional bitrate maximization paradigm, RDOS must be able to operate at any given point along the rate-distortion curve, as specified by a trade-off parameter. The new paradigm has found plausible explanations in information theory, economics, and visual perception. To instantiate the new philosophy, we decompose adaptive streaming algorithms into three mutually independent components: the throughput predictor, the reward function, and the bitrate selector. We provide a unified framework for understanding the connections among all existing ABR algorithms. The new perspective also illustrates the fundamental limitations of each algorithm by examining its underlying assumptions. Based on these insights, we propose novel improvements to each of the three functional components. To alleviate a series of unrealistic assumptions behind bitrate-based QoE models, we develop a theoretically grounded objective QoE model that combines information from subject-rated streaming videos with prior knowledge about the human visual system (HVS) in a principled way. By analyzing a corpus of psychophysical experiments, we show that QoE function estimation can be formulated as a projection-onto-convex-sets problem. The proposed model exhibits strong generalization over a broad range of source contents, video encoders, and viewing conditions. Most importantly, the QoE model disentangles bitrate from quality, making it an ideal component in the RDOS framework. In contrast to existing throughput estimators, which approximate the marginal probability distribution over all connections, we optimize the throughput predictor conditioned on each client. Although training data for each Internet Protocol connection are scarce, we can leverage the latest advances in meta-learning to incorporate the knowledge embedded in similar tasks. With a deliberately designed objective function, the algorithm learns to identify similar structures among different network characteristics from millions of realistic throughput traces. During the test phase, the model can quickly adapt to connection-level network characteristics of novel streaming video clients with only a small amount of training data and a small number of gradient steps. The enormous space of streaming videos, constantly progressing encoding schemes, and great diversity of throughput characteristics make it extremely challenging for modern data-driven bitrate selectors, trained with limited samples, to generalize well. To this end, we propose a Bayesian bitrate selection algorithm that adaptively fuses an online, robust, and short-term-optimal controller with an offline, susceptible, and long-term-optimal planner. Depending on the reliability of the two controllers in a given system state, the algorithm dynamically prioritizes one of the two decision rules to obtain the optimal decision.
    To faithfully evaluate the performance of RDOS, we construct a large-scale streaming video dataset -- the Waterloo Streaming Video database. It contains a wide variety of high-quality source contents, encoders, encoding profiles, realistic throughput traces, and viewing devices. Extensive objective evaluation demonstrates that the proposed algorithm can deliver QoE identical to that of state-of-the-art ABR algorithms at a much lower cost. The improvement is also supported by the largest subjective video quality assessment experiment to date.
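    The contrast between the two paradigms can be shown in a few lines. In the sketch below, a bitrate-maximizing rule picks the highest sustainable rung of the ladder, while an RDOS-style rule minimizes D + lambda*R over the sustainable rungs, with the trade-off parameter lambda selecting the operating point on the rate-distortion curve. The ladder and distortion values are hypothetical; in a real system, D would come from a QoE model such as the one proposed in the thesis.

```python
# Minimal sketch contrasting bitrate maximization with rate-distortion
# optimized (RDOS-style) bitrate selection. Values are hypothetical.

# (bitrate in Mbps, distortion in arbitrary perceptual units; lower is better)
LADDER = [(0.5, 40.0), (1.2, 22.0), (2.5, 12.0), (5.0, 7.0), (8.0, 5.5)]

def select_max_bitrate(predicted_mbps):
    """Classic ABR: the highest rung the predicted throughput can sustain."""
    feasible = [(r, d) for r, d in LADDER if r <= predicted_mbps]
    return max(feasible, default=LADDER[0])[0]

def select_rdos(predicted_mbps, lam):
    """RDOS-style rule: minimize D + lam * R over the sustainable rungs.

    lam : trade-off parameter picking the operating point on the RD curve
          (lam -> 0 recovers quality maximization; large lam saves rate).
    """
    feasible = [(r, d) for r, d in LADDER if r <= predicted_mbps] or [LADDER[0]]
    return min(feasible, key=lambda rd: rd[1] + lam * rd[0])[0]

for bw in (1.5, 6.0, 10.0):
    print(f"bw={bw:4.1f} Mbps  max-rate: {select_max_bitrate(bw)} Mbps  "
          f"RDOS(lam=2): {select_rdos(bw, lam=2.0)} Mbps")
```

Even under ample bandwidth, the RDOS rule settles on a mid-ladder rung once the marginal quality gain no longer justifies the extra rate, which is exactly the consumer/provider balance the paradigm targets.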

    Recording, compression and representation of dense light fields

    The concept of light fields allows image-based capture of scenes, providing, on a recorded dataset, many of the features available in computer graphics, like the simulation of different viewpoints or changes of core camera parameters, including depth of field. Because the recorded dimensionality increases from two for a regular image to four for a light field recording, previous works mainly concentrate on small or undersampled light field recordings. This thesis is concerned with the recording of a dense light field dataset, including the estimation of suitable sampling parameters, as well as the implementation of the required capture, storage, and processing methods. Towards this goal, the influence of the optical system on the, possibly band-unlimited, light field signal is examined, and the required sampling rates are derived from the bandlimiting effects of the camera and optics. To increase storage capacity and bandwidth, a very fast image compression method is introduced, providing an order of magnitude faster compression than previous methods and reducing the I/O bottleneck for light field processing. A fiducial marker system is provided for the calibration of the recorded dataset; it provides a higher number of reference points than previous methods, improving camera pose estimation. In conclusion, this work demonstrates the feasibility of densely sampling a large light field and provides a dataset which may be used for evaluation or as a reference for light field processing tasks like interpolation, rendering, and sampling.
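    To make the two-plane light field representation concrete, the sketch below renders a novel viewpoint from a recorded light field L(u, v, s, t) by bilinearly interpolating between the four nearest camera positions. It also hints at why dense sampling matters: the coarser the (u, v) camera grid, the more this interpolation blurs and ghosts. The grid size and data are hypothetical.

```python
# Minimal sketch of novel-view synthesis from a two-plane parameterized light
# field L(u, v, s, t) by bilinear interpolation over the camera plane.
import numpy as np

def render_view(lf, u, v):
    """lf: (U, V, S, T) grayscale light field; (u, v): fractional camera position.
    Returns the (S, T) image seen from the interpolated viewpoint."""
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, lf.shape[0] - 1), min(v0 + 1, lf.shape[1] - 1)
    fu, fv = u - u0, v - v0
    # Weighted blend of the four nearest recorded views.
    return ((1 - fu) * (1 - fv) * lf[u0, v0] +
            (1 - fu) * fv       * lf[u0, v1] +
            fu       * (1 - fv) * lf[u1, v0] +
            fu       * fv       * lf[u1, v1])

rng = np.random.default_rng(0)
lf = rng.random((9, 9, 64, 64))          # hypothetical 9x9 camera grid
view = render_view(lf, u=3.25, v=4.75)   # viewpoint between recorded cameras
print(view.shape)                        # (64, 64)
```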