
    Efficient Visual Computing with Camera RAW Snapshots

    Conventional cameras capture image irradiance (RAW) on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or for visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. One can argue that since RAW images contain all the captured information, the conversion of RAW to RGB using an ISP is not necessary for visual computing. In this paper, we propose a novel ρ-Vision framework to perform high-level semantic understanding and low-level compression using RAW images, without the ISP subsystem used for decades. Considering the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network based on unsupervised CycleGAN to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly generate simulated RAW images (simRAW) from any existing RGB image dataset and finetune models originally trained in the RGB domain to process real-world camera RAW images. We demonstrate object detection and image compression in the RAW domain using a RAW-domain YOLOv3 and a RAW image compressor (RIC) on camera snapshots. Quantitative results reveal that RAW-domain task inference provides better detection accuracy and compression efficiency than its RGB-domain counterpart. Furthermore, the proposed ρ-Vision generalizes across various camera sensors and different task-specific models. An added benefit of ρ-Vision is that it eliminates the need for an ISP, potentially reducing computation and processing time.
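
    A minimal sketch of the simRAW finetuning idea described above, assuming a learned inverse ISP is already available; TinyInvISP, TinyRawHead, and all hyperparameters are hypothetical stand-ins, not the paper's CycleR2R models:

```python
import torch
import torch.nn as nn

class TinyInvISP(nn.Module):
    """Toy stand-in for a learned inverse ISP (RGB -> 1-channel simulated RAW)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
    def forward(self, rgb):
        return self.net(rgb)

class TinyRawHead(nn.Module):
    """Toy stand-in for a RAW-domain task model (e.g. a detector head)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
    def forward(self, raw):
        return self.net(raw)

inv_isp = TinyInvISP().eval()   # assume weights came from unpaired CycleGAN-style training
model = TinyRawHead()           # would be initialised from an RGB-pretrained model
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Synthetic stand-in for a labelled RGB dataset: labels carry over to simRAW.
rgb = torch.rand(4, 3, 64, 64)
labels = torch.randint(0, 10, (4,))
with torch.no_grad():
    sim_raw = inv_isp(rgb)      # RGB -> simRAW; annotations stay valid
loss = loss_fn(model(sim_raw), labels)
opt.zero_grad(); loss.backward(); opt.step()
```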

    Quality of experience and access network traffic management of HTTP adaptive video streaming

    The thesis focuses on the Quality of Experience (QoE) of HTTP adaptive video streaming (HAS) and on traffic management in access networks to improve the QoE of HAS. First, the QoE impact of adaptation parameters and of the time spent on each quality layer was investigated with subjective crowdsourcing studies. The results were used to compute a QoE-optimal adaptation strategy for given video and network conditions, which allows video service providers to develop and benchmark improved adaptation logics for HAS. Furthermore, the thesis investigated concepts to monitor video QoE at the application and network layers, which can be used by network providers in the QoE-aware traffic management cycle. Moreover, an analytic and simulative performance evaluation of QoE-aware traffic management on a bottleneck link was conducted. Finally, the thesis investigated socially-aware traffic management for HAS via Wi-Fi offloading of mobile HAS flows. A model for the distribution of public Wi-Fi hotspots and a platform for socially-aware traffic management on private home routers were presented. A simulative performance evaluation investigated the impact of Wi-Fi offloading on the QoE and energy consumption of mobile HAS.
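
    As a point of reference for the adaptation logics discussed above, a minimal throughput-based HAS adaptation sketch (not the thesis's QoE-optimal strategy; the bitrate ladder and safety margin are illustrative):

```python
# Pick the highest representation whose bitrate fits within a safety
# margin of the measured network throughput.
REPRESENTATIONS_KBPS = [350, 700, 1500, 3000, 6000]  # hypothetical ladder

def choose_representation(measured_kbps, margin=0.8):
    """Return the index of the highest sustainable quality level."""
    budget = measured_kbps * margin
    best = 0
    for i, rate in enumerate(REPRESENTATIONS_KBPS):
        if rate <= budget:
            best = i
    return best

# Example: 2.5 Mbit/s measured throughput -> the 1500 kbit/s representation.
print(choose_representation(2500))   # -> 2
```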

    Fast and Efficient Entropy Coding Architectures for Massive Data Compression

    The compression of data is fundamental to alleviating the costs of transmitting and storing the massive datasets employed in myriad fields of our society. Most compression systems employ an entropy coder in their coding pipeline to remove the redundancy of coded symbols. The entropy-coding stage needs to be efficient, to yield high compression ratios, and fast, to process large amounts of data rapidly. Despite their widespread use, entropy coders are commonly assessed only for some particular scenario or coding system. This work provides a general framework to assess and optimize different entropy coders. First, the paper describes three main families of entropy coders, namely those based on variable-to-variable length codes (V2VLC), arithmetic coding (AC), and tabled asymmetric numeral systems (tANS). Then, a low-complexity architecture for the most representative coders of each family is presented: more precisely, a general version of V2VLC; the MQ coder, the M coder, and a fixed-length version of AC; and two different implementations of tANS. These coders are evaluated under different coding conditions in terms of compression efficiency and computational throughput. The results obtained suggest that V2VLC and tANS achieve the highest compression ratios for most coding rates, and that the AC coder that uses fixed-length codewords attains the highest throughput. The experimental evaluation discloses the advantages and shortcomings of each entropy-coding scheme, providing insights that may help in selecting this stage for forthcoming compression systems.
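
    To make the tANS family concrete, a toy tabled-ANS coder illustrating the state-machine idea; this is a didactic sketch (table size, frequencies, and the spread step are illustrative), not one of the paper's optimized implementations:

```python
R = 4
L = 1 << R                              # table size (number of states)
freqs = {'a': 8, 'b': 5, 'c': 3}        # toy frequencies; must sum to L here

# Spread symbols over the table with a step coprime with L (Duda's heuristic).
step = (L >> 1) + (L >> 3) + 3          # = 13 for L = 16
spread = [None] * L
pos = 0
for s, f in freqs.items():
    for _ in range(f):
        spread[pos] = s
        pos = (pos + step) % L

# For each symbol, the states in [L, 2L) that decode to it, in slot order.
states_of = {s: [] for s in freqs}
for i, s in enumerate(spread):
    states_of[s].append(i + L)

def encode(msg):
    x, bits = L, []
    for s in reversed(msg):             # encode backwards so decoding runs forwards
        f = freqs[s]
        while x >= 2 * f:               # renormalise: push low bits out
            bits.append(x & 1)
            x >>= 1
        x = states_of[s][x - f]         # state transition; x back in [L, 2L)
    return x, bits

def decode(x, bits, n):
    out = []
    for _ in range(n):
        s = spread[x - L]               # the state identifies the symbol
        out.append(s)
        x = freqs[s] + states_of[s].index(x)  # rank of the state -> [f, 2f)
        while x < L:                    # renormalise: pull bits back in
            x = (x << 1) | bits.pop()
    return ''.join(out)

x, bits = encode('abacaba')
print(decode(x, bits, 7))               # -> 'abacaba'
```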

    Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding

    Neural video compression (NVC) is a rapidly evolving video coding research area, with some models achieving superior coding efficiency compared to the latest video coding standard, Versatile Video Coding (VVC). In conventional video coding standards, hierarchical B-frame coding, which utilizes a bidirectional prediction structure for higher compression, has been well studied and exploited. In NVC, however, limited research has investigated the hierarchical B scheme. In this paper, we propose an NVC model exploiting hierarchical B-frame coding with temporal layer-adaptive optimization. We first extend an existing unidirectional NVC model to a bidirectional model, which achieves a -21.13% BD-rate gain over the unidirectional baseline model. However, this model faces challenges when applied to sequences with complex or large motions, leading to performance degradation. To address this, we introduce temporal layer-adaptive optimization, incorporating methods such as temporal layer-adaptive quality scaling (TAQS) and temporal layer-adaptive latent scaling (TALS). The final model with the proposed methods achieves an impressive BD-rate gain of -39.86% against the baseline. It also resolves the challenges in sequences with large or complex motions, with up to -49.13% more BD-rate gain than the simple bidirectional extension. This improvement is attributed to the allocation of more bits to lower temporal layers, thereby enhancing overall reconstruction quality with fewer bits. Since our method has little dependency on a specific NVC model architecture, it can serve as a general tool for extending unidirectional NVC models to ones with hierarchical B-frame coding.
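
    A small sketch of the dyadic hierarchical B-frame structure the abstract refers to, computing the coding order and temporal layer of each frame in a GOP; the layer-adaptive methods (TAQS/TALS) would then assign per-layer scales on top of such a structure (GOP size is illustrative):

```python
def hierarchical_order(lo, hi, layer=1):
    """Yield (picture_index, temporal_layer) in coding order for frames in (lo, hi)."""
    if hi - lo < 2:
        return
    mid = (lo + hi) // 2                # bidirectionally predicted from lo and hi
    yield mid, layer
    yield from hierarchical_order(lo, mid, layer + 1)
    yield from hierarchical_order(mid, hi, layer + 1)

gop = 8
print((0, 0), (gop, 0))                 # anchor frames sit on temporal layer 0
for idx, layer in hierarchical_order(0, gop):
    print((idx, layer))
# Coding order: (4,1) (2,2) (1,3) (3,3) (6,2) (5,3) (7,3) -- lower layers
# are coded first and, in the paper's scheme, receive more bits.
```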

    Approaches to preparing video content for video-on-demand (VoD) streaming with DASH

    The consumption of multimedia content over the Internet, especially video, is growing steadily, becoming a daily activity for people around the world. In this context, several studies have been developed in recent years focused on the preparation, distribution, and transmission of multimedia content, especially in the field of video on demand (VoD). This thesis proposes different contributions in the field of video coding for VoD transmission using the Dynamic Adaptive Streaming over HTTP (DASH) standard. The goal is to find a balance between the efficient use of computational resources and the guarantee of delivering a high quality of experience (QoE) to the end viewer. As a starting point, a comprehensive survey of research related to video encoding and transcoding techniques in the cloud is provided, focusing especially on the evolution of streaming and the relevance of the encoding process. In addition, proposals are examined according to the type of virtualization and the content delivery modality. Two quality-based adaptive coding approaches are developed with the objective of adjusting the quality of the entire video sequence to a desired level. The results indicate that the proposed solutions can reduce the video size while maintaining the same quality throughout all video segments. In addition, a scene-based coding solution is proposed, and the impact of using downscaled video for scene detection is analyzed in terms of time, quality, and size. The results show that the required encoding time, the computational resource consumption, and the size of the encoded video are all reduced. The research also presents an architecture that parallelizes the jobs involved in DASH content preparation using the FaaS (Function-as-a-Service) paradigm on a serverless platform. This architecture is tested with three functions encapsulated in containers, to encode the videos and analyze their quality, obtaining promising results in terms of scalability and job distribution. Finally, a tool called VQMTK is developed, which integrates 14 video quality metrics in a Docker container, facilitating the evaluation of video quality in diverse environments. This tool can be of great use in the field of video coding, in the generation of datasets to train deep neural networks, and in scientific and educational settings. In summary, the thesis offers innovative solutions and tools to improve efficiency and quality in the preparation and transmission of multimedia content in the cloud, providing a solid foundation for future research and development in this constantly evolving field.
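
    A minimal sketch of the content-preparation step discussed above: encoding a bitrate ladder with ffmpeg ahead of DASH packaging. The ladder, codec settings, and file names are illustrative, not the thesis's configuration; a scene-based variant would first split the source at detected scene cuts:

```python
import subprocess

LADDER = [(1920, 1080, "6000k"), (1280, 720, "3000k"), (854, 480, "1200k")]

def encode_ladder(src):
    """Encode one video-only representation per rung of the ladder."""
    for w, h, rate in LADDER:
        out = f"out_{h}p.mp4"
        subprocess.run([
            "ffmpeg", "-y", "-i", src,
            "-vf", f"scale={w}:{h}",     # downscale to the rung's resolution
            "-c:v", "libx264", "-b:v", rate,
            "-an", out,                  # audio is packaged separately in DASH
        ], check=True)

encode_ladder("input.mp4")
```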

    Cybersecurity applications of Blockchain technologies

    With the increase in connectivity, the popularization of cloud services, and the rise of the Internet of Things (IoT), decentralized approaches for trust management are gaining momentum. Since blockchain technologies provide a distributed ledger, they are receiving massive attention from the research community in different application fields. However, this technology does not provide cybersecurity by itself. Thus, this thesis first aims to provide a comprehensive review of the techniques and elements that have been proposed to achieve cybersecurity in blockchain-based systems. The analysis is intended for researchers in the area, cybersecurity specialists, and blockchain developers. We present a series of lessons learned as well. One of them is the rise of Ethereum as one of the most used technologies. Furthermore, some intrinsic characteristics of the blockchain, such as permanent availability and immutability, make it interesting for other ends, namely as a covert channel and for malicious purposes. On the one hand, the use of blockchains by malware has not yet been characterized. Therefore, this thesis also analyzes the current state of the art in this area. One of the lessons learned is that covert communications have received little attention. On the other hand, although previous works have analyzed the feasibility of covert channels in a particular blockchain technology called Bitcoin, no previous work has explored the use of Ethereum to establish a covert channel considering all transaction fields and smart contracts. To foster further defence-oriented research, two novel mechanisms are presented in this thesis. First, Zephyrus takes advantage of all Ethereum fields and smart-contract bytecode. Second, Smart-Zephyrus is built to complement Zephyrus by leveraging smart contracts written in Solidity. We also assess the mechanisms' feasibility and cost. Our experiments show that Zephyrus, in the best case, can embed 40 Kbits in 0.57 s for US$1.64, and retrieve them in 2.8 s. Smart-Zephyrus, however, is able to hide a 4 Kb secret in 41 s. While being expensive (around US$1.82 per bit), the provided stealthiness might be worth the price for attackers. Furthermore, these two mechanisms can be combined to increase capacity and reduce costs.
    Doctoral Programme in Computer Science and Technology, Universidad Carlos III de Madrid. Committee: José Manuel Estévez Tapiador (chair), Jorge Blasco Alís (secretary), Luis Hernández Encina (member).
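
    A simplified, self-contained illustration of the covert-channel idea (not Zephyrus's actual encoding, which spans all transaction fields and smart-contract bytecode): a secret is split across the data payloads of several transaction-like records and reassembled on the receiving side; no real chain interaction takes place here:

```python
CHUNK = 32  # bytes of secret per transaction payload (illustrative)

def embed(secret: bytes):
    """Split the secret into transaction-shaped records hiding it in calldata."""
    txs = []
    for i in range(0, len(secret), CHUNK):
        txs.append({
            "to": "0x" + "00" * 20,                    # placeholder recipient
            "value": 0,
            "data": "0x" + secret[i:i + CHUNK].hex(),  # secret in the data field
        })
    return txs

def extract(txs):
    """Recover the secret from the data fields, in transaction order."""
    return b"".join(bytes.fromhex(tx["data"][2:]) for tx in txs)

txs = embed(b"meet at dawn, usual place")
assert extract(txs) == b"meet at dawn, usual place"
print(f"{len(txs)} transaction(s) carry the secret")
```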

    BVI-VFI: A Video Quality Database for Video Frame Interpolation

    Video frame interpolation (VFI) is a fundamental research topic in video processing which is currently attracting increased attention across the research community. While the development of more advanced VFI algorithms has been extensively researched, there remains little understanding of how humans perceive the quality of interpolated content and how well existing objective quality assessment methods perform when measuring the perceived quality. In order to narrow this research gap, we have developed a new video quality database named BVI-VFI, which contains 540 distorted sequences generated by applying five commonly used VFI algorithms to 36 diverse source videos with various spatial resolutions and frame rates. We collected more than 10,800 quality ratings for these videos through a large-scale subjective study involving 189 human subjects. Based on the collected subjective scores, we further analysed the influence of VFI algorithms and frame rates on the perceptual quality of interpolated videos. Moreover, we benchmarked the performance of 33 classic and state-of-the-art objective image/video quality metrics on the new database, and demonstrated the urgent requirement for more accurate bespoke quality assessment methods for VFI. To facilitate further research in this area, we have made BVI-VFI publicly available at https://github.com/danier97/BVI-VFI-database.
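
    A sketch of the benchmarking step mentioned above: objective metrics are typically compared against subjective scores via rank (SROCC) and linear (PLCC) correlation; the score values below are made up for illustration:

```python
from scipy.stats import spearmanr, pearsonr

mos    = [4.2, 3.1, 2.5, 4.8, 1.9, 3.6]        # hypothetical mean opinion scores
metric = [0.91, 0.72, 0.55, 0.91, 0.40, 0.77]  # hypothetical metric outputs

srocc, _ = spearmanr(mos, metric)   # monotonic (rank) agreement
plcc, _ = pearsonr(mos, metric)     # linear agreement
print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}")
```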

    Region-Based Template Matching Prediction for Intra Coding

    Copy prediction is a well-known category of prediction techniques in video coding in which the current block is predicted by copying samples from a similar block located somewhere in the already decoded stream of samples. Motion-compensated prediction, intra block copy, and template matching prediction are examples. While the displacement information of the similar block is transmitted to the decoder in the bit-stream in the first two approaches, in the last one it is derived at the decoder by repeating the same search algorithm that was carried out at the encoder. Region-based template matching is a recently developed prediction algorithm that is an advanced form of standard template matching. In this method, the reference area is partitioned into multiple regions, and the region to be searched for the similar block(s) is conveyed to the decoder in the bit-stream. Further, its final prediction signal is a linear combination of already decoded similar blocks from the given region. It was demonstrated in previous publications that region-based template matching is capable of achieving coding-efficiency improvements for intra- as well as inter-picture coding with considerably less decoder complexity than conventional template matching. In this paper, a theoretical justification for region-based template matching prediction, supported by experimental data, is presented. Additionally, test results for the aforementioned method on the latest H.266/Versatile Video Coding (VVC) test model (version VTM-14.0) yield average Bjøntegaard-Delta (BD) bit-rate savings of −0.75% in the all-intra (AI) configuration, with 130% encoder run-time and 104% decoder run-time for a particular parameter selection.
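
    A minimal numpy sketch of standard template matching prediction (without the paper's region partitioning or multi-block combination): the search compares L-shaped templates of already-decoded samples and copies the best-matching block as the prediction; block size, template thickness, and the simplified causality are illustrative:

```python
import numpy as np

B, T = 4, 2   # block size and template thickness

def template(img, y, x):
    """L-shaped template: T rows above and T columns left of the block at (y, x)."""
    top = img[y - T:y, x - T:x + B].ravel()
    left = img[y:y + B, x - T:x].ravel()
    return np.concatenate([top, left])

def tm_predict(rec, y, x):
    """Search the area fully above the current row for the best template match."""
    target = template(rec, y, x)
    best, best_cost = None, np.inf
    for yy in range(T, y - B + 1):                # candidate blocks are decoded
        for xx in range(T, rec.shape[1] - B + 1):
            cost = np.sum((template(rec, yy, xx).astype(float) - target) ** 2)
            if cost < best_cost:
                best_cost, best = cost, rec[yy:yy + B, xx:xx + B]
    return best

rng = np.random.default_rng(0)
rec = rng.integers(0, 256, (32, 32))              # toy "decoded" picture
pred = tm_predict(rec, 20, 20)                    # predict the block at (20, 20)
print(pred.shape)                                 # (4, 4)
```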

    Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling

    Video Coding for Machines (VCM) aims to compress visual signals for machine analysis. However, existing methods only consider a few machines, neglecting the majority. Moreover, the machines' perceptual characteristics are not effectively leveraged, leading to suboptimal compression efficiency. In this paper, we introduce the Satisfied Machine Ratio (SMR) to address these issues. SMR statistically measures the quality of compressed images and videos for machines by aggregating satisfaction scores from them. Each score is calculated based on the difference in machine perceptions between the original and compressed images. Targeting image classification and object detection tasks, we build two representative machine libraries for SMR annotation and construct a large-scale SMR dataset to facilitate SMR studies. We then propose an SMR prediction model based on the correlation between deep feature differences and SMR. Furthermore, we introduce an auxiliary task to increase the prediction accuracy by predicting the SMR difference between two images at different quality levels. Extensive experiments demonstrate that using the SMR models significantly improves compression performance for VCM, and the SMR models generalize well to unseen machines, traditional and neural codecs, and datasets. In summary, SMR enables perceptual coding for machines and advances VCM from specificity to generality. Code is available at https://github.com/ywwynm/SMR.
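
    A toy sketch of the SMR aggregation idea, assuming a binary satisfaction score per machine; the machine library, threshold, and "images" below are hypothetical stand-ins for the paper's detector/classifier libraries:

```python
def satisfaction(machine, original, compressed, tau=0.05):
    """1 if the machine's output barely changes under compression, else 0."""
    drift = abs(machine(original) - machine(compressed))
    return 1.0 if drift <= tau else 0.0

def smr(machines, original, compressed):
    """Fraction of machines in the library satisfied with the compressed image."""
    scores = [satisfaction(m, original, compressed) for m in machines]
    return sum(scores) / len(scores)

# Toy "machine library": each machine maps an image to a scalar confidence.
machines = [lambda img, w=w: w * sum(img) / len(img) for w in (0.9, 1.0, 1.1)]
orig = [0.5, 0.6, 0.4]
comp = [0.45, 0.55, 0.356]    # mildly degraded version of the same "image"
print(smr(machines, orig, comp))   # -> 2 of 3 machines satisfied, SMR ~= 0.67
```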