26 research outputs found

    Object detection in high-resolution images based on pyramid-block processing (Обнаружение объектов на изображениях с большим разрешением на основе их пирамидально-блочной обработки)

    Get PDF
    The paper proposes an algorithm for detecting objects in high-resolution images based on a multiscale (pyramid) image representation, block-wise processing with overlap, per-block detection with a convolutional neural network, and merging of the detected regions. The number of pyramid layers is determined by the input image resolution and the input layer size of the convolutional neural network. On every pyramid layer except the topmost one, the image is split into overlapping blocks; the overlap improves the classification and detection of objects that are fragmented across neighbouring blocks. Detected regions are merged into one when they belong to the same class and their intersection-over-union value is high. The reported experimental results confirm that the approach improves the detection accuracy for small objects in high-resolution images.
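    As an illustration of the block splitting and merging steps described above, here is a minimal sketch under stated assumptions: the function names, the block/overlap handling, and the 0.5 IoU threshold are mine for illustration and are not taken from the paper.

```python
# Minimal sketch of the overlapping-block split and IoU-based merge described
# above. All names and the 0.5 IoU threshold are illustrative assumptions,
# not taken from the paper.

def split_into_blocks(width, height, block, overlap):
    """Yield (x, y, w, h) windows of size `block` with `overlap` pixels of overlap."""
    step = max(block - overlap, 1)
    for y in range(0, max(height - overlap, 1), step):
        for x in range(0, max(width - overlap, 1), step):
            yield x, y, min(block, width - x), min(block, height - y)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def merge_detections(dets, iou_thr=0.5):
    """Merge detections (box, cls) that share a class and overlap strongly."""
    merged = []
    for box, cls in dets:
        for i, (mbox, mcls) in enumerate(merged):
            if cls == mcls and iou(box, mbox) >= iou_thr:
                # Replace with the union box of the two overlapping detections.
                merged[i] = ((min(box[0], mbox[0]), min(box[1], mbox[1]),
                              max(box[2], mbox[2]), max(box[3], mbox[3])), cls)
                break
        else:
            merged.append((box, cls))
    return merged
```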

    Fast and accurate object detection in high resolution 4K and 8K video using GPUs

    Full text link
    Machine learning has achieved a great deal on computer vision tasks such as object detection, but the traditionally used models work with relatively low-resolution images. The resolution of recording devices is gradually increasing, and there is a rising need for new methods of processing high-resolution data. We propose an attention pipeline method that uses a two-stage evaluation of each image or video frame, first at a rough and then at a refined resolution, to limit the total number of necessary evaluations. For both stages, we make use of the fast object detection model YOLO v2. We have implemented our model in code that distributes the work across GPUs. We maintain high accuracy while reaching an average performance of 3-6 fps on 4K video and 2 fps on 8K video.
    Comment: 6 pages, 12 figures, Best Paper Finalist at IEEE High Performance Extreme Computing Conference (HPEC) 2018; copyright 2018 IEEE (DOI will be filled in when known)
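    The coarse-to-fine idea summarized above (detect on a downscaled frame, then re-run the detector only on full-resolution crops around the coarse detections) can be sketched roughly as follows; the detector interface, scale factor, and crop size are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a coarse-to-fine ("attention") detection pass over a high-resolution
# frame, in the spirit of the two-stage evaluation described above. The detector
# interface, crop size, and scale factor are illustrative assumptions.
import cv2  # assumed available for resizing

def detect_two_stage(frame, detector, coarse_scale=0.25, crop=608):
    """Run `detector(image) -> [(x1, y1, x2, y2, cls, score)]` coarsely, then refine."""
    h, w = frame.shape[:2]
    small = cv2.resize(frame, (int(w * coarse_scale), int(h * coarse_scale)))

    refined = []
    for x1, y1, x2, y2, cls, score in detector(small):
        # Map the coarse box back to full resolution and centre a crop on it.
        cx = int((x1 + x2) / 2 / coarse_scale)
        cy = int((y1 + y2) / 2 / coarse_scale)
        x0 = max(0, min(w - crop, cx - crop // 2))
        y0 = max(0, min(h - crop, cy - crop // 2))
        patch = frame[y0:y0 + crop, x0:x0 + crop]

        # Second, refined pass only on the attended full-resolution region.
        for rx1, ry1, rx2, ry2, rcls, rscore in detector(patch):
            refined.append((rx1 + x0, ry1 + y0, rx2 + x0, ry2 + y0, rcls, rscore))
    return refined
```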

    Llama : Towards Low Latency Live Adaptive Streaming

    Get PDF
    Multimedia streaming, including on-demand and live delivery of content, has become the largest service, in terms of traffic volume, delivered over the Internet. The ever-increasing demand has led to remarkable advancements in multimedia delivery technology over the past three decades, facilitated by the concurrent pursuit of efficient, high-quality encoding of digital media. Today, the most prominent technology for online multimedia delivery is HTTP Adaptive Streaming (HAS), which utilises the stateless HTTP architecture, allowing for scalable streaming sessions that can be delivered to millions of viewers around the world using Content Delivery Networks. In HAS, the content is encoded at multiple bitrates and fragmented into segments of equal duration. The client simply fetches consecutive segments from the server at the desired encoding bitrate, determined by an Adaptive Bitrate (ABR) algorithm that measures the network conditions and adjusts the bitrate accordingly. This method introduces new challenges to live streaming, where the content is generated in real time, as it suffers from high end-to-end latency compared to traditional broadcast methods due to the buffering required at the client. This thesis investigates low latency live adaptive streaming, focusing on the reduction of end-to-end latency. We investigate the impact of latency on the performance of ABR algorithms in low latency scenarios by developing a simulation model and testing prominent on-demand adaptation solutions. Additionally, we conduct extensive subjective testing to further investigate the impact of bitrate changes on the Quality of Experience (QoE) perceived by users. Based on these investigations, we design an ABR algorithm suitable for low latency scenarios that can operate with a small client buffer. We evaluate the proposed low latency adaptation solution against on-demand ABR algorithms and state-of-the-art low latency ABR algorithms, under realistic network conditions and a variety of client and latency settings.
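    For readers unfamiliar with HAS adaptation, a minimal, generic throughput-based ABR rule of the kind such clients use might look as follows; this is not the thesis's Llama algorithm, and the bitrate ladder, safety margin, and buffer threshold are assumptions.

```python
# Generic rate-based ABR selection as commonly used in HAS clients. This is NOT
# the thesis's Llama algorithm; the bitrate ladder, safety margin, and buffer
# thresholds below are illustrative assumptions.

BITRATES_KBPS = [400, 1000, 2500, 5000, 8000]   # assumed encoding ladder

def select_bitrate(throughput_kbps, buffer_s, safety=0.8,
                   panic_buffer_s=2.0, bitrates=BITRATES_KBPS):
    """Pick the highest bitrate sustainable at `safety * throughput`,
    falling back to the lowest rung when the buffer is nearly empty."""
    if buffer_s < panic_buffer_s:
        return bitrates[0]
    budget = safety * throughput_kbps
    candidates = [b for b in bitrates if b <= budget]
    return candidates[-1] if candidates else bitrates[0]

# Example: 4 Mbit/s measured throughput, 6 s of buffered media
print(select_bitrate(4000, 6.0))  # -> 2500
```

    In the low latency setting the thesis targets, the client buffer is small, which is precisely the regime for which it designs a dedicated ABR algorithm able to operate with a small buffer.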

    How to Perform AMP? Cubic Adjustments for Improving the QoE

    Full text link
    Adaptive Media Playout (AMP) consists of smoothly and dynamically adjusting the media playout rate to recover from undesired situations (e.g., buffer overflow/underflow or out-of-sync playback). Existing AMP solutions are mainly characterized by two aspects. The first is their goal (e.g., keeping the buffers' occupancy within safe ranges or enabling media synchronization). The second is the criteria that determine the need for triggering the playout adjustments (e.g., buffer fullness or asynchrony levels). This paper instead focuses on a third key aspect, which has not been sufficiently investigated yet: the specific adjustment strategy to be performed. In particular, we propose a novel AMP strategy, called Cubic AMP, which employs a cubic interpolation method to adjust a deviated playout point to a given reference. On the one hand, mathematical analysis and graphical examples show that our proposal yields smoother playout curves than existing linear and quadratic AMP strategies, and significantly outperforms the quadratic AMP strategy in the duration of the adjustment period, without increasing the computational complexity. It is also proved and discussed that higher-order polynomial interpolation methods are less convenient than cubic ones. On the other hand, the results of subjective tests confirm that our proposal provides better Quality of Experience (QoE) than the other existing AMP strategies.
    This work has been funded, partially, by the “Fondo Europeo de Desarrollo Regional (FEDER)” and the Spanish Ministry of Economy and Competitiveness, under its R&D&I Support Program, in project with Ref. TEC2013-45492-R.
    Montagud, M.; Boronat, F.; Roig, B.; Sapena Piera, A. (2017). How to Perform AMP? Cubic Adjustments for Improving the QoE. Computer Communications, 103:61-73. https://doi.org/10.1016/j.comcom.2017.01.017
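    As a worked sketch of what a cubic playout adjustment can look like (not necessarily the paper's exact formulation): a cubic polynomial can move the playout offset from its current deviation to zero over an adjustment window while keeping the rate deviation zero at both ends, which is what makes the playout curve smooth. The boundary conditions and window length below are my assumptions.

```python
# Illustrative cubic adjustment of a deviated playout point: the offset d(t)
# moves from d0 at t=0 to 0 at t=T, with d'(0) = d'(T) = 0 so the playout-rate
# deviation starts and ends at zero (smooth transitions). This is a generic
# Hermite-style cubic, not necessarily the paper's exact formulation.

def cubic_offset(d0, T):
    """Return d(t) with d(0)=d0, d(T)=0, d'(0)=d'(T)=0."""
    def d(t):
        s = min(max(t / T, 0.0), 1.0)           # normalized time in [0, 1]
        return d0 * (1 - 3 * s**2 + 2 * s**3)   # smoothstep-shaped cubic
    return d

def playout_rate_deviation(d, t, dt=1e-3):
    """Approximate rate deviation as -d'(t) (numerical derivative)."""
    return -(d(t + dt) - d(t - dt)) / (2 * dt)

adjust = cubic_offset(d0=0.5, T=4.0)            # remove a 0.5 s deviation over 4 s
print(round(adjust(2.0), 3), round(playout_rate_deviation(adjust, 2.0), 3))
```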

    A quality taxonomy for scalable free viewpoint video object algorithms (Qualitätstaxonomie für skalierbare Algorithmen von Free Viewpoint Video Objekten)

    Get PDF
    The thesis intends to make a contribution to the quality assessment of algorithms for image analysis and image synthesis, i.e., free viewpoint video objects, within the context of video communication systems. The work analyzes opportunities and obstacles, focusing on users' subjective quality of experience in this special case. Quality estimation of emerging free viewpoint video object technology in video communication has not yet been assessed extensively, and adequate approaches are missing. The challenges are to define the factors that influence quality, to formulate an adequate quality measure, and to link the quality of experience to a technical realization process that is still under development and ever-changing. Interlinking the quality of experience with the quality of service has two advantages: first, it can benefit the technical realization process by allowing adaptability (e.g., to the system used by the end user); second, it supports scalability in a user-centred way (e.g., respecting a cost or resource limit). The thesis outlines the theoretical background and introduces a user-centred quality taxonomy in the form of an interlinking model. A description of the related project Skalalgo3d, which provided the framework for application, is included. The presented results consist of a systematic definition of quality-influencing factors, including a research framework and evaluation activities involving more than 350 test participants, and of quality features derived from them for the evaluated quality of the visual representation in video communication applications. Based on these quality features, a model that links the results with the technical creation steps is presented, together with a formalized quality measure. Building on this, a flow chart and a slope field are proposed; they visualize the potential relationships graphically, approximating a differentiable function, and may serve as a starting point for further investigation.
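    The slope field mentioned above can be illustrated with a minimal plot of direction vectors for an assumed relation dQ/dx = f(x, Q) between a technical factor and the quality measure; both the function and the axis ranges below are hypothetical and are not taken from the thesis.

```python
# Minimal slope-field illustration (not the thesis's actual model): direction
# vectors of an assumed relation dQ/dx = f(x, Q) between a technical factor x
# and a quality measure Q. Both f and the ranges are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt

def f(x, q):
    return 1.0 - q / 5.0               # assumed: quality saturates towards Q = 5

x, q = np.meshgrid(np.linspace(0, 4, 17), np.linspace(0, 5, 17))
slope = f(x, q)
norm = np.hypot(1.0, slope)            # normalize arrows to equal length

plt.quiver(x, q, 1.0 / norm, slope / norm, angles="xy")
plt.xlabel("technical factor x (arbitrary units)")
plt.ylabel("quality measure Q")
plt.title("Slope field of an assumed dQ/dx = f(x, Q)")
plt.show()
```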

    MediaSync: Handbook on Multimedia Synchronization

    Get PDF
    This book provides an approachable overview of the most recent advances in the fascinating field of media synchronization (mediasync), gathering contributions from the most representative and influential experts. Understanding the challenges of this field in the current multi-sensory, multi-device, and multi-protocol world is not an easy task. The book revisits the foundations of mediasync, including theoretical frameworks and models, highlights ongoing research efforts, such as hybrid broadband broadcast (HBB) delivery and the modelling of users' perception (i.e., Quality of Experience, or QoE), and paves the way for the future (e.g., towards the deployment of multi-sensory and ultra-realistic experiences). Although many advances around mediasync have been devised and deployed, this area of research is receiving renewed attention to overcome the remaining challenges in the next-generation (heterogeneous and ubiquitous) media ecosystem. Given the significant advances in this research area, its current relevance, and the multiple disciplines it involves, a reference book on mediasync has become necessary, and this book fills that gap. In particular, it addresses key aspects and reviews the most relevant contributions within the mediasync research space from different perspectives. MediaSync: Handbook on Multimedia Synchronization is the perfect companion for scholars and practitioners who want to acquire strong knowledge about this research area and to approach the challenges behind ensuring the best mediated experiences by providing adequate synchronization between the media elements that constitute those experiences.

    Evaluating the Influence of Room Illumination on Camera-Based Physiological Measurements for the Assessment of Screen-Based Media

    Get PDF
    Camera-based solutions can be a convenient means of collecting physiological measurements indicative of psychological responses to stimuli. However, the low-illumination playback conditions commonly associated with viewing screen-based media oppose the bright conditions recommended for accurately recording physiological data with a camera. A study was designed to determine the feasibility of obtaining physiological data, for psychological insight, in illumination conditions representative of real-world viewing experiences. In this study, a novel method was applied for testing a first-of-its-kind system for measuring both heart rate and facial actions from video footage recorded with a single, discreetly placed camera. Results suggest that conditions representative of a bright domestic setting should be maintained when using this technology, despite this being considered a sub-optimal playback condition. Further analyses highlight that, even within this bright condition, both the camera-measured facial action and heart rate data contained characteristic errors. In future research, the influence of these performance issues on psychological insights may be mitigated by reducing the temporal resolution of the heart rate measurements and by ignoring fast and low-intensity facial movements.
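    The mitigations suggested in the final sentence can be sketched as simple post-processing; the window length and thresholds below are illustrative assumptions, not values from the study.

```python
# Sketch of the mitigations suggested above: reduce the temporal resolution of a
# camera-derived heart-rate series and drop fast, low-intensity facial-action
# events. Window length and thresholds are illustrative assumptions.
import numpy as np

def downsample_heart_rate(hr_bpm, fps, window_s=10.0):
    """Average a per-frame heart-rate series over non-overlapping windows."""
    win = int(window_s * fps)
    n = len(hr_bpm) // win * win
    return np.asarray(hr_bpm[:n]).reshape(-1, win).mean(axis=1)

def filter_facial_actions(events, min_intensity=2.0, min_duration_s=0.5):
    """Keep facial-action events (intensity, duration_s) that are neither
    too weak nor too brief to be trusted."""
    return [e for e in events
            if e[0] >= min_intensity and e[1] >= min_duration_s]
```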

    Over-the-top multimedia content delivery: the automatic recordings (Catch-up TV) case study (Entrega de conteúdos multimédia em over-the-top: caso de estudo das gravações automáticas)

    Get PDF
    Doctoral thesis in Electrical Engineering (Doutoramento em Engenharia Eletrotécnica).
    Over-The-Top (OTT) multimedia delivery is a very appealing approach for providing ubiquitous, flexible, and globally accessible services capable of low-cost and unrestrained device targeting. In spite of its appeal, the underlying delivery architecture must be carefully planned and optimized to maintain a high Quality of Experience (QoE) and rational resource usage, especially when migrating from services running on managed networks with established quality guarantees. To address the lack of holistic research on OTT multimedia delivery systems, this thesis focuses on an end-to-end optimization challenge, considering the migration of a popular Catch-up TV service from managed IP Television (IPTV) networks to OTT. A global study is conducted on the importance of Catch-up TV and its impact on today's society, demonstrating the growing popularity of this time-shift service, its relevance in the multimedia landscape, and its fitness as an OTT migration use case. Catch-up TV consumption logs are obtained from a Pay-TV operator's live production IPTV service with over 1 million subscribers, in order to characterize demand and extract insights from service utilization at a scale and scope not yet addressed in the literature. This characterization is used to build demand forecasting models relying on machine learning techniques, which enable static and dynamic optimization of OTT multimedia delivery solutions by producing accurate bandwidth and storage requirement forecasts, and may be used to achieve considerable power and cost savings whilst maintaining a high QoE. A novel caching algorithm, Most Popularly Used (MPU), is proposed, implemented, and shown to outperform established caching algorithms in both simulation and experimental scenarios. The need for accurate QoE measurements in OTT scenarios supporting HTTP Adaptive Streaming (HAS) motivates the creation of a new QoE model capable of taking into account the impact of key HAS aspects. By addressing the complete content delivery pipeline in the envisioned content-aware OTT Content Delivery Network (CDN), this thesis demonstrates that significant improvements are possible in next-generation multimedia delivery solutions.
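    The abstract names the caching algorithm Most Popularly Used (MPU) without detailing it; the following is only a rough, assumption-laden sketch of a popularity-driven cache in the spirit the name suggests (keep the most-requested content, evict the least requested), and the thesis's actual MPU algorithm may differ substantially.

```python
# Heavily simplified popularity-based cache, only in the spirit suggested by the
# name "Most Popularly Used" (keep the most-requested items, evict the least
# requested). The thesis's actual MPU algorithm may differ substantially.
from collections import Counter

class PopularityCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}            # content_id -> payload
        self.requests = Counter()  # observed request counts (popularity proxy)

    def get(self, content_id, fetch_from_origin):
        self.requests[content_id] += 1
        if content_id in self.store:
            return self.store[content_id]           # cache hit
        payload = fetch_from_origin(content_id)      # cache miss
        if len(self.store) >= self.capacity:
            # Evict the cached item with the lowest observed popularity.
            victim = min(self.store, key=lambda c: self.requests[c])
            del self.store[victim]
        self.store[content_id] = payload
        return payload

# Example usage with a dummy origin fetch
cache = PopularityCache(capacity=2)
cache.get("ep1", lambda c: f"video:{c}")
cache.get("ep1", lambda c: f"video:{c}")
cache.get("ep2", lambda c: f"video:{c}")
cache.get("ep3", lambda c: f"video:{c}")   # evicts "ep2" (less popular than "ep1")
```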