517 research outputs found

    Cartography

    Terrestrial space is where natural and social systems interact. Cartography is an essential tool for understanding the complexity of these systems, their interactions, and their evolution, which gives it an important place in the modern world. The book presents several contributions from different areas and activities showing the importance of cartography to the perception and organization of territory. Whether learning from the past or understanding the present, cartography is presented as a way of looking at almost every field of knowledge.

    Network and Content Intelligence for 360 Degree Video Streaming Optimization

    In recent years, 360° videos, a.k.a. spherical frames, have become popular among users, creating an immersive streaming experience. Along with advances in smartphone and Head Mounted Device (HMD) technology, many content providers now host and stream 360° videos in both on-demand and live streaming modes. Many applications have already arisen that leverage these immersive videos, especially to give viewers an impression of presence in a digital environment. For example, 360° videos make it possible to connect people in a remote meeting in an interactive way, which essentially increases the productivity of the meeting. Likewise, interactive learning materials built from 360° videos help deliver learning outcomes effectively.

    However, streaming 360° videos is not an easy task, for several reasons. First, 360° video frames are 4-6 times larger than normal video frames at the same perceived quality, so delivering these videos demands higher network bandwidth. Second, processing such large frames requires more computational resources at the end devices, particularly devices with limited resources; this affects not only the delivery of 360° videos but also other applications running on shared resources. Third, because of their interactive nature, these videos must be streamed with very low latency. Failing to satisfy these requirements results in poor Quality of Experience (QoE) for the user: insufficient bandwidth incurs frequent rebuffering and poor video quality; inadequate computational capacity drains the battery faster and heats the device unnecessarily, causing discomfort; and undue streaming delay makes motion or cybersickness prevalent. These circumstances hinder providing immersive streaming experiences to the communities that need them most, especially those without sufficient network resources.

    To address the above challenges, we believe that enhancements to the three main components of the video streaming pipeline (server, network, and client) are essential. Starting from the network, it is beneficial for network providers to identify 360° video flows as early as possible and understand their behaviour in the network, so that sufficient resources can be allocated to this video delivery without compromising the quality of other services. Content servers, at one end of the streaming pipeline, require efficient 360° video frame processing mechanisms to support adaptive streaming schemes such as Adaptive Bit Rate (ABR) streaming and viewport (VP)-aware streaming, a paradigm unique to 360° videos that selects only the part of the larger video frame that falls within the user-visible region. At the other end, the client can be combined with edge-assisted streaming to deliver 360° video content with reduced latency and higher quality.

    Following the above optimization strategies, in this thesis we first propose a mechanism named 360NorVic to extract 360° video flows from encrypted video traffic and analyze their traffic characteristics. We propose Machine Learning (ML) models to classify 360° and normal videos under different scenarios such as offline, near real-time, VP-aware streaming, and Mobile Network Operator (MNO) level streaming.
    Having extracted 360° video traffic traces at both packet and flow level with high accuracy, we analyze the differences between 360° and normal video patterns in the encrypted traffic domain, which is beneficial for effective resource optimization in 360° video delivery. Second, we present a WGAN (Wasserstein Generative Adversarial Network) based data generation mechanism (named VideoTrain++) to synthesize encrypted network video traffic from minimal data. Leveraging the synthetic data, we show improved performance in 360° video traffic analysis, especially in the ML-based classification of 360NorVic. Third, we propose an effective 360° video frame partitioning mechanism (named VASTile) at the server side to support VP-aware 360° video streaming with dynamic (variable) tiles of different sizes and locations on the frame. VASTile takes a visual attention map on the video frames as input and applies a computational geometry approach to generate a non-overlapping tile configuration that covers the video frames adaptively to the visual attention. VASTile offers a scalable approach to video frame processing at the servers and a method to reduce bandwidth consumption in network data transmission. Finally, by applying VASTile to the individual user VP at the client side and utilizing the cache storage of Multi-Access Edge Computing (MEC) servers, we propose OpCASH, a mechanism to personalize 360° video streaming with dynamic tiles with edge assistance. OpCASH uses an ILP-based solution to select cached variable tiles from MEC servers that may not be identical to the VP tiles requested by the user but still cover the same VP region; in this way it maximizes cache utilization and reduces the number of requests to the content servers over the congested core network. With this approach, we demonstrate gains in latency, bandwidth saving, and video quality in personalized 360° video streaming.
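    As a rough illustration of the flow-classification step in 360NorVic, the sketch below trains a classifier to separate 360° from normal video flows using two assumed flow-level features; the feature choice, the synthetic data, and the random-forest model are illustrative assumptions rather than the thesis's exact pipeline.

        # Hypothetical sketch: classify 360° vs. normal video flows from
        # flow-level features of encrypted traffic (assumed features).
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        # Stand-in data: 360° flows are assumed to show larger, burstier chunks.
        normal = rng.normal(loc=[300.0, 40.0], scale=[60.0, 10.0], size=(500, 2))
        immersive = rng.normal(loc=[900.0, 70.0], scale=[150.0, 15.0], size=(500, 2))
        X = np.vstack([normal, immersive])   # columns: mean chunk size (KB), packets/s
        y = np.array([0] * 500 + [1] * 500)  # 0 = normal, 1 = 360°

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
        print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")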
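    The cached-tile selection in OpCASH can be pictured as a weighted set-cover ILP: choose tiles that cover every cell of the viewport at minimum delivery cost, preferring cheap cached tiles over expensive origin fetches. The following PuLP sketch is a hypothetical formulation under that assumption; the grid, tile names, and costs are invented for illustration, and the thesis's actual ILP may differ.

        # Hypothetical OpCASH-style tile selection as a set-cover ILP (PuLP).
        import pulp

        viewport_cells = {(r, c) for r in range(2) for c in range(3)}  # cells to cover

        # Candidate tiles: cached variable-size tiles served from the MEC cache
        # (cheap) vs. an exact-match fetch from the origin server (expensive).
        tiles = {
            "cached_A": ({(0, 0), (0, 1), (1, 0), (1, 1)}, 1.0),
            "cached_B": ({(0, 2), (1, 2)}, 1.0),
            "origin_vp": (set(viewport_cells), 10.0),
        }

        prob = pulp.LpProblem("tile_cover", pulp.LpMinimize)
        use = {t: pulp.LpVariable(f"use_{t}", cat="Binary") for t in tiles}

        # Objective: minimise total delivery cost.
        prob += pulp.lpSum(cost * use[t] for t, (_, cost) in tiles.items())

        # Coverage: every viewport cell must be covered by a selected tile.
        for cell in viewport_cells:
            prob += pulp.lpSum(use[t] for t, (cells, _) in tiles.items() if cell in cells) >= 1

        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        print([t for t in tiles if use[t].value() == 1])  # -> ['cached_A', 'cached_B']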

    Applying Reinforcement Learning to Mitigate the Long-Tail Latency Problem of SSDs

    Dissertation (Ph.D.)--Seoul National University Graduate School: Department of Computer Science and Engineering, College of Engineering, February 2020. Advisor: 유승주.

    NAND flash memory is widely used in a variety of systems, from real-time embedded systems to high-performance enterprise server systems. Flash memory has (1) erase-before-write (write-once) and (2) endurance problems. To handle the erase-before-write characteristic, a flash-translation layer (FTL) is applied; currently, page-level mapping is mainly used to limit the latency increase caused by the write-once and block-erase characteristics of flash memory. Garbage collection (GC) is one of the leading causes of long-tail latency, which at the 99th percentile can exceed 100 times the average latency. As a result, real-time or quality-critical systems cannot satisfy given requirements such as QoS restrictions. As flash memory capacity increases, GC latency also tends to increase, because the block size (the number of pages in one block) grows with capacity, and GC latency is determined by the valid-page-copy and block-erase times: the larger the block, the longer the GC. In particular, the block size has grown in the move from 2D to 3D NAND flash memory, e.g., 256 pages/block in 2D planar NAND and 768 pages/block in 3D NAND, and it is expected to keep growing. Thus, the long write latency incurred by GC can become even more serious in 3D NAND flash based storage.

    In this dissertation, we propose three versions of a novel GC scheduling method based on reinforcement learning (RL), whose purpose is to reduce the long-tail latency caused by GC by exploiting the idle time of the storage system, together with a quantitative analysis of the RL-assisted GC solution. First, an RL-assisted GC scheduling technique learns the storage access behavior online and determines the number of GC operations that can exploit the idle time; we also present aggressive variants that further reduce the long-tail latency by aggressively performing fine-grained GC operations. Second, a Q-table cache technique dynamically manages the key states in RL-assisted GC: it treats many fine-grained pieces of information as state candidates and, using a relatively small amount of memory, keeps the key states that best represent the characteristics of the workload, reducing the long-tail latency even further. Third, a Q-value prediction network (QP Net) predicts the initial Q-value of a state newly inserted into the Q-table cache. The integrated solution exploits the system's short-term history through the low-cost Q-table cache, while the small QP Net learns from the long-term history and provides good Q-value initialization for states newly inserted into the Q-table cache. Experiments show that the proposed method reduces the long-tail latency by 25%-37% compared with the state-of-the-art method.

    Contents:
    Chapter 1 Introduction
    Chapter 2 Background
      2.1 System Level Tail Latency
      2.2 Solid State Drive
        2.2.1 Flash Storage Architecture and Garbage Collection
      2.3 Reinforcement Learning
    Chapter 3 Related Work
    Chapter 4 Small Q-table based Solution to Reduce Long Tail Latency
      4.1 Problem and Motivation
        4.1.1 Long Tail Problem in Flash Storage Access Latency
        4.1.2 Idle Time in Flash Storage
      4.2 Design and Implementation
        4.2.1 Solution Overview
        4.2.2 RL-assisted Garbage Collection Scheduling
        4.2.3 Aggressive RL-assisted Garbage Collection Scheduling
      4.3 Evaluation
        4.3.1 Evaluation Setup
        4.3.2 Results and Discussion
    Chapter 5 Q-table Cache to Exploit a Large Number of States at Small Cost
      5.1 Motivation
      5.2 Design and Implementation
        5.2.1 Solution Overview
        5.2.2 Dynamic Key States Management
      5.3 Evaluation
        5.3.1 Evaluation Setup
        5.3.2 Results and Discussion
    Chapter 6 Combining Q-table Cache and Neural Network to Exploit both Long and Short-term History
      6.1 Motivation and Problem
        6.1.1 More State Information can Further Reduce Long Tail Latency
        6.1.2 Locality Behavior of Workload
        6.1.3 Zero Initialization Problem
      6.2 Design and Implementation
        6.2.1 Solution Overview
        6.2.2 Q-table Cache for Action Selection
        6.2.3 Q-value Prediction
      6.3 Evaluation
        6.3.1 Evaluation Setup
        6.3.2 Storage-Intensive Workloads
        6.3.3 Latency Comparison: Overall
        6.3.4 Q-value Prediction Network Effects on Latency
        6.3.5 Q-table Cache Analysis
        6.3.6 Immature State Analysis
        6.3.7 Miscellaneous Analysis
        6.3.8 Multi Channel Analysis
    Chapter 7 Conclusion and Future Work
      7.1 Conclusion
      7.2 Future Work
    Bibliography
    Abstract (in Korean)
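    To make the scheduling idea concrete, here is a minimal tabular Q-learning sketch in the spirit of the RL-assisted GC scheduler: during an idle interval the agent picks how many fine-grained GC operations to issue, and is rewarded for GC work that fits in the idle window but penalized for overshooting into the next host request. The state encoding, reward, and constants are assumptions for illustration, not the dissertation's exact design.

        # Hypothetical tabular Q-learning sketch of RL-assisted GC scheduling.
        import random
        from collections import defaultdict

        ACTIONS = [0, 1, 2, 4, 8]          # number of fine-grained GC ops to issue
        ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount, exploration

        q_table = defaultdict(lambda: [0.0] * len(ACTIONS))

        def choose(state):
            if random.random() < EPS:      # explore
                return random.randrange(len(ACTIONS))
            qs = q_table[state]            # exploit
            return qs.index(max(qs))

        def update(state, action, reward, next_state):
            td_target = reward + GAMMA * max(q_table[next_state])
            q_table[state][action] += ALPHA * (td_target - q_table[state][action])

        def reward_fn(gc_ops, idle_budget):
            # Reward GC work that fits in the idle window; penalise overshoot,
            # which would delay the next host request (the tail-latency source).
            return gc_ops if gc_ops <= idle_budget else idle_budget - 2 * (gc_ops - idle_budget)

        # Toy loop: state = bucketed recent inter-request idle time.
        state = 0
        for step in range(10_000):
            idle_budget = random.choice([0, 1, 2, 4, 8])  # stand-in for observed idleness
            a = choose(state)
            update(state, a, reward_fn(ACTIONS[a], idle_budget), idle_budget)
            state = idle_budget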

    PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives

    Deep Neural Networks (DNNs) have revolutionized many aspects of our lives. The use of DNNs is becoming ubiquitous, including in software for image recognition, speech recognition, speech synthesis, and language translation, to name a few. The training of DNN architectures, however, is computationally expensive. Once the model is created, its use in the intended application, the inference task, is computationally heavy too, and inference needs to be fast for real-time use. For high performance today, the norm is code for Deep Learning (DL) primitives optimized for specific architectures by expert programmers and exposed via libraries. However, given the constant emergence of new DNN architectures, creating hand-optimized code is expensive, slow, and not scalable. To address this performance-productivity challenge, in this paper we present compiler algorithms that automatically generate high-performance implementations of DL primitives closely matching the performance of hand-optimized libraries. We develop novel data reuse analysis algorithms using the polyhedral model to derive efficient execution schedules automatically. In addition, because most DL primitives use some variant of matrix multiplication at their core, we develop a flexible framework in which library implementations of matrix multiplication can be plugged in in lieu of a subset of the loops. We show that such a hybrid compiler plus minimal-library-use approach results in state-of-the-art performance. We also develop compiler algorithms that perform operator fusion to reduce data movement through the memory hierarchy of the computer system.

    Comment: arXiv admin note: substantial text overlap with arXiv:2002.0214
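    As a toy picture of the hybrid approach, consider a tiled matrix multiplication in which the outer loop nest plays the role of the polyhedral schedule that exploits data reuse, while the innermost loops are handed to a library kernel. This Python/NumPy sketch (with np.dot standing in for a hand-optimized GEMM microkernel, and an arbitrary tile size) is an assumption-laden illustration, not PolyDL's actual code generator.

        # Sketch: outer tiled loop nest + library microkernel for the inner loops.
        import numpy as np

        def matmul_tiled(A, B, tile=64):
            M, K = A.shape
            K2, N = B.shape
            assert K == K2
            C = np.zeros((M, N), dtype=A.dtype)
            # Outer tiled loops: the schedule a polyhedral model would derive to
            # keep tiles of A, B, and C resident in cache across iterations.
            for i in range(0, M, tile):
                for j in range(0, N, tile):
                    for k in range(0, K, tile):
                        # Inner loops handed to a library kernel (np.dot here,
                        # as a stand-in for a hand-optimised GEMM microkernel).
                        C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
            return C

        A = np.random.rand(256, 256).astype(np.float32)
        B = np.random.rand(256, 256).astype(np.float32)
        assert np.allclose(matmul_tiled(A, B), A @ B, atol=1e-2)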

    Combining Prior Knowledge and Data: Beyond the Bayesian Framework

    For many tasks such as text categorization and control of robotic systems, state-of-the-art learning systems can produce results comparable in accuracy to those of human subjects. However, the amount of training data needed for such systems can be prohibitively large for many practical problems. A text categorization system, for example, may need to see many text postings manually tagged with their subjects before it learns to predict the subject of the next posting with high accuracy. A reinforcement learning (RL) system learning how to drive a car needs a lot of experimentation with the actual car before acquiring the optimal policy. An optimizing compiler targeting a certain platform has to construct, compile, and execute many versions of the same code with different optimization parameters to determine which optimizations work best. Such extensive sampling can be time-consuming, expensive (in terms of both the human expertise needed to label data and the wear and tear on the robotic equipment used for exploration in the case of RL), and sometimes dangerous (e.g., an RL agent driving the car off a cliff to see if it survives the crash). The goal of this work is to reduce the amount of training data an agent needs in order to learn how to perform a task successfully. This is done by providing the system with prior knowledge about its domain; the knowledge is used to bias the agent towards useful solutions and limit the amount of training needed. We explore this task in three contexts: classification (determining the subject of a newsgroup posting), control (learning to perform tasks such as driving a car up a mountain in simulation), and optimization (optimizing the performance of linear algebra operations on different hardware platforms). For the text categorization problem, we introduce a novel algorithm which efficiently integrates prior knowledge into large margin classification. We show that prior knowledge simplifies the problem by reducing the size of the hypothesis space, and we provide formal convergence guarantees for our algorithm. For reinforcement learning, we introduce a novel framework for defining planning problems in terms of qualitative statements about the world (e.g., "the faster the car is going, the more likely it is to reach the top of the mountain"). We present an algorithm based on policy iteration for solving such qualitative problems and prove its convergence. We also present an alternative framework which allows the user to specify prior knowledge quantitatively in the form of a Markov Decision Process (MDP). This prior is used to focus exploration on those regions of the world in which the optimal policy is most sensitive to perturbations in transition probabilities and rewards. Finally, in the compiler optimization problem, the prior is based on an analytic model which determines good optimization parameters for a given platform. This model defines a Bayesian prior which, combined with empirical samples (obtained by measuring the performance of optimized code segments), determines the maximum-a-posteriori estimate of the optimization parameters.
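    The compiler-optimization part can be illustrated with a conjugate Gaussian update: the analytic model supplies a prior over a tuning parameter, empirical measurements supply the likelihood, and the MAP estimate lands between the two. The numbers and the Gaussian assumptions below are invented for illustration, not taken from the thesis.

        # Illustrative sketch (assumed): MAP estimate of a tuning parameter from
        # an analytic-model prior plus noisy empirical measurements.
        import numpy as np

        mu0, var0 = 64.0, 16.0    # analytic model: predicted best tile size & trust
        noise_var = 25.0          # assumed measurement noise of empirical samples
        samples = np.array([72.0, 70.0, 75.0])  # empirically best tile sizes observed

        n = len(samples)
        post_var = 1.0 / (1.0 / var0 + n / noise_var)
        post_mean = post_var * (mu0 / var0 + samples.sum() / noise_var)  # MAP estimate

        print(f"MAP estimate: {post_mean:.1f} "
              f"(prior {mu0}, sample mean {samples.mean():.1f})")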

    Application of machine learning techniques to the management and optimization of tile caches for accelerating map services in spatial data infrastructures

    The great proliferation of web map services has created a need for ever more scalable services. In response, tiled map services have emerged as a scalable alternative to traditional map services, enabling cache mechanisms or even serving maps from a collection of pre-generated images. However, the storage requirements and start-up time of these services are often prohibitive when the cartography to be served covers a large geographic area at many scales. These services therefore usually rely on partial caches containing only a subset of the cartography. Guaranteeing an acceptable Quality of Service (QoS) requires suitable policies for maintaining and managing these map caches: 1) strategies for the initial population, or seeding, of the cache; 2) algorithms for dynamic loading in response to user requests; 3) cache replacement policies. However, few such strategies are specific to map services: most strategies applied to these services are drawn from other domains, such as traditional web proxies, and do not take into account the spatial component of the map objects they manage. This thesis addresses that gap by designing new domain-specific algorithms that optimize the performance of map services. Given the large number of objects managed by these caches and their heterogeneity in terms of layers, representation scales, etc., an effort has been made to make the designed strategies automatic or semi-automatic, requiring as little human intervention as possible. Two novel strategies are proposed for the initial population of a map cache. One uses a descriptive model built from logs of past service requests. The other is based on a predictive model that identifies the geographic phenomena driving user requests, parameterized either by an OLS (Ordinary Least Squares) regression analysis or by an intelligent system based on neural networks. Important contributions are also made to the replacement strategies of these caches. First, an intelligent system based on neural networks is proposed that estimates future access popularity from properties of the managed objects: recency of reference, frequency of reference, and the size of the referenced tile. Second, a strategy named Spatial-LFU is proposed, a variant of Perfect-LFU simplified by exploiting the spatial correlation between requests.
    Departamento de Teoría de la Señal y Comunicaciones e Ingeniería Telemática
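    As a hedged sketch of the Spatial-LFU idea, the snippet below scores each cached tile by the request frequency of its spatial neighbourhood (rather than of the tile alone) and evicts the tile with the least popular neighbourhood. The tile keying, neighbourhood radius, and scoring are assumptions for illustration.

        # Hypothetical Spatial-LFU sketch: LFU eviction over a spatial neighbourhood.
        from collections import defaultdict

        freq = defaultdict(int)  # (zoom, x, y) -> request count over the full history

        def on_request(tile):
            freq[tile] += 1

        def spatial_score(tile, radius=1):
            z, x, y = tile
            # Sum frequencies of the tile and its neighbours at the same zoom level.
            return sum(freq[(z, x + dx, y + dy)]
                       for dx in range(-radius, radius + 1)
                       for dy in range(-radius, radius + 1))

        def evict_one(cache):
            # Evict the cached tile whose neighbourhood is least popular.
            victim = min(cache, key=spatial_score)
            cache.remove(victim)
            return victim

        cache = {(3, 4, 4), (3, 4, 5), (3, 9, 9)}
        for t in [(3, 4, 4), (3, 4, 5), (3, 5, 4), (3, 4, 4)]:
            on_request(t)
        print(evict_one(cache))  # -> (3, 9, 9), the spatially isolated, unpopular tile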

    Adaptive mechanisms for self-aware multicore systems

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 90-92).
    As the push for extreme-scale performance continues to make computer architectures increasingly complex, there has been a call for better programming models, and for systems to support them. Today's microprocessors expose more system resources than ever to software, leaving it to the application programmer to manage them. Studies have shown that the energy efficiency of future technologies may ultimately limit the performance of multicore processors, so programmers are forced to optimize systems for both performance and energy amid countless configurable parameters, an extremely difficult task. Self-aware systems can configure themselves through introspection, providing performance and energy optimization without placing an unrealistic burden on the programmer. However, to build effective self-aware systems, we must identify useful sources of adaptivity. This thesis shows the effectiveness of a number of adaptive mechanisms for self-aware multicore systems. We show that adding these mechanisms improves efficiency, and then make a case for coordinated adaptive systems. Coordinated systems treat adaptivity as a first-class object and can outperform all non-adaptive, statically configured, and uncoordinated adaptive systems that lack a general view of system-wide adaptivity.
    by Eric Lau. S.M.
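    A self-aware adaptation mechanism of the kind described can be reduced to an observe-decide-act loop: compare an observed performance signal against a goal and move one knob accordingly. The heartbeat-style signal, thresholds, and single core-count knob in the sketch below are illustrative assumptions, not the thesis's mechanisms.

        # Hypothetical observe-decide-act loop for a self-aware system with a
        # heartbeat-style performance signal and one knob (active core count).
        def adapt(perf, target, cores, max_cores):
            # Introspect: compare the observed rate against the goal, then
            # trade energy for performance (or vice versa) on a single knob.
            if perf < 0.9 * target and cores < max_cores:
                return cores + 1   # behind the goal: spend energy for speed
            if perf > 1.1 * target and cores > 1:
                return cores - 1   # ahead of the goal: save energy
            return cores           # within the band: hold steady

        cores = 2
        for perf in [7.0, 7.5, 9.5, 12.0, 10.1]:   # observed heartbeats/s
            cores = adapt(perf, target=10.0, cores=cores, max_cores=8)
            print(cores)           # -> 3, 4, 4, 3, 3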

    Fifth NASA Goddard Conference on Mass Storage Systems and Technologies

    This document contains copies of those technical papers received in time for publication prior to the Fifth Goddard Conference on Mass Storage Systems and Technologies held September 17-19, 1996, at the University of Maryland, University Conference Center in College Park, Maryland. As one of an ongoing series, this conference continues to serve as a unique medium for the exchange of information on topics relating to the ingestion and management of substantial amounts of data and the attendant problems involved. This year's discussion topics include storage architecture, database management, data distribution, file system performance and modeling, and optical recording technology. There will also be a paper on Application Programming Interfaces (API) for a Physical Volume Repository (PVR) defined in Version 5 of the Institute of Electrical and Electronics Engineers (IEEE) Reference Model (RM). In addition, there are papers on specific archives and storage products.