Optimizing Replacement Policies for Content Delivery Network Caching: Beyond Belady to Attain A Seemingly Unattainable Byte Miss Ratio
When facing objects/files of differing sizes in content delivery network
(CDN) caches, pursuing an optimal object miss ratio (OMR) by approximating
Belady no longer ensures an optimal byte miss ratio (BMR), creating confusion
about how to achieve a superior BMR in CDNs. To address this issue, we
experimentally observe that there exists a time window to delay the eviction of
the object with the longest reuse distance to improve BMR without increasing
OMR. As a result, we introduce a deep reinforcement learning (RL) model to
capture this time window by dynamically monitoring the changes in OMR and BMR,
and implementing a BMR-friendly policy in the time window. Based on this
policy, we propose a Belady and Size Eviction (LRU-BaSE) algorithm, reducing
BMR while maintaining OMR. To make LRU-BaSE efficient and practical, we address
the feedback delay problem of RL with a two-pronged approach. On the one hand,
our observation of a rear section of the LRU cache queue containing most of the
eviction candidates allows LRU-BaSE to shorten the decision region. On the
other hand, the request distribution on CDNs makes it feasible to divide the
learning region into multiple sub-regions that are each learned with reduced
time and increased accuracy. In real CDN systems, compared to LRU, LRU-BaSE
can reduce back-to-origin ("backing to OS") traffic and access latency by
30.05% and 17.07%, respectively, on average. The results on the simulator
confirm that LRU-BaSE outperforms the state-of-the-art cache replacement
policies: on average, LRU-BaSE's BMR is 0.63% and 0.33% lower than that of
Belady and Practical Flow-based Offline Optimal (PFOO), respectively. In
addition, compared to Learning Relaxed Belady (LRB), LRU-BaSE yields
relatively stable performance when facing workload drift.
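The core observation above, that the Belady victim's eviction can sometimes be deferred in favor of a larger object without hurting OMR, can be sketched as follows. This is a toy illustration under our own assumptions (the `window`, `next_use`, and `sizes` names are ours), not the paper's LRU-BaSE implementation, which learns the window dynamically with RL:

```python
def choose_victim(cache, next_use, sizes, window=2):
    """Pick an eviction victim from `cache`.

    Plain Belady evicts the object whose next use is furthest in
    the future. This sketch instead considers every object whose
    next use falls within `window` requests of that furthest next
    use, and among those evicts the largest one, freeing more bytes
    while barely disturbing the object miss ratio.
    """
    furthest = max(next_use[obj] for obj in cache)
    # Objects with a next use "almost as far away" as the Belady
    # victim's are fair game for eviction.
    candidates = [obj for obj in cache if next_use[obj] >= furthest - window]
    # Among the near-ties, prefer freeing the most bytes.
    return max(candidates, key=lambda obj: sizes[obj])
```

With `window=0` the rule degenerates to plain Belady; widening the window trades byte savings against OMR risk, which is the trade-off the learned policy manages.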
Deep Learning for Edge Computing Applications: A State-of-the-Art Survey
With the booming development of the Internet of Things (IoT) and communication technologies such as 5G, our future world is envisioned as an interconnected entity in which billions of devices will provide uninterrupted service to our daily lives and to industry. Meanwhile, these devices will generate massive amounts of valuable data at the network edge, calling not only for instant data processing but also for intelligent data analysis in order to fully unleash the potential of edge big data. Neither traditional cloud computing nor on-device computing can sufficiently address this problem, due to high latency and limited computation capacity, respectively. Fortunately, the emerging edge computing paradigm sheds light on the issue by pushing data processing from the remote network core to the local network edge, remarkably reducing latency and improving efficiency. Besides, recent breakthroughs in deep learning have greatly expanded data processing capabilities, enabling a thrilling development of novel applications such as video surveillance and autonomous driving. The convergence of edge computing and deep learning is believed to bring new possibilities to both interdisciplinary research and industrial applications. In this article, we provide a comprehensive survey of the latest efforts on deep-learning-enabled edge computing applications and, in particular, offer insights on how to leverage deep learning advances to facilitate edge applications in four domains, i.e., smart multimedia, smart transportation, smart city, and smart industry. We also highlight the key research challenges and promising research directions therein. We believe this survey will inspire more research and contributions in this promising field.
Edge Replication Strategies for Wide-Area Distributed Processing
The rapid digitalization across industries comes with many challenges. One key problem is how the ever-growing and volatile data generated at distributed locations can be efficiently processed to inform decision making and improve products. Unfortunately, wide-area network capacity cannot cope with the growth of the data at the network edges. Thus, it is imperative to decide which data should be processed in-situ at the edge and which should be transferred and analyzed in data centers.
In this paper, we study two families of proactive online data replication strategies, namely ski-rental and machine learning algorithms, to decide which data is processed at the edge, close to where it is generated, and which is transferred to a data center. Our analysis using real query traces from a Global 2000 company shows that such online replication strategies can significantly reduce data transfer volume, in many cases by up to 50% compared to naive approaches, and achieve close to optimal performance. After analyzing their shortcomings in ease of use and performance, we propose a hybrid strategy that combines the advantages of both competitive and machine learning algorithms.
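The ski-rental family of strategies mentioned above follows the classic break-even rule: keep paying per-query transfer costs until the cumulative cost reaches the one-time replication cost, then replicate. A minimal sketch (class and parameter names are ours, not the paper's):

```python
class SkiRentalReplicator:
    """Break-even ski-rental policy for one edge dataset (illustrative).

    Each remote query pays `transfer_cost` to ship data to the data
    center ("renting"). Once cumulative rent reaches `replicate_cost`
    ("buying the skis"), the dataset is replicated and later queries
    incur no transfer cost. This classic rule is 2-competitive with
    the offline optimum.
    """

    def __init__(self, replicate_cost):
        self.replicate_cost = replicate_cost
        self.spent = 0.0          # cumulative "rent" paid so far
        self.replicated = False

    def on_query(self, transfer_cost):
        """Account for one query; return the transfer cost it paid."""
        if self.replicated:
            return 0.0
        self.spent += transfer_cost
        if self.spent >= self.replicate_cost:
            self.replicated = True  # break-even point reached: replicate
        return transfer_cost
```

Because the break-even point is fixed in hindsight-free fashion, total cost is at most twice the offline optimum regardless of how the query stream continues.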
SoK: Distributed Computing in ICN
Information-Centric Networking (ICN), with its data-oriented operation and
generally more powerful forwarding layer, provides an attractive platform for
distributed computing. This paper provides a systematic overview and
categorization of different distributed computing approaches in ICN
encompassing fundamental design principles, frameworks and orchestration,
protocols, enablers, and applications. We discuss current pain points in legacy
distributed computing, attractive ICN features, and how different systems use
them. This paper also provides a discussion of potential future work for
distributed computing in ICN.
Comment: 10 pages, 3 figures, 1 table. Accepted by ACM ICN 202
Optimized and Automated Machine Learning Techniques Towards IoT Data Analytics and Cybersecurity
Internet-of-Things (IoT) systems have emerged as a prevalent technology in our daily lives. With the proliferation of sensors and smart devices in recent years, the data generation volume and speed of IoT systems have increased dramatically. In most IoT systems, massive volumes of data must be processed, transformed, and analyzed on a frequent basis to enable various IoT services and functionalities. Machine Learning (ML) approaches have shown their capacity for IoT data analytics. However, applying ML models to IoT data analytics tasks still faces many difficulties and challenges. The first challenge is to process large amounts of dynamic IoT data to make accurate and informed decisions. The second challenge is to automate and optimize the data analytics process. The third challenge is to protect IoT devices and systems against various cyber threats and attacks. To address these challenges, this thesis proposes various ML-based frameworks and data analytics approaches in several applications.
Specifically, the first part of the thesis provides a comprehensive review of applying Automated Machine Learning (AutoML) techniques to IoT data analytics tasks, discussing all procedures of the general ML pipeline. The second part of the thesis proposes several novel supervised ML-based Intrusion Detection Systems (IDSs) to improve the security of Internet of Vehicles (IoV) systems and connected vehicles; optimization techniques are used to obtain optimized ML models with high attack detection accuracy. The third part of the thesis develops unsupervised ML algorithms to identify network anomalies and malicious network entities (e.g., attacker IPs, compromised machines, and polluted files/content) in order to protect Content Delivery Networks (CDNs) from service-targeting attacks, including distributed denial-of-service and cache pollution attacks; the proposed framework is evaluated on real-world CDN access log data to illustrate its effectiveness. The fourth part of the thesis proposes adaptive online learning algorithms for addressing concept drift (i.e., data distribution changes) and effectively handling dynamic IoT data streams in order to provide reliable IoT services. The developed drift-adaptive learning methods can effectively adapt to data distribution changes and avoid degradation of data analytics model performance.
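The general shape of drift-adaptive online learning can be illustrated with a simple error-rate monitor that retrains the model from recent samples when accuracy collapses. This is a sketch under our own assumptions (class names, the window/threshold scheme, and the toy majority-vote learner are ours), not the thesis's method:

```python
from collections import deque


class MajorityModel:
    """Toy incremental learner: predicts the majority label seen so far."""

    def __init__(self):
        self.counts = {}

    def fit(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None


class DriftAdaptiveClassifier:
    """Wrap a base learner with a simple drift monitor.

    If the error rate over the last `window` samples exceeds
    `threshold`, the model is discarded and retrained from recent
    data only, adapting to the new distribution.
    """

    def __init__(self, make_model, window=50, threshold=0.3):
        self.make_model = make_model
        self.window = window
        self.threshold = threshold
        self.errors = deque(maxlen=window)   # sliding 0/1 error window
        self.recent = deque(maxlen=window)   # sliding sample buffer
        self.model = make_model()
        self.drifts = 0

    def update(self, x, y):
        """Predict, record the error, learn incrementally, check drift."""
        self.errors.append(int(self.model.predict(x) != y))
        self.recent.append((x, y))
        self.model.fit(x, y)
        if (len(self.errors) == self.window
                and sum(self.errors) / self.window > self.threshold):
            self.model = self.make_model()   # drift detected: start fresh
            for xs, ys in self.recent:
                self.model.fit(xs, ys)       # retrain on recent data only
            self.errors.clear()
            self.drifts += 1
```

On a stream whose label distribution flips abruptly, the monitor fires once and the retrained model recovers; production drift detectors (e.g., ADWIN or DDM) refine the same window-and-threshold idea.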
Towards Optimized Traffic Provisioning and Adaptive Cache Management for Content Delivery
Content delivery networks (CDNs) deploy hundreds of thousands of servers around the world to cache and serve trillions of user requests every day for a diverse set of content such as web pages, videos, software downloads and images. In this dissertation, we propose algorithms to provision traffic across cache servers and manage the content they host to achieve performance objectives such as maximizing the cache hit rate, minimizing the bandwidth cost of the network and minimizing the energy consumption of the servers.
Traffic provisioning is the process of determining the set of content domains hosted on the servers. We propose footprint descriptors that effectively capture the popularity characteristics and caching performance of different content classes. We also propose a footprint descriptor calculus that can be used to decide how content should be mixed or partitioned to efficiently provision traffic. To automate traffic provisioning, we propose optimization models to provision traffic such that the cache miss traffic from the network is minimized without overloading the servers. We find that such optimization models produce significant reductions in the cache miss traffic when compared with traffic provisioning algorithms in use today.
Cache management is the process of deciding how content is cached in the servers of a CDN. We propose TTL-based caching algorithms that provably achieve performance targets specified by a CDN operator. We show that the proposed algorithms converge to the target hit rate and target cache size with low error. Finally, we propose cache management algorithms that make the servers energy-efficient using disk shutdown. We find that disk shutdown is well suited for CDN servers and provides energy savings without significantly impacting cache hit rates.
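A TTL-based cache that converges to an operator-specified target hit rate can be illustrated with a simple stochastic-approximation feedback rule: shrink the TTL slightly on each hit, grow it on each miss, so the expected update vanishes exactly when the hit rate equals the target. The update rule and parameter names below are ours, not the dissertation's exact algorithm:

```python
class AdaptiveTTLCache:
    """TTL cache that nudges its TTL toward a target hit rate (sketch).

    Each request updates ttl by step * (target - hit_indicator):
    hits pull the TTL down, misses push it up, so in steady state
    the measured hit rate drifts toward `target`.
    """

    def __init__(self, target=0.8, ttl=10.0, step=0.5):
        self.target = target
        self.ttl = ttl
        self.step = step
        self.expiry = {}                 # object id -> expiration time
        self.hits = self.requests = 0

    def request(self, obj, now):
        """Serve one request at time `now`; return True on a hit."""
        self.requests += 1
        hit = self.expiry.get(obj, -1.0) > now
        if hit:
            self.hits += 1
        # Robbins-Monro-style update: zero mean drift when hit rate == target.
        self.ttl = max(0.0, self.ttl + self.step * (self.target - float(hit)))
        self.expiry[obj] = now + self.ttl    # (re)set this object's expiry
        return hit

    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0
```

On a round-robin trace over a small object set, the measured hit rate settles near the target, which is the convergence behavior the proposed algorithms establish rigorously.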
Responsible, Automated Data Gathering for Timely Citizen Dashboard Provision During a Global Pandemic (COVID-19)
Creating a public understanding of the dynamics of a pandemic, such as COVID-19, is vital for introducing restrictive regulations. Gathering diverse data responsibly and sharing it with experts and citizens in a timely manner is challenging. This article reviews methodologies of COVID-19 dashboard design and discusses the associated technical and non-technical challenges. Advice and lessons learned from building a citizen-focused, automated, county-precision dashboard for Germany are shared. Within four months, the web-based tool had 5 million unique visitors and 70 million sessions. Three developers set up the basic version in less than one week. Early on, data was screen-scraped; an iterative process improved timeliness by adding more fine-grained data sources. A collaborative online table editor enabled near-real-time corrections. Alerting was set up for errors, and statistics were applied for sanity checking. Static site generation and a content delivery network help to serve large user loads in a timely manner. The flexible design allowed us to iteratively integrate more complex statistics based on expert knowledge, built on top of the collected data and secondary data sources such as ICU beds and citizen movement.