10 research outputs found

Extending Proxy Caching Capability: Issues and Performance

Author: A. Datta
D. Wessels
Farokh Bastani
I-Ling Yen
Ing-Ray Chen
J. Roy
Jiang He
Jicheng Fu
L. Fan
M. Arlitt
P. Mohapatra
Wei Hao
Publication venue: 'Springer Science and Business Media LLC'

Congestion-aware caching and search in information-centric Networks

Author: Anand Seetharam
Jim Kurose
Mikhail Badov
Soumendra Nanda
Victor Firoiu
Publication date: 01/01/2015

ABSTRACT The performance of in-network caching in information-centric networks, and of cache networks more generally, is typically characterized by network-centric performance metrics such as hit rate and hop count, with approaches to locating and caching content evaluated and optimized for these metrics. We believe that user-centric performance metrics, in particular the delay from when a content request is made by the user to the time at which the requested content has been completely downloaded, are also important. For such metrics, performance is often determined by link capacity constraints and network congestion. We investigate network cache management and search policies that account for path-level (content-server to content-requestor) congestion and file popularity in order to directly minimize user-centric, content-download delay. Through simulation, we find that our policies yield significantly better download delay performance than existing policies, even though these existing policies provide better performance according to traditional metrics such as cache hit rate and hop count.

CiteSeerX

Content Caching and Delivery over Heterogeneous Wireless Networks

Author: Diggavi Suhas
Hachem Jad
Karamchandani Nikhil
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/03/2015

Emerging heterogeneous wireless architectures consist of a dense deployment of local-coverage wireless access points (APs) with high data rates, along with sparsely-distributed, large-coverage macro-cell base stations (BS). We design a coded caching-and-delivery scheme for such architectures that equips APs with storage, enabling content pre-fetching prior to knowing user demands. Users requesting content are served by connecting to local APs with cached content, as well as by listening to a BS broadcast transmission. For any given content popularity profile, the goal is to design the caching-and-delivery scheme so as to optimally trade off the transmission cost at the BS against the storage cost at the APs and the user cost of connecting to multiple APs. We design a coded caching scheme for non-uniform content popularity that dynamically allocates user access to APs based on requested content. We demonstrate the approximate optimality of our scheme with respect to information-theoretic bounds. We numerically evaluate it on a YouTube dataset and quantify the trade-off between transmission rate, storage, and access cost. Our numerical results also suggest the intriguing possibility that, to gain most of the benefits of coded caching, it suffices to divide the content into a small number of popularity classes. Comment: A shorter version is to appear in IEEE INFOCOM 201

arXiv.org e-Print Archive

Crossref

On Optimal Caching and Model Multiplexing for Large Model Inference

Author: Barrett Clark
Jiao Jiantao
Jordan Michael I.
Sheng Ying
Zheng Lianmin
Zhu Banghua
Publication date: 03/06/2023

Large Language Models (LLMs) and other large foundation models have achieved noteworthy success, but their size exacerbates existing resource consumption and latency challenges. In particular, the large-scale deployment of these models is hindered by the significant resource requirements during inference. In this paper, we study two approaches for mitigating these challenges: employing a cache to store previous queries and learning a model multiplexer to choose from an ensemble of models for query processing. Theoretically, we provide an optimal algorithm for jointly optimizing both approaches to reduce the inference cost in both offline and online tabular settings. By combining a caching algorithm, namely Greedy Dual Size with Frequency (GDSF) or Least Expected Cost (LEC), with a model multiplexer, we achieve optimal rates in both offline and online settings. Empirically, simulations show that the combination of our caching and model multiplexing algorithms greatly improves over the baselines, with up to $50\times$ improvement over the baseline when the ratio between the maximum cost and minimum cost is $100$. Experiments on real datasets show a $4.3\times$ improvement in FLOPs over the baseline when the ratio for FLOPs is $10$, and a $1.8\times$ improvement in latency when the ratio for average latency is $1.85$.
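
The abstract above names Greedy Dual Size with Frequency (GDSF) as one of its caching policies. As a rough illustration only, the following is a minimal sketch of the classic GDSF eviction rule from the web-caching literature, in which each entry has priority clock + frequency * cost / size and the entry with the lowest priority is evicted first. It is not the paper's implementation: the class name GDSFCache, the access method, and the cost and size values are hypothetical, and the paper's cost estimation and model multiplexer are not modeled here.

```python
class GDSFCache:
    """Illustrative GDSF cache (hypothetical names, not the paper's code)."""

    def __init__(self, capacity):
        self.capacity = capacity  # total size budget
        self.used = 0             # size currently occupied
        self.clock = 0.0          # inflation value, raised on each eviction
        self.entries = {}         # key -> (priority, freq, cost, size)

    def _priority(self, freq, cost, size):
        # Classic GDSF priority: clock + frequency * cost / size.
        return self.clock + freq * cost / size

    def access(self, key, cost, size):
        """Return True on a hit, False on a miss (the item is then admitted if it fits)."""
        if key in self.entries:
            # Hit: bump the frequency and recompute priority with the current clock.
            _, freq, stored_cost, stored_size = self.entries[key]
            freq += 1
            self.entries[key] = (
                self._priority(freq, stored_cost, stored_size),
                freq, stored_cost, stored_size,
            )
            return True
        if size > self.capacity:
            return False  # cannot fit even in an empty cache
        # Miss: evict lowest-priority entries until the new item fits.
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            self.clock = self.entries[victim][0]  # age the cache
            self.used -= self.entries[victim][3]
            del self.entries[victim]
        self.entries[key] = (self._priority(1, cost, size), 1, cost, size)
        self.used += size
        return False


cache = GDSFCache(capacity=2)
cache.access("a", cost=1, size=1)   # miss, admitted
cache.access("b", cost=10, size=1)  # miss, admitted
cache.access("c", cost=5, size=1)   # miss, evicts "a" (lowest priority)
```

In this sketch a linear scan picks the eviction victim for brevity; a heap keyed on priority would be the usual choice for larger caches.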

arXiv.org e-Print Archive

Popularity-aware greedy dual-size Web proxy caching algorithms

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2000

Crossref

Modeling and acceleration of content delivery in world wide web

Author: YUAN JUNLI
Publication date: 24/11/2005

Ph.D. (Doctor of Philosophy)

ScholarBank@NUS

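Estudio, análisis y desarrollo de una red de distribución de contenido y su algoritmo de redirección de usuarios para servicios web y streaming [Study, analysis, and development of a content delivery network and its user-redirection algorithm for web and streaming services]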

Author: Molina Moreno Benjamin
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 02/09/2013

This thesis was produced within the research line on Content Distribution Mechanisms in IP Networks, which has carried out its activity through several research projects and in the course "Mecanismos de Distribución de Contenidos en Redes IP" (Content Distribution Mechanisms in IP Networks) of the doctoral program "Telecomunicaciones", taught by the Departamento de Comunicaciones of the UPV and currently part of the Máster Universitario en Tecnologías, Sistemas y Redes de Comunicación. The growth of the Internet is widely known, both in number of clients and in traffic generated. This makes it possible to bring clients a multimedia interface in which data, voice, video, music, and so on can coexist. Although this represents a business opportunity along multiple dimensions, scalability must be addressed seriously: the aim is that the average performance of a system is not affected as the number of clients or the volume of requested information grows. The study and analysis of web and streaming content distribution using CDNs is the subject of this project. The approach takes a general perspective, disregarding network-layer solutions such as IP multicast, as well as resource reservation, because they are not natively available in the Internet infrastructure. This leads to introducing the application layer as the coordinating framework for content distribution. Among these networks, also called overlay networks, a Content Delivery Network (CDN) was chosen. Application-level networks of this type are highly scalable and allow full control over the resources and functionality of every element of their architecture.
This makes it possible to evaluate the performance of a CDN that distributes multimedia content in terms of required bandwidth, response time obtained by clients, perceived quality, distribution mechanisms, time to live when caching is used, and so on. CDNs emerged in the late 1990s with the main goal of eliminating or mitigating the so-called flash-crowd effect, caused by a massive influx of clients. Today, this type of network directs most of its effort toward the ability to deliver streaming media over the Internet. For a thorough analysis, this thesis proposes an initial simplified CDN model, both theoretical and practical. On the theoretical side, a mathematical model is presented that allows a CDN to be evaluated analytically. This model becomes considerably more complex as new functionality is added, so a simulation model is proposed and developed that, on the one hand, checks the validity of the mathematical framework and, on the other hand, establishes a comparative framework for the practical implementation of the CDN, a task carried out in the final phase of the thesis. In this way, the results obtained span theory, simulation, and practice.
Molina Moreno, B. (2013). Estudio, análisis y desarrollo de una red de distribución de contenido y su algoritmo de redirección de usuarios para servicios web y streaming [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/31637

RiuNet