12 research outputs found
Stochastic Dynamic Cache Partitioning for Encrypted Content Delivery
In-network caching is an appealing solution to cope with the increasing
bandwidth demand of video, audio and data transfer over the Internet.
Nonetheless, an increasing share of content delivery services adopt encryption
through HTTPS, which is not compatible with traditional ISP-managed approaches
like transparent and proxy caching. This raises the need for solutions
involving both Internet Service Providers (ISP) and Content Providers (CP): by
design, the solution should preserve business-critical CP information (e.g.,
content popularity, user preferences) on the one hand, while allowing for a
deeper integration of caches in the ISP architecture (e.g., in 5G femto-cells)
on the other hand.
In this paper we address this issue by considering a content-oblivious
ISP-operated cache. The ISP allocates the cache storage to various content
providers so as to maximize the bandwidth savings provided by the cache: the
main novelty lies in the fact that, to protect business-critical information,
ISPs only need to measure the aggregated miss rates of the individual CPs and
do not need to be aware of the objects that are requested, as in classic
caching. We propose a cache allocation algorithm based on a perturbed
stochastic subgradient method, and prove that the algorithm converges close to
the allocation that maximizes the overall cache hit rate. We use extensive
simulations to validate the algorithm and to assess its convergence rate under
stationary and non-stationary content popularity. Our results (i) testify to the
feasibility of content-oblivious caches and (ii) show that the proposed
algorithm comes within 10% of the global optimum in our evaluation.
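The idea of partitioning a cache from aggregate miss rates alone can be illustrated with a small sketch. This is not the paper's algorithm: it is a generic perturbation-based (SPSA-style) stochastic descent, and the exponential hit curve, the noise level, the gain sequence and the names `expected_miss`, `project` and `partition` are all assumptions made for illustration. The only signal the allocator uses is a noisy aggregate miss rate measured at two perturbed allocations.

```python
import math
import random


def expected_miss(alloc, weights, scale=20.0):
    """Noise-free aggregate miss rate under a hypothetical hit curve.

    Provider k with x slots is assumed to hit with probability
    1 - exp(-x / scale); weights[k] is its share of the requests.
    """
    return sum(w * math.exp(-max(x, 0.0) / scale) for x, w in zip(alloc, weights))


def project(v, total):
    """Euclidean projection onto {x >= 0, sum(x) = total}."""
    cumulative, tau = 0.0, 0.0
    for i, u in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u
        t = (cumulative - total) / i
        if u - t > 0:
            tau = t
    return [max(x - tau, 0.0) for x in v]


def partition(weights, capacity, steps=3000, delta=5.0, gain=100.0,
              noise=0.005, seed=0):
    """Adjust per-provider storage using only noisy aggregate miss rates."""
    k = len(weights)
    if k % 2:
        raise ValueError("this sketch needs an even number of providers")
    rng = random.Random(seed)
    theta = [capacity / k] * k  # start from an even split
    for n in range(1, steps + 1):
        # Random +/-1 direction with zero sum, so total storage is unchanged.
        d = [1.0] * (k // 2) + [-1.0] * (k // 2)
        rng.shuffle(d)
        plus = [t + delta * s for t, s in zip(theta, d)]
        minus = [t - delta * s for t, s in zip(theta, d)]
        # Two noisy measurements stand in for observed miss rates.
        m_plus = expected_miss(plus, weights) + rng.gauss(0.0, noise)
        m_minus = expected_miss(minus, weights) + rng.gauss(0.0, noise)
        slope = (m_plus - m_minus) / (2.0 * delta)
        step = gain / n ** 0.6  # decreasing gain sequence
        theta = project([t - step * slope * s for t, s in zip(theta, d)], capacity)
    return theta


weights = [0.7, 0.1, 0.1, 0.1]   # hypothetical request shares of four providers
uniform = [20.0] * 4
learned = partition(weights, capacity=80.0)
print(learned, expected_miss(uniform, weights), expected_miss(learned, weights))
```

In this toy setting the provider with the largest request share ends up with most of the storage, without the allocator ever seeing which objects were requested.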
ReFIoV: a novel reputation framework for information-centric vehicular applications
In this article, a novel reputation framework for information-centric vehicular applications, called ReFIoV, is proposed; it leverages machine learning and the artificial immune system (AIS). Specifically, Bayesian learning and classification allow each node to update its view of other nodes as newly observed data on their behavior become available, and hence to classify these nodes, while the K-Means clustering algorithm makes it possible to integrate recommendations from other nodes even if they behave in an unpredictable manner. AIS is used to enhance misbehavior detection. ReFIoV can be implemented in a distributed manner, as each node decides with whom to interact. It provides incentives for nodes to cache and forward others’ mobile data, and it achieves robustness against false accusations and false praise. The performance evaluation shows that ReFIoV outperforms state-of-the-art reputation systems for the metrics considered: it incorrectly classifies a very low number of misbehaving nodes in comparison to another reputation scheme. The proposed AIS mechanism has a low overhead, and incorporating recommendations enabled the framework to reduce detection time even further.
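As a rough illustration of Bayesian learning about a neighbour's behavior, the sketch below keeps a Beta posterior over the probability that a node cooperates and classifies the node by its posterior mean. It is a textbook Beta-Bernoulli update, not the ReFIoV model itself; the class name `BetaReputation`, the uniform prior and the 0.5 threshold are assumptions.

```python
from dataclasses import dataclass


@dataclass
class BetaReputation:
    """Beta posterior over a neighbour's probability of cooperating."""
    cooperative: float = 1.0   # pseudo-count; Beta(1, 1) is a uniform prior
    misbehaving: float = 1.0

    def observe(self, cooperated: bool) -> None:
        # Each new observation increments one of the two pseudo-counts.
        if cooperated:
            self.cooperative += 1.0
        else:
            self.misbehaving += 1.0

    @property
    def trust(self) -> float:
        # Posterior mean of the cooperation probability.
        return self.cooperative / (self.cooperative + self.misbehaving)

    def classify(self, threshold: float = 0.5) -> str:
        return "trusted" if self.trust >= threshold else "misbehaving"


node = BetaReputation()
for outcome in (False, False, False):   # three observed misbehaviors
    node.observe(outcome)
print(node.trust, node.classify())
```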
Impact of Traffic Characteristics on Request Aggregation in an NDN Router
The paper revisits the performance evaluation of caching in a Named Data
Networking (NDN) router where the content store (CS) is supplemented by a
pending interest table (PIT). The PIT aggregates requests for a given content
that arrive within the download delay and thus brings an additional reduction
in upstream bandwidth usage beyond that due to CS hits. We extend prior work on
caching with non-zero download delay (non-ZDD) by proposing a novel
mathematical framework that is more easily applicable to general traffic models
and by considering alternative cache insertion policies. Specifically, we
evaluate the use of an LRU filter to improve CS hit rate performance in this
non-ZDD context. We also consider the impact of time locality in demand due to
finite content lifetimes. The models are used to quantify the impact of the PIT
on upstream bandwidth reduction, demonstrating notably that this is significant
only for relatively small content catalogues or high average request rate per
content. We further explore how the effectiveness of the filter with finite
content lifetimes depends on catalogue size and traffic intensity.
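The interplay between CS hits and PIT aggregation described above can be explored with a small event-driven simulation. This is only a sketch under assumed conditions (Poisson arrivals, Zipf popularity, a plain LRU content store, a fixed download delay); the function `simulate` and all of its parameters are hypothetical and do not reproduce the paper's mathematical framework.

```python
import heapq
import random
from collections import OrderedDict
from itertools import accumulate


def simulate(catalogue=1000, cache_size=50, rate=100.0, delay=0.1,
             alpha=0.8, n_requests=20000, seed=1):
    """Count CS hits, PIT-aggregated requests and upstream requests."""
    rng = random.Random(seed)
    cum = list(accumulate(1.0 / (i + 1) ** alpha for i in range(catalogue)))
    requests = rng.choices(range(catalogue), cum_weights=cum, k=n_requests)
    cs = OrderedDict()      # content store, least recently used first
    pit = set()             # contents with a download in progress
    downloads = []          # heap of (completion time, content)
    counts = {"cs_hit": 0, "pit_aggregated": 0, "upstream": 0}
    now = 0.0
    for content in requests:
        now += rng.expovariate(rate)
        # Insert into the CS every download that completed before this request.
        while downloads and downloads[0][0] <= now:
            _, done = heapq.heappop(downloads)
            pit.discard(done)
            cs[done] = True
            cs.move_to_end(done)
            if len(cs) > cache_size:
                cs.popitem(last=False)   # evict the least recently used entry
        if content in cs:
            cs.move_to_end(content)
            counts["cs_hit"] += 1
        elif content in pit:
            counts["pit_aggregated"] += 1   # served by the pending download
        else:
            pit.add(content)
            heapq.heappush(downloads, (now + delay, content))
            counts["upstream"] += 1
    return counts


print(simulate(rate=10.0))
print(simulate(rate=1000.0))
```

Comparing the two runs gives a feel for how the share of aggregated requests changes with the request rate per content for a fixed download delay.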
On the design of efficient caching systems
Content distribution is currently the prevalent Internet use case, accounting for the majority of global Internet traffic and growing exponentially. There is general consensus that the most effective way to deal with the large amount of content demand is to deploy massively distributed caching infrastructures as the means to localise content delivery traffic. Solutions based on caching have already been widely deployed through Content Delivery Networks. Ubiquitous caching is also a fundamental aspect of the emerging Information-Centric Networking paradigm, which aims to rethink the current Internet architecture for long-term evolution. Distributed content caching systems are expected to grow substantially in the future, in terms of both footprint and traffic carried, and, as such, will become substantially more complex and costly. This thesis addresses the problem of designing scalable and cost-effective distributed caching systems that can efficiently support the expected massive growth of content traffic, and it makes three distinct contributions. First, it produces an extensive theoretical characterisation of sharding, which is a widely used technique to allocate data items to resources of a distributed system according to a hash function. Based on the findings of this analysis, two systems are designed that contribute to the above-mentioned objective. The first is a framework and related algorithms for enabling efficient load-balanced content caching. This solution provides qualitative advantages over previously proposed solutions, such as ease of modelling and availability of knobs to fine-tune performance, as well as quantitative advantages, such as a 2x increase in cache hit ratio and a 19-33% reduction in load imbalance while maintaining comparable latency to other approaches. The second is the design and implementation of a caching node enabling 20 Gbps speeds based on inexpensive commodity hardware.
We believe these contributions significantly advance the state of the art in distributed caching systems.
Load Imbalance and Caching Performance of Sharded Systems
Sharding is a method for allocating data items to nodes of a distributed caching or storage system based on the result of a hash function computed on the item’s identifier. It is ubiquitously used in key-value stores, CDNs and many other applications. Despite considerable work that has focused on the design and implementation of such systems, there is limited understanding of their performance in realistic operational conditions from a theoretical standpoint. In this paper we fill this gap by providing a thorough modeling of sharded caching systems, focusing particularly on load balancing and caching performance aspects. Our analysis provides important insights that can be applied to optimize the design and configuration of sharded caching systems.
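Sharding as defined above, and the load imbalance it can produce, can be sketched in a few lines. The choice of SHA-256, the function names `shard` and `load_imbalance`, the max-over-mean imbalance metric and the synthetic request counts are assumptions for illustration and are not taken from the paper.

```python
import hashlib


def shard(item_id: str, n_nodes: int) -> int:
    """Map an item identifier to a node index with a hash function."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_nodes


def load_imbalance(request_counts: dict, n_nodes: int) -> float:
    """Ratio of the busiest node's load to the mean load (1.0 is perfect)."""
    load = [0] * n_nodes
    for item_id, count in request_counts.items():
        load[shard(item_id, n_nodes)] += count
    return max(load) / (sum(load) / n_nodes)


# Hypothetical workloads: equal demand per item versus Zipf-like demand.
flat_counts = {f"item-{i}": 100 for i in range(10000)}
zipf_counts = {f"item-{i}": round(1_000_000 / (i + 1)) for i in range(10000)}

print(load_imbalance(flat_counts, 8), load_imbalance(zipf_counts, 8))
```

With equal demand per item the hash spreads load almost evenly, whereas with skewed demand the node that happens to hold the most popular items carries visibly more than its share.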
Over-the-top multimedia content delivery: a case study of automatic recordings (Entrega de conteúdos multimédia em over-the-top: caso de estudo das gravações automáticas)
Doctorate in Electrical Engineering
Over-The-Top (OTT) multimedia delivery is a very appealing approach for providing
ubiquitous, flexible, and globally accessible services capable of low-cost
and unrestrained device targeting. In spite of its appeal, the underlying delivery
architecture must be carefully planned and optimized to maintain a high
Quality-of-Experience (QoE) and rational resource usage, especially when migrating from
services running on managed networks with established quality guarantees. To address
the lack of holistic research works on OTT multimedia delivery systems, this
Thesis focuses on an end-to-end optimization challenge, considering a migration
use-case of a popular Catch-up TV service from managed IP Television (IPTV)
networks to OTT. A global study is conducted on the importance of Catch-up
TV and its impact on today's society, demonstrating the growing popularity of
this time-shift service, its relevance in the multimedia landscape, and its fitness as
an OTT migration use-case. Catch-up TV consumption logs are obtained from
a Pay-TV operator's live production IPTV service with over 1 million subscribers
to characterize demand and extract insights from service utilization at a
scale and scope not yet addressed in the literature. This characterization is used
to build demand forecasting models relying on machine learning techniques to enable
static and dynamic optimization of OTT multimedia delivery solutions. These models
are able to produce accurate forecasts of bandwidth and storage requirements, and
may be used to achieve considerable power and cost savings whilst maintaining a
high QoE. A novel caching algorithm, Most Popularly Used (MPU), is proposed,
implemented, and shown to outperform established caching algorithms in both
simulation and experimental scenarios. The need for accurate QoE measurements
in OTT scenarios supporting HTTP Adaptive Streaming (HAS) motivates the creation
of a new QoE model capable of taking into account the impact of key HAS
aspects. By addressing the complete content delivery pipeline in the envisioned
content-aware OTT Content Delivery Network (CDN), this Thesis demonstrates
that significant improvements are possible in next-generation multimedia delivery
solutions.
Using Context to Improve Network-based Exploit Kit Detection
Today, our computers are routinely compromised while performing seemingly innocuous activities like reading articles on trusted websites (e.g., the NY Times). These compromises are perpetrated via complex interactions involving the advertising networks that monetize these sites. Web-based compromises such as exploit kits are similar to any other scam -- the attacker wants to lure an unsuspecting client into a trap to steal private information or resources -- generating tens of millions of dollars annually. Exploit kits are web-based services specifically designed to capitalize on vulnerabilities in unsuspecting client computers in order to install malware without a user's knowledge. Sadly, it only takes a single successful infection to ruin a user's financial life, or lead to corporate breaches that result in millions of dollars of expense and loss of customer trust. Exploit kits use a myriad of techniques to obfuscate each attack instance, making current network-based defenses such as signature-based network intrusion detection systems far less effective than in years past. Dynamic analysis or honeyclient analysis on these exploits plays a key role in identifying new attacks for signature generation, but provides no means of inspecting end-user traffic on the network to identify attacks in real time. As a result, defenses designed to stop such malfeasance often arrive too late or not at all, resulting in high false positive and false negative (error) rates. In order to deal with these drawbacks, three new detection approaches are presented. To deal with the issue of a high number of errors, a new technique for detecting exploit kit interactions on a network is proposed. The technique capitalizes on the fact that an exploit kit leads its potential victim through a process of exploitation by forcing the browser to download multiple web resources from malicious servers.
This process has an inherent structure that can be captured in HTTP traffic and used to significantly reduce error rates. The approach organizes HTTP traffic into tree-like data structures, and, using a scalable index of exploit kit traces as samples, models the detection process as a subtree similarity search problem. The technique is evaluated on 3,800 hours of web traffic on a large enterprise network, and results show that it reduces false positive rates by four orders of magnitude over current state-of-the-art approaches. While utilizing structure can vastly improve detection rates over current approaches, it does not go far enough in helping defenders detect new, previously unseen attacks. As a result, a new framework that applies dynamic honeyclient analysis directly on network traffic at scale is proposed. The framework captures and stores a configurable window of reassembled HTTP objects network-wide, uses lightweight content rendering to establish the chain of requests leading up to a suspicious event, then serves the initial response content back to the honeyclient in an isolated network. The framework is evaluated on a diverse collection of exploit kits as they evolve over a one-year period. The empirical evaluation suggests that the approach offers significant operational value, and a single honeyclient can support a campus deployment of thousands of users. While the above approaches attempt to detect exploit kits before they have a chance to infect the client, they cannot protect a client that has already been infected. The final technique detects signs of post-infection behavior by intrusions that abuse the domain name system (DNS) to make contact with an attacker. Contemporary detection approaches utilize the structure of a domain name and require hundreds of DNS messages to detect such malware. As a result, these detection mechanisms cannot detect malware in a timely manner and are susceptible to high error rates.
The final technique, based on sequential hypothesis testing, uses the DNS message patterns of a subset of DNS traffic to detect malware in as little as four DNS messages, and with orders of magnitude reduction in error rates. The results of this work can make a significant operational impact on network security analysis, and open several exciting future directions for network security research.
Doctor of Philosophy
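The sequential hypothesis testing mentioned in the abstract above can be illustrated with Wald's sequential probability ratio test over a stream of binary observations. This is a generic sketch, not the thesis's detector: treating each DNS message as "suspicious" or "not suspicious", the probabilities `p0` and `p1`, the error targets and the function name `sprt` are all assumptions.

```python
import math


def sprt(observations, p0=0.1, p1=0.8, alpha=0.01, beta=0.01):
    """Wald's SPRT: decide between benign (p0) and infected (p1).

    Each observation is True if a DNS message looks suspicious; p0 and p1
    are the assumed probabilities of that event under each hypothesis.
    Returns the decision and the number of messages consumed.
    """
    upper = math.log((1 - beta) / alpha)    # accept "infected" at or above
    lower = math.log(beta / (1 - alpha))    # accept "benign" at or below
    llr = 0.0                               # running log-likelihood ratio
    for n, suspicious in enumerate(observations, start=1):
        if suspicious:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "infected", n
        if llr <= lower:
            return "benign", n
    return "undecided", len(observations)


print(sprt([True, True, True, True]))
print(sprt([False, False, False, False, False]))
```

The test stops as soon as the accumulated evidence crosses either threshold, which is why a decision can be reached after only a handful of messages when the two hypotheses are well separated.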