Web Caching and Prefetching with Cyclic Model Analysis of Web Object Sequences
Web caching is the process in which web objects are temporarily stored to reduce bandwidth consumption, server load, and latency. Web prefetching is the process of fetching web objects from the server before they are actually requested by the client. Integrating caching and prefetching can be highly beneficial, as the two techniques support each other. By implementing this integrated scheme in a client-side proxy, perceived latency can be reduced not for one but for many users. In this paper, we propose a new integrated caching and prefetching policy, WCP-CMA, which uses a profit-driven caching policy that takes into account the periodicity and cyclic behaviour of web access sequences to derive prefetching rules. Our experimental results show a 10%-15% increase in the hit ratio of cached objects and a 5%-10% decrease in delay compared to existing schemes.
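As a rough illustration of the periodicity idea behind WCP-CMA (the abstract does not give the exact profit function or cycle analysis, so the gap-variance test and `tolerance` threshold below are assumptions), a minimal sketch that flags objects with near-constant inter-access gaps and predicts their next access position:

```python
from statistics import mean, pstdev

def periodic_candidates(sequence, tolerance=0.25):
    """Return {obj: predicted_next_position} for objects whose
    access gaps look periodic (low relative gap deviation)."""
    positions = {}
    for i, obj in enumerate(sequence):
        positions.setdefault(obj, []).append(i)
    predictions = {}
    for obj, pos in positions.items():
        if len(pos) < 3:
            continue  # need several accesses to judge periodicity
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        avg = mean(gaps)
        if avg > 0 and pstdev(gaps) / avg <= tolerance:
            predictions[obj] = pos[-1] + round(avg)
    return predictions
```

A prefetcher could then issue requests for objects whose predicted position is close to the current position in the access stream.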
Web Proxy Cache Replacement Policies Using Decision Tree (DT) Machine Learning Technique for Enhanced Performance of Web Proxy
A web cache is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it, and subsequent requests may be satisfied from the cache if certain conditions are met. In this paper, Decision Tree (DT), a machine learning technique, is used to improve the performance of traditional web proxy caching policies such as SIZE and Hybrid. DT is integrated with these traditional techniques to form improved caching approaches known as DT-SIZE and DT-Hybrid. The proposed approaches are evaluated by trace-driven simulation and compared with the traditional policies. Experimental results reveal that DT-SIZE and DT-Hybrid significantly increase the pure hit ratio and byte hit ratio and reduce latency compared with SIZE and Hybrid.
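The abstract does not spell out the features or splits the trained tree uses, so the following sketch is only a hedged illustration: a hand-coded decision function stands in for a learned DT, and its prediction biases a SIZE-style eviction toward large objects the tree expects never to be re-requested. The feature set (frequency, recency, size) and the thresholds are assumptions.

```python
def dt_will_revisit(freq, recency, size):
    """Hand-coded stand-in for a trained decision tree: predicts
    whether an object is likely to be requested again. A real
    DT-SIZE system would learn these splits from proxy logs."""
    if freq >= 3:
        return True              # frequently seen objects tend to recur
    if recency <= 10 and size <= 4096:
        return recency <= 5      # small, recently seen objects may recur
    return False

def dt_size_victim(cache):
    """cache: {obj_id: (freq, recency, size)}. Evict the largest
    object among those the tree predicts will not be revisited
    (DT-SIZE idea); fall back to plain SIZE if the tree keeps all."""
    doomed = [o for o, f in cache.items() if not dt_will_revisit(*f)]
    pool = doomed if doomed else list(cache)
    return max(pool, key=lambda o: cache[o][2])
```

In a real DT-SIZE deployment, `dt_will_revisit` would be replaced by a classifier trained on trace logs.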
Active caching for recommender systems
Web users are often overwhelmed by the amount of information available while carrying out browsing and searching tasks. Recommender systems substantially reduce this information overload by suggesting a list of similar documents that users might find interesting. However, generating these ranked lists requires an enormous amount of resources, which often results in access latency. Caching frequently accessed data has long been a useful technique for reducing stress on limited resources and improving response time, but traditional passive caching techniques, which focus on answering queries based on temporal locality or popularity, achieve only a limited performance gain. In this dissertation, we propose an 'active caching' technique for recommender systems as an extension of the caching model. In this approach, estimation is used to generate an answer for queries whose results are not explicitly cached, where the estimation makes use of the partial-order lists cached for related queries. By answering non-cached queries along with cached ones, the active caching system acts as a form of query processor and offers substantial improvement over traditional caching methodologies. Test results for several data sets and recommendation techniques show substantial improvement in cache hit rate, byte hit rate, and CPU cost, while achieving reasonable recall rates. To improve the performance of the proposed active-caching solution, a shared-neighbour similarity measure is introduced, which improves recall by eliminating the dependence on monotonicity in the partial-order lists. Finally, a greedy balancing cache-selection policy is proposed to select the most appropriate data objects for the cache, further improving the cache hit rate and recall.
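The dissertation's estimator for non-cached queries is not detailed in this abstract; as a minimal sketch of the idea, assuming an answer can be approximated by aggregating the cached partial-order lists of related queries, a Borda-style merge:

```python
def estimate_ranking(related_lists, top_k=5):
    """Active-caching sketch: build an answer for a non-cached query
    by Borda-style aggregation of the ranked lists cached for related
    queries. (The dissertation's estimator is richer; this only
    illustrates answering from neighbours' cached lists.)"""
    scores = {}
    for ranked in related_lists:
        n = len(ranked)
        for rank, doc in enumerate(ranked):
            scores[doc] = scores.get(doc, 0) + (n - rank)  # higher = better
    # sort by descending score, breaking ties by document id
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]
```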
Web pre-fetching schemes using Machine Learning for Mobile Cloud Computing
Pre-fetching is one of the technologies used to reduce latency in network traffic on the Internet. We propose applying this technology in a Mobile Cloud Computing (MCC) environment to handle latency issues in the context of data management. However, overaggressive use of pre-fetching causes overhead and slows down system performance, since pre-fetching the wrong objects wastes the storage capacity of a mobile device. Many studies have used Machine Learning (ML) to solve such issues; in the MCC environment, however, pre-fetching using ML is not widely applied. This research therefore implements ML techniques to classify the web objects that require decision rules. These decision rules are generated using several ML algorithms: J48, Random Tree (RT), Naive Bayes (NB), and Rough Set (RS). The rules represent the characteristics of the input data. The experimental results reveal that J48 performs well in classifying web objects across all three datasets, with testing accuracies of 95.49%, 98.28%, and 97.9% for the UTM blog data, IRCache, and Proxy Cloud Computing (CC) datasets respectively. This shows that the J48 algorithm is capable of handling cloud data management well, providing good recommendations to users with or without cloud storage.
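A minimal sketch of applying learned decision rules to web objects, with entirely hypothetical rules of the kind J48 or rough-set mining might emit (the real rules, features, and thresholds come from the trained models on the datasets above):

```python
# Hypothetical decision rules, illustrative only: each entry is a
# (predicate over object features, decision); the first match wins,
# mimicking an ordered rule list exported from a classifier.
RULES = [
    (lambda o: o["type"] == "image" and o["freq"] >= 2, "prefetch"),
    (lambda o: o["size_kb"] > 512,                      "skip"),
    (lambda o: o["freq"] >= 3,                          "prefetch"),
]

def classify(obj, default="skip"):
    """Classify a web object as a pre-fetch candidate or not."""
    for cond, decision in RULES:
        if cond(obj):
            return decision
    return default
```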
Topics in Power Usage in Network Services
The rapid advance of computing technology has created a world powered by millions of computers. Often these computers idly consume energy unnecessarily, in spite of the efforts of hardware manufacturers. This thesis examines proposals to determine when to power down computers without negatively impacting the services they deliver, compares and contrasts the efficiency of virtualisation with containerisation, and investigates the energy efficiency of the popular cryptocurrency Bitcoin.

We begin by examining the current corpus of literature and defining the key terms we need to proceed. We then propose a technique for improving the energy consumption of servers by moving them into a sleep state and employing a low-powered device to act as a proxy in their place. After this we investigate the energy efficiency of virtualisation, comparing the two most common means used to achieve it. Moving on, we look at the cryptocurrency Bitcoin, considering the energy consumption of bitcoin mining and whether, compared with the value of bitcoin, mining is profitable. Finally we conclude by summarising the results and findings of this thesis. This work increases our understanding of some of the challenges of energy-efficient computation, as well as proposing novel mechanisms to save energy.
Techniques of data prefetching, replication, and consistency in the Internet
The Internet has become a major infrastructure for information sharing in our daily life, and is indispensable to critical and large applications in industry, government, business, and education. Internet bandwidth (the network speed to transfer data) has increased dramatically; however, latency (the delay to physically access data) has been reduced at a much slower pace. The rich bandwidth and lagging latency can be effectively coped with in Internet systems by three data management techniques: caching, replication, and prefetching. The focus of this dissertation is to address the latency problem in the Internet by utilizing the rich bandwidth and large storage capacity to prefetch data efficiently and thereby significantly improve Web content caching performance, and by proposing and implementing scalable data-consistency maintenance methods to handle Web address caching in the Domain Name System (DNS) and massive data replication in peer-to-peer systems. While the DNS service is critical to the Internet, peer-to-peer data sharing has become an important Internet activity.

We have made three contributions in developing prefetching techniques. First, we have proposed an efficient data structure for maintaining Web access information, called popularity-based Prediction by Partial Matching (PB-PPM), in which data are placed and replaced guided by the popularity of Web accesses, so that only important and useful information is stored. PB-PPM greatly reduces the required storage space and improves prediction accuracy. Second, a major weakness of existing Web servers is that prefetching activities are scheduled independently of dynamically changing server workloads. Without proper control and coordination between the two kinds of activities, prefetching can negatively affect Web services and degrade Web access performance. To address this problem, we have developed a queuing model to characterize the interactions. Guided by the model, we have designed a coordination scheme that dynamically adjusts the prefetching aggressiveness in Web servers. This scheme not only prevents the Web servers from being overloaded, but also minimizes the average server response time. Finally, we have proposed a scheme that effectively coordinates the sharing of access information between proxies and Web servers. With the support of this scheme, the accuracy of prefetching decisions is significantly improved.

Regarding data-consistency support for Internet caching and data replication, we have conducted three significant studies. First, we have developed a consistency-support technique to maintain data consistency among the replicas in structured P2P networks. Based on Pastry, an existing and popular P2P system, we have implemented this scheme and shown that it can effectively maintain consistency while preventing hot-spot and node-failure problems. Second, we have designed and implemented a DNS cache-update protocol, called DNScup, to provide strong consistency for domain/IP mappings. Finally, we have developed a dynamic lease scheme to update the replicas in the Internet in a timely manner.
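As a sketch of the Prediction-by-Partial-Matching idea with popularity-based pruning (the actual PB-PPM data structure and its placement/replacement rules are richer than shown; the order-2 limit and pruning threshold here are assumptions):

```python
from collections import defaultdict, Counter

class PBPPM:
    """Minimal order-2 PPM sketch with popularity-based pruning:
    contexts whose total count falls below a threshold are dropped,
    keeping only popular access patterns (the spirit of PB-PPM,
    not its exact data structure)."""
    def __init__(self, min_count=2):
        self.model = defaultdict(Counter)   # context tuple -> next-page counts
        self.recent = []
        self.min_count = min_count

    def observe(self, page):
        for k in (1, 2):
            if len(self.recent) >= k:
                self.model[tuple(self.recent[-k:])][page] += 1
        self.recent.append(page)

    def prune(self):
        for ctx in [c for c, cnt in self.model.items()
                    if sum(cnt.values()) < self.min_count]:
            del self.model[ctx]

    def predict(self):
        for k in (2, 1):                    # longest matching context first
            ctx = tuple(self.recent[-k:])
            if len(self.recent) >= k and ctx in self.model:
                return self.model[ctx].most_common(1)[0][0]
        return None
```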
Over-The-Top multimedia content delivery: a case study of Catch-up TV (automatic recordings)
Doctoral thesis in Electrical Engineering. Over-The-Top (OTT) multimedia delivery is a very appealing approach for providing ubiquitous, flexible, and globally accessible services capable of low-cost and unrestrained device targeting. In spite of its appeal, the underlying delivery architecture must be carefully planned and optimized to maintain a high Quality of Experience (QoE) and rational resource usage, especially when migrating from services running on managed networks with established quality guarantees. To address the lack of holistic research on OTT multimedia delivery systems, this Thesis focuses on an end-to-end optimization challenge, considering a migration use-case of a popular Catch-up TV service from managed IP Television (IPTV) networks to OTT. A global study is conducted on the importance of Catch-up TV and its impact in today's society, demonstrating the growing popularity of this time-shift service, its relevance in the multimedia landscape, and its fitness as an OTT migration use-case. Catch-up TV consumption logs are obtained from a Pay-TV operator's live production IPTV service with over 1 million subscribers to characterize demand and extract insights from service utilization at a scale and scope not yet addressed in the literature. This characterization is used to build demand-forecasting models relying on machine learning techniques to enable static and dynamic optimization of OTT multimedia delivery solutions; these models produce accurate bandwidth and storage-requirement forecasts and may be used to achieve considerable power and cost savings whilst maintaining a high QoE. A novel caching algorithm, Most Popularly Used (MPU), is proposed, implemented, and shown to outperform established caching algorithms in both simulation and experimental scenarios. The need for accurate QoE measurement in OTT scenarios supporting HTTP Adaptive Streaming (HAS) motivates the creation of a new QoE model capable of taking into account the impact of key HAS aspects. By addressing the complete content delivery pipeline in the envisioned content-aware OTT Content Delivery Network (CDN), this Thesis demonstrates that significant improvements are possible in next-generation multimedia delivery solutions.
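The abstract names the MPU algorithm without detailing it, so the sketch below deliberately shows only the generic family MPU belongs to: a cache that counts request popularity and evicts the least-popular entry on overflow. It is not the Thesis's actual MPU algorithm.

```python
class PopularityCache:
    """Generic popularity-driven cache: every request increments a
    popularity counter; on overflow the least-popular item is
    evicted. An illustration of popularity-based caching for
    catch-up TV assets, not the actual MPU algorithm."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}      # item -> payload
        self.hits = {}       # item -> request count (hits and misses)

    def request(self, item, fetch):
        self.hits[item] = self.hits.get(item, 0) + 1
        if item in self.store:
            return self.store[item]          # cache hit
        if len(self.store) >= self.capacity:
            victim = min(self.store, key=lambda i: (self.hits[i], i))
            del self.store[victim]           # evict least popular
        self.store[item] = fetch(item)
        return self.store[item]
```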
Workload characterization and customer interaction at e-commerce web servers
Electronic commerce servers have a significant presence in today's Internet. Corporations want to maintain high availability, sufficient capacity, and satisfactory performance for their E-commerce Web systems, and to provide satisfactory services to customers. Workload characterization and the analysis of customers' interactions with Web sites are the bases upon which to analyze server performance, plan system capacity, manage system resources, and personalize services at the Web site. To date, little empirical evidence has been reported identifying the workload characteristics of E-commerce systems and the behaviour of their customers.
This thesis analyzes the Web access logs of public Web sites for three organizations: a car rental company, an IT company, and the Computer Science department of the University of Saskatchewan. In these case studies, the characteristics of Web workloads are explored at the request level, function level, resource level, and session level; customers' interactions with Web sites are analyzed by identifying and characterizing session groups.

The main E-commerce Web workload characteristics and performance implications are: i) requests for dynamic Web objects are an important part of the workload and should be characterized separately, since the system processes them differently; ii) some popular image files embedded in the same Web page are always requested together, and if these files are requested and sent in a bundle, a system can greatly reduce the overhead of processing requests for them; iii) the percentage of requests for each Web page category tends to be stable in the workload when the time scale is large enough, an observation helpful in forecasting workload composition; iv) the Secure Sockets Layer (SSL) protocol is heavily used, and most Web objects are requested either primarily through SSL or primarily not through SSL; and v) session groups of different characteristics are identified in all logs, and their analysis may help improve system performance, maximize the revenue throughput of the system, provide better services to customers, and manage and plan system resources.

A hybrid clustering algorithm, combining the minimum-spanning-tree method with the k-means clustering algorithm, is proposed to identify session clusters. Session clusters obtained using the three session representations (Pages Requested, Navigation Pattern, and Resource Usage) are similar enough that the different representations can be used interchangeably to produce similar groupings. A grouping based on one session representation is believed to be sufficient to answer questions in server performance, resource management, capacity planning, and Web site personalization that would previously have required multiple different groupings. Grouping by Pages Requested is recommended, since it is the simplest and data on requested Web pages is relatively easy to obtain from HTTP logs.
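A minimal sketch of the hybrid idea, assuming the MST is used to seed k-means: cut the k-1 longest edges of the Euclidean minimum spanning tree to obtain initial clusters, then refine with standard Lloyd iterations (the thesis's exact combination and its session-distance measure may differ).

```python
from math import dist

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph over 2-D points."""
    n = len(points)
    edges = []
    best = {i: (dist(points[0], points[i]), 0) for i in range(1, n)}
    while best:
        j = min(best, key=lambda i: best[i][0])
        d, parent = best.pop(j)
        edges.append((d, parent, j))
        for i in best:
            d2 = dist(points[j], points[i])
            if d2 < best[i][0]:
                best[i] = (d2, j)
    return edges

def hybrid_kmeans(points, k, iters=20):
    """Cut the k-1 longest MST edges for initial clusters, then
    refine with Lloyd iterations. Returns a label per point."""
    kept = sorted(mst_edges(points))[:-(k - 1)] if k > 1 else mst_edges(points)
    parent = list(range(len(points)))       # union-find over kept edges
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, a, b in kept:
        parent[find(a)] = find(b)
    comps = {}
    labels = [comps.setdefault(find(i), len(comps)) for i in range(len(points))]
    for _ in range(iters):                  # Lloyd refinement
        cents = {}
        for lab, p in zip(labels, points):
            sx, sy, c = cents.get(lab, (0.0, 0.0, 0))
            cents[lab] = (sx + p[0], sy + p[1], c + 1)
        cents = {lab: (sx / c, sy / c) for lab, (sx, sy, c) in cents.items()}
        labels = [min(cents, key=lambda l: dist(p, cents[l])) for p in points]
    return labels
```

Seeding k-means from MST components avoids the poor random initialisations that plain Lloyd's algorithm is sensitive to.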
On Applications of Relational Data
With the advances of technology and the popularity of the Internet, a large amount of data is being generated and collected. Much of it is relational data, which describes how people and things, or entities, are related to one another. For example, data from sales transactions on e-commerce websites tell us which customers buy or view which products. Analyzing the known relationships in relational data can help us discover knowledge that benefits businesses, organizations, and our lives. For instance, learning which products are commonly bought together allows businesses to recommend products to customers and increase their sales. Hidden or new relationships can also be inferred from relational data. In addition, based on the connections among entities, we can approximate the level of relatedness between two entities, even when their relationship is hard to observe or quantify directly.
This research aims to explore novel applications of relational data that help to improve our lives in various respects, such as business operations, experiences in using online services, and health care. In applying relational data in any domain, there are two common challenges. First, the size of the data can be massive, yet many applications require results within a short time. Second, relational data are often noisy and incomplete: many relationships are extracted automatically from text resources and hence are prone to errors. Our goal is not only to propose novel applications of relational data but also to develop techniques and algorithms that make such applications practical. This work addresses three novel applications. The first uses relational data to improve user experiences in online video-sharing services. Second, we propose the use of relational data to find entities that are closely related to one another; such problems arise in various domains, such as product recommendation and query suggestion. Third, we propose the use of relational data to assist medical practitioners in drug prescription. For these applications, we introduce several techniques and algorithms to address the aforementioned challenges. Our approaches are evaluated extensively to demonstrate their effectiveness, and they not only serve the specific applications we discuss but can also help to facilitate and promote the use of relational data in other application domains.
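As a minimal sketch of approximating relatedness from connections (the measures developed in this work are not specified in the abstract; Jaccard overlap of neighbour sets is one standard stand-in):

```python
def jaccard_relatedness(relations, a, b):
    """Approximate how related two entities are in relational data:
    the Jaccard overlap of their neighbour sets, e.g. the customers
    who interacted with each of two products. One simple instance
    of inferring relatedness from connections."""
    na, nb = relations.get(a, set()), relations.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0
```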