
    Adaptive and secured resource management in distributed and Internet systems

    The effectiveness of computer system resource management has always been determined by two major factors: (1) workload demands and management objectives, and (2) updates in computer technology. Both factors change dynamically, and resource management systems must adapt to the changes in a timely manner. This dissertation addresses several important and related resource management issues. We first study memory system utilization in centralized servers by improving the memory performance of sorting algorithms, which provides a fundamental understanding of memory system organization and its performance optimization for data-intensive workloads. To reduce different types of cache misses, we restructure the mergesort and quicksort algorithms by integrating tiling, padding, and buffering techniques and by repartitioning the data set. Our study shows substantial performance improvements from our new methods. We have further extended the work to improve load sharing for utilizing global memory resources in distributed systems. Aiming to reduce the memory resource contention caused by page faults and I/O activities, we have developed and examined load sharing policies that consider effective usage of global memory in addition to CPU load balancing in both homogeneous and heterogeneous clusters. Extending our research from clusters to Internet systems, we have further investigated memory and storage utilization in Web caching systems. We have proposed several novel management schemes to restructure and decentralize the existing caching system by exploiting data locality at different levels of the global memory hierarchy and by effectively sharing data objects among the clients and their proxy caches. Our decentralized Web caching system design raises data integrity and communication anonymity issues, which are also security concerns for general peer-to-peer systems. We propose an integrity protocol to ensure data integrity, and several protocols to achieve mutual communication anonymity between an information requester and a provider. The potential impact and contributions of this dissertation are briefly stated as follows. (1) The two major research topics identified in this dissertation are fundamentally important for the growth and development of information technology, and will remain in demand for the long term. (2) Our proposed cache-effective sorting methods bridge a serious gap between the analytical complexity of algorithms and their execution complexity in practice, a gap caused by the increasingly deep memory hierarchy in computer systems. This approach can also be used to improve memory performance at different levels of the memory hierarchy, such as I/O and file systems. (3) Our load sharing principle of giving high priority to requests for data accesses in memory and I/O adapts in a timely manner to technology changes and responds effectively to the increasing demand of data-intensive applications. (4) Our proposed decentralized Web caching framework and its resource management schemes present a comprehensive case study for examining the P2P model. Our results and experiences can be used for related and further studies in distributed computing. (5) The proposed data integrity and communication anonymity protocols address limits and weaknesses of existing ones, and lay a solid foundation for us to continue our work in this important area.

    A Literature Survey of Cooperative Caching in Content Distribution Networks

    Content distribution networks (CDNs), which deliver web objects (e.g., documents, applications, music, and video), have seen tremendous growth since their emergence. To minimize the retrieval delay experienced by a user requesting a web object, caching strategies are often applied: content is replicated at the edges of the network, closer to the user, so that the network distance between the user and the object is reduced. This literature survey studies the evolution of caching. A recent research paper [15] in the field of large-scale caching for CDNs was chosen as the anchor paper, which serves as a guide to the topic. Later studies relevant to the anchor paper are also analyzed to better evaluate its statements and results and, more importantly, to obtain an unbiased view of large-scale collaborative caching systems as a whole. Comment: 5 pages, 5 figures

    Content Distribution in P2P Systems

    The report provides a literature review of the state of the art in content distribution. Its contributions are threefold. First, it gives more insight into traditional Content Distribution Networks (CDNs), their requirements, and open issues. Second, it discusses Peer-to-Peer (P2P) systems as a cheap and scalable alternative to CDNs and extracts their design challenges. Finally, it evaluates the existing P2P systems dedicated to content distribution according to the identified requirements and challenges.

    Traffic analysis of Internet user behavior and content demand patterns

    The study of Internet traffic is relevant to improving users' quality of service. Knowing which services are most popular and at which hours the most users are active makes it possible to estimate the amount of inbound and outbound traffic produced, and hence to design a network able to support the expected activity. A network implemented with that knowledge can considerably reduce users' waiting time, improving their experience on the web. Analyses of user traffic and user demand patterns already exist. However, the data used in these studies has not been updated, so their results may be obsolete and considerable changes may have occurred since. This bachelor's thesis studies the amount of inbound and outbound traffic produced by different applications and its evolution, taking both past and current data into account. This allows us to understand the changes that occurred from 2007 to 2015 and to observe current trends. In addition, user demand patterns at the beginning of 2016 are analyzed and compared with previous results. The evolution of traffic shows changes in users' preferences, although their demand patterns remain the same as in previous years. The results of this thesis confirm the predicted increase in streaming media Internet traffic: streaming media is found to be the dominant share of total traffic, with Netflix as the major contributor.

    Characterizing Popularity Dynamics of User-generated Videos: A Category-based Study of YouTube

    Understanding the growth pattern of content popularity has become a subject of immense interest to Internet service providers, content makers, and online advertisers. This understanding is also important for the sustainable development of content distribution systems. To characterize this growth pattern, a significant amount of research has analyzed the popularity growth patterns of YouTube videos. Unfortunately, no work has intensively investigated the popularity patterns of YouTube videos by video category. In this thesis, an in-depth analysis of the popularity pattern of YouTube videos is performed, considering the categories of videos. Metadata and request patterns were collected by employing category-specific YouTube crawlers. The request patterns were observed for a period of five months. Results confirm that the time-varying popularity of different YouTube categories is conspicuously different, although some sets of categories have very similar viewing patterns. In particular, News and Sports exhibit similar growth curves, as do Music and Film. While for some categories views at early ages can be used to predict future popularity, for others predicting future popularity is a challenging task that requires more sophisticated techniques, e.g., time-series clustering. The outcomes of these analyses are instrumental in designing a reliable workload generator, which can in turn be used to evaluate different caching policies for YouTube and similar sites. In this thesis, workload generators for four of the YouTube categories are developed. The performance of these workload generators suggests that a complete category-specific workload generator can be developed using time-series clustering. Patterns of users' interaction with YouTube videos are also analyzed from a dataset collected in a local network. This analysis shows possible ways of improving the performance of Peer-to-Peer video distribution techniques, along with a new video recommendation method.

    Techniques of data prefetching, replication, and consistency in the Internet

    The Internet has become a major infrastructure for information sharing in our daily life, and is indispensable to critical and large applications in industry, government, business, and education. Internet bandwidth (the network speed for transferring data) has increased dramatically; however, latency (the delay to physically access data) has been reduced at a much slower pace. The rich bandwidth and lagging latency can be effectively coped with in Internet systems by three data management techniques: caching, replication, and prefetching. This dissertation addresses the latency problem in the Internet by utilizing the rich bandwidth and large storage capacity to prefetch data efficiently and thereby significantly improve Web content caching performance, and by proposing and implementing scalable data consistency maintenance methods to handle Internet Web address caching in the Domain Name System (DNS) and massive data replication in peer-to-peer systems. While the DNS service is critical to the Internet, peer-to-peer data sharing is becoming accepted as an important Internet activity. We have made three contributions in developing prefetching techniques. First, we have proposed an efficient data structure for maintaining Web access information, called popularity-based Prediction by Partial Matching (PB-PPM), where data are placed and replaced guided by popularity information of Web accesses, so that only important and useful information is stored. PB-PPM greatly reduces the required storage space and improves the prediction accuracy. Second, a major weakness in existing Web servers is that prefetching activities are scheduled independently of dynamically changing server workloads. Without proper control and coordination between the two kinds of activities, prefetching can negatively affect Web services and degrade Web access performance. To address this problem, we have developed a queuing model to characterize the interactions. Guided by the model, we have designed a coordination scheme that dynamically adjusts the prefetching aggressiveness in Web servers. This scheme not only prevents the Web servers from being overloaded, but can also minimize the average server response time. Finally, we have proposed a scheme that effectively coordinates the sharing of access information for both proxy and Web servers. With the support of this scheme, the accuracy of prefetching decisions is significantly improved. Regarding data consistency support for Internet caching and data replication, we have conducted three significant studies. First, we have developed a consistency support technique to maintain data consistency among the replicas in structured P2P networks. We have implemented this scheme on Pastry, an existing and popular P2P system, and show that it can effectively maintain consistency while preventing hot-spot and node-failure problems. Second, we have designed and implemented a DNS cache update protocol, called DNScup, to provide strong consistency for domain/IP mappings. Finally, we have developed a dynamic lease scheme to update replicas in the Internet in a timely manner.

    RAFDA: A Policy-Aware Middleware Supporting the Flexible Separation of Application Logic from Distribution

    Middleware technologies often limit the way in which object classes may be used in distributed applications due to the fixed distribution policies that they impose. These policies permeate applications developed using existing middleware systems and force an unnatural encoding of application level semantics. For example, the application programmer has no direct control over inter-address-space parameter passing semantics. Semantics are fixed by the distribution topology of the application, which is dictated early in the design cycle. This creates applications that are brittle with respect to changes in distribution. This paper explores technology that provides control over the extent to which inter-address-space communication is exposed to programmers, in order to aid the creation, maintenance and evolution of distributed applications. The described system permits arbitrary objects in an application to be dynamically exposed for remote access, allowing applications to be written without concern for distribution. Programmers can conceal or expose the distributed nature of applications as required, permitting object placement and distribution boundaries to be decided late in the design cycle and even dynamically. Inter-address-space parameter passing semantics may also be decided independently of object implementation and at varying times in the design cycle, again possibly as late as run-time. Furthermore, transmission policy may be defined on a per-class, per-method or per-parameter basis, maximizing plasticity. This flexibility is of utility in the development of new distributed applications, and the creation of management and monitoring infrastructures for existing applications. Comment: Submitted to EuroSys 200

    Design and evaluation of content distribution networks for multimedia streaming services.

    Traditional Internet-based services for distributing files, such as Web browsing and sending e-mail, are offered through a single central server. More recent network services such as interactive digital television or video-on-demand, however, require strong quality-of-service (QoS) guarantees, such as a low and constant network delay, and consume a considerable amount of network bandwidth. Architectures with a single central server can hardly provide these guarantees and therefore no longer meet the high demands of the next generation of multimedia applications. This research therefore studies new network architectures that can support such quality of service. Both peer-to-peer mechanisms, as used for exchanging music files between end users, and server-based solutions, such as distributed caches and content distribution networks (CDNs), are covered. Depending on the service studied and the network technologies and architecture used, centralized algorithms for network design are proposed. These algorithms optimize the placement of servers or network caches and determine the required capacity of the servers and network links. The dynamic placement of the offered files in the various network elements is adapted to the prevailing state of the network and to the varying request patterns of the end users. Server selection, request rerouting, and spreading the load across the entire network are also addressed.