
    Quality-driven management of video streaming services in segment-based cache networks


    Web Archive Services Framework for Tighter Integration Between the Past and Present Web

    Web archives have contained the cultural history of the web for many years, but they still offer only limited means of access. Most web archiving research has focused on crawling and preservation activities, with little focus on delivery methods. The current access methods are tightly coupled with web archive infrastructure, hard to replicate or integrate with other web archives, and do not cover all of the users' needs. In this dissertation, we focus on access methods for archived web data so that users, third-party developers, researchers, and others can gain knowledge from the web archives. We build ArcSys, a new service framework that extracts, preserves, and exposes APIs for the web archive corpus. The dissertation introduces a novel categorization technique that divides the archived corpus into four levels. For each level, we propose suitable services and APIs that enable both users and third-party developers to build new interfaces. The first level is the content level, which extracts the content from the archived web data. We develop ArcContent to expose web archive content processed through various filters. The second level is the metadata level; we extract the metadata from the archived web data and make it available to users. We implement two services: ArcLink for the temporal web graph and ArcThumb for optimizing thumbnail creation in web archives. The third level is the URI level, which uses the URI HTTP redirection status to enhance the user query. Finally, the highest level in the web archiving service framework pyramid is the archive level. At this level, we define the web archive by the characteristics of its corpus and build Web Archive Profiles. The profiles are used by the Memento Aggregator for query optimization.
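    To make the level structure concrete, the sketch below shows how a third-party client might call content-level and metadata-level services of this kind over HTTP. The base URL, endpoint paths, and parameter names are illustrative assumptions for this sketch, not the actual ArcSys interface.

```python
# Hypothetical client sketch for level-specific archive services in the
# spirit of ArcSys. The host, paths, and parameters are placeholders.
import requests

ARCSYS_BASE = "https://archive.example.org/api"  # placeholder base URL


def get_filtered_content(uri: str, timestamp: str, filter_name: str = "text") -> str:
    """Content level: fetch archived content passed through a named filter."""
    resp = requests.get(
        f"{ARCSYS_BASE}/arccontent",
        params={"uri": uri, "timestamp": timestamp, "filter": filter_name},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.text


def get_temporal_links(uri: str) -> dict:
    """Metadata level: retrieve the temporal web graph entry for a URI."""
    resp = requests.get(f"{ARCSYS_BASE}/arclink", params={"uri": uri}, timeout=10)
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    # Example call against the placeholder endpoint.
    print(get_temporal_links("http://example.com/"))
```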

    Web page performance analysis

    Computer systems play an increasingly crucial and ubiquitous role in human endeavour by carrying out or facilitating tasks and providing information and services. How much work these systems can accomplish, within a certain amount of time, using a certain amount of resources, characterises the systems’ performance, which is a major concern when the systems are planned, designed, implemented, and deployed, and as they evolve. As one of the most popular computer systems, the Web is inevitably scrutinised in terms of performance analysis, which deals with its speed, capacity, resource utilisation, and availability. Performance analyses for the Web are normally done from the perspective of the Web servers and the underlying network (the Internet). This research, on the other hand, approaches Web performance analysis from the perspective of Web pages. The performance metric of interest here is response time. Response time is studied as an attribute of Web pages, instead of being considered purely a result of network and server conditions. A framework consisting of the measurement, modelling, and monitoring (3Ms) of Web pages, revolving around response time, is adopted to support the performance analysis activity. The measurement module enables Web page response time to be measured and is used to support the modelling module, which in turn provides references for the monitoring module. The monitoring module estimates response time. The three modules are used in the software development lifecycle to ensure that developed Web pages deliver at worst satisfactory response time (within a maximum acceptable time), or preferably much better response time, thereby maximising the efficiency of the pages. The framework proposes a systematic way to understand response time as it relates to specific characteristics of Web pages and explains how the response time of individual Web pages can be examined and improved.
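    As a minimal illustration of the measurement and monitoring ideas, the sketch below times a fetch of a page's base HTML document and flags it against a maximum acceptable response time. The threshold value and the single-request timing are simplifying assumptions for this sketch; the dissertation's framework covers whole pages and a richer model.

```python
# Measurement/monitoring sketch: time one fetch and compare it to an
# assumed acceptability threshold. Only the base document is timed here.
import time
import requests

MAX_ACCEPTABLE_SECONDS = 8.0  # assumed satisfaction threshold for this sketch


def measure_response_time(url: str) -> float:
    """Measure elapsed time for fetching the page's base HTML document."""
    start = time.perf_counter()
    requests.get(url, timeout=30)
    return time.perf_counter() - start


def monitor(url: str) -> None:
    """Report whether the measured response time is within the threshold."""
    elapsed = measure_response_time(url)
    status = "OK" if elapsed <= MAX_ACCEPTABLE_SECONDS else "TOO SLOW"
    print(f"{url}: {elapsed:.2f}s [{status}]")


if __name__ == "__main__":
    monitor("https://example.com/")
```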

    Secure and efficient processing of outsourced data structures using trusted execution environments

    In recent years, more and more companies make use of cloud computing; in other words, they outsource data storage and data processing to a third party, the cloud provider. From cloud computing, companies expect, for example, cost reductions, fast deployment, and improved security. However, security also presents a significant challenge, as demonstrated by many cloud computing–related data breaches. Whether it is due to failing security measures, government interventions, or internal attackers, data leakages can have severe consequences, e.g., revenue loss, damage to brand reputation, and loss of intellectual property. A valid strategy to mitigate these consequences is data encryption during storage, transport, and processing. Nevertheless, outsourced data processing should combine three properties: strong security, high efficiency, and arbitrary processing capabilities. Many approaches for outsourced data processing based purely on cryptography are available, for instance encrypted storage of outsourced data, property-preserving encryption, fully homomorphic encryption, searchable encryption, and functional encryption. However, all of these approaches fail in at least one of the three mentioned properties. Besides approaches purely based on cryptography, some approaches use a trusted execution environment (TEE) to process data at a cloud provider. TEEs provide an isolated processing environment for user-defined code and data, i.e., the confidentiality and integrity of code and data processed in this environment are protected against other software and physical accesses. Additionally, TEEs promise efficient data processing. Various research papers use TEEs to protect objects at different levels of granularity. At one end of the range, TEEs can protect entire (legacy) applications. This approach eases the development of protected applications, as it requires only minor changes. However, the downsides of this approach are that the attack surface is large, it is difficult to capture the exact leakage, and it might not even be possible because the isolated environment of commercially available TEEs is limited. At the other end of the range, TEEs can protect individual, stateless operations, which are called from otherwise unchanged applications. This approach does not suffer from the problems stated before, but it leaks the (encrypted) result of each operation and the detailed control flow through the application. It is difficult to capture the leakage of this approach because it depends on the processed operation and the operation's location in the code. In this dissertation, we propose a trade-off between both approaches: the TEE-based processing of data structures. In this approach, otherwise unchanged applications call a TEE for self-contained data structure operations and receive encrypted results. We examine three data structures: TEE-protected B+-trees, TEE-protected database dictionaries, and TEE-protected file systems. Using these data structures, we design three secure and efficient systems: an outsourced system for index searches; an outsourced, dictionary-encoding–based, column-oriented, in-memory database supporting analytic queries on large datasets; and an outsourced system for group file sharing supporting large and dynamic groups. With our approach, the systems have a small attack surface and a low likelihood of security-relevant bugs, and a data owner can easily perform a (formal) code verification of the sensitive code. At the same time, we prevent low-level leakage of individual operation results. For all systems, we present a thorough security evaluation showing lower bounds of security. Additionally, we use prototype implementations to present upper bounds on performance. For our implementations, we use a widely available TEE that has a limited isolated environment: Intel Software Guard Extensions. By comparing our systems to related work, we show that they provide a favorable trade-off regarding security and efficiency.
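    The sketch below illustrates the architectural pattern rather than the dissertation's implementation: an otherwise unchanged application calls a self-contained data-structure operation and only ever receives an encrypted result. The Python class stands in for code that would run inside a TEE such as Intel SGX; the class name and the use of Fernet encryption are assumptions made for illustration only.

```python
# Conceptual sketch of TEE-based processing of a data structure. In a real
# system the dictionary state, key, and operations would live inside the
# enclave; here an ordinary class merely models that boundary.
from cryptography.fernet import Fernet  # pip install cryptography


class TrustedDict:
    """Stand-in for a TEE-protected database dictionary (value -> code)."""

    def __init__(self) -> None:
        self._key = Fernet.generate_key()   # would be sealed inside the TEE
        self._cipher = Fernet(self._key)
        self._codes: dict[str, int] = {}    # plaintext state, enclave-only

    def encode(self, value: str) -> bytes:
        """Self-contained operation: map a value to a dictionary code and
        return only an encrypted code to the untrusted caller."""
        code = self._codes.setdefault(value, len(self._codes))
        return self._cipher.encrypt(str(code).encode())


# Untrusted application code: it can store the opaque token and pass it back
# to the trusted side, but it never learns plaintext codes or the dictionary.
tee = TrustedDict()
token = tee.encode("Berlin")
print(token)  # ciphertext only
```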

    Effective web crawlers

    Web crawlers are the components of a search engine that traverse the Web, gathering documents in a local repository for indexing so that they can be ranked by their relevance to user queries. Whenever data is replicated in an autonomously updated environment, there are issues with maintaining up-to-date copies of documents. When documents retrieved by a crawler are subsequently altered on the Web, the effect is an inconsistency in user search results. While the impact depends on the type and volume of change, many existing algorithms do not take the degree of change into consideration, instead using simple measures that consider any change as significant. Furthermore, many crawler evaluation metrics do not consider index freshness or the amount of impact that crawling algorithms have on user results. Most of the existing work makes assumptions about the change rate of documents on the Web, or relies on the availability of a long history of change. Our work investigates approaches to improving index consistency: detecting meaningful change, measuring the impact of a crawl on collection freshness from a user perspective, developing a framework for evaluating crawler performance, determining the effectiveness of stateless crawl ordering schemes, and proposing and evaluating the effectiveness of a dynamic crawl approach. Our work is concerned specifically with cases where there are few or no past change statistics with which predictions can be made. Our work analyses different measures of change and introduces a novel approach to measuring the impact of recrawl schemes on search engine users. Our schemes detect important changes that affect user results; other well-known and widely used schemes have to retrieve around twice the data to achieve the same effectiveness as ours. Furthermore, while many studies have assumed that the Web changes according to a model, our experimental results are based on real web documents. We analyse various stateless crawl ordering schemes that have no past change statistics with which to predict which documents will change, none of which, to our knowledge, has been tested to determine effectiveness in crawling changed documents. We empirically show that the effectiveness of these schemes depends on the topology and dynamics of the domain crawled and that no single static crawl ordering scheme can effectively maintain freshness, motivating our work on dynamic approaches. We present our novel approach to maintaining freshness, which uses the anchor text linking documents to determine the likelihood of a document changing, based on statistics gathered during the current crawl. We show that this scheme is highly effective when combined with existing stateless schemes. When we combine our scheme with PageRank, our approach allows the crawler to improve both the freshness and the quality of a collection. Our scheme improves freshness regardless of which stateless scheme it is used in conjunction with, since it uses both positive and negative reinforcement to determine which document to retrieve. Finally, we present the design and implementation of Lara, our own distributed crawler, which we used to develop our testbed.
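    The sketch below illustrates a dynamic crawl-ordering idea in this spirit: anchor-text terms on links to documents that turned out to have changed (or not) during the current crawl receive positive (or negative) reinforcement, and pending URLs are ordered by the summed weight of their anchor text. The weights, the update rule, and the scoring are illustrative assumptions, not the thesis's exact algorithm.

```python
# Anchor-text-driven recrawl ordering sketch with positive and negative
# reinforcement gathered during the current crawl.
import heapq
from collections import defaultdict

term_weight: dict[str, float] = defaultdict(float)


def reinforce(anchor_terms: list[str], changed: bool) -> None:
    """Reward terms whose target document had changed; penalise otherwise."""
    delta = 1.0 if changed else -0.5  # assumed reinforcement magnitudes
    for term in anchor_terms:
        term_weight[term] += delta


def priority(anchor_terms: list[str]) -> float:
    """Estimated change likelihood of a URL from its incoming anchor text."""
    return sum(term_weight[t] for t in anchor_terms)


def order_frontier(frontier: list[tuple[str, list[str]]]) -> list[str]:
    """Return frontier URLs ordered by descending estimated change likelihood."""
    heap = [(-priority(terms), url) for url, terms in frontier]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# Example: after "news"-anchored pages are observed to change, they are
# recrawled before "archive"-anchored pages.
reinforce(["news", "today"], changed=True)
reinforce(["archive", "2001"], changed=False)
print(order_frontier([("http://a.example/", ["news"]),
                      ("http://b.example/", ["archive"])]))
```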

    Bandwidth management and monitoring for IP network traffic : an investigation

    Bandwidth management is a topic which is often discussed, but on which relatively little work has been done with regard to compiling a comprehensive set of techniques and methods for managing traffic on a network. What work has been done has concentrated on higher-end networks, rather than the low-bandwidth links which are commonly available in South Africa and other areas outside the United States. With more organisations increasingly making use of the Internet on a daily basis, the demand for bandwidth is outstripping the ability of providers to upgrade their infrastructure. This resource is therefore in need of management. In addition, for Internet access to become economically viable for widespread use by schools, NGOs, and other academic institutions, the associated costs need to be controlled. Bandwidth management not only has an impact on direct cost control, but also encompasses the process of engineering a network and network resources in order to ensure the provision of as optimal a service as possible. Included in this is the provision of user education. Software has been developed for the implementation of traffic quotas, dynamic firewalling, and visualisation. The research investigates various methods for monitoring and managing IP traffic, with particular applicability to low-bandwidth links. Several forms of visualisation for the analysis of historical and near-realtime traffic data are also discussed, including the use of three-dimensional landscapes. A number of bandwidth management practices are proposed, and the advantages of their combined and complementary use are highlighted. By implementing these suggested policies, a holistic approach can be taken to the issue of bandwidth management on Internet links.
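    As a small illustration of the traffic-quota idea, the sketch below accumulates per-user byte counts and reports users who have exceeded a cap, so that a separate mechanism (for example, inserting a firewall rule) could throttle or block them. The quota size, the accounting period, and the data source are assumptions made for illustration.

```python
# Minimal per-user traffic-quota sketch that could drive dynamic firewalling.
from collections import defaultdict

MONTHLY_QUOTA_BYTES = 500 * 1024 * 1024  # assumed 500 MB cap per user

usage: dict[str, int] = defaultdict(int)


def record_transfer(user: str, nbytes: int) -> None:
    """Accumulate observed traffic per user (source of counts is assumed)."""
    usage[user] += nbytes


def over_quota() -> list[str]:
    """Users whose accumulated traffic exceeds the cap this period."""
    return [u for u, total in usage.items() if total > MONTHLY_QUOTA_BYTES]


record_transfer("alice", 600 * 1024 * 1024)
record_transfer("bob", 10 * 1024 * 1024)
print(over_quota())  # ['alice'] -> candidate for a dynamic firewall rule
```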

    Managing cache for efficient query processing

    Ph.D. (Doctor of Philosophy)

    Speeding Up Mobile Browsers without Infrastructure Support

    Mobile browsers are known to be slow. We characterize the performance of mobile browsers and find that resource loading is the bottleneck. Leveraging an unprecedented set of web usage data collected from 24 iPhone users continuously over one year, we examine the three fundamental, orthogonal approaches to improving resource loading without infrastructure support: caching, prefetching, and speculative loading, which is first proposed and studied in this work. Speculative loading predicts and speculatively loads the subresources needed to open a webpage once its URL is given. We show that while caching and prefetching offer only limited benefit for mobile browsing, speculative loading can be significantly more effective. Empirically, we show that client-only solutions can reduce browser delay by 1.4 seconds on average. We also report the design, realization, and evaluation of speculative loading in a WebKit-based browser called Tempo. On average, Tempo can reduce browser delay by 1 second (~20%).
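    The sketch below illustrates the speculative loading idea: once a page URL is known, the subresources observed on earlier visits are fetched in parallel with the main document. The history table and URLs are made-up examples; Tempo's actual prediction logic is more involved than a simple lookup.

```python
# Speculative-loading sketch: prefetch previously observed subresources
# concurrently with the main document fetch.
from concurrent.futures import ThreadPoolExecutor
import requests

# Subresources observed on previous visits, keyed by page URL (assumed data).
SUBRESOURCE_HISTORY = {
    "https://news.example.com/": [
        "https://news.example.com/static/site.css",
        "https://news.example.com/static/app.js",
        "https://cdn.example.net/logo.png",
    ],
}


def speculative_load(page_url: str) -> None:
    """Fetch the page and its predicted subresources in parallel."""
    predicted = SUBRESOURCE_HISTORY.get(page_url, [])
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(requests.get, u, timeout=10) for u in predicted]
        main = pool.submit(requests.get, page_url, timeout=10)
        main.result()           # the page itself
        for f in futures:       # speculatively warmed subresources
            f.result()


speculative_load("https://news.example.com/")
```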