3,330 research outputs found

    Realistic Traffic Generation for Web Robots

    Full text link
    Critical to evaluating the capacity, scalability, and availability of web systems are realistic web traffic generators. Web traffic generation is a classic research problem, no generator accounts for the characteristics of web robots or crawlers that are now the dominant source of traffic to a web server. Administrators are thus unable to test, stress, and evaluate how their systems perform in the face of ever increasing levels of web robot traffic. To resolve this problem, this paper introduces a novel approach to generate synthetic web robot traffic with high fidelity. It generates traffic that accounts for both the temporal and behavioral qualities of robot traffic by statistical and Bayesian models that are fitted to the properties of robot traffic seen in web logs from North America and Europe. We evaluate our traffic generator by comparing the characteristics of generated traffic to those of the original data. We look at session arrival rates, inter-arrival times and session lengths, comparing and contrasting them between generated and real traffic. Finally, we show that our generated traffic affects cache performance similarly to actual traffic, using the common LRU and LFU eviction policies.Comment: 8 page

    Bringing Web Time Travel to MediaWiki: An Assessment of the Memento MediaWiki Extension

    Full text link
    We have implemented the Memento MediaWiki Extension Version 2.0, which brings the Memento Protocol to MediaWiki, used by Wikipedia and the Wikimedia Foundation. Test results show that the extension has a negligible impact on performance. Two 302 status code datetime negotiation patterns, as defined by Memento, have been examined for the extension: Pattern 1.1, which requires 2 requests, versus Pattern 2.1, which requires 3 requests. Our test results and mathematical review find that, contrary to intuition, Pattern 2.1 performs better than Pattern 1.1 due to idiosyncrasies in MediaWiki. In addition to implementing Memento, Version 2.0 allows administrators to choose the optional 200-style datetime negotiation Pattern 1.2 instead of Pattern 2.1. It also permits administrators the ability to have the Memento MediaWiki Extension return full HTTP 400 and 500 status codes rather than using standard MediaWiki error pages. Finally, version 2.0 permits administrators to turn off recommended Memento headers if desired. Seeing as much of our work focuses on producing the correct revision of a wiki page in response to a user's datetime input, we also examine the problem of finding the correct revisions of the embedded resources, including images, stylesheets, and JavaScript; identifying the issues and discussing whether or not MediaWiki must be changed to support this functionality.Comment: 23 pages, 18 figures, 9 tables, 17 listing

    Deterministic Object Management in Large Distributed Systems

    Get PDF
    Caching is a widely used technique to improve the scalability of distributed systems. A central issue with caching is maintaining object replicas consistent with their master copies. Large distributed systems, such as the Web, typically deploy heuristic-based consistency mechanisms, which increase delay and place extra load on the servers, while not providing guarantees that cached copies served to clients are up-to-date. Server-driven invalidation has been proposed as an approach to strong cache consistency, but it requires servers to keep track of which objects are cached by which clients. We propose an alternative approach to strong cache consistency, called MONARCH, which does not require servers to maintain per-client state. Our approach builds on a few key observations. Large and popular sites, which attract the majority of the traffic, construct their pages from distinct components with various characteristics. Components may have different content types, change characteristics, and semantics. These components are merged together to produce a monolithic page, and the information about their uniqueness is lost. In our view, pages should serve as containers holding distinct objects with heterogeneous type and change characteristics while preserving the boundaries between these objects. Servers compile object characteristics and information about relationships between containers and embedded objects into explicit object management commands. Servers piggyback these commands onto existing request/response traffic so that client caches can use these commands to make object management decisions. The use of explicit content control commands is a deterministic, rather than heuristic, object management mechanism that gives content providers more control over their content. The deterministic object management with strong cache consistency offered by MONARCH allows content providers to make more of their content cacheable. Furthermore, MONARCH enables content providers to expose internal structure of their pages to clients. We evaluated MONARCH using simulations with content collected from real Web sites. The results show that MONARCH provides strong cache consistency for all objects, even for unpredictably changing ones, and incurs smaller byte and message overhead than heuristic policies. The results also show that as the request arrival rate or the number of clients increases, the amount of server state maintained by MONARCH remains the same while the amount of server state incurred by server invalidation mechanisms grows

    A framework for the dynamic management of Peer-to-Peer overlays

    Get PDF
    Peer-to-Peer (P2P) applications have been associated with inefficient operation, interference with other network services and large operational costs for network providers. This thesis presents a framework which can help ISPs address these issues by means of intelligent management of peer behaviour. The proposed approach involves limited control of P2P overlays without interfering with the fundamental characteristics of peer autonomy and decentralised operation. At the core of the management framework lays the Active Virtual Peer (AVP). Essentially intelligent peers operated by the network providers, the AVPs interact with the overlay from within, minimising redundant or inefficient traffic, enhancing overlay stability and facilitating the efficient and balanced use of available peer and network resources. They offer an “insider‟s” view of the overlay and permit the management of P2P functions in a compatible and non-intrusive manner. AVPs can support multiple P2P protocols and coordinate to perform functions collectively. To account for the multi-faceted nature of P2P applications and allow the incorporation of modern techniques and protocols as they appear, the framework is based on a modular architecture. Core modules for overlay control and transit traffic minimisation are presented. Towards the latter, a number of suitable P2P content caching strategies are proposed. Using a purpose-built P2P network simulator and small-scale experiments, it is demonstrated that the introduction of AVPs inside the network can significantly reduce inter-AS traffic, minimise costly multi-hop flows, increase overlay stability and load-balancing and offer improved peer transfer performance

    How the Web was Won : Keeping the computer networking curriculum current with HTTP/2

    Get PDF
    The Internet and the Web continue to grow in their pervasiveness and as new functionality and behavior emerge it is a challenge to keep the computer networking curriculum up to date. There are many excellent networking textbooks available but they cannot always keep pace with the rate of change. Recent developments in HTTP are a good example of this situation. Since around 2012 many of the web transactions between popular browsers and major web sites have been using a protocol called SPDY, which operates significantly differently from HTTP version 1.1 - the version covered in networking textbooks. SPDY has been largely adopted into the final standard of HTTP version 2. This paper seeks to fill the gap between current textbooks and the versions of HTTP now in use. It gives an overview of HTTP evolution from a technical perspective before suggesting materials and approaches that can be used as learning resources for the topic and how conceptual understanding can be reinforced through hands-on activities which use browsers' native network monitoring capabilities and other readily available tools.Postprin

    ACTS in Need: Automatic Configuration Tuning with Scalability Guarantees

    Full text link
    To support the variety of Big Data use cases, many Big Data related systems expose a large number of user-specifiable configuration parameters. Highlighted in our experiments, a MySQL deployment with well-tuned configuration parameters achieves a peak throughput as 12 times much as one with the default setting. However, finding the best setting for the tens or hundreds of configuration parameters is mission impossible for ordinary users. Worse still, many Big Data applications require the support of multiple systems co-deployed in the same cluster. As these co-deployed systems can interact to affect the overall performance, they must be tuned together. Automatic configuration tuning with scalability guarantees (ACTS) is in need to help system users. Solutions to ACTS must scale to various systems, workloads, deployments, parameters and resource limits. Proposing and implementing an ACTS solution, we demonstrate that ACTS can benefit users not only in improving system performance and resource utilization, but also in saving costs and enabling fairer benchmarking

    IETF standardization in the field of the Internet of Things (IoT): a survey

    Get PDF
    Smart embedded objects will become an important part of what is called the Internet of Things. However, the integration of embedded devices into the Internet introduces several challenges, since many of the existing Internet technologies and protocols were not designed for this class of devices. In the past few years, there have been many efforts to enable the extension of Internet technologies to constrained devices. Initially, this resulted in proprietary protocols and architectures. Later, the integration of constrained devices into the Internet was embraced by IETF, moving towards standardized IP-based protocols. In this paper, we will briefly review the history of integrating constrained devices into the Internet, followed by an extensive overview of IETF standardization work in the 6LoWPAN, ROLL and CoRE working groups. This is complemented with a broad overview of related research results that illustrate how this work can be extended or used to tackle other problems and with a discussion on open issues and challenges. As such the aim of this paper is twofold: apart from giving readers solid insights in IETF standardization work on the Internet of Things, it also aims to encourage readers to further explore the world of Internet-connected objects, pointing to future research opportunities

    A novel defense mechanism against web crawler intrusion

    Get PDF
    Web robots also known as crawlers or spiders are used by search engines, hackers and spammers to gather information about web pages. Timely detection and prevention of unwanted crawlers increases privacy and security of websites. In this research, a novel method to identify web crawlers is proposed to prevent unwanted crawler to access websites. The proposed method suggests a five-factor identification process to detect unwanted crawlers. This study provides the pretest and posttest results along with a systematic evaluation of web pages with the proposed identification technique versus web pages without the proposed identification process. An experiment was performed with repeated measures for two groups with each group containing ninety web pages. The outputs of the logistic regression analysis of treatment and control groups confirm the novel five-factor identification process as an effective mechanism to prevent unwanted web crawlers. This study concluded that the proposed five distinct identifier process is a very effective technique as demonstrated by a successful outcome
    • 

    corecore