
    Realistic Traffic Generation for Web Robots

    Realistic web traffic generators are critical to evaluating the capacity, scalability, and availability of web systems. Although web traffic generation is a classic research problem, no existing generator accounts for the characteristics of web robots or crawlers, which are now the dominant source of traffic to a web server. Administrators are thus unable to test, stress, and evaluate how their systems perform in the face of ever-increasing levels of web robot traffic. To resolve this problem, this paper introduces a novel approach to generating synthetic web robot traffic with high fidelity. The generated traffic accounts for both the temporal and behavioral qualities of robot traffic through statistical and Bayesian models fitted to the properties of robot traffic seen in web logs from North America and Europe. We evaluate the traffic generator by comparing the characteristics of generated traffic to those of the original data, examining session arrival rates, inter-arrival times, and session lengths in both generated and real traffic. Finally, we show that the generated traffic affects cache performance similarly to actual traffic under the common LRU and LFU eviction policies.
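    The abstract does not give the fitted model parameters, so the sketch below only illustrates the general shape of such a generator: it samples session arrivals, within-session inter-arrival times, and session lengths from placeholder distributions (exponential, lognormal, and a heavy-tailed resource popularity are assumptions, not the paper's fitted models) and replays the resulting trace against a simple LRU cache to compare hit ratios.

```python
import random
from collections import OrderedDict

# Placeholder parameters -- the paper fits these to robot traffic in real web logs;
# the values and distribution choices here are illustrative assumptions only.
SESSION_RATE = 0.5             # mean session arrivals per second (exponential)
MEAN_SESSION_LEN = 20          # mean requests per session
REQ_MU, REQ_SIGMA = -1.0, 1.0  # lognormal inter-arrival times between requests
N_RESOURCES = 10_000           # simulated resource space with heavy-tailed popularity

def generate_robot_traffic(n_sessions, seed=42):
    """Yield (timestamp, resource_id) pairs for synthetic robot sessions."""
    rng = random.Random(seed)
    t = 0.0
    for _ in range(n_sessions):
        t += rng.expovariate(SESSION_RATE)                     # session arrival time
        n_requests = 1 + int(rng.expovariate(1.0 / MEAN_SESSION_LEN))
        req_time = t
        for _ in range(n_requests):
            req_time += rng.lognormvariate(REQ_MU, REQ_SIGMA)  # within-session gap
            resource = int(rng.paretovariate(1.2)) % N_RESOURCES  # rough Zipf-like choice
            yield req_time, resource

def lru_hit_ratio(trace, cache_size=512):
    """Replay a trace against an LRU cache and report the hit ratio."""
    cache, hits, total = OrderedDict(), 0, 0
    for _, resource in trace:
        total += 1
        if resource in cache:
            hits += 1
            cache.move_to_end(resource)
        else:
            cache[resource] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict least recently used
    return hits / total if total else 0.0

if __name__ == "__main__":
    trace = list(generate_robot_traffic(n_sessions=1_000))
    print(f"{len(trace)} requests, LRU hit ratio: {lru_hit_ratio(trace):.3f}")
```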

    Security Monitoring of HTTP Traffic Using Extended Flows

    In this paper, we present an analysis of HTTP traffic in a large-scale environment using network flow monitoring extended by parsing HTTP requests. In contrast to previously published analyses, we are the first to classify patterns of HTTP traffic relevant to network security. We describe three classes of HTTP traffic covering brute-force password attacks, connections to proxies, HTTP scanners, and web crawlers. Using this classification, we were able to detect up to 16 previously undetectable brute-force password attacks and 19 HTTP scans per day in our campus network. The activity of proxy servers and web crawlers was also observed. Symptoms of these attacks may be detected by other methods based on traditional flow monitoring, but detection based on the analysis of HTTP requests is more straightforward. We thus confirm the added value of extended flow monitoring in comparison to the traditional method.
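    The paper's exact detection rules are not spelled out in the abstract. As a rough illustration, the Python sketch below classifies extended flow records (flow keys plus parsed HTTP fields such as URL and User-Agent) into the traffic classes mentioned above; the field names and thresholds are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class ExtendedFlow:
    """A flow record extended with parsed HTTP request fields (field names are illustrative)."""
    src_ip: str
    dst_port: int
    url: str
    user_agent: str
    requests: int        # HTTP requests observed in this flow
    distinct_urls: int   # distinct URLs requested

# Hypothetical rules and thresholds -- the paper derives its classes from observed traffic.
def classify(flow: ExtendedFlow) -> str:
    ua = flow.user_agent.lower()
    if any(bot in ua for bot in ("bot", "crawler", "spider")):
        return "web crawler"
    if flow.dst_port in (3128, 8080) or flow.url.startswith("http://"):
        return "proxy connection"        # absolute URI in the request line suggests a proxy
    if flow.distinct_urls > 100 and flow.requests >= flow.distinct_urls:
        return "HTTP scanner"
    if flow.url.endswith(("/login", "/wp-login.php")) and flow.requests > 50:
        return "brute-force password attack"
    return "ordinary traffic"

# Example: a flow repeatedly posting to a login page is flagged as brute force.
print(classify(ExtendedFlow("203.0.113.7", 80, "/wp-login.php", "Mozilla/5.0", 240, 1)))
```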

    A novel defense mechanism against web crawler intrusion

    Web robots, also known as crawlers or spiders, are used by search engines, hackers, and spammers to gather information about web pages. Timely detection and prevention of unwanted crawlers increases the privacy and security of websites. In this research, a novel method to identify web crawlers is proposed to prevent unwanted crawlers from accessing websites. The proposed method uses a five-factor identification process to detect unwanted crawlers. This study provides pretest and posttest results along with a systematic evaluation of web pages protected by the proposed identification technique versus web pages without it. An experiment was performed with repeated measures for two groups, each containing ninety web pages. The results of the logistic regression analysis of the treatment and control groups confirm the novel five-factor identification process as an effective mechanism to prevent unwanted web crawlers. The study concludes that the proposed five-factor identification process is an effective technique, as demonstrated by this successful outcome.
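    The five identification factors themselves are not enumerated in the abstract. The sketch below shows the general shape of such a check under assumed factors and placeholder logistic-regression coefficients; none of the factor names or weights are the study's fitted values.

```python
import math

# Five hypothetical identification factors (the study's actual factors are not
# listed in the abstract); each maps a request context to 0 or 1.
FACTORS = {
    "declared_bot_user_agent": lambda r: int("bot" in r.get("user_agent", "").lower()),
    "requested_robots_txt":    lambda r: int(r.get("robots_txt_hit", False)),
    "followed_hidden_link":    lambda r: int(r.get("followed_trap_link", False)),
    "no_asset_requests":       lambda r: int(not r.get("loaded_css_js_images", True)),
    "high_request_rate":       lambda r: int(r.get("requests_per_minute", 0) > 60),
}

# Placeholder coefficients; the paper fits its model on treatment/control groups of web pages.
INTERCEPT = -3.0
WEIGHTS = {name: 1.5 for name in FACTORS}

def crawler_probability(request_context: dict) -> float:
    """Combine the five factor scores into a probability via a logistic model."""
    z = INTERCEPT + sum(WEIGHTS[name] * f(request_context) for name, f in FACTORS.items())
    return 1.0 / (1.0 + math.exp(-z))

ctx = {"user_agent": "ExampleBot/1.0", "robots_txt_hit": True,
       "followed_trap_link": True, "loaded_css_js_images": False,
       "requests_per_minute": 120}
p = crawler_probability(ctx)
print(f"crawler probability: {p:.2f}", "-> block" if p > 0.5 else "-> allow")
```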

    A study of different web-crawler behaviour

    The article presents a study of web-crawler behaviour on different websites. A classification of web robots, information-gathering tools, and their detection methods is provided. Well-known scrapers and their behaviour are analyzed on the basis of a large set of web-server logs. Experimental results demonstrate that web robots can be distinguished from humans by feature analysis. The results of the research can serve as a basis for the development of a comprehensive intrusion detection and prevention system.
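    As an illustration of the kind of feature analysis involved, the sketch below aggregates a few per-IP features from Common Log Format web-server logs; the specific features chosen here are assumptions, not the article's exact feature set.

```python
import re
from collections import defaultdict

# Common Log Format; the article works with large web-server log sets.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def session_features(log_lines):
    """Aggregate per-IP features often used to separate robots from humans
    (the feature choice here is an assumption, not the article's exact set)."""
    stats = defaultdict(lambda: {"requests": 0, "head": 0, "errors": 0,
                                 "robots_txt": 0, "assets": 0})
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        s = stats[m["ip"]]
        s["requests"] += 1
        s["head"] += m["method"] == "HEAD"
        s["errors"] += m["status"].startswith("4")
        s["robots_txt"] += m["path"] == "/robots.txt"
        s["assets"] += m["path"].endswith((".css", ".js", ".png", ".jpg"))
    return {
        ip: {
            "head_ratio": s["head"] / s["requests"],
            "error_ratio": s["errors"] / s["requests"],
            "fetched_robots_txt": bool(s["robots_txt"]),
            "asset_ratio": s["assets"] / s["requests"],  # robots often skip page assets
        }
        for ip, s in stats.items()
    }

sample = [
    '198.51.100.9 - - [10/Oct/2024:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 310',
    '198.51.100.9 - - [10/Oct/2024:13:55:37 +0000] "GET /page1 HTTP/1.1" 200 5120',
]
print(session_features(sample))
```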

    Evolución y tendencias actuales de los Web crawlers

    The information stored through social network services is a growing source of data with especially dynamic characteristics. The mechanisms responsible for tracking changes in such information (Web crawlers) must be studied regularly, and their algorithms continually reviewed and improved in search of greater efficiency. This document presents the current state of Web crawling algorithms, their trends and developments, and their approach to emerging challenges such as the dynamics of social networks.
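    For context, a minimal breadth-first crawler in Python might look like the sketch below. It is a generic illustration (using the requests and BeautifulSoup libraries), not one of the algorithms surveyed; an incremental crawler for dynamic sources such as social networks would add revisit scheduling for frequently changing URLs.

```python
import time
from collections import deque
from urllib.parse import urljoin
# Third-party libraries, assumed available: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

def crawl(seed_urls, max_pages=50, delay=1.0):
    """Breadth-first crawl from a set of seeds, returning url -> page text.
    A revisit policy for dynamic content is omitted here for brevity."""
    frontier, seen, pages = deque(seed_urls), set(seed_urls), {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            resp = requests.get(url, timeout=10, headers={"User-Agent": "StudyBot/0.1"})
        except requests.RequestException:
            continue
        pages[url] = resp.text
        for link in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
            nxt = urljoin(url, link["href"])
            if nxt.startswith("http") and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
        time.sleep(delay)   # politeness delay between fetches
    return pages
```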

    Vulnerability Assessment of IPv6 Websites to SQL Injection and Other Application Level Attacks

    Given the proliferation of internet-connected devices, IPv6 has been proposed to replace IPv4. Aside from providing a larger address space that can be assigned to internet-enabled devices, it has been suggested that the IPv6 protocol offers increased security because, with the large number of addresses available, standard IP scanning attacks are no longer feasible. However, given attackers' interest in organizations rather than individual devices, most initial points of entry onto an organization's network and its attendant devices remain visible and reachable through web crawling techniques, so attacks on the visible application layer may offer ways to compromise the overall network. In this evaluation, we provide a straightforward implementation of a web crawler in conjunction with a benign black-box penetration testing system and analyze the ease with which SQL injection attacks can be carried out.
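    The abstract does not describe the crawler or testing system in detail. The sketch below illustrates one common error-based heuristic for benign SQL injection probing on sites one is authorized to test: crawl a page for forms, submit a harmless single-quote probe, and flag responses that leak database error messages. The function names, error strings, and overall flow are assumptions, not the authors' implementation.

```python
# Third-party libraries, assumed available: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

# Database error fragments commonly used as error-based SQL injection indicators.
SQL_ERRORS = ("you have an error in your sql syntax", "unclosed quotation mark",
              "sqlstate", "pg_query", "sqlite3.operationalerror")

def probe_forms(page_url, session=None):
    """Submit a benign single-quote probe to each form on a page you are
    authorized to test, and flag responses that leak database errors."""
    s = session or requests.Session()
    soup = BeautifulSoup(s.get(page_url, timeout=10).text, "html.parser")
    findings = []
    for form in soup.find_all("form"):
        action = form.get("action") or page_url
        target = requests.compat.urljoin(page_url, action)
        data = {inp.get("name"): "test'" for inp in form.find_all("input") if inp.get("name")}
        method = (form.get("method") or "get").lower()
        resp = (s.post(target, data=data, timeout=10) if method == "post"
                else s.get(target, params=data, timeout=10))
        if any(err in resp.text.lower() for err in SQL_ERRORS):
            findings.append(target)   # error message suggests unsanitized input
    return findings
```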

    Credence Goods in Online Markets: An Empirical Analysis of Returns and Sales After Returns

    While e-commerce sales continue to grow, product returns remain a key risk to online retailers' profitability. At the same time, credence goods such as sustainable products are becoming increasingly important in retailing. This study combines these two developments and empirically investigates the effect of credence goods on product returns and sales after returns in e-commerce. Furthermore, we assess how third-party assurances can help organizations positively affect customer behavior and reduce product returns of credence goods. Our research is based on unique data from a large-scale online field experiment with 35,000 customers, combined with data on more than one million past transactions by these customers. Surprisingly, the results reveal that credence goods are associated with lower product returns than experience goods. Adding a third-party certificate to the online product description helps to reduce product returns and increase sales after returns. We find that customer relationship strength and price consciousness act as boundary conditions for the certificate's effect on product returns. Our research contributes to signaling theory and extends the IS literature on product uncertainty and returns to the field of credence goods. Furthermore, we provide relevant insights for e-commerce practitioners on how to manage sales and returns of credence goods.

    Effects of reputation and aesthetics on the credibility of search engine results

    Search engines are the primary gatekeepers of online information but are judged differently than traditional gatekeepers due to the interactive and impersonal nature of the online search process. The researcher distributed an online survey with 141 respondents and conducted 22 observational interviews. Information credibility was tested through measures of expertise, goodwill, and trustworthiness, each of which was correlated with perceived reputation and perceived aesthetics. Search engine reputation was found to have moderate correlations with expertise and trustworthiness, and a lesser, but still moderate, correlation with goodwill. Aesthetics was related to the credibility measures in similar but lesser proportions. Interviews indicated search habits such as wariness towards commercial interests and the high impact of search intent on the rigor of credibility judgments.