2,722 research outputs found
A Brief History of Web Crawlers
Web crawlers visit internet applications, collect data, and learn about new
web pages from visited pages. Web crawlers have a long and interesting history.
Early web crawlers collected statistics about the web. In addition to
collecting statistics about the web and indexing the applications for search
engines, modern crawlers can be used to perform accessibility and vulnerability
checks on the application. Quick expansion of the web, and the complexity added
to web applications have made the process of crawling a very challenging one.
Throughout the history of web crawling many researchers and industrial groups
addressed different issues and challenges that web crawlers face. Different
solutions have been proposed to reduce the time and cost of crawling.
Performing an exhaustive crawl is a challenging question. Additionally
capturing the model of a modern web application and extracting data from it
automatically is another open question. What follows is a brief history of
different technique and algorithms used from the early days of crawling up to
the recent days. We introduce criteria to evaluate the relative performance of
web crawlers. Based on these criteria we plot the evolution of web crawlers and
compare their performanc
Structural Learning of Attack Vectors for Generating Mutated XSS Attacks
Web applications suffer from cross-site scripting (XSS) attacks that
resulting from incomplete or incorrect input sanitization. Learning the
structure of attack vectors could enrich the variety of manifestations in
generated XSS attacks. In this study, we focus on generating more threatening
XSS attacks for the state-of-the-art detection approaches that can find
potential XSS vulnerabilities in Web applications, and propose a mechanism for
structural learning of attack vectors with the aim of generating mutated XSS
attacks in a fully automatic way. Mutated XSS attack generation depends on the
analysis of attack vectors and the structural learning mechanism. For the
kernel of the learning mechanism, we use a Hidden Markov model (HMM) as the
structure of the attack vector model to capture the implicit manner of the
attack vector, and this manner is benefited from the syntax meanings that are
labeled by the proposed tokenizing mechanism. Bayes theorem is used to
determine the number of hidden states in the model for generalizing the
structure model. The paper has the contributions as following: (1)
automatically learn the structure of attack vectors from practical data
analysis to modeling a structure model of attack vectors, (2) mimic the manners
and the elements of attack vectors to extend the ability of testing tool for
identifying XSS vulnerabilities, (3) be helpful to verify the flaws of
blacklist sanitization procedures of Web applications. We evaluated the
proposed mechanism by Burp Intruder with a dataset collected from public XSS
archives. The results show that mutated XSS attack generation can identify
potential vulnerabilities.Comment: In Proceedings TAV-WEB 2010, arXiv:1009.330
- …