A Brief History of Web Crawlers
Web crawlers visit internet applications, collect data, and learn about new web pages from the pages they visit. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics and indexing applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on an application. The rapid expansion of the web and the growing complexity of web applications have made crawling a very challenging process. Throughout the history of web crawling, many researchers and industrial groups have addressed the issues and challenges that web crawlers face, and various solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl remains a challenging problem, and automatically capturing the model of a modern web application and extracting data from it is another open question. What follows is a brief history of the techniques and algorithms used from the early days of crawling up to the present. We introduce criteria to evaluate the relative performance of web crawlers and, based on these criteria, plot the evolution of web crawlers and compare their performance.
Pidgin Crasher: Searching for Minimised Crashing GUI Event Sequences
We present a search-based testing system that automatically explores the space of all possible GUI event interleavings. Search guides our system to novel crashing sequences using Levenshtein distance and minimises the resulting fault-revealing UI sequences in a post-processing hill climb. We report on the application of our system to the SSBSE 2014 challenge program, Pidgin. Overall, our Pidgin Crasher found 20 different events that caused 2 distinct kinds of bugs, while the event sequences that caused them were reduced by 84% on average by our minimisation post-processor.
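The two ingredients named above can be sketched compactly. The following is a minimal illustration, not the authors' implementation: Levenshtein distance scores how novel a candidate event sequence is relative to an archive of known crashing sequences, and a greedy hill climb drops events one at a time as long as a hypothetical `triggers_crash` oracle (standing in for actually replaying events against the GUI) still reports the fault.

```python
# Sketch: Levenshtein-based novelty scoring and hill-climb minimisation
# of crashing GUI event sequences. `triggers_crash` is a hypothetical
# oracle standing in for replaying an event sequence against the GUI.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def novelty(candidate, archive):
    """Distance to the nearest previously seen crashing sequence."""
    return min((levenshtein(candidate, s) for s in archive),
               default=len(candidate))

def minimise(seq, triggers_crash):
    """Hill climb: greedily drop events while the crash is preserved."""
    changed = True
    while changed:
        changed = False
        for i in range(len(seq)):
            shorter = seq[:i] + seq[i + 1:]
            if triggers_crash(shorter):
                seq, changed = shorter, True
                break
    return seq
```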
Locality-Sensitive Hashing for Efficient Web Application Security Testing
Web application security has become a major concern in recent years, as more
and more content and services are available online. A useful method for
identifying security vulnerabilities is black-box testing, which relies on an
automated crawling of web applications. However, crawling Rich Internet
Applications (RIAs) is a very challenging task. One of the key obstacles
crawlers face is the state similarity problem: how to determine if two
client-side states are equivalent. As current methods do not completely solve
this problem, a successful scan of many real-world RIAs is still not possible.
We present a novel approach to detect redundant content for security testing
purposes. The algorithm applies locality-sensitive hashing using MinHash
sketches in order to analyze the Document Object Model (DOM) structure of web
pages, and to efficiently estimate similarity between them. Our experimental
results show that this approach allows a successful scan of RIAs that cannot be
crawled otherwise.
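As a rough sketch of the idea (not the paper's exact algorithm), MinHash signatures can be computed over a set of DOM features so that the fraction of agreeing signature slots estimates the Jaccard similarity between two pages. The feature choice below, parent/child tag pairs, is an illustrative assumption; the paper's actual DOM features may differ.

```python
# Sketch: estimating DOM similarity with MinHash signatures.
import hashlib
from html.parser import HTMLParser

def dom_features(html):
    """Extract a crude shingle set: the set of parent/child tag pairs."""
    class TagPairs(HTMLParser):
        def __init__(self):
            super().__init__()
            self.stack, self.pairs = [], set()
        def handle_starttag(self, tag, attrs):
            if self.stack:
                self.pairs.add((self.stack[-1], tag))
            self.stack.append(tag)
        def handle_endtag(self, tag):
            if self.stack:
                self.stack.pop()
    p = TagPairs()
    p.feed(html)
    return p.pairs

def minhash_signature(features, num_hashes=64):
    """One minimum per seeded hash function approximates a random permutation."""
    return [min((int.from_bytes(hashlib.blake2b(
                    repr(f).encode(), digest_size=8,
                    salt=seed.to_bytes(16, "little")).digest(), "big")
                 for f in features), default=0)
            for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Two client-side states would then be treated as equivalent when the estimated similarity exceeds a tuned threshold, avoiding a full pairwise DOM comparison during the crawl.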
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for performing data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains.
Reverse Engineering Finite State Machines from Rich Internet Applications
In recent years, Rich Internet Applications (RIAs) have emerged as a new generation of web applications offering greater usability and interactivity than traditional ones. At the same time, RIAs introduce new issues and challenges in all the activities of the web application lifecycle. For example, a key problem with RIAs consists of defining suitable software models for representing them and of validating reverse engineering techniques for obtaining these models effectively. This paper presents a reverse engineering approach for abstracting Finite State Machines that represent the client-side behaviour offered by RIAs. The approach is based on dynamic analysis of the RIA and employs clustering techniques to mitigate the state explosion of the state machine. A case study illustrated in the paper shows the results of a preliminary experiment in which the proposed process was executed successfully to reverse engineer the behaviour of an existing RIA.
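The general shape of such an abstraction can be sketched as follows. This is a minimal illustration under assumed details: the paper's actual clustering technique is not specified here, so a simple greedy threshold-based clustering stands in for it, and `similarity` is a hypothetical pairwise state-comparison function.

```python
# Sketch: abstracting an FSM from dynamically observed RIA states.
# Concrete states whose pairwise similarity exceeds a threshold are
# merged into one abstract FSM state (greedy clustering; the paper's
# actual clustering technique may differ).

def cluster_states(states, similarity, threshold=0.8):
    """Greedily assign each concrete state to the first compatible cluster."""
    clusters = []  # each cluster is a list of indices into `states`
    for i, s in enumerate(states):
        for cluster in clusters:
            if all(similarity(s, states[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def build_fsm(clusters, transitions):
    """Lift concrete (src, event, dst) transitions onto abstract clusters."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    return sorted({(label[src], event, label[dst])
                   for src, event, dst in transitions})
```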
Reverse Engineering and Testing of Rich Internet Applications
The World Wide Web undergoes continuous and constant evolution, with new initiatives, standards, approaches and technologies continuously being proposed for developing more effective and higher-quality Web applications.
To satisfy the market's growing demand for Web applications, new technologies, frameworks, tools and environments that make it possible to develop Web and mobile applications with minimal effort and in very little time have been introduced in recent years.
These new technologies have made possible the dawn of a new generation of Web applications, named Rich Internet Applications (RIAs), that offer greater usability and interactivity than traditional ones. This evolution has been accompanied by some drawbacks, mostly due to a failure to apply well-known software engineering practices and approaches. As a consequence, new research questions and challenges have emerged in the field of web and mobile application maintenance and testing.
The research activity described in this thesis has addressed some of these topics, with the specific aim of proposing new and effective solutions to the problems of modelling, reverse engineering, comprehending, re-documenting and testing existing RIAs.
Due to the growing relevance of mobile applications in the renewed Web scenario, the problem of testing mobile applications developed for the Android operating system has been addressed too, in an attempt to explore and propose new test automation techniques for this type of application.
Advanced Automated Web Application Vulnerability Analysis
Web applications are an integral part of our lives and culture. We use web applications to manage our bank accounts, interact with friends, and file our taxes. A single vulnerability in one of these web applications could allow a malicious hacker to steal your money, to impersonate you on Facebook, or to access sensitive information, such as tax returns. It is vital that we develop new approaches to discover and fix these vulnerabilities before cybercriminals exploit them. In this dissertation, I will present my research on securing the web against current and future threats. First, I will discuss my work on improving black-box vulnerability scanners, which are tools that can automatically discover vulnerabilities in web applications. Then, I will describe a new type of web application vulnerability: Execution After Redirect, or EAR, and an approach to automatically detect EARs in web applications. Finally, I will present deDacota, a first step in the direction of making web applications secure by construction.
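For illustration, the Execution After Redirect pattern can be shown in a few lines. The example below uses Python/Flask purely as an assumed vehicle (the original EAR work analyzed server-side frameworks such as Ruby on Rails), and `db_delete_user` is a hypothetical helper: the redirect response is created but never returned, so the privileged code after it still executes.

```python
# Sketch of an Execution After Redirect (EAR) bug in a Flask view.
from flask import Flask, redirect, session

app = Flask(__name__)
app.secret_key = "dev-only-secret"  # hypothetical configuration

def db_delete_user(name):
    """Hypothetical stand-in for a destructive database operation."""
    print(f"user {name} deleted")

@app.route("/admin/delete_user/<name>")
def delete_user(name):
    if not session.get("is_admin"):
        redirect("/login")   # BUG: the redirect response is created but
                             # not returned, so execution falls through
    db_delete_user(name)     # runs even for unauthenticated visitors
    return f"deleted {name}"
    # The fix: write `return redirect("/login")` so the handler stops.
```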
The cause, development and outcome of word-of-mouth marketing: with particular reference to WOM volume, valence and the modeling of viral marketing
Viral marketing is a form of online word-of-mouth (WOM) communication in which individuals are encouraged to pass on promotional messages through social websites. With the growing popularity of online social websites, viral marketing has increasingly garnered the attention of marketers and marketing researchers alike. The two most important WOM attributes highlighted in the extant literature are volume and valence. This thesis looked into the cause, development and outcome of WOM marketing and provided computational models for forecasting the development of WOM volume and valence in viral marketing on social websites. Using data extracted from large-scale web-crawling activities and a series of computer simulation experiments comparable to social websites, the author developed models to predict WOM volume and valence in viral marketing. The model for predicting WOM volume in viral marketing used theories of network topologies. The model for predicting WOM valence in viral marketing used an artificial neural network model. The author discussed the insights from the findings and suggested viral marketing strategies to optimize the performance of WOM volume and valence in social websites. A key contribution of this thesis is its new approaches to modeling and data collection for WOM volume and valence forecasting in viral marketing.
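As a loose illustration of network-topology-based WOM volume modeling, and not the thesis's actual model, an independent-cascade diffusion over a randomly generated network counts how many users a promotional message eventually reaches; all parameters below are illustrative assumptions.

```python
# Loose illustration: word-of-mouth "volume" as the reach of an
# independent-cascade diffusion on a random network.
import random

def random_network(n, avg_degree=4, seed=0):
    """Undirected Erdos-Renyi-style graph as an adjacency list."""
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def cascade(adj, seeds, pass_on_prob=0.1, rng=None):
    """Each newly informed user tries once to inform each neighbour."""
    rng = rng or random.Random(1)
    informed, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in informed and rng.random() < pass_on_prob:
                    informed.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(informed)  # WOM volume: number of users reached

if __name__ == "__main__":
    net = random_network(1000)
    print(cascade(net, seeds=[0, 1, 2]))
```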