A Brief History of Web Crawlers

Author: Bochmann Gregor V.
Dinçktürk Mustafa Emre
Hooshmand Salman
Jourdan Guy-Vincent
Mirtaheri Seyed M.
Onut Iosif Viorel
Publication date: 04/05/2014

Web crawlers visit internet applications, collect data, and learn about new web pages from visited pages. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics about the web and indexing the applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on the application. The quick expansion of the web and the complexity added to web applications have made crawling very challenging. Throughout the history of web crawling, many researchers and industrial groups have addressed the different issues and challenges that web crawlers face, and different solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl remains a challenging problem. Capturing the model of a modern web application and extracting data from it automatically is another open question. What follows is a brief history of the different techniques and algorithms used from the early days of crawling up to recent days. We introduce criteria to evaluate the relative performance of web crawlers. Based on these criteria, we plot the evolution of web crawlers and compare their performance.
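The basic loop the abstract describes (visit a page, collect it, learn about new pages from its links) can be sketched as a breadth-first traversal. This is only an illustrative sketch, not a method taken from the paper: the `crawl` function, the `get_links` callback, and the toy `SITE` graph are all hypothetical names, and a real crawler would fetch pages over HTTP instead of reading an in-memory dictionary.

```python
from collections import deque


def crawl(seed, get_links, max_pages=100):
    """Breadth-first crawl: visit pages and learn new URLs from each visited page."""
    frontier = deque([seed])   # URLs discovered but not yet visited
    seen = {seed}              # every URL ever added to the frontier
    order = []                 # URLs in the order they were visited
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


# Hypothetical toy site: each key is a page, each value lists the pages it links to.
SITE = {"/": ["/a", "/b"], "/a": ["/b", "/c"], "/b": ["/"], "/c": []}

visited = crawl("/", lambda url: SITE.get(url, []))
print(visited)  # ['/', '/a', '/b', '/c']
```

The `seen` set is what keeps the crawl from revisiting pages, and the `max_pages` cap is one simple way to bound the time and cost of a crawl.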

arXiv.org e-Print Archive

CiteSeerX

Pidgin Crasher: Searching for Minimised Crashing GUI Event Sequences

Author: Dan H
Harman M
Krinke J
Li L
Marginean A
Wu F
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

We present a search-based testing system that automatically explores the space of all possible GUI event interleavings. Search guides our system to novel crashing sequences using Levenshtein distance and minimises the resulting fault-revealing UI sequences in a post-processing hill climb. We report on the application of our system to the SSBSE 2014 challenge program, Pidgin. Overall, our Pidgin Crasher found 20 different events that caused 2 distinct kinds of bugs, while the event sequences that caused them were reduced by 84% on average using our minimisation post processor.
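The two ingredients named in this abstract, Levenshtein distance between event sequences and a hill-climbing minimiser, can be sketched as follows. The abstract does not give the authors' exact algorithm, so this is a minimal sketch under assumptions: the one-event-deletion neighbourhood, the `crashes` oracle, and the event names are all hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (lists of events or strings)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x with y
        prev = curr
    return prev[-1]


def minimise(events, crashes):
    """Hill climb: keep dropping single events while the sequence still crashes."""
    current = list(events)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if crashes(candidate):
                current = candidate
                improved = True
                break
    return current


def crashes(seq):
    """Hypothetical oracle: the crash needs 'open' somewhere before 'close_twice'."""
    return ("open" in seq and "close_twice" in seq
            and seq.index("open") < seq.index("close_twice"))


trace = ["click", "open", "type", "scroll", "close_twice", "click"]
minimal = minimise(trace, crashes)
print(minimal)                      # ['open', 'close_twice']
print(levenshtein(trace, minimal))  # 4
```

In a search setting, the distance could be used to favour candidate sequences that are far from those already explored, which is one plausible reading of "novel" here.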

Crossref

UCL Discovery

Locality-Sensitive Hashing for Efficient Web Application Security Testing

Author: Ben-Bassat Ilan
Rokah Erez
Publication venue: 'Scitepress'
Publication date: 01/01/2019

Web application security has become a major concern in recent years, as more and more content and services are available online. A useful method for identifying security vulnerabilities is black-box testing, which relies on automated crawling of web applications. However, crawling Rich Internet Applications (RIAs) is a very challenging task. One of the key obstacles crawlers face is the state similarity problem: how to determine if two client-side states are equivalent. As current methods do not completely solve this problem, a successful scan of many real-world RIAs is still not possible. We present a novel approach to detect redundant content for security testing purposes. The algorithm applies locality-sensitive hashing using MinHash sketches in order to analyze the Document Object Model (DOM) structure of web pages, and to efficiently estimate similarity between them. Our experimental results show that this approach allows a successful scan of RIAs that cannot be crawled otherwise.
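To make the idea of comparing DOM structure with MinHash sketches concrete, here is a small sketch using only the Python standard library. It is not the authors' algorithm: the choice of tag-name shingles of length 3, the 64 salted BLAKE2b hash functions, and all function names are assumptions for illustration, and the banding step that turns MinHash signatures into a full locality-sensitive hashing index is omitted.

```python
import hashlib
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the sequence of opening tags, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def dom_shingles(html, k=3):
    """Set of k-length runs of consecutive tag names (assumed DOM abstraction)."""
    parser = TagCollector()
    parser.feed(html)
    tags = parser.tags
    if len(tags) < k:
        return {tuple(tags)}
    return {tuple(tags[i:i + k]) for i in range(len(tags) - k + 1)}


def minhash(shingles, num_hashes=64):
    """One minimum per salted hash function; together they form the sketch."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(" ".join(s).encode(), digest_size=8, salt=salt).digest(),
                "big")
            for s in shingles))
    return signature


def similarity(html_a, html_b):
    """Fraction of matching sketch positions, an estimate of Jaccard similarity."""
    sig_a, sig_b = minhash(dom_shingles(html_a)), minhash(dom_shingles(html_b))
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


page_a = "<html><body><div><ul><li>x</li><li>y</li></ul></div></body></html>"
page_b = "<html><body><div><ul><li>p</li><li>q</li></ul></div></body></html>"
page_c = "<html><body><table><tr><td>1</td></tr></table><form><input></form></body></html>"

print(similarity(page_a, page_b))  # same structure, different text
print(similarity(page_a, page_c))  # different structure
```

Because only tag names are hashed, `page_a` and `page_b` count as the same state even though their text differs, which is the kind of redundancy a crawler would want to detect.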

arXiv.org e-Print Archive

Crossref

Web Data Extraction, Applications and Techniques: A Survey

Author: Emilio Ferrara
Pasquale De Meo
Giacomo Fiumara
Robert Baumgartner
Publication venue: 'Elsevier BV'
Publication date: 09/06/2014

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, and this offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains. Comment: Knowledge-based System

arXiv.org e-Print Archive

Crossref

Reverse Engineering Finite State Machines from Rich Internet Applications

Author: Domenico Amalfitano
Anna Rita Fasolino
Porfirio Tramontana
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

In recent years, Rich Internet Applications (RIAs) have emerged as a new generation of web applications offering greater usability and interactivity than traditional ones. At the same time, RIAs introduce new issues and challenges in all the web application lifecycle activities. For example, a key problem with RIAs consists of defining suitable software models for representing them and validating Reverse Engineering techniques for obtaining these models effectively. This paper presents a reverse engineering approach for abstracting Finite State Machines representing the client-side behaviour offered by RIAs. The approach is based on dynamic analysis of the RIA and employs clustering techniques for solving the problem of state explosion of the state machine. A case study illustrated in the paper shows the results of a preliminary experiment where the proposed process has been executed with success for reverse engineering the behaviour of an existing RIA.

Archivio della ricerca - Università degli studi di Napoli Federico II

Reverse Engineering and Testing of Rich Internet Applications

Author: Amalfitano Domenico
Publication date: 30/11/2011

The World Wide Web experiences a continuous and constant evolution, where new initiatives, standards, approaches and technologies are continuously proposed for developing more effective and higher quality Web applications. To satisfy the growing market demand for Web applications, new technologies, frameworks, tools and environments that make it possible to develop Web and mobile applications with the least effort and in very short time have been introduced in recent years. These new technologies have made possible the dawn of a new generation of Web applications, named Rich Internet Applications (RIAs), that offer greater usability and interactivity than traditional ones. This evolution has been accompanied by some drawbacks that are mostly due to a failure to apply well-known software engineering practices and approaches. As a consequence, new research questions and challenges have emerged in the field of web and mobile applications maintenance and testing. The research activity described in this thesis has addressed some of these topics with the specific aim of proposing new and effective solutions to the problems of modelling, reverse engineering, comprehending, re-documenting and testing existing RIAs. Due to the growing relevance of mobile applications in the renewed Web scenarios, the problem of testing mobile applications developed for the Android operating system has been addressed too, in an attempt to explore and propose new techniques of testing automation for these types of applications.

Università degli Studi di Napoli Federico II Open Archive

Advanced Automated Web Application Vulnerability Analysis

Author: Doupé Adam
Publication venue: eScholarship, University of California
Publication date: 01/01/2014

Web applications are an integral part of our lives and culture. We use web applications to manage our bank accounts, interact with friends, and file our taxes. A single vulnerability in one of these web applications could allow a malicious hacker to steal your money, to impersonate you on Facebook, or to access sensitive information, such as tax returns. It is vital that we develop new approaches to discover and fix these vulnerabilities before the cybercriminals exploit them. In this dissertation, I will present my research on securing the web against current threats and future threats. First, I will discuss my work on improving black-box vulnerability scanners, which are tools that can automatically discover vulnerabilities in web applications. Then, I will describe a new type of web application vulnerability: Execution After Redirect, or EAR, and an approach to automatically detect EARs in web applications. Finally, I will present deDacota, a first step in the direction of making web applications secure by construction.

Ezid

eScholarship - University of California

Recommended from our members

The cause, development and outcome of word-of-mouth marketing: with particular reference to WOM volume, valence and the modeling of viral marketing

Author: Wu Ying
Publication date: 12/07/2017

Viral marketing is a form of online word-of-mouth (WOM) communication in which individuals are encouraged to pass on promotional messages through social websites. With the growing popularity of online social websites, viral marketing has increasingly garnered the attention of marketers and marketing researchers alike. The two most important WOM attributes highlighted in the extant literature are volume and valence. This thesis looked into the cause, development and outcome of WOM marketing and provided computational models for forecasting the development of WOM volume and valence of viral marketing in social websites. With the data extracted from large-scale web-crawling activities, through a series of computer simulation experiments comparable to social websites, the author developed models to predict WOM volume and valence in viral marketing. The model for predicting WOM volume in viral marketing used theories of network topologies. The model for predicting WOM valence in viral marketing used an artificial neural network model. The author discussed the insights from the findings and suggested viral marketing strategies to optimize the performance of WOM volume and valence in social websites. A key contribution of this thesis is its new approaches to modeling and data collection for WOM volume and valence forecasting in viral marketing.

Sussex Research Online