Search CORE

11,157 research outputs found

Web Data Extraction, Applications and Techniques: A Survey

Author: Abel
Amalfitano
Balduzzi
Baumgartner
Baumgartner
Baumgartner
Baumgartner
Baumgartner
Baumgartner
Berger
Berthold
Bettencourt
Califf
Catanese
Chang
Chen
Chen
Chen
Collins
Conover
Crandall
Crescenzi
Crescenzi
Dalvi
Dalvi
De Meo
De Meo
Doan
Emilio Ferrara
Ferrara
Ferrara
Ferrara
Ferrara
Ferrara
Flesca
Freitag
Furche
Gatterbauer
Gatterbauer
Giacomo Fiumara
Gjoka
Gkotsis
Gottlob
Gottlob
Hammersley
Han
Hecht
Hsu
Irmak
Khare
Kim
Kinsella
Kleinberg
Kleinberg
Kohlschütter
Kokkoras
Kokkoras
Kokkoras
Krüpl
Kushmerick
Kwak
Laender
Liu
Manning
Masanès
Mathes
Meng
Mislove
Monge
Muslea
Oro
Pan
Pasquale De Meo
Perito
Phan
Plake
Rahm
Rahm
Reis
Robert Baumgartner
Sahuguet
Sarawagi
Schifanella
Selkow
Shi
Soderland
Szomszor
Turmo
Vosecky
Wang
Wang
Weikum
Wilson
Winograd
Yang
Ye
Zafarani
Zanasi
Zhai
Zhang
Zhang
Publication venue: 'Elsevier BV'
Publication date: 09/06/2014
Field of study

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provided a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques allow to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users and this offers unprecedented opportunities to analyze human behavior at a very large scale. We discuss also the potential of cross-fertilization, i.e., on the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain, in other domains.Comment: Knowledge-based System

arXiv.org e-Print Archive

Crossref

Automated Functional Testing based on the Navigation of Web Applications

Author: Dueñas Juan Carlos
García Boni
Publication venue: 'Open Publishing Association'
Publication date: 01/08/2011
Field of study

Web applications are becoming more and more complex. Testing such applications is an intricate hard and time-consuming activity. Therefore, testing is often poorly performed or skipped by practitioners. Test automation can help to avoid this situation. Hence, this paper presents a novel approach to perform automated software testing for web applications based on its navigation. On the one hand, web navigation is the process of traversing a web application using a browser. On the other hand, functional requirements are actions that an application must do. Therefore, the evaluation of the correct navigation of web applications results in the assessment of the specified functional requirements. The proposed method to perform the automation is done in four levels: test case generation, test data derivation, test case execution, and test case reporting. This method is driven by three kinds of inputs: i) UML models; ii) Selenium scripts; iii) XML files. We have implemented our approach in an open-source testing framework named Automatic Testing Platform. The validation of this work has been carried out by means of a case study, in which the target is a real invoice management system developed using a model-driven approach.Comment: In Proceedings WWV 2011, arXiv:1108.208

arXiv.org e-Print Archive

Directory of Open Access Journals

Scripted GUI Testing of Android Apps: A Study on Diffusion, Evolution and Fragility

Author: Coppola Riccardo
Morisio Maurizio
Torchiano Marco
Publication venue
Publication date: 01/01/2017
Field of study

Background. Evidence suggests that mobile applications are not thoroughly tested as their desktop counterparts. In particular GUI testing is generally limited. Like web-based applications, mobile apps suffer from GUI test fragility, i.e. GUI test classes failing due to minor modifications in the GUI, without the application functionalities being altered. Aims. The objective of our study is to examine the diffusion of GUI testing on Android, and the amount of changes required to keep test classes up to date, and in particular the changes due to GUI test fragility. We define metrics to characterize the modifications and evolution of test classes and test methods, and proxies to estimate fragility-induced changes. Method. To perform our experiments, we selected six widely used open-source tools for scripted GUI testing of mobile applications previously described in the literature. We have mined the repositories on GitHub that used those tools, and computed our set of metrics. Results. We found that none of the considered GUI testing frameworks achieved a major diffusion among the open-source Android projects available on GitHub. For projects with GUI tests, we found that test suites have to be modified often, specifically 5\%-10\% of developers' modified LOCs belong to tests, and that a relevant portion (60\% on average) of such modifications are induced by fragility. Conclusions. Fragility of GUI test classes constitute a relevant concern, possibly being an obstacle for developers to adopt automated scripted GUI tests. This first evaluation and measure of fragility of Android scripted GUI testing can constitute a benchmark for developers, and the basis for the definition of a taxonomy of fragility causes, and actionable guidelines to mitigate the issue.Comment: PROMISE'17 Conference, Best Paper Awar

arXiv.org e-Print Archive

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Simplicity-oriented lifelong learning of web applications

Author: Bainczyk Julius Alexander
Publication venue
Publication date: 01/01/2023
Field of study

Nowadays, web applications are ubiquitous. Entire business models revolve around making their services available over the Internet, anytime, anywhere in the world. Due to today’s rapid development practices, software changes are released faster than ever before, creating the risk of losing control over the quality of the delivered products. To counter this, appropriate testing methodologies must be deeply integrated into each phase of the development cycle to identify potential defects as early as possible and to ensure that the product operates as expected in production. The use of low- and no-code tools and code generation technologies can drastically reduce the implementation effort by using well-tailored (graphical) Domain-Specific Languages (DSLs) to focus on what is important: the product. DSLs and corresponding Integrated Modeling Environments (IMEs) are a key enabler for quality control because many system properties can already be verified at a pre-product level. However, to verify that the product fulfills given functional requirements at runtime, end-to-end testing is still a necessity. This dissertation describes the implementation of a lifelong learning framework for the continuous quality control of web applications. In this framework, models representing user-level behavior are mined from running systems using active automata learning, and system properties are verified using model checking. All this is achieved in a continuous and fully automated manner. Code changes trigger testing, learning, and verification processes which generate feedback that can be used for model refinement or product improvement. The main focus of this framework is simplicity. On the one hand, it allows Quality Assurance (QA) engineers to apply learning-based testing techniques to web applications with minimal effort, even without writing code; on the other hand, it allows automation engineers to easily implement these techniques in modern development workflows driven by Continuous Integration and Continuous Deployment (CI/CD). The effectiveness of this framework is leveraged by the Language-Driven Engineering (LDE) approach to web development. Key to this is the text-based DSL iHTML, which enables the instrumentation of user interfaces to make web applications learnable by design, i.e., they adhere to practices that allow fully automated inference of behavioral models without prior specification of an input alphabet. By designing code generators to generate instrumented web-based products, the effort for quality control in the LDE ecosystem is minimized and reduced to formulating runtime properties in temporal logic and verifying them against learned models

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Mining Web Usage to Generate Regression GUI Tests Automatically

Author: Marta Inês Macedo Vasconcelos
Publication venue
Publication date: 15/07/2019
Field of study

Repositório Aberto da Universidade do Porto

Topic driven testing

Author: Rau Andreas
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2020
Field of study

Modern interactive applications offer so many interaction opportunities that automated exploration and testing becomes practically impossible without some domain specific guidance towards relevant functionality. In this dissertation, we present a novel fundamental graphical user interface testing method called topic-driven testing. We mine the semantic meaning of interactive elements, guide testing, and identify core functionality of applications. The semantic interpretation is close to human understanding and allows us to learn specifications and transfer knowledge across multiple applications independent of the underlying device, platform, programming language, or technology stack—to the best of our knowledge a unique feature of our technique. Our tool ATTABOY is able to take an existing Web application test suite say from Amazon, execute it on ebay, and thus guide testing to relevant core functionality. Tested on different application domains such as eCommerce, news pages, mail clients, it can trans- fer on average sixty percent of the tested application behavior to new apps—without any human intervention. On top of that, topic-driven testing can go with even more vague instructions of how-to descriptions or use-case descriptions. Given an instruction, say “add item to shopping cart”, it tests the specified behavior in an application–both in a browser as well as in mobile apps. It thus improves state-of-the-art UI testing frame- works, creates change resilient UI tests, and lays the foundation for learning, transfer- ring, and enforcing common application behavior. The prototype is up to five times faster than existing random testing frameworks and tests functions that are hard to cover by non-trained approaches.Moderne interaktive Anwendungen bieten so viele Interaktionsmöglichkeiten, dass eine vollständige automatische Exploration und das Testen aller Szenarien praktisch unmöglich ist. Stattdessen muss die Testprozedur auf relevante Kernfunktionalität ausgerichtet werden. Diese Arbeit stellt ein neues fundamentales Testprinzip genannt thematisches Testen vor, das beliebige Anwendungen u ̈ber die graphische Oberfläche testet. Wir untersuchen die semantische Bedeutung von interagierbaren Elementen um die Kernfunktionenen von Anwendungen zu identifizieren und entsprechende Tests zu erzeugen. Statt typischen starren Testinstruktionen orientiert sich diese Art von Tests an menschlichen Anwendungsfällen in natürlicher Sprache. Dies erlaubt es, Software Spezifikationen zu erlernen und Wissen von einer Anwendung auf andere zu übertragen unabhängig von der Anwendungsart, der Programmiersprache, dem Testgerät oder der -Plattform. Nach unserem Kenntnisstand ist unser Ansatz der Erste dieser Art. Wir präsentieren ATTABOY, ein Programm, das eine existierende Testsammlung für eine Webanwendung (z.B. für Amazon) nimmt und in einer beliebigen anderen Anwendung (sagen wir ebay) ausführt. Dadurch werden Tests für Kernfunktionen generiert. Bei der ersten Ausführung auf Anwendungen aus den Domänen Online Shopping, Nachrichtenseiten und eMail, erzeugt der Prototyp sechzig Prozent der Tests automatisch. Ohne zusätzlichen manuellen Aufwand. Darüber hinaus interpretiert themen- getriebenes Testen auch vage Anweisungen beispielsweise von How-to Anleitungen oder Anwendungsbeschreibungen. Eine Anweisung wie "Fügen Sie das Produkt in den Warenkorb hinzu" testet das entsprechende Verhalten in der Anwendung. Sowohl im Browser, als auch in einer mobilen Anwendung. Die erzeugten Tests sind robuster und effektiver als vergleichbar erzeugte Tests. Der Prototyp testet die Zielfunktionalität fünf mal schneller und testet dabei Funktionen die durch nicht spezialisierte Ansätze kaum zu erreichen sind

Universaar

Acronym

Neural Embeddings for Web Testing

Author: Biagiola Matteo
Starace Luigi Libero Lucio
Stocco Andrea
Tonella Paolo
Willi Alexandra
Publication venue
Publication date: 12/06/2023
Field of study

Web test automation techniques employ web crawlers to automatically produce a web app model that is used for test generation. Existing crawlers rely on app-specific, threshold-based, algorithms to assess state equivalence. Such algorithms are hard to tune in the general case and cannot accurately identify and remove near-duplicate web pages from crawl models. Failing to retrieve an accurate web app model results in automated test generation solutions that produce redundant test cases and inadequate test suites that do not cover the web app functionalities adequately. In this paper, we propose WEBEMBED, a novel abstraction function based on neural network embeddings and threshold-free classifiers that can be used to produce accurate web app models during model-based test generation. Our evaluation on nine web apps shows that WEBEMBED outperforms state-of-the-art techniques by detecting near-duplicates more accurately, inferring better web app models that exhibit 22% more precision, and 24% more recall on average. Consequently, the test suites generated from these models achieve higher code coverage, with improvements ranging from 2% to 59% on an app-wise basis and averaging at 23%.Comment: 12 pages; in revisio

arXiv.org e-Print Archive

Experiences from declarative markup to improve the accessibility of HTML

Author: Rubano Vincenzo
Vitali Fabio
Publication venue
Publication date: 01/01/2020
Field of study

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna