
    Neural Embeddings for Web Testing

    Web test automation techniques employ web crawlers to automatically produce a web app model that is used for test generation. Existing crawlers rely on app-specific, threshold-based algorithms to assess state equivalence. Such algorithms are hard to tune in the general case and cannot accurately identify and remove near-duplicate web pages from crawl models. Failing to retrieve an accurate web app model results in automated test generation solutions that produce redundant test cases and test suites that do not adequately cover the web app's functionality. In this paper, we propose WEBEMBED, a novel abstraction function based on neural network embeddings and threshold-free classifiers that can be used to produce accurate web app models during model-based test generation. Our evaluation on nine web apps shows that WEBEMBED outperforms state-of-the-art techniques by detecting near-duplicates more accurately, inferring better web app models that exhibit 22% more precision and 24% more recall on average. Consequently, the test suites generated from these models achieve higher code coverage, with improvements ranging from 2% to 59% on an app-wise basis and averaging 23%. Comment: 12 pages; in revision
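    The core idea can be illustrated with a minimal sketch: encode the text of two pages with an off-the-shelf sentence encoder, derive a pairwise feature vector, and let a trained classifier (rather than a hand-tuned threshold) decide whether the pair is a near-duplicate. The encoder name, feature construction, and classifier below are assumptions for illustration, not WEBEMBED's actual pipeline.

```python
# Illustrative sketch of embedding-based near-duplicate detection for web pages.
# The encoder model, pairwise features, and classifier are assumptions for
# illustration; they are not WEBEMBED's actual components.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import RandomForestClassifier

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf encoder

def pair_features(page_a: str, page_b: str) -> np.ndarray:
    """Embed the text of two pages and combine the embeddings into one feature vector."""
    emb_a, emb_b = encoder.encode([page_a, page_b])
    return np.concatenate([np.abs(emb_a - emb_b), [np.dot(emb_a, emb_b)]])

# Train a threshold-free classifier on labeled page pairs (1 = near-duplicate).
labeled_pairs = [
    ("Shopping cart: 1 item, total $10", "Shopping cart: 2 items, total $25", 1),
    ("Login with username and password", "Browse the product catalog", 0),
]
X = np.array([pair_features(a, b) for a, b, _ in labeled_pairs])
y = np.array([label for _, _, label in labeled_pairs])
classifier = RandomForestClassifier(n_estimators=100).fit(X, y)

def is_near_duplicate(page_a: str, page_b: str) -> bool:
    """Decide near-duplication with the learned classifier instead of a tuned threshold."""
    return bool(classifier.predict([pair_features(page_a, page_b)])[0])
```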

    Web application testing: Using tree kernels to detect near-duplicate states in automated model inference

    Background: In the context of End-to-End testing of web applications, automated exploration techniques (a.k.a. crawling) are widely used to infer state-based models of the site under test. These models, in which states represent features of the web application and transitions represent reachability relationships, can be used for several model-based testing tasks, such as test case generation. However, current exploration techniques often lead to models containing many near-duplicate states, i.e., states representing slightly different pages that are in fact instances of the same feature. This has a negative impact on the subsequent model-based testing tasks, adversely affecting, for example, the size, running time, and achieved coverage of the generated test suites. Aims: As a web page can be naturally represented by its tree-structured DOM representation, we propose a novel near-duplicate detection technique to improve the model inference of web applications, based on Tree Kernel (TK) functions. TKs are a class of functions that compute similarity between tree-structured objects, widely investigated and successfully applied in the Natural Language Processing domain. Method: To evaluate the capability of the proposed approach in detecting near-duplicate web pages, we conducted preliminary classification experiments on a freely available, massive dataset of about 100k manually annotated web page pairs. We compared the classification performance of the proposed approach with other state-of-the-art near-duplicate detection techniques. Results: Preliminary results show that our approach performs better than state-of-the-art techniques in the near-duplicate detection classification task. Conclusions: These promising results show that TKs can be applied to near-duplicate detection in the context of web application model inference, and they motivate further research in this direction to assess the impact of the technique on the quality of the inferred models and on the subsequent application of model-based testing techniques.
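    As a concrete illustration of the underlying idea, the following is a minimal sketch of a subset-tree-style kernel computed over tag-labelled DOM trees; the DOM encoding, decay factor, and kernel variant are assumptions made for this example and do not reproduce the paper's formulation.

```python
# Illustrative sketch of a subset-tree-style kernel over tag-labelled DOM trees.
# The DOM encoding (tags only, text ignored) and the decay factor are assumptions
# made for this example; they do not reproduce the paper's exact formulation.
from dataclasses import dataclass, field
from bs4 import BeautifulSoup

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def dom_to_tree(html: str) -> Node:
    """Convert an HTML document into a tree of tag names, dropping text nodes."""
    def convert(tag):
        return Node(tag.name, [convert(c) for c in tag.find_all(recursive=False)])
    return convert(BeautifulSoup(html, "html.parser").find())

def tree_kernel(t1: Node, t2: Node, decay: float = 0.5) -> float:
    """Sum the matching-subtree contributions over all node pairs of the two trees."""
    def nodes(t):
        yield t
        for child in t.children:
            yield from nodes(child)

    def matches(n1, n2):
        # Contribution is zero unless the two nodes share the same "production",
        # i.e., the same tag and the same ordered sequence of child tags.
        if n1.label != n2.label or [c.label for c in n1.children] != [c.label for c in n2.children]:
            return 0.0
        score = decay
        for c1, c2 in zip(n1.children, n2.children):
            score *= 1.0 + matches(c1, c2)
        return score

    return sum(matches(a, b) for a in nodes(t1) for b in nodes(t2))

# Structurally similar pages (same layout, different text) get a high kernel score.
score = tree_kernel(dom_to_tree("<div><ul><li>a</li><li>b</li></ul></div>"),
                    dom_to_tree("<div><ul><li>x</li><li>y</li></ul></div>"))
```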

    Enhancing Web Browsing Security

    Web browsing has become an integral part of our lives, and we use browsers to perform many important activities almost every day and everywhere. However, due to the vulnerabilities in Web browsers and Web applications, and also due to Web users' lack of security knowledge, browser-based attacks are rampant over the Internet and have caused substantial damage to both Web users and service providers. Enhancing Web browsing security is therefore of great need and importance. This dissertation concentrates on enhancing Web browsing security through exploring and experimenting with new approaches and software systems. Specifically, we have systematically studied four challenging Web browsing security problems: HTTP cookie management, phishing, insecure JavaScript practices, and browsing on untrusted public computers. We have proposed new approaches to address these problems and built unique systems to validate our approaches. To manage HTTP cookies, we have proposed an approach to automatically validate the usefulness of HTTP cookies at the client side on behalf of users. By automatically removing useless cookies, our approach helps a user strike an appropriate balance between maximizing usability and minimizing security risks. To protect against phishing attacks, we have proposed an approach to transparently feed a relatively large number of bogus credentials into a suspected phishing site. Using those bogus credentials, our approach conceals victims' real credentials and enables a legitimate website to identify stolen credentials in a timely manner. To identify insecure JavaScript practices, we have proposed an execution-based measurement approach and performed a large-scale measurement study. Our work sheds light on insecure JavaScript practices and especially reveals the severity and nature of insecure JavaScript inclusion and dynamic generation practices on the Web. To achieve secure and convenient Web browsing on untrusted public computers, we have proposed a simple approach that enables an extended browser on a mobile device and a regular browser on a public computer to collaboratively support a Web session. A user can securely perform sensitive interactions on the mobile device and conveniently perform other browsing interactions on the public computer.
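    To make the cookie-management idea concrete, the following is a rough, hypothetical sketch only (not the dissertation's actual system): probe a cookie's usefulness by fetching the same page with and without it and checking whether the response changes; the HTTP library, similarity measure, and threshold are all assumed.

```python
# Hypothetical sketch of client-side cookie usefulness checking: fetch the same page
# with and without a candidate cookie and see whether the response changes noticeably.
# The library, the similarity measure, and the threshold are illustrative assumptions,
# not the dissertation's actual mechanism.
import requests
from difflib import SequenceMatcher

def cookie_seems_useful(url: str, cookies: dict, candidate: str, threshold: float = 0.98) -> bool:
    """Heuristic: a cookie is 'useful' if dropping it visibly changes the returned page."""
    with_cookie = requests.get(url, cookies=cookies).text
    reduced = {name: value for name, value in cookies.items() if name != candidate}
    without_cookie = requests.get(url, cookies=reduced).text
    similarity = SequenceMatcher(None, with_cookie, without_cookie).ratio()
    return similarity < threshold  # a large difference suggests the cookie matters

# Cookies that fail the check are candidates for automatic removal.
```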

    HYPERLINK NETWORK SYSTEM AND IMAGE OF GLOBAL CITIES: WEBPAGES AND THEIR CONTENTS

    A distinctive trend of globalization research is a conceptual expansion that mirrors the penetration of globalization in various aspects of life. The World Wide Web has become the ultimate platform to create and disseminate information in this era of globalization. Although the importance of web-based information is widely acknowledged, the use of this information in global city research is not yet significant. Therefore, the purpose of this research is to extend the concept of globalization to the efficiency of information networks and the thematic dimensionality of the images conveyed by webpages. To this end, 264 global and globalizing cities are selected. The city hyperlink networks are constructed from the web crawling results of each city, and hyperlink network analysis measures the effectiveness of these hyperlink networks. The textual contents are also extracted from the crawled webpages, and the thematic dimensionality of the textual contents is measured by quantified content analysis and multidimensional scaling. The efficiency of the hyperlink network in information flow is confirmed to be a new consideration that shapes the globality of cities. Cities with highly efficient connections have faster and easier access, which implies a better structure for city image formation. Specifically, social networking websites are the center of this information flow. This means that social interactions on the Web play a crucial role in forming the images of cities. Apart from the positivity and negativity of the city image, the dimensionality of cities in the thematic space denotes how they are expressed, discussed, and shared on the Web. The image status based on dimensions of globalization is an important starting point for city branding. It is concluded that a research framework handling information networks and images simultaneously deepens the understanding of how the structure and the contents on the Web affect the formation and maintenance of global city networks. Overall, this research demonstrates the usefulness of information networks and images of cities on the Web to overcome data inconsistency and scarcity in global city research.
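    The notion of hyperlink-network efficiency can be illustrated with a small sketch based on networkx's global efficiency measure; the edge list is invented and link direction is ignored here, so this is an assumption-laden illustration rather than the study's actual analysis.

```python
# Illustrative sketch: quantifying how efficiently information can flow through a
# city's hyperlink network with networkx's global efficiency (the average inverse
# shortest-path length over page pairs). The edge list is invented for this example
# and link direction is ignored; it does not reproduce the study's crawl data.
import networkx as nx

hyperlinks = [  # (source page, linked page) pairs from a hypothetical crawl
    ("city-portal.example", "tourism.example"),
    ("city-portal.example", "news.example"),
    ("tourism.example", "social-network.example"),
    ("news.example", "social-network.example"),
]

G = nx.Graph()  # undirected for simplicity
G.add_edges_from(hyperlinks)

efficiency = nx.global_efficiency(G)  # 1.0 means every page is one hop from every other
print(f"Global efficiency of the hyperlink network: {efficiency:.2f}")
```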

    BlogForever D2.4: Weblog spider prototype and associated methodology

    The purpose of this document is to present the evaluation of different solutions for capturing blogs, the established methodology, and the developed blog spider prototype.