A Brief History of Web Crawlers

Author: Bochmann Gregor V.
Dinçktürk Mustafa Emre
Hooshmand Salman
Jourdan Guy-Vincent
Mirtaheri Seyed M.
Onut Iosif Viorel
Publication date: 04/05/2014

Web crawlers visit internet applications, collect data, and learn about new web pages from visited pages. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics about the web and indexing applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on an application. The rapid expansion of the web and the growing complexity of web applications have made crawling a very challenging process. Throughout the history of web crawling, many researchers and industrial groups have addressed different issues and challenges that web crawlers face. Different solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl remains a challenging problem, and automatically capturing the model of a modern web application and extracting data from it is another open question. What follows is a brief history of the different techniques and algorithms used from the early days of crawling up to the present. We introduce criteria to evaluate the relative performance of web crawlers. Based on these criteria we plot the evolution of web crawlers and compare their performance.
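
The basic loop this abstract describes (visit a page, collect data, learn new pages from the links found there) can be sketched as a breadth-first traversal. This is only an illustrative sketch, not a technique taken from the paper: the in-memory `TOY_WEB` graph, the `crawl` function, and the `max_pages` limit are all assumptions introduced here, and a real crawler would fetch and parse pages over HTTP.

```python
from collections import deque

# Hypothetical in-memory "web": page URL -> outgoing links.
# This stands in for fetching and parsing real pages.
TOY_WEB = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/1", "/blog/2"],
    "/blog/1": ["/blog"],
    "/blog/2": ["/blog/1", "/missing"],
}


def toy_get_links(url):
    """Return the links found on a page (empty if the page is unknown)."""
    return TOY_WEB.get(url, [])


def crawl(seed, get_links, max_pages=100):
    """Breadth-first crawl starting from `seed`.

    Visits each URL at most once and discovers new URLs from the
    links of pages already visited. Stops after `max_pages` visits.
    """
    frontier = deque([seed])   # URLs discovered but not yet visited
    seen = {seed}              # URLs ever added to the frontier
    visited = []               # URLs in the order they were visited
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl("/", toy_get_links)
print(order)
```

The `seen` set is what keeps the crawl from looping on cyclic links, and the `max_pages` cap is one simple way to bound the time and cost of a crawl when an exhaustive one is not feasible.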

arXiv.org e-Print Archive

CiteSeerX

Reverse Engineering and Testing of Rich Internet Applications

Author: Amalfitano Domenico
Publication date: 30/11/2011

The World Wide Web experiences a continuous and constant evolution, where new initiatives, standards, approaches and technologies are continuously proposed for developing more effective and higher quality Web applications. To satisfy the growing market demand for Web applications, new technologies, frameworks, tools and environments that make it possible to develop Web and mobile applications with minimal effort and in a very short time have been introduced in recent years. These new technologies have made possible the dawn of a new generation of Web applications, named Rich Internet Applications (RIAs), that offer greater usability and interactivity than traditional ones. This evolution has been accompanied by some drawbacks that are mostly due to the failure to apply well-known software engineering practices and approaches. As a consequence, new research questions and challenges have emerged in the field of web and mobile application maintenance and testing. The research activity described in this thesis has addressed some of these topics with the specific aim of proposing new and effective solutions to the problems of modelling, reverse engineering, comprehending, re-documenting and testing existing RIAs. Due to the growing relevance of mobile applications in the renewed Web scenarios, the problem of testing mobile applications developed for the Android operating system has been addressed too, in an attempt to explore and propose new techniques of testing automation for these types of applications.

Università degli Studi di Napoli Federico II Open Archive

A Practical Blended Analysis for Dynamic Features in JavaScript

Author: Ryder Barbara G.
Wei Shiyi
Publication date: 01/01/2012

The JavaScript Blended Analysis Framework is designed to perform a general-purpose, practical combined static/dynamic analysis of JavaScript programs, while handling dynamic features such as run-time generated code and variadic functions. The idea of blended analysis is to focus static analysis on a dynamic calling structure collected at runtime in a lightweight manner, and to refine the static analysis using additional dynamic information. We perform blended points-to analysis of JavaScript with our framework and compare results with those computed by a pure static points-to analysis. Using JavaScript codes from actual webpages as benchmarks, we show that optimized blended analysis for JavaScript obtains good coverage (86.6% on average per website) of the pure static analysis solution and finds additional points-to pairs (7.0% on average per website) contributed by dynamically generated/loaded code.

Computer Science Technical Reports @Virginia Tech

CiteSeerX

Righting Web Development

Author: Vilk John
Publication venue: ScholarWorks@UMass Amherst
Publication date: 25/10/2018

The web browser is the most important application runtime today, encompassing all types of applications on practically every Internet-connected device. Browsers power complete office suites, media players, games, and augmented and virtual reality experiences, and they integrate with cameras, microphones, GPSes, and other sensors available on computing devices. Many apparently native mobile and desktop applications are secretly hybrid apps that contain a mix of native and browser code. History has shown that when new devices, sensors, and experiences appear on the market, the browser will evolve to support them. Despite the browser's importance, developing web applications is exceedingly difficult. Web browsers organically evolved from a document viewer into a ubiquitous program runtime. The browser's scripting language for web designers, JavaScript, has grown into the only universally supported programming language in the browser. Unfortunately, JavaScript is notoriously difficult to write and debug. The browser's high-level and event-driven I/O interfaces make it easy to add simple interactions to webpages, but these same interfaces lead to nondeterministic bugs and performance issues in larger applications. These bugs are challenging for developers to reason about and fix. This dissertation revisits web development and provides developers with a complete set of development tools with full support for the browser environment. McFly is the first time-traveling debugger for the browser, and lets developers debug web applications and their visual state during time-travel; components of this work shipped in Microsoft's ChakraCore JavaScript engine. BLeak is the first system for automatically debugging memory leaks in web applications, and provides developers with a ranked list of memory leaks along with the source code responsible for them. BCause constructs a causal graph of a web application's events, which helps developers understand their code's behavior. Doppio lets developers run code written in conventional languages in the browser, and Browsix brings Unix into the browser to enable unmodified programs expecting a Unix-like environment to run directly in the browser. Together, these five systems form a solid foundation for web development.

ScholarWorks@UMass Amherst

MT-WAVE: Profiling multi-tier web applications

Author: Arkles Anthony
Publication venue: University of Saskatchewan Library

The web is evolving: what was once primarily used for sharing static content has now evolved into a platform for rich client-side applications. These applications do not run exclusively on the client; while the client is responsible for presentation and some processing, there is a significant amount of processing and persistence that happens server-side. This has advantages and disadvantages. The biggest advantage is that the user’s data is accessible from anywhere. It doesn’t matter which device you sign into a web application from; everything you’ve been working on is instantly accessible. The largest disadvantage is that large numbers of servers are required to support a growing user base; unlike traditional client applications, an organization making a web application needs to provision compute and storage resources for each expected user. This infrastructure is designed in tiers that are responsible for different aspects of the application, and these tiers may not even be run by the same organization. As these systems grow in complexity, it becomes progressively more challenging to identify and solve performance problems. While there are many measures of software system performance, web application users only care about response latency. This “fingertip-to-eyeball performance” is the only metric that users directly perceive: when a button is clicked in a web application, how long does it take for the desired action to complete? MT-WAVE is a system for solving fingertip-to-eyeball performance problems in web applications. The system is designed for doing multi-tier tracing: each piece of the application is instrumented, execution traces are collected, and the system merges these traces into a single coherent snapshot of system latency at every tier. To ensure that user-perceived latency is accurately captured, the tracing begins in the web browser. The application developer then uses the MT-WAVE Visualization System to explore the execution traces, first identifying which system is causing the largest amount of latency and then zooming in on the specific function calls in that tier to find optimization candidates. After fixing an identified problem, the system is used to verify that the changes had the intended effect. This optimization methodology and toolset are explained through a series of case studies that identify and solve performance problems in open-source and commercial applications. These case studies demonstrate both the utility of the MT-WAVE system and the unintuitive nature of system optimization.
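
The idea of merging per-tier traces into one latency picture can be illustrated with a small sketch. It does not reproduce MT-WAVE's actual trace format or algorithm: the `Span` record, its fields, and both helper functions are hypothetical, and the sketch assumes that all tiers report timestamps on a synchronized clock and that sibling spans do not overlap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """One timed operation recorded in one tier (hypothetical format)."""
    span_id: str
    parent_id: Optional[str]   # span that triggered this one, if any
    tier: str                  # e.g. "browser", "app-server", "database"
    start_ms: int
    end_ms: int


def merge_traces(*per_tier_traces):
    """Combine the traces of every tier into one timeline, ordered by start."""
    merged = [span for trace in per_tier_traces for span in trace]
    return sorted(merged, key=lambda s: (s.start_ms, s.span_id))


def self_time_by_tier(spans):
    """Sum, per tier, the time spent in spans excluding their direct children."""
    child_time = {}
    for s in spans:
        if s.parent_id is not None:
            child_time[s.parent_id] = (
                child_time.get(s.parent_id, 0) + (s.end_ms - s.start_ms)
            )
    totals = {}
    for s in spans:
        own = (s.end_ms - s.start_ms) - child_time.get(s.span_id, 0)
        totals[s.tier] = totals.get(s.tier, 0) + own
    return totals


# Tracing starts in the browser; the click triggers a server call,
# which in turn issues two database queries.
browser = [Span("b1", None, "browser", 0, 120)]
server = [Span("s1", "b1", "app-server", 10, 100)]
database = [
    Span("d1", "s1", "database", 30, 50),
    Span("d2", "s1", "database", 55, 85),
]

timeline = merge_traces(browser, server, database)
totals = self_time_by_tier(timeline)
slowest = max(totals, key=totals.get)
print(totals, slowest)
```

Subtracting child time from each span is what keeps the browser span, which encloses everything else, from always looking like the bottleneck; the tier with the largest remaining self time is the first candidate to zoom in on.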

eCommons@USASK

University of Saskatchewan Research Archive

Test Generation and Dependency Analysis for Web Applications

Author: Biagiola Matteo
Publication venue: Università degli studi di Genova
Publication date: 15/01/2020

In web application testing, existing model-based web test generators derive test paths from a navigation model of the web application, completed with either manually or randomly generated inputs. Test path extraction and input generation are handled separately, ignoring the fact that generating inputs for test paths is difficult or even impossible if such paths are infeasible. In this thesis, we propose three directions to mitigate the path infeasibility problem. The first direction uses a search-based approach defining a novel set of genetic operators that support the joint generation of test inputs and feasible test paths. Results show that this search-based approach can achieve a higher level of model coverage than existing approaches. Secondly, we propose a novel web test generation algorithm that pre-selects the most promising candidate test cases based on their diversity from previously generated tests. Results of our empirical evaluation show that promoting diversity is beneficial not only to a thorough exploration of the web application behaviours, but also to the feasibility of automatically generated test cases. Moreover, the diversity-based approach achieves higher coverage of the navigation model significantly faster than crawling-based and search-based approaches. The third approach we propose uses a web crawler as a test generator. As such, the generated tests are concrete, hence their navigations among the web application states are feasible by construction. However, the crawling trace cannot be easily turned into a minimal test suite that achieves the same coverage due to test dependencies. Indeed, test dependencies are undesirable in the context of regression testing, preventing the adoption of testing optimization techniques that assume tests to be independent. In this thesis, we propose the first approach to detect test dependencies in a given web test suite by leveraging the information available both in the web test code and on the client side of the web application. Results of our empirical validation show that our approach can effectively and efficiently detect test dependencies and that it enables dependency-aware formulations of test parallelization and test minimization.

Archivio istituzionale della ricerca - Università di Genova

Semantics of RxJS

Author: Li Yonglun
Zhao Tian
Publication venue: UWM Digital Commons
Publication date: 11/11/2022

RxJS is a popular JavaScript library for reactive programming in Web applications. It provides numerous operators to create, combine, transform, and filter discrete events and to handle errors. These operators may be stateful and have side effects, which makes it difficult to understand the precise meaning of the resulting computation. In this paper, we define a formal model for RxJS programs by formalizing a selected subset of RxJS operators using a small-step operational semantics. We present several debugging-related applications using the semantics as a model. We also implemented a subset of RxJS based on this semantics, which provides convenient access to the runtime representation of the RxJS program to help with debugging.

University of Wisconsin-Milwaukee

An Extensible User Interface for Lean 4

Author: Ayers Edward W.
Ebner Gabriel
Nawrocki Wojciech
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 14th International Conference on Interactive Theorem Proving (ITP 2023)
Publication date: 01/01/2023

Contemporary proof assistants rely on complex automation and process libraries with millions of lines of code. At these scales, understanding the emergent interactions between components can be a serious challenge. One way of managing complexity, long established in informal practice, is through varying external representations. For instance, algebraic notation facilitates term-based reasoning whereas geometric diagrams invoke spatial intuition. Objects viewed one way become much simpler than when viewed differently. In contrast, modern general-purpose ITP systems usually only support limited, textual representations. Treating this as a problem of human-computer interaction, we aim to demonstrate that presentations - UI elements that store references to the objects they are displaying - are a fruitful way of thinking about ITP interface design. They allow us to make headway on two fronts - introspection of prover internals and support for diagrammatic reasoning. To this end we have built an extensible user interface for the Lean 4 prover with an associated ProofWidgets 4 library of presentation-based UI components. We demonstrate the system with several examples including type information popups, structured traces, contextual suggestions, a display for algebraic reasoning, and visualizations of red-black trees. Our interface is already part of the core Lean distribution

Dagstuhl Research Online Publication Server