3,274 research outputs found
A Brief History of Web Crawlers
Web crawlers visit internet applications, collect data, and learn about new
web pages from visited pages. Web crawlers have a long and interesting history.
Early web crawlers collected statistics about the web. In addition to
collecting statistics about the web and indexing the applications for search
engines, modern crawlers can be used to perform accessibility and vulnerability
checks on the application. Quick expansion of the web, and the complexity added
to web applications have made the process of crawling a very challenging one.
Throughout the history of web crawling many researchers and industrial groups
addressed different issues and challenges that web crawlers face. Different
solutions have been proposed to reduce the time and cost of crawling.
Performing an exhaustive crawl is a challenging question. Additionally
capturing the model of a modern web application and extracting data from it
automatically is another open question. What follows is a brief history of
different technique and algorithms used from the early days of crawling up to
the recent days. We introduce criteria to evaluate the relative performance of
web crawlers. Based on these criteria we plot the evolution of web crawlers and
compare their performanc
Reverse Engineering and Testing of Rich Internet Applications
The World Wide Web experiences a continuous and constant evolution, where new initiatives, standards, approaches and technologies are continuously proposed for developing more effective and higher quality Web applications.
To satisfy the growing request of the market for Web applications, new technologies, frameworks, tools and environments that allow to develop Web and mobile applications with the least effort and in very short time have been introduced in the last years.
These new technologies have made possible the dawn of a new generation of Web applications, named Rich Internet Applications (RIAs), that offer greater usability and interactivity than traditional ones. This evolution has been accompanied by some drawbacks that are mostly due to the lack of applying well-known software engineering practices and approaches. As a consequence, new research questions and challenges have emerged in the field of web and mobile applications maintenance and testing.
The research activity described in this thesis has addressed some of these topics with the specific aim of proposing new and effective solutions to the problems of modelling, reverse engineering, comprehending, re-documenting and testing existing RIAs.
Due to the growing relevance of mobile applications in the renewed Web scenarios, the problem of testing mobile applications developed for the Android operating system has been addressed too, in an attempt of exploring and proposing new techniques of testing automation for these type of applications
A Practical Blended Analysis for Dynamic Features in JavaScript
The JavaScript Blended Analysis Framework is designed to
perform a general-purpose, practical combined static/dynamic
analysis of JavaScript programs, while handling dynamic
features such as run-time generated code and variadic func-
tions. The idea of blended analysis is to focus static anal-
ysis on a dynamic calling structure collected at runtime in
a lightweight manner, and to rene the static analysis us-
ing additional dynamic information. We perform blended
points-to analysis of JavaScript with our framework and
compare results with those computed by a pure static points-
to analysis. Using JavaScript codes from actual webpages
as benchmarks, we show that optimized blended analysis
for JavaScript obtains good coverage (86.6% on average per
website) of the pure static analysis solution and nds ad-
ditional points-to pairs (7.0% on average per website) con-
tributed by dynamically generated/loaded code
Recommended from our members
Righting Web Development
The web browser is the most important application runtime today, encompassing all types of applications on practically every Internet-connected device. Browsers power complete office suites, media players, games, and augmented and virtual reality experiences, and they integrate with cameras, microphones, GPSes, and other sensors available on computing devices. Many apparently native mobile and desktop applications are secretly hybrid apps that contain a mix of native and browser code. History has shown that when new devices, sensors, and experiences appear on the market, the browser will evolve to support them.
Despite the browser\u27s importance, developing web applications is exceedingly difficult. Web browsers organically evolved from a document viewer into a ubiquitous program runtime. The browser\u27s scripting language for web designers, JavaScript, has grown into the only universally supported programming language in the browser. Unfortunately, JavaScript is notoriously difficult to write and debug. The browser\u27s high-level and event-driven I/O interfaces make it easy to add simple interactions to webpages, but these same interfaces lead to nondeterministic bugs and performance issues in larger applications. These bugs are challenging for developers to reason about and fix.
This dissertation revisits web development and provides developers with a complete set of development tools with full support for the browser environment. McFly is the first time-traveling debugger for the browser, and lets developers debug web applications and their visual state during time-travel; components of this work shipped in Microsoft\u27s ChakraCore JavaScript engine. BLeak is the first system for automatically debugging memory leaks in web applications, and provides developers with a ranked list of memory leaks along with the source code responsible for them. BCause constructs a causal graph of a web application\u27s events, which helps developers understand their code\u27s behavior. Doppio lets developers run code written in conventional languages in the browser, and Browsix brings Unix into the browser to enable unmodified programs expecting a Unix-like environment to run directly in the browser. Together, these five systems form a solid foundation for web development
MT-WAVE: Profiling multi-tier web applications
The web is evolving: what was once primarily used for sharing static content has now evolved into a platform
for rich client-side applications. These applications do not run exclusively on the client; while the client is
responsible for presentation and some processing, there is a significant amount of processing and persistence
that happens server-side. This has advantages and disadvantages. The biggest advantage is that the userâs
data is accessible from anywhere. It doesnât matter which device you sign into a web application from,
everything youâve been working on is instantly accessible. The largest disadvantage is that large numbers
of servers are required to support a growing user base; unlike traditional client applications, an organization
making a web application needs to provision compute and storage resources for each expected user. This
infrastructure is designed in tiers that are responsible for different aspects of the application, and these tiers
may not even be run by the same organization.
As these systems grow in complexity, it becomes progressively more challenging to identify and solve
performance problems. While there are many measures of software system performance, web application
users only care about response latency. This âfingertip-to-eyeball performanceâ is the only metric that users
directly perceive: when a button is clicked in a web application, how long does it take for the desired action
to complete?
MT-WAVE is a system for solving fingertip-to-eyeball performance problems in web applications. The
system is designed for doing multi-tier tracing: each piece of the application is instrumented, execution
traces are collected, and the system merges these traces into a single coherent snapshot of system latency
at every tier. To ensure that user-perceived latency is accurately captured, the tracing begins in the web
browser. The application developer then uses the MT-WAVE Visualization System to explore the execution
traces to first identify which system is causing the largest amount of latency, and then zooms in on the
specific function calls in that tier to find optimization candidates. After fixing an identified problem, the
system is used to verify that the changes had the intended effect.
This optimization methodology and toolset is explained through a series of case studies that identify and
solve performance problems in open-source and commercial applications. These case studies demonstrate
both the utility of the MT-WAVE system and the unintuitive nature of system optimization
Test Generation and Dependency Analysis for Web Applications
In web application testing existing model based web test generators derive test paths from a navigation model of the web application, completed with either manually or randomly generated inputs. Test paths extraction and input generation are handled separately, ignoring the fact that generating inputs for test paths is difficult or even impossible if such paths are infeasible. In this thesis, we propose three directions to mitigate the path infeasibility problem. The first direction uses a search based approach defining novel set of genetic operators that support the joint generation of test inputs and feasible test paths. Results show that such search based approach can achieve higher level of model coverage than existing approaches. Secondly, we propose a novel web test generation algorithm that pre-selects the most promising candidate test cases based on their diversity from previously generated tests. Results of our empirical evaluation show that promoting diversity is beneficial not only to a thorough exploration of the web application behaviours, but also to the feasibility of automatically generated test cases. Moreover, the diversity based approach achieves higher coverage of the navigation model significantly faster than crawling based and search based approaches. The third approach we propose uses a web crawler as a test generator. As such, the generated tests are concrete, hence their navigations among the web application states are feasible by construction. However, the crawling trace cannot be easily turned into a minimal test suite that achieves the same coverage due to test dependencies. Indeed, test dependencies are undesirable in the context of regression testing, preventing the adoption of testing optimization techniques that assume tests to be independent. In this thesis, we propose the first approach to detect test dependencies in a given web test suite by leveraging the information available both in the web test code and on the client side of the web application. Results of our empirical validation show that our approach can effectively and efficiently detect test dependencies and it enables dependency aware formulations of test parallelization and test minimization
Semantics of RxJS
RxJS is a popular JavaScript library for reactive programming in Web applications. It provides numerous operators to create, combine, transform, and filter discrete events and to handle errors. These operators may be stateful and have side effects, which makes it difficult to understand the precise meaning of the resulting computation. In this paper, we define a formal model for RxJS programs by formalizing a selected subset of RxJS operators using a small-step operational semantics. We present several debugging related applications using the semantics as a model. We also implemented a subset of RxJS based on this semantics, which provides convenient access to the runtime representation of the RxJS program to help debugging
An Extensible User Interface for Lean 4
Contemporary proof assistants rely on complex automation and process libraries with millions of lines of code. At these scales, understanding the emergent interactions between components can be a serious challenge. One way of managing complexity, long established in informal practice, is through varying external representations. For instance, algebraic notation facilitates term-based reasoning whereas geometric diagrams invoke spatial intuition. Objects viewed one way become much simpler than when viewed differently. In contrast, modern general-purpose ITP systems usually only support limited, textual representations. Treating this as a problem of human-computer interaction, we aim to demonstrate that presentations - UI elements that store references to the objects they are displaying - are a fruitful way of thinking about ITP interface design. They allow us to make headway on two fronts - introspection of prover internals and support for diagrammatic reasoning. To this end we have built an extensible user interface for the Lean 4 prover with an associated ProofWidgets 4 library of presentation-based UI components. We demonstrate the system with several examples including type information popups, structured traces, contextual suggestions, a display for algebraic reasoning, and visualizations of red-black trees. Our interface is already part of the core Lean distribution
- âŠ