
    Automatic Optimization of Web Navigation Sequences

    The Version of Record is available online at: https://doi.org/10.1007/978-3-642-28795-4_15. This accepted manuscript is subject to Springer Nature's AM terms of use and does not reflect post-acceptance improvements or corrections.

    Abstract: Web automation applications are widely used for different purposes such as B2B integration, automated testing of web applications, or technology and business watch. In this work-in-progress paper we outline a set of techniques which constitute the basis for building a web navigation component able to analyze a web navigation sequence and automatically optimize it, detecting which parts of the loaded pages are needed and which can be discarded in subsequent executions of the sequence. Our techniques build on the Document Object Model, and the first tests executed with real web sources have found them to be very effective.

    This research was partially supported by the Spanish Ministry of Science and Innovation under project TIN2010-09988-E, and by the European Commission under project FP7-SEC-2007-01, Proposal Nº 218223.
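    As a rough illustration of the optimization idea, not the paper's actual algorithm, the sketch below keeps only the DOM regions that a recorded navigation sequence is known to touch, given as XPath expressions, and discards every other subtree. The use of lxml and the assumption that the XPaths select elements are both illustrative choices.

```python
# Hypothetical sketch: prune DOM subtrees that a recorded navigation
# sequence never touched, so later executions can skip them entirely.
from lxml import html

def prune_unused(page_source: str, needed_xpaths: list[str]):
    tree = html.fromstring(page_source)
    keep = set()
    for xp in needed_xpaths:
        for node in tree.xpath(xp):          # assumed to select elements
            keep.update(node.iter())         # the node and all descendants
            while node is not None:          # plus its ancestors to the root
                keep.add(node)
                node = node.getparent()
    for el in list(tree.iter()):
        parent = el.getparent()
        if el not in keep and parent is not None and parent in keep:
            parent.remove(el)                # drop the whole unneeded subtree
    return tree
```

    A later execution of the same sequence could then avoid building, or even fetching, anything outside the kept regions.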

    An Approach to Developing an Information System for Extracting Data from the Web

    Today the Internet contains a huge number of information sources that we use constantly in our daily lives. It often happens that similar information is presented in different forms on different resources (for example, electronic libraries, online stores, and news sites). In this paper we analyze the extraction of the information a user requires from a certain type of web source. The data extraction problem was analyzed, and in reviewing the main approaches to data extraction the strengths and weaknesses of each were identified. The main aspects of web knowledge extraction were formulated, and approaches and information technologies for solving syntactic analysis problems in existing information systems were examined. On the basis of this analysis, the task of developing models and software components for extracting data from certain types of web resources was formulated. A conceptual model of data extraction was developed that treats the web as an external data source. A requirements specification for the software component was created, allowing work on the project to continue with a clear understanding of the requirements and constraints on the implementation. During software modeling, class, activity, sequence, and deployment diagrams were developed, which will later be used to create the finished application. For further development, a programming platform and the types of testing (load and unit) were defined. The results obtained indicate that the proposed design, to be implemented as a prototype of the software system, can extract data from different sources on the basis of a single semantic template.
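    By way of illustration, a "single semantic template" can be as simple as one shared set of field names with per-source selectors that map each site onto it. The sketch below is a guess at that idea rather than the paper's design; BeautifulSoup, the source names, and the selectors are all assumptions.

```python
# Illustrative sketch: one semantic template (field names), several
# per-source selector maps. Sources and selectors are invented.
from bs4 import BeautifulSoup

SELECTORS = {
    "bookshop.example": {"title": "h1.product-title", "price": "span.price"},
    "library.example":  {"title": "div.record > h2",  "price": "td.cost"},
}

def extract(source: str, page_html: str) -> dict:
    soup = BeautifulSoup(page_html, "html.parser")
    record = {}
    for field, selector in SELECTORS[source].items():
        node = soup.select_one(selector)
        record[field] = node.get_text(strip=True) if node else None
    return record   # same keys regardless of which source was scraped
```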

    Optimization Techniques to Speed Up the Page Loading in Custom Web Browsers

    IEEE 12th International Conference on e-Business Engineering (ICEBE 2015), 23-25 October 2015, Beijing, China.

    Abstract: Web automation applications are widely used for different purposes such as B2B integration, web mashups, automated testing of web applications, Internet metasearch, or technology and business watch. A crucial requirement for intensive web automation applications that need real-time responses is to execute navigation sequences in the shortest possible time. The approach followed by most current systems, building the automatic web navigation component on the APIs of conventional browsers, is not appropriate in that scenario because it presents performance problems. Another approach is to create custom browsers specially designed for web automation, which can implement improvements based on the peculiarities of web automation tasks. In this work, we present a new set of techniques and algorithms that allow the parallel evaluation of scripting code when a custom browser loads a web page, and we outline the components that should be included in the custom browser architecture to implement these techniques. Tests executed with real web sources to evaluate the validity of our proposal show that a custom web browser loads web pages faster when the scripts are executed in parallel using the designed techniques.
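    The paper's own algorithms are not reproduced here, but the general shape of the idea, evaluating scripts concurrently whenever they do not depend on one another, can be sketched as follows. The dependency map and the run_script callable are placeholders, not the paper's API.

```python
# Hedged sketch: run mutually independent scripts in parallel, in waves.
from concurrent.futures import ThreadPoolExecutor

def evaluate_scripts(scripts, depends_on, run_script):
    done, pending = set(), set(scripts)
    with ThreadPoolExecutor() as pool:
        while pending:
            ready = [s for s in pending if depends_on.get(s, set()) <= done]
            if not ready:
                raise ValueError("cyclic script dependencies")
            for script, _ in zip(ready, pool.map(run_script, ready)):
                done.add(script)             # wave finished; unlock successors
            pending -= set(ready)
    return done
```

    A real engine would interleave this with parsing and honor ordering-sensitive constructs such as document.write; the wave structure only illustrates where the parallelism comes from.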

    A Workflow-Based Approach for Creating Complex Web Wrappers

    The Version of Record is available online at: https://doi.org/10.1007/978-3-540-85481-4_30. This accepted manuscript is subject to Springer Nature's AM terms of use and does not reflect post-acceptance improvements or corrections.

    Abstract: In order to let software programs access and use the information and services provided by web sources, wrapper programs must be built to provide a "machine-readable" view over them. Although the research literature on web wrappers is vast, the problem of how to specify the internal logic of complex wrappers in a simple, graphical way remains largely ignored. In this paper, we propose a new language for addressing this task. Our approach leverages existing work on intelligent web data extraction and automatic web navigation as building blocks, and uses a workflow-based approach to specify the wrapper control logic. The features included in the language were chosen from the results of a study of a wide range of real web automation applications from different business areas, whose most salient results we also present.

    This research was partially supported by the Spanish Ministry of Education and Science under project TSI2005-07730. Alberto Pan's work was partially supported by the "Ramón y Cajal" programme of the Spanish Ministry of Education and Science.
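    The paper's graphical language is not shown here; as a loose, invented illustration of workflow-style wrapper control logic, each step below performs an action (navigate, extract, transform) and then picks its successor from the resulting state.

```python
# Invented sketch of workflow-style control logic for a wrapper.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    action: Callable[[dict], dict]              # e.g. navigate or extract
    next_step: Callable[[dict], Optional[str]]  # choose successor from state

def run_workflow(steps: dict[str, Step], start: str) -> dict:
    state, current = {}, start
    while current is not None:
        step = steps[current]
        state = step.action(state)       # perform the step
        current = step.next_step(state)  # branch on what it produced
    return state
```

    Conditional branches, loops over result lists, and error paths would all be expressed the same way: as successor choices made from the accumulated state.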

    Smart Bookmarks: Automatic Retroactive Macro Recording on the Web

    Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. By Darris Hupp. Includes bibliographical references (p. 82-83).

    New technology has made the Web more dynamic and personalized, while at the same time interaction with the Web has become more complicated and involved. This thesis presents Smart Bookmarks, a web automation system that allows users to automate complex tasks or easily return to otherwise hard-to-reach dynamic web pages by creating smart bookmarks. A smart bookmark consists of an automatically generated script of recorded browsing commands that returns to a particular web page or web application state. Smart bookmarks can be created retroactively, meaning that the user does not need to explicitly initiate recording before performing a task, but can instead request a bookmark after visiting the destination page; the appropriate sequence of commands needed to return to the page is selected automatically from a history of the user's browsing interactions. Smart Bookmarks provides a rich, visual representation of recorded bookmarks in order to clearly illustrate the actions that a bookmark performs, including textual descriptions, screenshots, and animated previews of each command. Finally, the system allows users to easily and intuitively edit bookmarks after they have been created, and to share smart bookmarks with other users.
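    A minimal sketch of the retroactive selection step, under the assumption that a plain URL load re-establishes browser state from scratch: walk the interaction history backwards and keep the suffix that starts at the most recent such load. The Command type and its fields are assumptions, not the thesis's actual representation.

```python
# Assumed representation of a recorded browsing command.
from dataclasses import dataclass

@dataclass
class Command:
    kind: str      # "goto", "click", "type", ...
    target: str    # URL, element description, or typed text

def bookmark_script(history: list[Command]) -> list[Command]:
    # Everything before the last "goto" is unnecessary: the goto
    # re-establishes state from scratch, so replay starts there.
    for i in range(len(history) - 1, -1, -1):
        if history[i].kind == "goto":
            return history[i:]
    return list(history)
```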

    A Custom Browser Architecture to Execute Web Navigation Sequences

    The Version of Record is available online at: https://doi.org/10.1007/978-3-319-26187-4_11. This accepted manuscript is subject to Springer Nature's AM terms of use and does not reflect post-acceptance improvements or corrections.

    Abstract: Web automation applications are widely used for different purposes such as B2B integration and automated testing of web applications. Most current systems build the automatic web navigation component on the APIs of conventional browsers. This approach suffers performance problems in intensive web automation tasks that require real-time responses and/or a high degree of parallelism. Other systems create custom browsers to avoid some of the work of conventional browsers, but still behave like them when building the internal representation of web pages. In this paper, we present a complete architecture for a custom browser able to efficiently execute web navigation sequences. The proposed architecture supports novel automatic optimization techniques that can be applied when loading pages and building their internal representation. Tests performed using real web sources show that the reference implementation of the proposed architecture runs significantly faster than other navigation components.

    Integrating Deep-Web Information Sources

    Deep-web information sources are difficult to integrate into automated business processes if they only provide a search form. A wrapping agent is a piece of software that allows a developer to query such information sources without worrying about the details of interacting with their forms. Our goal is to help software engineers construct wrapping agents that interpret queries written in high-level structured languages. We believe this will help reduce integration costs by relieving developers of the burden of transforming their queries into low-level interactions in an ad-hoc manner. In this paper, we report on our reference framework, delve into the related work, and highlight current research challenges, with the aim of guiding future research efforts in this area.

    Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-2602; Junta de Andalucía P08-TIC-4100; Ministerio de Ciencia e Innovación TIN2008-04718-E.
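    As a toy illustration of what a wrapping agent hides from the developer, the sketch below maps a high-level structured query onto a source's search-form fields before submission. The field names and form map are invented, not the paper's framework.

```python
# Invented example: translate a structured query into form-field values.
def query_to_form(query: dict, form_map: dict) -> dict:
    """Map high-level field names onto a concrete form's input names."""
    return {form_map[field]: value
            for field, value in query.items() if field in form_map}

form_map = {"author": "inputAuthor", "title": "inputTitle"}
fields = query_to_form({"author": "Knuth", "title": "TAOCP"}, form_map)
# fields == {"inputAuthor": "Knuth", "inputTitle": "TAOCP"}; the agent
# would then fill and submit the form and parse the result pages.
```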

    End-user programming for the Web

    Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. By Michael Bolin. Includes bibliographical references (leaves 104-106).

    On the desktop, an application can specify its user interface down to the last pixel, but on the World Wide Web, a content provider has little control over how the client will view the page once it has been delivered to the browser. This creates an opportunity for end-users who want to automate and customize their web experiences, but the growing complexity of web pages and standards prevents most users from realizing this opportunity. This thesis describes a programming system named Chickenfoot that enables end-users to automate, customize, and integrate web applications without examining their source code. It accomplishes this by embedding a programming environment directly into the Firefox web browser, where end-users can interactively develop programs that manipulate the interfaces of web pages. The design and implementation of the system's language are described, as well as the results of a user study that influenced the design. A range of applications built using Chickenfoot is also presented.
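    Chickenfoot itself is JavaScript embedded in Firefox, and its commands identify page elements by their visible labels rather than by HTML structure. As a rough Python analogue of that keyword-matching idea (the scoring heuristic below is invented for illustration):

```python
# Rough analogue of label-based element matching, not Chickenfoot's code.
from bs4 import BeautifulSoup

def find_by_label(page_html: str, words: str, tag: str = "input"):
    soup = BeautifulSoup(page_html, "html.parser")
    wanted = set(words.lower().split())

    def score(el):
        # Text "near" the element: its own attributes plus surrounding text.
        nearby = " ".join(filter(None, [el.get("value", ""), el.get("name", ""),
                                        el.parent.get_text(" ", strip=True)]))
        return len(wanted & set(nearby.lower().split()))

    candidates = soup.find_all(tag)
    return max(candidates, key=score) if candidates else None

# find_by_label(html_text, "search button") would pick the input whose
# value, name, or surrounding text best overlaps the given words.
```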

    An Architecture for Efficient Web Crawling

    Virtual integration systems require a crawling tool able to navigate and reach relevant pages in the Deep Web in an efficient way. Existing proposals in the crawling area fulfill some of these requirements, but most of them need to download pages in order to classify them as relevant or not. We propose a crawler supported by a web page classifier that uses solely the page URL to determine page relevance. Such a crawler is able to choose, at each step, only the URLs that lead to relevant pages, and therefore reduces the number of unnecessary pages downloaded, minimising bandwidth and making it efficient and suitable for virtual integration systems.

    Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-2602; Junta de Andalucía P08-TIC-4100; Ministerio de Ciencia e Innovación TIN2008-04718-E; Ministerio de Ciencia e Innovación TIN2010-21744; Ministerio de Economía, Industria y Competitividad TIN2010-09809-E; Ministerio de Ciencia e Innovación TIN2010-10811-E; Ministerio de Ciencia e Innovación TIN2010-09988-E.
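    The classifier described works from the URL alone; below is a minimal sketch of that idea, with placeholder keyword weights standing in for whatever model is actually trained.

```python
# Placeholder URL-only relevance check; weights are invented.
import re

KEYWORD_WEIGHTS = {"product": 2.0, "detail": 1.5, "item": 1.5, "login": -2.0}

def url_is_relevant(url: str, threshold: float = 1.0) -> bool:
    tokens = re.split(r"[/\-_.?=&]+", url.lower())
    score = sum(KEYWORD_WEIGHTS.get(tok, 0.0) for tok in tokens)
    return score >= threshold

# A crawler enqueues only links that pass the check, avoiding the
# download-then-classify round trip of most existing proposals.
assert url_is_relevant("http://shop.example/product/detail?id=42")
```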