    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains; others heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature on Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes: applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems, as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for one domain in other domains.
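    As a minimal illustration of the wrapper-style extraction techniques such surveys cover, the sketch below pulls structured records out of semi-structured HTML using only Python's standard library. The markup, class names and fields are invented for the example and are not taken from the survey.

```python
# A hand-written "wrapper": extract (name, price) records from HTML whose
# layout is known in advance. Markup and field names are illustrative.
from html.parser import HTMLParser

html = """
<ul>
  <li class="record"><span class="name">Widget</span><span class="price">9.99</span></li>
  <li class="record"><span class="name">Gadget</span><span class="price">4.50</span></li>
</ul>
"""

class RecordWrapper(HTMLParser):
    FIELDS = {"name", "price"}  # field markers the wrapper looks for

    def __init__(self):
        super().__init__()
        self.records = []        # completed records
        self._current = None     # record currently being built
        self._field = None       # field whose text content we are awaiting

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls == "record":
            self._current = {}
        elif self._current is not None and cls in self.FIELDS:
            self._field = cls

    def handle_data(self, data):
        if self._field is not None:
            self._current[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        if tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None

wrapper = RecordWrapper()
wrapper.feed(html)
print(wrapper.records)  # [{'name': 'Widget', 'price': '9.99'}, ...]
```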

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities for extracting the semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
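    To make the "exploiting available standards" point concrete, here is a hedged sketch of harvesting schema.org microdata (itemscope/itemprop) from a blog post with Python's standard library. The markup and property names below are invented for the example, not taken from the report.

```python
# Harvest microdata items embedded in blog HTML. Illustrative markup only.
from html.parser import HTMLParser

html = """
<article itemscope itemtype="http://schema.org/BlogPosting">
  <h1 itemprop="headline">Preserving Blogs</h1>
  <span itemprop="author">A. Blogger</span>
  <time itemprop="datePublished" datetime="2013-02-01">1 Feb 2013</time>
</article>
"""

class MicrodataParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []      # one dict per itemscope element
        self._prop = None    # itemprop whose text content we are awaiting

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemscope" in a:
            self.items.append({"type": a.get("itemtype", ""), "props": {}})
        elif "itemprop" in a and self.items:
            if "datetime" in a:  # <time> carries its value as an attribute
                self.items[-1]["props"][a["itemprop"]] = a["datetime"]
            else:
                self._prop = a["itemprop"]

    def handle_data(self, data):
        if self._prop and self.items:
            self.items[-1]["props"][self._prop] = data.strip()
            self._prop = None

parser = MicrodataParser()
parser.feed(html)
print(parser.items)
# [{'type': 'http://schema.org/BlogPosting',
#   'props': {'headline': 'Preserving Blogs', 'author': 'A. Blogger',
#             'datePublished': '2013-02-01'}}]
```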

    A Mining Algorithm for Extracting Decision Process Data Models

    The paper introduces an algorithm that mines logs of user interaction with simulation software and outputs a model that explicitly shows the data perspective of the decision process: the Decision Data Model (DDM). In the first part of the paper we focus on how the DDM is extracted by our mining algorithm. We introduce it as pseudo-code and then provide explanations and examples of how it actually works. In the second part of the paper, we use a series of small case studies to demonstrate the robustness of the mining algorithm and how it deals with the most common patterns we found in real logs.

    Keywords: Decision Process Data Model, Decision Process Mining, Decision Mining Algorithm
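    As a rough sketch of the underlying idea (not the paper's actual pseudo-code), the fragment below replays an invented interaction log and records, for each derived data element, which elements it was computed from, yielding a DDM as a dependency graph. The log format and element names are illustrative assumptions; the real algorithm handles many more patterns.

```python
# Each log entry: (kind, element, source elements). Invented example data.
log = [
    ("input",  "price", []),
    ("input",  "quantity", []),
    ("derive", "subtotal", ["price", "quantity"]),
    ("input",  "tax_rate", []),
    ("derive", "total", ["subtotal", "tax_rate"]),
]

def mine_ddm(log):
    """Return {element: set of direct inputs} -- a DAG over data elements."""
    ddm = {}
    for kind, element, sources in log:
        # A later derivation of the same element overwrites an earlier one,
        # keeping only the user's final way of computing it.
        ddm[element] = set(sources) if kind == "derive" else set()
    return ddm

for element, inputs in mine_ddm(log).items():
    print(f"{element} <- {sorted(inputs)}")
# price <- [] ... total <- ['subtotal', 'tax_rate']
```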

    Construction of a taxonomy for requirements engineering commercial-off-the-shelf components

    This article presents a procedure for constructing a taxonomy of COTS products in the field of Requirements Engineering (RE). The taxonomy and the information obtained bring substantial benefits to the selection of systems and tools, helping RE-related actors to simplify and facilitate their work. The taxonomy is built by means of a goal-oriented methodology inspired by GBRAM (Goal-Based Requirements Analysis Method), called GBTCM (Goal-Based Taxonomy Construction Method), which provides a guide to analyzing sources of information, modeling requirements and domains, and gathering and organizing knowledge in any segment of the COTS market. GBTCM aims to promote the use of standards and the reuse of requirements in order to support different processes of selection and integration of components.

    The Locus Algorithm II: A robust software system to maximise the quality of fields of view for Differential Photometry

    We present the software system developed to implement the Locus Algorithm, a novel algorithm designed to maximise the performance of differential photometry systems by optimising the number and quality of reference stars in the Field of View together with the target. First, we state the design requirements, constraints and ambitions for the software system required to implement this algorithm. Then, a detailed software design is presented for the system in operation. Next, the data design, including the file structures used and the data environment required for the system, is defined. Finally, we conclude by illustrating the scaling requirements which mandate a high-performance computing implementation of this system, which is discussed in the other papers in this series.
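    The paper itself concerns the software design rather than the algorithm's internals, but the scoring idea can be sketched as follows. The field size, fall-off weights and star catalogue here are invented illustrative values, not the Locus Algorithm's actual parameters.

```python
# Score a candidate pointing by summing quality ratings of reference stars
# that fall inside the Field of View. All numbers below are illustrative.
FOV = 0.25  # assumed field half-width in degrees

target = {"ra": 180.0, "dec": 30.0, "mag": 12.0, "col": 0.65}
stars = [
    {"ra": 180.1, "dec": 30.05, "mag": 12.3, "col": 0.60},
    {"ra": 180.2, "dec": 29.9,  "mag": 14.8, "col": 1.40},
    {"ra": 179.9, "dec": 30.1,  "mag": 11.9, "col": 0.70},
]

def rating(star, target):
    """Quality of one reference star: 1.0 for a perfect match, falling off
    linearly with magnitude and colour difference (clipped at zero)."""
    dm = abs(star["mag"] - target["mag"])
    dc = abs(star["col"] - target["col"])
    return max(0.0, 1 - dm / 2.0) * max(0.0, 1 - dc / 0.5)

def score(pointing_ra, pointing_dec):
    """Sum of ratings of stars inside the FoV centred on the pointing."""
    return sum(
        rating(s, target)
        for s in stars
        if abs(s["ra"] - pointing_ra) < FOV and abs(s["dec"] - pointing_dec) < FOV
    )

# A full implementation would search candidate pointings that keep the
# target in the field and pick the one with the highest score.
print(score(target["ra"], target["dec"]))
```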

    Using visualization for visualization : an ecological interface design approach to inputting data

    Visualization is experiencing growing use by a diverse community, with continuing improvements in the availability and usability of systems. In spite of these developments, the problem of how to get the data in in the first place has received scant attention: the established approach of pre-defined readers and programming aids has changed little in the last two decades. This paper proposes a novel way of inputting data for scientific visualization that employs rapid interaction and visual feedback in order to understand how the data is stored. The approach draws on ideas from the discipline of ecological interface design to extract and control important parameters describing the data, at the same time harnessing our innate human ability to recognize patterns. Crucially, the emphasis is on file format discovery rather than file format description, so the method can still work when nothing is known initially of how the file was originally written, as is often the case with legacy binary data.
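    A tiny non-interactive sketch of the discovery idea: reinterpret unknown binary data under candidate parameters (header length, element type, byte order) and summarize each guess so a human can spot the plausible one. The synthetic file and parameter grid are invented for the example; the paper's actual system does this visually and interactively.

```python
# Try several (header, format) guesses against an unknown binary blob and
# print summary statistics for each. Synthetic data for illustration only.
import struct

# A fake "legacy" file: a 16-byte header followed by little-endian floats.
payload = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
blob = b"HDR" + bytes(13) + struct.pack("<6f", *payload)

def reinterpret(data, header, fmt):
    """Decode everything after `header` bytes as elements of type `fmt`."""
    size = struct.calcsize(fmt)
    n = (len(data) - header) // size
    return struct.unpack_from(f"{fmt[0]}{n}{fmt[1]}", data, header)

for header in (0, 16):
    for fmt in ("<f", ">f", "<i"):
        vals = reinterpret(blob, header, fmt)
        print(f"header={header:2d} fmt={fmt}: "
              f"min={min(vals):.3g} max={max(vals):.3g}")
# Only header=16 with fmt="<f" yields a sensible range (0 to 2.5); the
# other guesses produce denormals or huge integers -- the kind of pattern
# a human can recognize at a glance with visual feedback.
```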