2,557 research outputs found

Web Data Extraction, Applications and Techniques: A Survey

Author: Abel, Amalfitano, Balduzzi, Baumgartner, Berger, Berthold, Bettencourt, Califf, Catanese, Chang, Chen, Collins, Conover, Crandall, Crescenzi, Dalvi, De Meo, Doan, Emilio Ferrara, Ferrara, Flesca, Freitag, Furche, Gatterbauer, Giacomo Fiumara, Gjoka, Gkotsis, Gottlob, Hammersley, Han, Hecht, Hsu, Irmak, Khare, Kim, Kinsella, Kleinberg, Kohlschütter, Kokkoras, Krüpl, Kushmerick, Kwak, Laender, Liu, Manning, Masanès, Mathes, Meng, Mislove, Monge, Muslea, Oro, Pan, Pasquale De Meo, Perito, Phan, Plake, Rahm, Reis, Robert Baumgartner, Sahuguet, Sarawagi, Schifanella, Selkow, Shi, Soderland, Szomszor, Turmo, Vosecky, Wang, Weikum, Wilson, Winograd, Yang, Ye, Zafarani, Zanasi, Zhai, Zhang
Publication venue: 'Elsevier BV'
Publication date: 09/06/2014

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for a given domain in other domains.

Comment: Knowledge-based System

arXiv.org e-Print Archive

Crossref

Integration, management and communication of heterogeneous design resources with WWW technologies

Author: Ji S, Li J, Su D
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/02/2006

Recently, advanced information technologies have opened new possibilities for collaborative design. In this paper, a Web-based collaborative design environment is proposed, in which heterogeneous design applications can be integrated through a common interface, managed dynamically for publishing and searching, and made to communicate with each other for integrated multi-objective design. CORBA (Common Object Request Broker Architecture) is employed as the implementation tool to enable integration and communication of design application programs, and XML (eXtensible Markup Language) is used as a common data description language for data exchange between heterogeneous applications and for resource description and recording. This paper also introduces the implementation of the system and the issues involved in encapsulating existing legacy applications. Finally, an example of gear design based on the system is presented to demonstrate the methods and procedure developed by this research.

Nottingham Trent Institutional Repository (IRep)

JetWeb: A WWW Interface and Database for Monte Carlo Tuning and Validation

Author: Bromley, Corcella, J.M. Butterworth, Johnson, Matthews, S. Butterworth, Sjöstrand, Tomcat
Publication venue: 'Elsevier BV'
Publication date: 29/10/2002

A World Wide Web interface to a Monte Carlo validation and tuning facility is described. The aim of the package is to allow rapid and reproducible comparisons to be made between detailed measurements at high-energy physics colliders and general physics simulation packages. The package includes a relational database, a Java servlet query and display facility, and clean interfaces to simulation packages and their parameters.

Comment: See http://jetweb.hep.ucl.ac.uk for further information

arXiv.org e-Print Archive

CiteSeerX

Crossref

Supporting text mining for e-Science: the challenges for Grid-enabled natural language processing

Author: Carroll John, Evans Roger, Klein Ewan
Publication date: 01/01/2005

Over the last few years, language technology has moved rapidly from 'applied research' to 'engineering', and from small-scale to large-scale engineering. Applications such as advanced text mining systems are feasible, but very resource-intensive, while research seeking to address the underlying language processing questions faces very real practical and methodological limitations. The e-Science vision, and the creation of the e-Science Grid, promises the level of integrated large-scale technological support required to sustain this important and successful new technology area. In this paper, we discuss the foundations for the deployment of text mining and other language technology on the Grid: the protocols and tools required to build distributed large-scale language technology systems that meet the needs of users, application builders and researchers.

CiteSeerX

University of Brighton Research Portal

Sussex Research Online

Generating and visualizing a soccer knowledge base

Author: Buitelaar Paul, Cimiano Philipp, Eigner Thomas, Gulrajani Greg, Ladwig Günter, Mantel Matthias, Schutz Alexander, Siegel Melanie, Weber Nicolas, Zhu Honggang
Publication date: 01/01/2006

This demo abstract describes the SmartWeb Ontology-based Information Extraction System (SOBIE). A key feature of SOBIE is that all information is extracted and stored with respect to the SmartWeb ontology. In this way, other components of the system that use the same ontology can access this information in a straightforward way. We will show how information extracted by SOBIE is visualized within its original context, thus enhancing the browsing experience of the end user.

Hochschulschriftenserver - Universität Frankfurt am Main

Preface of the Proceedings of WRAP 2004

Author: Thiran Philippe, Van den Heuvel Willem-Jan
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2004

Repository of the University of Namur

Proceedings of the Workshop on the Wrapper Techniques for Legacy Systems

Author: Thiran Philippe, van den Heuvel Willem-Jan
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2004

Repository of the University of Namur