
    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for performing data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for one domain in other domains.
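A minimal sketch of the kind of "wrapper"-style extraction step such systems build on may help fix the idea: the parser below collects the text of HTML elements carrying a chosen class attribute. The sample markup and the class name are invented for illustration and are not taken from any system covered by the survey.

```python
# Wrapper-style extraction: collect the text of elements whose
# class attribute matches a target. Standard library only.
from html.parser import HTMLParser

class ClassTextExtractor(HTMLParser):
    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0       # >0 while inside a matching element
        self.records = []    # extracted text, one entry per match

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        if self.target_class in classes:
            self.depth += 1
            self.records.append("")   # start a new record
        elif self.depth:
            self.depth += 1           # nested tag inside a match

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.records[-1] += data.strip()

page = '<ul><li class="price">9.99</li><li class="price">12.50</li></ul>'
extractor = ClassTextExtractor("price")
extractor.feed(page)
print(extractor.records)   # ['9.99', '12.50']
```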

    XML Document Adaptation Queries (XDAQ)

    Adaptive web applications combine data retrieval on the web with reasoning to generate context-dependent content. The data is retrieved either as content or as context specifications. Content data is, for example, fragments of a textbook or an e-commerce catalogue, whereas context data is, for example, a user model or a device profile. Current adaptive web applications are often implemented using ad-hoc and heterogeneous techniques. This paper describes a novel approach called "XML Document Adaptation Queries (XDAQ)" that requires fewer heterogeneous software components. The approach is based on using a web query language for data retrieval (content as well as context) and on a novel generic formalism to express adaptation. The approach is generic in the sense that it is applicable with all web query and transformation languages, for example with XQuery and XSLT.
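The core idea, content and context retrieved by the same machinery and combined by a generic adaptation rule, can be sketched as follows. The fragment structure and the "expertise" context key are invented for the example; XDAQ itself operates on XML with query languages such as XQuery.

```python
# Sketch: the same retrieval step yields content data (textbook
# fragments) and context data (a user model); a small, generic
# rule decides which fragments the adapted document keeps.

content = [  # content data, e.g. fragments of a textbook chapter
    {"id": "intro",  "level": "beginner", "text": "What is XML?"},
    {"id": "schema", "level": "advanced", "text": "XSD type derivation"},
]
context = {"expertise": "beginner"}  # context data, e.g. a user model

def adapt(fragments, ctx):
    """Keep a fragment when its level matches the user's expertise
    or when it is not level-specific at all."""
    return [f for f in fragments
            if f.get("level") in (ctx["expertise"], None)]

for fragment in adapt(content, context):
    print(fragment["id"], "->", fragment["text"])   # intro -> What is XML?
```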

    Perspectives for Electronic Books in the World Wide Web Age

    While the World Wide Web (WWW or Web) is steadily expanding, electronic books (e-books) remain a niche market. In this article, it is first postulated that specialized contents and device independence can make Web-based e-books compete with paper prints, and that adaptive features that can be implemented by client-side computing are relevant for e-books, while more complex forms of adaptation requiring server-side computations are not. Then, enhancements of the WWW standards (specifically XML, XHTML, the style-sheet languages CSS and XSL, and the linking language XLink) are proposed for better support of client-side adaptation and device-independent content modeling. Finally, advanced browsing functionalities desirable for e-books, as well as their implementation in the WWW context, are described.
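The client-side adaptation argued for above can be shown in miniature: all content variants ship with the document, and selection happens locally from a device profile, with no server round trip. The profile keys and variant names are invented; the article itself targets CSS/XSL and browser-side computing rather than Python.

```python
# One device-independent content model, adapted locally per device.
page = {
    "text": "Chapter 1: Hypertext",
    "figure": {"desktop": "diagram-large.svg",
               "handheld": "diagram-small.svg"},
}

def adapt_client_side(node, profile):
    """Pick the variant matching the device profile, entirely on
    the client; no server-side computation is involved."""
    return {"text": node["text"], "figure": node["figure"][profile]}

print(adapt_client_side(page, "handheld"))
# {'text': 'Chapter 1: Hypertext', 'figure': 'diagram-small.svg'}
```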

    Ontology driven Websites - Metamorphosis: a framework to specify and manage ontology driven websites

    Website development has always been a hard task: it consumes time and resources. What is new today is taken for granted by users tomorrow. This is to say that users always want more. Today they want up-to-date information, and they want to access it according to their point of view or particular preferences. To cope with these demands, websites must be dynamic and must be able to automatically reconfigure their structure, content and appearance. This scenario has favored the creation of tools for the automatic generation and management of websites. In this paper we propose not a new tool of this kind but a new approach to the problem. In our approach we consider two layers: a physical layer that we call the resources layer, composed of databases, XML documents, directory subtrees, and any other kind of file you can think of to represent your information; and a metadata layer called the ontology layer, which provides a view over those resources. Our framework consists of several parts. In this paper the focus is the navigation component. This component takes an ontology and uses it to navigate through the resources layer. We are using XML technology to implement the whole framework, and this component is implemented through an XML transformation process.
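The two-layer idea can be sketched concretely: ontology concepts point into the resource layer, and navigation follows concept relations to build pages. The concept names, relations and resource paths below are invented; Metamorphosis realises this step through XML transformations.

```python
# Ontology layer (concepts and relations) over a resource layer
# (databases, XML files, directories). Navigation turns one concept
# into a "page": its resources plus links to related concepts.

ontology = {
    "Courses": {"related": ["Staff"],   "resources": ["courses.xml"]},
    "Staff":   {"related": ["Courses"], "resources": ["staff.db", "cv/"]},
}

def navigate(concept):
    """One step of the navigation component: expose the concept's
    resources and offer links along its ontology relations."""
    node = ontology[concept]
    return {"show": node["resources"], "links": node["related"]}

print(navigate("Courses"))
# {'show': ['courses.xml'], 'links': ['Staff']}
```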

    Transforming XML Documents using fxt

    As XML spreads to various application domains, transformation tasks on XML documents are accomplished by an ever-increasing number of non-programmers. In this respect, rather than providing just a collection of basic operations via a library in a special-purpose language, it is useful to provide a more intuitive, rule-based approach to XML transformation. The rule-based approach requires pattern matching for identifying the parts of the document to be processed. As XML document processing is basically a subarea of tree processing, for which the functional programming style is very natural, we chose SML as the implementation language. The functional style implies a processing model in which navigation is possible only to subtrees of a tree. This restriction can be compensated for by using a tree pattern matcher able to refer to ancestors, successors, and siblings of a match. On top of the powerful fxgrep XML pattern matcher, we build fxt, a transformation tool for XML documents. The functional processing model used by fxt allows a more efficient implementation than the processing model of the popular XSLT, where navigation in the input tree can proceed in arbitrary directions. Usual transformations are specified in fxt in an intuitive, declarative way. More elaborate transformations can be flexibly achieved through the hooks provided into the full functionality of the SML programming language, as well as through fxt's variable mechanism.
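The rule-based style described here, match a pattern, apply a rewriting action, can be sketched with the Python standard library rather than SML/fxt. The element names and the two rewrite rules are invented for the example.

```python
# Rule-based XML transformation: a single top-down walk applies
# the action registered for each matched element tag.
import xml.etree.ElementTree as ET

doc = ET.fromstring("<doc><em>hi</em><code>x=1</code></doc>")

def em_to_i(e):    e.tag = "i"    # rule: <em>   -> <i>
def code_to_tt(e): e.tag = "tt"   # rule: <code> -> <tt>
rules = {"em": em_to_i, "code": code_to_tt}

for elem in doc.iter():           # walk the tree once
    action = rules.get(elem.tag)
    if action:
        action(elem)

print(ET.tostring(doc, encoding="unicode"))
# <doc><i>hi</i><tt>x=1</tt></doc>
```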

    Schem@Doc: a web-based XML schema visualizer

    XML Schema is one of the most widely used specifications for defining types of XML documents. It provides an extensive set of primitive data types, ways to extend and reuse definitions, and an XML syntax that simplifies automatic manipulation. However, many features that make XML Schema Definitions (XSD) so interesting also make them rather cumbersome to read. Several tools to visualize and browse schema definitions have been proposed to cope with this issue. The novel approach proposed in this paper is to base XSD visualization and navigation on the XML document itself, using solely the web browser, without requiring a pre-processing step or an intermediate representation. We present the design and implementation of a web-based XML Schema browser called schem@Doc that operates over the XSD file itself. With this approach, XSD visualization is synchronized with the source file and always reflects its current state. This tool fits well in the schema development process and is easy to integrate in web repositories containing large numbers of XSD files.
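Since an XSD is itself an XML document, the kind of information such a browser surfaces can be sketched offline with a few lines of standard-library code. The sample schema is invented; schem@Doc itself runs in the browser directly over the XSD file, with no preprocessing step.

```python
# List the elements an XSD declares, together with their types,
# by walking the schema document like any other XML file.
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
xsd = ET.fromstring(
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '  <xs:element name="book" type="xs:string"/>'
    '  <xs:element name="year" type="xs:integer"/>'
    '</xs:schema>'
)

for el in xsd.iter(XS + "element"):
    print(el.get("name"), ":", el.get("type"))
# book : xs:string
# year : xs:integer
```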

    Hypermedia Learning Objects System - On the Way to a Semantic Educational Web

    While eLearning systems become more and more popular in daily education, available applications lack opportunities to structure, annotate and manage their contents in a high-level fashion. General efforts to remedy these deficits are being made by initiatives that define rich metadata sets and a semantic Web layer. In the present paper we introduce Hylos, an online learning system. Hylos is based on a cellular eLearning Object (ELO) information model encapsulating metadata conforming to the LOM standard. Content management is provided at this semantic metadata level and allows for variable, dynamically adaptable access structures. Context-aware multifunctional links permit systematic navigation depending on the learner's and didactic needs, thereby exploring the capabilities of the semantic web. Hylos is built upon the more general Multimedia Information Repository (MIR) and the MIR adaptive context linking environment (MIRaCLE), its linking extension. MIR is an open system supporting the standards XML, CORBA and JNDI. Hylos benefits from manageable information structures, sophisticated access logic and high-level authoring tools like the ELO editor, which is responsible for the semi-manual creation of metadata and WYSIWYG-like content editing.
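A minimal sketch of a learning object carrying LOM-style metadata plus one context-aware link choice may make the model concrete. The chosen fields and the difficulty-based routing rule are invented examples; LOM defines a much richer metadata set than shown here.

```python
# Cellular learning object with LOM-style metadata and links whose
# resolution depends on the learner's context.
from dataclasses import dataclass, field

@dataclass
class ELO:
    identifier: str
    title: str
    difficulty: str                            # cf. LOM educational.difficulty
    links: dict = field(default_factory=dict)  # label -> target ELO id

basics = ELO("elo-1", "XML Basics", "easy",
             links={"next": "elo-2", "detour": "elo-0"})

def next_link(elo, learner_level):
    """Context-aware navigation: advanced learners follow 'next';
    others are routed to remedial material first."""
    return elo.links["next" if learner_level == "advanced" else "detour"]

print(next_link(basics, "beginner"))   # elo-0
```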