Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather large amounts of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users, which
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential for cross-fertilization, i.e., the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.
Comment: Knowledge-based System
XML Document Adaptation Queries (XDAQ)
Adaptive web applications combine data retrieval on the web with reasoning so as to generate context-dependent content. The data is retrieved either as content or as context specifications. Content data is, for example, fragments of a textbook or e-commerce catalogue, whereas context data is, for example, a user model or a device profile. Current adaptive web applications are often implemented using ad hoc and heterogeneous techniques. This paper describes a novel approach called "XML Document Adaptation Queries (XDAQ)" requiring less heterogeneous software components. The approach is based on using a web query language for data retrieval (content as well as context) and on a novel generic formalism to express adaptation. The approach is generic in the sense that it is applicable with all web query and transformation languages, for example with XQuery and XSLT.
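The idea of retrieving both content and context with the same query mechanism can be illustrated with a minimal sketch. It is not the XDAQ formalism itself: the sample documents, the `level` attribute and the `adapt` function are hypothetical, and Python's standard `xml.etree.ElementTree` with its limited XPath subset stands in for a web query language such as XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical content data: fragments of a catalogue.
CONTENT = """<catalogue>
  <item level="beginner"><title>Intro to XML</title></item>
  <item level="advanced"><title>XQuery Internals</title></item>
  <item level="beginner"><title>CSS Basics</title></item>
</catalogue>"""

# Hypothetical context data: a very small user model.
CONTEXT = """<user><level>beginner</level></user>"""


def adapt(content_xml, context_xml):
    """Return the titles of the content items that fit the user's context."""
    content = ET.fromstring(content_xml)
    context = ET.fromstring(context_xml)
    # Query the context document first ...
    level = context.findtext("level")
    # ... then use the result to parameterize the query on the content.
    matching = content.findall(f"item[@level='{level}']")
    return [item.findtext("title") for item in matching]


titles = adapt(CONTENT, CONTEXT)
```

In this sketch, the adaptation rule is simply "select the items whose level equals the user's level"; a generic adaptation formalism would express such rules declaratively rather than hard-coding them in a function.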
Perspectives for Electronic Books in the World Wide Web Age
While the World Wide Web (WWW or Web) is steadily expanding, electronic books (e-books) remain a niche market. In this article, it is first postulated that specialized contents and device independence can make Web-based e-books compete with paper prints, and that adaptive features that can be implemented by client-side computing are relevant for e-books, while more complex forms of adaptation requiring server-side computations are not. Then, enhancements of the WWW standards (specifically of XML, XHTML, of the style-sheet languages CSS and XSL, and of the linking language XLink) are proposed to better support client-side adaptation and device-independent content modeling. Finally, advanced browsing functionalities desirable for e-books, as well as their implementation in the WWW context, are described.
Ontology driven Websites - Metamorphosis: a framework to specify and manage ontology driven websites
Website development has always been a hard task: it consumes time and resources. What is new today is normally taken for granted tomorrow by users. This is to say that users always want more. Today they want up-to-date information and they want to access it according to their point of view or particular preferences. To cope with these demands, websites must be dynamic and must be able to automatically reconfigure their structure, content and appearance. This scenario has favored the creation of tools for the automatic generation and management of websites.
In this paper we propose not a new tool of this kind but a new approach to the problem. In our approach we consider two layers: a physical layer that we call the resources layer, composed of databases, XML documents, directory subtrees, and any other kind of file that can be used to represent information; and a metadata layer called the ontology layer, which provides a view of those resources. Our framework consists of several parts. In this paper the focus is on the navigation component. This component takes an ontology and uses it to navigate through the resources layer. We are using XML technology to implement the whole framework, and this component is implemented through an XML transformation process.
Transforming XML Documents using fxt
As XML spreads to various application domains, transformation tasks on XML documents are accomplished by an ever-increasing number of non-programmers. In this respect, rather than providing just a collection of basic operations via a library in a special-purpose language, it is useful to provide a more intuitive, rule-based approach to XML transformation. The rule-based approach requires pattern matching to identify the parts of the document to be processed. As XML document processing is basically a subarea of tree processing, for which the functional programming style is very natural, we choose SML as the implementation language. The functional style implies a processing model in which navigation is possible only to subtrees of a tree. This restriction can be compensated for by using a tree pattern-matcher able to relate to ancestors, successors, as well as siblings of a match. On top of the powerful fxgrep XML pattern-matcher, we build fxt, a transformation tool for XML documents. The functional processing model that fxt uses allows a more efficient implementation than those permitted by the processing model of the popular XSLT, where navigation in the input tree can proceed in arbitrary directions. Usual transformations are specified in fxt in an intuitive, declarative way. More elaborate transformations can be flexibly achieved through the hooks provided to the full functionality of the SML programming language, as well as through fxt's variable mechanism.
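The rule-based, top-down processing model described above can be sketched as follows. This is not fxt or fxgrep: it is an illustration in Python rather than SML, the `transform`, `rename_to_em` and `RULES` names are hypothetical, and the patterns here inspect only the current node, whereas the abstract states that fxgrep patterns can also relate to ancestors and siblings.

```python
import xml.etree.ElementTree as ET


def transform(node, rules):
    """Apply the first rule whose pattern matches `node`.

    Each rule is a (pattern, action) pair. An action receives the node and a
    `recurse` callback, so it can only descend into subtrees of the match.
    Nodes matched by no rule are copied, and their children are transformed.
    """
    def recurse(child):
        return transform(child, rules)

    for pattern, action in rules:
        if pattern(node):
            return action(node, recurse)
    return rebuild(node, node.tag, recurse)


def rebuild(node, new_tag, recurse):
    """Copy `node` under `new_tag`, transforming each child via `recurse`."""
    out = ET.Element(new_tag, dict(node.attrib))
    out.text, out.tail = node.text, node.tail
    for child in node:
        result = recurse(child)
        if result is not None:  # an action returns None to drop a subtree
            out.append(result)
    return out


def rename_to_em(node, recurse):
    """Action: rewrite the matched element as <em>."""
    return rebuild(node, "em", recurse)


# Hypothetical rule set: rename <b> to <em>, delete <draft> subtrees.
RULES = [
    (lambda n: n.tag == "b", rename_to_em),
    (lambda n: n.tag == "draft", lambda n, recurse: None),
]

doc = ET.fromstring("<doc><b>world</b><draft>todo</draft><i>kept</i></doc>")
result = ET.tostring(transform(doc, RULES), encoding="unicode")
```

Because actions never navigate upwards or sideways, the input can be processed in a single top-down pass, which is the property the abstract contrasts with XSLT's arbitrary navigation.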
Schem@Doc: a web-based XML schema visualizer
XML Schema is one of the most used specifications for defining
types of XML documents. It provides an extensive set of primitive data types,
ways to extend and reuse definitions and an XML syntax that simplifies automatic manipulation. However, many features that make XML Schema Definitions (XSD) so interesting also make them rather cumbersome to read.
Several tools to visualize and browse schema definitions have been proposed to cope with this issue. The novel approach proposed in this paper is to base XSD visualization and navigation on the XML document itself, using solely the web browser, without requiring a pre-processing step or an intermediate
representation. We present the design and implementation of a web-based XML
Schema browser called schem@Doc that operates over the XSD file itself. With
this approach, XSD visualization is synchronized with the source file and
always reflects its current state. This tool fits well in the schema development
process and is easy to integrate in web repositories containing large numbers of
XSD files.
European Commisssio
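Operating directly on the XSD file is possible because a schema is itself an XML document. The following minimal sketch is not schem@Doc, which runs in the web browser; the sample schema and the `outline` function are hypothetical. It derives a simple outline of named declarations from the schema source without any intermediate representation.

```python
import xml.etree.ElementTree as ET

# Namespace of XML Schema, in ElementTree's {uri}localname notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical sample schema.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book" type="BookType"/>
  <xs:complexType name="BookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="year" type="xs:gYear" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def outline(schema_xml):
    """List every named declaration, indented by its nesting depth."""
    lines = []

    def walk(node, depth):
        for child in node:
            name = child.get("name")
            if name is None:
                # Unnamed constructs such as xs:sequence add no outline level.
                walk(child, depth)
            else:
                kind = child.tag.replace(XS, "")
                lines.append("  " * depth + f"{kind} {name}")
                walk(child, depth + 1)

    walk(ET.fromstring(schema_xml), 0)
    return lines


entries = outline(XSD)
```

Since the outline is computed from the schema text on each call, it always reflects the current state of the file, which mirrors the synchronization property claimed in the abstract.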
Hypermedia Learning Objects System - On the Way to a Semantic Educational Web
While eLearning systems are becoming more and more popular in daily education,
available applications lack opportunities to structure, annotate and manage
their contents in a high-level fashion. Initiatives to define rich meta data
sets and a Semantic Web layer are general efforts to remedy these deficits.
In the present paper we introduce Hylos, an online learning
system. Hylos is based on a cellular eLearning Object (ELO) information model
encapsulating meta data conforming to the LOM standard. Content management is
provisioned on this semantic meta data level and allows for variable,
dynamically adaptable access structures. Context aware multifunctional links
permit systematic navigation depending on the learners' and didactic needs,
thereby exploring the capabilities of the semantic web. Hylos is built upon the
more general Multimedia Information Repository (MIR) and the MIR adaptive
context linking environment (MIRaCLE), its linking extension. MIR is an open
system supporting the standards XML, Corba and JNDI. Hylos benefits from
manageable information structures, sophisticated access logic and high-level
authoring tools like the ELO editor responsible for the semi-manual creation of
meta data and WYSIWYG-like content editing.
Comment: 11 pages, 7 figure