Algorithms and implementation of functional dependency discovery in XML : a thesis presented in partial fulfilment of the requirements for the degree of Master of Information Sciences in Information Systems at Massey University
1.1 Background. Following the advent of the web, there has been great demand for data interchange between applications using internet infrastructure. XML (eXtensible Markup Language) provides a structured representation of data, supported by broad adoption and easy deployment. As a subset of SGML (Standard Generalized Markup Language), XML has been standardized by the World Wide Web Consortium (W3C) [Bray et al., 2004]. XML is becoming the prevalent data exchange format on the World Wide Web and is increasingly significant for storing semi-structured data. Since its initial release in 1996, it has evolved and been applied extensively in all fields where the exchange of structured documents in electronic form is required. With the growing popularity of XML, the issue of functional dependency in XML has recently received well-deserved attention. The driving force for the study of dependencies in XML is that they are as crucial to XML schema design as to relational database (RDB) design [Abiteboul et al., 1995].
A Method for Mapping XML DTD to Relational Schemas In The Presence Of Functional Dependencies
The eXtensible Markup Language (XML) has recently emerged as a standard for
data representation and interchange on the web. With so much XML data on the web,
the pressure is now to manage that data efficiently. Because relational
databases are the most widely used technology for managing and storing XML,
XML frequently needs to be mapped to relations. There are many different ways
to map, and many approaches exist in the literature, especially given the
flexible nesting structures that XML allows. This gives rise to the following
important problem: are some mappings 'better' than others? To approach this
problem, the study draws on classical relational database design through
normalization, which is based on the well-known concept of functional dependency.
This concept is used to specify the constraints that may exist in the relations
and to guide the design while removing semantic data redundancies. This approach
leads to a good normalized relational schema without data redundancy. To achieve a
good normalized relational schema for XML, the concept of functional dependency
in relations needs to be extended to XML and used as guidance for the design.
Although functional dependency definitions for XML exist, they are not yet
standard and still have several limitations. Because of these limitations,
constraints in the presence of shared and local elements in an XML document
cannot be specified. This study proposes a new definition of functional
dependency constraints for XML that is general enough to specify such
constraints and to discover semantic redundancies in XML documents.
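The classical relational notion of functional dependency that this abstract builds on can be illustrated with a small check. This is a minimal sketch of the relational concept only, not the XML definition proposed in the study; the flattened `enrolments` table, its column names, and the `fd_holds` helper are hypothetical.

```python
from typing import Iterable, Mapping, Sequence


def fd_holds(rows: Iterable[Mapping[str, object]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value;
        # any later row that disagrees violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical flattened view of an XML document of course enrolments.
enrolments = [
    {"course": "DB101", "lecturer": "Lee", "student": "Ann"},
    {"course": "DB101", "lecturer": "Lee", "student": "Bob"},
    {"course": "XML200", "lecturer": "Kim", "student": "Ann"},
]

print(fd_holds(enrolments, ["course"], ["lecturer"]))  # course -> lecturer
print(fd_holds(enrolments, ["student"], ["course"]))   # student -> course
```

In this toy data, `course -> lecturer` holds, so the lecturer name is stored redundantly once per enrolled student; this is the kind of semantic redundancy that normalization is meant to remove.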
The focus of this study is on how to produce an optimal mapping approach in the
presence of XML functional dependencies (XFDs), keys and Document Type Definition
(DTD) constraints, as guidance for generating a good relational schema. To approach
the mapping problem, three different components are explored: the mapping
algorithm, functional dependency for XML, and the implication process. The study of
XML implication is important for determining which other dependencies are
guaranteed to hold in a relational representation of XML, given that a set of
functional dependencies holds in the XML document. This requires deriving a set
of inference rules for the implication process. In the presence of a DTD and
user-defined XFDs, further XFDs that are guaranteed to hold in the XML can be
generated using these inference rules. The mapping algorithm has been
implemented in a tool called XtoR. The quality of the mapping approach has
been analyzed, and the results show that XtoR significantly improves the
generation of a good relational schema for XML with respect to
reducing data and relation redundancy, removing dangling relations, and removing
association problems. The findings suggest that if one wants to use an RDBMS to
manage XML data, the mapping from XML documents to relations must be based on
functional dependency constraints.
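The abstract does not list the inference rules derived for XFDs. As a point of comparison only, the following sketch shows the classical relational implication test (attribute closure, which is equivalent to applying Armstrong's axioms); the attribute names and the `closure` and `implies` helpers are hypothetical and are not taken from XtoR.

```python
from typing import FrozenSet, Iterable, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    """Compute the set of attributes functionally determined by attrs."""
    result = set(attrs)
    fd_list = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fd_list:
            # If we already know every attribute on the left-hand side,
            # the right-hand side is determined as well.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds: Iterable[FD], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """Return True if the given dependencies imply lhs -> rhs."""
    return set(rhs) <= closure(lhs, fds)


# Hypothetical dependencies for a relation derived from a book catalogue.
fds = [
    (frozenset({"isbn"}), frozenset({"title"})),
    (frozenset({"title", "year"}), frozenset({"publisher"})),
]

print(implies(fds, {"isbn", "year"}, {"publisher"}))  # implied
print(implies(fds, {"title"}, {"isbn"}))              # not implied
```

An implication procedure for XFDs would additionally have to account for DTD structure and element paths, which this relational sketch ignores.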
Generating collaborative systems for digital libraries: A model-driven approach
This is an open access article shared under a Creative Commons Attribution 3.0 Licence (http://creativecommons.org/licenses/by/3.0/). Copyright © 2010 The Authors. The design and development of a digital library involves different stakeholders, such as information architects, librarians, and domain experts, who need to agree on a common language to describe, discuss, and negotiate the services the library has to offer. To this end, high-level, language-neutral models have to be devised. Metamodeling techniques favor the definition of domain-specific visual languages through which stakeholders can share their views and directly manipulate representations of the domain entities. This paper describes CRADLE (Cooperative-Relational Approach to Digital Library Environments), a metamodel-based framework and visual language for the definition of notions and services related to the development of digital libraries. A collection of tools allows the automatic generation of several services, defined with the CRADLE visual language, and of the graphical user interfaces providing access to them for the final user. The effectiveness of the approach is illustrated by presenting digital libraries generated with CRADLE, while the CRADLE environment has been evaluated by using the cognitive dimensions framework.
Artequakt: Generating tailored biographies from automatically annotated fragments from the web
The Artequakt project seeks to automatically generate narrative biographies of artists from knowledge that has been extracted from the Web and maintained in a knowledge base. An overview of the system architecture is presented here and the three key components of that architecture are explained in detail, namely knowledge extraction, information management and biography construction. Conclusions are drawn from the initial experiences of the project and future progress is detailed.
Automatic extraction of knowledge from web documents
A large amount of the digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system, which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper.
Topic Map Generation Using Text Mining
Starting from text corpus analysis with linguistic and statistical analysis algorithms, an infrastructure for text mining is described which uses collocation analysis as a central tool. This text mining method may be applied to different domains as well as languages. Some examples taken from large reference databases motivate the applicability to knowledge management using declarative standards of information structuring and description. The ISO/IEC Topic Map standard is introduced as a candidate for rich metadata description of information resources and it is shown how text mining can be used for automatic topic map generation.
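The abstract does not say which statistical measure its collocation analysis uses. As one common choice, the sketch below scores sentence-level word co-occurrences with pointwise mutual information (PMI); the `collocations` function, the `min_count` threshold, and the tiny corpus are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def collocations(sentences: List[str],
                 min_count: int = 2) -> Dict[Tuple[str, str], float]:
    """Score word pairs that co-occur in a sentence by PMI (log base 2)."""
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        word_counts.update(words)
        # Each unordered pair is counted once per sentence it appears in.
        pair_counts.update(combinations(words, 2))
    n = len(sentences)
    scores = {}
    for (a, b), joint in pair_counts.items():
        if joint < min_count:
            continue  # ignore pairs seen too rarely to be meaningful
        # PMI = log2( P(a, b) / (P(a) * P(b)) ) with sentence-level estimates.
        scores[(a, b)] = math.log2(joint * n / (word_counts[a] * word_counts[b]))
    return scores


corpus = [
    "topic map standard",
    "topic map generation",
    "text mining method",
    "text mining tool",
]
for pair, score in sorted(collocations(corpus).items()):
    print(pair, round(score, 2))
```

Pairs with a high score, such as "topic" with "map" in this toy corpus, could then serve as candidate associations between topics when generating a topic map.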
Visualising Discourse Structure in Interactive Documents
In this paper we introduce a method for generating interactive documents which exploits the visual features of hypertext to represent discourse structure. We explore the consistent and principled use of graphics and animation to support navigation and comprehension of non-linear text, where textual discourse markers do not always work effectively.
SWI-Prolog and the Web
Where Prolog is commonly seen as a component in a Web application that is
either embedded or communicates using a proprietary protocol, we propose an
architecture where Prolog communicates to other components in a Web application
using the standard HTTP protocol. By avoiding embedding in external Web servers
development and deployment become much easier. To support this architecture, in
addition to the transfer protocol, we must also support parsing, representing
and generating the key Web document types such as HTML, XML and RDF.
This paper motivates the design decisions in the libraries and extensions to
Prolog for handling Web documents and protocols. The design has been guided by
the requirement to handle large documents efficiently. The described libraries
support a wide range of Web applications ranging from HTML and XML documents to
Semantic Web RDF processing.
To appear in Theory and Practice of Logic Programming (TPLP). Comment: 31 pages, 24 figures and 2 tables.