104 research outputs found

    From XML to relational database.

    by Yan, Men-Hin.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 114-119). Abstracts in English and Chinese.

    Contents:
      Abstract
      Acknowledgments
      Chapter 1: Introduction
        1.1 Storing XML in Database Systems
        1.2 Outline of the Thesis
      Chapter 2: Related Work
        2.1 Overview of XML
          2.1.1 Extensible Markup Language (XML)
          2.1.2 Data Type Definition (DTD)
          2.1.3 "ID, IDREF and IDREFS"
        2.2 Using Special-Purpose Database to Store XML Data
        2.3 Using Relational Databases to Store XML Data
          2.3.1 Extracting Schemas with STORED
          2.3.2 Using Simple Schemes Based on Labeled Graph
          2.3.3 Generating Schemas from DTDs
          2.3.4 Commercial Approaches
        2.4 Discovering Functional Dependencies
          2.4.1 Functional Dependency
          2.4.2 Finding Functional Dependencies
          2.4.3 TANE and Partition Refinement
        2.5 Multivalued Dependencies
          2.5.1 Example of Multivalued Dependency
      Chapter 3: Using RDBMS to Store XML Data
        3.1 Global Schema Extraction Algorithm
          3.1.1 Step 1: Simplify DTD
          3.1.2 Step 2: Construct Schema Prototype Trees
          3.1.3 Step 3: Generate Relational Schema Prototype
          3.1.4 Step 4: Discover Functional Dependencies and Candidate Keys
          3.1.5 Step 5: Normalize the Relational Schema Prototypes
          3.1.6 Discussion
        3.2 DTD-splitting Schema Extraction Algorithm
          3.2.1 Step 1: Simplify DTD
          3.2.2 Step 2: Construct Schema Prototype Trees
          3.2.3 Step 3: Generate Relational Schema Prototype
          3.2.4 Step 4: Discover Functional Dependencies and Candidate Keys
          3.2.5 Step 5: Normalize the Relational Schema Prototypes
          3.2.6 Discussion
        3.3 Experimental Results
          3.3.1 Real Life XML Data: SIGMOD Record XML
          3.3.2 Synthetic XML Data
          3.3.3 Discussion
      Chapter 4: Finding Multivalued Dependencies
        4.1 Validation of Multivalued Dependencies
        4.2 Search Strategy and Pruning
          4.2.1 Search Strategy for Left-hand Sides Candidates
          4.2.2 Search Strategy for Right-hand Sides Candidates
          4.2.3 Other Pruning
        4.3 Computing with Partitions
          4.3.1 Computing Partitions
        4.4 Algorithm
          4.4.1 Generating Next Level Candidates
          4.4.2 Computing Partitions
        4.5 Experimental Results
          4.5.1 Results of the Algorithm
          4.5.2 Evaluation on the Results
          4.5.3 Scalability of the Algorithm
          4.5.4 Using Multivalued Dependencies in Schema Extraction Algorithms
      Chapter 5: Conclusion
        5.1 Discussion
        5.2 Future Work
          5.2.1 Translate Semistructured Queries to SQL
          5.2.2 Improve the Multivalued Dependency Discovery Algorithm
          5.2.3 Incremental Update of Resulting Schema
      Bibliography
      Appendix
        A Simple Proof for Minimality in Multivalued Dependencies
        B Third and Fourth Normal Form Decompositions
          B.1 3NF Decomposition Algorithm
          B.2 4NF Decomposition Algorithm

    Acta Cybernetica : Volume 20. Number 2.


    KD2R: a Key Discovery method for semantic Reference Reconciliation in OWL

    The reference reconciliation problem consists of deciding whether different identifiers refer to the same real-world entity. Some existing reference reconciliation approaches use key constraints to infer reconciliation decisions. In the context of Linked Open Data, this knowledge is not available. In this master's thesis we propose KD2R, a method for the automatic discovery of key constraints associated with OWL2 classes. These keys are discovered from RDF data, which can be incomplete. The proposed algorithm performs this discovery without having to scan all the data. KD2R has been tested on data sets from the international OAEI contest and obtains promising results.
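To make the notion of a key constraint concrete, the following is a minimal Python sketch of a naive key check over a handful of instances. It is not the KD2R algorithm: unlike KD2R it scans every instance, it assumes single-valued properties, and it assumes that an instance with a missing value cannot violate a key. The names `is_key`, `minimal_keys`, and the sample data are hypothetical.

```python
from itertools import combinations


def is_key(instances, properties):
    """Return True if no two instances agree on every property in `properties`."""
    seen = set()
    for inst in instances:
        # Assumption: an instance lacking one of the properties is skipped,
        # i.e. incomplete data is treated as unable to violate the key.
        if any(p not in inst for p in properties):
            continue
        signature = tuple(inst[p] for p in properties)
        if signature in seen:
            return False
        seen.add(signature)
    return True


def minimal_keys(instances, properties):
    """Enumerate minimal keys by brute force, smallest property sets first."""
    keys = []
    for size in range(1, len(properties) + 1):
        for combo in combinations(properties, size):
            # A superset of a known key is a key, but not a minimal one.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_key(instances, combo):
                keys.append(combo)
    return keys


# Hypothetical instances of one class; the third has no "city" value.
people = [
    {"name": "Ann", "born": 1980, "city": "Paris"},
    {"name": "Ann", "born": 1975, "city": "Lyon"},
    {"name": "Bob", "born": 1980},
]
found = minimal_keys(people, ["name", "born", "city"])
print(found)
```

Two references whose values coincide on a discovered key can then be treated as candidates for the same entity, which is how key constraints feed reconciliation decisions.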

    TOWARDS EFFECTIVE RELATIONAL KEYWORD SEARCH USING SEMANTICS

    Ph.D. thesis (Doctor of Philosophy)

    Four Lessons in Versatility or How Query Languages Adapt to the Web

    Exposing not only human-centered information but also machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled new kinds of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposure has fractured the Web into islands of data, each in a different Web format: some providers choose XML, others RDF, and still others JSON or OWL for their data, even in similar domains. This fracturing stifles innovation, as application builders have to cope not with one Web stack (e.g., XML technology) but with several, each of considerable complexity. With Xcerpt we have developed a rule- and pattern-based query language that aims to shield application builders from much of this complexity: in a single query language, XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C's GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply to querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards more convenient, yet highly efficient, data access in a "Web of Data".
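As background for the interval labelings mentioned in the abstract, here is a minimal Python sketch of the classic interval labeling of a tree, which answers ancestor/descendant tests with two integer comparisons. It covers only tree-shaped data; the extension to RDF graphs described in the abstract is not shown, and the function names and sample tree are hypothetical.

```python
def label_tree(children, root):
    """Assign each node a (start, end) interval by depth-first traversal.

    `start` is the node's preorder number and `end` is the largest
    preorder number found in its subtree.
    """
    labels = {}
    counter = 0
    stack = [(root, False)]
    while stack:
        node, subtree_done = stack.pop()
        if subtree_done:
            # All descendants are numbered; close the interval.
            labels[node] = (labels[node][0], counter)
            continue
        counter += 1
        labels[node] = (counter, None)
        stack.append((node, True))
        # Push children in reverse so they are visited in document order.
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return labels


def is_ancestor(labels, ancestor, descendant):
    """True if `ancestor` is a proper ancestor of `descendant`."""
    a_start, a_end = labels[ancestor]
    d_start, _ = labels[descendant]
    return a_start < d_start <= a_end


# Hypothetical document tree: doc has children a and b; a has child c.
tree = {"doc": ["a", "b"], "a": ["c"]}
labels = label_tree(tree, "doc")
print(labels)
```

Labeling takes time and space linear in the number of nodes, and each structural test afterwards is constant time, which is why such labelings are attractive for evaluating path-style queries.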

    Improving search engines with open Web-based SKOS vocabularies

    Dissertation submitted for the degree of Master in Informatics Engineering. The volume of digital information keeps growing, and even though organizations are making more of this information available, without the proper tools users have great difficulty retrieving documents about subjects of interest. Good information retrieval mechanisms are crucial for answering user information needs. Nowadays, search engines are unavoidable - they are an essential feature in document management systems. However, achieving good relevancy is a difficult problem, particularly when dealing with specific technical domains where vocabulary mismatch problems can be harmful. Numerous studies have found that exploiting the lexical or semantic relations of terms in a collection attenuates this problem. In this dissertation, we aim to improve search results and user experience by investigating the use of potentially connected Web vocabularies in information retrieval engines. In the context of open Web-based SKOS vocabularies, we propose a query expansion framework implemented in a widely used IR system (Lucene/Solr) and evaluated using standard IR evaluation datasets. The components described in this thesis were applied in the development of a new search system that was integrated with a rapid application development tool in the context of an internship at Quidgest S.A.
    Funding: Fundação para a Ciência e Tecnologia - ImTV research project, in the context of the UTAustin-Portugal collaboration (UTA-Est/MAI/0010/2009); QSearch project (FCT/Quidgest
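The idea of vocabulary-based query expansion can be illustrated with a minimal Python sketch. It is not the framework from the dissertation: the vocabulary is assumed to be a plain dictionary mapping a term to alternative labels (as might be extracted from a SKOS concept's preferred and alternative labels), and `expand_query` is a hypothetical helper that emits a boolean query string using `OR` and quoted phrases.

```python
def expand_query(query, vocabulary):
    """Expand each query word with its alternative labels, joined by OR."""
    parts = []
    for word in query.lower().split():
        variants = [word] + [label.lower() for label in vocabulary.get(word, [])]
        # Multi-word labels are quoted so they are matched as phrases.
        quoted = ['"%s"' % v if " " in v else v for v in variants]
        if len(quoted) > 1:
            parts.append("(" + " OR ".join(quoted) + ")")
        else:
            parts.append(quoted[0])
    return " ".join(parts)


# Hypothetical vocabulary: term -> alternative labels.
vocabulary = {"car": ["automobile", "motor vehicle"]}
expanded = expand_query("car insurance", vocabulary)
print(expanded)
```

A document that mentions only "automobile" can then match a query for "car", which is the vocabulary mismatch problem the abstract refers to.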