1,257 research outputs found

    Semantics Analysis for XML Keyword Search

    Ph.D. (Doctor of Philosophy)

    Keyword search in graphs, relational databases and social networks

    Keyword search, a well-known mechanism for retrieving relevant information from a set of documents, has recently been studied for extracting information from structured data (e.g., relational databases and XML documents). It offers an alternative to query languages (e.g., SQL) for exploring databases, which is effective for lay users who may not be familiar with the database schema or the query language. This dissertation addresses some issues in keyword search in structured data. Namely, novel solutions to existing problems in keyword search in graphs or relational databases are proposed. In addition, a problem related to graph keyword search, team formation in social networks, is studied. The dissertation consists of four parts. The first part addresses keyword search over a graph, which finds a substructure of the graph containing all or some of the query keywords. Current methods for keyword search over graphs may produce answers in which some content nodes (i.e., nodes that contain input keywords) are not very close to each other. In addition, current methods explore both content and non-content nodes while searching for the result and are thus both time- and memory-consuming for large graphs. To address the above problems, we propose algorithms for finding r-cliques in graphs. An r-clique is a group of content nodes that cover all the input keywords and in which the distance between each pair of nodes is less than or equal to r. Two approximation algorithms that produce r-cliques with a bounded approximation ratio in polynomial delay are proposed. In the second part, the problem of duplication-free and minimal keyword search in graphs is studied. Current methods for keyword search in graphs may produce duplicate answers that contain the same set of content nodes. In addition, an answer found by these methods may not be minimal in the sense that some of the nodes in the answer may contain query keywords that are all covered by other nodes in the answer. 
Removing these nodes does not change the coverage of the answer but can make the answer more compact. We define the problem of finding duplication-free and minimal answers, and propose algorithms for finding such answers efficiently. Meaningful keyword search in relational databases is the subject of the third part of this dissertation. Keyword search over relational databases returns a join tree spanning tuples containing the query keywords. As many answers of varying quality can be found, and the user is often only interested in seeing the top-k answers, how to gauge the relevance of answers to rank them is of paramount importance. This becomes more pertinent for databases with large and complex schemas. We focus on the relevance of join trees as the fundamental means to rank the answers. We devise means to measure relevance of relations and foreign keys in the schema over the information content of the database. The problem of keyword search over graph data is similar to the problem of team formation in social networks. In this setting, keywords represent skills and the nodes in a graph represent the experts that possess skills. Given an expert network, in which a node represents an expert that has a cost for using the expert service and an edge represents the communication cost between the two corresponding experts, we tackle the problem of finding a team of experts that covers a set of required skills and also minimizes the communication cost as well as the personnel cost of the team. We propose two types of approximation algorithms to solve this bi-criteria problem in the fourth part of this dissertation.
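
The r-clique and minimality definitions given in the abstract above can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it only checks whether a given set of nodes satisfies the two definitions. It assumes an unweighted, undirected graph with hop-count distance (the dissertation may use weighted distances), and the names `bfs_distances`, `is_r_clique`, `is_minimal`, and the toy graph are all hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """Hop-count distances from source (assumption: unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_r_clique(graph, node_keywords, nodes, query, r):
    """True if nodes cover every query keyword and are pairwise within distance r."""
    query = set(query)
    covered = set()
    for n in nodes:
        covered |= node_keywords.get(n, set()) & query
    if covered != query:
        return False
    for a, b in combinations(nodes, 2):
        if bfs_distances(graph, a).get(b, float("inf")) > r:
            return False
    return True


def is_minimal(node_keywords, nodes, query):
    """True if no node's query keywords are all covered by the other nodes."""
    query = set(query)
    for n in nodes:
        others = set()
        for m in nodes:
            if m != n:
                others |= node_keywords.get(m, set()) & query
        if (node_keywords.get(n, set()) & query) <= others:
            return False  # n is redundant: removing it keeps the coverage
    return True


# Hypothetical toy data: a path graph a - b - c - d.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
node_keywords = {"a": {"xml"}, "c": {"search"}, "d": {"xml", "search"}}
query = {"xml", "search"}

within_two_hops = is_r_clique(graph, node_keywords, ["a", "c"], query, r=2)
within_one_hop = is_r_clique(graph, node_keywords, ["a", "c"], query, r=1)
redundant_answer = is_minimal(node_keywords, ["a", "c", "d"], query)
compact_answer = is_minimal(node_keywords, ["a", "c"], query)
```

Here nodes `a` and `c` are two hops apart, so they form an r-clique for r = 2 but not for r = 1. Adding `d` keeps the coverage unchanged and therefore makes the answer non-minimal.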

    Four Lessons in Versatility or How Query Languages Adapt to the Web

    Exposing not only human-centered information, but also machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled a new kind of application and business where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: some providers choose XML, others RDF, and still others JSON or OWL for their data, even in similar domains. This fracturing stifles innovation, as application builders have to cope not only with one Web stack (e.g., XML technology) but with several, each of considerable complexity. With Xcerpt we have developed a rule- and pattern-based query language that aims to shield application builders from much of this complexity: in a single query language, XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply for querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards a more convenient, yet highly efficient data access in a “Web of Data”.

    TOWARDS EFFECTIVE RELATIONAL KEYWORD SEARCH USING SEMANTICS

    Ph.D. (Doctor of Philosophy)

    From unstructured HTML to structured XML: how XML supports financial knowledge management on internet.

    by Yuen Lok-tin.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
    Includes bibliographical references (leaves 88-95).
    Abstracts in English and Chinese.
    ABSTRACT
    ABSTRACT (CHINESE)
    ACKNOWLEDGEMENT
    TABLE OF CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    Chapter 1 --- INTRODUCTION
    Chapter 1.1 --- Background
    Chapter 1.2 --- Objectives
    Chapter 1.3 --- Organization
    Chapter 2 --- LITERATURE REVIEW & THEORETICAL FOUNDATION
    Chapter 2.1 --- "Data, Information and Knowledge"
    Chapter 2.2 --- Knowledge Management
    Chapter 2.3 --- Information Transparency and Efficiency
    Chapter 2.3.1 --- Transparency
    Chapter 2.3.2 --- Efficiency
    Chapter 2.4 --- Extensible Markup Language (XML)
    Chapter 3 --- DIGITAL FINANCIAL INFORMATION AND ISSUES
    Chapter 3.1 --- Managing Financial Information on the Internet
    Chapter 3.2 --- Existing Electronic Financial Filing Systems
    Chapter 3.3 --- Financial Document Disclosure Model
    Chapter 3.4 --- Interaction Between Information Producers and Consumers
    Chapter 3.5 --- Gluing All Together
    Chapter 4 --- IDEAL ELECTRONIC FINANCIAL DISCLOSURE SYSTEM
    Chapter 4.1 --- Structure and Representation of Knowledge
    Chapter 4.2 --- Content Creation
    Chapter 5 --- PROPOSED APPROACH
    Chapter 5.1 --- Preliminary XML Data Dictionary
    Chapter 5.2 --- Creation of XML Tags
    Chapter 5.2.1 --- Statistical Information Retrieval
    Chapter 5.2.2 --- Accounting and Auditing Practice
    Chapter 5.2.3 --- Investors' Feedback
    Chapter 5.3 --- Value-Added Services
    Chapter 6 --- DESIGN AND DEVELOPMENT OF ELFFS-XML
    Chapter 6.1 --- Stages of ELFFS-XML
    Chapter 6.1.1 --- Information Creation
    Chapter 6.1.2 --- Information Collection/Storage
    Chapter 6.1.3 --- Knowledge Generation
    Chapter 6.1.4 --- Knowledge Dissemination/Presentation
    Chapter 6.1.5 --- Feedback
    Chapter 6.2 --- Components of ELFFS-XML
    Chapter 6.2.1 --- Data Source Abstraction Layer
    Chapter 6.2.2 --- Storage Abstraction Layer
    Chapter 6.2.3 --- Logic Layer
    Chapter 6.2.4 --- Presentation Layer
    Chapter 7 --- EVALUATING ELFFS-XML
    Chapter 7.1 --- Comparison with Other Financial Information Disclosure Systems
    Chapter 7.2 --- Users' Evaluation
    Chapter 7.3 --- Systems Efficiency
    Chapter 7.4 --- XML Tag Generation Approach Performance Evaluation
    Chapter 8 --- CONCLUSION AND FUTURE RESEARCH
    APPENDIX I --- SURVEY ON INVESTMENT PATTERN
    APPENDIX II --- CORE ELFFS-XML DTD
    APPENDIX III --- PERFORMANCE RELATED XML TAGS
    BIBLIOGRAPHY

    Survey over Existing Query and Transformation Languages

    A widely acknowledged obstacle to realizing the vision of the Semantic Web is the inability of many current Semantic Web approaches to cope with data available in such diverging representation formalisms as XML, RDF, or Topic Maps. A common query language is the first step to allow transparent access to data in any of these formats. To further the understanding of the requirements and approaches proposed for query languages in the conventional as well as the Semantic Web, this report surveys a large number of query languages for accessing XML, RDF, or Topic Maps. This is the first systematic survey to consider query languages from all these areas. From the detailed survey of these query languages, a common classification scheme is derived that is useful for understanding and differentiating languages within and among all three areas.