    Cross-organisation dataspace (COD) - architecture and implementation

    With the rapid development of information and communication technologies, the need to share information to improve efficiency in large enterprises is also increasing rapidly. In a large enterprise, information can come from many different sources and in different formats. There is a real requirement to manage this vast amount of data and its diverse sources in a convenient and integrated way, so that repositories of information can be built up with little additional effort and the information can be easily accessed globally. This paper presents the design and implementation of a prototype, called COD (Cross-Organisation Dataspace), that addresses these challenges. In the context of an enterprise involving multiple organisations, COD allows users in different geographical locations to contribute information and to search and access it easily. The information can take many different forms, e.g. text files, reports, drawings and databases.

    Schema-aware keyword search on linked data

    Keyword search is a popular technique for querying the ever-growing repositories of RDF graph data on the Web. This is because users do not need to master complex query languages (e.g., SQL, SPARQL), nor do they need to know the underlying structure of the data on the Web to compose their queries. Keyword search is simple and flexible. However, it is at the same time ambiguous, since a keyword query can be interpreted in different ways. This feature of keyword search poses at least two challenges: (a) identifying relevant results among a multitude of candidate results, and (b) dealing with the performance scalability of the query evaluation algorithms. In the literature, multiple schema-unaware approaches are proposed to cope with these challenges. Some of them identify as relevant only those candidate results which keep the keyword instances in close proximity. Other approaches filter out irrelevant results using their structural characteristics, or rank the retrieved results and apply top-k processing based on statistical information about the data. In any case, these approaches cannot disambiguate the query to identify the intent of the user, and they cannot scale satisfactorily when the size of the data and the number of query keywords grow. In recent years, different approaches have tried to exploit the schema (structural summary) of the RDF (Resource Description Framework) data graph to address the problems above. In this context, an original hierarchical clustering technique is introduced in this dissertation. This approach clusters the results based on a semantic interpretation of the keyword instances and takes advantage of relevance feedback from the user. The clustering hierarchy uses pattern graphs, which are structured queries that cluster together result graphs with the same structure. Pattern graphs represent possible interpretations of the keyword query. 
By navigating through the hierarchy, the user can select the pattern graph which is relevant to their intent. Nevertheless, structural summaries are approximate representations of the data and, therefore, might return empty answers or miss results which are relevant to the user intent. To address this issue, a novel approach is presented which combines the use of the structural summary and the user feedback with a relaxation technique for pattern graphs to extract additional results potentially of interest to the user. Query caching and multi-query optimization techniques are leveraged for the efficient evaluation of relaxed pattern graphs. Although the approaches which consider the structural summary of the data graph are promising, they require interaction with the user. It is claimed in this dissertation that, without additional information from the user, it is not possible to produce results of high quality from keyword search on RDF data with the existing techniques. In this regard, an original keyword query language on RDF data is introduced which allows the user to convey their intention flexibly and effortlessly by specifying cohesive keyword groups. A cohesive group of keywords in a query indicates that its keywords should form a cohesive unit in the query results. It is experimentally demonstrated that cohesive keyword queries improve the result quality effectively and prune the search space of the pattern graphs efficiently compared to traditional keyword queries. Most importantly, these benefits are achieved while retaining the simplicity and the convenience of traditional keyword search. The last issue addressed in this dissertation is the diversification problem for keyword search on RDF data. The goal of diversification is to trade off relevance and diversity in the result set of a keyword query in order to minimize the dissatisfaction of the average user. 
Novel metrics are developed for assessing relevance and diversity, along with techniques for generating a relevant and diversified set of query interpretations for a keyword query on an RDF data graph. Experimental results show the effectiveness of the metrics and the efficiency of the approach.
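
The idea of clustering result graphs that share the same structure can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the toy data, the `NODE_CLASS` mapping, and the function names `pattern_of` and `cluster_by_pattern` are all assumptions made for illustration. Each result graph is a list of (subject, predicate, object) triples, and its pattern is obtained by replacing every instance node with its class from a hypothetical structural summary.

```python
from collections import defaultdict

# Hypothetical structural summary: maps each instance node to its class.
NODE_CLASS = {
    "alice": "Person",
    "bob": "Person",
    "paper1": "Paper",
    "paper2": "Paper",
    "conf1": "Conference",
}


def pattern_of(result_graph):
    """Replace every instance node by its class to get a structural signature.

    Simplification: using a frozenset merges repeated class-level edges,
    so two results that differ only in edge multiplicity share a pattern.
    """
    return frozenset(
        (NODE_CLASS[s], p, NODE_CLASS[o]) for (s, p, o) in result_graph
    )


def cluster_by_pattern(result_graphs):
    """Group result graphs whose class-level structure is identical."""
    clusters = defaultdict(list)
    for graph in result_graphs:
        clusters[pattern_of(graph)].append(graph)
    return dict(clusters)


# Toy result graphs for some keyword query (illustrative only).
results = [
    [("alice", "author_of", "paper1")],
    [("bob", "author_of", "paper2")],
    [("paper1", "published_in", "conf1")],
]
clusters = cluster_by_pattern(results)
```

Here the two authorship results fall under one pattern and the publication result under another, so a user choosing between interpretations would inspect two patterns rather than three individual results.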

    Keyword Search over Relational Databases

    The research background of keyword search over relational databases is presented first, followed by a detailed description of the two main families of solutions to this problem, data graph-based and schema graph-based methods, covering the principles, advantages, and disadvantages of each. Finally, some future research directions in this area are discussed. Supported by the National Natural Science Foundation of China under Grant No. 50604012 and the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2009AA01Z150.
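
As a rough illustration of the data graph-based family mentioned above, the sketch below treats tuples as nodes and foreign-key links as undirected edges, then looks for a node that connects one match of every keyword at minimal total distance. It is a toy under stated assumptions, not any specific surveyed algorithm: the graph `EDGES`, the tuple text `TEXT`, and the functions `bfs_distances` and `best_root` are hypothetical.

```python
from collections import deque

# Hypothetical data graph: nodes are tuples, edges are foreign-key links.
EDGES = {
    "author:1": ["writes:1"],
    "writes:1": ["author:1", "paper:7"],
    "paper:7": ["writes:1"],
}
# Hypothetical searchable text attached to each tuple.
TEXT = {
    "author:1": "jim gray",
    "writes:1": "",
    "paper:7": "transaction processing",
}


def bfs_distances(start):
    """Breadth-first search: hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in EDGES[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def best_root(keywords):
    """Return (root, cost) minimising the summed distance from the root to
    the nearest match of each keyword, or None if no node connects them."""
    total = {node: 0 for node in EDGES}
    for kw in keywords:
        matches = [n for n, text in TEXT.items() if kw in text.split()]
        if not matches:
            return None
        dists = [bfs_distances(m) for m in matches]
        for node in list(total):
            reachable = [d[node] for d in dists if node in d]
            if reachable:
                total[node] += min(reachable)
            else:
                del total[node]  # this node cannot reach the keyword
    if not total:
        return None
    # Ties are broken by node name so the result is deterministic.
    root = min(total, key=lambda n: (total[n], n))
    return root, total[root]


answer = best_root(["gray", "transaction"])
```

Schema graph-based methods would instead enumerate candidate join expressions over the schema and evaluate them as SQL; that variant is not shown here.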