99 research outputs found

    Towards an effective processing of XML keyword query

    Ph.D., Doctor of Philosophy

    XKMis: Effective and efficient keyword search in XML databases

    We present XKMis, a system for keyword search in XML documents. Unlike previous work, our method is not based on the lowest common ancestor (LCA) or its variants; rather, we divide the nodes into meaningful, self-contained information segments, called minimal information segments (MISs), and return MIS-subtrees, which consist of MISs that are logically connected by the keywords. The MIS-subtrees are closer to what the user wants. They also enable us to use the region code of XML trees to develop a search algorithm that is more efficient, especially for large XML trees. We report experimental results that verify the better effectiveness and efficiency of our system. Copyright © 2009 ACM
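
    The abstract does not define the region code it relies on. One common scheme labels each node with a (start, end) interval taken from a depth-first traversal, so that an ancestor test reduces to interval containment. The sketch below assumes that scheme; it is not the XKMis algorithm, and `assign_regions` and `is_ancestor` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def assign_regions(root):
    """Label every element with a (start, end) interval via depth-first traversal.

    Assumption: a simple counter-based region encoding, not the one used by XKMis.
    """
    regions = {}
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter          # number assigned when the node is entered
        counter += 1
        for child in node:
            visit(child)
        regions[node] = (start, counter)  # number assigned when the node is left
        counter += 1

    visit(root)
    return regions


def is_ancestor(regions, a, b):
    """Return True if element a is a proper ancestor of element b."""
    a_start, a_end = regions[a]
    b_start, b_end = regions[b]
    # a contains b exactly when b's interval lies strictly inside a's interval
    return a_start < b_start and b_end < a_end
```

    With labels like these, the ancestor relationship between two keyword matches can be checked with two integer comparisons instead of a walk up the tree.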

    Reasoning & Querying – State of the Art

    Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet, where keyword search is used in many applications such as search engines, has familiarized casual users with using keyword queries to retrieve information. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming to enable simple querying of semi-structured data, which is relevant, for example, in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.

    A Survey on Intent-based Diversification for Fuzzy Keyword Search

    Keyword search is an interesting phenomenon: it is the process of finding important and relevant information in various data repositories. Structured and semi-structured data can be stored precisely, while fully unstructured documents can be annotated and stored in the form of metadata. Half of all web searches are part of an information exploration process. In this paper, earlier work on the semantic meaning of keywords, based on their context in the specified documents, is thoroughly analyzed. In a tree data representation, the nodes are objects and may carry some intention. These nodes act as anchors for a Smallest Lowest Common Ancestor (SLCA) based pruning process. Nodes are clustered based on their features, where a feature is a distinctive attribute: a quality, property, or trait of something. Automatic text classification algorithms are the modern way to extract features. Summarization and segmentation produce n-grams from various forms of documents. The set of items that describe and summarize one important aspect of a query is known as a facet. Instead of exact string matching, fuzzy mapping based on semantic correlation is the new trend, with the correlation quantified by cosine similarity. Once an outlier is detected, the nearest neighbors of the selected points are mapped, with high probability, to the same hash code as the intended nodes. Together, these methods retrieve the relevant data, prune the unnecessary data, and create a hash signature for nearest neighbor search. This survey emphasizes the need for a framework for fuzzy-oriented keyword search.
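
    The abstract states that correlation is quantified by cosine similarity but does not say how terms are represented. The sketch below assumes plain term-frequency vectors over whitespace-separated tokens, which is only one possible choice; `cosine_similarity` is a hypothetical helper, not code from the survey.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts using raw term-frequency vectors.

    Assumption: lowercase whitespace tokenization; the survey does not specify one.
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

    A score near 1 means the two texts use terms in similar proportions, and a score of 0 means they share no terms, which is what allows a fuzzy match where exact string matching would fail.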

    The accuracy of some length-based methods for fish population studies

    Length, Growth, Stock assessment, Population dynamics, Fish, Methodology, Pisces

    Genetic programming for manufacturing optimisation.

    A considerable number of optimisation techniques have been proposed for the solution of problems associated with the manufacturing process. Evolutionary computation methods, a group of non-deterministic search algorithms that employ the concept of Darwinian strife for survival to guide the search for optimal solutions, have been extensively used for this purpose. Genetic programming is an evolutionary algorithm that evolves variable-length solution representations in the form of computer programs. While genetic programming has produced successful applications in a variety of optimisation fields, genetic programming methodologies for the solution of manufacturing optimisation problems have rarely been reported. The applicability of genetic programming in the field of manufacturing optimisation is investigated in this thesis. Three well-known problems were used for this purpose: the one-machine total tardiness problem, the cell-formation problem and the multiobjective process planning selection problem. The main contribution of this thesis is the introduction of novel genetic programming frameworks for the solution of these problems. In the case of the one-machine total tardiness problem, genetic programming employed combinations of dispatching rules for the indirect representation of job schedules. The hybridisation of genetic programming with alternative search algorithms was proposed for the solution of more difficult problem instances. In addition, genetic programming was used for the evolution of new dispatching rules that challenged the efficiency of man-made dispatching rules for the solution of the problem. An integrated genetic programming and hierarchical clustering approach was proposed for the solution of simple and advanced formulations of the cell-formation problem. The proposed framework produced results competitive with alternative methodologies that have been proposed for the solution of the same problem.
    The evolution of similarity coefficients that can be used in combination with clustering techniques for the solution of cell-formation problems was also investigated. Finally, genetic programming was combined with a number of evolutionary multiobjective techniques for the solution of the multiobjective process planning selection problem. Results on test problems illustrated the ability of the proposed methodology to provide a wealth of potential solutions to the decision-maker.
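
    To make the one-machine total tardiness problem concrete, the sketch below evaluates a job sequence and builds sequences from two textbook dispatching rules, earliest due date (EDD) and shortest processing time (SPT). The abstract does not name the rules the thesis combines, so these two are assumptions, as are the function names and the job data in the usage note; no genetic programming is involved here.

```python
def total_tardiness(jobs, order):
    """Sum of max(0, completion time - due date) over jobs run back to back.

    jobs maps a job id to a (processing_time, due_date) pair.
    """
    clock = 0
    total = 0
    for job_id in order:
        processing_time, due_date = jobs[job_id]
        clock += processing_time              # completion time of this job
        total += max(0, clock - due_date)     # early jobs contribute nothing
    return total


def earliest_due_date(jobs):
    """Dispatching rule (assumed example): schedule jobs by increasing due date."""
    return sorted(jobs, key=lambda job_id: jobs[job_id][1])


def shortest_processing_time(jobs):
    """Dispatching rule (assumed example): schedule jobs by increasing processing time."""
    return sorted(jobs, key=lambda job_id: jobs[job_id][0])
```

    For the made-up instance `{"A": (4, 5), "B": (2, 9), "C": (3, 4)}`, EDD orders the jobs C, A, B and SPT orders them B, C, A, and the two sequences yield different total tardiness. An indirect representation of a schedule, as described in the abstract, would select or combine such rules rather than encode the job order itself.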

    A Labelling Technique Comparison for Indexing Large XML Database

    The flexible nature of XML documents has motivated researchers to use them for data transmission and storage in different domains. The hierarchical structure of XML documents makes them attractive for research on processing user queries based on labelling, where each label describes the node's structure in the tree. In this study, three categories of XML node labelling are analysed to address the open problems of each category. A number of experiments are executed to compare the execution time and storage space required for labelling an XML tree.

    Efficiency and effectiveness of XML keyword search using a full element index

    Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2010. Thesis (Master's) -- Bilkent University, 2010. Includes bibliographical references, leaves 63-67. In the last decade, both academia and industry have proposed several techniques to allow keyword search on XML databases and document collections. A common data structure employed in most of these approaches is an inverted index, which is the state of the art for conducting keyword search over large volumes of textual data, such as the World Wide Web. In particular, a full element-index considers (and indexes) each XML element as a separate document, which is formed of the text directly contained in it and the textual content of all of its descendants. A major criticism of a full element-index is the high degree of redundancy in the index (due to the nested structure of XML documents), which limits its use in large-scale XML retrieval scenarios. As the first contribution of this thesis, we investigate the efficiency and effectiveness of using a full element-index for XML keyword search. First, we suggest that lossless index compression methods can significantly reduce the size of a full element-index so that query processing strategies, such as those employed in a typical search engine, can efficiently operate on it. We show that once the most essential problem of a full element-index, i.e., its size, is remedied, using such an index can improve both the result quality (effectiveness) and query execution performance (efficiency) in comparison to other recently proposed techniques in the literature. Moreover, using a full element-index also allows generating query results in different forms, such as a ranked list of documents (as expected by a search engine user) or a complete list of elements that include all of the query terms (as expected by a DBMS user), in a unified framework.
    As a second contribution of this thesis, we propose to use a lossy approach, static index pruning, to further reduce the size of a full element-index. In this way, we aim to eliminate the repetition of an element's terms at upper levels in an adaptive manner, considering the element's textual content and the search system's ranking function. That is, we attempt to remove the repetitions in the index only when we expect that removing them would not reduce the result quality. We conduct a well-crafted set of experiments and show that pruned index files are comparable or even superior to the full element-index up to very high pruning levels for various ad hoc tasks in terms of retrieval effectiveness. As a final contribution of this thesis, we propose to apply index pruning strategies to reduce the size of the document vectors in an XML collection to improve the clustering performance of the collection. Our experiments show that for certain cases, it is possible to prune up to 70% of the collection (or, more specifically, the underlying document vectors) and still generate a clustering structure that yields the same quality as that of the original collection, in terms of a set of evaluation metrics. Atılgan, Duygu. M.S.
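
    The full element-index described above can be illustrated with a small inverted index in which every element is indexed under its own text plus the text of all its descendants. This is a minimal sketch under stated assumptions, not the thesis implementation: the path-style element identifiers, the whitespace tokenization, and the names `build_full_element_index` and `search` are all hypothetical, and no compression, ranking, or pruning is shown.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_full_element_index(xml_text):
    """Map each term to the set of elements whose subtree text contains it."""
    root = ET.fromstring(xml_text)
    index = defaultdict(set)

    def visit(node, path):
        # itertext() yields the node's own text and that of all its descendants
        for term in " ".join(node.itertext()).lower().split():
            index[term].add(path)
        # pos is the position among all children, not an XPath-style index
        for pos, child in enumerate(node, start=1):
            visit(child, f"{path}/{child.tag}[{pos}]")

    visit(root, f"/{root.tag}")
    return index


def search(index, query):
    """Return the elements whose subtree text contains every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()
```

    Because a term in a leaf is posted once for every ancestor as well, the postings grow with the depth of the tree, which is the redundancy the abstract identifies as the main criticism of this kind of index.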

    Enhancing the Usability of XML keyword Search

    Ph.D., Doctor of Philosophy