
    Reasoning & Querying – State of the Art

    Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet, where keyword search is used in many applications (e.g., search engines), has familiarized casual users with using keyword queries to retrieve information. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming to enable simple querying of semi-structured data, which is relevant, for example, in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.

    Semantics and result disambiguation for keyword search on tree data

    Keyword search is a popular technique for searching tree-structured data (e.g., XML, JSON) on the web because it frees the user from learning a complex query language and the structure of the data sources. However, the convenience of keyword search comes with drawbacks. The imprecision of keyword queries usually results in a very large number of results, of which only very few are relevant to the query. Multiple previous approaches have tried to address this problem. Some of them exploit structural and semantic properties of the tree data in order to filter out irrelevant results, while others use a scoring function to rank the candidate results. These are not easy tasks, though, and in both cases relevant results might be missed and users might spend a significant amount of time searching for their intended result among a plethora of candidates. Another drawback of keyword search on tree data, also due to the inability of keyword queries to precisely express the user's intent, is that the query answer may contain different types of meaningful results even though the user is interested in only some of them. Both problems of keyword search on tree data are addressed in this dissertation. First, an original approach for answering keyword queries is proposed. This approach extracts structural patterns of the query matches and reasons with them in order to return meaningful results ranked with respect to their relevance to the query. The proposed semantics compares patterns of results using different types of homomorphisms between the patterns. These comparisons are used to organize the patterns into a graph of patterns, which is leveraged to determine ranking and filtering semantics. The experimental results show that the approach produces query results of higher quality than previous approaches. To address the second problem, an original approach for clustering the keyword search results on tree data is introduced.
The clustered output allows the user to focus on a subset of the results and to save time and effort while looking for the relevant results. The approach performs clustering at different levels of granularity to group similar results together effectively. The similarity of results and result clusters is decided using relations on structural patterns of the results, defined based on homomorphisms between path patterns. An original feature of the clustering approach is that the clusters are ranked at different levels of granularity to quickly guide the user to the relevant result patterns. An efficient stack-based algorithm is presented for generating result patterns and constructing the clustering hierarchy. Extensive experiments with multiple real datasets show that the algorithm is fast and scalable. They also show that the clustering methodology allows users to effectively retrieve their intended results and outperforms a recent state-of-the-art clustering approach. To tackle the second problem from a different angle, diversification of keyword search results is also addressed. Diversification aims to provide users with a ranked list of results that balances the relevance and redundancy of the results. Measures for quantifying the relevance and dissimilarity of result patterns are presented, and a heuristic for generating a diverse set of results using these metrics is introduced.
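The abstract does not spell out the diversification heuristic itself. As a stand-in only, the sketch below uses the well-known greedy Maximal Marginal Relevance (MMR) rule to show what "balancing relevance and redundancy" can look like: at each step it picks the candidate maximizing `lam * relevance - (1 - lam) * (similarity to the closest already-selected result)`. The function name `greedy_diversify`, the weight `lam`, the Jaccard similarity over path-label sets, and the sample patterns are all hypothetical assumptions, not the dissertation's measures.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_diversify(
    candidates: Sequence[T],
    relevance: Callable[[T], float],
    similarity: Callable[[T, T], float],
    k: int,
    lam: float = 0.5,
) -> List[T]:
    """Pick up to k results, trading relevance against redundancy (MMR-style)."""
    remaining = list(candidates)
    selected: List[T] = []

    def marginal_score(c: T) -> float:
        # Redundancy = similarity to the closest result already selected (0 if none).
        redundancy = max((similarity(c, s) for s in selected), default=0.0)
        return lam * relevance(c) - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical result patterns, each reduced to a set of path labels.
patterns = {"p1": {"a/b", "a/c"}, "p2": {"a/b", "a/c"}, "p3": {"a/d"}}
scores = {"p1": 0.9, "p2": 0.85, "p3": 0.5}


def jaccard(x: str, y: str) -> float:
    px, py = patterns[x], patterns[y]
    return len(px & py) / len(px | py)


print(greedy_diversify(list(patterns), scores.get, jaccard, k=2))  # ['p1', 'p3']
```

Although `p2` is more relevant than `p3`, it duplicates `p1`'s structure, so the redundancy penalty pushes `p3` into the top two. Setting `lam=1.0` reduces the rule to plain relevance ranking.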

    Quasi-SLCA based Keyword Query Processing over Probabilistic XML Data

    The probabilistic threshold query is one of the most common queries in uncertain databases: a result satisfying the query must also have a probability that meets the threshold requirement. In this paper, we investigate probabilistic threshold keyword queries (PrTKQ) over XML data, which have not been studied before. We first introduce the notion of quasi-SLCA and use it to represent results for a PrTKQ under possible-world semantics. Then we design a probabilistic inverted (PI) index that can be used to quickly return the qualified answers and filter out the unqualified ones based on our proposed lower/upper bounds. After that, we propose two efficient and comparable algorithms: the Baseline Algorithm and the PI index-based Algorithm. To accelerate these algorithms, we also utilize the probability density function. An empirical study using real and synthetic data sets has verified the effectiveness and efficiency of our approaches.
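Quasi-SLCA extends the deterministic SLCA (smallest lowest common ancestor) semantics to probabilistic XML. As a minimal sketch of that underlying notion only, assuming each node carries a Dewey label (a tuple of child positions, so a label's prefixes are exactly its ancestors; this labeling is an assumption, not taken from the paper), the brute-force function below returns the nodes whose subtree contains every keyword and that have no descendant with the same property. It ignores probabilities, possible worlds, and the PI index; the name `slca` and the sample labels are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Dewey = Tuple[int, ...]  # e.g. (1, 2, 1) = first child of the second child of the root


def ancestors_or_self(label: Dewey) -> Set[Dewey]:
    # The non-empty prefixes of a Dewey label are the node and all its ancestors.
    return {label[:i] for i in range(1, len(label) + 1)}


def slca(matches: Dict[str, List[Dewey]]) -> List[Dewey]:
    # Step 1: nodes whose subtree contains every keyword.
    covering: Optional[Set[Dewey]] = None
    for labels in matches.values():
        nodes: Set[Dewey] = set()
        for label in labels:
            nodes |= ancestors_or_self(label)
        covering = nodes if covering is None else covering & nodes
    if not covering:
        return []
    # Step 2: keep only the "smallest" ones, i.e. no covering proper descendant.
    result = [
        n for n in covering
        if not any(len(m) > len(n) and m[: len(n)] == n for m in covering)
    ]
    return sorted(result)


# Hypothetical keyword matches in a tiny tree rooted at (1,).
matches = {"xml": [(1, 1, 1), (1, 2, 1)], "search": [(1, 1, 2)]}
print(slca(matches))  # [(1, 1)]
```

The root `(1,)` also contains both keywords, but it is discarded because its descendant `(1, 1)` already does.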

    Identifying meaningful return information for XML keyword search

    Keyword search enables web users to easily access XML data without the need to learn a structured query language and to study possibly complex data schemas. Existing work has addressed the problem of selecting qualified data nodes that match keywords and connecting them in a meaningful way, in the spirit of inferring a where clause in XQuery. However, how to infer the return clause for keyword search is an open problem. To address this challenge, we present an XML keyword search engine, XSeek, to infer the semantics of the search and identify return nodes effectively. XSeek recognizes possible entities and attributes inherently represented in the data. It also distinguishes between …

    Efficiency and effectiveness of XML keyword search using a full element index

    Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2010. Thesis (Master's), Bilkent University, 2010. Includes bibliographical references (leaves 63-67).
    In the last decade, both academia and industry have proposed several techniques to allow keyword search on XML databases and document collections. A common data structure employed in most of these approaches is an inverted index, which is the state of the art for keyword search over large volumes of textual data, such as the World Wide Web. In particular, a full element-index considers (and indexes) each XML element as a separate document, formed of the text directly contained in it and the textual content of all of its descendants. A major criticism of a full element-index is the high degree of redundancy in the index (due to the nested structure of XML documents), which limits its use in large-scale XML retrieval scenarios. As the first contribution of this thesis, we investigate the efficiency and effectiveness of using a full element-index for XML keyword search. First, we suggest that lossless index compression methods can significantly reduce the size of a full element-index so that query processing strategies, such as those employed in a typical search engine, can operate on it efficiently. We show that once the most essential problem of a full element-index, i.e., its size, is remedied, using such an index can improve both the result quality (effectiveness) and query execution performance (efficiency) in comparison to other recently proposed techniques in the literature. Moreover, using a full element-index also allows generating query results in different forms, such as a ranked list of documents (as expected by a search engine user) or a complete list of elements that include all of the query terms (as expected by a DBMS user), in a unified framework.
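To make the redundancy of a full element-index concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. Every element becomes its own "document" built from `itertext()`, which yields the element's own text plus all descendant text, so a term is posted once for each enclosing element. The function name `build_full_element_index`, the pre-order element ids, and the `\w+` tokenizer are illustrative assumptions, not the thesis's implementation, and no compression is applied.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Posting = Tuple[int, int]  # (element_id, term_frequency)


def build_full_element_index(xml_text: str) -> Tuple[Dict[str, List[Posting]], List[str]]:
    root = ET.fromstring(xml_text)
    index: Dict[str, List[Posting]] = defaultdict(list)
    tags: List[str] = []
    # root.iter() walks the tree in document (pre-order) order; the position is the id.
    for elem_id, elem in enumerate(root.iter()):
        tags.append(elem.tag)
        # Own text + all descendant text: this is what makes the index redundant.
        terms = re.findall(r"\w+", " ".join(elem.itertext()).lower())
        for term, tf in Counter(terms).items():
            index[term].append((elem_id, tf))
    return dict(index), tags


doc = "<article><title>XML search</title><body><p>keyword search</p></body></article>"
index, tags = build_full_element_index(doc)
print(tags)             # ['article', 'title', 'body', 'p']
print(index["search"])  # [(0, 2), (1, 1), (2, 1), (3, 1)]
```

The single word "search" in `<p>` is posted again for `<body>` and `<article>`, which is the nesting-induced growth that the compression and pruning contributions aim to control.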
As a second contribution of this thesis, we propose using a lossy approach, static index pruning, to further reduce the size of a full element-index. In this way, we aim to eliminate the repetition of an element's terms at upper levels in an adaptive manner, considering the element's textual content and the search system's ranking function. That is, we attempt to remove repetitions from the index only when we expect that their removal would not reduce result quality. We conduct a well-crafted set of experiments and show that pruned index files are comparable or even superior to the full element-index up to very high pruning levels for various ad hoc tasks in terms of retrieval effectiveness. As a final contribution of this thesis, we propose applying index pruning strategies to reduce the size of the document vectors in an XML collection to improve the clustering performance of the collection. Our experiments show that in certain cases it is possible to prune up to 70% of the collection (or, more specifically, of the underlying document vectors) and still generate a clustering structure whose quality is the same as that of the original collection, in terms of a set of evaluation metrics.
    Atılgan, Duygu. M.S.
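The thesis's pruning is adaptive to each element's content and the ranking function, and those details are not given in the abstract. For illustration only, the sketch below swaps in a generic term-centric static pruning rule: for each term, keep a posting only if its score is at least `epsilon` times the term's k-th highest score (lists with at most k postings are kept whole when scores are non-negative). The `score` callback, `k`, `epsilon`, and the sample postings are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Posting = Tuple[int, int]  # (element_id, term_frequency)


def prune_term_centric(
    index: Dict[str, List[Posting]],
    score: Callable[[str, Posting], float],
    k: int = 10,
    epsilon: float = 0.5,
) -> Dict[str, List[Posting]]:
    pruned: Dict[str, List[Posting]] = {}
    for term, postings in index.items():
        ranked = sorted((score(term, p) for p in postings), reverse=True)
        # Threshold relative to the k-th best score (or the lowest, if fewer than k).
        z_k = ranked[min(k, len(ranked)) - 1] if ranked else 0.0
        pruned[term] = [p for p in postings if score(term, p) >= epsilon * z_k]
    return pruned


# Hypothetical postings; the score here is simply the raw term frequency.
index = {"search": [(0, 8), (1, 1), (2, 4), (3, 2)], "xml": [(0, 1), (1, 1)]}
pruned = prune_term_centric(index, score=lambda t, p: float(p[1]), k=2, epsilon=0.5)
print(pruned)  # {'search': [(0, 8), (2, 4), (3, 2)], 'xml': [(0, 1), (1, 1)]}
```

In a real system `score` would be the retrieval function's per-posting contribution; raising `epsilon` prunes more aggressively.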

    Life Cycle Sustainability Assessment of the Hydrogen Fuel Cell Buses in the European Context. Evaluation of relevant measures to support low-carbon mobility in the public transport sector

    Goal and Background. Transport represents 27% of Europe's Greenhouse Gas (GHG) emissions and is the main cause of air pollution in cities. With the global shift towards a low-carbon economy, the EU set forth a low-emission mobility strategy aimed at reducing overall emissions in the transport sector. The High V.LO.-City project is part of this overarching strategy and addresses the integration of hydrogen fuel cell (H2FC) buses into public transport. Methods. In this thesis, the environmental assessment of one H2FC bus and the related refuelling station is carried out using the Life Cycle Assessment (LCA) methodology, taking into account the following phases: (1) bus production, (2) hydrogen production pathways (water electrolysis, chlor-alkali electrolysis, and steam methane reforming), (3) hydrogen consumption during bus operation, and (4) the vehicles' end of life. The potential impacts are evaluated for magnitude and significance in the life cycle impact assessment (LCIA) phase, using the Environmental Footprint (EF) method, which is part of the Product Environmental Footprint (PEF) method established by the European Union (EU) in 2013. The calculated fuel economy is around 10.54 kg H2/100 km, and the energy demand of a refuelling infrastructure may vary between 6 and 9 kWh/kg H2. Results. The results show that H2FC buses have the potential to reduce emissions during the use phase if renewable resources are used. The expected Global Warming Potential (GWP) benefit is about 85% in comparison to a diesel bus. Additionally, the emissions of the selected hydrogen production pathways depend on how electricity is produced and on the chemical-based or fossil-based feedstocks used to drive the production process. Conclusions and Outlook.
Improving the environmental profile of hydrogen production requires promoting clean electricity sources to supply low-carbon hydrogen, sharpening the policy focus on life cycle management, and countering potential setbacks, in particular those related to problem-shifting and grid improvement.
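Combining the two reported figures gives a rough range for the refuelling-station energy per 100 km driven. This is a back-of-the-envelope calculation that assumes the station's per-kilogram demand applies to all hydrogen dispensed and excludes the energy needed to produce the hydrogen itself:

$$
E_{\text{station}} = 10.54\,\tfrac{\text{kg H}_2}{100\,\text{km}} \times \{6,\ 9\}\,\tfrac{\text{kWh}}{\text{kg H}_2} = \{63.24,\ 94.86\}\,\tfrac{\text{kWh}}{100\,\text{km}} \approx 0.63 \text{ to } 0.95\,\tfrac{\text{kWh}}{\text{km}}
$$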

    The Wikipedia Image Retrieval Task

    The Wikipedia image retrieval task at ImageCLEF provides a testbed for the system-oriented evaluation of visual information retrieval from a collection of Wikipedia images. The aim is to investigate the effectiveness of retrieval approaches that exploit textual and visual evidence in the context of a large and heterogeneous collection of images that are searched for by users with diverse information needs. This chapter presents an overview of the available test collections, summarises the retrieval approaches employed by the groups that participated in the task during the 2008 and 2009 ImageCLEF campaigns, provides an analysis of the main evaluation results, identifies best practices for effective retrieval, and discusses open issues.
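One common way to exploit both textual and visual evidence is late fusion of the two ranked lists. The sketch below is a generic example, not the method of any particular ImageCLEF participant: it min-max normalizes each run's scores, then ranks images by a weighted sum. The weight `alpha`, the image ids, and the function names are hypothetical; an image missing from one run contributes 0 for that modality.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    # Rescale one run's scores to [0, 1] so text and visual scores are comparable.
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def late_fusion(text: Dict[str, float], visual: Dict[str, float], alpha: float = 0.7) -> List[str]:
    t, v = min_max(text), min_max(visual)
    fused = {
        d: alpha * t.get(d, 0.0) + (1 - alpha) * v.get(d, 0.0)
        for d in set(t) | set(v)
    }
    # Highest fused score first; ties broken by id for a deterministic order.
    return sorted(fused, key=lambda d: (-fused[d], d))


text_run = {"img1": 12.0, "img2": 8.0, "img3": 4.0}
visual_run = {"img1": 0.1, "img2": 0.9, "img3": 0.8}
print(late_fusion(text_run, visual_run))  # ['img1', 'img2', 'img3']
```

Lowering `alpha` shifts weight toward visual similarity; with `alpha=0.3` the same runs rank `img2` first.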

    Subontology Extraction Using Hyponym and Hypernym Closure on is-a Directed Acyclic Graphs

    Ontologies are successfully used as semantic guides when navigating through the huge and ever-increasing quantity of digital documents. Nevertheless, the size of numerous domain ontologies tends to grow beyond the human capacity to grasp information. This growth is problematic for many key applications that require user interaction, such as document annotation or ontology modification/evolution. The problem could be partially overcome by providing users with a sub-ontology focused on their current concepts of interest. A sub-ontology restricted to this sole set of concepts is of limited interest, since their relationships generally cannot be made explicit without adding some of their hyponyms and hypernyms. This paper proposes efficient algorithms to identify these additional key concepts based on the closure of two common graph operators: the least common ancestor and the greatest common descendant. The resulting method produces ontology excerpts focused on a set of concepts of interest and is fast enough to be used in interactive environments. As an example, we use the resulting program, called OntoFocus (http://www.ontotoolkit.mines-ales.fr/), to restrict, in a few seconds, the large Gene Ontology (~30,000 concepts) to a sub-ontology focused on concepts annotating a gene related to breast cancer.
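As a naive illustration of the closure idea (not the paper's efficient algorithms), the sketch below repeatedly adds, for every pair of selected concepts, their least common ancestors (the minimal shared hypernyms) and greatest common descendants (the maximal shared hyponyms) in an is-a DAG, until the set stops growing. It assumes the ontology is given as a dictionary mapping each concept to its direct hypernyms; the function names and concept names are hypothetical, and the pairwise loop is quadratic per round, far from interactive speed on something the size of the Gene Ontology.

```python
from itertools import combinations
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]  # concept -> direct neighbours (hypernyms or hyponyms)


def reachable(start: str, edges: Graph) -> Set[str]:
    # start plus everything reachable by following edges (iterative DFS).
    seen, stack = {start}, [start]
    while stack:
        for nxt in edges.get(stack.pop(), set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def invert(parents: Graph) -> Graph:
    # Turn "concept -> hypernyms" into "concept -> hyponyms".
    children: Graph = {}
    for concept, hypernyms in parents.items():
        for h in hypernyms:
            children.setdefault(h, set()).add(concept)
    return children


def closure(seeds: Iterable[str], parents: Graph) -> Set[str]:
    children = invert(parents)

    def anc(x: str) -> Set[str]:
        return reachable(x, parents)

    def desc(x: str) -> Set[str]:
        return reachable(x, children)

    result = set(seeds)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(sorted(result), 2):
            common_anc = anc(a) & anc(b)
            # Least common ancestors: no proper descendant is also a common ancestor.
            lcas = {x for x in common_anc if not (desc(x) - {x}) & common_anc}
            common_desc = desc(a) & desc(b)
            # Greatest common descendants: no proper ancestor is also a common descendant.
            gcds = {x for x in common_desc if not (anc(x) - {x}) & common_desc}
            new = (lcas | gcds) - result
            if new:
                result |= new
                changed = True
    return result


parents = {"dog": {"mammal"}, "cat": {"mammal"}, "mammal": {"animal"},
           "bird": {"animal"}, "animal": set()}
print(sorted(closure({"dog", "cat"}, parents)))  # ['cat', 'dog', 'mammal']
```

Selecting `dog` and `cat` pulls in `mammal`, the concept that makes their relationship explicit, while `animal` is left out because it is not the least common ancestor.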