17,939 research outputs found

    A Formal Definition for Configuration

    Get PDF
    There exists a wide set of techniques to perform keyword-based search over relational databases but all of them match the keywords in the users' queries to elements of the databases to be queried as first step. The matching process is a time-consuming and complex task. So, improving the performance of this task is a key issue to improve the keyword based search on relational data sources.In this work, we show how to model the matching process on keyword-based search on relational databases by means of the symmetric group. Besides, how this approach reduces the search space is explained in detail

    Keyword search in graphs, relational databases and social networks

    Get PDF
    Keyword search, a well known mechanism for retrieving relevant information from a set of documents, has recently been studied for extracting information from structured data (e.g., relational databases and XML documents). It offers an alternative way to query languages (e.g., SQL) to explore databases, which is effective for lay users who may not be familiar with the database schema or the query language. This dissertation addresses some issues in keyword search in structured data. Namely, novel solutions to existing problems in keyword search in graphs or relational databases are proposed. In addition, a problem related to graph keyword search, team formation in social networks, is studied. The dissertation consists of four parts. The first part addresses keyword search over a graph which finds a substructure of the graph containing all or some of the query keywords. Current methods for keyword search over graphs may produce answers in which some content nodes (i.e., nodes that contain input keywords) are not very close to each other. In addition, current methods explore both content and non-content nodes while searching for the result and are thus both time and memory consuming for large graphs. To address the above problems, we propose algorithms for finding r-cliques in graphs. An r-clique is a group of content nodes that cover all the input keywords and the distance between each pair of nodes is less than or equal to r. Two approximation algorithms that produce r-cliques with a bounded approximation ratio in polynomial delay are proposed. In the second part, the problem of duplication-free and minimal keyword search in graphs is studied. Current methods for keyword search in graphs may produce duplicate answers that contain the same set of content nodes. In addition, an answer found by these methods may not be minimal in the sense that some of the nodes in the answer may contain query keywords that are all covered by other nodes in the answer. Removing these nodes does not change the coverage of the answer but can make the answer more compact. We define the problem of finding duplication-free and minimal answers, and propose algorithms for finding such answers efficiently. Meaningful keyword search in relational databases is the subject of the third part of this dissertation. Keyword search over relational databases returns a join tree spanning tuples containing the query keywords. As many answers of varying quality can be found, and the user is often only interested in seeing the·top-k answers, how to gauge the relevance of answers to rank them is of paramount importance. This becomes more pertinent for databases with large and complex schemas. We focus on the relevance of join trees as the fundamental means to rank the answers. We devise means to measure relevance of relations and foreign keys in the schema over the information content of the database. The problem of keyword search over graph data is similar to the problem of team formation in social networks. In this setting, keywords represent skills and the nodes in a graph represent the experts that possess skills. Given an expert network, in which a node represents an expert that has a cost for using the expert service and an edge represents the communication cost between the two corresponding experts, we tackle the problem of finding a team of experts that covers a set of required skills and also minimizes the communication cost as well as the personnel cost of the team. We propose two types of approximation algorithms to solve this bi-criteria problem in the fourth part of this dissertation

    Keyword Search in Relational Databases: Architecture, Approaches and Considerations

    Get PDF
    Questo lavoro di tesi presenta le diverse soluzioni proposte in letteratura per applicare il paradigma keyword search alle basi di dati relazionali, e vuole delineare una architettura generale per definire e sviluppare questi sistemi. A tal proposito, le soluzioni presentate dalla comunitĂ  scientifica sono state analizzate focalizzandosi sui singoli componenti della pipeline di ricerca. Infine, si sono analizzati i processi di valutazione sperimentale di questi sistem

    Effective semantic-based keyword search over relational databases for knowledge discovery

    Get PDF
    Keyword-based search has been popularized by Internet web search engines such as Google which is the most commonly used search engine to locate the information on the web. On the other hand while traditional database management systems offer powerful query languages such as SQL, they do not provide keyword-based search similar to the one provided by web search engines. The current amount of text data in relational databases is massive and is growing fast. This increases the importance and need for non-technical users to be able to search for such information using simple keyword search just as how they would search for text documents on the web. Keyword search over relational databases (KSRDBs) enables ordinary users to query relational databases by simply submitting keywords without having to know any SQL or having any knowledge of the underlying structure of the data. In this research work our primary focus is to enhance the effectiveness of the keyword search over relational databases using semantic web technologies. We have also addressed some the issues with the effectiveness of the current keyword search over relational databases. In particular we are addressing the followings: We have improved (gained significantly higher precision/recall curve) the existing state-of-the-art ranking functions by incorporating the query keywords\u27 proximity and query keywords\u27 quadgrams of the text attributes with long string into the scoring function. We have adapted a novel approach in making keyword search recommendations based on the text attributes in which the search terms were found without relying on the user\u27s past search criteria. A proof of concept (POC) prototype system called TupleRecommender has been implemented based on this approach. We have designed and implemented a proof of concept (POC) prototype system called database semantic search explorer (DBSemSXplorer) which can answer the traditional keyword search over relational databases in a more effective way with a better presentation of search results. This system is based on semantic web technologies and is equipped with faceted search and inference capability of the Semantic Web to ease the task of knowledge discovery for the end user

    Analysis of multiple update techniques on a RDF keyword search system

    Get PDF
    Keyword search is a technology that allows non-expert users to explore and retrieve information and it is traditionally used for unstructured data, such as in Web page searches. In the last decade, this search method has also become popular for exploring structured data, such as relational databases or graphs. Instead of using complex SQL or SPARQL queries and when the underlying schema is known, the user writes a series of words(keywords) to search for what he or she needs, getting as answers the ones more matching with the search. Keyword search systems are challenged by two fundamental parameters, efficiency and effectiveness. In fact, efficiency and effectiveness are two qualities of a SPARQL, or SQL, query that returns an answer quickly and always accurate even when operating on large amounts of data. The "virtual documents" method allows keyword search systems to work also on large databases by generating answers to keyword queries in a reasonable time. This paper aims to replicate the keyword search systems based on "virtual documents" TSA+BM25 and TSA+VDP for RDF graphs. In addition, two methods of update processing in a keyword search system, will be presented and analyzed: BruteForce and semiTSA. Although keyword search is a growing research matter, the topic of updates on structured data, such as RDF data, had not yet been addressed in the literature.Keyword search is a technology that allows non-expert users to explore and retrieve information and it is traditionally used for unstructured data, such as in Web page searches. In the last decade, this search method has also become popular for exploring structured data, such as relational databases or graphs. Instead of using complex SQL or SPARQL queries and when the underlying schema is known, the user writes a series of words(keywords) to search for what he or she needs, getting as answers the ones more matching with the search. Keyword search systems are challenged by two fundamental parameters, efficiency and effectiveness. In fact, efficiency and effectiveness are two qualities of a SPARQL, or SQL, query that returns an answer quickly and always accurate even when operating on large amounts of data. The "virtual documents" method allows keyword search systems to work also on large databases by generating answers to keyword queries in a reasonable time. This paper aims to replicate the keyword search systems based on "virtual documents" TSA+BM25 and TSA+VDP for RDF graphs. In addition, two methods of update processing in a keyword search system, will be presented and analyzed: BruteForce and semiTSA. Although keyword search is a growing research matter, the topic of updates on structured data, such as RDF data, had not yet been addressed in the literature

    LMPD: LIPID MAPS proteome database

    Get PDF
    The LIPID MAPS Proteome Database (LMPD) is an object-relational database of lipid-associated protein sequences and annotations. The initial release contains 2959 records, representing human and mouse proteins involved in lipid metabolism. UniProt IDs were obtained based on keyword search of KEGG and GO databases, and this LMPD protein list was then enhanced with annotations from UniProt, EntrezGene, ENZYME, GO, KEGG and other public resources. We also assigned associations with general lipid categories, based on GO and KEGG annotations. Users may search LMPD by database ID or keyword, and filter by species and/or lipid class associations; from the search results, one can then access a compilation of data relevant to each protein of interest, cross-linked to external databases. The LIPID MAPS Proteome Database (LMPD) is publicly available from the LIPID MAPS Consortium website (). The direct URL is
    • …
    corecore