62 research outputs found

    QueRIE: Collaborative Database Exploration

    Interactive database exploration is a key task in information mining. However, users who lack SQL expertise or familiarity with the database schema face great difficulties in performing this task. To aid these users, we developed the QueRIE system for personalized query recommendations. QueRIE continuously monitors the user’s querying behavior and finds matching patterns in the system’s query log, in an attempt to identify previous users with similar information needs. Subsequently, QueRIE uses these “similar” users and their queries to recommend queries that the current user may find interesting. In this work we describe an instantiation of the QueRIE framework in which the active user’s session is represented by a set of query fragments. The recorded fragments are used to identify similar query fragments in the previously recorded sessions, which are in turn assembled into potentially interesting queries for the active user. We show through experimentation that the proposed method generates meaningful recommendations on real-life traces from the SkyServer database, and we propose a scalable design that enables the incremental update of similarities, making real-time computations on large amounts of data feasible. Finally, we compare this fragment-based instantiation with our previously proposed tuple-based instantiation, discussing the advantages and disadvantages of each approach.
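The fragment-based idea in the abstract above can be illustrated with a minimal sketch. This is not the QueRIE implementation: the Jaccard co-occurrence measure, the mean-score ranking, the function names, and the sample fragments are all assumptions chosen for illustration. Each logged query is modelled as a set of fragment strings, fragments are compared by the sessions they appear in, and logged queries are ranked by how similar their fragments are to those of the active session.

```python
from collections import defaultdict


def fragment_sessions(log):
    """Map each fragment to the set of session ids in which it occurs."""
    index = defaultdict(set)
    for session_id, queries in log.items():
        for fragments in queries:
            for fragment in fragments:
                index[fragment].add(session_id)
    return index


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(log, active_fragments, top_k=3):
    """Rank logged queries by the mean similarity of their fragments
    to the fragments of the active session (hypothetical scoring rule)."""
    index = fragment_sessions(log)
    active = set(active_fragments)

    # Score every logged fragment against the active session's fragments.
    scores = {
        candidate: sum(jaccard(sessions, index.get(f, set())) for f in active)
        for candidate, sessions in index.items()
    }

    ranked, seen = [], set()
    for queries in log.values():
        for fragments in queries:
            key = frozenset(fragments)
            # Skip duplicates and queries the active session already covers.
            if key in seen or key <= active:
                continue
            seen.add(key)
            mean_score = sum(scores[f] for f in key) / len(key)
            ranked.append((mean_score, sorted(key)))

    # Highest score first; ties broken by fragment names for determinism.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked[:top_k]
```

Calling `recommend(log, {"FROM PhotoObj"})` on a log keyed by session id returns `(score, fragments)` pairs. A production system would precompute and incrementally update the fragment similarities, as the abstract notes, rather than rebuild the index on every call.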

    Query Formulation and Recommendation for Relational Databases Using User Sessions and Collaborative Filtering

    Structured Query Language (SQL) has a uniform structure across different programming languages. Queries issued to a database management system (DBMS) contain textual information along with selected segments of data, which the DBMS parses in order to execute them as structured queries. Database administrators (DBAs) currently need to execute complex queries on large databases, and users or DBAs often issue similar queries to the database server to obtain useful information. Queries can be similar to each other in two ways: (a) the tuples they retrieve are similar, or (b) the fragments of the queries are similar. A system that recommends such similar queries saves the DBA the time of constructing them again and again. Query suggestions given to DBAs or users are known as query recommendations. To develop a query recommendation system, many authors have suggested using the query log. Query suggestions fall into two main areas: collaborative recommendations and single-log recommendations. The system described here draws on a single log or a collaborative log, controlled by a parameter known as the mixing factor. In this paper we analyze SQL query recommendation concepts and their uses. Two types of similarity measure for query recommendation are considered in [1]: (1) fragment-based and (2) tuple-based. This paper is motivated by the goal of generating recommendations for nested SQL queries. We apply hierarchical classification to the query log to create classes of similar queries; to generate recommendations for an SQL query, we then find the matching class, from which the recommendations can be modeled. DOI: 10.17762/ijritcc2321-8169.15070

    SQL query log analysis for identifying user interests and query recommendations

    In the sciences and elsewhere, the use of relational databases has become ubiquitous. To get the most out of a database, one needs in-depth knowledge of both SQL and the domain (the structure and meaning of the data the database contains). To assist inexperienced users in formulating their needs, SQL query recommendation systems (SQL QRS) have been proposed. Such a system uses the experience of previous users, captured in the SQL query log, together with the user's own query history to suggest queries. When constructing such a system, one has to solve two related problems: (1) cleaning the query log and (2) defining appropriate query similarity functions. These two tasks are not only necessary for building an SQL QRS; they also apply to other problems. In what follows, we describe three scenarios of SQL query log analysis: (1) cleaning an SQL query log, (2) clustering an SQL query log to test SQL query similarity functions and (3) recommending SQL queries. We also explain how these three branches are related to each other.
    Scenario 1. Cleaning the SQL query log as a general pre-processing step. The raw query log is often unsuitable for query log analysis tasks such as clustering or giving recommendations, because it contains antipatterns and robotic data downloads, also known as Sliding Window Search (SWS). An antipattern in software engineering is a special case of a pattern: while a pattern is a standard solution, an antipattern is a pattern with a negative effect. In SQL query recommendation, leaving such artifacts in the log during analysis results in wrong suggestions. Firstly, the behaviour of "mortal" users who need a recommendation differs from that of robots, which perform SWS. Secondly, one does not want to recommend antipatterns, so they need to be excluded from the query pool. Thirdly, the bigger a log is, the slower a recommendation engine operates. Thus, excluding SWS and antipatterns from the input data makes the recommendation better and faster. The effect of SWS and antipatterns on query log clustering depends on the chosen similarity function: the result either (1) does not change or (2) gains clusters that cover a large part of the data. In either case, having antipatterns and SWS in the input log only increases the time needed for clustering and does not improve the quality of the results.
    Scenario 2. Identifying user interests via clustering. To identify the hot spots of user interest, one clusters SQL queries. In a scientific domain, this exposes research trends. In business, it points to popular data slices that one might want to refactor for better accessibility. A good clustering result must be precise (match the ground truth) and interpretable. Query similarity relies on the SQL query representation, and there are three strategies for representing an SQL query. The FB (feature-based) representation sees a query as a structure, without considering the data the query accesses. The WB (witness-based) approach treats a query as the set of tuples in its result set. The AAB (access area-based) representation considers a query as an expression in relational algebra. While WB and FB query similarity functions are straightforward (Jaccard or cosine similarity), AAB query similarity requires an additional definition. We proposed two variants of the AAB similarity measure: overlap (AABovl) and closeness (AABcl). In AABovl, the similarity of two queries is the overlap of their access areas. AABcl relies on the distance between two access areas in the data space, so two queries may be similar even if their access areas do not overlap. The extensive experiments consist of two parts. The first clusters a rather small dataset with ground truth and serves to study the precision of various similarity functions by comparing clustering results to supervised insights. The second investigates the interpretability of clustering results obtained with different similarity functions. It clusters a big real-world query log, and a domain expert then evaluates the results. Both experiments show that AAB similarity functions produce better results in both precision and interpretability.
    Scenario 3. SQL query recommendation. A sound SQL query recommendation system (1) provides a query that can be run directly, (2) supports comparison operators and various logical operators, (3) is scalable and has low response times, and (4) provides recommendations of high quality. The existing approaches fail to fulfill all of these requirements. We proposed DASQR, a scalable and data-aware query recommendation approach, to meet all four needs. In a nutshell, DASQR is a hybrid (collaborative filtering + content-based) approach. Its variations use all the similarity functions that we define or find in the related work. Measuring the quality of an SQL query recommendation system (QRS) is particularly challenging, since there is no standard way of approaching it. Previous studies have evaluated their results using quality metrics that rely only on the query representations used in those studies. This is somewhat subjective, since the similarity function and the quality metric are then dependent. We propose AAB quality metrics and then evaluate each approach on all the metrics. The experiments test DASQR approaches and competitors. Both performance and runtime experiments indicate that DASQR approaches outperform the existing ones.
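The similarity notions named in this abstract can be sketched for a simplified case. The abstract does not give the AABovl or AABcl formulas, so everything below is an assumption for illustration: access areas are reduced to a single numeric range `(low, high)` on one attribute, overlap is measured as intersection length over union length, and closeness decays linearly with the gap between the two ranges relative to a hypothetical `domain_width`.

```python
def jaccard(features_a, features_b):
    """Feature-based (FB) similarity: Jaccard index of two feature sets."""
    union = features_a | features_b
    return len(features_a & features_b) / len(union) if union else 0.0


def overlap_similarity(a, b):
    """Overlap of two 1-D access areas, each given as a (low, high) range.

    Assumed definition: intersection length divided by union length.
    """
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    if union == 0:
        # Both ranges are single points; similar only if identical.
        return 1.0 if a == b else 0.0
    return inter / union


def closeness_similarity(a, b, domain_width):
    """Closeness of two 1-D access areas within an attribute domain.

    Assumed definition: 1 minus the gap between the ranges, normalised
    by the width of the attribute's domain. Overlapping ranges score 1.0.
    """
    gap = max(0.0, max(a[0], b[0]) - min(a[1], b[1]))
    return 1.0 - min(gap, domain_width) / domain_width
```

Under these assumptions, the ranges `(0, 10)` and `(20, 30)` have zero overlap similarity but a closeness of about 0.9 in a domain of width 100, which matches the abstract's remark that two queries may be similar even when their access areas do not overlap.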

    Revisting SQL Query Recommender System Using Hierarchical Classification

    For analytical purposes, large amounts of data are gathered and explored in data warehouses. Handling such large data is a tough task even for experts. For non-expert users, or for users who are not familiar with the database schema, handling such voluminous data is even more difficult. The aim of this paper is to help this class of users by recommending SQL queries that they may use. These recommendations are selected by following the users' past behavior and comparing it with that of other users. First, users may not know where to start their exploration. Second, users may overlook queries that would help them retrieve important data. Using hierarchical classification, the queries are recorded and compared, and the results are then re-ranked according to relevance. The relevant queries are retrieved using the users' querying behavior. Users issue a series of SQL queries through a query interface, with the aim of analyzing the data and mining it for interesting information. DOI: 10.17762/ijritcc2321-8169.150614

    Query Recommender System Using Hierarchical Classification

    In data warehouses, large amounts of data are gathered, navigated and explored for analytical purposes. Handling such large data is a tough task even for experts, and it is more difficult still for non-expert users or for users who are not familiar with the database schema. The aim of this paper is to help this class of users by recommending SQL queries that they might use. These recommendations are selected by tracking the users' past behavior and comparing it with that of other users. First, users may not know where to start their exploration. Second, users may overlook queries that would help them retrieve important information. The queries are recorded and compared using hierarchical classification, and the results are then re-ranked according to relevance. The relevant queries are retrieved using the users' querying behavior. Users issue a series of SQL queries through a query interface, with the aim of analyzing the data and mining it for interesting information. DOI: 10.17762/ijritcc2321-8169.15067

    Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases

    A critical challenge in constructing a natural language interface to a database (NLIDB) is bridging the semantic gap between a natural language query (NLQ) and the underlying data. Two specific ways in which this challenge exhibits itself are keyword mapping and join path inference. Keyword mapping is the task of mapping individual keywords in the original NLQ to database elements (such as relations, attributes or values). It is challenging due to the ambiguity in mapping the user's mental model and diction to the schema definition and contents of the underlying database. Join path inference is the process of selecting the relations and join conditions in the FROM clause of the final SQL query. It is difficult because NLIDB users lack knowledge of the database schema or SQL and therefore cannot explicitly specify the intermediate tables and joins needed to construct the final SQL query. In this paper, we propose leveraging information from the SQL query log of a database to enhance the performance of existing NLIDBs with respect to these challenges. We present a system, Templar, that can be used to augment existing NLIDBs. Our extensive experimental evaluation demonstrates the effectiveness of our approach, which yields up to a 138% improvement in top-1 accuracy in existing NLIDBs by leveraging SQL query log information. Comment: Accepted to IEEE International Conference on Data Engineering (ICDE) 201

    Multidimensional query recommendation

    In this master's thesis we first summarize recent efforts to support analytical tasks over relational sources. These efforts have pointed out the need for flexible, powerful means of analyzing the issued queries and exploiting them in decision-oriented processes (such as query recommendation). Issued queries should be decomposed, stored and manipulated in a dedicated subsystem. With this aim, we present a novel approach for representing SQL analytical queries in terms of a multidimensional algebra, which better characterizes the analytical efforts of the user. This thesis discusses how an SQL query can be formulated as a multidimensional algebraic characterization. We then discuss how to normalize such characterizations in order to bridge (i.e. collapse) several SQL queries into a single characterization, representing the analytical session, according to their logical connections. Afterwards, we discuss how this characterization can be exploited in a wide range of decisional tasks, such as query recommendation. Finally, we present an example implementation of this approach that is limited to the normalization phase, because this phase alone turned out to be surprisingly hard to complete in the time available. This implementation may later be extended to demonstrate the full potential of the approach.

    Principles of Query Visualization

    Query Visualization (QV) is the problem of transforming a given query into a graphical representation that helps humans understand its meaning. This task is notably different from designing a Visual Query Language (VQL) that helps a user compose a query. This article discusses the principles of relational query visualization and its potential for simplifying user interactions with relational data. Comment: 20 pages, 12 figures, preprint for IEEE Data Engineering Bulletin

    Data fusion of relational databases

    Data analysis and the discovery of relations between connected entity types within databases can be very labour- and time-intensive, because every database has its own specific structure, which needs to be examined. In this thesis, we evaluated whether the reconstruction error of a matrix factorization model can be used to relate tables or entity types within a database. To test this concept, we developed a Python module that connects to a database and returns a ranked list of relations (pairs of entity types). This enables the user to identify the most informative relations and explore them further.