QueRIE: Collaborative Database Exploration
Interactive database exploration is a key task in information mining. However, users who lack SQL expertise or familiarity with the database schema face great difficulties in performing this task. To aid these users, we developed the QueRIE system for personalized query recommendations. QueRIE continuously monitors the user’s querying behavior and finds matching patterns in the system’s query log, in an attempt to identify previous users with similar information needs. Subsequently, QueRIE uses these “similar” users and their queries to recommend queries that the current user may find interesting. In this work we describe an instantiation of the QueRIE framework, where the active user’s session is represented by a set of query fragments. The recorded fragments are used to identify similar query fragments in the previously recorded sessions, which are in turn assembled into potentially interesting queries for the active user. We show through experimentation that the proposed method generates meaningful recommendations on real-life traces from the SkyServer database and propose a scalable design that enables the incremental update of similarities, making real-time computations on large amounts of data feasible. Finally, we compare this fragment-based instantiation with our previously proposed tuple-based instantiation, discussing the advantages and disadvantages of each approach.
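The fragment-based idea described in this abstract can be illustrated with a minimal sketch. It is not the QueRIE implementation: it assumes each session is already available as a set of fragment strings, uses Jaccard similarity as a stand-in similarity measure, and the names `jaccard` and `recommend_fragments` are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two fragment sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_fragments(active, past_sessions, top_k=3):
    """Score fragments the active session has not used yet.

    Each unseen fragment accumulates the similarity of every past
    session that contains it; ties are broken alphabetically so the
    result is deterministic.
    """
    scores = defaultdict(float)
    for session in past_sessions:
        sim = jaccard(active, session)
        if sim == 0.0:
            continue  # sessions with nothing in common contribute nothing
        for fragment in session - active:
            scores[fragment] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [fragment for fragment, _ in ranked[:top_k]]


# Illustrative fragments only; they are not taken from the SkyServer traces.
active_session = {"FROM PhotoObj", "WHERE ra > ?"}
query_log = [
    {"FROM PhotoObj", "WHERE ra > ?", "WHERE dec < ?"},
    {"FROM PhotoObj", "SELECT objID"},
    {"FROM SpecObj", "SELECT z"},
]
suggested = recommend_fragments(active_session, query_log, top_k=2)
```

The suggested fragments would still have to be assembled into complete queries, a step this sketch leaves out.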
Query Formulation and Recommendation for Relational Databases Using User Sessions and Collaborative Filtering
Structured Query Language (SQL) has a uniform structure across different programming languages. Queries issued to a Database Management System (DBMS) contain textual information along with selected segments of data, which the DBMS parses in order to execute them as structured queries. Database administrators (DBAs) currently need to execute complex queries on large databases, and users or DBAs often issue similar queries to the database server to obtain useful information. Similar queries fall into two categories: (a) queries whose retrieved tuples are similar, and (b) queries whose fragments are similar. The system recommends such similar queries so that the DBA does not have to construct them again and again. Query suggestions given to DBAs or users are known as Query Recommendation. Many authors have suggested using the query log to develop a Query Recommendation system. Query suggestions are divided mainly into two areas: Collaborative Recommendations and Single Log Recommendations. The system described here uses a single or a collaborative log, controlled by a parameter known as the mixing factor. In this paper we analyze SQL Query Recommendation concepts and their uses. Two types of similarity measure for Query Recommendation are considered in [1]: (1) fragment-based and (2) tuple-based. In this paper we focus on generating recommendations for nested SQL queries. We apply hierarchical classification to the query log to create classes of similar queries; to generate recommendations for an SQL query, we then find the matching class, from which the recommendations can be modeled.
DOI: 10.17762/ijritcc2321-8169.15070
SQL query log analysis for identifying user interests and query recommendations
In the sciences and elsewhere, the use of relational databases has become ubiquitous.
To get maximum profit from a database, one should have in-depth knowledge in both
SQL and a domain (data structure and meaning that a database contains). To assist
inexperienced users in formulating their needs, an SQL query recommendation system
(SQL QRS) has been proposed. It utilizes the experience of previous users captured by
the SQL query log as well as the user's query history to suggest queries. When constructing such
a system, one should solve related problems: (1) clean the query log and (2) define
appropriate query similarity functions. These two tasks are not only necessary for
building SQL QRS, but they apply to other problems. In what follows, we describe
three scenarios of SQL query log analysis: (1) cleaning an SQL query log, (2) SQL
query log clustering when testing SQL query similarity functions and (3) recommending
SQL queries. We also explain how these three branches are related to each other.
Scenario 1. Cleaning SQL query log as a general pre-processing step
The raw query log is often not suitable for query log analysis tasks such as clustering
and recommendation. That is because it contains antipatterns and robotic data
downloads, also known as Sliding Window Search (SWS). An antipattern in software
engineering is a special case of a pattern. While a pattern is a standard solution, an
antipattern is a pattern with a negative effect.
When it comes to SQL query recommendation, leaving such artifacts in the log during
analysis results in a wrong suggestion. Firstly, the behaviour of "mortal" users who
need a recommendation is different from robots, which perform SWS. Secondly, one
does not want to recommend antipatterns, so they need to be excluded from the query
pool. Thirdly, the bigger a log is, the slower a recommendation engine operates. Thus,
excluding SWS and antipatterns from the input data makes the recommendation
better and faster.
The effect of SWS and antipatterns on query log clustering depends on the chosen
similarity function. The result either (1) does not change or (2) gains clusters which
cover a big part of the data. In any case, having antipatterns and SWS in an input log
only increases the time one needs to cluster and does not increase the quality of results.
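As a rough illustration of the cleaning step, the sketch below drops long runs of consecutive queries that differ only in their constants, which is one plausible signature of a robotic sliding-window download. The abstract does not give the detection method, so the heuristic, the `max_run` threshold, and the names `template` and `drop_sliding_window_runs` are all assumptions.

```python
import re
from itertools import groupby

# Matches single-quoted string literals and standalone numeric literals.
_LITERAL = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")


def template(sql):
    """Normalize case and whitespace, and replace literals with '?'."""
    normalized = " ".join(sql.lower().split())
    return _LITERAL.sub("?", normalized)


def drop_sliding_window_runs(log, max_run=3):
    """Remove runs of more than `max_run` consecutive same-template queries.

    Shorter runs are kept, since a person may reasonably re-issue
    a query a few times with different constants.
    """
    cleaned = []
    for _, run in groupby(log, key=template):
        run = list(run)
        if len(run) <= max_run:
            cleaned.extend(run)
    return cleaned


# Illustrative log: five window-shifting queries followed by one ordinary query.
raw_log = [
    f"SELECT * FROM PhotoObj WHERE ra BETWEEN {i} AND {i + 1}" for i in range(5)
] + ["SELECT z FROM SpecObj WHERE z > 0.3"]
clean_log = drop_sliding_window_runs(raw_log)
```

A production cleaner would also need to interleave sessions by user and handle antipatterns, which this sketch does not attempt.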
Scenario 2. Identifying User Interests via Clustering
To identify the hot spots of user interests, one clusters SQL queries. In a scientific
domain, it exposes research trends. In business, it points to popular data slices which
one might want to refactor for better accessibility. A good clustering result must be
precise (match ground truth) and interpretable.
Query similarity relies on SQL query representation. There are three strategies to
represent an SQL query. The FB (feature-based) query representation sees a query as
a structure, without considering the data the query accesses. The WB (witness-based) approach
treats a query as a set of tuples in the result set. The AAB (access area-based) representation
considers a query as an expression in relational algebra. While WB and FB query
similarity functions are straightforward (Jaccard or cosine similarities), AAB query
similarity requires additional definition. We proposed two variants of AAB similarity
measure – overlap (AABovl) and closeness (AABcl). In AABovl, the similarity of two
queries is the overlap of their access areas. AABcl relies on the distance between two
access areas in the data space – two queries may be similar even if their access areas
do not overlap.
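The difference between these similarity functions can be sketched as follows. The abstract does not give the AABovl and AABcl formulas, so this example makes strong simplifying assumptions: an access area is reduced to a single numeric interval `(low, high)` on one attribute, overlap is measured as interval Jaccard, and closeness as `1 / (1 + gap)`. All function names are hypothetical.

```python
def feature_jaccard(features_a, features_b):
    """FB/WB-style similarity: Jaccard over sets of features or tuples."""
    union = features_a | features_b
    return len(features_a & features_b) / len(union) if union else 0.0


def interval_overlap(a, b):
    """Overlap-style similarity: shared length divided by covered length."""
    shared = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    covered = (a[1] - a[0]) + (b[1] - b[0]) - shared
    return shared / covered if covered > 0 else 0.0


def interval_closeness(a, b):
    """Closeness-style similarity: decays with the gap between the areas.

    Overlapping or touching intervals have gap 0 and similarity 1.0, so
    two queries can be similar even when their areas do not overlap.
    """
    gap = max(0.0, max(a[0], b[0]) - min(a[1], b[1]))
    return 1.0 / (1.0 + gap)


# Two range predicates on the same (hypothetical) attribute.
q1, q2, q3 = (0.0, 10.0), (5.0, 15.0), (12.0, 20.0)
overlapping = interval_overlap(q1, q2)   # areas share [5, 10]
disjoint = interval_overlap(q1, q3)      # no shared area
near_miss = interval_closeness(q1, q3)   # still non-zero
```

In this toy setting the overlap measure treats `q1` and `q3` as unrelated, while the closeness measure still assigns them a non-zero similarity because their areas are near each other in the data space.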
The extensive experiments consist of two parts. The first one is clustering a rather
small dataset with ground truth. This experiment serves to study the precision of
various similarity functions by comparing clustering results to supervised insights. The
second experiment aims to investigate the interpretability of clustering results with
different similarity functions. It clusters a big real-world query log. The domain expert
then evaluates the results. Both experiments show that AAB similarity functions
produce better results in both precision and interpretability.
Scenario 3. SQL Query Recommendation
A sound SQL query recommendation system (1) provides a query which can be run
directly, (2) supports comparison operators and various logical operators, (3) is scalable
and has low response times, (4) provides recommendations of high quality. The existing
approaches fail to fulfill all the requirements. We proposed DASQR, a scalable and
data-aware query recommendation approach, to meet all four needs. In a nutshell, DASQR is
a hybrid (collaborative filtering + content-based) approach. Its variations utilize all
similarity functions, which we define or find in the related work.
Measuring the quality of an SQL query recommendation system (QRS) is particularly
challenging since there is no standard way of approaching it. Previous studies have
evaluated the results using quality metrics which only rely on the query representations
used in these studies. It is somewhat subjective since a similarity function and a
quality metric are dependent. We propose AAB quality metrics and then evaluate
each approach based on all the metrics.
The experiments test DASQR approaches and competitors. Both performance and
runtime experiments indicate that DASQR approaches outperform the existing ones.
Revisting SQL Query Recommender System Using Hierarchical Classification
For analytical purposes, large amounts of data are gathered and explored in data warehouses. Handling such large data is a tough task even for experts. For non-expert users, or for users who are not familiar with the database schema, handling such voluminous data is even more difficult. The aim of this paper is to assist this class of users by recommending SQL queries that they may use. These SQL recommendations are selected by following the user's past behavior and comparing it with that of other users. First, users may not know where to start their exploration. Second, users may overlook queries that would help them retrieve important data. Using hierarchical classification, the queries are recorded and compared, and the results are then re-ranked according to relevance. The relevant queries are retrieved using the users' querying behavior. To issue a series of SQL queries, users use a query interface with the aim of analyzing the data and mining it for interesting information.
DOI: 10.17762/ijritcc2321-8169.150614
Query Recommender System Using Hierarchical Classification
In data warehouses, large amounts of data are gathered, navigated and explored for analytical purposes. Handling such large data is a tough task even for experts, and handling such voluminous data is even more difficult for non-expert users or for users who are not familiar with the database schema. The aim of this paper is to help this class of users by recommending SQL queries that they might use. These SQL recommendations are selected by tracking the user's past behavior and comparing it with that of other users. First, users may not know where to start their exploration. Second, users may overlook queries that would help retrieve important information. The queries are recorded and compared using hierarchical classification, and the results are then re-ranked according to relevance. The relevant queries are retrieved using the users' querying behavior. Users use a query interface to issue a series of SQL queries that aim to analyze the data and mine it for interesting information.
DOI: 10.17762/ijritcc2321-8169.15067
Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases
A critical challenge in constructing a natural language interface to database
(NLIDB) is bridging the semantic gap between a natural language query (NLQ) and
the underlying data. Two specific ways this challenge exhibits itself are
through keyword mapping and join path inference. Keyword mapping is the task of
mapping individual keywords in the original NLQ to database elements (such as
relations, attributes or values). It is challenging due to the ambiguity in
mapping the user's mental model and diction to the schema definition and
contents of the underlying database. Join path inference is the process of
selecting the relations and join conditions in the FROM clause of the final SQL
query, and is difficult because NLIDB users lack the knowledge of the database
schema or SQL and therefore cannot explicitly specify the intermediate tables
and joins needed to construct a final SQL query. In this paper, we propose
leveraging information from the SQL query log of a database to enhance the
performance of existing NLIDBs with respect to these challenges. We present a
system Templar that can be used to augment existing NLIDBs. Our extensive
experimental evaluation demonstrates the effectiveness of our approach, leading
to up to a 138% improvement in top-1 accuracy in existing NLIDBs by leveraging SQL
query log information.
Comment: Accepted to IEEE International Conference on Data Engineering (ICDE)
201
Multidimensional query recommendation
In this master thesis we first summarize the recent efforts to support analytical tasks over relational sources. These efforts have pointed out the need for flexible, powerful means of analyzing the issued queries and exploiting them in decision-oriented processes (such as query recommendation). Issued queries should be decomposed, stored and manipulated in a dedicated subsystem. With this aim, we present a novel approach for representing SQL analytical queries in terms of a multidimensional algebra, which better characterizes the analytical efforts of the user. This thesis discusses how an SQL query can be formulated as a multidimensional algebraic characterization. Then, we discuss how to normalize such characterizations in order to bridge (i.e. collapse) several SQL queries into a single characterization (representing the analytical session), according to their logical connections. Afterwards, we discuss how this characterization can be exploited in a wide range of decisional tasks such as query recommendation. Finally, we present an example implementation of this approach, limited to the normalization phase, because normalization alone turned out to be surprisingly hard to achieve in the time available. This implementation may later be upgraded to demonstrate the full potential of this novel approach.
Principles of Query Visualization
Query Visualization (QV) is the problem of transforming a given query into a
graphical representation that helps humans understand its meaning. This task is
notably different from designing a Visual Query Language (VQL) that helps a
user compose a query. This article discusses the principles of relational query
visualization and its potential for simplifying user interactions with
relational data.
Comment: 20 pages, 12 figures, preprint for IEEE Data Engineering Bulleti
Data fusion of relational databases
Data analysis and discovery of relations between connected entity types within databases can be very labour- and time-intensive, because every database has its own specific structure, which needs to be examined.
In this thesis, we evaluated whether the reconstruction error of the matrix factorization model can be used to relate tables or entity types within a database. To test this concept, we developed a Python module that connects to a database and returns a ranked list of relations (pairs of entity types). This enables the user to identify the most informative relations and explore them further.
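The ranking idea can be sketched with a much simpler stand-in than the thesis's factorization model: a rank-1 approximation computed by power iteration, with the relative Frobenius reconstruction error as the score. Everything here is an assumption for illustration, including the function names, the choice of rank 1, and the decision to rank relations with lower error first; the database connection step is omitted.

```python
import math


def rank1_error(matrix, iters=50):
    """Relative Frobenius error of a rank-1 approximation u * v^T.

    `matrix` is a non-empty list of equal-length rows. Power iteration
    is started from the all-ones vector, which works for the
    non-negative co-occurrence matrices assumed here.
    """
    rows, cols = len(matrix), len(matrix[0])
    total = sum(x * x for row in matrix for x in row)
    if total == 0:
        return 0.0  # an all-zero matrix is reconstructed exactly
    v = [1.0] * cols
    u = [0.0] * rows
    for _ in range(iters):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0:
            break
        u = [x / norm for x in u]
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
    residual = sum(
        (matrix[i][j] - u[i] * v[j]) ** 2 for i in range(rows) for j in range(cols)
    )
    return math.sqrt(residual / total)


def rank_relations(relations):
    """Return (name, error) pairs sorted by ascending reconstruction error."""
    scored = [(name, rank1_error(m)) for name, m in relations.items()]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


# Hypothetical relation matrices between pairs of entity types.
relations = {
    "customer-order": [[1.0, 2.0], [2.0, 4.0]],   # exactly rank 1
    "order-product": [[1.0, 0.0], [0.0, 1.0]],    # no low-rank structure
}
ranking = rank_relations(relations)
```

Whether low or high reconstruction error marks a relation as informative depends on the model and is not stated in the abstract, so the sort direction should be treated as a parameter of the sketch.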