QueRIE: Collaborative Database Exploration
Interactive database exploration is a key task in information mining. However, users who lack SQL expertise or familiarity with the database schema face great difficulties in performing this task. To aid these users, we developed the QueRIE system for personalized query recommendations. QueRIE continuously monitors the user’s querying behavior and finds matching patterns in the system’s query log, in an attempt to identify previous users with similar information needs. Subsequently, QueRIE uses these “similar” users and their queries to recommend queries that the current user may find interesting. In this work we describe an instantiation of the QueRIE framework, where the active user’s session is represented by a set of query fragments. The recorded fragments are used to identify similar query fragments in the previously recorded sessions, which are in turn assembled in potentially interesting queries for the active user. We show through experimentation that the proposed method generates meaningful recommendations on real-life traces from the SkyServer database and propose a scalable design that enables the incremental update of similarities, making real-time computations on large amounts of data feasible. Finally, we compare this fragment-based instantiation with our previously proposed tuple-based instantiation discussing the advantages and disadvantages of each approach
SQL query log analysis for identifying user interests and query recommendations
In the sciences and elsewhere, the use of relational databases has become ubiquitous.
To get the most out of a database, one needs in-depth knowledge of both
SQL and the domain (the structure and meaning of the data the database contains). To assist
inexperienced users in formulating their needs, an SQL query recommendation system
(SQL QRS) has been proposed. It utilizes the experience of previous users captured in
the SQL query log as well as the user's query history to suggest queries. When constructing such
a system, one must solve two related problems: (1) clean the query log and (2) define
appropriate query similarity functions. These two tasks are not only necessary for
building an SQL QRS but also apply to other problems. In what follows, we describe
three scenarios of SQL query log analysis: (1) cleaning an SQL query log, (2) SQL
query log clustering when testing SQL query similarity functions and (3) recommending
SQL queries. We also explain how these three branches are related to each other.
Scenario 1. Cleaning SQL query log as a general pre-processing step
The raw query log is often not suitable for query log analysis tasks such as clustering
or giving recommendations. That is because it contains antipatterns and robotic data
downloads, also known as Sliding Window Search (SWS). An antipattern in software
engineering is a special case of a pattern. While a pattern is a standard solution, an
antipattern is a pattern with a negative effect.
When it comes to SQL query recommendation, leaving such artifacts in the log during
analysis results in poor suggestions. Firstly, the behaviour of "mortal" users who
need a recommendation differs from that of robots, which perform SWS. Secondly, one
does not want to recommend antipatterns, so they need to be excluded from the query
pool. Thirdly, the bigger a log is, the slower a recommendation engine operates. Thus,
excluding SWS and antipatterns from the input data makes the recommendation
better and faster.
The effect of SWS and antipatterns on query log clustering depends on the chosen
similarity function. The result either (1) does not change or (2) gains clusters which
cover a large part of the data. In any case, having antipatterns and SWS in an input log
only increases the time needed to cluster and does not improve the quality of the results.
Scenario 2. Identifying User Interests via Clustering
To identify the hot spots of user interests, one clusters SQL queries. In a scientific
domain, it exposes research trends. In business, it points to popular data slices which
one might want to refactor for better accessibility. A good clustering result must be
precise (match ground truth) and interpretable.
Query similarity relies on SQL query representation. There are three strategies to
represent an SQL query. The FB (feature-based) query representation sees a query as
a structure, without considering the data the query accesses. The WB (witness-based) approach
treats a query as the set of tuples in its result set. The AAB (access area-based) representation
considers a query as an expression in relational algebra. While WB and FB query
similarity functions are straightforward (Jaccard or cosine similarities), AAB query
similarity requires an additional definition. We proposed two variants of the AAB similarity
measure – overlap (AABovl) and closeness (AABcl). In AABovl, the similarity of two
queries is the overlap of their access areas. AABcl relies on the distance between two
access areas in the data space – two queries may be similar even if their access areas
do not overlap.
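The difference between these similarity styles can be illustrated with a minimal sketch. It is not the definition used in the thesis: it assumes, purely for illustration, that an FB query is a set of feature strings and that an access area is a single closed interval on one numeric attribute. The function names and the `scale` parameter are hypothetical.

```python
from typing import FrozenSet, Tuple

# Assumption: an access area is simplified to one closed interval [low, high].
Interval = Tuple[float, float]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Feature-based style: shared features divided by all features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def interval_overlap(a: Interval, b: Interval) -> float:
    """Overlap style: length of the intersection divided by length of the union."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def interval_closeness(a: Interval, b: Interval, scale: float) -> float:
    """Closeness style: decays with the gap between the two intervals.

    `scale` (hypothetical) controls how quickly similarity drops with distance.
    """
    gap = max(0.0, max(a[0], b[0]) - min(a[1], b[1]))
    return 1.0 / (1.0 + gap / scale)


if __name__ == "__main__":
    q1 = frozenset({"FROM:stars", "WHERE:ra", "SELECT:objid"})
    q2 = frozenset({"FROM:stars", "WHERE:ra", "SELECT:mag"})
    print(jaccard(q1, q2))
    # Disjoint access areas: overlap is zero, closeness is still positive.
    print(interval_overlap((0.0, 1.0), (2.0, 3.0)))
    print(interval_closeness((0.0, 1.0), (2.0, 3.0), scale=1.0))
```

Under these assumptions, two queries with disjoint access areas get an overlap score of zero but a nonzero closeness score, which is the property the closeness variant is meant to capture.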
The extensive experiments consist of two parts. The first one is clustering a rather
small dataset with ground truth. This experiment serves to study the precision of
various similarity functions by comparing clustering results to supervised insights. The
second experiment aims to investigate the interpretability of clustering results with
different similarity functions. It clusters a big real-world query log. A domain expert
then evaluates the results. Both experiments show that AAB similarity functions
produce better results in both precision and interpretability.
Scenario 3. SQL Query Recommendation
A sound SQL query recommendation system (1) provides a query which can be run
directly, (2) supports comparison operators and various logical operators, (3) is scalable
and has low response times, (4) provides recommendations of high quality. The existing
approaches fail to fulfill all the requirements. We proposed DASQR, a scalable and
data-aware query recommendation approach, to meet all four needs. In a nutshell, DASQR is
a hybrid (collaborative filtering + content-based) approach. Its variants utilize all
the similarity functions that we define or find in related work.
Measuring the quality of an SQL query recommendation system (QRS) is particularly
challenging since there is no standard way of approaching it. Previous studies have
evaluated the results using quality metrics which rely only on the query representations
used in these studies. Such an evaluation is somewhat subjective since the similarity function and the
quality metric are dependent. We propose AAB quality metrics and then evaluate
each approach based on all the metrics.
The experiments test DASQR approaches and competitors. Both performance and
runtime experiments indicate that DASQR approaches outperform the existing ones
A Holistic Approach to OLAP Sessions Composition: The Falseto Experience
OLAP is the main paradigm for flexible and effective exploration of multidimensional cubes in data warehouses. During an OLAP session the user analyzes the results of a query and determines a new query that will give her a better understanding of information. Given the huge size of the data space, this exploration process is often tedious and may leave the user disoriented and frustrated. This paper presents an OLAP tool named Falseto (Former AnalyticaL Sessions for lEss Tedious Olap), that is meant to assist query and session composition, by letting the user summarize, browse, query, and reuse former analytical sessions. Falseto's implementation on top of a formal framework is detailed. We also report the experiments we ran to obtain and analyze real OLAP sessions and assess Falseto with them. Finally, we discuss how Falseto can be seen as a starting point for bridging OLAP with exploratory search, a search paradigm centered on the user and the evolution of her knowledge
Web Service Discovery Based on Past User Experience
Web service technology provides a way of simplifying interoperability among different organizations. A piece of functionality available as a web service can be involved in a new business process. Given the steadily growing number of available web services, it is hard for developers to find services appropriate for their needs. The main research efforts in this area are oriented toward developing a mechanism for semantic web service description and matching. In this paper, we present an alternative approach for supporting users in web service discovery. Our system implements the implicit culture approach for recommending web services to developers based on the history of decisions made by other developers with similar needs. We explain the main ideas underlying our approach and report on experimental results
Predicting your next OLAP query based on recent analytical sessions
In Business Intelligence systems, users interact with data warehouses by formulating OLAP queries aimed at exploring multidimensional data cubes. Being able to predict the most likely next queries would provide a way to recommend interesting queries to users on the one hand, and could improve the efficiency of OLAP sessions on the other. In particular, query recommendation would proactively guide users in data exploration and improve the quality of their interactive experience. In this paper, we propose a framework to predict the most likely next query and recommend this to the user. Our framework relies on a probabilistic user behavior model built by analyzing previous OLAP sessions and exploiting a query similarity metric. To gain insight into the recommendation precision and the parameters on which it depends, we evaluate our approach using different quality assessments
Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with
textual content. However, most indexing approaches perform
structure and content matching independently, combining
the retrieved path and keyword occurrences in a third
step. This paper shows that retrieval in XML documents can
be accelerated significantly by processing text and structure
simultaneously during all retrieval phases. To this end,
the Content-Aware DataGuide (CADG) enhances the well-known
DataGuide with (1) simultaneous keyword and path
matching and (2) a precomputed content/structure join. Extensive
experiments prove the CADG to be 50-90% faster
than the DataGuide for various sorts of query and document,
including difficult cases such as poorly structured
queries and recursive document paths. A new query classification
scheme identifies precise query characteristics with
a predominant influence on the performance of the individual
indices. The experiments show that the CADG is applicable
to many real-world applications, in particular large
collections of heterogeneously structured XML documents