2,578 research outputs found

    QueRIE: Collaborative Database Exploration

    Get PDF
    Interactive database exploration is a key task in information mining. However, users who lack SQL expertise or familiarity with the database schema face great difficulties in performing this task. To aid these users, we developed the QueRIE system for personalized query recommendations. QueRIE continuously monitors the user’s querying behavior and finds matching patterns in the system’s query log, in an attempt to identify previous users with similar information needs. Subsequently, QueRIE uses these “similar” users and their queries to recommend queries that the current user may find interesting. In this work we describe an instantiation of the QueRIE framework, where the active user’s session is represented by a set of query fragments. The recorded fragments are used to identify similar query fragments in the previously recorded sessions, which are in turn assembled in potentially interesting queries for the active user. We show through experimentation that the proposed method generates meaningful recommendations on real-life traces from the SkyServer database and propose a scalable design that enables the incremental update of similarities, making real-time computations on large amounts of data feasible. Finally, we compare this fragment-based instantiation with our previously proposed tuple-based instantiation discussing the advantages and disadvantages of each approach

    SQL query log analysis for identifying user interests and query recommendations

    Get PDF
    In the sciences and elsewhere, the use of relational databases has become ubiquitous. To get maximum profit from a database, one should have in-depth knowledge in both SQL and a domain (data structure and meaning that a database contains). To assist inexperienced users in formulating their needs, SQL query recommendation system (SQL QRS) has been proposed. It utilizes the experience of previous users captured by SQL query log as well as the user query history to suggest. When constructing such a system, one should solve related problems: (1) clean the query log and (2) define appropriate query similarity functions. These two tasks are not only necessary for building SQL QRS, but they apply to other problems. In what follows, we describe three scenarios of SQL query log analysis: (1) cleaning an SQL query log, (2) SQL query log clustering when testing SQL query similarity functions and (3) recommending SQL queries. We also explain how these three branches are related to each other. Scenario 1. Cleaning SQL query log as a general pre-processing step The raw query log is often not suitable for query log analysis tasks such as clustering, giving recommendations. That is because it contains antipatterns and robotic data downloads, also known as Sliding Window Search (SWS). An antipattern in software engineering is a special case of a pattern. While a pattern is a standard solution, an antipattern is a pattern with a negative effect. When it comes to SQL query recommendation, leaving such artifacts in the log during analysis results in a wrong suggestion. Firstly, the behaviour of "mortal" users who need a recommendation is different from robots, which perform SWS. Secondly, one does not want to recommend antipatterns, so they need to be excluded from the query pool. Thirdly, the bigger a log is, the slower a recommendation engine operates. Thus, excluding SWS and antipatterns from the input data makes the recommendation better and faster. The effect of SWS and antipatterns on query log clustering depends on the chosen similarity function. The result can either (1) do not change or (2) add clusters which cover a big part of data. In any case, having antipatterns and SWS in an input log increases only the time one need to cluster and do not increase the quality of results. Scenario 2. Identifying User Interests via Clustering To identify the hot spots of user interests, one clusters SQL queries. In a scientific domain, it exposes research trends. In business, it points to popular data slices which one might want to refactor for better accessibility. A good clustering result must be precise (match ground truth) and interpretable. Query similarity relies on SQL query representation. There are three strategies to represent an SQL query. FB (feature-based) query representation sees a query as structure, not considering the data, a query accesses. WB (witness-based) approach treat a query as a set of tuples in the result set. AAB (access area-based) representation considers a query as an expression in relational algebra. While WB and FB query similarity functions are straightforward (Jaccard or cosine similarities), AAB query similarity requires additional definition. We proposed two variants of AAB similarity measure – overlap (AABovl) and closeness (AABcl). In AABovl, the similarity of two queries is the overlap of their access areas. AABcl relies on the distance between two access areas in the data space – two queries may be similar even if their access areas do not overlap. The extensive experiments consist of two parts. The first one is clustering a rather small dataset with ground truth. This experiment serves to study the precision of various similarity functions by comparing clustering results to supervised insights. The second experiment aims to investigate on the interpretability of clustering results with different similarity functions. It clusters a big real-world query log. The domain expert then evaluates the results. Both experiments show that AAB similarity functions produce better results in both precision and interpretability. Scenario 3. SQL Query Recommendation A sound SQL query recommendation system (1) provides a query which can be run directly, (2) supports comparison operators and various logical operators, (3) is scalable and has low response times, (4) provides recommendations of high quality. The existing approaches fail to fulfill all the requirements. We proposed DASQR, scalable and data-aware query recommendation to meet all four needs. In a nutshell, DASQR is a hybrid (collaborative filtering + content-based) approach. Its variations utilize all similarity functions, which we define or find in the related work. Measuring the quality of SQL query recommendation system (QRS) is particularly challenging since there is no standard way approaching it. Previous studies have evaluated the results using quality metrics which only rely on the query representations used in these studies. It is somewhat subjective since a similarity function and a quality metric are dependent. We propose AAB quality metrics and then evaluate each approach based on all the metrics. The experiments test DASQR approaches and competitors. Both performance and runtime experiments indicate that DASQR approaches outperform the existing ones

    A Holistic Approach to OLAP Sessions Composition: The Falseto Experience

    Get PDF
    International audienceOLAP is the main paradigm for flexible and effective exploration of multidimensional cubes in data warehouses. During an OLAP session the user analyzes the results of a query and determines a new query that will give her a better understanding of information. Given the huge size of the data space, this exploration process is often tedious and may leave the user disoriented and frustrated. This paper presents an OLAP tool 1 named Falseto (Former AnalyticaL Sessions for lEss Tedious Olap), that is meant to assist query and session composition, by letting the user summarize, browse, query, and reuse former analytical sessions. Falseto's implementation on top of a formal framework is detailed. We also report the experiments we run to obtain and analyze real OLAP sessions and assess Falseto with them. Finally, we discuss how Falseto can be seen as a starting point for bridging OLAP with exploratory search, a search paradigm centered on the user and the evolution of her knowledge

    Web Service Discovery Based on Past User Experience

    Get PDF
    Web service technology provides a way for simplifying interoperability among different organizations. A piece of functionality available as a web service can be involved in a new business process. Given the steadily growing number of available web services, it is hard for developers to find services appropriate for their needs. The main research efforts in this area are oriented on developing a mechanism for semantic web service description and matching. In this paper, we present an alternative approach for supporting users in web service discovery. Our system implements the implicit culture approach for recommending web services to developers based on the history of decisions made by other developers with similar needs. We explain the main ideas underlying our approach and report on experimental results

    Predicting your next OLAP query based on recent analytical sessions

    Get PDF
    International audienceIn Business Intelligence systems, users interact with data warehouses by formulating OLAP queries aimed at exploring multidimensional data cubes. Being able to predict the most likely next queries would provide a way to recommend interesting queries to users on the one hand, and could improve the efficiency of OLAP sessions on the other. In particular, query recommendation would proactively guide users in data exploration and improve the quality of their interactive experience. In this paper, we propose a framework to predict the most likely next query and recommend this to the user. Our framework relies on a probabilistic user behavior model built by analyzing previous OLAP sessions and exploiting a query similarity metric. To gain insight in the recommendation precision and on what parameters it depends, we evaluate our approach using different quality assessments

    Content-Aware DataGuides for Indexing Large Collections of XML Documents

    Get PDF
    XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the wellknown DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents
    • …
    corecore