
    A Self-Service Supporting Business Intelligence and Big Data Analytics Architecture

    Self-service Business Intelligence (SSBI) is an emerging topic for many companies. Casual users should be enabled to build their own analyses and reports independently, which accelerates and simplifies decision-making processes. Although recent studies have begun to discuss parts of a self-service environment, none of them presents a comprehensive architecture. Following a design science research approach, this study proposes a new self-service-oriented BI architecture to address this gap. Starting from an in-depth literature review, an initial model was developed and then refined through qualitative analysis of interviews with 18 BI and IT specialists from companies across different industries. The proposed architecture model demonstrates how the introduced self-service elements interact with each other and with traditional BI components. For example, we look at the integration of collaboration rooms and a self-learning knowledge database that is intended to serve as a source for a report recommender.
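
    The abstract describes the architecture only at a conceptual level, so the following Python sketch is purely illustrative: the names (KnowledgeBase, record, recommend_reports) are assumptions, not part of the proposed model. It merely shows one way a self-learning knowledge database could accumulate usage data and feed a simple report recommender.

        from collections import Counter, defaultdict

        class KnowledgeBase:
            """Illustrative self-learning store: which reports were built for which topic."""

            def __init__(self):
                self._usage = defaultdict(Counter)  # topic -> Counter of report ids

            def record(self, topic: str, report_id: str) -> None:
                # Every report a casual user builds is fed back into the knowledge base.
                self._usage[topic][report_id] += 1

            def recommend_reports(self, topic: str, k: int = 3) -> list:
                # Suggest the k reports most frequently built for this topic.
                return [report for report, _ in self._usage[topic].most_common(k)]

        kb = KnowledgeBase()
        kb.record("sales", "monthly_revenue")
        kb.record("sales", "monthly_revenue")
        kb.record("sales", "churn_by_region")
        print(kb.recommend_reports("sales"))  # ['monthly_revenue', 'churn_by_region']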

    On the Usefulness of SQL-Query-Similarity Measures to Find User Interests

    In the sciences and elsewhere, the use of relational databases has become ubiquitous. An important challenge is finding hot spots of user interests. In principle, one can discover user interests by clustering the queries in the query log. Such a clustering requires a notion of query similarity. This, in turn, raises the question of which features of SQL queries are meaningful. We have studied the query representations proposed in the literature and the corresponding similarity functions and have identified shortcomings in all of them. To overcome these limitations, we propose new similarity functions for SQL queries. They rely on the so-called access area of a query and, more specifically, on the overlap and the closeness of the access areas. We have carried out systematic experiments to compare the various similarity functions described in this article. The first series of experiments measures the quality of clustering and compares it to a ground truth. In the second series, we focus on the query log of the well-known SkyServer database. Here, a domain expert has interpreted various clusters by hand. We conclude that clusters obtained with our new similarity measures seem to be good indicators of user interests.
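
    The abstract does not spell the access-area measures out formally, so the following Python sketch is an assumption-laden illustration: it reduces a query's access area to an axis-aligned box per attribute (one (low, high) range per column constrained in the WHERE clause) and computes an overlap-based and a closeness-based similarity in that simplified setting.

        import math

        # A query's access area is approximated here as {attribute: (low, high)};
        # this box model is an assumption made for illustration, not the paper's
        # relational-algebra definition. Only attributes constrained in both
        # queries are compared.

        def overlap_similarity(a, b):
            """Jaccard-style overlap of two box-shaped access areas."""
            inter = vol_a = vol_b = 1.0
            for attr in a.keys() & b.keys():
                alo, ahi = a[attr]
                blo, bhi = b[attr]
                vol_a *= max(ahi - alo, 0.0)
                vol_b *= max(bhi - blo, 0.0)
                inter *= max(min(ahi, bhi) - max(alo, blo), 0.0)
            union = vol_a + vol_b - inter
            return inter / union if union > 0 else 0.0

        def closeness_similarity(a, b, scale=1.0):
            """Distance-based similarity: non-zero even if the areas do not overlap."""
            gap_sq = 0.0
            for attr in a.keys() & b.keys():
                alo, ahi = a[attr]
                blo, bhi = b[attr]
                # Gap between the two intervals along this attribute (0 if they touch).
                gap_sq += max(max(alo, blo) - min(ahi, bhi), 0.0) ** 2
            return math.exp(-math.sqrt(gap_sq) / scale)

        # e.g. WHERE ra BETWEEN 10 AND 20 AND dec BETWEEN 0 AND 5
        q1 = {"ra": (10.0, 20.0), "dec": (0.0, 5.0)}
        q2 = {"ra": (15.0, 25.0), "dec": (0.0, 5.0)}
        print(overlap_similarity(q1, q2))    # 0.333... (the boxes share a third of their union)
        print(closeness_similarity(q1, q2))  # 1.0 (the areas overlap, so the gap is 0)

    The exponential decay over the inter-box gap is only one plausible choice for a closeness measure; the paper's actual definitions may differ.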

    SQL query log analysis for identifying user interests and query recommendations

    In the sciences and elsewhere, the use of relational databases has become ubiquitous. To get maximum profit from a database, one needs in-depth knowledge of both SQL and the domain, that is, the structure and meaning of the data the database contains. To assist inexperienced users in formulating their needs, SQL query recommendation systems (SQL QRS) have been proposed. Such a system utilizes the experience of previous users, captured in the SQL query log, as well as the user's own query history, in order to suggest queries. When constructing such a system, one has to solve two related problems: (1) clean the query log and (2) define appropriate query similarity functions. These two tasks are not only necessary for building an SQL QRS; they also apply to other problems. In what follows, we describe three scenarios of SQL query log analysis: (1) cleaning an SQL query log, (2) clustering an SQL query log to test SQL query similarity functions and (3) recommending SQL queries. We also explain how these three branches are related to each other.

    Scenario 1: cleaning an SQL query log as a general pre-processing step. The raw query log is often not suitable for query log analysis tasks such as clustering or giving recommendations, because it contains antipatterns and robotic data downloads, also known as sliding window search (SWS). In software engineering, an antipattern is a special case of a pattern: while a pattern is a standard solution, an antipattern is a pattern with a negative effect. When it comes to SQL query recommendation, leaving such artifacts in the log during analysis results in wrong suggestions. Firstly, the behaviour of "mortal" users who need a recommendation differs from that of the robots which perform SWS. Secondly, one does not want to recommend antipatterns, so they need to be excluded from the query pool. Thirdly, the bigger the log, the slower the recommendation engine operates. Thus, excluding SWS and antipatterns from the input data makes recommendations both better and faster. The effect of SWS and antipatterns on query log clustering depends on the chosen similarity function: the result either (1) does not change or (2) gains additional clusters that cover a large part of the data. In any case, keeping antipatterns and SWS in the input log only increases the clustering time and does not improve the quality of the results. A minimal cleaning heuristic in this spirit is sketched below.
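
    The thesis does not prescribe a specific SWS detector, so the following Python sketch is only an assumed heuristic: it treats a long run of consecutive queries that share a template (identical up to their numeric constants) as a robotic sliding window search and keeps a single representative; the run-length threshold is likewise an assumption.

        import re
        from itertools import groupby

        def template(sql):
            """Replace numeric literals so that shifted window bounds map to one template."""
            return re.sub(r"\b\d+(\.\d+)?\b", "?", sql.lower())

        def drop_sliding_window_search(log, min_run=20):
            """Remove robotic SWS runs from a chronologically ordered list of SQL strings."""
            cleaned = []
            for _, run in groupby(log, key=template):
                run = list(run)
                if len(run) < min_run:
                    cleaned.extend(run)      # short runs look like human exploration
                else:
                    cleaned.append(run[0])   # keep one representative of the robotic run
            return cleaned
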
    Scenario 2: identifying user interests via clustering. To identify the hot spots of user interests, one clusters the SQL queries. In a scientific domain, this exposes research trends; in business, it points to popular data slices that one might want to refactor for better accessibility. A good clustering result must be precise (match the ground truth) and interpretable. Query similarity relies on the SQL query representation, and there are three strategies to represent an SQL query. The FB (feature-based) representation sees a query as a structure, without considering the data the query accesses. The WB (witness-based) approach treats a query as the set of tuples in its result set. The AAB (access-area-based) representation considers a query as an expression in relational algebra. While WB and FB query similarity functions are straightforward (Jaccard or cosine similarity), AAB query similarity requires an additional definition. We propose two variants of the AAB similarity measure: overlap (AABovl) and closeness (AABcl). In AABovl, the similarity of two queries is the overlap of their access areas. AABcl relies on the distance between the two access areas in the data space, so two queries may be similar even if their access areas do not overlap. The extensive experiments consist of two parts. The first one clusters a rather small dataset with ground truth; it studies the precision of the various similarity functions by comparing the clustering results to supervised insights. The second experiment investigates the interpretability of the clustering results obtained with the different similarity functions; it clusters a big real-world query log, and a domain expert then evaluates the results. Both experiments show that the AAB similarity functions produce better results in terms of both precision and interpretability.

    Scenario 3: SQL query recommendation. A sound SQL query recommendation system (1) provides a query that can be run directly, (2) supports comparison operators and various logical operators, (3) is scalable and has low response times, and (4) provides recommendations of high quality. The existing approaches fail to fulfill all of these requirements. We propose DASQR, a scalable and data-aware query recommendation approach that meets all four needs. In a nutshell, DASQR is a hybrid (collaborative filtering + content-based) approach; its variations utilize all the similarity functions that we define or find in the related work. Measuring the quality of an SQL query recommendation system is particularly challenging, since there is no standard way of approaching it. Previous studies have evaluated their results using quality metrics that rely only on the query representations used in those studies, which is somewhat subjective because the similarity function and the quality metric are not independent. We propose AAB quality metrics and then evaluate each approach on all of the metrics. The experiments test the DASQR approaches and their competitors. Both the quality and the runtime experiments indicate that the DASQR approaches outperform the existing ones.
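
    DASQR itself is not specified in this abstract, so the following Python sketch only illustrates the general shape of a hybrid (collaborative + content-based) query recommender: every candidate query from the log is scored by its own similarity to the current session (content-based part), weighted by how similar its originating session is to the current one (collaborative part). The scoring scheme and the pluggable similarity function are assumptions, not the DASQR algorithm.

        def recommend(current_session, past_sessions, similarity, k=3):
            """Suggest up to k queries. `similarity` is any pairwise query similarity
            function on whatever hashable query representation is used
            (e.g. normalized SQL strings)."""
            scores = {}
            current = list(current_session)
            for past in past_sessions:
                # Collaborative component: how much does this past session resemble ours?
                session_sim = max((similarity(q, p) for q in current for p in past), default=0.0)
                for candidate in past:
                    if candidate in current:
                        continue
                    # Content-based component: how close is the candidate query itself?
                    content_sim = max((similarity(q, candidate) for q in current), default=0.0)
                    score = session_sim * content_sim
                    scores[candidate] = max(scores.get(candidate, 0.0), score)
            return sorted(scores, key=scores.get, reverse=True)[:k]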

    A Collaborative Filtering Approach for Recommending OLAP Sessions

    While OLAP has a key role in supporting the effective exploration of multidimensional cubes, the huge number of aggregations and selections that can be operated on the data may make the user experience disorienting. To address this issue, in this paper we propose a recommendation approach stemming from collaborative filtering. We claim that the whole sequence of queries belonging to an OLAP session is valuable because it gives the user a compound and synergic view of the data; for this reason, our goal is not to recommend single OLAP queries but OLAP sessions. Like other collaborative approaches, ours features three phases: (i) search the log for sessions that bear some similarity to the one currently being issued by the user; (ii) extract the most relevant subsessions; and (iii) adapt the top-ranked subsession to the current user's session. However, ours is the first approach that treats sessions as first-class citizens, using new techniques for comparing sessions, finding meaningful recommendation candidates, and adapting them to the current session. After describing our approach, we discuss the results of a large set of effectiveness and efficiency tests based on different measures of recommendation quality.
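
    As a purely structural illustration of the three phases listed above, the Python skeleton below wires them together; the concrete session similarity, subsession extraction and adaptation techniques of the paper are not reproduced, so session_similarity, extract_subsessions and adapt are assumed hooks supplied by the caller, and the candidate cut-off of 10 is likewise an arbitrary assumption.

        def recommend_session(current, log, session_similarity, extract_subsessions, adapt, top=1):
            # Phase (i): search the log for sessions similar to the one being issued.
            candidates = sorted(log, key=lambda s: session_similarity(current, s), reverse=True)[:10]
            # Phase (ii): extract the most relevant subsessions from those candidates.
            subsessions = [sub for s in candidates for sub in extract_subsessions(s, current)]
            subsessions.sort(key=lambda sub: session_similarity(current, sub), reverse=True)
            # Phase (iii): adapt the top-ranked subsession(s) to the current user's session.
            return [adapt(sub, current) for sub in subsessions[:top]]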

    Information Technology for Building Distributed Data Warehouses of Hybrid Type

    The dissertation solves the relevant scientific and practical problem of creating an information technology for building distributed data warehouses of hybrid type that takes into account the properties of the data and the statistics of queries executed against the warehouse. The problem of building data warehouses with regard to data properties and executed queries is analyzed, and the relevance of solving this problem is substantiated. Requirements for an information technology for building distributed warehouses of hybrid type are defined. The notion of multi-database data warehouses is introduced, and conceptual, logical and physical models of such warehouses, as well as procedures for transitions between these levels, are developed. Data integration into the warehouse is described by means of procedures for transforming data elements and operations and for selecting data representation models. The placement of data across nodes and the data replication routes are determined by the criterion of minimal total cost of storing and processing the data, using a modified genetic algorithm. Based on the proposed models and methods, an information technology for building distributed warehouses of hybrid type is created that solves the stated scientific problem. This technology has been applied in the development of information and information-analytical systems of the Ministry of Finance of Ukraine. The results of the implementation confirmed its compliance with the stated requirements.
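
    The dissertation's modified genetic algorithm and its cost model are not detailed in the abstract, so the following Python sketch shows only a generic genetic algorithm for the stated problem: assigning data fragments to nodes so as to minimize an assumed total cost of storing the data plus transferring it to the nodes that query it. The cost model, operators and parameters are all illustrative assumptions.

        import random

        def total_cost(assignment, storage_cost, access_freq, transfer_cost):
            """assignment[f] = node holding fragment f; access_freq[f] maps a querying
            node to how often it requests fragment f (all inputs are assumptions)."""
            cost = 0.0
            for fragment, node in enumerate(assignment):
                cost += storage_cost[node][fragment]
                for user_node, freq in access_freq[fragment].items():
                    cost += freq * transfer_cost[user_node][node]
            return cost

        def place_fragments(n_fragments, nodes, storage_cost, access_freq, transfer_cost,
                            pop_size=30, generations=200, mutation_rate=0.1, seed=0):
            rng = random.Random(seed)
            fitness = lambda ind: total_cost(ind, storage_cost, access_freq, transfer_cost)
            population = [[rng.choice(nodes) for _ in range(n_fragments)] for _ in range(pop_size)]
            for _ in range(generations):
                population.sort(key=fitness)
                parents = population[: pop_size // 2]            # keep the cheaper half
                children = []
                while len(children) < pop_size - len(parents):
                    a, b = rng.sample(parents, 2)
                    cut = rng.randrange(1, n_fragments)          # one-point crossover
                    child = a[:cut] + b[cut:]
                    if rng.random() < mutation_rate:             # random reassignment mutation
                        child[rng.randrange(n_fragments)] = rng.choice(nodes)
                    children.append(child)
                population = parents + children
            return min(population, key=fitness)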

    Designing Cross-Company Business Intelligence Networks

    Business Intelligence (BI) is a well-established term for methods, concepts and tools to retrieve, store, deliver and analyze data for management and business purposes. Although collaboration across company borders has substantially increased over the past decades, little research has been conducted specifically on Cross-Company BI (CCBI). In this thesis, a working definition of CCBI is proposed and distinguished from general collaborative decision making. Based on a reference model that takes existing research and related approaches of adjacent fields into account, a peer-to-peer design for cross-company BI networks is created. Extensive simulation and parameter testing shows that this design is valuable and competitive with existing centrally focused approaches, and that reaching a critical mass of participants improves the usefulness of the network. To quantify the observations, appropriate quality measures are introduced and validated; they are rigorously derived from established concepts for schema mapping of multidimensional data and from considerations of data and information quality.