127 research outputs found

    Describing and Assessing Cubes Through Intentional Analytics

    Get PDF
    The Intentional Analytics Model (IAM) has been envisioned as a way to tightly couple OLAP and analytics by (i) letting users explore multidimensional cubes stating their intentions, and (ii) returning multidimensional data coupled with knowledge insights in the form of annotations of subsets of data. Goal of this demonstration is to showcase the IAM approach using a notebook where the user can create a data exploration session by writing describe and assess statements, whose results are displayed by combining tabular data and charts so as to bring the highlights discovered to the user's attention. The demonstration plan will show the effectiveness of the IAM approach in supporting data exploration and analysis and its added value as compared to a traditional OLAP session by proposing two scenarios with guided interaction and letting users run custom sessions

    Conversational OLAP in Action

    Get PDF
    The democratization of data access and the adoption of OLAP in scenarios requiring hand-free interfaces push towards the creation of smart OLAP interfaces. In this demonstration we present COOL, a tool supporting natural language COnversational OLap sessions. COOL interprets and translates a natural language dialogue into an OLAP session that starts with a GPSJ (Generalized Projection, Selection and Join) query. The interpretation relies on a formal grammar and a knowledge base storing metadata from a multidimensional cube. COOL is portable, robust, and requires minimal user intervention. It adopts an n-gram based model and a string similarity function to match known entities in the natural language description. In case of incomplete text description, COOL can obtain the correct query either through automatic inference or through interactions with the user to disambiguate the text. The goal of the demonstration is to let the audience evaluate the usability of COOL and its capabilities in assisting query formulation and ambiguity/error resolution

    A Model-Driven Approach to Automate Data Visualization in Big Data Analytics

    Get PDF
    In big data analytics, advanced analytic techniques operate on big data sets aimed at complementing the role of traditional OLAP for decision making. To enable companies to take benefit of these techniques despite the lack of in-house technical skills, the H2020 TOREADOR Project adopts a model-driven architecture for streamlining analysis processes, from data preparation to their visualization. In this paper we propose a new approach named SkyViz focused on the visualization area, in particular on (i) how to specify the user's objectives and describe the dataset to be visualized, (ii) how to translate this specification into a platform-independent visualization type, and (iii) how to concretely implement this visualization type on the target execution platform. To support step (i) we define a visualization context based on seven prioritizable coordinates for assessing the user's objectives and conceptually describing the data to be visualized. To automate step (ii) we propose a skyline-based technique that translates a visualization context into a set of most-suitable visualization types. Finally, to automate step (iii) we propose a skyline-based technique that, with reference to a specific platform, finds the best bindings between the columns of the dataset and the graphical coordinates used by the visualization type chosen by the user. SkyViz can be transparently extended to include more visualization types on the one hand, more visualization coordinates on the other. The paper is completed by an evaluation of SkyViz based on a case study excerpted from the pilot applications of the TOREADOR Project

    Crop Management with the IoT: an Interdisciplinary Survey

    Get PDF
    In this study we analyze how crop management is going to benefit from the Internet of Things providing an overview of its architecture and components from an agronomic and a technological perspective. The present analysis highlights that IoT is a mature enabling technology, with articulated hardware and software components. Cheap networked devices may sense crop fields at a finer grain, to give timeliness warnings on stress conditions and the presence of disease to a wider range of farmers. Cloud computing allows to reliably store and access heterogeneous data, developing and deploy farm services. From this study emerges that IoT is also going to increase attention to sensor quality and placement protocol, while Machine Learning should be oriented to produce understandable knowledge, which is also useful to enhance Cropping System Simulation systems

    Schema Profiling of Document Stores

    Get PDF
    In document stores, schema is a soft concept and the documents in a collection can have different schemata; this gives designers and implementers augmented flexibility but requires an extra effort to understand the rules that drove the use of alternative schemata when heterogeneous documents are to be analyzed or integrated. In this paper we outline a technique, called schema profiling, to explain the schema variants within a collection in document stores by capturing the hidden rules explaining the use of these variants; we express these rules in the form of a decision tree, called schema profile, whose main feature is the coexistence of value-based and schema-based conditions. Consistently with the requirements we elicited from real users, we aim at creating explicative, precise, and concise schema profiles; to quantitatively assess these qualities we introduce a novel measure of entropy

    A comprehensive approach to data warehouse testing

    Full text link
    Testing is an essential part of the design life-cycle of any software product. Nevertheless, while most phases of data warehouse design have received considerable attention in the literature, not much has been said about data warehouse testing. In this paper we introduce a number of data mart-specific testing activities, we classify them in terms of what is tested and how it is tested, and we discuss how they can be framed within a reference design methodology. Categories and Subject Descriptor

    Interactive Multidimensional Modeling of Linked Data for Exploratory OLAP

    Get PDF
    Exploratory OLAP aims at coupling the precision and detail of corporate data with the information wealth of LOD. While some techniques to create, publish, and query RDF cubes are already available, little has been said about how to contextualize these cubes with situational data in an on-demand fashion. In this paper we describe an approach, called iMOLD, that enables non-technical users to enrich an RDF cube with multidimensional knowledge by discovering aggregation hierarchies in LOD. This is done through a user-guided process that recognizes in the LOD the recurring modeling patterns that express roll- up relationships between RDF concepts, then translates these patterns into aggregation hierarchies to enrich the RDF cube. Two families of aggregation patterns are identified, based on associations and generalization respectively, and the algorithms for recognizing them are described. To evaluate iMOLD in terms of efficiency and effectiveness we compare it with a related approach in the literature, we propose a case study based on DBpedia, and we discuss the results of a test made with real users

    Interactive multidimensional modeling of linked data for exploratory OLAP

    Get PDF
    Exploratory OLAP aims at coupling the precision and detail of corporate data with the information wealth of LOD. While some techniques to create, publish, and query RDF cubes are already available, little has been said about how to contextualize these cubes with situational data in an on-demand fashion. In this paper we describe an approach, called iMOLD, that enables non-technical users to enrich an RDF cube with multidimensional knowledge by discovering aggregation hierarchies in LOD. This is done through a user-guided process that recognizes in the LOD the recurring modeling patterns that express roll-up relationships between RDF concepts, then translates these patterns into aggregation hierarchies to enrich the RDF cube. Two families of aggregation patterns are identified, based on associations and generalization respectively, and the algorithms for recognizing them are described. To evaluate iMOLD in terms of efficiency and effectiveness we compare it with a related approach in the literature, we propose a case study based on DBpedia, and we discuss the results of a test made with real users.Peer ReviewedPostprint (author's final draft

    A reference architecture and model for sensor data warehousing

    Get PDF
    Funding: UK EPSRC under grant number EP/N007565/1, “Science of Sensor Systems Software”.Sensor data is becoming far more available thanks to the growth in both sensor systems and Internet of Things devices. Much of the value of sensor data comes from examining trends that occur over long timescales, ranging from hours to years. However, making use of data a long time after it has been collected has significant implications for the data-handling systems used to manage it. In particular, the data must be contextualised into the environment in which it was collected to avoid misleading (and potentially dangerous) mis-interpretation. We apply data warehousing techniques to develop an extensible model to capture contextual metadata alongside sensor datasets, and show how this can be used to support the analysis of datasets long after collection. We present our baseline reference framework for sensor context and derive multidimensional schemata representing different modelling and analysis scenarios. Finally, we exercise the model with two case studies.PostprintPeer reviewe
    • …
    corecore