110 research outputs found

    Storing and Querying Probabilistic XML Using a Probabilistic Relational DBMS

    This work explores the feasibility of storing and querying probabilistic XML in a probabilistic relational database. Our approach is to adapt known techniques for mapping XML to relational data such that the possible worlds are preserved. We show that this approach can work for any XML-to-relational technique by adapting a representative schema-based technique (inlining) as well as a representative schemaless technique (XPath Accelerator). We investigate the maturity of probabilistic relational databases for this task through experiments with one of the state-of-the-art systems, called Trio

    Rank-aware, Approximate Query Processing on the Semantic Web

    Search over the Semantic Web corpus frequently leads to queries with large result sets. To discover relevant data elements, users must therefore rely on ranking techniques that sort results by relevance. At the same time, applications often deal with information needs that do not require complete and exact results. In this thesis, we address the problem of how to process queries over Web data in an approximate and rank-aware fashion

    EXODuS: Exploratory OLAP over Document Stores

    OLAP has been extensively used for a couple of decades as a data analysis approach to support decision making on enterprise structured data. Now, with the wide diffusion of NoSQL databases holding semi-structured data, there is a growing need for enabling OLAP on document stores as well, to allow non-expert users to get new insights and make better decisions. Unfortunately, due to their schemaless nature, document stores are hardly accessible via direct OLAP querying. In this paper we propose EXODuS, an interactive, schema-on-read approach to enable OLAP querying of document stores in the context of self-service BI and exploratory OLAP. To discover multidimensional hierarchies in document stores we adopt a data-driven approach based on the mining of approximate functional dependencies; to ensure good performance, we incrementally build local portions of hierarchies for the levels involved in the current user query. Users execute an analysis session by expressing well-formed multidimensional queries related by OLAP operations; these queries are then translated into the native query language of MongoDB, one of the most popular document-based DBMSs. An experimental evaluation on real-world datasets shows the efficiency of our approach and its compatibility with a real-time setting

    Benchmarking Big Data OLAP NoSQL Databases

    With the advent of Big Data, new challenges have emerged regarding the evaluation of decision support systems (DSS). Existing evaluation benchmarks are not configured to handle a massive data volume and wide data diversity. In this paper, we introduce a new DSS benchmark that supports multiple data storage systems, such as relational and Not Only SQL (NoSQL) systems. Our scheme recognizes numerous data models (snowflake, star and flat topologies) and several data formats (CSV, JSON, TBL, XML, etc.). It entails complex data generation characterized within the “volume, variety, and velocity” (3 V) framework. Our scheme also enables distributed and parallel data generation. Finally, we present some experimental results obtained with KoalaBench

    MongoDB Support for UnifiedPush Server

    This thesis describes the design and implementation of an extension for UnifiedPush Server that allows the server to access the non-relational MongoDB database and leverages the horizontal scalability potential of non-relational databases. The work also includes the design of performance tests and a comparison of performance when running on a single node and on multiple nodes, the design of a migration scenario from MySQL to MongoDB, and the identification of bottlenecks. The application is implemented in Java and uses the Java Persistence API to access databases. To access non-relational databases, it uses Hibernate OGM, an implementation of the JPA standard.

    Business Intelligence on Non-Conventional Data

    The revolution in digital communications witnessed over the last decade has had a significant impact on the world of Business Intelligence (BI). In the big data era, the amount and diversity of data that can be collected and analyzed for the decision-making process transcends the restricted and structured set of internal data that BI systems are conventionally limited to. This thesis investigates the unique challenges imposed by three specific categories of non-conventional data: social data, linked data and schemaless data. Social data comprises the user-generated content published through websites and social media, which can provide a fresh and timely perception of people’s tastes and opinions. In Social BI (SBI), the analysis focuses on topics, understood as specific concepts of interest within the subject area. In this context, this thesis proposes meta-star, an alternative strategy to the traditional star-schema for modeling hierarchies of topics to enable OLAP analyses. The thesis also presents an architectural framework of a real SBI project and a cross-disciplinary benchmark for SBI. Linked data employs the Resource Description Framework (RDF) to provide a public network of interlinked, structured, cross-domain knowledge. In this context, this thesis proposes an interactive and collaborative approach to build aggregation hierarchies from linked data. Schemaless data refers to the storage of data in NoSQL databases that do not force a predefined schema, but let database instances embed their own local schemata. In this context, this thesis proposes an approach to determine the schema profile of a document-based database; the goal is to support users in a schema-on-read analysis process by helping them understand the rules that drove the usage of the different schemata. A final and complementary contribution of this thesis is an innovative technique in the field of recommendation systems to overcome user disorientation in the analysis of a large and heterogeneous wealth of data

    A Survey on Mapping Semi-Structured Data and Graph Data to Relational Data

    The data produced by various services should be stored and managed in an appropriate format so that valuable knowledge can be gained conveniently. This has led to the emergence of various data models, including relational, semi-structured, and graph models. Because mature relational databases built on the relational data model are still predominant in today's market, interest has grown in storing and processing semi-structured data and graph data in relational databases, so that their mature and powerful capabilities can be applied to these kinds of data. In this survey, we review existing methods for mapping semi-structured data and graph data into relational tables, analyze their major features, and give a detailed classification of those methods. We also summarize the merits and demerits of each method, introduce open research challenges, and present future research directions. With this comprehensive investigation of existing methods and open problems, we hope this survey can motivate new mapping approaches by drawing lessons from each model's mapping strategies, as well as a new research topic: mapping multi-model data into relational tables.

    Usage Statistics

    Master's project work, Informatics, 2022, Universidade de Lisboa, Faculdade de Ciências. By looking at logs, metrics and traces, it is possible to infer what is going on inside a system, in order to detect problems or inefficiencies. Quidgest is a global technology company that, since its establishment in 1988, has pioneered the use of AI applied to modelling and automatic generation of software. Genio is a tool that allows functional specialists and analysts to build and support information systems. Quidgest needs to evaluate the performance of the systems generated by Genio. The goal of this work is to integrate a dashboard with usage and performance statistics into the administration interface of the solutions generated by Genio. To generate mock metrics, QuidServer is used, a Windows service developed and used by Quidgest. This service reads and configures long-duration processes and calls external services that run those processes. On this basis, an event collection agent was developed in C#. This agent reads events in real time, parses them and writes them into an InfluxDB bucket. With InfluxDB, it is possible to create continuous queries that run automatically and periodically to downsample the data as needed. Grafana is then used to create dashboards, which allow for simultaneous visualization of different data. In addition to the processing time of received messages, a metric already implemented by the company, a new metric is implemented: the number of invoked scheduling tasks, together with the processing time of those tasks. This is done to understand how metrics are collected from a system: measuring data and aggregating it into metrics that can be sent through ETW events to be captured by the metric collection agent. Finally, Docker is used to run InfluxDB in one container and Grafana in another, which allows for the automation of the installation and configuration of InfluxDB and Grafana