103 research outputs found
A Pattern Based Approach for Re-engineering Non-Ontological Resources into Ontologies
With the goal of speeding up the ontology development process, ontology engineers are starting to reuse as much as possible available ontologies and non-ontological resources such as classification schemes, thesauri, lexicons and folksonomies, that already have some degree of consensus. The reuse of such non-ontological resources necessarily involves their re-engineering into ontologies. Non-ontological resources are highly heterogeneous in their data model and contents: they encode different types of knowledge, and they can be modeled and implemented in different ways. In this paper we present (1) a typology for non-ontological resources, (2) a pattern based approach for re-engineering non-ontological resources into ontologies, and (3) a use case of the proposed approach
Practitioners’ view on command query responsibility segregation
Relational database management systems (RDBMS) have long been a predominant technology in information systems (IS). Today, however, the ever-changing technology landscape seems to be the proving grounds for many alternative approaches. For instance, alternative databases are currently used in many cloud services that affect everyday life. Similarly, a novel way to design applications has come to fruition. It relies on two concepts; command query responsibility segregation (CQRS) and event sourcing. A combination of the concepts is suggested to mitigate some performance and design issues that commonly arise in traditional information systems development (ISD). However, this particular approach hasn’t sparked interest from of academia yet. This inquiry sets out to find opportunities and challenges that arise from adoption of one of the two concepts, namely CQRS. This is done in relative isolation from event sourcing. In total five interviews were conducted with seven participants using open-ended interview questions derived from design patterns research. The results are five themes that provide guidance to IS professionals evaluating adoption. These are alignment between IT-artifacts and business processes, simultaneous development, flexibility from specific database technology, modularization as a means of implementation and risk of introducing complexity. The results indicate that several themes from domain-driven design are influential to the concept. Additionally, results indicate that CQRS may be a precursor to eventually consistent queries and aids fine-tuning of availability, consistency and partition tolerance considerations. It is concluded that CQRS may facilitate improved collaboration and ease distribution of work. Moreover, it is hoped that the results will help to contextualize CQRS and spark additional interest in the field of IS research. The inquiry suggests further inquiries in other areas. These are among others; extract transform load-patterns, operational transforms, probabilistic bounded staleness and occasionally connected systems
M2ORM2: A Model for the Transparent Management of Relationally Persistent Objects
Object-oriented application development often involves
storing application objects in a relational database.
Sometimes it is desirable to develop
the persistent classes and the relational database
in an independent way, and
to use an object persistent manager to connect them in a suitable way.
This paper introduces {M2ORM2,
a model for describing meet-in-the-middle mappings
between object schemas and relational schemas,
to support the transparent management of object persistence
by means of relational databases
A Paradigm for Learning Queries on Big Data
International audienceSpecifying a database query using a formal query language is typically a challenging task for non-expert users. In the context of big data, this problem becomes even harder as it requires the users to deal with database instances of big sizes and hence difficult to visualize. Such instances usually lack a schema to help the users specify their queries, or have an incomplete schema as they come from disparate data sources. In this paper, we propose a novel paradigm for interactive learning of queries on big data, without assuming any knowledge of the database schema. The paradigm can be applied to different database models and a class of queries adequate to the database model. In particular, in this paper we present two instantiations that validated the proposed paradigm for learning relational join queries and for learning path queries on graph databases. Finally, we discuss the challenges of employing the paradigm for further data models and for learning cross-model schema mappings
Scalable Model-Based Management of Correlated Dimensional Time Series in ModelarDB+
To monitor critical infrastructure, high quality sensors sampled at a high
frequency are increasingly used. However, as they produce huge amounts of data,
only simple aggregates are stored. This removes outliers and fluctuations that
could indicate problems. As a remedy, we present a model-based approach for
managing time series with dimensions that exploits correlation in and among
time series. Specifically, we propose compressing groups of correlated time
series using an extensible set of model types within a user-defined error bound
(possibly zero). We name this new category of model-based compression methods
for time series Multi-Model Group Compression (MMGC). We present the first MMGC
method GOLEMM and extend model types to compress time series groups. We propose
primitives for users to effectively define groups for differently sized data
sets, and based on these, an automated grouping method using only the time
series dimensions. We propose algorithms for executing simple and
multi-dimensional aggregate queries on models. Last, we implement our methods
in the Time Series Management System (TSMS) ModelarDB (ModelarDB+). Our
evaluation shows that compared to widely used formats, ModelarDB+ provides up
to 13.7 times faster ingestion due to high compression, 113 times better
compression due to the adaptivity of GOLEMM, 630 times faster aggregates by
using models, and close to linear scalability. It is also extensible and
supports online query processing.Comment: 12 Pages, 28 Figures, and 1 Tabl
Recommended from our members
Physical Plan Instrumentation in Databases: Mechanisms and Applications
Database management systems (DBMSs) are designed with the goal set to compile SQL queries to physical plans that, when executed, provide results to the SQL queries. Building on this functionality, an ever-increasing number of application domains (e.g., provenance management, online query optimization, physical database design, interactive data profiling, monitoring, and interactive data visualization) seek to operate on how queries are executed by the DBMS for a wide variety of purposes ranging from debugging and data explanation to optimization and monitoring. Unfortunately, DBMSs provide little, if any, support to facilitate the development of this class of important application domains. The effect is such that database application developers and database system architects either rewrite the database internals in ad-hoc ways; work around the SQL interface, if possible, with inevitable performance penalties; or even build new databases from scratch only to express and optimize their domain-specific application logic over how queries are executed.
To address this problem in a principled manner in this dissertation, we introduce a prototype DBMS, namely, Smoke, that exposes instrumentation mechanisms in the form of a framework to allow external applications to manipulate physical plans. Intuitively, a physical plan is the underlying representation that DBMSs use to encode how a SQL query will be executed, and providing instrumentation mechanisms at this representation level allows applications to express and optimize their logic on how queries are executed.
Having such an instrumentation-enabled DBMS in-place, we then consider how to express and optimize applications that rely their logic on how queries are executed. To best demonstrate the expressive and optimization power of instrumentation-enabled DBMSs, we express and optimize applications across several important domains including provenance management, interactive data visualization, interactive data profiling, physical database design, online query optimization, and query discovery. Expressivity-wise, we show that Smoke can express known techniques, introduce novel semantics on known techniques, and introduce new techniques across domains. Performance-wise, we show case-by-case that Smoke is on par with or up-to several orders of magnitudes faster than state-of-the-art imperative and declarative implementations of important applications across domains.
As such, we believe our contributions provide evidence and form the basis towards a class of instrumentation-enabled DBMSs with the goal set to express and optimize applications across important domains with core logic over how queries are executed by DBMSs
Automating the multidimensional design of data warehouses
Les experiències prèvies en l'àmbit dels magatzems de dades (o data warehouse), mostren que l'esquema multidimensional del data warehouse ha de ser fruit d'un enfocament híbrid; això és, una proposta que consideri tant els requeriments d'usuari com les fonts de dades durant el procés de disseny.Com a qualsevol altre sistema, els requeriments són necessaris per garantir que el sistema desenvolupat satisfà les necessitats de l'usuari. A més, essent aquest un procés de reenginyeria, les fonts de dades s'han de tenir en compte per: (i) garantir que el magatzem de dades resultant pot ésser poblat amb dades de l'organització, i, a més, (ii) descobrir capacitats d'anàlisis no evidents o no conegudes per l'usuari.Actualment, a la literatura s'han presentat diversos mètodes per donar suport al procés de modelatge del magatzem de dades. No obstant això, les propostes basades en un anàlisi dels requeriments assumeixen que aquestos són exhaustius, i no consideren que pot haver-hi informació rellevant amagada a les fonts de dades. Contràriament, les propostes basades en un anàlisi exhaustiu de les fonts de dades maximitzen aquest enfocament, i proposen tot el coneixement multidimensional que es pot derivar des de les fonts de dades i, conseqüentment, generen massa resultats. En aquest escenari, l'automatització del disseny del magatzem de dades és essencial per evitar que tot el pes de la tasca recaigui en el dissenyador (d'aquesta forma, no hem de confiar únicament en la seva habilitat i coneixement per aplicar el mètode de disseny elegit). A més, l'automatització de la tasca allibera al dissenyador del sempre complex i costós anàlisi de les fonts de dades (que pot arribar a ser inviable per grans fonts de dades).Avui dia, els mètodes automatitzables analitzen en detall les fonts de dades i passen per alt els requeriments. En canvi, els mètodes basats en l'anàlisi dels requeriments no consideren l'automatització del procés, ja que treballen amb requeriments expressats en llenguatges d'alt nivell que un ordenador no pot manegar. Aquesta mateixa situació es dona en els mètodes híbrids actual, que proposen un enfocament seqüencial, on l'anàlisi de les dades es complementa amb l'anàlisi dels requeriments, ja que totes dues tasques pateixen els mateixos problemes que els enfocament purs.En aquesta tesi proposem dos mètodes per donar suport a la tasca de modelatge del magatzem de dades: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Totes dues consideren els requeriments i les fonts de dades per portar a terme la tasca de modelatge i a més, van ser pensades per superar les limitacions dels enfocaments actuals.1. MDBE segueix un enfocament clàssic, en el que els requeriments d'usuari són coneguts d'avantmà. Aquest mètode es beneficia del coneixement capturat a les fonts de dades, però guia el procés des dels requeriments i, conseqüentment, és capaç de treballar sobre fonts de dades semànticament pobres. És a dir, explotant el fet que amb uns requeriments de qualitat, podem superar els inconvenients de disposar de fonts de dades que no capturen apropiadament el nostre domini de treball.2. A diferència d'MDBE, AMDO assumeix un escenari on es disposa de fonts de dades semànticament riques. Per aquest motiu, dirigeix el procés de modelatge des de les fonts de dades, i empra els requeriments per donar forma i adaptar els resultats generats a les necessitats de l'usuari. En aquest context, a diferència de l'anterior, unes fonts de dades semànticament riques esmorteeixen el fet de no tenir clars els requeriments d'usuari d'avantmà.Cal notar que els nostres mètodes estableixen un marc de treball combinat que es pot emprar per decidir, donat un escenari concret, quin enfocament és més adient. Per exemple, no es pot seguir el mateix enfocament en un escenari on els requeriments són ben coneguts d'avantmà i en un escenari on aquestos encara no estan clars (un cas recorrent d'aquesta situació és quan l'usuari no té clares les capacitats d'anàlisi del seu propi sistema). De fet, disposar d'uns bons requeriments d'avantmà esmorteeix la necessitat de disposar de fonts de dades semànticament riques, mentre que a l'inversa, si disposem de fonts de dades que capturen adequadament el nostre domini de treball, els requeriments no són necessaris d'avantmà. Per aquests motius, en aquesta tesi aportem un marc de treball combinat que cobreix tots els possibles escenaris que podem trobar durant la tasca de modelatge del magatzem de dades.Previous experiences in the data warehouse field have shown that the data warehouse multidimensional conceptual schema must be derived from a hybrid approach: i.e., by considering both the end-user requirements and the data sources, as first-class citizens. Like in any other system, requirements guarantee that the system devised meets the end-user necessities. In addition, since the data warehouse design task is a reengineering process, it must consider the underlying data sources of the organization: (i) to guarantee that the data warehouse must be populated from data available within the organization, and (ii) to allow the end-user discover unknown additional analysis capabilities.Currently, several methods for supporting the data warehouse modeling task have been provided. However, they suffer from some significant drawbacks. In short, requirement-driven approaches assume that requirements are exhaustive (and therefore, do not consider the data sources to contain alternative interesting evidences of analysis), whereas data-driven approaches (i.e., those leading the design task from a thorough analysis of the data sources) rely on discovering as much multidimensional knowledge as possible from the data sources. As a consequence, data-driven approaches generate too many results, which mislead the user. Furthermore, the design task automation is essential in this scenario, as it removes the dependency on an expert's ability to properly apply the method chosen, and the need to analyze the data sources, which is a tedious and timeconsuming task (which can be unfeasible when working with large databases). In this sense, current automatable methods follow a data-driven approach, whereas current requirement-driven approaches overlook the process automation, since they tend to work with requirements at a high level of abstraction. Indeed, this scenario is repeated regarding data-driven and requirement-driven stages within current hybrid approaches, which suffer from the same drawbacks than pure data-driven or requirement-driven approaches.In this thesis we introduce two different approaches for automating the multidimensional design of the data warehouse: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Both approaches were devised to overcome the limitations from which current approaches suffer. Importantly, our approaches consider opposite initial assumptions, but both consider the end-user requirements and the data sources as first-class citizens.1. MDBE follows a classical approach, in which the end-user requirements are well-known beforehand. This approach benefits from the knowledge captured in the data sources, but guides the design task according to requirements and consequently, it is able to work and handle semantically poorer data sources. In other words, providing high-quality end-user requirements, we can guide the process from the knowledge they contain, and overcome the fact of disposing of bad quality (from a semantical point of view) data sources.2. AMDO, as counterpart, assumes a scenario in which the data sources available are semantically richer. Thus, the approach proposed is guided by a thorough analysis of the data sources, which is properly adapted to shape the output result according to the end-user requirements. In this context, disposing of high-quality data sources, we can overcome the fact of lacking of expressive end-user requirements.Importantly, our methods establish a combined and comprehensive framework that can be used to decide, according to the inputs provided in each scenario, which is the best approach to follow. For example, we cannot follow the same approach in a scenario where the end-user requirements are clear and well-known, and in a scenario in which the end-user requirements are not evident or cannot be easily elicited (e.g., this may happen when the users are not aware of the analysis capabilities of their own sources). Interestingly, the need to dispose of requirements beforehand is smoothed by the fact of having semantically rich data sources. In lack of that, requirements gain relevance to extract the multidimensional knowledge from the sources.So that, we claim to provide two approaches whose combination turns up to be exhaustive with regard to the scenarios discussed in the literaturePostprint (published version
- …