Search CORE

12,210 research outputs found

NOSQL design for analytical workloads: Variability matters

Author: Abelló Gamazo Alberto
Herrero Otal Víctor
Romero Moral Óscar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

Big Data has recently gained popularity and has strongly questioned relational databases as universal storage systems, especially in the presence of analytical workloads. As result, co-relational alternatives, commonly known as NOSQL (Not Only SQL) databases, are extensively used for Big Data. As the primary focus of NOSQL is on performance, NOSQL databases are directly designed at the physical level, and consequently the resulting schema is tailored to the dataset and access patterns of the problem in hand. However, we believe that NOSQL design can also benefit from traditional design approaches. In this paper we present a method to design databases for analytical workloads. Starting from the conceptual model and adopting the classical 3-phase design used for relational databases, we propose a novel design method considering the new features brought by NOSQL and encompassing relational and co-relational design altogether.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Conceptual Modelling and The Quality of Ontologies: Endurantism Vs. Perdurantism

Author: Al Asswad
Mark Lycett
Mohammad Mourhaf
Mutaz M Al-Debei
Sergio De Cesare
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 01/01/2012
Field of study

Ontologies are key enablers for sharing precise and machine-understandable semantics among different applications and parties. Yet, for ontologies to meet these expectations, their quality must be of a good standard. The quality of an ontology is strongly based on the design method employed. This paper addresses the design problems related to the modelling of ontologies, with specific concentration on the issues related to the quality of the conceptualisations produced. The paper aims to demonstrate the impact of the modelling paradigm adopted on the quality of ontological models and, consequently, the potential impact that such a decision can have in relation to the development of software applications. To this aim, an ontology that is conceptualised based on the Object-Role Modelling (ORM) approach (a representative of endurantism) is re-engineered into a one modelled on the basis of the Object Paradigm (OP) (a representative of perdurantism). Next, the two ontologies are analytically compared using the specified criteria. The conducted comparison highlights that using the OP for ontology conceptualisation can provide more expressive, reusable, objective and temporal ontologies than those conceptualised on the basis of the ORM approach

arXiv.org e-Print Archive

CiteSeerX

Explain3D: Explaining Disagreements in Disjoint Datasets

Author: Wang Xiaolan
Meliou Alexandra
Publication venue
Publication date: 24/02/1911
Field of study

Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by the existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries' differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences of the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4)~We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently

arXiv.org e-Print Archive

Trinity College

A Call to Arms: Revisiting Database Design

Author: Antonio Badia
Berners-Lee T.
Chauduri S.
Daniel Lemire
Golab L.
Helland P.
Kiely G.
Nagarajan S.
Olivé A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011
Field of study

Good database design is crucial to obtain a sound, consistent database, and - in turn - good database design methodologies are the best way to achieve the right design. These methodologies are taught to most Computer Science undergraduates, as part of any Introduction to Database class. They can be considered part of the "canon", and indeed, the overall approach to database design has been unchanged for years. Moreover, none of the major database research assessments identify database design as a strategic research direction. Should we conclude that database design is a solved problem? Our thesis is that database design remains a critical unsolved problem. Hence, it should be the subject of more research. Our starting point is the observation that traditional database design is not used in practice - and if it were used it would result in designs that are not well adapted to current environments. In short, database design has failed to keep up with the times. In this paper, we put forth arguments to support our viewpoint, analyze the root causes of this situation and suggest some avenues of research.Comment: Removed spurious column break. Nothing else was change

arXiv.org e-Print Archive

CiteSeerX

R-libre

Crossref

Testing and test-driven development of conceptual schemas

Author: Tort Pugibet Albert
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2012
Field of study

The traditional focus for Information Systems (IS) quality assurance relies on the evaluation of its implementation. However, the quality of an IS can be largely determined in the first stages of its development. Several studies reveal that more than half the errors that occur during systems development are requirements errors. A requirements error is defined as a mismatch between requirements specification and stakeholders¿ needs and expectations. Conceptual modeling is an essential activity in requirements engineering aimed at developing the conceptual schema of an IS. The conceptual schema is the general knowledge that an IS needs to know in order to perform its functions. A conceptual schema specification has semantic quality when it is valid and complete. Validity means that the schema is correct (the knowledge it defines is true for the domain) and relevant (the knowledge it defines is necessary for the system). Completeness means that the conceptual schema includes all relevant knowledge. The validation of a conceptual schema pursues the detection of requirements errors in order to improve its semantic quality. Conceptual schema validation is still a critical challenge in requirements engineering. In this work we contribute to this challenge, taking into account that, since conceptual schemas of IS can be specified in executable artifacts, they can be tested. In this context, the main contributions of this Thesis are (1) an approach to test conceptual schemas of information systems, and (2) a novel method for the incremental development of conceptual schemas supported by continuous test-driven validation. As far as we know, this is the first work that proposes and implements an environment for automated testing of UML/OCL conceptual schemas, and the first work that explores the use of test-driven approaches in conceptual modeling. The testing of conceptual schemas may be an important and practical means for their validation. It allows checking correctness and completeness according to stakeholders¿ needs and expectations. Moreover, in conjunction with the automatic check of basic test adequacy criteria, we can also analyze the relevance of the elements defined in the schema. The testing environment we propose requires a specialized language for writing tests of conceptual schemas. We defined the Conceptual Schema Testing Language (CSTL), which may be used to specify automated tests of executable schemas specified in UML/OCL. We also describe a prototype implementation of a test processor that makes feasible the approach in practice. The conceptual schema testing approach supports test-last validation of conceptual schemas, but it also makes sense to test incomplete conceptual schemas while they are developed. This fact lays the groundwork of Test-Driven Conceptual Modeling (TDCM), which is our second main contribution. TDCM is a novel conceptual modeling method based on the main principles of Test-Driven Development (TDD), an extreme programming method in which a software system is developed in short iterations driven by tests. We have applied the method in several case studies, in the context of Design Research, which is the general research framework we adopted. Finally, we also describe an integration approach of TDCM into a broad set of software development methodologies, including the Unified Process development methodology, MDD-based approaches, storytest-driven agile methods and goal and scenario-oriented requirements engineering methods.Els enfocaments per assegurar la qualitat deis sistemes d'informació s'han basal tradicional m en! en l'avaluació de la seva implementació. No obstan! aix6, la qualitat d'un sis tema d'informació pot ser ampliament determinada en les primeres fases del seu desenvolupament. Diversos estudis indiquen que més de la meitat deis errors de software són errors de requisits . Un error de requisit es defineix com una desalineació entre l'especificació deis requisits i les necessitats i expectatives de les parts im plicades (stakeholders ). La modelització conceptual és una activitat essencial en l'enginyeria de requisits , l'objectiu de la qual és desenvolupar !'esquema conceptual d'un sistema d'informació. L'esquema conceptual és el coneixement general que un sistema d'informació requereix per tal de desenvolupar les seves funcions . Un esquema conceptual té qualitat semantica quan és va lid i complet. La valides a implica que !'esquema sigui correcte (el coneixement definit és cert peral domini) i rellevant (el coneixement definit és necessari peral sistema). La completes a significa que !'esquema conceptual inclou tot el coneixement rellevant. La validació de !'esquema conceptual té coma objectiu la detecció d'errors de requisits per tal de millorar la qualitat semantica. La validació d'esquemes conceptuals és un repte crític en l'enginyeria de requisits . Aquesta te si contribueix a aquest repte i es basa en el fet que els es quemes conceptuals de sistemes d'informació poden ser especificats en artefactes executables i, per tant, poden ser provats. Les principals contribucions de la te si són (1) un enfocament pera les pro ves d'esquemes conceptuals de sistemes d'informació, i (2) una metodología innovadora pel desenvolupament incremental d'esquemes conceptuals assistit per una validació continuada basada en proves . Les pro ves d'esquemes conceptuals poden ser una im portant i practica técnica pera la se va validació, jaque permeten provar la correctesa i la completesa d'acord ambles necessitats i expectatives de les parts interessades. En conjunció amb la comprovació d'un conjunt basic de criteris d'adequació de les proves, també podem analitzar la rellevancia deis elements definits a !'esquema. L'entorn de test proposat inclou un llenguatge especialitzat per escriure proves automatitzades d'esquemes conceptuals, anomenat Conceptual Schema Testing Language (CSTL). També hem descrit i implementa! a un prototip de processador de tes tos que fa possible l'aplicació de l'enfocament proposat a la practica. D'acord amb l'estat de l'art en validació d'esquemes conceptuals , aquest és el primer treball que proposa i implementa un entorn pel testing automatitzat d'esquemes conceptuals definits en UML!OCL. L'enfocament de proves d'esquemes conceptuals permet dura terme la validació d'esquemes existents , pero també té sentit provar es quemes conceptuals incomplets m entre estant sent desenvolupats. Aquest fet és la base de la metodología Test-Driven Conceptual Modeling (TDCM), que és la segona contribució principal. El TDCM és una metodología de modelització conceptual basada en principis basics del Test-Driven Development (TDD), un métode de programació en el qual un sistema software és desenvolupat en petites iteracions guiades per proves. També hem aplicat el métode en diversos casos d'estudi en el context de la metodología de recerca Design Science Research. Finalment, hem proposat enfocaments d'integració del TDCM en diverses metodologies de desenvolupament de software

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Potentially Polluting Marine Sites GeoDB: An S-100 Geospatial Database as an Effective Contribution to the Protection of the Marine Environment

Author: Alexander Lee
Calder Brian R.
Masetti Giuseppe
Publication venue: University of New Hampshire Scholars\u27 Repository
Publication date: 29/08/2012
Field of study

Potentially Polluting Marine Sites (PPMS) are objects on, or areas of, the seabed that may release pollution in the future. A rationale for, and design of, a geospatial database to inventory and manipu-late PPMS is presented. Built as an S-100 Product Specification, it is specified through human-readable UML diagrams and implemented through machine-readable GML files, and includes auxiliary information such as pollution-control resources and potentially vulnerable sites in order to support analyses of the core data. The design and some aspects of implementation are presented, along with metadata requirements and structure, and a perspective on potential uses of the database

University of New Brunswick: Centre for Digital Scholarship Journals

UNH Scholars' Repository

A filtering engine for large conceptual schemas

Author: Villegas Niño Antonio
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2013
Field of study

Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Generating natural language specifications from UML class diagrams

Author: A Abbott
AV Gervasi
CL Heitmeyer
E Brill
E Goldberg
Farid Meziane
G Booch
HM Harmain
K Walden
L Goldin
L Mich
MD Lubars
Nikos Athanasakis
P Martin-Löf
PPS Chen
PPS Chen
Sophia Ananiadou
SW Ambler
W Ahrendt
WC Mann
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

Early phases of software development are known to be problematic, difficult to manage and errors occurring during these phases are expensive to correct. Many systems have been developed to aid the transition from informal Natural Language requirements to semistructured or formal specifications. Furthermore, consistency checking is seen by many software engineers as the solution to reduce the number of errors occurring during the software development life cycle and allow early verification and validation of software systems. However, this is confined to the models developed during analysis and design and fails to include the early Natural Language requirements. This excludes proper user involvement and creates a gap between the original requirements and the updated and modified models and implementations of the system. To improve this process, we propose a system that generates Natural Language specifications from UML class diagrams. We first investigate the variation of the input language used in naming the components of a class diagram based on the study of a large number of examples from the literature and then develop rules for removing ambiguities in the subset of Natural Language used within UML. We use WordNet,a linguistic ontology, to disambiguate the lexical structures of the UML string names and generate semantically sound sentences. Our system is developed in Java and is tested on an independent though academic case study

CiteSeerX

University of Salford Institutional Repository

Crossref

The University of Manchester - Institutional Repository

Integration of Legacy and Heterogeneous Databases

Author: Hainaut Jean-Luc
Thiran Philippe
Publication venue: Institut d'Informatrique - LIBD
Publication date: 01/01/2002
Field of study

Repository of the University of Namur