12,208 research outputs found
NOSQL design for analytical workloads: Variability matters
Big Data has recently gained popularity and has strongly questioned relational databases as universal storage systems, especially in the presence of analytical workloads. As result, co-relational alternatives, commonly known as NOSQL (Not Only SQL) databases, are extensively used for Big Data. As the primary focus of NOSQL is on performance, NOSQL databases are directly designed at the physical level, and consequently the resulting schema is tailored to the dataset and access patterns of the problem in hand. However, we believe that NOSQL design can also benefit from traditional design approaches. In this paper we present a method to design databases for analytical workloads. Starting from the conceptual model and adopting the classical 3-phase design used for relational databases, we propose a novel design method considering the new features brought by NOSQL and encompassing relational and co-relational design altogether.Peer ReviewedPostprint (author's final draft
Conceptual Modelling and The Quality of Ontologies: Endurantism Vs. Perdurantism
Ontologies are key enablers for sharing precise and machine-understandable
semantics among different applications and parties. Yet, for ontologies to meet
these expectations, their quality must be of a good standard. The quality of an
ontology is strongly based on the design method employed. This paper addresses
the design problems related to the modelling of ontologies, with specific
concentration on the issues related to the quality of the conceptualisations
produced. The paper aims to demonstrate the impact of the modelling paradigm
adopted on the quality of ontological models and, consequently, the potential
impact that such a decision can have in relation to the development of software
applications. To this aim, an ontology that is conceptualised based on the
Object-Role Modelling (ORM) approach (a representative of endurantism) is
re-engineered into a one modelled on the basis of the Object Paradigm (OP) (a
representative of perdurantism). Next, the two ontologies are analytically
compared using the specified criteria. The conducted comparison highlights that
using the OP for ontology conceptualisation can provide more expressive,
reusable, objective and temporal ontologies than those conceptualised on the
basis of the ORM approach
Explain3D: Explaining Disagreements in Disjoint Datasets
Data plays an important role in applications, analytic processes, and many
aspects of human activity. As data grows in size and complexity, we are met
with an imperative need for tools that promote understanding and explanations
over data-related operations. Data management research on explanations has
focused on the assumption that data resides in a single dataset, under one
common schema. But the reality of today's data is that it is frequently
un-integrated, coming from different sources with different schemas. When
different datasets provide different answers to semantically similar questions,
understanding the reasons for the discrepancies is challenging and cannot be
handled by the existing single-dataset solutions.
In this paper, we propose Explain3D, a framework for explaining the
disagreements across disjoint datasets (3D). Explain3D focuses on identifying
the reasons for the differences in the results of two semantically similar
queries operating on two datasets with potentially different schemas. Our
framework leverages the queries to perform a semantic mapping across the
relevant parts of their provenance; discrepancies in this mapping point to
causes of the queries' differences. Exploiting the queries gives Explain3D an
edge over traditional schema matching and record linkage techniques, which are
query-agnostic. Our work makes the following contributions: (1) We formalize
the problem of deriving optimal explanations for the differences of the results
of semantically similar queries over disjoint datasets. (2) We design a 3-stage
framework for solving the optimal explanation problem. (3) We develop a
smart-partitioning optimizer that improves the efficiency of the framework by
orders of magnitude. (4)~We experiment with real-world and synthetic data to
demonstrate that Explain3D can derive precise explanations efficiently
A Call to Arms: Revisiting Database Design
Good database design is crucial to obtain a sound, consistent database, and -
in turn - good database design methodologies are the best way to achieve the
right design. These methodologies are taught to most Computer Science
undergraduates, as part of any Introduction to Database class. They can be
considered part of the "canon", and indeed, the overall approach to database
design has been unchanged for years. Moreover, none of the major database
research assessments identify database design as a strategic research
direction.
Should we conclude that database design is a solved problem?
Our thesis is that database design remains a critical unsolved problem.
Hence, it should be the subject of more research. Our starting point is the
observation that traditional database design is not used in practice - and if
it were used it would result in designs that are not well adapted to current
environments. In short, database design has failed to keep up with the times.
In this paper, we put forth arguments to support our viewpoint, analyze the
root causes of this situation and suggest some avenues of research.Comment: Removed spurious column break. Nothing else was change
Testing and test-driven development of conceptual schemas
The traditional focus for Information Systems (IS) quality assurance relies on the evaluation of its implementation. However, the quality of an IS can be largely determined in the first stages of its development. Several studies reveal that more than half the errors that occur during systems development are requirements errors. A requirements error is defined as a mismatch between requirements specification and stakeholders¿ needs and expectations.
Conceptual modeling is an essential activity in requirements engineering aimed at developing the conceptual schema of an IS. The conceptual schema is the general knowledge that an IS needs to know in order to perform its functions. A conceptual schema specification has semantic quality when it is valid and complete. Validity means that the schema is correct (the knowledge it defines is true for the domain) and relevant (the knowledge it defines is necessary for the system). Completeness means that the conceptual schema includes all relevant knowledge. The validation of a conceptual schema pursues the detection of requirements errors in order to improve its semantic quality.
Conceptual schema validation is still a critical challenge in requirements engineering. In this work we contribute to this challenge, taking into account that, since conceptual schemas of IS can be specified in executable artifacts, they can be tested. In this context, the main contributions of this Thesis are (1) an approach to test conceptual schemas of information systems, and (2) a novel method for the incremental development of conceptual schemas supported by continuous test-driven validation. As far as we know, this is the first work that proposes and implements an environment for automated testing of UML/OCL conceptual schemas, and the first work that explores the use of test-driven approaches in conceptual modeling.
The testing of conceptual schemas may be an important and practical means for their validation. It allows checking correctness and completeness according to stakeholders¿ needs and expectations. Moreover, in conjunction with the automatic check of basic test adequacy criteria, we can also analyze the relevance of the elements defined in the schema. The testing environment we propose requires a specialized language for writing tests of conceptual schemas. We defined the Conceptual Schema Testing Language (CSTL), which may be used to specify automated tests of executable schemas specified in UML/OCL. We also describe a prototype implementation of a test processor that makes feasible the approach in practice.
The conceptual schema testing approach supports test-last validation of conceptual schemas, but it also makes sense to test incomplete conceptual schemas while they are developed. This fact lays the groundwork of Test-Driven Conceptual Modeling (TDCM), which is our second main contribution. TDCM is a novel conceptual modeling method based on the main principles of Test-Driven Development (TDD), an extreme programming method in which a software system is developed in short iterations driven by tests. We have applied the method in several case studies, in the context of Design Research, which is the general research framework we adopted. Finally, we also describe an integration approach of TDCM into a broad set of software development methodologies, including the Unified Process development methodology, MDD-based approaches, storytest-driven agile methods and goal and scenario-oriented requirements engineering methods.Els enfocaments per assegurar la qualitat deis sistemes d'informació s'han basal tradicional m en! en l'avaluació de la seva
implementació. No obstan! aix6, la qualitat d'un sis tema d'informació pot ser ampliament determinada en les primeres
fases del seu desenvolupament. Diversos estudis indiquen que més de la meitat deis errors de software són errors de
requisits . Un error de requisit es defineix com una desalineació entre l'especificació deis requisits i les necessitats i
expectatives de les parts im plicades (stakeholders ).
La modelització conceptual és una activitat essencial en l'enginyeria de requisits , l'objectiu de la qual és desenvolupar
!'esquema conceptual d'un sistema d'informació. L'esquema conceptual és el coneixement general que un sistema
d'informació requereix per tal de desenvolupar les seves funcions . Un esquema conceptual té qualitat semantica quan és
va lid i complet. La valides a implica que !'esquema sigui correcte (el coneixement definit és cert peral domini) i rellevant (el
coneixement definit és necessari peral sistema). La completes a significa que !'esquema conceptual inclou tot el
coneixement rellevant. La validació de !'esquema conceptual té coma objectiu la detecció d'errors de requisits per tal de
millorar la qualitat semantica.
La validació d'esquemes conceptuals és un repte crític en l'enginyeria de requisits . Aquesta te si contribueix a aquest repte i
es basa en el fet que els es quemes conceptuals de sistemes d'informació poden ser especificats en artefactes executables
i, per tant, poden ser provats. Les principals contribucions de la te si són (1) un enfocament pera les pro ves d'esquemes
conceptuals de sistemes d'informació, i (2) una metodología innovadora pel desenvolupament incremental d'esquemes
conceptuals assistit per una validació continuada basada en proves .
Les pro ves d'esquemes conceptuals poden ser una im portant i practica técnica pera la se va validació, jaque permeten
provar la correctesa i la completesa d'acord ambles necessitats i expectatives de les parts interessades. En conjunció amb
la comprovació d'un conjunt basic de criteris d'adequació de les proves, també podem analitzar la rellevancia deis elements
definits a !'esquema.
L'entorn de test proposat inclou un llenguatge especialitzat per escriure proves automatitzades d'esquemes conceptuals,
anomenat Conceptual Schema Testing Language (CSTL). També hem descrit i implementa! a un prototip de processador
de tes tos que fa possible l'aplicació de l'enfocament proposat a la practica. D'acord amb l'estat de l'art en validació
d'esquemes conceptuals , aquest és el primer treball que proposa i implementa un entorn pel testing automatitzat
d'esquemes conceptuals definits en UML!OCL.
L'enfocament de proves d'esquemes conceptuals permet dura terme la validació d'esquemes existents , pero també té
sentit provar es quemes conceptuals incomplets m entre estant sent desenvolupats. Aquest fet és la base de la metodología
Test-Driven Conceptual Modeling (TDCM), que és la segona contribució principal. El TDCM és una metodología de
modelització conceptual basada en principis basics del Test-Driven Development (TDD), un métode de programació en el
qual un sistema software és desenvolupat en petites iteracions guiades per proves. També hem aplicat el métode en
diversos casos d'estudi en el context de la metodología de recerca Design Science Research. Finalment, hem proposat
enfocaments d'integració del TDCM en diverses metodologies de desenvolupament de software
Potentially Polluting Marine Sites GeoDB: An S-100 Geospatial Database as an Effective Contribution to the Protection of the Marine Environment
Potentially Polluting Marine Sites (PPMS) are objects on, or areas of, the seabed that may release pollution in the future. A rationale for, and design of, a geospatial database to inventory and manipu-late PPMS is presented. Built as an S-100 Product Specification, it is specified through human-readable UML diagrams and implemented through machine-readable GML files, and includes auxiliary information such as pollution-control resources and potentially vulnerable sites in order to support analyses of the core data. The design and some aspects of implementation are presented, along with metadata requirements and structure, and a perspective on potential uses of the database
A filtering engine for large conceptual schemas
Postprint (published version
Generating natural language specifications from UML class diagrams
Early phases of software development are known to be problematic, difficult to manage and errors occurring during these phases are expensive to correct. Many systems have been developed to aid the transition from informal Natural Language requirements to semistructured or formal specifications. Furthermore, consistency checking is seen by many software engineers as the solution to reduce the number of errors occurring during the software development life cycle and allow early verification and validation of software systems. However, this is confined to the models developed during analysis and design and fails to include the early Natural Language requirements. This excludes proper user involvement and creates a gap between the original requirements and the updated and modified models and implementations of the system. To improve this process, we propose a system that generates Natural Language specifications from UML class diagrams. We first investigate the variation of the input language used in naming the components of a class diagram based on the study of a large number of examples from the literature and then develop rules for removing ambiguities in the subset of Natural Language used within UML. We use WordNet,a linguistic ontology, to disambiguate the lexical structures of the UML string names and generate semantically sound sentences. Our system is developed in Java and is tested on an independent though academic case study
- …