102 research outputs found
Types With Extents: On Transforming and Querying Self-Referential Data-Structures (Dissertation Proposal)
The central theme of this paper is to study the properties and expressive power of data-models which use type systems with extents in order to represent recursive or self-referential data-structures. A standard type system is extended with classes which represent the finite extents of values stored in a database. Such an extended type system expresses constraints about a database instance which go beyond those normally associated with the typing of data-values, and takes on an important part of the functionality of a database schema. Recursion in data-structures is then constrained to be defined via these finite extents, so that all values in a database have a finite representation. The idea of extending a type system with such classes is not new. In particular, [2] introduced a type system and data models equivalent to those used here. However, such existing work focuses on the expressive power of systems which allow the dynamic creation of recursive values, while we are concerned more with the properties of querying and manipulating databases containing known static extensions of data-values.
Transforming Databases with Recursive Data Structures
This thesis examines the problems of performing structural transformations on databases involving complex data-structures and object-identities, and proposes an approach to specifying and implementing such transformations.
We start by looking at various applications of such database transformations, and at some of the more significant work in these areas. In particular we will look at work on transformations in the area of database integration, which has been one of the major motivating areas for this work. We will also look at various notions of correctness that have been proposed for database transformations, and show that the utility of such notions is limited by the dependence of transformations on certain implicit database constraints. We draw attention to the limitations of existing work on transformations, and argue that there is a need for a more general formalism for reasoning about database transformations and constraints.
We will also argue that, in order to ensure that database transformations are well-defined and meaningful, it is necessary to understand the information capacity of the data-models being transformed. To this end we give a thorough analysis of the information capacity of data-models supporting object identity, and will show that this is dependent on the operations supported by a query language for comparing object identities.
We introduce a declarative language, WOL, based on Horn-clause logic, for specifying database transformations and constraints. We also propose a method of implementing transformations specified in this language, by manipulating their clauses into a normal form which can then be translated into an underlying database programming language. Finally, we will present a number of optimizations and techniques necessary in order to build a practical implementation based on these proposals, and will discuss the results of some of the trials that were carried out using a prototype of such a system.
Technical Privacy Metrics: a Systematic Survey
The file attached to this record is the author's final peer reviewed version. The goal of privacy metrics is to measure the degree of privacy enjoyed by users in a system and the amount of protection offered by privacy-enhancing technologies. In this way, privacy metrics contribute to improving user privacy in the digital world. The diversity and complexity of privacy metrics in the literature make an informed choice of metrics challenging. As a result, instead of using existing metrics, new metrics are proposed frequently, and privacy studies are often incomparable. In this survey we alleviate these problems by structuring the landscape of privacy metrics. To this end, we explain and discuss a selection of over eighty privacy metrics and introduce categorizations based on the aspect of privacy they measure, their required inputs, and the type of data that needs protection. In addition, we present a method for choosing privacy metrics based on nine questions that help identify the right privacy metrics for a given scenario, and highlight topics where additional work on privacy metrics is needed. Our survey spans multiple privacy domains and can be understood as a general framework for privacy measurement.
Uncertainty and indistinguishability. Application to modelling with words.
The concept of equality is fundamental to any theory, since it is essential for discerning the objects under study, an ability that is in turn a requirement for any classification mechanism that might be defined. When all the properties involved are entirely precise (absence of uncertainty), we obtain classical equality, where two individuals are considered equal if and only if they share the same set of properties. What happens, however, when imprecision arises, as in the case of properties that are fulfilled only up to a degree? Then, because certain individuals will be more similar than others, the need for a gradual notion of equality arises.
These considerations show that certain contexts pervaded with uncertainty require a more flexible concept of equality that goes beyond the rigidity of the classical one. T-indistinguishability operators seem to be good candidates for this more flexible and general version of equality.
On the other hand, the Dempster-Shafer Theory of Evidence, as a framework for representing and managing general evidence, implicitly conveys a notion of indistinguishability between the elements of the domain of discourse based on their relative compatibility with the evidence at hand. Chapter two analyses different methods for defining the T-indistinguishability operator associated with a given body of evidence.
In chapter three, after providing a comprehensive summary of the state of the art on measures of uncertainty, we tackle the problem of computing entropy when an indistinguishability relation has been defined over the elements of the domain. Entropy should then be measured not according to the occurrence of different events, but according to the variability perceived by an observer equipped with the indistinguishability relation considered. This interpretation suggests the "observer paradigm", which leads to the introduction of the concept of observational entropy.
Real data is often pervaded with uncertainty, so developing techniques that can induce knowledge in the presence of uncertainty is a necessity. The paradigm of computing with words pursues this goal by providing a computation formalism based on linguistic labels, in contrast to traditional numerical methods. The use of linguistic labels improves the understandability of the representation language, although it also requires adapting the classical inductive learning procedures to cope with such labels.
In chapter four, a novel approach to building decision trees is introduced, in which the indistinguishabilities between elements of the domain, that is, the decision maker's discernment abilities, are taken into account when computing the impurity of nodes. We call these "observational decision trees", since they incorporate observational entropy into the heuristic function used for attribute selection. In addition, we present an algorithm intended to induce linguistic rules from data by properly managing the uncertainty present either in the set of describing labels or in the data itself. A formal comparison with standard algorithms is also provided.
Functional Dependencies for Object Databases: Motivation and Axiomatization
Object identification by abstract identifiers should be considered as a modeling and not as a database concept. This means that object identifiers are not appropriate for the access to specific objects using a database language. In this paper we discuss how the relational concept of a functional dependency can be adapted to object databases in order to get more convenient ways of accessing objects. Graph-based object functional dependencies are proposed as a means to specify constraints between attributes and object types of an object schema. Value-based identification criteria can be defined using a special type of object functional dependencies. Different definitions of satisfaction are given for these constraints, based on a so-called validation relation, and their relationships are investigated. These definitions are related to different forms of identification. Using the strongest notion of satisfaction, inference rules for the derivation of new dependencies are discussed with emphasis on the characteristics of rules combining two dependencies, like the transitivity rule. In addition to generalized relational rules, further rules are needed, mainly concerned with the transition from the object type level to the attribute level and vice versa.
Exploiting a perdurantist foundational ontology and graph database for semantic data integration
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. The view of reality that is inherent to perdurantist philosophical ontologies, often termed four dimensional (4D) ontologies, has not been widely adopted within the mainstream of information system design practice. However, as the closed world of enterprise systems is opened to Internet scale Semantic Web and Open Data information sources, there is a need to better understand the semantics of both internal and external data and how they can be integrated. Philosophical foundational ontologies can help establish this understanding and there is, therefore, an emerging need to research how they can be applied to the problem of semantic data integration. A prime objective of this research was therefore to develop a framework through which to apply a 4D foundational ontology and a graph database to the problem of semantic data integration, and to assess the effectiveness of the approach. The research employed design science, a methodology which is applicable to undertaking research within information systems as it encompasses methods through which the research can be undertaken and the resultant artefacts evaluated. This methodology has a number of discrete stages: problem awareness; a core design-build-evaluate iterative cycle through which the research is conducted; and a conclusion stage. The design science research was conducted through the development of a number of artefacts, the prime being the 4D-Semantic Extract Load (4D-SETL) framework. The effectiveness of the framework was assessed by applying it to semantically interpret and integrate a number of large scale datasets and to instantiate a prototype graph database warehouse to persist the resultant ontology.
A series of technical experiments confirmed that directly reflecting the model patterns of 4D ontology within a prototype data warehouse was an effective means of both structuring and semantically integrating complex datasets, and that the artefacts produced by 4D-SETL could function at scale. Through an illustrative scenario, the effectiveness of the approach is described in relation to the ability of the framework to address a number of weaknesses in current approaches. Furthermore, the major advantages of 4D-SETL are elaborated; these include the ability of the framework to combine foundational, domain and instance level ontological models in a single coherent system that dispenses with much of the translation normally undertaken between conceptual, logical and physical data models. Additionally, adopting a perdurantist realist foundational ontology provided a clear means of establishing and maintaining the identity of physical objects as their constituent temporal and spatial parts unfold over the course of time.