
19 research outputs found

Programming Constructs for Unstructured Data

Author: Dan Suciu
Paolo Atzeni
Peter Buneman
Susan Davidson
Val Tannen (eds.)
Publication venue: Springer-Verlag GmbH
Publication date: 01/01/1995

We investigate languages for querying and transforming unstructured data, by which we mean languages that can be used without knowledge of the structure (schema) of the database. There are two reasons for wanting to do this. First, some data models have emerged in which the schema is either completely absent or only provides weak constraints on the data. Second, it is sometimes convenient, for the purposes of browsing, to query the database without reference to the schema. For example, one may want to "grep" all character strings in the database, or one might want to find the information associated with a certain field name no matter where it occurs in the database. This paper introduces a labelled tree model of data and investigates various programming structures for querying and transforming such data. In particular, it considers various restrictions of structural recursion that give rise to well-defined queries even when the input data contains cycles. It also discusses issues of obse..

CiteSeerX

Crossref

Edinburgh Research Explorer

ScholarlyCommons@Penn

Optimisation of the enactment of fine-grained distributed data-intensive work flows

Author: Liew Chee Sun
Publication venue: The University of Edinburgh
Publication date: 29/11/2012

The emergence of data-intensive science as the fourth science paradigm has posed a data deluge challenge for enacting scientific work-flows. The scientific community is facing an imminent flood of data from the next generation of experiments and simulations, besides dealing with the heterogeneity and complexity of data, applications and execution environments. New scientific work-flows involve execution on distributed and heterogeneous computing resources across organisational and geographical boundaries, processing gigabytes of live data streams and petabytes of archived and simulation data, in various formats and from multiple sources. Managing the enactment of such work-flows requires not only larger storage space and faster machines, but also the capability to support the scalability and diversity of the users, applications, data, computing resources and the enactment technologies. We argue that the enactment process can be made efficient using optimisation techniques in an appropriate architecture. This architecture should support the creation of diversified applications and their enactment on diversified execution environments, with a standard interface, i.e. a work-flow language. The work-flow language should be both human readable and suitable for communication between the enactment environments. The data-streaming model central to this architecture provides a scalable approach to large-scale data exploitation. Data-flow between computational elements in the scientific work-flow is implemented as streams. To cope with the exploratory nature of scientific work-flows, the architecture should support fast work-flow prototyping, and the re-use of work-flows and work-flow components. Above all, the enactment process should be easily repeated and automated. In this thesis, we present a candidate data-intensive architecture that includes an intermediate work-flow language, named DISPEL. We create a new fine-grained measurement framework to capture performance-related data during enactments, and design a performance database to organise them systematically. We propose a new enactment strategy to demonstrate that optimisation of data-streaming work-flows can be automated by exploiting performance data gathered during previous enactments.

Edinburgh Research Archive

Memoria 2011

Author: Comisión de Investigaciones Científicas de la Provincia de Buenos Aires (CICBA)
Publication date: 01/01/2011

-Authorities. -Honorary advisory committees. -General action plan. -Structure. -Human resources training. -Research centers. -Technological modernization program. -Scientific promotion and dissemination. -Budget execution. Digitized at SEDICI-CIC Digita

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Centro de Servicios en Gestión de Información

Digital.CSIC

Observational Distinguishability of Databases with Object Identity

Author: Anthony S. Kosky
Paolo Atzeni
Val Tannen (eds.)
Publication venue: Springer-Verlag
Publication date: 01/01/1995

We will examine the problem of distinguishing between database instances and values in models which incorporate object-identities and recursive data-structures. We will show that the notion of observational distinguishability is intricately linked to the languages available for querying a database. In particular we will show that, given a simple query language incorporating a test for equality of object-identities, database instances are indistinguishable iff they are isomorphic, and that, in a language without any operators on object-identities, database instances are indistinguishable iff a bisimilarity relation holds between them. Further, such a bisimulation relation may be computed on values, but doing so requires the ability to recurse over all the object-identities in an instance. We will then show that systems of keys give rise to observational distinguishability relations which lie between these two extremes. We show that a system of keys satisfying certain restrictions provi..

CiteSeerX

ScholarlyCommons@Penn

Scheduling Resource Usage in Object-Oriented Queries

Author: Paolo Atzeni
Theodore W. Leung
Val Tannen (eds.)

Published in collaboration with th

CiteSeerX

Efficient Evaluation of Aggregates on Bulk Types

Author: Guido Moerkotte
Paolo Atzeni
Sophie Cluet
Val Tannen (eds.)
Publication date: 01/01/1995

CiteSeerX

Crossref