A Model for Compound Type Changes Encountered in Schema Evolution
Schema evolution is a problem that is faced by long-lived data. When a schema changes, existing persistent data can become inaccessible unless the database system provides mechanisms to access data created with previous versions of the schema. Most existing systems that support schema evolution focus on changes local to individual types within the schema, thereby limiting the changes that the database maintainer can perform. We have developed a model of type changes incorporating changes local to individual types as well as compound changes involving multiple types. The model describes both type changes and their impact on data by defining derivation rules to initialize new data based on the existing data. The derivation rules can describe local and nonlocal changes to types to capture the intent of a large class of type change operations. We have built a system called Tess (Type Evolution Software System) that uses this model to recognize type changes by comparing schemas and then produces a transformer that can update data in a database to correspond to a newer version of the schema.
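As a rough illustration of the idea in this abstract (not Tess itself), the following Python sketch compares two versions of a record type and builds a transformer from derivation rules. The schema representation, the function names `diff_schema` and `make_transformer`, and the example fields are all assumptions made for illustration. The sketch covers only a change local to one type; a compound change would need rules that also read records of other types.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]
# A derivation rule computes one new field from the whole old record.
Rule = Callable[[Record], Any]


def diff_schema(old: Dict[str, type], new: Dict[str, type]) -> Dict[str, set]:
    """Compare two field -> type maps and classify the differences."""
    shared = set(old) & set(new)
    return {
        "added": set(new) - set(old),
        "removed": set(old) - set(new),
        "retyped": {f for f in shared if old[f] is not new[f]},
    }


def make_transformer(new: Dict[str, type],
                     rules: Dict[str, Rule]) -> Callable[[Record], Record]:
    """Build a function that converts an old record to the new schema.

    Fields with a rule are derived from the old record; all other fields
    are copied unchanged, so every added or retyped field needs a rule.
    """
    def transform(record: Record) -> Record:
        out: Record = {}
        for field in new:
            if field in rules:
                out[field] = rules[field](record)
            else:
                out[field] = record[field]
        return out
    return transform


# Hypothetical example: split "name" into two fields, store age as a string.
old_schema = {"name": str, "age": int}
new_schema = {"first": str, "last": str, "age": str}

changes = diff_schema(old_schema, new_schema)
to_v2 = make_transformer(new_schema, {
    "first": lambda r: r["name"].split(" ", 1)[0],
    "last": lambda r: r["name"].split(" ", 1)[1],
    "age": lambda r: str(r["age"]),
})
migrated = to_v2({"name": "Ada Lovelace", "age": 36})
```

Here `changes` reports which fields were added, removed, or retyped, and `migrated` holds the record rewritten for the newer schema.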
Recovering Grammar Relationships for the Java Language Specification
Grammar convergence is a method that helps discover relationships between
different grammars of the same language or different language versions. The key
element of the method is the operational, transformation-based representation
of those relationships. The input grammars for convergence are
transformed until they are structurally equal. The transformations are composed
from primitive operators; properties of these operators and the composed chains
provide quantitative and qualitative insight into the relationships between the
grammars at hand. We describe a refined method for grammar convergence, and we
use it in a major study, where we recover the relationships between all the
grammars that occur in the different versions of the Java Language
Specification (JLS). The relationships are represented as grammar
transformation chains that capture all accidental or intended differences
between the JLS grammars. This method is mechanized and driven by nominal and
structural differences between pairs of grammars that are subject to
asymmetric, binary convergence steps. We present the underlying operator suite
for grammar transformation in detail, and we illustrate the suite with many
examples of transformations on the JLS grammars. We also describe the
extraction effort, which was needed to make the JLS grammars amenable to
automated processing. We include substantial metadata about the convergence
process for the JLS so that the effort becomes reproducible and transparent.
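The transformation-chain idea can be sketched in a few lines of Python. This is a minimal illustration, not the operator suite from the paper: the grammar representation and the two operators `rename` and `add_alternative` are assumptions chosen for brevity, and `rename` assumes the new name is not already in use.

```python
from functools import reduce
from typing import Callable, Dict, FrozenSet, List, Tuple

# A grammar maps each nonterminal to a set of alternatives;
# each alternative is a tuple of symbols.
Grammar = Dict[str, FrozenSet[Tuple[str, ...]]]
Operator = Callable[[Grammar], Grammar]


def rename(old: str, new: str) -> Operator:
    """Rename a symbol everywhere it is defined or used."""
    def apply(g: Grammar) -> Grammar:
        def swap(symbol: str) -> str:
            return new if symbol == old else symbol
        return {
            swap(nt): frozenset(tuple(swap(s) for s in alt) for alt in alts)
            for nt, alts in g.items()
        }
    return apply


def add_alternative(nt: str, alt: Tuple[str, ...]) -> Operator:
    """Add one alternative to an existing nonterminal."""
    def apply(g: Grammar) -> Grammar:
        out = dict(g)
        out[nt] = g[nt] | {alt}
        return out
    return apply


def run_chain(g: Grammar, chain: List[Operator]) -> Grammar:
    """Apply the operators in order, left to right."""
    return reduce(lambda acc, op: op(acc), chain, g)


# Two toy grammars that differ by one name and one alternative.
g1: Grammar = {
    "Expr": frozenset({("Expr", "+", "Term"), ("Term",)}),
    "Term": frozenset({("id",)}),
}
g2: Grammar = {
    "Expression": frozenset({("Expression", "+", "Term"), ("Term",)}),
    "Term": frozenset({("id",), ("num",)}),
}

chain = [rename("Expr", "Expression"), add_alternative("Term", ("num",))]
converged = run_chain(g1, chain) == g2  # structural equality check
```

The chain itself is the record of how the two grammars relate: here, one nominal difference and one structural extension.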
Technical Report: CSVM Ecosystem
The CSVM format is derived from the CSV format and allows the storage of
tabular-like data with a limited but extensible amount of metadata. This
approach could help computer scientists because all the information needed to
use the data subsequently is included in the CSVM file. The format is
particularly well suited to handling RAW data in many scientific fields and to
use as a canonical format. The use of CSVM has shown that it greatly
facilitates data management independently of databases, data exchange, the
integration of RAW data in dataflows or calculation pipes, and the search for
best practices in RAW data management. The efficiency of this format is closely
related to its plasticity: a generic frame is given for all kinds of data, and
the CSVM parsers do not interpret data types. This task is done by the
application layer, so it is possible to use the same format and the same parser
code for many purposes. This document presents some implementations of the CSVM
format used over ten years in different laboratories. Some programming
examples are also shown: a Python toolkit for using, manipulating, and querying
the format is available. A first specification of this format (CSVM-1) is now
defined, as well as some derivatives such as CSVM dictionaries used for data
interchange. CSVM is an Open Format and could be used as a support for Open
Data and long-term conservation of RAW or unpublished data.
Comment: 31 pages including 2p of Anne
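The split between a type-agnostic parser and a typing application layer can be sketched as follows. The abstract does not give the CSVM syntax, so this sketch assumes a hypothetical layout in which metadata lines start with `#KEY` and fields are tab-separated; the keys `TITLE`, `HEADER`, and `TYPE` and the function name are illustrative assumptions, not the CSVM-1 specification or the toolkit's API.

```python
import csv
import io
from typing import Dict, List, Tuple


def parse_csv_with_metadata(text: str, delimiter: str = "\t"
                            ) -> Tuple[Dict[str, List[str]], List[List[str]]]:
    """Split a document into metadata lines and data rows.

    Every value stays a string: the parser does not interpret types.
    """
    meta: Dict[str, List[str]] = {}
    rows: List[List[str]] = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith("#"):
            meta[fields[0][1:]] = fields[1:]  # assumed "#KEY<TAB>values" form
        else:
            rows.append(fields)
    return meta, rows


sample = (
    "#TITLE\tdemo\n"
    "#HEADER\tsample\tmass\n"
    "#TYPE\tstr\tfloat\n"
    "A1\t1.5\n"
    "B2\t2.25\n"
)
meta, rows = parse_csv_with_metadata(sample)

# Application layer: only here are the declared types given a meaning.
casts = {"str": str, "int": int, "float": float}
typed = [[casts[t](v) for t, v in zip(meta["TYPE"], row)] for row in rows]
```

Because the parser returns only strings, the same function can serve any application; each application decides what the metadata means.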
A cognitive architecture for learning in reactive environments
Previous research in machine learning has viewed the process of empirical discovery as search through a space of 'theoretical' terms. In this paper, we propose a problem space for empirical discovery, specifying six complementary operators for defining new terms that ease the statement of empirical laws. The six types of terms include: numeric attributes (such as PV/T); intrinsic properties (such as mass); composite objects (such as pairs of colliding balls); classes of objects (such as acids and alkalis); composite relations (such as chemical reactions); and classes of relations (such as combustion/oxidation). We review existing machine discovery systems in light of this framework, examining which parts of the problem space were covered by these systems. Finally, we outline an integrated discovery system (IDS) we are constructing that includes all six of the operators and which should be able to discover a broad range of empirical laws.
A framework for empirical discovery
Previous research in machine learning has viewed the process of empirical discovery as search through a space of 'theoretical' terms. In this paper, we propose a problem space for empirical discovery, specifying six complementary operators for defining new terms that ease the statement of empirical laws. The six types of terms include: numeric attributes (such as PV/T); intrinsic properties (such as mass); composite objects (such as pairs of colliding balls); classes of objects (such as acids and alkalis); composite relations (such as chemical reactions); and classes of relations (such as combustion/oxidation). We review existing machine discovery systems in light of this framework, examining which parts of the problem space were covered by these systems. Finally, we outline an integrated discovery system (IDS) we are constructing that includes all six of the operators and which should be able to discover a broad range of empirical laws.
Change Management in Large-Scale Enterprise Information Systems
The information infrastructure in today’s businesses consists of many interoperating autonomous systems. Changes to a single system can therefore have an unexpected impact on other, dependent systems. In our Caro approach we try to cope with this problem by observing each system participating in the infrastructure and analyzing the impact of any change that occurs. The analysis process is driven by declaratively defined rules and works with a generic and extensible graph model to represent the relevant metadata that is subject to changes. This makes Caro applicable to heterogeneous scenarios and customizable to special needs.
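A rule-driven impact analysis over a metadata graph can be sketched as a breadth-first traversal. This is a minimal illustration under stated assumptions, not Caro's implementation: the graph encoding, the edge kinds, the node names, and the `propagates` rule are all hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# node -> list of (dependent node, kind of dependency)
Graph = Dict[str, List[Tuple[str, str]]]


def impacted_nodes(graph: Graph, changed: str,
                   propagates: Callable[[str], bool]) -> Set[str]:
    """Collect every node reachable from `changed` along edges
    whose kind the rule says a change propagates through."""
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent, kind in graph.get(node, []):
            if dependent != changed and dependent not in seen and propagates(kind):
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical metadata graph spanning several systems.
graph: Graph = {
    "crm.customer": [("etl.load_customers", "reads"), ("wiki.docs", "mentions")],
    "etl.load_customers": [("dwh.dim_customer", "writes")],
    "dwh.dim_customer": [("report.sales", "reads")],
}

# A declarative rule: only data-flow edges carry impact.
result = impacted_nodes(graph, "crm.customer",
                        lambda kind: kind in {"reads", "writes"})
```

Keeping the rule separate from the traversal is what lets the same analysis be customized: a different rule changes which dependencies count without touching the graph code.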
Hierarchical Event Descriptors (HED): Semi-Structured Tagging for Real-World Events in Large-Scale EEG.
Real-world brain imaging by EEG requires accurate annotation of complex subject-environment interactions in event-rich tasks and paradigms. This paper describes the evolution of the Hierarchical Event Descriptor (HED) system for systematically describing both laboratory and real-world events. HED version 2, first described here, provides the semantic capability of describing a variety of subject and environmental states. HED descriptions can include stimulus presentation events on screen or in virtual worlds, experimental or spontaneous events occurring in the real world environment, and events experienced via one or multiple sensory modalities. Furthermore, HED 2 can distinguish between the mere presence of an object and its actual (or putative) perception by a subject. Although the HED framework has implicit ontological and linked data representations, the user interface for HED annotation is more intuitive than traditional ontological annotation. We believe that hiding the formal representations allows for a more user-friendly interface, making consistent, detailed tagging of experimental and real-world events possible for research users. HED is extensible while retaining the advantages of having an enforced common core vocabulary. We have developed a collection of tools to support HED tag assignment and validation; these are available at hedtags.org. A plug-in for EEGLAB (sccn.ucsd.edu/eeglab), CTAGGER, is also available to speed the process of tagging existing studies.
Biochemical network matching and composition
This paper looks at biochemical network matching and composition.
Facilitating Transformations in a Human Genome Project Database
Human Genome Project databases present a confluence of interesting database challenges: rapid schema and data evolution, complex data entry and constraint management, and the need to integrate multiple data sources and software systems which range over a wide variety of models and formats. While these challenges are not necessarily unique to biological databases, their combination, intensity and complexity are unusual and make automated solutions imperative. We illustrate these problems in the context of the Human Genome Database for Chromosome 22 (Chr22DB), and describe a new approach to a solution for these problems, by means of a deductive language for expressing database transformations and constraints.