
    The schema coercion problem

    Journal Article

    Over the past decade, the ability to incorporate data from a wide variety of sources has become increasingly important to database users. To meet this need, significant effort has been expended on automatic database schema manipulation. To date, however, this effort has focused on two aspects of the problem: schema integration and schema evolution. Schema integration produces a unified view of several databases, while schema evolution extends an existing database design to represent additional information. This work defines and addresses a third problem, schema coercion, which defines a mapping from one database to another. This paper presents an overview of the problems associated with schema coercion, shows how they correspond to the problems encountered in schema integration and schema evolution, and outlines our approach. The feasibility of this approach is demonstrated by a tool that reduces the human interaction required at every step of the integration process. The database schemata are automatically read and converted into corresponding ER representations. A correspondence identification heuristic then identifies similar concepts and creates mappings between them. Finally, a program is generated to perform the data transfer. This tool has been used successfully to coerce the Haemophilus and Methanococcus genomes from the GenBank ASN.1 database to the Utah Center for Human Genome Research database. Our comprehensive approach to the schema coercion problem has proven extremely valuable in reducing the interaction required to define coercions, particularly when the heuristics are unsuccessful.

    Schema Coercion: Using Database Meta-Information to Facilitate Data Transfer

    As more information becomes available, the ability to quickly incorporate new and diverse data sources into existing database systems becomes critical. Schema coercion addresses this need by defining the mapping between databases as a collection of mappings between corresponding constructs. This work defines a comprehensive schema coercion tool: it transforms schemata into corresponding ER representations, identifies correspondences between them, and uses these correspondences to generate a program that automatically transfers data between the databases. In addition to producing a useful tool, this work addresses the significant theoretical problems associated with resolving representational and semantic conflicts between heterogeneous data sources. The approach advocated by this dissertation associates confidences with correspondences and meta-information with schemata. It has successfully reduced the amount of interaction required to define several coercions, including a complex coercion between diverse genetics databases.
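    The abstract mentions associating confidences with correspondences but does not give the scoring scheme. One simple way to realize that idea, shown purely as an assumption-laden sketch, is a weighted combination of independent evidence sources; the weights and evidence kinds below are invented for illustration.

```python
# Hypothetical illustration of attaching a confidence to a proposed
# correspondence by combining evidence: name similarity plus whether
# the two attributes' data types are compatible.
def correspondence_confidence(name_sim, types_match, w_name=0.7, w_type=0.3):
    """Weighted combination of evidence; the weights are illustrative,
    not taken from the dissertation."""
    return w_name * name_sim + w_type * (1.0 if types_match else 0.0)

c = correspondence_confidence(0.9, True)   # strong evidence, roughly 0.93
```

    Under such a scheme, low-confidence correspondences are exactly the ones a tool would surface for human review, which is consistent with the abstract's goal of reducing (not eliminating) interaction.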

    Report on XEWA-00


    A Distributed Garbage Collection Algorithm

    Concurrent Scheme extends the Scheme programming language, providing parallel program execution on a distributed network. The Concurrent Scheme environment requires a garbage collector to reclaim global objects: objects that exist in a portion of the global heap located on the node that created them. Because a global object may be referenced by several nodes, traditional garbage collection algorithms cannot be used. The garbage collector must be able to reclaim global objects with minimal disturbance to the user program and without the use of global state information. It must operate asynchronously, impose low network overhead, and handle out-of-order messages. This thesis describes a distributed reference counting garbage collector appropriate for the reclamation of global objects in the Concurrent Scheme environment.
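    To make the setting concrete, the toy sketch below shows the simplest form of distributed reference counting: the owning node adjusts a per-object count as increment/decrement messages arrive from remote nodes, reclaiming an object when its count reaches zero. This is an assumption-laden illustration, not the thesis's algorithm; notably, naive counting like this breaks if a DEC message overtakes its corresponding INC, which is precisely the out-of-order problem the thesis must solve.

```python
# Toy sketch of distributed reference counting (not the thesis's
# algorithm): the node that owns a global object tracks how many
# remote references to it exist, updated by INC/DEC messages.
class OwnerNode:
    def __init__(self):
        self.counts = {}              # object id -> remote reference count

    def handle(self, msg, obj_id, n=1):
        """Process one INC or DEC message; reclaim the object when
        its remote reference count drops to zero."""
        self.counts.setdefault(obj_id, 0)
        self.counts[obj_id] += n if msg == "INC" else -n
        if self.counts[obj_id] == 0:
            del self.counts[obj_id]   # no remote references remain
            return "reclaimed"
        return "live"

node = OwnerNode()
node.handle("INC", "obj1")            # a remote node acquires a reference
node.handle("INC", "obj1")            # a second remote reference
node.handle("DEC", "obj1")            # one reference is dropped
status = node.handle("DEC", "obj1")   # last reference dropped: reclaimed
```

    Real algorithms in this space (e.g. weighted reference counting) restructure the messages so that reordering cannot cause a count to touch zero prematurely.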

    Practical lessons in supporting large-scale computational science

    Business needs have driven the development of commercial database systems since their inception. As a result, there has been a strong focus on supporting many users and minimizing the potential corruption or loss …

    Automatic Discovery and Classification of Bioinformatics Web Sources

    Motivation: The World Wide Web provides an incredible resource to genomics researchers in the form of query access to distributed data sources, e.g. BLAST sequence homology search interfaces. The number of these autonomous sources and their rate of change outpace the speed at which they can be manually classified, meaning that the available data are not being utilized to their full potential. Manually maintaining a wrapper library will not scale to accommodate the growth of genomics data sources on the Web, challenging us to produce an automated system that can find, classify, and wrap new sources without constant human intervention. Previous research has not addressed the problem of automatically locating, classifying, and integrating classes of bioinformatics data sources. Results: This paper presents an overview of a system for finding classes of bioinformatics data sources and integrating them behind a unified interface. We describe our approach for automatic classification of new Web sources into relevance categories, which eliminates the human effort required to maintain a current repository of sources. Our approach is based on a meta-data description of classes of interesting sources that captures the important features of an entire class of services without tying that description to any particular Web source. We examine the features of this format in the context of BLAST sources to show how it relates to the Web sources being described. We then show how a description can be used to determine whether an arbitrary Web source is an instance of the described service. To validate the effectiveness of this approach, we have constructed a prototype that correctly classifies approximately two-thirds of the BLAST sources we tested. We conclude with a discussion of these results, the factors that affect correct automatic classification, and areas for future study.
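    The abstract describes matching candidate Web sources against a meta-data description of a service class, without giving the description format. The sketch below illustrates the general shape of such a check under invented assumptions: the class description, field names, and matching rule are all hypothetical, not the paper's actual meta-data format.

```python
# Illustrative sketch of classifying a Web source against a class
# description: the source must offer every required input field and
# mention enough class keywords on its page. All names are invented.
BLAST_CLASS = {
    "required_inputs": {"sequence", "program", "database"},
    "keywords": {"blast", "alignment", "homology"},
}

def matches_class(source, description, keyword_min=1):
    """A source matches when it exposes all required inputs and its
    page text contains at least keyword_min of the class keywords."""
    has_inputs = description["required_inputs"] <= set(source["inputs"])
    hits = len(description["keywords"] & set(source["page_words"]))
    return has_inputs and hits >= keyword_min

# A hypothetical BLAST-like source discovered on the Web.
source = {
    "inputs": ["sequence", "program", "database", "email"],
    "page_words": ["blast", "search", "nucleotide"],
}
```

    A rule this crude would not reach the paper's reported two-thirds accuracy; it only shows why describing a class of services, rather than any one site, lets new sources be tested automatically.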

    Bioinformatics: managing scientific data
