University of Chicago CRESCAT Project

Abstract

The CRESCAT project is an interdisciplinary collaboration between computer scientists, paleobiologists, archaeologists, economic historians, and other social scientists. The goal is to demonstrate the value of an integrative software ecosystem that spans the social and natural sciences and can facilitate any research characterized by overlapping models of temporal and spatial relations or by conflicting terminologies and taxonomies. CRESCAT’s representation of scientific knowledge eschews forced standardization, which is impractical in many cases because there is no enforcement mechanism and is also questionable in principle, since divergent ontologies often legitimately reflect different theoretical assumptions and research agendas. Central to the CRESCAT suite of tools is an innovative data-integration system that explicitly represents both research data and the ontologies inherent in the data. CRESCAT’s data-integration system operates at a level of abstraction sufficient to provide a predictable and efficiently queryable database structure based on an abstract global schema, which in turn is based on an “upper ontology” specified in terms of fundamental concepts and relationships applicable to all scientific and scholarly disciplines. The data-integration system is implemented in an enterprise-class XQuery DBMS that serves as a data warehouse (using a non-relational graph data model) to store diverse data from a wide range of research projects representing many disciplines. The terminology and conceptual distinctions of each research project are fully preserved.
The approach to research data taken in the CRESCAT project is (1) coherent, tightly integrating software tools and data formats within a single analytical framework; (2) open-ended, interconnecting existing tools while allowing the addition of new tools in the future; (3) non-exclusive, in no way preventing its component tools from participating in other software ecosystems; (4) scalable, designed to handle large-scale data management, analysis, and visualization; and (5) sustainable, maintaining shared resources to meet common needs for software and technical support and thus enabling substantial economies of scale.
