4 research outputs found

    Teaching Parallel Computing to Freshmen

    Get PDF
    Parallelism is the future of computing and computer science and should therefore be at the heart of the CS curriculum. Instead of continuing along the evolutionary path by introducing parallel computation “top down” (first in special junior-senior level courses), we are taking a radical approach and introducing parallelism at the earliest possible stages of instruction. Specifically, we are developing a completely new freshman-level course on data structures that integrates parallel computation naturally and retains the emphasis on laboratory instruction. This will help steer our curriculum as expeditiously as possible toward parallel computing. Our approach is novel in three distinct and essential ways. First, we will teach parallel computing to freshmen in a course designed from beginning to end to do so. Second, we will motivate the course with examples from scientific computation. Third, we will use multimedia and visualization as instructional aids. We have two primary objectives: to begin a reform of our undergraduate curriculum with a laboratory-based freshman course on parallel computation, and to produce tools and methodologies that improve student understanding of the basic principles of parallel computing.
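    Purely as an illustration of the kind of scientific-computation example such a freshman course might open with (this sketch is not drawn from the course materials themselves), a minimal data-parallel sum in Python:

```python
# Illustrative only: a data-parallel reduction of the kind a freshman
# course might use to motivate parallelism with scientific computation.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker sums its own slice of the data independently.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    workers = 4
    # Split the data into one contiguous chunk per worker.
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        # Combine the per-worker partial sums into the final answer.
        total = sum(pool.map(partial_sum, chunks))
    assert total == sum(data)
```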

    Attribute-Level Versioning: A Relational Mechanism for Version Storage and Retrieval

    Get PDF
    Data analysts today have at their disposal a seemingly endless supply of data repositories and, hence, datasets from which to draw. New datasets become available daily, making the choice of which dataset to use difficult. Furthermore, traditional data analysis has been conducted using structured data repositories such as relational database management systems (RDBMS). These systems, by their nature and design, prohibit duplication in indexed collections, forcing analysts to choose one value for each of the available attributes for an item in the collection. Often analysts discover two or more datasets with information about the same entity. When combining this data and transforming it into a form that is usable in an RDBMS, analysts are forced to deconflict the collisions and choose a single value for each duplicated attribute containing differing values. In the absence of professional intuition, this deconfliction involves a considerable amount of guesswork and speculation on the analyst's part. One must consider what is lost by discarding the alternative values. Are there relationships between the conflicting datasets that have meaning? Is each dataset presenting a different and valid view of the entity, or are the alternate values erroneous? If so, which values are erroneous? Do the variances have historical significance? The analysis of modern datasets requires specialized algorithms and storage and retrieval mechanisms to identify, deconflict, and assimilate variances of attributes for each entity encountered. These variances, or versions of attribute values, contribute meaning to the evolution and analysis of the entity and its relationship to other entities. A new, distinct storage and retrieval mechanism will enable analysts to efficiently store, analyze, and retrieve attribute versions without unnecessary complexity or additional alterations of the original or derived dataset schemas. This paper presents technologies and innovations that assist data analysts in discovering meaning within their data while preserving all of the original data for every entity in the RDBMS.
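    As a minimal sketch of how attribute-level versions could be kept alongside a conventional relational schema (the table and column names here are illustrative assumptions, not the paper's actual design), a side table keyed by entity, attribute, and source preserves every observed value:

```python
# Sketch: keep the "chosen" value in the base table while a version table
# records every conflicting value with its provenance, so deconfliction
# never discards data. Schema names are assumptions for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT                -- the deconflicted, "chosen" value
    );
    CREATE TABLE attribute_version (
        entity_id INTEGER REFERENCES person(id),
        attribute TEXT,
        value     TEXT,
        source    TEXT,          -- which dataset supplied this value
        PRIMARY KEY (entity_id, attribute, value, source)
    );
""")

# Two datasets disagree about the same person's name; keep both versions.
conn.execute("INSERT INTO person VALUES (1, 'Jon Smith')")
conn.executemany(
    "INSERT INTO attribute_version VALUES (1, 'name', ?, ?)",
    [("Jon Smith", "dataset_a"), ("John Smyth", "dataset_b")],
)

# Retrieval: every recorded version of the attribute, with provenance.
for value, source in conn.execute(
    "SELECT value, source FROM attribute_version "
    "WHERE entity_id = 1 AND attribute = 'name'"
):
    print(value, source)
```

    Keeping versions in a separate table leaves the base schema untouched, which matches the stated goal of avoiding alterations to the original or derived dataset schemas.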

    Concurrent access to B-trees

    No full text

    Extendible hashing for concurrent operations and distributed data

    No full text
    The extendible hash file is a dynamic data structure that is an alternative to B-trees for use as a database index. While there have been many algorithms proposed to allow concurrent access to B-trees, similar solutions for extendible hash files have not appeared. In this paper, we present solutions to allow for concurrency that are based on locking protocols and minor modifications in the data structure. Another question that deserves consideration is whether these indexing structures can be adapted for use in a distributed database. Among the motivations for distributing data are increased availability and ease of growth; however, unless data structures in the access path are designed to support those goals, they may not be realized. We describe some first attempts at adapting extendible hash files for distributed data.
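    For readers unfamiliar with the underlying structure, here is a minimal single-process sketch of an extendible hash table in Python. The single coarse lock merely stands in for the finer-grained locking protocols the paper develops, and all names are illustrative assumptions:

```python
# A minimal sketch of an extendible hash file: a directory of
# 2**global_depth slots pointing into buckets that split (and double the
# directory) as they overflow.
import threading

BUCKET_CAPACITY = 2  # tiny, to force splits in the demo below

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = {}

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]
        self.lock = threading.Lock()  # coarse stand-in for finer protocols

    def _index(self, key):
        # Directory index: the low global_depth bits of the hash.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        with self.lock:
            return self.directory[self._index(key)].items.get(key)

    def put(self, key, value):
        with self.lock:
            while True:
                bucket = self.directory[self._index(key)]
                if key in bucket.items or len(bucket.items) < BUCKET_CAPACITY:
                    bucket.items[key] = value
                    return
                # Bucket overflowed: split it and retry the insert. (A real
                # file also needs overflow pages for full hash collisions.)
                self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Directory too shallow to tell the halves apart: double it.
            self.directory = self.directory * 2
            self.global_depth += 1
        bucket.local_depth += 1
        new = Bucket(bucket.local_depth)
        mask = 1 << (bucket.local_depth - 1)
        # Move entries whose newly significant hash bit is 1 to the new bucket.
        for k in [k for k in bucket.items if hash(k) & mask]:
            new.items[k] = bucket.items.pop(k)
        # Repoint the directory slots that should now reach the new bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & mask:
                self.directory[i] = new

h = ExtendibleHash()
for k in range(10):
    h.put(k, str(k))
assert all(h.get(k) == str(k) for k in range(10))
```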
