13 research outputs found

    An OS-Based Alternative to Full Hardware Coherence on Tiled Chip-Multiprocessors

    Institute for Computing Systems Architecture. The interconnect mechanisms (shared bus or crossbar) used in current chip-multiprocessors (CMPs) are expected to become a bottleneck that prevents these architectures from scaling to a larger number of cores. Tiled CMPs offer better scalability by integrating relatively simple cores with a lightweight point-to-point interconnect. However, such interconnects make snooping impractical and thus require alternative solutions to cache coherence. This thesis proposes a novel, cost-effective hardware mechanism to support shared-memory parallel applications that forgoes hardware-maintained cache coherence. The proposed mechanism is based on two key ideas: lines are mapped to physical caches at the page level with OS support, and the hardware supports remote cache accesses. It allows only some controlled migration and replication of data and provides a sufficient degree of flexibility in the mapping through an extra level of indirection between virtual pages and physical tiles. The proposed tiled CMP architecture is evaluated on the SPLASH-2 scientific benchmarks and ALPBench multimedia benchmarks against one with private caches and a distributed directory cache coherence mechanism. Experimental results show that the performance degradation is as little as 0%, and 16% on average, compared to the cache-coherent architecture across all benchmarks for 16 and 32 processors.

    Run-time support for parallel object-oriented computing: the NIP lazy task creation technique and the NIP object-based software distributed shared memory

    PhD Thesis. Advances in hardware technologies combined with decreased costs have started a trend towards massively parallel architectures that utilise commodity components. It is thought unreasonable to expect software developers to manage the high degree of parallelism that is made available by these architectures. This thesis argues that a new programming model is essential for the development of parallel applications and presents a model which embraces the notions of object-orientation and implicit identification of parallelism. The new model allows software engineers to concentrate on development issues, using the object-oriented paradigm, whilst being freed from the burden of explicitly managing parallel activity. To support the programming model, the semantics of an execution model are defined and implemented as part of a run-time support system for object-oriented parallel applications. Details of the novel techniques from the run-time system, in the areas of lazy task creation and object-based, distributed shared memory, are presented. The tasklet construct for representing potentially parallel computation is introduced and further developed by this thesis. Three caching techniques that take advantage of memory access patterns exhibited in object-oriented applications are explored. Finally, the performance characteristics of the introduced run-time techniques are analysed through a number of benchmark applications.

    Simulation Modelling of Distributed-Shared Memory Multiprocessors

    Institute for Computing Systems Architecture. Distributed shared memory (DSM) systems have been recognised as a compelling platform for parallel computing due to their programming advantages and scalability. DSM systems allow applications to access data in a logically shared address space by abstracting away the distinction of physical memory location. As the location of data is transparent, the sources of overhead caused by accessing distant memories are difficult to analyse. This memory locality problem has been identified as crucial to DSM performance. Many researchers have investigated the problem using simulation as a tool for conducting experiments, resulting in the progressive evolution of DSM systems. Nevertheless, both the diversity of architectural configurations and the rapid advance of DSM implementations impose constraints on simulation model designs in two respects: the limitation of the simulation framework on model extensibility, and the lack of verification applicability during a simulation run, which delays the verification process. This thesis studies simulation modelling techniques for memory locality analysis of various DSM systems implemented on top of a cluster of symmetric multiprocessors. The thesis presents a simulation technique to promote model extensibility and proposes a technique for verification applicability, called a Specification-based Parameter Model Interaction (SPMI). The proposed techniques have been implemented in a new interpretation-driven simulation called DSiMCLUSTER on top of a discrete event simulation (DES) engine known as HASE. Experiments have been conducted to determine which factors are most influential on the degree of locality and to determine whether the stability of performance can be maximised. DSiMCLUSTER has been validated against a SunFire 15K server and has achieved similar cache miss results, within ±6% on average and with less than 15% difference in the worst case.
These results confirm that the techniques used in developing DSiMCLUSTER can contribute ways to achieve both (a) a highly extensible simulation framework to keep up with the ongoing innovation of the DSM architecture, and (b) the verification applicability resulting in an efficient framework for memory analysis experiments on DSM architecture.

    Fundamentals

    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters

    Memory-Constrained Computing

    University of Minnesota Ph.D. dissertation. November 2017. Major: Computer Science. Advisor: George Karypis. 1 computer file (PDF); x, 126 pages. The growing disparity between data set sizes and the amount of fast internal memory available in modern computer systems is an important challenge facing a variety of application domains. This problem is partly due to the incredible rate at which data is being collected, and partly due to the movement of many systems towards increasing processor counts without proportionate increases in fast internal memory. Without access to sufficiently large machines, many application users must balance a trade-off between utilizing the processing capabilities of their system and performing computations in memory. In this thesis we explore several approaches to solving this problem. We develop effective and efficient algorithms for compressing scientific simulation data computed on structured and unstructured grids. A paradigm for lossy compression of this data is proposed in which the data computed on the grid is modeled as a graph, which is decomposed into sets of vertices that satisfy a user-defined error constraint, epsilon. Each set of vertices is replaced by a constant value with reconstruction error bounded by epsilon. A comprehensive set of experiments is conducted by comparing these algorithms and other state-of-the-art scientific data compression methods. Over our benchmark suite, our methods obtained compression of 1% of the original size with average PSNR of 43.00 and 3% of the original size with average PSNR of 63.30. In addition, our schemes outperform other state-of-the-art lossy compression approaches and require on average 25% of the space required by them for similar or better PSNR levels. We present algorithms and experimental analysis for five data structures for representing dynamic sparse graphs. The goal of the presented data structures is twofold.
First, the data structures must be compact, as the size of the graphs being operated on continues to grow to less manageable sizes. Second, the cost of operating on the data structures must be within a small factor of the cost of operating on the static graph, else these data structures will not be useful. Of these five data structures, three are approaches, one is semi-compact, but suited for fast operation, and one is focused on compactness and is a dynamic extension of an existing technique known as the WebGraph Framework. Our results show that for well intervalized graphs, like web graphs, the semi-compact is superior to all other data structures in terms of memory and access time. Furthermore, we show that in terms of memory, the compact data structure outperforms all other data structures at the cost of a modest increase in update and access time. We present a virtual memory subsystem which we implemented as part of the BDMPI runtime. Our new virtual memory subsystem, which we call SBMA, bypasses the operating system virtual memory manager to take advantage of BDMPI's node-level cooperative multi-tasking. Benchmarking using a synthetic application shows that for the use cases relevant to BDMPI, the overhead incurred by the BDMPI-SBMA system is amortized such that it performs as fast as explicit data movement by the application developer. Furthermore, we tested SBMA with three different classes of applications and our results show that with no modification to the original program, speedups of 2x to 12x over a standard BDMPI implementation can be achieved for the included applications. We present a runtime system designed to be used alongside data parallel OpenMP programs for shared-memory problems requiring out-of-core execution.
Our new runtime system, which we call OpenOOC, exploits the concurrency exposed by the OpenMP semantics to switch execution contexts during non-resident memory access to perform useful computation, instead of having the thread wait idle. Benchmarking using a synthetic application shows that modern operating systems support the necessary memory and execution context switching functionalities with high enough performance that they can be used to effectively hide some of the overhead incurred when swapping data between memory and disk in out-of-core execution environments. Furthermore, we tested OpenOOC with a practical computational application and our results show that with no structural modification to the original program, runtime can be reduced by an average of 21% compared with the out-of-core equivalent of the application.
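
The error-bounded compression idea in the abstract above (sets of values replaced by one constant, with reconstruction error at most epsilon) can be illustrated with a much simpler stand-in. The sketch below is not the dissertation's graph-based algorithm: it assumes a plain 1D sequence and a greedy left-to-right grouping, and the names compress and decompress are hypothetical.

```python
from typing import List, Tuple


def compress(values: List[float], eps: float) -> List[Tuple[int, float]]:
    """Greedily split `values` into runs whose spread is at most 2 * eps.

    Each run is stored as (length, midpoint), so every original value lies
    within eps of the constant that replaces it (up to float rounding).
    This 1D greedy scheme is an illustrative assumption, not the thesis method.
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    runs: List[Tuple[int, float]] = []
    i = 0
    while i < len(values):
        lo = hi = values[i]
        j = i + 1
        # Extend the run while min and max stay within 2 * eps of each other.
        while j < len(values):
            new_lo = min(lo, values[j])
            new_hi = max(hi, values[j])
            if new_hi - new_lo > 2 * eps:
                break
            lo, hi = new_lo, new_hi
            j += 1
        runs.append((j - i, (lo + hi) / 2))
        i = j
    return runs


def decompress(runs: List[Tuple[int, float]]) -> List[float]:
    """Expand (length, constant) runs back into a flat list."""
    out: List[float] = []
    for length, value in runs:
        out.extend([value] * length)
    return out
```

Calling compress on a sequence and then decompress on the result yields a list of the same length in which each entry differs from the original by at most eps; smoother data produces fewer runs and therefore better compression.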

    Mining a Small Medical Data Set by Integrating the Decision Tree and t-test

    Although several researchers have used statistical methods to prove that aspiration followed by the injection of 95% ethanol left in situ (retention) is an effective treatment for ovarian endometriomas, very few discuss the different conditions that could generate different recovery rates for the patients. Therefore, this study combines statistical methods and decision tree techniques to analyze the postoperative status of ovarian endometriosis patients under different conditions. Since our collected data set is small, containing only 212 records, we use all of these data as the training data. Therefore, instead of using a resultant tree to generate rules directly, we first use the value of each node as a cut point to generate all possible rules from the tree. Then, once all possible rules have been generated, we verify them with a t-test to discover useful description rules. Experimental results show that our approach can find some new interesting knowledge about recurrent ovarian endometriomas under different conditions. Journal type: international. Citation index: EI. Format: print. Country code: FI.
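
The two-stage procedure described in this abstract (candidate rules from tree cut points, then a t-test to keep only the rules that separate outcomes) might be sketched as follows. This is a minimal illustration under stated assumptions: the cut points are taken as already extracted from a fitted tree, Welch's t statistic is used, and the fixed threshold of 2.0 stands in for a proper critical value. The function names and the threshold are hypothetical and do not come from the paper.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples (each of size >= 2)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both samples have zero variance")
    return (mean(a) - mean(b)) / se


def rules_from_cut_points(
    feature: Sequence[float],
    outcome: Sequence[float],
    cut_points: Sequence[float],
    t_threshold: float = 2.0,  # assumed cutoff, not a value from the paper
) -> List[Tuple[float, float]]:
    """Keep each cut point whose two sides differ clearly in mean outcome.

    Returns (cut_point, t_statistic) pairs for the rules that pass.
    """
    kept: List[Tuple[float, float]] = []
    for cut in cut_points:
        below = [y for x, y in zip(feature, outcome) if x <= cut]
        above = [y for x, y in zip(feature, outcome) if x > cut]
        # A t statistic needs at least two observations on each side.
        if len(below) < 2 or len(above) < 2:
            continue
        try:
            t = welch_t(below, above)
        except ValueError:
            continue
        if abs(t) >= t_threshold:
            kept.append((cut, t))
    return kept
```

In practice the threshold would be replaced by a p-value computed from the t distribution with the appropriate degrees of freedom, which requires a statistics library beyond the Python standard library.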

    3D Spatial Data Infrastructures for web-based Visualization

    In this thesis, concepts for developing Spatial Data Infrastructures with an emphasis on visualizing 3D landscape and city models in distributed environments are discussed. Spatial Data Infrastructures are important for public authorities in order to perform tasks on a daily basis, and serve as a research topic in geo-informatics. Joint initiatives at national and international level exist for harmonizing procedures and technologies. Interoperability is an important aspect in this context, as an enabling technology for sharing, distributing, and connecting geospatial data and services. The Open Geospatial Consortium is the main driver for developing international standards in this sector and includes government agencies, universities and private companies in a consensus process. 3D city models are becoming increasingly popular not only in desktop Virtual Reality applications but also for professional use by public authorities. Spatial Data Infrastructures focus so far on the storage and exchange of 3D building and elevation data. For efficient streaming and visualization of spatial 3D data in distributed network environments such as the internet, concepts from the area of real time 3D Computer Graphics must be applied and combined with Geographic Information Systems (GIS). For example, scene graph data structures are commonly used for creating complex and dynamic 3D environments for computer games and Virtual Reality applications, but have not been introduced in GIS so far. In this thesis, several aspects of how to create interoperable and service-based environments for 3D spatial data are addressed. These aspects are covered by publications in journals and conference proceedings.
The introductory chapter provides a logical succession from geometrical operations for processing raw data, to data integration patterns, to system designs of single components, to service interface descriptions and workflows, and finally to an architecture of a complete distributed service network. Digital Elevation Models are very important in 3D geo-visualization systems. Data structures, methods and processes are described for making them available in service-based infrastructures. A specific mesh reduction method is used for generating lower levels of detail from very large point data sets. An integration technique is presented that allows the combination with 2D GIS data such as roads and land use areas. This approach allows using another optimization technique that greatly improves the usability for immersive 3D applications such as pedestrian navigation: flattening road and water surfaces. It is a geometric operation, which uses data structures and algorithms found in numerical simulation software implementing Finite Element Methods. 3D Routing is presented as a typical application scenario for detailed 3D city models. Specific problems such as bridges, overpasses and multilevel networks are addressed and possible solutions described. The integration of routing capabilities in service infrastructures can be accomplished with standards of the Open Geospatial Consortium. An additional service is described for creating 3D networks and for generating 3D routes on the fly. Visualization of indoor routes requires different representation techniques. As the server interface for providing access to all 3D data, the Web 3D Service has been used and further developed. Integrating and handling scene graph data is described in order to create rich virtual environments. Coordinate transformations of scene graphs are described in detail, which is an important aspect for ensuring interoperability between systems using different spatial reference systems.
The Web 3D Service plays a central part in nearly all experiments that have been carried out. It provides the means not only for interactive web visualizations, but also for performing further analyses, accessing detailed feature information, and automatic content discovery. OpenStreetMap and other worldwide available datasets are used for developing a complete architecture demonstrating the scalability of 3D Spatial Data Infrastructures. Its suitability for creating 3D city models is analyzed, according to requirements set by international standards. A full virtual globe system has been developed based on OpenStreetMap including data processing, database storage, web streaming and a visualization client. Results are discussed and compared to similar approaches within geo-informatics research, clarifying in which application scenarios and under which requirements the approaches in this thesis can be applied.