293 research outputs found

    The exploitation of parallelism on shared memory multiprocessors

    PhD Thesis. With the arrival of many general-purpose shared memory multiple processor (multiprocessor) computers in the commercial arena during the mid-1980s, a rift opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to deliver this power to potential users. This rift stems from the fact that no computational model capable of elegantly expressing parallel activity is yet mature enough to be universally accepted and used as the basis for programming languages that exploit the parallelism multiprocessors offer. Added to this, there is a lack of software tools to assist programmers in designing and debugging parallel programs. Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes programming language constructs, together with a programming methodology and environment, to close the ever-widening hardware-to-software gap. The novel programming constructs described in this thesis are intended for use in imperative languages, yet they draw on the synchronisation inherent in the dataflow model by applying single-assignment semantics to shared data, giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of each; a sketch of the single-assignment idea follows. Funded by the Science and Engineering Research Council.
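    The abstract does not spell out the shared value constructs themselves, but their single-assignment discipline is closely mirrored by a promise/future pair in modern C++. A minimal, hypothetical sketch (none of these names come from the thesis):

        // A write-once "shared value": readers block until the single
        // assignment happens, giving dataflow-style synchronisation.
        #include <future>
        #include <iostream>
        #include <thread>

        int main() {
            std::promise<int> cell;                        // write-once end
            std::shared_future<int> value = cell.get_future().share();

            // Two readers: each blocks until the value is bound.
            std::thread r1([value] { std::cout << "r1 saw " << value.get() << '\n'; });
            std::thread r2([value] { std::cout << "r2 saw " << value.get() << '\n'; });

            cell.set_value(42);                            // the single assignment
            // A second set_value() would throw std::future_error:
            // re-assignment is an error, exactly as in single-assignment
            // dataflow semantics.

            r1.join();
            r2.join();
        }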

    Synchronization Algorithms for Multi-cores and Multiprocessors

    A distributed system is a group of processors that do not share memory. Instead, each processor has its own local memory, and the processors communicate with one another through communication lines such as local-area or wide-area networks. The processors in a distributed system vary in size and function; such systems may include small handheld or real-time devices, personal computers, workstations, and large mainframe computer systems. Distributed systems have their own set of unique challenges, including synchronizing data and resolving conflicts. The performance of a synchronization algorithm depends on runtime factors that are hard to predict, so designers provide protocols to implement the synchronization operations and waiting mechanisms to wait out synchronization delays. This paper investigates synchronization algorithms that dynamically select waiting mechanisms and protocols in response to runtime factors so as to attain better performance; a sketch of one such adaptive waiting mechanism follows. DOI: 10.17762/ijritcc2321-8169.150615
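    The paper's own algorithm is not reproduced here; the sketch below merely illustrates, under assumed names, the kind of adaptive waiting mechanism it investigates: spin briefly first (cheap when synchronization delays are short), then fall back to blocking (cheap when they are long), with the spin budget as the runtime-tuned knob.

        // Hypothetical adaptive gate: spin, then block on a condition
        // variable. The spin budget would be chosen from runtime
        // measurements; the default here is an arbitrary placeholder.
        #include <atomic>
        #include <condition_variable>
        #include <mutex>

        class AdaptiveGate {
            std::atomic<bool> open_{false};
            std::mutex m_;
            std::condition_variable cv_;

        public:
            void wait(int spin_budget = 1000) {
                // Phase 1: busy-wait, hoping the event arrives quickly.
                for (int i = 0; i < spin_budget; ++i)
                    if (open_.load(std::memory_order_acquire)) return;
                // Phase 2: give up the CPU and sleep until signalled.
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return open_.load(std::memory_order_acquire); });
            }

            void signal() {
                open_.store(true, std::memory_order_release);
                std::lock_guard<std::mutex> lk(m_); // order with sleeping waiters
                cv_.notify_all();
            }
        };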

    Language independent modelling of parallelism

    To make programs work in parallel contexts without hazards, programming languages require changes to their structures and compilers. One of the most complicated parts is the memory model: how a programming language deals with memory interactions. Different processors provide different levels of safety guarantees (e.g. ARM provides relaxed guarantees whereas Intel provides strong ones). On the other hand, different programming languages provide different structures for parallel computation and individual protocols for communicating with parallel processes. Unfortunately, no single choice is best in all situations. This thesis focuses on the memory models of various programming languages and processors, highlighting positive and negative features from the point of view of programmability, performance and portability. To give some evidence of the problems and performance bottlenecks, several small programs were developed. The thesis also concentrates on incorrect behaviour, especially data races in programs, and provides suggestions on how to avoid them. Litmus tests were also run on systems featuring different vendors' processors to observe data races on each system; a representative litmus test appears below. Programming paradigms are a further issue: some programming styles admit observable non-determinism, which is a principal source of incorrect behaviour in parallel programs. Different programming models are therefore discussed in the light of current research, and the imperative and functional paradigms are compared in different contexts. Finally, a mathematical problem was solved using two different paradigms to provide some practical evidence for the theory.
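    As an illustration of the litmus tests mentioned above, here is the classic store-buffering test in C++ (this specific program is our sketch, not one taken from the thesis). With relaxed atomics, the outcome r1 == 0 && r2 == 0 is permitted and can actually be observed on weakly ordered processors such as ARM; replacing the orderings with memory_order_seq_cst forbids it.

        // Store-buffering litmus test: run in a loop to have a realistic
        // chance of observing the relaxed outcome on weak hardware.
        #include <atomic>
        #include <iostream>
        #include <thread>

        std::atomic<int> x{0}, y{0};
        int r1 = 0, r2 = 0;

        int main() {
            std::thread t1([] {
                x.store(1, std::memory_order_relaxed);
                r1 = y.load(std::memory_order_relaxed);
            });
            std::thread t2([] {
                y.store(1, std::memory_order_relaxed);
                r2 = x.load(std::memory_order_relaxed);
            });
            t1.join();
            t2.join();
            // "r1=0 r2=0" is allowed here; it would not be with seq_cst.
            std::cout << "r1=" << r1 << " r2=" << r2 << '\n';
        }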

    Doctor of Philosophy

    Dissertation. With the explosion of chip transistor counts, the semiconductor industry has struggled to continue scaling computing performance in line with historical trends. In recent years, the de facto use for excess transistors has been to increase the size of the on-chip data cache, allowing fast access to a larger portion of main memory. These large caches allowed the continued scaling of single-thread performance, which had not yet reached the limit of instruction-level parallelism (ILP). As we approach the practical limits of parallelism within a single-threaded application, new approaches such as chip multiprocessors (CMPs) have become popular for scaling performance using thread-level parallelism (TLP). This dissertation identifies the operating system as a ubiquitous area whose single-threaded and multithreaded performance has often been ignored by computer architects. We propose that novel hardware/OS co-design has the potential to significantly improve current chip multiprocessor designs, enabling increased performance and improved power efficiency. We show that the operating system contributes a nontrivial overhead to even the most computationally intensive workloads, and that this OS contribution grows to a significant fraction of total instructions when executing several common datacenter applications; a quick way to observe this overhead is sketched below. We demonstrate that architectural improvements have had little to no effect on the performance of the OS over the last 15 years, leaving ample room for improvement. We specifically consider three potential solutions to improve OS execution on modern processors. First, we consider a separate operating system processor (OSP) operating concurrently with general-purpose processors (GPPs) in a chip multiprocessor organization, with several specialized structures acting as efficient conduits between these processors. Second, we consider segregating existing caching structures to decrease cache interference between the OS and the application. Third, we propose that components within the OS itself should be refactored to be both multithreaded and cache-topology aware, which, in turn, improves the performance and scalability of many-threaded applications.
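    The dissertation's own measurement methodology is not described in this abstract; as a rough, assumed stand-in, POSIX getrusage() splits a process's CPU time into user time (application work) and system time (kernel work done on its behalf), which makes the OS share of a syscall-heavy workload directly visible:

        // Crude view of OS overhead: drive many small syscalls, then
        // compare user vs. system CPU time. POSIX-only sketch; the read
        // loop is just a stand-in for a syscall-heavy datacenter workload.
        #include <sys/resource.h>
        #include <fcntl.h>
        #include <unistd.h>
        #include <cstdio>

        int main() {
            char buf[4096];
            int fd = open("/dev/zero", O_RDONLY);
            for (int i = 0; i < 100000; ++i)
                (void)read(fd, buf, sizeof buf);   // kernel time accumulates here
            close(fd);

            rusage ru{};
            getrusage(RUSAGE_SELF, &ru);
            std::printf("user   %ld.%06lds\n",
                        (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
            std::printf("system %ld.%06lds\n",
                        (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
        }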