943 research outputs found

    A Pure Java Parallel Flow Solver

    Get PDF
    In this paper an overview is given on the "Have Java" project to attain a pure Java parallel Navier-Stokes flow solver (JParNSS) based on the thread concept and remote method invocation (RMI). The goal of this project is to produce an industrial flow solver running on an arbitrary sequential or parallel architecture, utilizing the Internet, capable of handling the most complex 3D geometries as well as flow physics, and also linking to codes in other areas such as aeroelasticity etc. Since Java is completely object-oriented the code has been written in an object-oriented programming (OOP) style. The code also includes a graphics user interface (GUI) as well as an interactive steering package for the parallel architecture. The Java OOP approach provides profoundly improved software productivity, robustness, and security as well as reusability and maintainability. OOP allows code construction similar to the aerodynamic design process because objects can be software coded and integrated, reflecting actual design procedures. In addition, Java is the programming language of the Internet and thus Java is the programming language of the Internet and thus Java objects on disparate machines or even separate networks can be connected. We explain the motivation for the design of JParNSS along with its capabilities that set it apart from other solvers. In the first two sections we present a discussion of the Java language as the programming tool for aerospace applications. In section three the objectives of the Have Java project are presented. In the next section the layer structures of JParNSS are discussed with emphasis on the parallelization and client-server (RMI) layers. JParNSS, like its predecessor ParNSS (ANSI-C), is based on the multiblock idea, and allows for arbitrarily complex topologies. Grids are accepted in GridPro property settings, grids of any size or block number can be directly read by JParNSS without any further modifications, requiring no additional preparation time for the solver input. In the last section, computational results are presented, with emphasis on multiprocessor Pentium and Sun parallel systems run by the Solaris operating system (OS)

    Analysis of Multi-Threading and Cache Memory Latency Masking on Processor Performance Using Thread Synchronization Technique

    Get PDF
    Multithreading is a process in which a single processor executes multiple threads concurrently. This enables the processor to divide tasks into separate threads and run them simultaneously, thereby increasing the utilization of available system resources and enhancing performance. When multiple threads share an object and one or more of them modify it, unpredictable outcomes may occur. Threads that exhibit poor locality of memory reference, such as database applications, often experience delays while waiting for a response from the memory hierarchy. This observation suggests how to better manage pipeline contention. To assess the impact of memory latency on processor performance, a dual-core MT machine with four thread contexts per core is utilized. These specific benchmarks are chosen to allow the workload to include programs with both favorable and unfavorable cache locality. To eliminate the issue of wasting the wake-up signals, this work proposes an approach that involves storing all the wake-up calls. It asserts the wake-up calls to the consumer and the producer can store the wake-up call in a variable.   An assigned value in working system (or kernel) storage that each process can check is a semaphore. Semaphore is a variable that reads, and update operations automatically in bit mode. It cannot be actualized in client mode since a race condition may persistently develop when two or more processors endeavor to induce to the variable at the same time. This study includes code to measure the time taken to execute both functions and plot the graph. It should be noted that sending multiple requests to a website simultaneously could trigger a flag, ultimately blocking access to the data. This necessitates some computation on the collected statistics. The execution time is reduced to one third when using threads compared to executing the functions sequentially. This exemplifies the power of multithreading

    Assessing load-sharing within optimistic simulation platforms

    Get PDF
    The advent of multi-core machines has lead to the need for revising the architecture of modern simulation platforms. One recent proposal we made attempted to explore the viability of load-sharing for optimistic simulators run on top of these types of machines. In this article, we provide an extensive experimental study for an assessment of the effects on run-time dynamics by a load-sharing architecture that has been implemented within the ROOT-Sim package, namely an open source simulation platform adhering to the optimistic synchronization paradigm. This experimental study is essentially aimed at evaluating possible sources of overheads when supporting load-sharing. It has been based on differentiated workloads allowing us to generate different execution profiles in terms of, e.g., granularity/locality of the simulation events. © 2012 IEEE

    Doctor of Philosophy

    Get PDF
    dissertationWith the explosion of chip transistor counts, the semiconductor industry has struggled with ways to continue scaling computing performance in line with historical trends. In recent years, the de facto solution to utilize excess transistors has been to increase the size of the on-chip data cache, allowing fast access to an increased portion of main memory. These large caches allowed the continued scaling of single thread performance, which had not yet reached the limit of instruction level parallelism (ILP). As we approach the potential limits of parallelism within a single threaded application, new approaches such as chip multiprocessors (CMP) have become popular for scaling performance utilizing thread level parallelism (TLP). This dissertation identifies the operating system as a ubiquitous area where single threaded performance and multithreaded performance have often been ignored by computer architects. We propose that novel hardware and OS co-design has the potential to significantly improve current chip multiprocessor designs, enabling increased performance and improved power efficiency. We show that the operating system contributes a nontrivial overhead to even the most computationally intense workloads and that this OS contribution grows to a significant fraction of total instructions when executing several common applications found in the datacenter. We demonstrate that architectural improvements have had little to no effect on the performance of the OS over the last 15 years, leaving ample room for improvements. We specifically consider three potential solutions to improve OS execution on modern processors. First, we consider the potential of a separate operating system processor (OSP) operating concurrently with general purpose processors (GPP) in a chip multiprocessor organization, with several specialized structures acting as efficient conduits between these processors. Second, we consider the potential of segregating existing caching structures to decrease cache interference between the OS and application. Third, we propose that there are components within the OS itself that should be refactored to be both multithreaded and cache topology aware, which in turn, improves the performance and scalability of many-threaded applications

    The embedded Java benchmark suite JemBench

    Get PDF

    Improving web server efficiency on commodity hardware

    Get PDF
    El ràpid creixement de la Web requereix una gran quantitat de recursos computacionals que han de ser utilitzats eficientment. Avui en dia, els servidors basats en hardware estendard son les plataformes preferides per executar els servidors web, ja que són les plataformes amb millor relació rendiment/cost. El treball presentat en aquesta tesi esta dirigit a millorar la eficàcia en la gestió de recursos dels servidors web actuals. Per assolir els objectius d'aquesta tesis s'ha caracteritzat el funcionament dels servidors web en diverses entorns representatius, per tal de identificar el problemes i coll d'ampolla que limiten el rendiment del servidor web. Amb l'estudi dels servidors web s'ha identificat dos problemes principals que disminueixen l'eficiència dels servidors web en la utilització dels recursos hardware disponibles. El primer problema identificat és la evolució del protocol HTTP per incorporar connexions persistents i seguretat, que disminueix el rendiment e incrementa la complexitat de configuració dels servidors web. El segon problema és la naturalesa de algunes aplicacions web, les quals estan limitades per la memòria física o l'ample de banda amb el disc, que impedeix la correcta utilització dels recursos presents en les maquines multiprocessadors. Per solucionar aquests dos problemes dels servidors web hem proposat dues tècniques. En primer lloc, l'arquitectura hibrida, una evolució de l'arquitectura multi-threaded que es pot implementar fàcilment el els servidor web actuals i que millora notablement la gestió de les connexions i redueix la complexitat de configuració de tot el sistema. En segon lloc, hem implementat en el kernel del sistema operatiu Linux un comprensió de memòria principal per millorar el rendiment de les aplicacions que tenen la memòria com ha coll d'ampolla, millorant així la utilització dels recursos disponibles. Els resultats d'aquesta tesis estan avalats per una avaluació experimental exhaustiva que ha provat la efectivitat i viabilitat de les nostres propostes. Cal destacar que l'arquitectura de servidor web hybrida proposada en aquesta tesis ha estat implementada recentment per coneguts servidors web com és el cas de Apache, Tomcat i Glassfish.The unstoppable growth of the World Wide Web requires a huge amount of computational resources that must be used efficiently. Nowadays, commodity hardware is the preferred platform to run web server systems because it is the most cost-effective solution. The work presented in this thesis aims to improve the efficiency of current web server systems, allowing the web servers to make the most of hardware resources. To this end, we first characterize current web server system and identify the problems that hinder web servers from providing an efficient utilization of resources. From the study of web servers in a wide range of situations and environments, we have identified two main issues that prevents web servers systems from efficiently using current hardware resources. The first is the extension of the HTTP protocol to include connection persistence and security, which dramatically impacts the performance and configuration complexity of traditional multi-threaded web servers. The second is the memory-bounded or disk-bounded nature of some web workloads that prevents the full utilization of the abundant CPU resources available on current commodity hardware. We propose two novel techniques to overcome the main problems with current web server systems. Firstly, we propose a Hybrid web serverarchitecture which can be easily implemented in any multi-threaded web server to improve CPU utilization so as to provide better management of client connections. And secondly, we describe a main memory compression technique implemented in the Linux operating system that makes optimum use of current multiprocessor's hardware, in order to improve the performance of memory bound web applications. The thesis is supported by an exhaustive experimental evaluation that proves the effectiveness and feasibility of our proposals for current systems. It is worth noting that the main concepts behind the Hybrid architecture have recently been implemented in popular web servers like Apache, Tomcat and Glassfish
    corecore