587,438 research outputs found

    Many-Task Computing and Blue Waters

    Full text link
    This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters systems, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects to middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware

    Asynchronous Execution of Python Code on Task Based Runtime Systems

    Get PDF
    Despite advancements in the areas of parallel and distributed computing, the complexity of programming on High Performance Computing (HPC) resources has deterred many domain experts, especially in the areas of machine learning and artificial intelligence (AI), from utilizing performance benefits of such systems. Researchers and scientists favor high-productivity languages to avoid the inconvenience of programming in low-level languages and costs of acquiring the necessary skills required for programming at this level. In recent years, Python, with the support of linear algebra libraries like NumPy, has gained popularity despite facing limitations which prevent this code from distributed runs. Here we present a solution which maintains both high level programming abstractions as well as parallel and distributed efficiency. Phylanx, is an asynchronous array processing toolkit which transforms Python and NumPy operations into code which can be executed in parallel on HPC resources by mapping Python and NumPy functions and variables into a dependency tree executed by HPX, a general purpose, parallel, task-based runtime system written in C++. Phylanx additionally provides introspection and visualization capabilities for debugging and performance analysis. We have tested the foundations of our approach by comparing our implementation of widely used machine learning algorithms to accepted NumPy standards

    Research on Efficiency and Security for Emerging Distributed Applications

    Get PDF
    Distributed computing has never stopped its advancement since the early years of computer systems. In recent years, edge computing has emerged as an extension of cloud computing. The main idea of edge computing is to provide hardware resources in proximity to the end devices, thereby offering low network latency and high network bandwidth. However, as an emerging distributed computing paradigm, edge computing currently lacks effective system support. To this end, this dissertation studies the ways of building system support for edge computing. We first study how to support the existing, non-edge-computing applications in edge computing environments. This research leads to the design of a platform called SMOC that supports executing mobile applications on edge servers. We consider mobile applications in this project because there are a great number of mobile applications in the market and we believe that mobile-edge computing will become an important edge computing paradigm in the future. SMOC supports executing ARM-based mobile applications on x86 edge servers by establishing a running environment identical to that of the mobile device at the edge. It also exploits hardware virtualization on the mobile device to protect user input. Next, we investigate how to facilitate the development of edge applications with system support. This study leads to the design of an edge computing framework called EdgeEngine, which consists of a middleware running on top of the edge computing infrastructure and a powerful, concise programming interface. Developers can implement edge applications with minimal programming effort through the programming interface, and the middleware automatically fulfills the routine tasks, such as data dispatching, task scheduling, lock management, etc., in a highly efficient way. Finally, we envision that consensus will be an important building block for many edge applications, because we consider the consensus problem to be the most important fundamental problem in distributed computing while edge computing is an emerging distributed computing paradigm. Therefore, we investigate how to support the edge applications that rely on consensus, helping them achieve good performance. This study leads to the design of a novel, Paxos-based consensus protocol called Nomad, which rapidly orders the messages received by the edge. Nomad can quickly adapt to the workload changes across the edge computing system, and it incorporates a backend cloud to resolve the conflicts in a timely manner. By doing so, Nomad reduces the user-perceived latency as much as possible, outperforming the existing consensus protocols

    Locality-Aware Concurrency Platforms

    Get PDF
    Modern computing systems from all domains are becoming increasingly more parallel. Manufacturers are taking advantage of the increasing number of available transistors by packaging more and more computing resources together on a single chip or within a single system. These platforms generally contain many levels of private and shared caches in addition to physically distributed main memory. Therefore, some memory is more expensive to access than other and high-performance software must consider memory locality as one of the first level considerations. Memory locality is often difficult for application developers to consider directly, however, since many of these NUMA affects are invisible to the application programmer and only show up in low performance. Moreover, on parallel platforms, the performance depends on both locality and load balance and these two metrics are often at odds with each other. Therefore, directly considering locality and load balance at the application level may make the application much more complex to program. In this work, we develop locality-conscious concurrency platforms for multiple different structured parallel programming models, including streaming applications, task-graphs and parallel for loops. In all of this work, the idea is to minimally disrupt the application programming model so that the application developer is either unimpacted or must only provide high-level hints to the runtime system. The runtime system then schedules the application to provide good locality of access while, at the same time also providing good load balance. In particular, we address cache locality for streaming applications through static partitioning and developed an extensible platform to execute partitioned streaming applications. For task-graphs, we extend a task-graph scheduling library to guide scheduling decisions towards better NUMA locality with the help of user-provided locality hints. CilkPlus parallel for loops utilize a randomized dynamic scheduler to distribute work which, in many loop based applications, results in poor locality at all levels of the memory hierarchy. We address this issue with a novel parallel for loop implementation that can get good cache and NUMA locality while providing support to maintain good load balance dynamically

    Effective Large Scale Computing Software for Parallel Mesh Generation

    Get PDF
    Scientists commonly turn to supercomputers or Clusters of Workstations with hundreds (even thousands) of nodes to generate meshes for large-scale simulations. Parallel mesh generation software is then used to decompose the original mesh generation problem into smaller sub-problems that can be solved (meshed) in parallel. The size of the final mesh is limited by the amount of aggregate memory of the parallel machine. Also, requesting many compute nodes on a shared computing resource may result in a long waiting, far surpassing the time it takes to solve the problem.;These two problems (i.e., insufficient memory when computing on a small number of nodes, and long waiting times when using many nodes from a shared computing resource) can be addressed by using out-of-core algorithms. These are algorithms that keep most of the dataset out-of-core (i.e., outside of memory, on disk) and load only a portion in-core (i.e., into memory) at a time.;We explored two approaches to out-of-core computing. First, we presented a traditional approach, which is to modify the existing in-core algorithms to enable out-of-core computing. While we achieved good performance with this approach the task is complex and labor intensive. An alternative approach, we presented a runtime system designed to support out-of-core applications. It requires little modification of the existing in-core application code and still produces acceptable results. Evaluation of the runtime system showed little performance degradation while simplifying and shortening the development cycle of out-of-core applications. The overhead from using the runtime system for small problem sizes is between 12% and 41% while the overlap of computation, communication and disk I/O is above 50% and as high as 61% for large problems.;The main contribution of our work is the ability to utilize computing resources more effectively. The user has a choice of either solving larger problems, that otherwise would not be possible, or solving problems of the same size but using fewer computing nodes, thus reducing the waiting time on shared clusters and supercomputers. We demonstrated that the latter could potentially lead to substantially shorter wall-clock time

    Decision Support System On Operating System

    Get PDF
    The micro-computing world of the ÂŽ 90s is more volatile ever, not only having choices on which hardware to buy (and the brands to choose from seem to be endless). Nowadays, many stuff based on computer technology. Computer needs an operating system to handle the task. Open source gave paradigm in operating system developing. Now, we have to decide which operating system ( OS ) we are going to run on our computers. MS Windows is a leader operating system in desktop, Linux will be a leader operating system in server of internet. Currently, Linux give high effort to bring the operating system to desktop and entertainment. Many platform of hardware also were produced to fulfill the need of IT activities. A lot of applications have highly increased since twenty years ago. Many application can run in more operating system and hardware platform. The user need to consider which operating system appropriate to his system and purpose. In this paper, we will demonstrate matrix decision approach for decision support system (DSS) in choosing operating system. Section one will look at the relations between applications and hardware platform which related to operating system. Methodology of decision based on matrix decision will discuss in section two. Section three will look at some example scenarios in choosing operating system

    Autonomic Management Policy SpeciïŹcation: from UML to DSML

    Get PDF
    International audienceAutonomic computing is recognized as one of the most promizing solutions to address the increasingly complex task of distributed environments' administration. In this context, many projects relied on software components and architectures to provide autonomic management frameworks. We designed such a component-based autonomic management framework, but observed that the interfaces of a component model are too low-level and difficult to use. Therefore, we introduced UML diagrams for the modeling of deployment and management policies. However, we had to adapt/twist the UML semantics in order to meet our requirements, which led us to define DSMLs. In this paper, we present our experience in designing the Tune system and its support for management policy specification, relying on UML diagrams and on DSMLs. We analyse these two approaches, pinpointing the benefits of DSMLs over UML
    • 

    corecore