    Dynamic Simultaneous Multithreaded Architecture

    An Expressive Language and Efficient Execution System for Software Agents

    Software agents can be used to automate many of the tedious, time-consuming information processing tasks that humans currently have to complete manually. To do so, however, agent plans must be capable of representing the myriad actions and control flows required to perform those tasks. In addition, since these tasks can require integrating multiple sources of remote information (typically a slow, I/O-bound process), it is desirable to make execution as efficient as possible. To address both of these needs, we present a flexible software agent plan language and a highly parallel execution system that together enable the efficient execution of expressive agent plans. The plan language makes complex tasks easier to express by providing a variety of operators for flexibly processing data, as well as by supporting subplans (for modularity) and recursion (for indeterminate looping). The executor is based on a streaming dataflow model of execution, which maximizes the amount of operator and data parallelism possible at runtime. We have implemented both the language and the executor in a system called THESEUS. Our tests of THESEUS show that streaming dataflow execution can yield significant speedups over both traditional serial (von Neumann) execution and the non-streaming dataflow-style execution that existing software and robot agent execution systems currently support. In addition, we show how plans written in our language can represent certain types of subtasks that cannot be accomplished using the languages supported by network query engines. Finally, we demonstrate that the increased expressivity of our plan language does not hamper performance; specifically, we show that data can be integrated from multiple remote sources just as efficiently with our architecture as with a state-of-the-art streaming dataflow network query engine.
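The core idea of streaming dataflow execution, in which every operator runs concurrently and fires on each tuple as soon as it arrives instead of waiting for its whole input, can be sketched as below. This is a minimal illustration under stated assumptions, not THESEUS's executor: the names `source`, `map_operator`, and `run_streaming` are hypothetical, each operator is a thread connected to the next by a FIFO queue, the `delay` parameter merely simulates a slow remote source, and only pipeline (operator) parallelism is shown. Error handling is omitted.

```python
import queue
import threading
import time
from typing import Any, Callable, Iterable, List

_END = object()  # sentinel marking the end of a stream


def source(items: Iterable[Any], out_q: queue.Queue, delay: float) -> None:
    """Emit tuples one at a time, simulating a slow remote source."""
    for item in items:
        time.sleep(delay)   # stand-in for network / I/O latency
        out_q.put(item)     # downstream may start on this tuple immediately
    out_q.put(_END)


def map_operator(fn: Callable[[Any], Any], in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Fire fn on each tuple as it arrives, rather than after the full input is ready."""
    while True:
        item = in_q.get()
        if item is _END:
            out_q.put(_END)  # propagate end-of-stream to the next operator
            return
        out_q.put(fn(item))


def run_streaming(items: Iterable[Any], fns: List[Callable[[Any], Any]],
                  delay: float = 0.0) -> List[Any]:
    """Run a linear plan source -> fns[0] -> fns[1] -> ..., one thread per operator."""
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [threading.Thread(target=source, args=(items, queues[0], delay))]
    for i, fn in enumerate(fns):
        threads.append(threading.Thread(target=map_operator,
                                        args=(fn, queues[i], queues[i + 1])))
    for t in threads:
        t.start()
    results = []
    while True:
        item = queues[-1].get()
        if item is _END:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

For example, `run_streaming(range(5), [lambda x: x * 2, lambda x: x + 1], delay=0.01)` returns `[1, 3, 5, 7, 9]`. With a nonzero delay, the second operator is already processing early tuples while the source is still waiting on later ones; a non-streaming executor would instead run each operator only after its entire input had been produced.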

    Improving IRWLS algorithm for GLM with Intel Xeon Family

    This study investigates how the characteristics of Intel Xeon processors can be used to improve the performance of training generalized linear models (GLMs). The classic approach to finding the maximum likelihood estimate of a linear model requires loading the entire dataset into memory, which is infeasible when the data are larger than the available memory. With the approach analyzed by Zhang and Yang (2017), model fitting is instead carried out iteratively, one row at a time. One possible limitation of this approach, however, is that its row-by-row iteration may hurt performance on Intel Xeon processors, which are designed to deliver parallelism and vectorization. The study therefore focuses on tuning the application and its configuration on Xeon-family processors according to the structure of the GLM fitting algorithm.
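To make "fitting iteratively through each row" concrete, here is a minimal sketch of iteratively reweighted least squares (IRLS) in which the normal-equation terms X^T W X and X^T W z are accumulated one row at a time, so each iteration is a single pass over the data and the full dataset is never held in memory. This is a standard textbook formulation chosen for illustration, not necessarily the algorithm analyzed by Zhang and Yang (2017). Assumptions: the GLM is logistic regression (binomial family, logit link), `rows` is a hypothetical callable that returns a fresh iterator over `(x_i, y_i)` pairs on each call, and the helper names `sigmoid`, `solve`, and `irls_logistic` are illustrative.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (feature vector x_i, binary response y_i)


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (small p only)."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def irls_logistic(rows: Callable[[], Iterable[Row]], p: int,
                  max_iter: int = 50, tol: float = 1e-10) -> List[float]:
    """Fit logistic regression by IRLS, touching the data one row at a time."""
    beta = [0.0] * p
    for _ in range(max_iter):
        xtwx = [[0.0] * p for _ in range(p)]  # running X^T W X
        xtwz = [0.0] * p                      # running X^T W z
        for x, y in rows():                   # one streaming pass per iteration
            eta = sum(xj * bj for xj, bj in zip(x, beta))    # linear predictor
            mu = min(max(sigmoid(eta), 1e-12), 1.0 - 1e-12)  # fitted mean, clamped
            w = mu * (1.0 - mu)                              # IRLS weight (logit link)
            z = eta + (y - mu) / w                           # working response
            for j in range(p):
                xtwz[j] += w * z * x[j]
                for k in range(p):
                    xtwx[j][k] += w * x[j] * x[k]
        new_beta = solve(xtwx, xtwz)  # weighted least-squares update
        if max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta
```

The scalar per-row inner loops are exactly where a row-wise approach leaves a processor's vector units idle; a tuned version might, for instance, accumulate X^T W X over blocks of rows with vectorized kernels while still streaming the data from disk.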

    Investigation of a simultaneous multithreaded architecture

    Many enhancements have been made to traditional general-purpose load-store computer architectures, among them memory hierarchy improvements, branch prediction, and multiple-issue processors. A major problem with current microprocessor design is the disparity between CPU speed, which has increased greatly, and main-memory access speed, which has increased only moderately. The simultaneous multithreaded architecture is an extension of the single-threaded architecture that helps hide the performance penalties created by long-latency instructions, branch mispredictions, and memory accesses. Simultaneous multithreaded architectures exploit parallelism more flexibly by taking advantage of both instruction-level and thread-level parallelism. The goal of this project was to design, simulate, and analyze a model of a simultaneous multithreaded architecture in order to evaluate design alternatives. The simulator was created by modifying a version of the SimpleScalar toolset, developed at the University of Wisconsin. The simulations document an overall system performance improvement for a simultaneous multithreaded architecture. In early simulation results, performed with the same number of functional units, using four threads instead of a single thread improved the number of instructions per cycle (IPC) by between 43% and 58%. The horizontal waste rate, which measures the number of unused issue slots, was reduced by between 35% and 46%. The vertical waste rate, which measures the percentage of unused issue cycles (cycles in which no issue slots are used), was reduced by between 46% and 61%. These results are derived from a set of four sample programs. It was also found that increasing the number of certain types of functional units did not improve performance, whereas increasing the number of other types had a significant positive impact on performance.
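The metrics quoted above (IPC, horizontal waste, vertical waste, and their relative changes) can be computed from a per-cycle issue trace as sketched below. The function names are hypothetical, and the exact definitions are assumptions following common SMT terminology: vertical waste counts cycles in which nothing issued, while horizontal waste counts the empty slots in cycles that issued at least one instruction, expressed here as a fraction of all issue slots. The simulator in the study may have normalized these differently.

```python
from typing import Sequence, Tuple


def issue_metrics(issued_per_cycle: Sequence[int],
                  issue_width: int) -> Tuple[float, float, float]:
    """Return (IPC, horizontal waste rate, vertical waste rate) for an issue trace.

    issued_per_cycle[c] is the number of instructions issued in cycle c.
    Assumed definitions: horizontal waste = empty slots in cycles that issued
    at least one instruction, as a fraction of all slots; vertical waste =
    fraction of cycles with no issue at all (equivalently, of slots lost to
    fully idle cycles).
    """
    if not issued_per_cycle:
        raise ValueError("empty trace")
    cycles = len(issued_per_cycle)
    total_slots = cycles * issue_width
    ipc = sum(issued_per_cycle) / cycles
    idle_cycles = sum(1 for n in issued_per_cycle if n == 0)
    horizontal_slots = sum(issue_width - n for n in issued_per_cycle if n > 0)
    return ipc, horizontal_slots / total_slots, idle_cycles / cycles


def relative_change(before: float, after: float) -> float:
    """Fractional change: 0.5 means a 50% increase, negative values a reduction."""
    return (after - before) / before
```

For a hypothetical 4-wide trace `[4, 2, 0, 1]`, `issue_metrics` gives an IPC of 1.75, a horizontal waste rate of 5/16, and a vertical waste rate of 1/4. An IPC gain such as the 43% reported above corresponds to `relative_change(ipc_single, ipc_four_threads)` returning 0.43, and a waste reduction shows up as a negative value.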