Search CORE

334 research outputs found

Dynamic Simultaneous Multithreaded Architecture

Authors: Lee B., Ortiz-Arroyo Daniel
Publication venue: International Society of Computers and Their Applications
Publication date: 01/01/2003

An Expressive Language and Efficient Execution System for Software Agents

Authors: Barish G., Knoblock C. A.
Publication venue: AI Access Foundation
Publication date: 09/09/2011

Software agents can be used to automate many of the tedious, time-consuming information processing tasks that humans currently have to complete manually. However, to do so, agent plans must be capable of representing the myriad actions and control flows required to perform those tasks. In addition, since these tasks can require integrating multiple sources of remote information (typically a slow, I/O-bound process), it is desirable to make execution as efficient as possible. To address both of these needs, we present a flexible software agent plan language and a highly parallel execution system that enable the efficient execution of expressive agent plans. The plan language allows complex tasks to be more easily expressed by providing a variety of operators for flexibly processing the data as well as supporting subplans (for modularity) and recursion (for indeterminate looping). The executor is based on a streaming dataflow model of execution to maximize the amount of operator and data parallelism possible at runtime. We have implemented both the language and executor in a system called THESEUS. Our results from testing THESEUS show that streaming dataflow execution can yield significant speedups over both the traditional serial (von Neumann) execution and the non-streaming dataflow-style execution that existing software and robot agent execution systems currently support. In addition, we show how plans written in the language we present can represent certain types of subtasks that cannot be accomplished using the languages supported by network query engines. Finally, we demonstrate that the increased expressivity of our plan language does not hamper performance; specifically, we show how data can be integrated from multiple remote sources just as efficiently using our architecture as is possible with a state-of-the-art streaming-dataflow network query engine.

arXiv.org e-Print Archive

Crossref

Design space exploration for GPU-based architecture

Author: Ye Z.
Publication date: 01/01/2009

Repository TU/e

Pure OAI Repository

Improving IRWLS algorithm for GLM with Intel Xeon Family

Author: Xu Zhenzhi
Publication venue: Purdue University (bepress)
Publication date: 01/01/2018

This study investigates how the characteristics of Intel Xeon processors can be used to improve the performance of training generalized linear models. The classic approach to finding the maximum likelihood estimate of a linear model requires loading the entire dataset into memory for computation, which is infeasible when the data are larger than memory. With the approach analyzed by Zhang and Yang (2017), model fitting is instead achieved iteratively by processing the data one row at a time. However, one limitation of this approach is that its iterative nature can hurt performance when it is implemented on an Intel Xeon processor, which delivers parallelism and vectorization. The study focuses on tuning the application process and configuration on Xeon family processors based on the architecture of the GLM model-fitting algorithm.
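
The row-by-row fitting idea mentioned in this abstract can be illustrated with a minimal sketch. It is textbook iteratively reweighted least squares for a one-predictor logistic regression, not the method of Zhang and Yang (2017) and not the thesis's tuned implementation; the function name `irls_logistic` and the toy data are assumptions made for illustration. Each iteration makes one pass over the rows and keeps only a 2x2 system in memory.

```python
import math

def irls_logistic(rows, iterations=25, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x by IRLS (hypothetical helper).

    `rows` is a re-iterable sequence of (x, y) pairs with y in {0, 1}.
    Only five running sums are kept per pass, so the full design
    matrix never has to be held in memory.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        a00 = a01 = a11 = 0.0   # entries of X^T W X (symmetric 2x2)
        c0 = c1 = 0.0           # entries of X^T W z
        for x, y in rows:
            eta = b0 + b1 * x
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)           # IRLS weight for the logit link
            wz = w * eta + (y - p)      # w * working response, no division by w
            a00 += w
            a01 += w * x
            a11 += w * x * x
            c0 += wz
            c1 += wz * x
        det = a00 * a11 - a01 * a01
        if det == 0.0:
            raise ValueError("singular weighted normal equations")
        # Solve the 2x2 weighted least-squares system in closed form.
        new_b0 = (a11 * c0 - a01 * c1) / det
        new_b1 = (a00 * c1 - a01 * c0) / det
        change = max(abs(new_b0 - b0), abs(new_b1 - b1))
        b0, b1 = new_b0, new_b1
        if change < tol:
            break
    return b0, b1

# Toy, non-separable data (assumed for illustration only).
rows = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 1), (5, 1)]
b0, b1 = irls_logistic(rows)
print(b0, b1)
```

The inner loop carries a dependency only through the running sums, which is the kind of structure that would need restructuring (for example, blocking rows) before a vectorizing processor could exploit it; how the thesis does this is not stated in the abstract.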

Purdue E-Pubs

Asynchrony in parallel computing: from dataflow to multithreading

Authors: Robic B., Silc J., Ungerer Theo
Publication date: 02/08/2007

KITopen

Investigation of a simultaneous multithreaded architecture

Author: Torrant Marc
Publication venue: RIT Scholar Works
Publication date: 01/08/1999

Many enhancements have been made to the traditional general-purpose load-store computer architectures. Among the enhancements are memory hierarchy improvements, branch prediction, and multiple-issue processors. A major problem with current microprocessor design is the disparity between the large increase in CPU speed and the moderate increase in the speed of accessing main memory. The simultaneous multithreaded architecture is an extension of the single-threaded architecture that helps hide the performance penalty created by long-latency instructions, branch mispredictions, and memory accesses. Simultaneous multithreaded architectures use a more flexible parallelism, which takes advantage of both instruction-level and thread-level parallelism. The goal of this project was to design, simulate, and analyze a model of a simultaneous multithreaded architecture in order to evaluate design alternatives. The simulator was created by modifying a version of the SimpleScalar toolset, developed at the University of Wisconsin. The simulations document an overall system performance improvement for a simultaneous multithreaded architecture. In early simulation results, performed with the same number of functional units, an improvement in the number of instructions per cycle (IPC) of between 43% and 58% was found using four threads versus a single thread. The horizontal waste rate, which measures the number of unused issue slots, was reduced between 35% and 46%. The vertical waste rate, which measures the percentage of unused issue cycles (no issue slots used in a cycle), was reduced between 46% and 61%. These results are derived from a set of four sample programs. It was also found that increasing the number of certain functional units did not improve performance, whereas increasing the number of other types of functional units did have a significant positive impact on performance.
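
The three metrics named in this abstract can be made concrete with a small sketch. It assumes a hypothetical per-cycle trace of issued-instruction counts and is not taken from the thesis or from the SimpleScalar toolset; the function name `issue_metrics` is invented for illustration. Vertical waste follows the abstract's wording (fraction of cycles with no issue slot used), while horizontal waste is taken, as an assumption, to be the unused slots in cycles that issued at least one instruction, expressed as a fraction of all slots.

```python
def issue_metrics(issued_per_cycle, issue_width):
    """Compute IPC and issue-slot waste from a per-cycle trace.

    `issued_per_cycle[i]` is the number of instructions issued in
    cycle i (hypothetical trace format); `issue_width` is the maximum
    number of instructions that can issue in one cycle.
    """
    cycles = len(issued_per_cycle)
    if cycles == 0:
        raise ValueError("trace must contain at least one cycle")
    if any(n < 0 or n > issue_width for n in issued_per_cycle):
        raise ValueError("issue count outside 0..issue_width")

    total_slots = cycles * issue_width
    issued = sum(issued_per_cycle)

    # Vertical waste: cycles in which no issue slot was used at all.
    empty_cycles = sum(1 for n in issued_per_cycle if n == 0)

    # Horizontal waste (assumed definition): unused slots in cycles
    # that issued at least one instruction.
    horizontal_slots = sum(issue_width - n for n in issued_per_cycle if n > 0)

    return {
        "ipc": issued / cycles,
        "horizontal_waste": horizontal_slots / total_slots,
        "vertical_waste": empty_cycles / cycles,
    }

# Made-up six-cycle trace on a 4-wide machine (illustration only).
metrics = issue_metrics([4, 0, 2, 1, 0, 3], issue_width=4)
print(metrics)
```

Comparing these values between a one-thread and a four-thread run of the same simulator would give the kind of relative reductions the abstract reports.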

RIT Scholar Works