
    Quasar: A Programming Framework for Rapid Prototyping

    We present a new programming framework, Quasar, which facilitates GPU programming. Our high-level programming language relieves the developer of all implementation details, leaving them free to focus on the algorithm and the required accuracy. The proposed programming framework consists of a simple high-level programming language, an advanced compiler system, a runtime system and an IDE. The Quasar language is a high-level scripting language with an easy-to-learn syntax similar to Python and MATLAB. This makes Quasar well suited for fast development and prototyping. A Quasar program is first processed by a front-end compiler that automatically detects serial and parallel loops that could be accelerated by heterogeneous hardware. In a second compilation phase, a number of back-end compilers process the output of the front-end compiler, generating C++, OpenCL or CUDA code. Based on the generated code, the runtime system can dynamically switch between CPU and GPU. This automatic scheduling at runtime is done by analyzing the load of all devices, the memory transfer cost and the complexity of the task. This way, the programmer is relieved of the complicated implementation issues that are common when programming heterogeneous hardware. We validated the use of Quasar on a number of complex image processing and computer vision algorithms. These programs range from denoising to automated image segmentation applications. Using Quasar we get speed-up factors of 4 to over 60, depending on the application. All results were achieved using an NVIDIA GeForce M750

    MARBLE: A system for executing expert systems in parallel

    This paper details the MARBLE 2.0 system, which provides a parallel environment for cooperating expert systems. The work has been done in conjunction with the development of an intelligent computer-aided design system, ICADS, by the CAD Research Unit of the Design Institute at California Polytechnic State University. MARBLE (Multiple Accessed Rete Blackboard Linked Experts) is a system of C Language Integrated Production System (CLIPS) expert system tool shells. A copied blackboard is used for communication between the shells to establish an architecture which supports cooperating expert systems that execute in parallel. The design of MARBLE is simple, but it provides support for a rich variety of configurations, while making it relatively easy to demonstrate the correctness of its parallel execution features. In its most elementary configuration, individual CLIPS expert systems execute on their own processors and communicate with each other through a modified blackboard. Control of the system as a whole, and specifically of writing to the blackboard, is provided by one of the CLIPS expert systems, an expert control system

    Zooplankton visualization system: design and real-time lossless image compression

    In this thesis, I present a design of a small, self-contained, underwater plankton imaging system. I base the imaging system’s design on an embedded PC architecture built on PC/104-Plus standards to meet the compact size and low power requirements. I developed a simple graphical user interface to run on a real-time operating system to control the imaging system. I also address how a real-time image compression scheme implemented on an FPGA chip increases the image transfer speed of the imaging system. Since lossless compression of the image is required in order to retain all image details, I began with an established compression scheme such as SPIHT, and later proposed a new compression scheme that suits the imaging system’s requirements. I provide an estimate of the total amount of resources required and propose suitable FPGA chips to implement the compression scheme. Finally, I present various parallel designs by which the FPGA chip can be integrated into the imaging system

    Optimal Vertex Cover for the Small-World Hanoi Networks

    The vertex-cover problem on the Hanoi networks HN3 and HN5 is analyzed with an exact renormalization group and parallel-tempering Monte Carlo simulations. The grand canonical partition function of the equivalent hard-core repulsive lattice-gas problem is recast first as an Ising-like canonical partition function, which allows for a closed set of renormalization group equations. The flow of these equations is analyzed for the limit of infinite chemical potential, at which the vertex-cover problem is attained. The relevant fixed point and its neighborhood are analyzed, and non-trivial results are obtained both for the coverage and for the ground state entropy density, which indicates the complex structure of the solution space. Using special hierarchy-dependent operators in the renormalization group and Monte Carlo simulations, structural details of optimal configurations are revealed. These studies indicate that the optimal coverages (or packings) are not related by a simple symmetry. Using a clustering analysis of the solutions obtained in the Monte Carlo simulations, a complex solution space structure is revealed for each system size. Nevertheless, in the thermodynamic limit, the solution landscape is dominated by one huge set of very similar solutions. Comment: RevTex, 24 pages; many corrections in text and figures; final version; for related information, see http://www.physics.emory.edu/faculty/boettcher
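The equivalence between vertex cover and hard-core packing mentioned in the abstract above can be illustrated on a toy graph: the complement of a minimum vertex cover is a maximum set of mutually non-adjacent (hard-core) sites. The following is a minimal brute-force sketch, not the renormalization-group or Monte Carlo method of the paper. The graph, `N`, `EDGES`, and all function names are hypothetical choices made for illustration, and the graph is not a Hanoi network.

```python
from itertools import combinations


def is_vertex_cover(edges, cover):
    """Return True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)


def is_independent_set(edges, sites):
    """Return True if no edge has both endpoints in `sites` (hard-core rule)."""
    return all(not (u in sites and v in sites) for u, v in edges)


def min_vertex_cover(n, edges):
    """Exhaustive search for a smallest vertex cover; only viable for tiny graphs."""
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            candidate = set(subset)
            if is_vertex_cover(edges, candidate):
                return candidate
    return set(range(n))


# Hypothetical toy graph: a ring of 6 sites plus one long-range chord (0, 3).
N = 6
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]

cover = min_vertex_cover(N, EDGES)
packing = set(range(N)) - cover  # uncovered sites form a hard-core packing

print("minimum cover size:", len(cover))
print("complementary packing size:", len(packing))
```

Exhaustive search scales exponentially with the number of sites, which is one reason approaches such as renormalization and Monte Carlo sampling are used for large systems.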

    System and component design and test of a 10 hp, 18,000 rpm AC dynamometer utilizing a high frequency AC voltage link, part 1

    Hard and soft switching test results conducted with one of the samples of first generation MOS-controlled thyristors (MCTs) and similar test results with several different samples of second generation MCTs are reported. A simple chopper circuit is used to investigate the basic switching characteristics of the MCT under hard switching, and various types of resonant circuits are used to determine the soft switching characteristics of the MCT under both zero voltage and zero current switching. Next, the operating principles of a pulse density modulated converter (PDMC) for three phase (3F) to 3F two-step power conversion via a parallel resonant high frequency (HF) AC link are reviewed. The details of the selection of power switches and other power components required for the construction of the power circuit for the second generation 3F to 3F converter system are discussed. The problems encountered in the first generation system are considered. The design and performance of the first generation 3F to 3F power converter system and field oriented induction motor drive based upon a 3 kVA, 20 kHz parallel resonant HF AC link are described. Low harmonic current at the input and output, unity power factor operation of the input, and the bidirectional flow capability of the system are shown via both computer and experimental results. The work completed on the construction and testing of the second generation converter and field oriented induction motor drive based upon specifications for a 10 hp squirrel cage dynamometer and a 20 kHz parallel resonant HF AC link is discussed. The induction machine is designed to deliver 10 hp or 7.46 kW when operated as an AC-dynamo with power fed back to the source through the converter. Results presented reveal that the proposed power level requires additional energy storage elements to overcome a peak link voltage variation problem that prevents the system from reaching the desired power level. The power level tests of the second generation converter after the addition of extra energy storage elements to the HF link are described. The importance of the source voltage level in achieving better current regulation for the source side PDMC is also briefly discussed. The power levels achieved in the motoring mode of operation show that the proposed power levels can also be readily achieved in the generating mode of operation, provided that no mechanical speed limitation prevents the induction machine from being driven at the proposed power level

    HIGH PERFORMANCE, LOW COST SUBSPACE DECOMPOSITION AND POLYNOMIAL ROOTING FOR REAL TIME DIRECTION OF ARRIVAL ESTIMATION: ANALYSIS AND IMPLEMENTATION

    This thesis develops high performance real-time signal processing modules for direction of arrival (DOA) estimation for localization systems. It proposes highly parallel algorithms for performing subspace decomposition and polynomial rooting, which are traditionally implemented using sequential algorithms. The proposed algorithms address the emerging need for real-time localization for a wide range of applications. As the antenna array size increases, the complexity of signal processing algorithms increases, making it increasingly difficult to satisfy the real-time constraints. This thesis addresses real-time implementation by proposing parallel algorithms that maintain a considerable improvement over traditional algorithms, especially for systems with a larger number of antenna array elements. Singular value decomposition (SVD) and polynomial rooting are two computationally complex steps and act as the bottleneck to achieving real-time performance. The proposed algorithms are suitable for implementation on field-programmable gate arrays (FPGAs), single instruction multiple data (SIMD) hardware or application-specific integrated circuits (ASICs), which offer a large number of processing elements that can be exploited for parallel processing. The designs proposed in this thesis are modular, easily expandable and easy to implement. Firstly, this thesis proposes a fast converging SVD algorithm. The proposed method reduces the number of iterations it takes to converge to the correct singular values, thus achieving closer to real-time performance. A general algorithm and a modular system design are provided, making it easy for designers to replicate and extend the design to larger matrix sizes. Moreover, the method is highly parallel, which can be exploited on the various hardware platforms mentioned earlier. A fixed-point implementation of the proposed SVD algorithm is presented. The FPGA design is pipelined to the maximum extent to increase the maximum achievable frequency of operation. The system was developed with the objective of achieving high throughput. Various modern cores available in FPGAs were used to maximize the performance, and these modules are presented in detail. Finally, a parallel polynomial rooting technique based on Newton’s method, applicable exclusively to root-MUSIC polynomials, is proposed. Unique characteristics of the root-MUSIC polynomial’s complex dynamics were exploited to derive this polynomial rooting method. The technique exhibits parallelism and converges to the desired root within a fixed number of iterations, making it suitable for rooting polynomials of large degree. We believe this is the first time that the complex dynamics of root-MUSIC polynomials have been analyzed to propose an algorithm. In all, the thesis addresses two major bottlenecks in a direction of arrival estimation system by providing simple, high throughput, parallel algorithms
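The abstract above names Newton's method with a fixed iteration count as the basis of its rooting technique. As a rough illustration of that general idea only, the sketch below runs plain Newton iterations from several starting points, each of which is independent and could therefore run in parallel. It is not the thesis's root-MUSIC-specific algorithm: the example polynomial `COEFFS`, the starting points `STARTS`, the iteration count, and all function names are assumptions chosen for illustration.

```python
def horner(coeffs, z):
    """Evaluate a polynomial and its derivative at z.

    `coeffs` lists coefficients from the highest degree down to the constant.
    """
    p = 0j
    dp = 0j
    for c in coeffs:
        dp = dp * z + p  # derivative accumulates the previous value of p
        p = p * z + c
    return p, dp


def newton_root(coeffs, z0, iterations=50):
    """Run a fixed number of Newton steps z <- z - p(z)/p'(z) from z0."""
    z = complex(z0)
    for _ in range(iterations):
        p, dp = horner(coeffs, z)
        if dp == 0:
            break  # derivative vanished; stop rather than divide by zero
        z = z - p / dp
    return z


# Hypothetical example: z^3 - 1, whose roots are the three cube roots of unity.
COEFFS = [1, 0, 0, -1]
# Assumed starting guesses, one near each root; each run is independent.
STARTS = [1.2 + 0.1j, -0.6 + 0.9j, -0.6 - 0.9j]

roots = [newton_root(COEFFS, z0) for z0 in STARTS]
for z in roots:
    print(z)
```

A fixed iteration count, rather than a data-dependent stopping test, gives each run the same predictable latency, which suits pipelined hardware. How good starting points are chosen for root-MUSIC polynomials is specific to the thesis and is not reproduced here.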

    Structural Damping by Layers of Fibrous Media Applied to a Periodically-Constrained Vibrating Panel

    It has recently been demonstrated that layers of fibrous, acoustical material can effectively damp structural vibration in the sub-critical frequency range. In that frequency range, the acoustical near-field of a panel consists of oscillatory flow oriented primarily parallel with the panel surface. When a fibrous layer occupies that region, energy is dissipated by the viscous interaction of the near-field and the fibrous medium, and the result is a damping of the panel motion. Previously, the damping effect has been demonstrated to occur for line-driven, infinite panels and panels with isolated constraints. In this article, the focus is instead on periodically-constrained panels driven into motion by a convective pressure distribution. The constraints are allowed to have translational and rotational inertias and stiffnesses. This arrangement is intended to represent a very simple model of an aircraft fuselage structure. By considering the power flows in this system, it is possible to compute an equivalent loss factor, and then to identify the fibrous layer macroscopic parameters that result in optimal damping at a given mass per unit area. Finally, given that information, it is possible to identify the microstructural details, e.g., fiber size, that would be required to achieve that damping in practice