338 research outputs found

    Automatic Generation of Efficient Linear Algebra Programs

    The level of abstraction at which application experts reason about linear algebra computations and the level of abstraction used by developers of high-performance numerical linear algebra libraries do not match. The former is conveniently captured by high-level languages and libraries such as Matlab and Eigen, while the latter expresses the kernels included in the BLAS and LAPACK libraries. Unfortunately, translating a high-level computation into an efficient sequence of kernels is a far from trivial task that requires extensive knowledge of both linear algebra and high-performance computing. Internally, almost all high-level languages and libraries use efficient kernels; however, the translation algorithms are too simplistic and thus lead to a suboptimal use of said kernels, with significant performance losses. In order to both achieve the productivity that comes with high-level languages and make use of the efficiency of low-level kernels, we are developing Linnea, a code generator for linear algebra problems. As input, Linnea takes a high-level description of a linear algebra problem and produces as output an efficient sequence of calls to high-performance kernels. In 25 application problems, the code generated by Linnea always outperforms Matlab, Julia, Eigen and Armadillo, with speedups up to and exceeding 10x
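
One way to see why the translation from expression to kernels matters is to count floating-point operations for two evaluation orders of the same product. The sketch below is only an illustration and is not Linnea's algorithm; the function names and the 2*m*k*n cost model are assumptions made for this example.

```python
def matmul_flops(m, k, n):
    """Approximate FLOP count of an (m x k) times (k x n) product (assumed model: 2*m*k*n)."""
    return 2 * m * k * n


def flops_left_to_right(n):
    """Cost of (A @ B) @ v for n x n matrices A, B and an n-vector v."""
    # One matrix-matrix product followed by one matrix-vector product.
    return matmul_flops(n, n, n) + matmul_flops(n, n, 1)


def flops_right_to_left(n):
    """Cost of A @ (B @ v): two matrix-vector products, no matrix-matrix product."""
    return matmul_flops(n, n, 1) + matmul_flops(n, n, 1)


if __name__ == "__main__":
    n = 1000
    print("left to right:", flops_left_to_right(n))
    print("right to left:", flops_right_to_left(n))
```

Under this cost model, evaluating the chain right to left avoids the cubic-cost matrix-matrix product entirely, which is the kind of choice a naive left-to-right translation can miss.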

    Translating parameter estimation problems from EASY-FIT to SOCS

    Mathematical models often involve unknown parameters that must be fit to experimental data. These so-called parameter estimation problems have many applications that may involve differential equations, optimization, and control theory. EASY-FIT and SOCS are two software packages that solve parameter estimation problems. In this thesis, we discuss the design and implementation of a source-to-source translator called EFtoSOCS used to translate EASY-FIT input into SOCS input. This makes it possible to test SOCS on a large number of parameter estimation problems available in the EASY-FIT problem database that vary both in size and difficulty. Parameter estimation problems typically have many locally optimal solutions, and the solution obtained often depends critically on the initial guess for the solution. A 3-stage approach is followed to enhance the convergence of solutions in SOCS. The stages are designed to use an initial guess that is progressively closer to the optimal solution found by EASY-FIT. Using this approach we run EFtoSOCS on all translatable problems (691) from the EASY-FIT database. We find that all but 7 problems produce converged solutions in SOCS. We describe the reasons that SOCS was not able to solve these problems, compare the solutions found by SOCS and EASY-FIT, and suggest possible improvements to both EFtoSOCS and SOCS

    Search-based Model-driven Loop Optimizations for Tensor Contractions

    Complex tensor contraction expressions arise in accurate electronic structure models in quantum chemistry, such as the coupled cluster method. The Tensor Contraction Engine (TCE) is a high-level program synthesis system that facilitates the generation of high-performance parallel programs from tensor contraction equations. We are developing a new software infrastructure for the TCE that is designed to allow experimentation with optimization algorithms for modern computing platforms, including for heterogeneous architectures employing general-purpose graphics processing units (GPGPUs). In this dissertation, we present improvements and extensions to the loop fusion optimization algorithm, which can be used with cost models, e.g., for minimizing memory usage or for minimizing data movement costs under a memory constraint. We show that our data structure and pruning improvements to the loop fusion algorithm result in significant performance improvements that enable complex cost models to be used for large input equations. We also present an algorithm for optimizing the fused loop structure of handwritten code. It determines the regions in handwritten code that are safe to be optimized and then runs the loop fusion algorithm on the dependency graph of the code. Finally, we develop an optimization framework for generating GPGPU code consisting of loop fusion optimization with a novel cost model, tiling optimization, and layout optimization. Depending on the memory available on the GPGPU and the sizes of the tensors, our framework decides which processor (CPU or GPGPU) should perform an operation and where the result should be moved. We present extensive measurements for tuning the loop fusion algorithm, for validating our optimization framework, and for measuring the performance characteristics of GPGPUs. Our measurements demonstrate that our optimization framework outperforms existing general-purpose optimization approaches both on multi-core CPUs and on GPGPUs
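
The memory effect of loop fusion mentioned in this abstract can be shown on a toy two-step contraction. This is a minimal sketch under assumed names (`unfused`, `fused`) and an assumed example expression, y = sum_j (sum_i A[i][j] * x[i]) * c[j]; it is not the TCE's fusion algorithm or cost model.

```python
def unfused(A, x, c):
    """Two separate loop nests: the intermediate T is stored as a full array."""
    n_rows, n_cols = len(A), len(A[0])
    T = [0.0] * n_cols                      # intermediate of size n_cols
    for j in range(n_cols):
        for i in range(n_rows):
            T[j] += A[i][j] * x[i]
    y = 0.0
    for j in range(n_cols):
        y += T[j] * c[j]
    return y


def fused(A, x, c):
    """The shared j loop is fused, so the intermediate shrinks to one scalar."""
    n_rows, n_cols = len(A), len(A[0])
    y = 0.0
    for j in range(n_cols):
        t = 0.0                             # scalar replaces the array T
        for i in range(n_rows):
            t += A[i][j] * x[i]
        y += t * c[j]
    return y


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    print(unfused(A, [1.0, 1.0], [1.0, 2.0]), fused(A, [1.0, 1.0], [1.0, 2.0]))
```

Both versions compute the same value; the fused one trades an array of intermediates for a single scalar, which is the kind of saving a memory-minimizing cost model would reward.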

    Image stitching algorithm based on feature extraction

    This paper proposes a novel edge-based stitching method to detect moving objects and construct mosaics from images. The method is a coarse-to-fine scheme which first estimates a good initialization of camera parameters with two complementary methods and then refines the solution through an optimization process. The two complementary methods are the edge alignment and correspondence-based approaches, respectively. The edge alignment method estimates desired image translations by checking the consistencies of edge positions between images. This method has better capabilities to overcome larger displacements and lighting variations between images. The correspondence-based approach estimates desired parameters from a set of correspondences by using a new feature extraction scheme and a new correspondence building method. The method can solve more general camera motions than the edge alignment method. Since these two methods are complementary to each other, the desired initial estimate can be obtained more robustly. A Monte-Carlo style method is then proposed for integrating these two methods together. In this approach, a grid partition scheme is proposed to increase the accuracy of each try for finding the correct parameters. After that, an optimization process is applied to refine the above initial parameters. Different from other optimization methods minimizing errors on the whole images, the proposed scheme minimizes errors only on positions of feature points. Since the found initialization is very close to the exact solution and only errors on feature positions are considered, the optimization process can be achieved very quickly. Experimental results are provided to verify the superiority of the proposed method

    ISCR Annual Report: Fiscal Year 2004


    A Numerical Investigation of a Spark Ignition Opposed Piston Linear Engine Fueled by Hydrogen

    The traditional Slider-Crank Engine, also known as the Internal Combustion Engine (ICE), has been criticized for its complex structure, friction loss, low efficiency and high maintenance cost. In contrast, the Free-Piston Linear Engine (FPLE) reduces this friction due to its simpler design. With rising concerns about air quality and stricter regulations, there's a renewed interest in hydrogen as a carbon-free fuel for ICEs. The Opposed Piston Linear Generator (OPLG) system is an integrated arrangement of parts operating harmoniously to generate power efficiently. By leveraging synchronized pistons, accurate fuel distribution, and seamless thermodynamic cycles, it transforms kinetic energy into electrical energy. Dynamic control ensures its operations are both efficient and eco-friendly. However, despite the potential benefits of the OPLG, the intricacies of its operations have posed several challenges, such as misfiring, overfueling, instantaneous transient changes, stalling, and piston control, which can reduce its efficiency and increase emission levels. This dissertation deployed suitable correlational, analytical, numerical and control models to better illustrate the performance of the OPLG as against its limitations. The model introduces a non-dimensional streamlined symmetric analytical solution, succeeded by a broader general analytical resolution, with careful attention to the significant influence of thermodynamic effects at each phase, offering insights into the engine's dynamic performance. The Runge-Kutta technique guarantees swift and dependable computational outcomes, which capture cyclic variation as typical ICE compression ratios are reproduced. The study showed a nearly linear interaction between thermal efficiency and the translator's starting position within specific ranges for OPLE. 
However, maintaining the engine within a narrow high-efficiency band requires precise control, crucial for harnessing the full potential of the OPLG system without compromising its performance. Precise control of TDC piston clearance is crucial for optimizing combustion efficiency and load management. The Model Predictive Control (MPC) algorithm forecasts the pistons' future positions by considering current control inputs and system behaviors. Concurrently, the closed-loop bisection method observer refines the piston position estimates by comparing actual and projected outputs. This algorithm is specifically chosen for its dichotomy feature in finding the roots of equations, which is essential for confining the pistons' locus within the line of symmetry. This method plays a significant role in refining the control strategy, making it more responsive and accurate in adjusting to the engine's dynamic needs and operational changes. Comparative studies are conducted between the Opposed Piston Linear Engine (OPLE) and the slider-crank engine, fueled by hydrogen. This comparison incorporates hydrogen metering and its interaction with nitrogen oxides. The engine's characteristics include a volumetric compression ratio of 12 and a fuel equivalence ratio of 0.5, so chosen because hydrogen engines typically require nearly double the air amount for complete combustion. This lean mixture is essential as it leads to combustion temperatures below the threshold for thermal nitric oxide (NOx) formation. Consequently, NOx emissions are virtually non-existent, underscoring a notable environmental benefit of FPLE with its unique combustion characteristics compared to the equivalent slider-crank engines. Simulations highlight OPLE's potential superiorities in engine dynamics, performance, and emission patterns
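
The "dichotomy feature" referred to in this abstract is the interval-halving step of the bisection method. The following is a generic textbook sketch of bisection root finding, not the dissertation's closed-loop observer; the function name `bisect_root`, the tolerance, and the example equation are assumptions for illustration.

```python
def bisect_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of f in [lo, hi] by repeatedly halving the bracketing interval."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        # Stop on an exact root or once the half-interval is below tolerance.
        if f_mid == 0 or 0.5 * (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                # sign change lies in the lower half
        else:
            lo, f_lo = mid, f_mid   # sign change lies in the upper half
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Example: the positive root of x^2 - 2 lies between 0 and 2.
    print(bisect_root(lambda x: x * x - 2.0, 0.0, 2.0))
```

Each iteration halves the interval that must contain the root, so the error bound shrinks by a factor of two per step regardless of the shape of the function, provided it is continuous and changes sign on the bracket.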

    Interactive supercomputing

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (leaves 92-96). By Parry Jones Reginald Husbands. Ph.D.

    Compiling dataflow graphs into hardware

    Department Head: L. Darrell Whitley. 2005 Fall. Includes bibliographical references (pages 121-126). Conventional computers are programmed by supplying a sequence of instructions that perform the desired task. A reconfigurable processor is "programmed" by specifying the interconnections between hardware components, thereby creating a "hardwired" system to do the particular task. For some applications such as image processing, reconfigurable processors can produce dramatic execution speedups. However, programming a reconfigurable processor is essentially a hardware design discipline, making programming difficult for application programmers who are only familiar with software design techniques. To bridge this gap, a programming language, called SA-C (Single Assignment C, pronounced "sassy"), has been designed for programming reconfigurable processors. The process involves two main steps - first, the SA-C compiler analyzes the input source code and produces a hardware-independent intermediate representation of the program, called a dataflow graph (DFG). Secondly, this DFG is combined with hardware-specific information to create the final configuration. This dissertation describes the design and implementation of a system that performs the DFG to hardware translation. The DFG is broken up into three sections: the data generators, the inner loop body, and the data collectors. The second of these, the inner loop body, is used to create a computational structure that is unique for each program. The other two sections are implemented by using prebuilt modules, parameterized for the particular problem. Finally, a "glue module" is created to connect the various pieces into a complete interconnection specification. The dissertation also explores optimizations that can be applied while processing the DFG, to improve performance. 
A technique for pipelining the inner loop body is described that uses an estimation tool for the propagation delay of the nodes within the dataflow graph. A scheme is also described that identifies subgraphs within the dataflow graph that can be replaced with lookup tables. The lookup tables provide a faster implementation than random logic in some instances
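
A software analogy may help picture the lookup-table replacement described above: a small pure function of a narrow input is precomputed once and then evaluated by indexing. This is only an illustrative sketch; `small_subgraph` is a hypothetical stand-in for a cluster of dataflow nodes and does not come from the SA-C compiler.

```python
def small_subgraph(x):
    """Hypothetical pure function of an 8-bit input, standing in for a few logic nodes."""
    return ((x * 37) ^ (x >> 3)) & 0xFF


# Precompute every possible output once; 256 entries cover the whole 8-bit input domain.
LOOKUP_TABLE = [small_subgraph(x) for x in range(256)]


def small_subgraph_lut(x):
    """Same result as small_subgraph, obtained by a single table read."""
    return LOOKUP_TABLE[x & 0xFF]


if __name__ == "__main__":
    print(all(small_subgraph_lut(x) == small_subgraph(x) for x in range(256)))
```

The replacement is only viable when the input width is small, since the table grows exponentially with the number of input bits.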