75 research outputs found

    Efficient Compilation of a Class of Variational Forms

    We investigate the compilation of general multilinear variational forms over affine simplices and prove a representation theorem expressing the element tensor (element stiffness matrix) as the contraction of a constant reference tensor and a geometry tensor that accounts for geometry and variable coefficients. Based on this representation theorem, we design an algorithm for efficient pretabulation of the reference tensor. The new algorithm has been implemented in the FEniCS Form Compiler (FFC) and improves on a previous loop-based implementation by several orders of magnitude, thus shortening compile times and development cycles for users of FFC. Comment: ACM Transactions on Mathematical Software 33(3), 20 pages (2007).
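As a minimal sketch of the representation described above, the following assumes the simplest case: the Poisson form with piecewise-linear basis functions on a triangle. That choice, and the names `reference_tensor`, `geometry_tensor`, and `element_tensor`, are illustrative assumptions and are not taken from the abstract or from FFC. The reference tensor is computed once; only the small geometry tensor depends on the element.

```python
# Gradients of the three P1 basis functions on the reference triangle
# with vertices (0,0), (1,0), (0,1). These are constants.
GRADS = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]
REF_AREA = 0.5  # area of the reference triangle


def reference_tensor():
    """A0[i][j][a][b] = integral over the reference cell of dphi_i/dX_a * dphi_j/dX_b."""
    return [[[[REF_AREA * GRADS[i][a] * GRADS[j][b] for b in range(2)]
              for a in range(2)]
             for j in range(3)]
            for i in range(3)]


def geometry_tensor(vertices):
    """G[a][b] = |det J| * sum_c dX_a/dx_c * dX_b/dx_c for an affine triangle."""
    (x0, y0), (x1, y1), (x2, y2) = vertices
    # Jacobian of the affine map from the reference cell to the physical cell.
    j00, j01 = x1 - x0, x2 - x0
    j10, j11 = y1 - y0, y2 - y0
    det = j00 * j11 - j01 * j10
    inv = [[j11 / det, -j01 / det], [-j10 / det, j00 / det]]
    return [[abs(det) * sum(inv[a][c] * inv[b][c] for c in range(2))
             for b in range(2)]
            for a in range(2)]


def element_tensor(A0, G):
    """Contract the reference tensor with the geometry tensor over (a, b)."""
    return [[sum(A0[i][j][a][b] * G[a][b] for a in range(2) for b in range(2))
             for j in range(3)]
            for i in range(3)]


A0 = reference_tensor()  # pretabulated once, reused for every element
K = element_tensor(A0, geometry_tensor([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))
```

In this sketch the per-element work reduces to forming a 2x2 geometry tensor and one small contraction, which is the structure the abstract attributes to the representation theorem.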

    Topological Optimization of the Evaluation of Finite Element Matrices

    We present a topological framework for finding low-flop algorithms for evaluating element stiffness matrices associated with multilinear forms for finite element methods posed over straight-sided affine domains. This framework relies on phrasing the computation on each element as the contraction of each of a collection of reference element tensors with an element-specific geometric tensor. We then present a new concept of complexity-reducing relations that serve as distance relations between these reference element tensors. This notion sets up a graph-theoretic context in which we may find an optimized algorithm by computing a minimum spanning tree. We present experimental results for some common multilinear forms showing significant reductions in operation count, and we also discuss some efficient algorithms for building the graph used for the optimization.
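The graph-theoretic idea above can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the distance used here is the Hamming distance between flattened reference tensors (one plausible complexity-reducing relation, assumed for the example), and the tree is built with a simple form of Prim's algorithm. The function names and the sample vectors are hypothetical.

```python
def hamming(u, v):
    """Number of positions where two equal-length vectors differ."""
    return sum(1 for a, b in zip(u, v) if a != b)


def minimum_spanning_tree(vectors):
    """Return MST edges as (distance, parent, child) using Prim's algorithm.

    An edge (d, i, j) is read as: the contraction for vector j can reuse
    the result for vector i and correct only the d entries that differ.
    """
    n = len(vectors)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = min(
            (hamming(vectors[i], vectors[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append(best)
        in_tree.add(best[2])
    return edges


# Hypothetical flattened reference tensors.
tensors = [(1, 0, 2, 0), (1, 0, 2, 3), (1, 1, 2, 3), (4, 5, 6, 7)]
tree = minimum_spanning_tree(tensors)
total_cost = sum(d for d, _, _ in tree)
```

Under this assumed cost model, the total edge weight of the tree approximates the number of multiply-add pairs needed beyond the first contraction, so a lighter tree corresponds to a cheaper evaluation order.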

    Automated code generation for discontinuous Galerkin methods

    A compiler approach for generating low-level computer code from high-level input for discontinuous Galerkin finite element forms is presented. The input language mirrors conventional mathematical notation, and the compiler generates efficient code in a standard programming language. This facilitates the rapid generation of efficient code for general equations in varying spatial dimensions. Key concepts underlying the compiler approach and the automated generation of computer code are elaborated. The approach is demonstrated for a range of common problems, including the Poisson, biharmonic, advection-diffusion and Stokes equations.

    Reflections on fiscalist divergent price-paths

    In this paper I analyze the classes of price-paths arising from a non-Ricardian fiscal-monetary plan along the lines of the Fiscal Theory of the Price Level (FTPL), under a price-invariant nominal money supply rule in a standard Sidrauski-Brock model. I first show that fiscalist speculative deflationary paths are irrational bubbles. Then I argue that a fully autonomous fiscal policy is, in most cases, not implementable, regardless of the time horizon, thus complementing Buiter's (2001, 2002) findings. Finally, I claim that, contrary to the FTPL's arguments, a speculative hyperinflation can never be a necessary result. This latter observation is taken as evidence against the analogy drawn between the equilibrium value of a firm's stock and money, as recently suggested by some proponents of this new paradigm in monetary economics. [author's abstract]

    Optimizing Protein Expression of an Elastin-Like Polypeptide-Green Fluorescent Protein Fusion in Bacterial Systems

    The ability to express and purify large quantities of recombinant protein allows for biotechnological applications such as protein characterization, use in industrial processes, the development of commercial goods, and other advanced studies. In the classical methods of protein expression, cells are grown to the mid-log phase and are induced via a simple sugar or a chemical agent. Post-induction, the energy used for cellular growth is redirected to the production of ELPs. Samples are collected and analyzed for cell count and harvested for fluorescence spectroscopy and protein purification. Expression of an elastin-like polypeptide and green fluorescent protein fusion (ELP-GFP fusion protein) is optimized by varying the organism, expression system, media formulation, antibiotic resistance, and selective inducer. We investigate which organism and expression system, after optimizing the conditions, yields the maximum amount of ELP-GFP fusion protein, quantified by fluorescence and by absorbance at 280 nm. The bacterial host BL21 (DE3) gave optimal expression of ELP-GFP when induced with lactose and galactose. The yields were between 400 and 600 mg/L of culture, much higher than when expressed in LB media and induced with IPTG. The Vmax strain has a doubling time of less than 10 minutes; however, it is not optimized for protein expression. For a different ELP-GFP fusion protein, the protein yields after purification were between 200 and 300 mg/L for BL21 (DE3) and between 100 and 200 mg/L for Vmax-derived cells. The lac and the ara operons do not behave the same way during protein expression. The lac system suffers from “leaky” expression, whereas the ara system is more tightly regulated. Fluorescence was used as a reliable means to understand the behavior of the ELP-GFP proteins. The fluorescence data revealed that the lac operon is more stable for longer fermentation times, whereas the ara operon is more suitable for shorter fermentation times.

    Appendix I: Drafting legislation for development: lessons from a Chinese project

    A discussion of the theoretical issues regarding development legislation that divide economists and lawyers.

    Optimizing group-by and aggregation using GPU-CPU co-processing

    While GPU query processing is a well-studied area, real adoption is limited in practice because GPU execution is typically only significantly faster than CPU execution if the data resides in GPU memory, which limits scalability to small-data scenarios where performance tends to be less critical. Another problem is that not all query code (e.g. UDFs) will realistically be able to run on GPUs. We therefore investigate CPU-GPU co-processing, where both the CPU and GPU are involved in evaluating the query in scenarios where the data does not fit in the GPU memory. As we wish to deeply explore opportunities for optimizing execution speed, we narrow our focus further to a specific well-studied OLAP scenario, amenable to such co-processing, in the form of the TPC-H benchmark Query 1. For this query, and at large scale factors, we are able to improve performance significantly over the state of the art for GPU implementations; we present competitive performance of a GPU versus a state-of-the-art multi-core CPU baseline, a novelty for data exceeding GPU memory size; and finally, we show that co-processing does provide significant additional speedup over any of the processors individually. We achieve this performance improvement by utilizing parallelism-friendly compression to alleviate the PCIe transfer bottleneck, query-compilation-like fusion of the processing operations, and a simple yet effective scheduling mechanism. We hope that some of these features can inspire future work on GPU-focused and heterogeneous analytic DBMSes.
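The group-by and aggregation pattern behind this kind of co-processing can be sketched as a partial-aggregate-and-merge scheme. The example below is a CPU-only illustration under assumptions: the data is split into chunks that could in principle be handed to different processors, each chunk is aggregated independently, and the partial results are merged. The column layout is a simplified stand-in for the grouping in TPC-H Query 1, and the function names and sample rows are hypothetical; none of this reproduces the paper's compression, fusion, or scheduling.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Aggregate one chunk: key -> [sum_qty, sum_price, count]."""
    acc = defaultdict(lambda: [0, 0, 0])
    for flag, status, qty, price in rows:
        slot = acc[(flag, status)]
        slot[0] += qty
        slot[1] += price
        slot[2] += 1
    return acc


def merge(partials):
    """Combine per-chunk aggregates; sums and counts merge by addition."""
    total = defaultdict(lambda: [0, 0, 0])
    for part in partials:
        for key, (qty, price, count) in part.items():
            slot = total[key]
            slot[0] += qty
            slot[1] += price
            slot[2] += count
    return {key: tuple(value) for key, value in total.items()}


def grouped_aggregate(rows, chunk_size):
    """Split rows into chunks, aggregate each, then merge the results."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    return merge(partial_aggregate(chunk) for chunk in chunks)


# Hypothetical rows: (returnflag, linestatus, quantity, price_in_cents).
rows = [
    ("A", "F", 10, 1000),
    ("N", "O", 5, 700),
    ("A", "F", 3, 450),
    ("R", "F", 8, 1200),
    ("N", "O", 2, 300),
]
result = grouped_aggregate(rows, chunk_size=2)
```

Because the aggregates are sums and counts, the merged result does not depend on how the rows are chunked, which is what makes it possible to divide the work between processors.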

    The Matrix Reloaded: Multiplication Strategies in FrodoKEM

    Lattice-based schemes are promising candidates to replace the current public-key cryptographic infrastructure in the wake of the looming threat of quantum computers. One of the Round 3 candidates of the ongoing NIST post-quantum standardization effort is FrodoKEM. It was designed to provide conservative security, which comes with the caveat that implementations are often bigger and slower compared to alternative schemes. In particular, the most time-consuming arithmetic operation of FrodoKEM is the multiplication of matrices with entries in Z_q. In this work, we investigate the performance of different matrix multiplication approaches in the specific setting of FrodoKEM. We consider both optimized “naïve” matrix multiplication with cubic complexity and the Strassen multiplication algorithm, which has a lower asymptotic run-time complexity. Our results show that for the proposed parameter sets of FrodoKEM we can improve over the state-of-the-art implementation with a row-wise blocking and packing approach, denoted as RWCF in the following. For the matrix multiplication in FrodoKEM, this results in a factor-two speed-up. The impact of these improvements on the full decapsulation operation is up to 22 percent. We additionally show that for batching use-cases, where many inputs are processed at once, the Strassen approach can be the best choice from batch size 8 upwards. For a practically relevant batch size of 128 inputs, the observed speed-up is in the range of 5 to 11 percent over using the efficient RWCF approach, and this speed-up grows with the batch size.
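To make the two approaches concrete, here is a minimal pure-Python sketch of cubic-time multiplication and Strassen multiplication for square matrices with entries in Z_q. It is not the RWCF implementation or any code from the paper: the modulus `q = 2**15`, the 8x8 size, the `cutoff` parameter, and all function names are assumptions chosen for illustration, and the Strassen routine only recurses while the dimension is even.

```python
def matmul_naive(A, B, q):
    """Cubic-time product of A and B with entries reduced mod q."""
    inner = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) % q
             for j in range(len(B[0]))]
            for i in range(len(A))]


def _add(A, B, q):
    return [[(a + b) % q for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def _sub(A, B, q):
    return [[(a - b) % q for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def _split(M):
    """Split a square matrix of even size into four quadrants."""
    h = len(M) // 2
    top, bottom = M[:h], M[h:]
    return ([r[:h] for r in top], [r[h:] for r in top],
            [r[:h] for r in bottom], [r[h:] for r in bottom])


def matmul_strassen(A, B, q, cutoff=2):
    """Strassen product mod q; falls back to the naive product on small or odd sizes."""
    n = len(A)
    if n <= cutoff or n % 2:
        return matmul_naive(A, B, q)
    a11, a12, a21, a22 = _split(A)
    b11, b12, b21, b22 = _split(B)

    def mul(X, Y):
        return matmul_strassen(X, Y, q, cutoff)

    # Seven recursive products instead of eight.
    m1 = mul(_add(a11, a22, q), _add(b11, b22, q))
    m2 = mul(_add(a21, a22, q), b11)
    m3 = mul(a11, _sub(b12, b22, q))
    m4 = mul(a22, _sub(b21, b11, q))
    m5 = mul(_add(a11, a12, q), b22)
    m6 = mul(_sub(a21, a11, q), _add(b11, b12, q))
    m7 = mul(_sub(a12, a22, q), _add(b21, b22, q))

    c11 = _add(_sub(_add(m1, m4, q), m5, q), m7, q)
    c12 = _add(m3, m5, q)
    c21 = _add(m2, m4, q)
    c22 = _add(_add(_sub(m1, m2, q), m3, q), m6, q)
    # Reassemble the four quadrants into one matrix.
    return ([l + r for l, r in zip(c11, c12)] +
            [l + r for l, r in zip(c21, c22)])


q = 2 ** 15  # illustrative modulus, assumed for this example
n = 8
A = [[(31 * i + 17 * j + 3) % q for j in range(n)] for i in range(n)]
B = [[(7 * i * j + 11 * i + 5) % q for j in range(n)] for i in range(n)]
C_naive = matmul_naive(A, B, q)
C_strassen = matmul_strassen(A, B, q)
```

Both routines return the same product; Strassen trades one of the eight block multiplications at each level for extra additions, which is why its advantage tends to appear only for larger or batched inputs.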