7,524 research outputs found

    Undergraduate Catalog of Studies, 2023-2024

    Get PDF

    Graduate Catalog of Studies, 2023-2024

    Get PDF

    Undergraduate Catalog of Studies, 2023-2024

    Get PDF

    Faster inference from state space models via GPU computing

    Get PDF
    Funding: C.F.-J. is funded via a doctoral scholarship from the University of St Andrews, School of Mathematics and Statistics.Inexpensive Graphics Processing Units (GPUs) offer the potential to greatly speed up computation by employing their massively parallel architecture to perform arithmetic operations more efficiently. Population dynamics models are important tools in ecology and conservation. Modern Bayesian approaches allow biologically realistic models to be constructed and fitted to multiple data sources in an integrated modelling framework based on a class of statistical models called state space models. However, model fitting is often slow, requiring hours to weeks of computation. We demonstrate the benefits of GPU computing using a model for the population dynamics of British grey seals, fitted with a particle Markov chain Monte Carlo algorithm. Speed-ups of two orders of magnitude were obtained for estimations of the log-likelihood, compared to a traditional ‘CPU-only’ implementation, allowing for an accurate method of inference to be used where this was previously too computationally expensive to be viable. GPU computing has enormous potential, but one barrier to further adoption is a steep learning curve, due to GPUs' unique hardware architecture. We provide a detailed description of hardware and software setup, and our case study provides a template for other similar applications. We also provide a detailed tutorial-style description of GPU hardware architectures, and examples of important GPU-specific programming practices.Publisher PDFPeer reviewe

    On-board Payload Data Processing Combined with the Roofline Model for Hardware/Software Design

    No full text
    International audienceHigh-performance on-board payload data processing has become more interesting with the development of radiationhardened multiprocessor System-on-chip (MPSoC). As recent space-qualified MPSoCs include Arm Central Processing Units (CPUs) and Field Programmable Gate Arrays (FPGAs), an efficient design method is required to deal with complex heterogeneous embedded systems. Both data bit-width (data accuracy) and processing performance are important in astronomy, thus the design methodology should concern application-specific Multi-Objective Optimization Problems (MOOPs). This paper proposes to combine the roofline performance model with Design Space Exploration (DSE) of hardware/software designs as a methodology. We use High-Level Synthesis (HLS) for FPGA design to configure different hardware architectures based on C/C++ and pragmas. We develop a benchmark for payload data processing on Arm CPUs and embedded FPGA on a heterogeneous MPSoC by adapting open-source libraries for one of the most commonly used algorithms to provide validated libraries to payload teams. The benchmark takes as constraints the SVOM ECLAIRs payload requirements, and as input data the CCSDS test images, executes applications, and verifies output data. We chose an AMD-Xilinx Zynq UltraScale+ evaluation board and the two-Dimensional Fast Fourier Transform (2-D FFT) as a DSE use case. We designed the benchmark on an Arm Cortex-A53 in bare-metal and an embedded FPGA based on Vitis HLS. The results show a customized roofline model with the hardware/software design. The implemented design has 1.6-55 times faster performance compared to the payload execution time requirement. Based on the proposed roofline model and the DSE results, future payload teams can study the trade-off between execution time and area efficiency to select the most suitable implementation

    Graduate Catalog of Studies, 2023-2024

    Get PDF

    GPU-optimized approaches to molecular docking-based virtual screening in drug discovery: A comparative analysis

    Get PDF
    Finding a novel drug is a very long and complex procedure. Using computer simulations, it is possible to accelerate the preliminary phases by performing a virtual screening that filters a large set of drug candidates to a manageable number. This paper presents the implementations and comparative analysis of two GPU-optimized implementations of a virtual screening algorithm targeting novel GPU architectures. This work focuses on the analysis of parallel computation patterns and their mapping onto the target architecture. The first method adopts a traditional approach that spreads the computation for a single molecule across the entire GPU. The second uses a novel batched approach that exploits the parallel architecture of the GPU to evaluate more molecules in parallel. Experimental results showed a different behavior depending on the size of the database to be screened, either reaching a performance plateau sooner or having a more extended initial transient period to achieve a higher throughput (up to 5x), which is more suitable for extreme-scale virtual screening campaigns

    Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems

    Get PDF
    This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems

    UMSL Bulletin 2022-2023

    Get PDF
    The 2022-2023 Bulletin and Course Catalog for the University of Missouri St. Louis.https://irl.umsl.edu/bulletin/1087/thumbnail.jp
    • 

    corecore