452 research outputs found

    Reducing energy usage in resource-intensive Java-based scientific applications via micro-benchmark based code refactorings

    In-silico research has grown considerably. Today's scientific code involves long-running computer simulations, and hence powerful computing infrastructures are needed. Traditionally, research in high-performance computing has focused on executing code as fast as possible, while energy has only recently been recognized as another goal to consider. Yet energy-driven research has mostly targeted the hardware and middleware layers; few efforts address the application level, where many energy-aware optimizations are possible. We revisit a catalog of Java primitives commonly used in OO scientific programming, or micro-benchmarks, to identify energy-friendly versions of the same primitive. We then apply the micro-benchmarks to classical scientific application kernels and machine learning algorithms, for both single-thread and multi-thread implementations, on a server. Energy usage reductions at the micro-benchmark level are substantial, while the reductions obtained for the applications range from 3.90% to 99.18%.
    Fil: Longo, Mathias. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina. University of Southern California; Estados Unidos
    Fil: Rodriguez, Ana Virginia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
    Fil: Mateos Diaz, Cristian Maximiliano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
    Fil: Zunino Suarez, Alejandro Octavio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
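    The abstract does not reproduce the micro-benchmark catalog itself; the sketch below is only an illustrative example of the kind of primitive-level comparison such a catalog contains (string concatenation in a loop vs. StringBuilder, with a plain timing harness). The specific primitives and the use of wall-clock time as a proxy are assumptions for illustration; the study measures energy rather than time.

```java
// Illustrative micro-benchmark sketch (not taken from the paper): two functionally
// equivalent Java primitives compared under the same workload.
public class ConcatBenchmark {
    static final int N = 20_000;

    static String withConcat() {
        String s = "";
        for (int i = 0; i < N; i++) {
            s += i;                      // allocates a new String on every iteration
        }
        return s;
    }

    static String withBuilder() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < N; i++) {
            sb.append(i);                // amortized O(1) append, far fewer allocations
        }
        return sb.toString();
    }

    static long millis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Warm up the JIT before measuring, as a micro-benchmark harness would.
        withConcat();
        withBuilder();
        System.out.println("concat  : " + millis(ConcatBenchmark::withConcat) + " ms");
        System.out.println("builder : " + millis(ConcatBenchmark::withBuilder) + " ms");
    }
}
```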

    BIBO stability robustness in the presence of coprime factor perturbations

    Cover title. Includes bibliographical references (leaf 8). Research supported by the Center for Intelligent Control Systems under an Army Research Office grant (DAAL03-86-K-0171) and by the NSF (8810178-ECS). M.A. Dahleh

    Excited-State Electronic Structure with Configuration Interaction Singles and Tamm–Dancoff Time-Dependent Density Functional Theory on Graphical Processing Units

    Excited-state calculations are implemented in a development version of the GPU-based TeraChem software package using the configuration interaction singles (CIS) and adiabatic linear response Tamm–Dancoff time-dependent density functional theory (TDA-TDDFT) methods. The speedup of the CIS and TDDFT methods using GPU-based electron repulsion integrals and density functional quadrature integration allows full ab initio excited-state calculations on molecules of unprecedented size. CIS/6-31G and TD-BLYP/6-31G benchmark timings are presented for a range of systems, including four generations of oligothiophene dendrimers, photoactive yellow protein (PYP), and the PYP chromophore solvated with 900 quantum mechanical water molecules. The effects of double and single precision integration are discussed, and mixed precision GPU integration is shown to give extremely good numerical accuracy for both CIS and TDDFT excitation energies (excitation energies within 0.0005 eV of extended double precision CPU results)
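    As background (standard textbook working equations in a spin-orbital basis, stated here as context rather than quoted from the paper), both methods solve the same Hermitian eigenvalue problem and differ only in the coupling term fed by the GPU-accelerated repulsion integrals and quadrature:

\[
\mathbf{A}\,\mathbf{X} = \omega\,\mathbf{X}, \qquad
A^{\mathrm{CIS}}_{ia,jb} = \delta_{ij}\delta_{ab}\,(\varepsilon_a - \varepsilon_i) + (ia|jb) - (ij|ab), \qquad
A^{\mathrm{TDA}}_{ia,jb} = \delta_{ij}\delta_{ab}\,(\varepsilon_a - \varepsilon_i) + (ia|jb) + (ia|f_{\mathrm{xc}}|jb),
\]

    where \(i, j\) index occupied and \(a, b\) virtual orbitals, \(\varepsilon\) are orbital energies, \((ia|jb)\) are electron repulsion integrals, and \(f_{\mathrm{xc}}\) is the exchange-correlation kernel (the TDA form shown is for a pure functional such as BLYP).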

    Design and Analysis of a Small Form Factor Desktop Computer Enclosure

    A thesis presented to the faculty of the College of Science and Technology at Morehead State University in partial fulfillment of the requirements for the Degree of Master of Science in Engineering Technology by Curt Adkins on March 13, 2013.

    Compiling an Array Language to a Graphics Processor

    Graphics processors are significantly faster than traditional processors, particularly for numerical code, and in recent years have become flexible enough to permit general-purpose use, rather than just graphics use. NVIDIA's CUDA makes general-purpose graphics processor computing feasible, but it still requires significant programmer effort. My thesis is that array programming can be an effective way to program graphics processors, and that a restricted, functionally pure array language coupled with simple optimizations can have performance competitive with handwritten GPU programs. I support this thesis through the research language Barracuda, an array language embedded within Haskell that generates optimized CUDA code.

    On the application of graphics processor to wireless receiver design

    In many 4G and beyond wireless standards, a Turbo decoder is combined with a soft-output multiple-input multiple-output (MIMO) detector at the receiver to maximize performance. Although custom application-specific designs are usually used to meet this challenge, programmable graphics processing units (GPUs) have become an alternative to traditional ASIC and FPGA solutions for wireless applications. However, careful architecture-aware algorithm design and mapping are required to maximize the performance of a communication block on a GPU. For MIMO soft detection, we implemented a new soft detection algorithm, multi-pass trellis traversal (MTT). For Turbo decoding, we used a parallel window algorithm. We showed that our implementations can achieve high throughput while maintaining good performance. This work will allow us to implement a complete iterative MIMO receiver in software on a GPU in the future.
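    For context only (the standard max-log approximation for soft-output detection; an assumption about the general setting, not a quotation from this work), a soft-output MIMO detector approximates per-bit log-likelihood ratios of the form

\[
L(b_k \mid \mathbf{y}) \;\approx\; \frac{1}{N_0}\Big( \min_{\mathbf{s} \in \mathcal{S}_{k,0}} \lVert \mathbf{y} - \mathbf{H}\mathbf{s} \rVert^2 \;-\; \min_{\mathbf{s} \in \mathcal{S}_{k,1}} \lVert \mathbf{y} - \mathbf{H}\mathbf{s} \rVert^2 \Big),
\]

    where \(\mathbf{y}\) is the received vector, \(\mathbf{H}\) the channel matrix, \(N_0\) the noise variance, and \(\mathcal{S}_{k,0}\), \(\mathcal{S}_{k,1}\) the candidate symbol vectors whose \(k\)-th bit is 0 or 1. Trellis-based schemes such as MTT restrict these candidate sets so that the minimizations map well onto many parallel GPU threads.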

    On the Real-Time Performance, Robustness and Accuracy of Medical Image Non-Rigid Registration

    Three critical issues in medical image non-rigid registration are performance, robustness, and accuracy. A registration method that responds timely with an accurate alignment, and that is robust against variation in image intensity and missing data, is desirable for clinical use. This work addresses all three of these issues.

    Unacceptable execution time of non-rigid registration (NRR) often presents a major obstacle to its routine clinical use. We present a hybrid data partitioning method to parallelize an NRR method on a cooperative architecture, which gets us closer to the goal of accelerating through the architecture rather than designing a parallel algorithm from scratch. To further accelerate the GPU part, a GPU optimization tool is provided to automatically optimize the GPU execution configuration.

    Missing data and variation of intensity are two severe challenges for the robustness of a registration method. A novel point-based NRR method is presented to resolve the mapping function (deformation field) when point correspondences are missing. The novelty of this method lies in incorporating a finite element biomechanical model into an Expectation-Maximization (EM) framework to resolve the correspondence and the mapping function simultaneously. This method is extended to deal with the deformation induced by tumor resection, which imposes another challenge, i.e., incomplete intra-operative MRI. The registration is formulated as a three-variable (correspondence, deformation field, and resection region) functional minimization problem and solved by a nested Expectation-Maximization framework. The experimental results show the effectiveness of this method in correcting the deformation in the vicinity of the tumor.

    To deal with the variation of intensity, two different methods are developed depending on the specific application. For mono-modality registration of delayed-enhanced cardiac MRI and cine MRI, a hybrid registration method is designed by unifying both intensity- and feature-point-based metrics into one cost function. The experiment on the moving propagation of suspicious myocardial infarction shows the effectiveness of this hybrid method. For multi-modality registration of MRI and CT, a Mutual Information (MI)-based NRR is developed by modeling the underlying deformation as a Free-Form Deformation (FFD). MI is sensitive to variation in intensity because of its equidistant bins. We overcome this disadvantage by designing a top-to-down K-means clustering method to naturally group similar intensities into one bin. The experiment shows this method can increase the accuracy of MI-based registration.

    In image registration, a finite element biomechanical model is usually employed to simulate the underlying movement of the soft tissue. We develop a multi-tissue mesh generation method to build a heterogeneous biomechanical model that realistically simulates the underlying movement of the brain. We focus on four critical mesh properties: tissue-dependent resolution, fidelity to tissue boundaries, smoothness of mesh surfaces, and element quality. Each mesh property can be controlled at the tissue level. Experiments comparing the homogeneous model with the heterogeneous model demonstrate the effectiveness of the heterogeneous model in improving registration accuracy.
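    The clustering-based binning idea can be made concrete with a small sketch (illustrative only, not the dissertation's implementation; the cluster centers are assumed to be supplied by a separate k-means step): mutual information is computed from a joint histogram whose bins are defined by nearest cluster center rather than by equal-width intervals.

```java
// Illustrative sketch: mutual information between two images whose intensities are
// binned by nearest cluster center (e.g., centers from a k-means run) instead of
// equidistant intervals. Hypothetical helper, not the dissertation's code.
public class ClusteredMI {

    // Assign an intensity to the index of its nearest cluster center.
    static int bin(double v, double[] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (Math.abs(v - centers[c]) < Math.abs(v - centers[best])) best = c;
        }
        return best;
    }

    // MI(A;B) = sum_ij p(i,j) * log( p(i,j) / (p(i) p(j)) ) over the joint histogram.
    static double mutualInformation(double[] imgA, double[] imgB,
                                    double[] centersA, double[] centersB) {
        int kA = centersA.length, kB = centersB.length;
        double[][] joint = new double[kA][kB];
        double n = imgA.length;                      // assumes imgA and imgB have equal length
        for (int p = 0; p < imgA.length; p++) {
            joint[bin(imgA[p], centersA)][bin(imgB[p], centersB)] += 1.0 / n;
        }
        double[] pA = new double[kA], pB = new double[kB];
        for (int i = 0; i < kA; i++)
            for (int j = 0; j < kB; j++) { pA[i] += joint[i][j]; pB[j] += joint[i][j]; }
        double mi = 0.0;
        for (int i = 0; i < kA; i++)
            for (int j = 0; j < kB; j++)
                if (joint[i][j] > 0) mi += joint[i][j] * Math.log(joint[i][j] / (pA[i] * pB[j]));
        return mi;
    }

    public static void main(String[] args) {
        double[] a = {0.1, 0.9, 0.85, 0.12, 0.95};   // toy intensities, image A
        double[] b = {10, 200, 190, 15, 210};        // toy intensities, image B
        double[] ca = {0.1, 0.9};                    // cluster centers for image A
        double[] cb = {12, 200};                     // cluster centers for image B
        System.out.println("MI = " + mutualInformation(a, b, ca, cb));
    }
}
```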

    The Daily Egyptian, December 02, 1975


    Mutable Objects, Spatial Manipulation and Performance Optimization

    Contemporary digital design techniques are powerful, but disjoint. There are myriad emerging ways of manipulating design components, and generating both functional forms and formal functions. With the combination of selective agglomeration, sequencing, and heuristics, it is possible to use these techniques to focus on optimizing performance criteria, and selecting for defined characteristics. With these techniques, complex, performance-oriented systems can emerge, with minimal input and high effectiveness and efficiency. These processes depend on iterative loops for stability and directionality, and are the basis for optimization and refinement. They begin to approach cybernetic principles of self-organization and equilibrium. By rapidly looping this process, design ‘attractors’ (shared solution components) become visible and accessible. In the past, we have been dedicated to selecting the contents of the design space. With these tools, we can now ask: what are the inputs to the design process, what is the continuum or spectrum of design inputs, and what are the selection criteria for the success of a design aspect? These new questions allow for a greater coherence within a particular cognitive model for the designed and desired object. There are ways of using optimization criteria that enable design freedom within these boundaries, while enforcing constraints and maintaining consistency for selected processes and product aspects. The identification and codification of new rules for the process support both flexibility and the potential for cognitive restructuring of the process and sequences of design.
