10 research outputs found

    On the Tradeoff between Speedup and Energy Consumption in High Performance Computing – A Bioinformatics Case Study

    Get PDF
    High Performance Computing has been very useful to researchers in the Bioinformatics, Medical and related fields. The bioinformatics domain is rich in applications that require extracting useful information from very large and continuously growing sequence of databases. Automated techniques such as DNA sequencers, DNA microarrays & others are continually growing the dataset that is stored in large public databases such as GenBank and Protein DataBank. Most methods used for analyzing genetic/protein data have been found to be extremely computationally intensive, providing motivation for the use of powerful computers or systems with high throughput characteristics. In this paper, we provide a case study for one such bioinformatics application called BLAT running in a high performance computing environment. We use sequences gathered from researchers and parallelize the runs to study the performance characteristics under three different query and data partitioning models. This research highlights the need to carefully develop a parallel model with energy awareness in mind, based on our understanding of the application and then appropriately designing a parallel model that works well for the specific application and domain. We found that the BLAT program is highly parallelizable and a high degree of speedup is achievable. The experiments suggest that the speed up depends on model used for query and database segmentation

    On the Tradeoff between Speedup and Energy Consumption in High Performance Computing – A Bioinformatics Case Study

    Get PDF
    High Performance Computing has been very useful to researchers in the Bioinformatics, Medical and related fields. The bioinformatics domain is rich in applications that require extracting useful information from very large and continuously growing sequence of databases. Automated techniques such as DNA sequencers, DNA microarrays & others are continually growing the dataset that is stored in large public databases such as GenBank and Protein DataBank. Most methods used for analyzing genetic/protein data have been found to be extremely computationally intensive, providing motivation for the use of powerful computers or systems with high throughput characteristics. In this paper, we provide a case study for one such bioinformatics application called BLAT running in a high performance computing environment. We use sequences gathered from researchers and parallelize the runs to study the performance characteristics under three different query and data partitioning models. This research highlights the need to carefully develop a parallel model with energy awareness in mind, based on our understanding of the application and then appropriately designing a parallel model that works well for the specific application and domain. We found that the BLAT program is highly parallelizable and a high degree of speedup is achievable. The experiments suggest that the speed up depends on model used for query and database segmentation

    Amdahl's law for predicting the future of multicores considered harmful

    Get PDF
    Several recent works predict the future of multicore systems or identify scalability bottlenecks based on Amdahl's law. Amdahl's law implicitly assumes, however, that the problem size stays constant, but in most cases more cores are used to solve larger and more complex problems. There is a related law known as Gustafson's law which assumes that runtime, not the problem size, is constant. In other words, it is assumed that the runtime on p cores is the same as the runtime on 1 core and that the parallel part of an application scales linearly with the number of cores. We apply Gustafson's law to symmetric, asymmetric, and dynamic multicores and show that this leads to fundamentally different results than when Amdahl's law is applied. We also generalize Amdahl's and Gustafson's law and study how this quantitatively effects the dimensioning of future multicore systems

    Energy Awareness and Scheduling in Mobile Devices and High End Computing

    Get PDF
    In the context of the big picture as energy demands rise due to growing economies and growing populations, there will be greater emphasis on sustainable supply, conservation, and efficient usage of this vital resource. Even at a smaller level, the need for minimizing energy consumption continues to be compelling in embedded, mobile, and server systems such as handheld devices, robots, spaceships, laptops, cluster servers, sensors, etc. This is due to the direct impact of constrained energy sources such as battery size and weight, as well as cooling expenses in cluster-based systems to reduce heat dissipation. Energy management therefore plays a paramount role in not only hardware design but also in user-application, middleware and operating system design. At a higher level Datacenters are sprouting everywhere due to the exponential growth of Big Data in every aspect of human life, the buzz word these days is Cloud computing. This dissertation, focuses on techniques, specifically algorithmic ones to scale down energy needs whenever the system performance can be relaxed. We examine the significance and relevance of this research and develop a methodology to study this phenomenon. Specifically, the research will study energy-aware resource reservations algorithms to satisfy both performance needs and energy constraints. Many energy management schemes focus on a single resource that is dedicated to real-time or nonreal-time processing. Unfortunately, in many practical systems the combination of hard and soft real-time periodic tasks, a-periodic real-time tasks, interactive tasks and batch tasks must be supported. Each task may also require access to multiple resources. Therefore, this research will tackle the NP-hard problem of providing timely and simultaneous access to multiple resources by the use of practical abstractions and near optimal heuristics aided by cooperative scheduling. We provide an elegant EAS model which works across the spectrum which uses a run-profile based approach to scheduling. We apply this model to significant applications such as BLAT and Assembly of gene sequences in the Bioinformatics domain. We also provide a simulation for extending this model to cloud computing to answers “what if” scenario questions for consumers and operators of cloud resources to help answers questions of deadlines, single v/s distributed cluster use and impact analysis of energy-index and availability against revenue and ROI

    Amdahl's Reliability Law: A Simple Quantification of the Weakest-Link Phenomenon

    Full text link

    Assessment of Two Task Frameworks with Dependencies for Matrix Factorizations on a Multicore Architecture

    Get PDF
    In this study, we evaluate two task frameworks with dependencies for important application kernels coming from the numerical linear algebra. In this approach, the algorithms of the matrix factorization are considered, namely the tiled LU and the WZ factorizations both without pivoting. In tiled algorithms, the operations are represented as a sequence of small tasks which operate on square blocks (tiles) of the data. The dependencies among tasks are expressed as a direct acyclic graph and the runtime system runs the graph on a multicore architecture. The performance of applications based on the task dependencies is related to efficient compilers and the runtime systems. We report the performance and the scalability of two task frameworks with dependencies on the multicore architecture for the matrix factorizations. Namely, we compare OpenMP and Intel Thread Building Blocks. Our results show that the number of tiles in both factorizations always have an impact on the performance and the speedup. Both the frameworks show their suitability for efficient parallelization of such applications, although both have their own merits and flaws

    Vector coprocessor sharing techniques for multicores: performance and energy gains

    Get PDF
    Vector Processors (VPs) created the breakthroughs needed for the emergence of computational science many years ago. All commercial computing architectures on the market today contain some form of vector or SIMD processing. Many high-performance and embedded applications, often dealing with streams of data, cannot efficiently utilize dedicated vector processors for various reasons: limited percentage of sustained vector code due to substantial flow control; inherent small parallelism or the frequent involvement of operating system tasks; varying vector length across applications or within a single application; data dependencies within short sequences of instructions, a problem further exacerbated without loop unrolling or other compiler optimization techniques. Additionally, existing rigid SIMD architectures cannot tolerate efficiently dynamic application environments with many cores that may require the runtime adjustment of assigned vector resources in order to operate at desired energy/performance levels. To simultaneously alleviate these drawbacks of rigid lane-based VP architectures, while also releasing on-chip real estate for other important design choices, the first part of this research proposes three architectural contexts for the implementation of a shared vector coprocessor in multicore processors. Sharing an expensive resource among multiple cores increases the efficiency of the functional units and the overall system throughput. The second part of the dissertation regards the evaluation and characterization of the three proposed shared vector architectures from the performance and power perspectives on an FPGA (Field-Programmable Gate Array) prototype. The third part of this work introduces performance and power estimation models based on observations deduced from the experimental results. The results show the opportunity to adaptively adjust the number of vector lanes assigned to individual cores or processing threads in order to minimize various energy-performance metrics on modern vector- capable multicore processors that run applications with dynamic workloads. Therefore, the fourth part of this research focuses on the development of a fine-to-coarse grain power management technique and a relevant adaptive hardware/software infrastructure which dynamically adjusts the assigned VP resources (number of vector lanes) in order to minimize the energy consumption for applications with dynamic workloads. In order to remove the inherent limitations imposed by FPGA technologies, the fifth part of this work consists of implementing an ASIC (Application Specific Integrated Circuit) version of the shared VP towards precise performance-energy studies involving high- performance vector processing in multicore environments
    corecore