Search CORE

High performance low-energy buildings

Author: Cheung Chun K.
Luther Mark B.
Publication venue: Australian and New Zealand Solar Energy Society
Publication date: 01/01/2003

The era of legislation and creditable methods towards producing sustainable buildings is upon us. Yet a major barrier to achieving environmentally responsive design is the lack of available information at the programming or pre-design phases of a project. Climate, as well as energy-efficient strategies, can be difficult to review and evaluate at these preliminary stages. Until recently, introducing energy simulation tools at the design stage has been difficult, and next to impossible at the pre-design or programming stage. However, analysis of this sort is essential to ‘green building rating’ or performance assessment schemes such as LEED (Leadership in Energy and Environmental Design) and BREEAM (Building Research Establishment Environmental Assessment Method). This paper discusses the implementation of a particular tool, ENERGY-10, in which ‘basecase’ building defaults are compared to a low-energy case to which multiple energy-efficient strategies have been applied automatically. An annual hour-by-hour simulation performs a daylighting calculation followed by a thermal evaluation. The results report energy consumption, peak-load equipment sizing, a RANK feature that ranks the energy-efficient strategies, CO2, SO2 and NOx reductions, and the optimum glazing type, along with excellent graphical output. Consideration is given to how such information can be introduced into the building project brief to enforce a low-energy performance target.
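The RANK feature is described only at a high level. As a rough illustration, one plausible way to rank strategies is to simulate the basecase with each strategy applied on its own and sort by annual energy saved. The sketch below assumes exactly that; the function name `rank_strategies` and all kWh figures are hypothetical and are neither ENERGY-10 output nor its actual ranking method.

```python
def rank_strategies(basecase_kwh, strategy_kwh):
    """Rank strategies by annual energy saved relative to the basecase.

    Hypothetical helper, not part of ENERGY-10.
    basecase_kwh: annual energy use of the basecase building (kWh).
    strategy_kwh: dict mapping a strategy name to the annual energy use of the
        basecase with only that strategy applied (kWh).
    Returns a list of (name, savings_kwh, savings_pct), largest savings first.
    """
    ranked = []
    for name, kwh in strategy_kwh.items():
        savings = basecase_kwh - kwh
        ranked.append((name, savings, 100.0 * savings / basecase_kwh))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


# Made-up simulation results for illustration only.
basecase = 120_000.0
single_strategy_runs = {
    "daylighting": 104_000.0,
    "insulation": 110_000.0,
    "glazing": 113_000.0,
    "shading": 117_500.0,
}
for name, saved, pct in rank_strategies(basecase, single_strategy_runs):
    print(f"{name:12s} {saved:9.0f} kWh saved ({pct:4.1f}%)")
```

Ranking strategies one at a time ignores interactions between them (for example, better glazing changes how much daylighting helps), which is one reason a combined low-energy case is also simulated.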

Deakin Research Online

Code Generation for Efficient Query Processing in Managed Runtimes

Author: Bierman Gavin M.
Nagel Fabian
Viglas Stratis D.
Publication date: 01/01/2014

In this paper we examine opportunities arising from the convergence of two trends in data management: in-memory database systems (IMDBs), which have received renewed attention following the availability of affordable, very large main memory systems; and language-integrated query, which transparently integrates database queries with programming languages (thus addressing the famous ‘impedance mismatch’ problem). Language-integrated query not only gives application developers a more convenient way to query external data sources like IMDBs, but also lets them use the same query language to query an application’s in-memory collections. The latter offers further transparency to developers, as the query language and all data are represented in the data model of the host programming language. However, compared to IMDBs, this additional freedom comes at a higher cost for query evaluation. Our vision is to improve in-memory query processing of application objects by introducing database technologies to managed runtimes. We focus on querying and leverage query compilation to improve query processing on application objects. We explore different query compilation strategies and study how they improve the performance of query processing over application data. We take C# as the host programming language, as it supports language-integrated query through the LINQ framework. Our techniques deliver significant performance improvements over the default LINQ implementation. Our work makes important first steps towards a future where data processing applications will commonly run on machines that can store their entire datasets in memory, and will be written in a single programming language employing language-integrated query and IMDB-inspired runtimes to provide transparent and highly efficient querying.
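Query compilation can take several forms. One common strategy, sketched here in Python rather than C# and not necessarily one of the strategies the authors evaluate, is closure compilation: the query's expression tree is translated once into nested closures, so per-row evaluation no longer re-dispatches on node types the way a naive interpreter does. The classes `Col`, `Const`, `BinOp` and the functions `interpret`/`compile_expr` are illustrative names, not LINQ APIs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union

Row = Dict[str, Any]


@dataclass
class Col:
    name: str


@dataclass
class Const:
    value: Any


@dataclass
class BinOp:
    op: str  # one of the keys of OPS
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Const, BinOp]

OPS: Dict[str, Callable[[Any, Any], Any]] = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    ">": lambda a, b: a > b,
    "==": lambda a, b: a == b,
}


def interpret(expr: Expr, row: Row) -> Any:
    """Re-dispatch on the type of every tree node, for every row."""
    if isinstance(expr, Col):
        return row[expr.name]
    if isinstance(expr, Const):
        return expr.value
    return OPS[expr.op](interpret(expr.left, row), interpret(expr.right, row))


def compile_expr(expr: Expr) -> Callable[[Row], Any]:
    """Walk the tree once and return nested closures; evaluating a row no
    longer inspects node types or looks up operators."""
    if isinstance(expr, Col):
        name = expr.name
        return lambda row: row[name]
    if isinstance(expr, Const):
        value = expr.value
        return lambda row: value
    fn = OPS[expr.op]
    left, right = compile_expr(expr.left), compile_expr(expr.right)
    return lambda row: fn(left(row), right(row))


people = [{"name": "a", "age": 30}, {"name": "b", "age": 17}, {"name": "c", "age": 45}]
adult = BinOp(">", Col("age"), Const(18))  # roughly: where p.age > 18
is_adult = compile_expr(adult)             # compiled once, reused for every row
print([p["name"] for p in people if interpret(adult, p)])
print([p["name"] for p in people if is_adult(p)])
```

Both comprehensions select the same rows; the difference is that `compile_expr` pays the tree-walking cost once per query instead of once per row.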

CiteSeerX

Crossref

Edinburgh Research Explorer

Unbalanced tree search on a manycore system using the GPI programming model

Author: Abreu Salvador
Lojewski Carsten
Machado Rui
Pfreundt Franz-Josef
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2011

Recent developments in computer architecture are progressing towards systems with large core counts (Manycore), which expose more parallelism to applications. Some applications, known as irregular and unbalanced applications, require a dynamic and asynchronous load-balancing implementation to use the full performance of a Manycore system. For example, the recently established Graph500 benchmark targets such applications. The UTS benchmark characterizes the performance of such irregular and unbalanced computations with a tree-structured search space that requires continuous dynamic load balancing. GPI is a PGAS API that delivers the full performance of RDMA-enabled networks directly to the application. Its programming model focuses on the use of one-sided asynchronous communication, overlapping computation and communication. In this paper we address the dynamic load-balancing requirements of unbalanced applications using the GPI programming model. Using the UTS benchmark, we detail the implementation of a work-stealing algorithm using GPI and present the performance results. Our performance evaluation shows significant improvements compared with the optimized MPI version, with a maximum performance of 9.5 billion nodes per second on 3072 cores.
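The work-stealing idea behind the UTS implementation can be sketched on a single machine. The simulation below is a hypothetical, sequential model: it does not use GPI or RDMA, and the tree generator `children` is an assumption standing in for UTS's own tree definition. Each worker pops nodes depth-first from the bottom of its own deque, and an idle worker steals the oldest node from the top of a randomly chosen victim's deque.

```python
import random
from collections import deque
from typing import List, Tuple

Node = Tuple[int, int]  # (label, depth)


def children(label: int, depth: int, max_depth: int) -> List[int]:
    """Hypothetical unbalanced tree: each node's child count (0-3) is drawn from
    a generator seeded with its label, so the tree is irregular but reproducible."""
    if depth >= max_depth:
        return []
    k = 3 if depth == 0 else random.Random(label).randrange(4)
    return [label * 4 + i + 1 for i in range(k)]  # 4-ary numbering keeps labels unique


def sequential_count(max_depth: int) -> int:
    stack: List[Node] = [(0, 0)]
    count = 0
    while stack:
        label, depth = stack.pop()
        count += 1
        stack.extend((c, depth + 1) for c in children(label, depth, max_depth))
    return count


def work_stealing_count(workers: int, max_depth: int, seed: int = 0):
    """Round-robin simulation of work stealing: owners pop from the bottom of
    their own deque; an idle worker steals from the top of a random victim's."""
    deques = [deque() for _ in range(workers)]
    deques[0].append((0, 0))  # all work starts on worker 0
    visited = [0] * workers
    steals = 0
    rng = random.Random(seed)
    while any(deques):
        for w in range(workers):
            dq = deques[w]
            if not dq:
                victim = rng.randrange(workers)
                if victim == w or not deques[victim]:
                    continue  # failed steal attempt; retry next round
                dq.append(deques[victim].popleft())  # oldest node: likely a big subtree
                steals += 1
            label, depth = dq.pop()
            visited[w] += 1
            dq.extend((c, depth + 1) for c in children(label, depth, max_depth))
    return visited, steals


visited, steals = work_stealing_count(workers=4, max_depth=12)
print(visited, steals, sum(visited) == sequential_count(12))
```

Taking the oldest node when stealing is the usual design choice: nodes near the root tend to carry larger subtrees, so one steal moves more work. In a PGAS setting such as GPI, a steal would presumably involve one-sided communication with the victim rather than a local `popleft`.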

Fraunhofer-ePrints

Repositório Científico da Universidade de Évora

Galois: a system for parallel execution of irregular algorithms

Author: Nguyen Donald Do
Publication date: 04/09/2015

A programming model that allows users to program with high productivity and that produces high-performance executions has been a goal for decades. This dissertation makes progress towards this elusive goal by describing the design and implementation of the Galois system, a parallel programming model for shared-memory, multicore machines. Central to the design is the idea that the scheduling of a program can be decoupled from the core computational operator and data structures. However, efficient programs often require application-specific scheduling to achieve the best performance. To bridge this gap, an extensible and abstract scheduling policy language is proposed, which allows programmers to focus on selecting high-level scheduling policies while delegating the tedious task of implementing the policy to a scheduler synthesizer and runtime system. Implementations of deterministic and prioritized scheduling are also described. An evaluation on a well-studied benchmark suite reveals that factoring programs into operators, schedulers and data structures can produce significant performance improvements over unfactored approaches. Comparison of the Galois system with existing programming models for graph analytics shows significant performance improvements, often by orders of magnitude, due to (1) better support for the restrictive programming models of existing systems and (2) better support for more sophisticated algorithms and scheduling, which cannot be expressed in other systems.

Computer Science
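The decoupling of operator from schedule can be illustrated with single-source shortest paths. In this sketch, which is written in Python and does not use the Galois (C++) API, the operator inside `sssp` relaxes the outgoing edges of an active node, and the schedule is a pluggable worklist object; swapping `FifoWorklist` for `PriorityWorklist` changes the processing order but not the result. The class and function names are hypothetical.

```python
import heapq
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, int]]]


class FifoWorklist:
    """Schedule active nodes in arrival order (label-correcting style)."""

    def __init__(self):
        self._q = deque()

    def push(self, priority, node):
        self._q.append(node)  # priority is ignored

    def pop(self):
        return self._q.popleft()

    def __len__(self):
        return len(self._q)


class PriorityWorklist:
    """Schedule the active node with the smallest tentative distance first."""

    def __init__(self):
        self._heap = []
        self._tie = 0  # tie-breaker so nodes themselves are never compared

    def push(self, priority, node):
        heapq.heappush(self._heap, (priority, self._tie, node))
        self._tie += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def sssp(graph: Graph, source: str, worklist) -> Dict[str, float]:
    """The operator (edge relaxation) is fixed; the worklist decides the order."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    worklist.push(0, source)
    while len(worklist):
        u = worklist.pop()
        for v, weight in graph[u]:
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
                worklist.push(dist[v], v)  # v becomes active again
    return dist


g = {"a": [("b", 4), ("c", 1)], "b": [("d", 1)], "c": [("b", 2), ("d", 7)], "d": []}
print(sssp(g, "a", FifoWorklist()))
print(sssp(g, "a", PriorityWorklist()))
```

With non-negative weights both schedules converge to the same distances; they differ in how much redundant relaxation work they do on larger graphs, which is the kind of trade-off a scheduling policy language lets programmers select without rewriting the operator.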

Texas ScholarWorks

Storage-heterogeneity aware task-based programming models to optimize I/O intensive applications

Author: Badia Sala Rosa Maria
Ejarque Artigas Jorge
Elshazly Hatem Mohamed Abdelfattah Eid
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/2022

Task-based programming models have enabled the optimized execution of the computational workloads of applications. These programming models can take advantage of large-scale distributed infrastructures by allowing the parallel and distributed execution of applications in high-level work components called tasks. Nevertheless, in the era of Big Data and Exascale, the amount of data produced by modern scientific applications has already surpassed terabytes and is rapidly increasing. Hence, I/O performance has become the bottleneck to overcome in order to achieve further overall performance improvement. New storage technologies offer higher bandwidth and faster solutions than traditional Parallel File Systems (PFS). Such storage devices are deployed in modern-day infrastructures to boost I/O performance by offering a fast layer that absorbs the generated data. Therefore, any programming model targeting higher performance must manage this heterogeneity and take advantage of it to improve the I/O performance of applications. Towards this goal, we propose in this paper a set of programming model capabilities that we refer to as Storage-Heterogeneity Awareness. Such capabilities include: (i) abstracting the heterogeneity of storage systems, and (ii) optimizing I/O performance by supporting dedicated I/O schedulers and an automatic data flushing technique. The evaluation section of this paper presents the performance results of different applications on the MareNostrum CTE-Power heterogeneous storage cluster.
Our experiments demonstrate that a storage-heterogeneity aware programming model can achieve an I/O speedup of almost 5x and a 48% total time improvement compared to the reference PFS-based usage of the execution infrastructure.

This work is partially supported by the European Union through the Horizon 2020 research and innovation programme under contract 721865 (EXPERTISE Project), by the Spanish Government (PID2019-107255GB) and by the Generalitat de Catalunya (contract 2014-SGR-1051).

Peer Reviewed. Postprint (author's final draft).
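The automatic data flushing technique is not specified in detail here. A minimal sketch, assuming a synchronous, threshold-based policy, is a store whose writes land in a fast tier and whose oldest files are moved to the PFS once the fast tier exceeds its capacity. `TieredStore` and its methods are hypothetical; the paper's scheduler-driven, task-based mechanism may well differ.

```python
from collections import OrderedDict
from typing import Dict, List


class TieredStore:
    """Hypothetical two-tier store: writes land in a small fast tier (e.g. node-
    local NVMe); once its usage exceeds capacity, the oldest files go to the PFS."""

    def __init__(self, fast_capacity_bytes: int):
        self.fast_capacity = fast_capacity_bytes
        self.fast: "OrderedDict[str, bytes]" = OrderedDict()  # insertion order = age
        self.pfs: Dict[str, bytes] = {}
        self.flushed: List[str] = []  # flush log, oldest first

    def _fast_usage(self) -> int:
        return sum(len(data) for data in self.fast.values())

    def write(self, name: str, data: bytes) -> None:
        self.pfs.pop(name, None)   # the newest copy always lives in the fast tier
        self.fast.pop(name, None)  # re-inserting moves the file to the "newest" end
        self.fast[name] = data
        self._flush_if_needed()

    def _flush_if_needed(self) -> None:
        # Assumption: synchronous, threshold-based flushing. A real system would
        # more likely flush in the background, overlapped with computation.
        while self._fast_usage() > self.fast_capacity:
            name, data = self.fast.popitem(last=False)  # oldest file first
            self.pfs[name] = data
            self.flushed.append(name)

    def read(self, name: str) -> bytes:
        # Transparent to the caller: check the fast tier first, then the PFS.
        if name in self.fast:
            return self.fast[name]
        return self.pfs[name]


store = TieredStore(fast_capacity_bytes=10)
for name in ("a", "b", "c"):
    store.write(name, b"1234")  # 3 x 4 bytes exceeds the 10-byte fast tier
print(store.flushed, sorted(store.fast), store.read("a"))
```

The caller never needs to know which tier holds a file, which mirrors the "abstracting the heterogeneity of storage systems" capability; the flush policy is the part a dedicated I/O scheduler would tune.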

UPCommons. Portal del coneixement obert de la UPC

Functional pearl: a SQL to C compiler in 500 lines of code

Author: Jones N. D.
Rompf T.
Stonebraker M.
Svenningsson J.
Zukowski M.
Publication venue: Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming
Publication date: 29/08/2015

We present the design and implementation of a SQL query processor that outperforms existing database systems and is written in just about 500 lines of Scala code: a convincing case study that high-level functional programming can handily beat C for systems-level programming where the last drop of performance matters. The key enabler is a shift in perspective towards generative programming. The core of the query engine is an interpreter for relational algebra operations, written in Scala. Using the open-source LMS Framework (Lightweight Modular Staging), we turn this interpreter into a query compiler with very low effort. To do so, we capitalize on an old and widely known result from partial evaluation, the Futamura projections, which state that a program that can specialize an interpreter to any given input program is equivalent to a compiler. In this pearl, we discuss LMS programming patterns such as mixed-stage data structures (e.g. data records with static schema and dynamic field components) and techniques to generate low-level C code, including specialized data structures and data loading primitives.
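The interpreter-to-compiler step can be imitated by hand in Python. The sketch below is not LMS and does not generate C: `interpret` executes a relational-algebra plan directly, while `generate` follows the same case analysis but emits Python source specialized to one plan, which `compile_plan` loads with `exec`. LMS derives such a generator from the interpreter through staging types; here it is written manually, and all names are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List

OPS: Dict[str, Callable[[Any, Any], bool]] = {"==": operator.eq, "<": operator.lt, ">": operator.gt}


@dataclass
class Scan:
    table: str


@dataclass
class Filter:
    child: Any
    column: str
    op: str  # a key of OPS
    value: Any


@dataclass
class Project:
    child: Any
    columns: List[str]


def interpret(plan, db) -> Iterator[dict]:
    """Plain interpreter: inspects the plan while rows flow through it."""
    if isinstance(plan, Scan):
        yield from db[plan.table]
    elif isinstance(plan, Filter):
        for row in interpret(plan.child, db):
            if OPS[plan.op](row[plan.column], plan.value):
                yield row
    elif isinstance(plan, Project):
        for row in interpret(plan.child, db):
            yield {c: row[c] for c in plan.columns}


def generate(plan, consume, indent: int) -> List[str]:
    """Same case analysis, but emits source code instead of processing rows.
    `consume(expr, indent)` returns the code lines that handle one row expression."""
    if isinstance(plan, Scan):
        pad = "    " * indent
        return [f"{pad}for row in db[{plan.table!r}]:"] + consume("row", indent + 1)
    if isinstance(plan, Filter):
        def on_row(var, ind):
            cond = f"{var}[{plan.column!r}] {plan.op} {plan.value!r}"
            return ["    " * ind + f"if {cond}:"] + consume(var, ind + 1)
        return generate(plan.child, on_row, indent)
    if isinstance(plan, Project):
        def on_row(var, ind):
            fields = ", ".join(f"{c!r}: {var}[{c!r}]" for c in plan.columns)
            return consume("{" + fields + "}", ind)
        return generate(plan.child, on_row, indent)
    raise TypeError(f"unknown plan node: {plan!r}")


def compile_plan(plan):
    body = generate(plan, lambda var, ind: ["    " * ind + f"out.append({var})"], 1)
    source = "\n".join(["def query(db):", "    out = []", *body, "    return out"])
    namespace: Dict[str, Any] = {}
    exec(source, namespace)  # the generated code holds no reference to the plan objects
    return namespace["query"], source


db = {"emp": [{"name": "ann", "age": 41}, {"name": "bob", "age": 25}, {"name": "cy", "age": 33}]}
plan = Project(Filter(Scan("emp"), "age", ">", 30), ["name"])
query, source = compile_plan(plan)
print(source)
print(list(interpret(plan, db)) == query(db))
```

The printed source is a single loop with the filter inlined and no reference to the plan, which is the effect the first Futamura projection describes: the interpreter, specialized to a fixed program, becomes compiled code.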

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Apollo (Cambridge)

HeTM: Transactional Memory for Heterogeneous Systems

Author: Castro Daniel
Ilic Aleksandar
Khan Amin M.
Romano Paolo
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 02/09/2019

Modern heterogeneous computing architectures, which couple multi-core CPUs with discrete many-core GPUs (or other specialized hardware accelerators), enable unprecedented levels of peak performance and energy efficiency. Unfortunately, developing applications that can take full advantage of the potential of heterogeneous systems is a notoriously hard task. This work takes a step towards reducing the complexity of programming heterogeneous systems by introducing the abstraction of Heterogeneous Transactional Memory (HeTM). HeTM provides programmers with the illusion of a single memory region, shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with support for atomic transactions. Besides introducing the abstract semantics and programming model of HeTM, we present the design and evaluation of a concrete implementation of the proposed abstraction, which we named Speculative HeTM (SHeTM). SHeTM makes use of a novel design that leverages speculative techniques and aims at hiding the inherently large communication latency between CPUs and discrete GPUs and at minimizing inter-device synchronization overhead. SHeTM is based on a modular and extensible design that allows alternative TM implementations to be easily integrated on the CPU and GPU sides, giving the flexibility to adopt, on either side, the TM implementation (e.g., in hardware or software) that best fits the applications' workload and the architectural characteristics of the processing unit. We demonstrate the efficiency of SHeTM via an extensive quantitative study based both on synthetic benchmarks and on a port of a popular object caching system.

Comment: The current work was accepted at the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT'19).
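The speculative idea can be illustrated with a deliberately simplified model. In this hypothetical sketch, the CPU and GPU each run a batch of transactions on a private copy of the shared region, recording read and write sets; at synchronization, the GPU batch is accepted only if it touched nothing the CPU wrote. The CPU-wins conflict policy, the batch granularity and all names (`Tx`, `run_batch`, `synchronize`, `incr`) are assumptions, not SHeTM's actual protocol or API.

```python
from typing import Callable, Dict, List, Set, Tuple


class Tx:
    """Records the read and write sets of one transaction on a device-local replica."""

    def __init__(self, replica: List[int]):
        self.replica = replica
        self.reads: Set[int] = set()
        self.writes: Dict[int, int] = {}

    def read(self, i: int) -> int:
        self.reads.add(i)
        return self.writes.get(i, self.replica[i])  # read-your-own-writes

    def write(self, i: int, value: int) -> None:
        self.writes[i] = value


Body = Callable[[Tx], None]


def run_batch(replica: List[int], bodies: List[Body]) -> Tuple[Set[int], Set[int]]:
    """Run transactions one after another on a replica; return the batch's read/write sets."""
    reads: Set[int] = set()
    writes: Set[int] = set()
    for body in bodies:
        tx = Tx(replica)
        body(tx)
        for i, value in tx.writes.items():
            replica[i] = value  # commit locally
        reads |= tx.reads
        writes |= set(tx.writes)
    return reads, writes


def synchronize(shared: List[int], cpu: List[Body], gpu: List[Body]):
    """Both devices run speculatively on private copies; at the synchronization
    point the GPU batch is validated against the CPU's write set."""
    cpu_copy, gpu_copy = list(shared), list(shared)
    _, cpu_writes = run_batch(cpu_copy, cpu)
    gpu_reads, gpu_writes = run_batch(gpu_copy, gpu)
    if cpu_writes & (gpu_reads | gpu_writes):
        return cpu_copy, False  # assumption: on conflict the CPU batch wins, GPU work is discarded
    for i in gpu_writes:        # no overlap: serialize CPU batch before GPU batch and merge
        cpu_copy[i] = gpu_copy[i]
    return cpu_copy, True


def incr(i: int) -> Body:
    return lambda tx: tx.write(i, tx.read(i) + 1)


print(synchronize([0, 0, 0, 0], cpu=[incr(0)], gpu=[incr(2), incr(3)]))  # disjoint
print(synchronize([0, 0, 0, 0], cpu=[incr(0)], gpu=[incr(0)]))           # conflict
```

Validating whole batches rather than individual transactions is what lets the devices run without communicating between synchronization points; the price is that one conflicting access discards an entire speculative batch.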

arXiv.org e-Print Archive

Crossref