
    Source Code Classification for Energy Efficiency in Parallel Ultra Low-Power Microcontrollers

    The analysis of source code through machine learning techniques is an increasingly explored research topic, aiming at increasing smartness in the software toolchain so that modern architectures are exploited in the best possible way. In the case of low-power, parallel embedded architectures, this means finding the configuration, for instance in terms of the number of cores, that leads to minimum energy consumption. Depending on the kernel to be executed, the energy-optimal scaling configuration is not trivial to determine. While recent work has focused on general-purpose systems to learn and predict the best execution target in terms of the execution time of a snippet of code or kernel (e.g. offloading an OpenCL kernel to a multicore CPU or a GPU), in this work we focus on static compile-time features to assess whether they can be successfully used to predict the minimum-energy configuration on PULP, an ultra-low-power architecture featuring an on-chip cluster of RISC-V processors. Experiments show that using machine learning models on the source code to select the best energy scaling configuration automatically is viable and has the potential to be used in the context of automatic system configuration for energy minimisation.
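    A minimal sketch of the kind of workflow the abstract describes: train a classifier on static compile-time features to predict the energy-optimal number of cores. The feature set, model choice and toy numbers below are illustrative assumptions, not the authors' actual pipeline or data.

        # Hedged sketch only: hypothetical static features per kernel (e.g. load/store,
        # ALU, branch and SIMD instruction counts, loop depth); labels are the core
        # counts that minimised measured energy. Values are made up for illustration.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        X = np.array([
            [120, 340, 25,  0, 2],
            [ 40,  80, 10, 64, 1],
            [300, 150, 60,  0, 3],
            [ 90, 210, 15, 32, 2],
        ])
        y = np.array([8, 1, 4, 8])  # energy-optimal number of active cluster cores

        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X, y)
        # Predict the energy-optimal configuration for an unseen kernel's features.
        print(clf.predict(np.array([[150, 200, 30, 16, 2]])))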

    High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm

    We implement a master-slave parallel genetic algorithm (PGA) with a bespoke log-likelihood fitness function to identify emergent clusters within price evolutions. We use graphics processing units (GPUs) to implement the PGA and visualise the results using disjoint minimal spanning trees (MSTs). We demonstrate that our GPU PGA, implemented on a commercially available general-purpose GPU, is able to recover stock clusters at sub-second speed, based on a subset of stocks in the South African market. This represents a pragmatic choice for low-cost, scalable parallel computing and is significantly faster than a prototype serial implementation in an optimised C-based fourth-generation programming language, although the results are not directly comparable due to compiler differences. Combined with fast online intraday correlation matrix estimation from high-frequency data for cluster identification, the proposed implementation offers cost-effective, near-real-time risk assessment for financial practitioners. (Comment: 10 pages, 5 figures, 4 tables; more thorough discussion of implementation.)
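    The master-slave structure the abstract refers to can be sketched as follows, with CPU worker processes standing in for the GPU "slaves" and a placeholder objective standing in for the paper's bespoke log-likelihood fitness; the cluster count, population size and the fitness itself are assumptions for illustration only.

        # Toy master-slave PGA sketch: the master evolves chromosomes (cluster
        # assignments for each stock); workers evaluate fitness in parallel.
        import numpy as np
        from multiprocessing import Pool

        rng = np.random.default_rng(0)
        N_STOCKS, N_CLUSTERS, POP = 20, 4, 32
        corr = np.corrcoef(rng.standard_normal((N_STOCKS, 250)))  # toy correlations

        def fitness(chrom):
            # Placeholder objective: average intra-cluster correlation,
            # NOT the paper's log-likelihood fitness function.
            score = 0.0
            for c in range(N_CLUSTERS):
                idx = np.where(chrom == c)[0]
                if len(idx) > 1:
                    score += corr[np.ix_(idx, idx)].mean()
            return score

        def evolve(pop, scores):
            # Master step: binary tournament selection plus a single-gene mutation.
            new = []
            for _ in range(len(pop)):
                a, b = rng.integers(len(pop), size=2)
                child = (pop[a] if scores[a] >= scores[b] else pop[b]).copy()
                child[rng.integers(N_STOCKS)] = rng.integers(N_CLUSTERS)
                new.append(child)
            return new

        if __name__ == "__main__":
            pop = [rng.integers(N_CLUSTERS, size=N_STOCKS) for _ in range(POP)]
            with Pool() as workers:                     # "slaves" run in parallel
                for _ in range(50):
                    scores = workers.map(fitness, pop)  # parallel fitness evaluation
                    pop = evolve(pop, scores)
            print("best placeholder fitness:", max(map(fitness, pop)))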

    Density Functional Theory calculation on many-cores hybrid CPU-GPU architectures

    The implementation of a full electronic-structure calculation code on a hybrid parallel architecture with Graphics Processing Units (GPUs) is presented. The code on which our implementation is based is a GNU-GPL code built on Daubechies wavelets. It shows very good performance, systematic convergence properties and excellent efficiency on parallel computers. Our GPU-based acceleration fully preserves all these properties. In particular, the code is able to run on many cores which may or may not have a GPU associated with them. It is thus able to run in parallel and massively parallel hybrid environments, even with a non-homogeneous CPU/GPU ratio. With double-precision calculations, we may achieve considerable speedups, between a factor of 20 for some operations and a factor of 6 for the whole DFT code. (Comment: 14 pages, 8 figures.)
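    The gap between the factor-of-20 speedup on individual operations and the factor-of-6 speedup of the whole code is the usual Amdahl-style effect of the unaccelerated fraction; a back-of-the-envelope check (the accelerated fraction used below is an assumption for illustration, not a figure from the paper):

        # Amdahl's law: overall speedup = 1 / ((1 - f) + f / s), where f is the
        # fraction of runtime that is accelerated and s its local speedup.
        def overall_speedup(f, s):
            return 1.0 / ((1.0 - f) + f / s)

        # If roughly 88% of the runtime sits in operations sped up 20x (illustrative
        # numbers), the whole-code speedup comes out near the reported factor of 6.
        print(round(overall_speedup(0.88, 20.0), 1))  # ~6.1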

    Governance and Water Management: Progress and Tools in Mediterranean Countries

    This paper reviews the progress with respect to Integrated Water Resource Management (IWRM) in Mediterranean countries, as addressed within the activities of the Nostrum-Dss project, a Coordination Action funded by the 6th Framework Programme of the EC, with a particular emphasis on the current use of decision support system (DSS) tools. The IWRM paradigm is a comprehensive management framework which integrates the different aspects of water resources, from the underlying ecological and physical aspects to the socio-economic values and needs (horizontal integration); it also calls for increasing decentralisation and privatisation of water services (vertical integration) and the devolution of planning authority, without however forgetting the need to ensure equitable access to water resources. Substantial progress has been made in the last decades in Nostrum-Dss partner countries, although a disparity can still be seen between the Northern and Southern banks. New institutions have been established for implementing IWRM, existing institutions have been reformed, and decision-making processes increasingly require public participation. Decentralisation of decision making, implementation and monitoring is also well underway, although improvements are still needed to ensure that traditional power structures do not prevail. More efficient technologies and infrastructures are in place, especially for the production of high-value goods or in agriculture. Finally, several DSSs have been developed: yet, while operational/technical DSS instruments have been successfully employed, DSS tools developed in a participatory way, or tackling more complex political as well as environmental and economic problems, are still de-linked from actual decision-making processes. Laws and regulations for water management in most Mediterranean countries embrace and support the paradigms of IWRM, and EU framework directives have played an important role in fostering this shift from more traditional, vertical governance to new, horizontal governance based on soft laws. Yet the implementation of such laws and regulations is often only partial, frequently because of the lack of a clear monitoring and enforcement strategy, but also because of governments' financial and human resource constraints. Strong overlaps of roles and competences among different government institutions remain, hampering effective implementation of water management. The tendency towards centralisation of decision making persists, and actors' involvement is scanty. The shift from supply-side towards demand-side policies is not yet complete, even though supply-side policies are very costly, as they are based on greater mobilisation of financial resources. Full cost recovery pricing is not practiced widely: in developing countries this reluctance may be due to ethical and moral considerations, while in developed countries it is often associated with the strong lobbying power of interest groups. This study was supported by funding under the Sixth Research Framework of the European Union within the project "Network on Governance, Science and Technology for Sustainable Water Resource Management in the Mediterranean - The role of Dss tools" (NOSTRUM-Dss, contract number INCO-CT-2004-509158). Keywords: Integrated Water Resources Management, Decision Support Systems, Environmental Governance.

    The role and effectiveness of e-learning: key issues in an industrial context

    This paper identifies the current role and effectiveness of e-learning and its key issues in an industrial context. The first objective is to identify the role of e-learning, particularly in staff training and executive education, where e-learning (online, computer-based or videoconferencing learning) has made significant impacts and contributions to several organisations such as the Royal Bank of Scotland, Cisco and Cap Gemini Ernst & Young. With e-learning, staff training and executive education provide more benefits and better efficiency than traditional means. The second objective of this research is to understand the effectiveness of e-learning. This can be classified into two key issues: (1) methods of e-learning implementation; and (2) factors influencing effective and ineffective e-learning implementations. One learning point from (1) is that centralised e-learning implementations may prevail in big organisations; however, more organisations adopt decentralised e-learning implementations for various reasons, which are discussed in this paper. From the research results, the proposed approach is to retain the decentralised model. The second learning point concerns interactive learning (IL), the combination of e-learning and face-to-face learning. IL has been making contributions to several organisations, including increases in motivation, learning interest and efficiency. The main issues surrounding IL are (a) how to minimise its disadvantages and (b) the degree of interactivity that maximises learning efficiency. One learning point from (2) is to analyse the factors influencing effective and ineffective implementations, which reflect the different focuses of industrialists and academics. In terms of effective e-learning implementations, factors identified by both groups can be mapped to particular cases in industry. In contrast, factors causing ineffective implementations rely more on primary source data. In order to identify these factors and analyse the rationale behind them, case studies and interviews were used as the research methodology, matching the objective of the research.

    Graphics Processing Units: Abstract Modelling and Applications in Bioinformatics

    The Graphics Processing Unit is a specialised piece of hardware that contains many low-powered cores, available on both the consumer and industrial markets. The original Graphics Processing Units were designed for processing high-quality graphical images for presentation to the screen, and were therefore marketed to the computer-games market segment. More recently, frameworks such as CUDA and OpenCL have allowed the specialised, highly parallel architecture of the Graphics Processing Unit to be used not just for graphical operations but for general computation. This is known as General Purpose Programming on Graphics Processing Units, and it has attracted interest from the scientific community, which is looking for ways to exploit this highly parallel environment, cheaper and more accessible than traditional High Performance Computing platforms such as the supercomputer. This interest in developing algorithms that exploit the parallel architecture of the Graphics Processing Unit has highlighted the need for scientists to be able to analyse proposed algorithms, just as happens for proposed sequential algorithms. In this thesis, we study the abstract modelling of computation on the Graphics Processing Unit, and the application of Graphics Processing Unit-based algorithms in the field of bioinformatics, the field of using computational algorithms to solve biological problems. We show that existing abstract models for analysing parallel algorithms on the Graphics Processing Unit are not able to sufficiently and accurately model all that is required. We propose a new abstract model, called the Abstract Transferring Graphical Processing Unit Model, which is able to provide analysis of Graphics Processing Unit-based algorithms that is more accurate than existing abstract models. It does this by capturing the data transfer between the Central Processing Unit and the Graphics Processing Unit. We demonstrate the accuracy and applicability of our model on several computational problems, showing that it provides greater accuracy than the existing models, and verify these claims with experiments. We also contribute novel Graphics Processing Unit-based solutions to two bioinformatics problems, DNA sequence alignment and protein spectral identification, demonstrating promising levels of improvement over sequential Central Processing Unit experiments.
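    The thesis's Abstract Transferring Graphical Processing Unit Model is not reproduced here, but the key ingredient the abstract highlights, charging for CPU-GPU data transfer as well as compute, can be sketched as a simple cost function; the parameter names and placeholder bandwidth and throughput values below are assumptions, not the model's actual parameters.

        # Illustrative cost sketch: predicted time = host-to-device transfer
        # + kernel compute + device-to-host transfer.
        def predicted_time(bytes_in, bytes_out, work_items, ops_per_item,
                           bw_h2d=12e9, bw_d2h=12e9, gpu_throughput=5e12):
            transfer = bytes_in / bw_h2d + bytes_out / bw_d2h  # seconds spent copying
            compute = work_items * ops_per_item / gpu_throughput
            return transfer + compute

        # A light kernel over a large array: transfer dominates, which a
        # compute-only model would badly underestimate.
        n = 10_000_000
        print(predicted_time(bytes_in=4 * n, bytes_out=4 * n,
                             work_items=n, ops_per_item=10))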

    Lock Inference for Java

    Atomicity is an important property for concurrent software, as it provides a stronger guarantee against errors caused by unanticipated thread interactions than race-freedom does. However, concurrency control in general is tricky to get right because current techniques are too low-level and error-prone, and with the introduction of multicore processors the problems are compounded. Consequently, a new software abstraction is gaining popularity to take care of concurrency control and the enforcement of atomicity properties: atomic sections. One possible implementation of their semantics is to acquire a global lock upon entry to each atomic section, ensuring that they execute in mutual exclusion. However, this cripples concurrency, as non-interfering atomic sections cannot run in parallel. Transactional memory is another automated technique for providing atomicity, but it relies on the ability to roll back conflicting atomic sections and thus places restrictions on the use of irreversible operations, such as I/O and system calls, or serialises all sections that use such features. Therefore, from a language designer's point of view, the challenge is to implement atomic sections without compromising performance or expressivity. This thesis explores the technique of lock inference, which infers a set of locks for each atomic section while attempting to balance the requirements of maximal concurrency, minimal locking overhead and freedom from deadlock. We focus on lock-inference techniques for tackling large Java programs that make use of mature libraries. This improves upon existing work, which either (i) ignores libraries, (ii) requires library implementors to annotate which locks to take, or (iii) only considers accesses performed up to one level deep in library call chains; as a result, each of these prior approaches may result in atomicity violations. This is a problem because even simple uses of I/O in Java programs can involve large amounts of library code. Our approach is the first to analyse library methods in full and is thus able to soundly handle atomic sections involving complicated real-world side effects, while still permitting atomic sections to run concurrently in cases where their lock sets are disjoint. To validate our claims, we have implemented our techniques in Lockguard, a fully automatic tool that translates Java bytecode containing atomic sections into an equivalent program that uses locks instead. We show that our techniques scale well and that, despite protecting all library accesses, we obtain performance comparable to the original locking policy of our benchmarks.
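    The central idea, mapping each atomic section to an inferred set of locks and acquiring them so that sections with disjoint lock sets can run in parallel, can be illustrated with a small language-neutral sketch; the thesis itself targets Java bytecode via the Lockguard tool, and the object names and fixed acquisition order below are illustrative assumptions, not the thesis's algorithm.

        # Toy illustration of running atomic sections under inferred lock sets:
        # acquire the inferred locks in a canonical (sorted) order to avoid
        # deadlock, run the section body, then release.
        import threading

        locks = {name: threading.Lock() for name in ("account_a", "account_b", "log")}

        def run_atomic(inferred_accesses, body):
            acquired = [locks[name] for name in sorted(inferred_accesses)]
            for lock in acquired:
                lock.acquire()
            try:
                body()
            finally:
                for lock in reversed(acquired):
                    lock.release()

        # Sections with disjoint inferred lock sets (e.g. {account_a, log} vs
        # {account_b}) can run concurrently; overlapping sets serialise.
        t1 = threading.Thread(target=run_atomic, args=({"account_a", "log"}, lambda: None))
        t2 = threading.Thread(target=run_atomic, args=({"account_b"}, lambda: None))
        t1.start(); t2.start(); t1.join(); t2.join()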

    All-Pairs Shortest Path Algorithms Using CUDA

    Utilising graph theory is a common activity in computer science. Algorithms that perform computations on large graphs are not always cost-effective, requiring supercomputers to achieve results in a practical amount of time. Graphics Processing Units provide a cost-effective alternative to supercomputers, allowing parallel algorithms to be executed directly on the Graphics Processing Unit. Several algorithms exist to solve the All-Pairs Shortest Path problem on the Graphics Processing Unit, but it can be difficult to determine whether the claims made are true and to verify the results reported. This research asks: "Which All-Pairs Shortest Path algorithms solve the All-Pairs Shortest Path problem the fastest, and can the authors' claims be verified?" The results we obtain when answering this question show why it is important to collate existing work and analyse it on a common platform, so that fair results are observed on a single system. In this way, the research shows how effective each algorithm is at performing its task and suggests when one algorithm might be preferred over another.
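    For reference, the classic sequential baseline against which GPU All-Pairs Shortest Path implementations are usually compared is Floyd-Warshall; the sketch below is that plain CPU version on a tiny assumed graph, not one of the CUDA algorithms surveyed in the thesis.

        # Floyd-Warshall: O(n^3) all-pairs shortest paths on an adjacency matrix,
        # with float("inf") marking missing edges and 0 on the diagonal.
        INF = float("inf")

        def floyd_warshall(dist):
            n = len(dist)
            d = [row[:] for row in dist]
            for k in range(n):               # allow vertex k as an intermediate
                for i in range(n):
                    dik = d[i][k]
                    for j in range(n):
                        if dik + d[k][j] < d[i][j]:
                            d[i][j] = dik + d[k][j]
            return d

        graph = [
            [0,   3,   INF, 7],
            [8,   0,   2,   INF],
            [5,   INF, 0,   1],
            [2,   INF, INF, 0],
        ]
        print(floyd_warshall(graph))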