
    Personalised trails and learner profiling within e-learning environments

    This deliverable focuses on personalisation and personalised trails. We begin by introducing and defining the concepts of personalisation and personalised trails. Personalisation requires that a user profile be stored, and so we assess currently available standard profile schemas and discuss the requirements for a profile to support personalised learning. We then review techniques for providing personalisation and some systems that implement these techniques, and discuss some of the issues around evaluating personalisation systems. We look especially at the use of learning and cognitive styles to support personalised learning, and also consider personalisation in the field of mobile learning, which has a slightly different take on the subject, and in commercially available systems, where personalisation support is found to currently be only at quite a low level. We conclude with a summary of the lessons to be learned from our review of personalisation and personalised trails

    Technical report and user guide: the 2010 EU kids online survey

    This technical report describes the design and implementation of the EU Kids Online survey of 9-16-year-old internet-using children and their parents in 25 European countries.

    LIKWID: Lightweight Performance Tools

    Exploiting the performance of today's microprocessors requires intimate knowledge of the microarchitecture as well as an awareness of the ever-growing complexity in thread and cache topology. LIKWID is a set of command line utilities that addresses four key problems: probing the thread and cache topology of a shared-memory node, enforcing thread-core affinity on a program, measuring performance counter metrics, and microbenchmarking for reliable upper performance bounds. Moreover, it includes an mpirun wrapper allowing for portable thread-core affinity in MPI and hybrid MPI/threaded applications. To demonstrate the capabilities of the tool set, we show the influence of thread affinity on performance using the well-known OpenMP STREAM triad benchmark, use hardware counter tools to study the performance of a stencil code, and finally show how to detect bandwidth problems on ccNUMA-based compute nodes. Comment: 12 pages

    Mira: A Framework for Static Performance Analysis

    The performance model of an application can provide understanding about its runtime behavior on particular hardware. Such information can be analyzed by developers for performance tuning. However, model building and analysis are frequently ignored during software development until performance problems arise, because they require significant expertise and can involve many time-consuming application runs. In this paper, we propose a fast, accurate, flexible and user-friendly tool, Mira, for generating performance models by applying static program analysis, targeting scientific applications running on supercomputers. We parse both the source code and binary to estimate performance attributes with better accuracy than considering just source or just binary code. Because our analysis is static, the target program does not need to be executed on the target architecture, which enables users to perform analysis on available machines instead of conducting expensive experiments on potentially expensive resources. Moreover, statically generated models enable performance prediction on non-existent or unavailable architectures. In addition to flexibility, because model generation time is significantly reduced compared to dynamic analysis approaches, our method is suitable for rapid application performance analysis and improvement. We present several scientific application validation results to demonstrate the current capabilities of our approach on small benchmarks and a mini application.

    Validation of hardware events for successful performance pattern identification in High Performance Computing

    Hardware performance monitoring (HPM) is a crucial ingredient of performance analysis tools. While there are interfaces like LIKWID, PAPI or the kernel interface perf_event which provide HPM access with some additional features, many higher level tools combine event counts with results retrieved from other sources like function call traces to derive (semi-)automatic performance advice. However, although HPM has been available for x86 systems since the early 90s, only a small subset of the HPM features is used in practice. Performance patterns provide a more comprehensive approach, enabling the identification of various performance-limiting effects. Patterns address issues like bandwidth saturation, load imbalance, non-local data access in ccNUMA systems, or false sharing of cache lines. This work defines HPM event sets that are best suited to identify a selection of performance patterns on the Intel Haswell processor. We validate the chosen event sets for accuracy in order to arrive at a reliable pattern detection mechanism and point out shortcomings that cannot be easily circumvented due to bugs or limitations in the hardware.

    Integration of a parallel efficiency monitoring tool into an HPC production system

    This thesis presents the design, implementation, and evaluation of an extension of a library called TALP for tracing useful computation and performance metrics in applications based on MPI (Message Passing Interface). The extension also integrates the tool with a web portal called UserPortal. Both are developed at the Barcelona Supercomputing Center (BSC). The library captures information about communication patterns and computation performed by MPI applications, and makes this information available to users. The extension developed in this project adds the functionality of reading PAPI (Performance Application Programming Interface) counters, allowing users to know the instructions per cycle of their application and identify bottlenecks in their code. UserPortal provides an easy-to-use interface for visualizing and analyzing the captured information and allows users to easily monitor the status of their jobs, such as memory usage, CPU usage, and their evolution over time. The integration of the library with the BSC system involves several stages of design and development, including a software wrapper, a modulefile, scripts for retrieving and processing data, and web development to display the data on the UserPortal. However, users must be educated in performance analysis in order to read and interpret the reported metrics effectively and optimize their codes. Public documentation has been developed, as well as a reference on how to use these tools on BSC machines, along with links to educational resources on related topics.
Overall, this work provides a valuable tool for developers and researchers working with MPI-based applications, making performance optimization more approachable and efficient.
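The instructions-per-cycle metric mentioned in this abstract is the ratio of retired instructions to elapsed cycles. A minimal Python sketch follows, assuming the two counts have already been read from hardware counters (for example the PAPI preset events PAPI_TOT_INS and PAPI_TOT_CYC). The function name and the sample values are hypothetical and are not taken from TALP or UserPortal.

```python
def instructions_per_cycle(total_instructions: int, total_cycles: int) -> float:
    """Return IPC: retired instructions divided by elapsed cycles."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return total_instructions / total_cycles


# Hypothetical counter readings for one MPI rank (illustrative only).
ipc = instructions_per_cycle(total_instructions=8_000_000, total_cycles=4_000_000)
print(f"IPC = {ipc:.2f}")
```

A low IPC relative to what the core can sustain is only a hint that the code may be stalled, for instance on memory; interpreting it still requires the kind of performance-analysis background the abstract calls for.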

    A Comparison of Parallel Graph Processing Implementations

    The rapidly growing number of large network analysis problems has led to the emergence of many parallel and distributed graph processing systems; one survey in 2014 identified over 80. Since then, the landscape has evolved; some packages have become inactive while more are being developed. Determining the best approach for a given problem is infeasible for most developers. To enable easy, rigorous, and repeatable comparison of the capabilities of such systems, we present an approach and associated software for analyzing the performance and scalability of parallel, open-source graph libraries. We demonstrate our approach on five graph processing packages: GraphMat, the Graph500, the Graph Algorithm Platform Benchmark Suite, GraphBIG, and PowerGraph, using synthetic and real-world datasets. We examine previously overlooked aspects of parallel graph processing performance, such as phases of execution and energy usage, for three algorithms: breadth first search, single source shortest paths, and PageRank, and compare our results to Graphalytics. Comment: 10 pages, 10 figures, Submitted to EuroPar 2017 and rejected. Revised and submitted to IEEE Cluster 201

    Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi

    Graph500 is a data intensive application for high performance computing and it is an increasingly important workload because graphs are a core part of most analytic applications. So far there is no work that examines whether Graph500 is suitable for vectorization, mostly due to a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently released by Intel with new features such as a wide 512-bit vector unit and vector scatter/gather instructions. Thus, the Xeon Phi allows for more efficient parallelization of Graph500 that is combined with vectorization. In this paper we vectorize Graph500 and analyze the impact of vectorization and prefetching on the Xeon Phi. We also show that the combination of parallelization, vectorization and prefetching yields a speedup of 27% over a parallel version with prefetching that does not leverage the vector capabilities of the Xeon Phi. The research leading to these results has received funding from the European Research Council under the European Union's 7th FP (FP/2007-2013) / ERC GA n. 321253. It has been partially funded by the Spanish Government (TIN2012-34557). Peer Reviewed. Postprint (published version)

    Pulmonary Artery Pulsatility Index Predicts Mechanical Circulatory Support Following Heart Transplantation

    The incidence of mechanical circulatory support (MCS) for early graft dysfunction (EGD) following heart transplantation varies from 2.3% to 28.2%. Low pulmonary artery pulsatility index (PAPi) is associated with higher mortality in advanced heart failure and cardiogenic shock. We hypothesised that a lower PAPi following heart transplantation is associated with MCS use for EGD. Methods: Two-centre study of consecutive heart transplantations from May 2018 to December 2022. Haemodynamic parameters and inotropic/vasoconstrictor data were investigated on admission to the intensive care unit (T0) and six hours later (T6). Results: Of the 173 patients included in this study, 24 had MCS for EGD. PAPi in the group that required MCS was lower at T0 (1.21 (0.84) vs 1.67 (1.23), p=0.001) and T6 (0.77 (0.52) vs 1.44 (0.82), p<0.001). There was no significant difference in recipient characteristics, donor characteristics (donor age and sex matching) and operative factors (warm/cold ischaemic time, total ischaemic time, cardiopulmonary bypass time) between the two groups. On multiple variable regression, PAPi at T6 was associated with delayed MCS independent of total donor organ ischaemic time and short term MCS bridge to transplantation (OR 0.1 (0.036-0.276), p<0.001). ROC analysis showed an AUC of 0.694 for T0 PAPi and 0.832 for T6 PAPi; a cut-off T6 PAPi of 1.22 had sensitivity and specificity of 81% and 65% respectively. Conclusions: Lower PAPi at T6 (<1.22) is independently associated with MCS use for severe EGD post-heart transplantation.
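The abstract does not spell out how PAPi is computed. The sketch below assumes the conventional definition, pulmonary artery pulse pressure (systolic minus diastolic) divided by right atrial pressure, and shows how a cut-off such as the reported T6 value of 1.22 yields a sensitivity and specificity. All function names and patient values are hypothetical illustrations, not data or code from the study.

```python
T6_CUTOFF = 1.22  # cut-off reported in the abstract for PAPi at T6


def papi(pa_systolic: float, pa_diastolic: float, right_atrial: float) -> float:
    """PAPi under the conventional definition: PA pulse pressure / right atrial pressure."""
    if right_atrial <= 0:
        raise ValueError("right_atrial pressure must be positive")
    return (pa_systolic - pa_diastolic) / right_atrial


def sensitivity_specificity(values, had_mcs, cutoff=T6_CUTOFF):
    """Treat PAPi below the cut-off as a positive test for later MCS use."""
    tp = sum(1 for v, m in zip(values, had_mcs) if m and v < cutoff)
    fn = sum(1 for v, m in zip(values, had_mcs) if m and v >= cutoff)
    tn = sum(1 for v, m in zip(values, had_mcs) if not m and v >= cutoff)
    fp = sum(1 for v, m in zip(values, had_mcs) if not m and v < cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical pressures in mmHg (illustrative only).
example = papi(pa_systolic=30, pa_diastolic=18, right_atrial=12)
print(example, example < T6_CUTOFF)

# Hypothetical six-patient cohort: PAPi values and whether MCS was needed.
values = [0.8, 1.0, 1.5, 1.3, 2.0, 1.1]
had_mcs = [True, True, True, False, False, False]
print(sensitivity_specificity(values, had_mcs))
```

Moving the cut-off trades sensitivity against specificity, which is the trade-off a ROC analysis like the one in the abstract summarises.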