300 research outputs found

    TaskInsight: Understanding Task Schedules Effects on Memory and Performance

    Recent scheduling heuristics for task-based applications have managed to improve their performance by taking into account memory-related properties such as data locality and cache sharing. However, there is still a general lack of tools that can provide insights into why, and where, different schedulers improve memory behavior, and how this is related to the applications' performance. To address this, we present TaskInsight, a technique to characterize the memory behavior of different task schedulers through the analysis of data reuse between tasks. TaskInsight provides high-level, quantitative information that can be correlated with tasks' performance variation over time to understand data reuse through the caches due to scheduling choices. TaskInsight is useful to diagnose and identify which scheduling decisions affected performance, when they were taken, and why the performance changed, both in single and multi-threaded executions. We demonstrate how TaskInsight can diagnose examples where poor scheduling caused over 10% difference in performance for tasks of the same type, due to changes in the tasks' data reuse through the private and shared caches, in single and multi-threaded executions of the same application. This flexible insight is key for optimization in many contexts, including data locality, throughput, memory footprint or even energy efficiency. We thank the reviewers for their feedback. This work was supported by the Swedish Research Council, the Swedish Foundation for Strategic Research project FFL12-0051 and carried out within the Linnaeus Centre of Excellence UPMARC, Uppsala Programming for Multicore Architectures Research Center. This paper was also published with the support of the HiPEAC network that received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 687698. Peer Reviewed. Postprint (published version)

    TaskPoint: sampled simulation of task-based programs

    Sampled simulation is a mature technique for reducing simulation time of single-threaded programs, but it is not directly applicable to simulation of multi-threaded architectures. Recent multi-threaded sampling techniques assume that the workload assigned to each thread does not change across multiple executions of a program. This assumption does not hold for dynamically scheduled task-based programming models. Task-based programming models allow the programmer to specify program segments as tasks which are instantiated many times and scheduled dynamically to available threads. Due to system noise and variation in scheduling decisions, two consecutive executions on the same machine typically result in different instruction streams processed by each thread. In this paper, we propose TaskPoint, a sampled simulation technique for dynamically scheduled task-based programs. We leverage task instances as sampling units and simulate only a fraction of all task instances in detail. Between detailed simulation intervals we employ a novel fast-forward mechanism for dynamically scheduled programs. We evaluate the proposed technique on a set of 19 task-based parallel benchmarks and two different architectures. Compared to detailed simulation, TaskPoint accelerates architectural simulation with 64 simulated threads by an average factor of 19.1 at an average error of 1.8% and a maximum error of 15.0%. This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493, SEV-2011-00067), the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), the RoMoL ERC Advanced Grant (GA 321253), the European HiPEAC Network of Excellence and the Mont-Blanc project (EU-FP7-610402 and EU-H2020-671697). M. Moreto has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship JCI-2012-15047. M. Casas is supported by the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the EU FP7 (contract 2013BP B 00243). T. Grass has been partially supported by the AGAUR of the Generalitat de Catalunya (grant 2013FI B 0058). Peer Reviewed. Postprint (author's final draft)
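
The idea of using task instances as sampling units can be illustrated with a small sketch. This is not the TaskPoint implementation: it is a single-threaded toy that assumes each task type has a roughly stable cost, simulates only the first few instances of each type "in detail", and fast-forwards the rest using the mean of the sampled costs. The names `sampled_estimate`, `toy_cost`, and `samples_per_type` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def sampled_estimate(tasks, detailed_cost, samples_per_type=2):
    """Estimate total work while simulating only a few instances per task type.

    tasks: list of (task_type, instance_id) pairs in execution order.
    detailed_cost: callable standing in for a slow, detailed simulation.
    Returns (estimated_total, number_of_detailed_simulations).
    """
    observed = defaultdict(list)  # task_type -> costs measured in detail
    total = 0.0
    detailed_runs = 0
    for task_type, instance in tasks:
        seen = observed[task_type]
        if len(seen) < samples_per_type:
            # Detailed mode: pay the full simulation cost and record it.
            cost = detailed_cost(task_type, instance)
            seen.append(cost)
            detailed_runs += 1
        else:
            # Fast-forward mode: reuse the mean cost of the sampled instances.
            cost = sum(seen) / len(seen)
        total += cost
    return total, detailed_runs


def toy_cost(task_type, instance):
    """Hypothetical cost model: a per-type base cost plus small variation."""
    base = {"gemm": 100.0, "copy": 20.0}[task_type]
    return base + (instance % 3)


# Twenty task instances alternating between two hypothetical task types.
tasks = [("gemm" if i % 2 == 0 else "copy", i) for i in range(20)]
exact = sum(toy_cost(t, i) for t, i in tasks)
estimate, detailed_runs = sampled_estimate(tasks, toy_cost)
print(f"exact={exact}, estimate={estimate}, detailed runs={detailed_runs}")
```

In this toy setting only 4 of the 20 instances are simulated in detail. The sketch ignores dynamic scheduling across threads, which is the problem the fast-forward mechanism described in the abstract addresses.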

    Simulation methodologies for future large-scale parallel systems

    Since the early 2000s, computer systems have seen a transition from single-core to multi-core systems. While single-core systems included only one processor core on a chip, current multi-core processors include up to tens of cores on a single chip, a trend which is likely to continue in the future. Today, multi-core processors are ubiquitous. They are used in all classes of computing systems, ranging from low-cost mobile phones to high-end High-Performance Computing (HPC) systems. Designing future multi-core systems is a major challenge [12]. The primary design tool used by computer architects in academia and industry is architectural simulation. Simulating a computer system executing a program is typically several orders of magnitude slower than running the program on a real system. Therefore, new techniques are needed to speed up simulation and allow the exploration of large design spaces in a reasonable amount of time. One way of increasing simulation speed is sampling. Sampling reduces simulation time by simulating only a representative subset of a program in detail. In this thesis, we present a workload analysis of a set of task-based programs. We then use the insights from this study to propose TaskPoint, a sampled simulation methodology for task-based programs. Task-based programming models can reduce the synchronization costs of parallel programs on multi-core systems and are becoming increasingly important. Finally, we present MUSA, a simulation methodology for simulating applications running on thousands of cores on a hybrid, distributed shared-memory system. The simulation time required for simulation with MUSA is comparable to the time needed for native execution of the simulated program on a production HPC system. The techniques developed in the scope of this thesis permit researchers and engineers working in computer architecture to simulate large workloads, which were infeasible to simulate in the past. 
Our work enables architectural research in the fields of future large-scale shared-memory and hybrid, distributed shared-memory systems.

    Lethal Carbon Monoxide Poisoning in Wood Pellet Storerooms—Two Cases and a Review of the Literature

    The installation of wood pellet heating as a cost-effective and climatically neutral source of energy for private households has increased steadily in recent years. We report two deaths that occurred within the space of about a year in wood pellet storerooms of private households in German-speaking countries and were investigated by forensic medical teams. This is the first report of fatalities in this special context as is shown in the literature review. Both victims died of carbon monoxide (CO) poisoning; one of the victims was a woman who was 4 months pregnant. Measurements at the scene detected life-threatening CO concentrations (7500 ppm, >500 ppm), which were not significantly reduced after ventilation of the storerooms as required by regulations. We carried out a series of experiments in order to confirm CO production by wood pellets. Thirty kilograms of freshly produced pellets from two different manufacturers were stored for 16 days in airtight containers at 26°C with different relative humidities. CO concentrations between 3100 and 4700 ppm were measured in all containers. There were no notable differences between the wood pellet products or storage at different humidities. Emission of CO from wood pellets has already been described, but fatal accidents have previously been reported only in association with pellet transport on cargo ships or storage in silos. It is therefore a new finding that fatal accidents may also occur in the wood pellet storerooms of private households. We show that significant CO concentrations can build up even when these rooms are ventilated in accordance with the regulations and that such levels may cause the death of healthy persons, as described in the following. As the safety recommendations from the wood pellet industry are inadequate, we consider that further fatal accidents are likely to occur and recommend urgent revision of the safety regulation

    Sampled simulation of task-based programs

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Sampled simulation is a mature technique for reducing simulation time of single-threaded programs. Nevertheless, current sampling techniques do not take advantage of other execution models, like task-based execution, to provide both more accurate and faster simulation. Recent multi-threaded sampling techniques assume that the workload assigned to each thread does not change across multiple executions of a program. This assumption does not hold for dynamically scheduled task-based programming models. Task-based programming models allow the programmer to specify program segments as tasks which are instantiated many times and scheduled dynamically to available threads. Due to variation in scheduling decisions, two consecutive executions on the same machine typically result in different instruction streams processed by each thread. In this paper, we propose TaskPoint, a sampled simulation technique for dynamically scheduled task-based programs. We leverage task instances as sampling units and simulate only a fraction of all task instances in detail. Between detailed simulation intervals, we employ a novel fast-forwarding mechanism for dynamically scheduled programs. We evaluate different automatic techniques for clustering task instances and show that DBSCAN clustering combined with analytical performance modeling provides the best trade-off of simulation speed and accuracy. TaskPoint is the first technique combining sampled simulation and analytical modeling and provides a new way to trade off simulation speed and accuracy. Compared to detailed simulation, TaskPoint accelerates architectural simulation with 8 simulated threads by an average factor of 220x at an average error of 0.5 percent and a maximum error of 7.9 percent. Peer Reviewed. Postprint (author's final draft)

    Environmental heterogeneity predicts global species richness patterns better than area

    Aim: It is widely accepted that biodiversity is influenced by both niche‐related and spatial processes from local to global scales. Their relative importance, however, is still disputed, and empirical tests are surprisingly scarce at the global scale. Here, we compare the importance of area (as a proxy for pure spatial processes) and environmental heterogeneity (as a proxy for niche‐related processes) for predicting native mammal species richness world‐wide and within biogeographical regions. Location: Global. Time period: We analyse a spatial snapshot of richness data collated by the International Union for Conservation of Nature. Major taxa studied: All terrestrial mammal species, including possibly extinct species and species with uncertain presence. Methods: We applied a spreading dye algorithm to analyse how native mammal species richness changes with area and environmental heterogeneity. As measures for environmental heterogeneity, we used elevation ranges and precipitation ranges, which are well‐known correlates of species richness. Results: We found that environmental heterogeneity explained species richness relationships better than did area, suggesting that niche‐related processes are more prevalent than pure area effects at broad scales. Main conclusions: Our results imply that niche‐related processes are essential to understand broad‐scale species–area relationships and that habitat diversity is more important than area alone for the protection of global biodiversity

    Aceleración de Time-Series sismográficas en Python

    Python has become a very popular programming language, but it is also one of the least efficient in terms of performance and energy consumption. This article describes the process we followed to accelerate a Python application for large-scale processing of seismographic time series, while still offering the end user the productive and widely accepted Python interface. The process followed a three-phase strategy. In the first phase, we applied an algorithmic change aimed at reducing the computational complexity of the main kernel (hot spot) of the code: the cross-correlations. To this end, we implemented these correlations by applying the Convolution Theorem. In the second phase, we changed the programming model by implementing the kernel in C++, which allowed us to use the highly optimized FFTW library. In the third phase, thanks to the change of programming model, we applied architecture-aware optimizations, among them OpenMP, to exploit the multicore nodes of our system, and ArrayFire, which allows us to use graphics accelerators (with CUDA and OpenCL support). After this optimization process we obtained a speedup of 6121x over the original application. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
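
The algorithmic change in the first phase, computing cross-correlations through the Convolution Theorem, can be sketched as follows. This is a minimal illustration and not the authors' code: it uses a small pure-Python radix-2 FFT in place of FFTW, and the function names (`fft`, `xcorr_direct`, `xcorr_fft`) are hypothetical. The direct version costs O(n·m) operations, while the FFT version costs O(N log N) for a padded length N.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.

    With inverse=True the exponent sign is flipped; the caller divides by N.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def xcorr_direct(a, b):
    """Reference: c[lag] = sum_i a[i + lag] * b[i] for lags -(m-1)..(n-1)."""
    n, m = len(a), len(b)
    return [
        sum(a[i + lag] * b[i] for i in range(m) if 0 <= i + lag < n)
        for lag in range(-(m - 1), n)
    ]


def xcorr_fft(a, b):
    """Same cross-correlation of real sequences via the Convolution Theorem."""
    n, m = len(a), len(b)
    size = 1
    while size < n + m - 1:
        size *= 2
    # Zero-pad so the circular correlation equals the linear one.
    fa = fft([complex(v) for v in a] + [0j] * (size - n))
    fb = fft([complex(v) for v in b] + [0j] * (size - m))
    # Correlation in time is A * conj(B) in frequency.
    prod = [u * v.conjugate() for u, v in zip(fa, fb)]
    c = [v.real / size for v in fft(prod, inverse=True)]
    # Negative lags wrap around to the end of the circular result.
    return c[size - (m - 1):] + c[:n]


a = [1.0, 2.0, 3.0, 4.0]
b = [0.0, 1.0, 0.5]
print(xcorr_direct(a, b))
print([round(v, 9) for v in xcorr_fft(a, b)])
```

Both functions return the same values up to floating-point rounding. In a production setting the FFT would come from an optimized library, as the abstract describes with FFTW.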

    Evaluation of the Suitability of a Commercially Available ELISA Test as a Monitoring Tool for Estimating the Salmonella Prevalence of Commercial Swine Herds

    In this study we evaluate the suitability of the Salmotype® (Labordiagnostik Leipzig) Enzyme Linked Immunosorbent Assay (ELISA) in swine. The study demonstrated no association between either individual or pen fecal culture and serologic status when examined by linear regression. Culture positive pigs had a tendency to be seropositive based on the individual fecal culture and only at pen and individual levels based on the pen fecal culture. Lowering the suggested cutoff of 40 % to 13 % gave an equal number of culture and seropositive individuals. Therefore, further adaptation of Salmotype® to the US swine industry and additional field studies are needed

    Translational Studies Using the MALT1 Inhibitor (S)-Mepazine to Induce Treg Fragility and Potentiate Immune Checkpoint Therapy in Cancer

    INTRODUCTION: Regulatory T cells (Tregs) play a critical role in the maintenance of immune homeostasis but also protect tumors from immune-mediated growth control or rejection and pose a significant barrier to effective immunotherapy. Inhibition of MALT1 paracaspase activity can selectively reprogram immune-suppressive Tregs in the tumor microenvironment to adopt a proinflammatory fragile state, which offers an opportunity to impede tumor growth and enhance the efficacy of immune checkpoint therapy (ICT). METHODS: We performed preclinical studies with the orally available allosteric MALT1 inhibitor (S)-mepazine as a single-agent and in combination with anti-programmed cell death protein 1 (PD-1) ICT to investigate its pharmacokinetic properties and antitumor effects in several murine tumor models as well as patient-derived organotypic tumor spheroids (PDOTS). RESULTS: (S)-mepazine demonstrated significant antitumor effects and was synergistic with anti-PD-1 therapy in vivo and ex vivo but did not affect circulating Treg frequencies in healthy rats at effective doses. Pharmacokinetic profiling revealed favorable drug accumulation in tumors to concentrations that effectively blocked MALT1 activity, potentially explaining preferential effects on tumor-infiltrating over systemic Tregs. CONCLUSIONS: The MALT1 inhibitor (S)-mepazine showed single-agent anticancer activity and presents a promising opportunity for combination with PD-1 pathway-targeted ICT. Activity in syngeneic tumor models and human PDOTS was likely mediated by induction of tumor-associated Treg fragility. This translational study supports ongoing clinical investigations (ClinicalTrials.gov Identifier: NCT04859777) of MPT-0118, (S)-mepazine succinate, in patients with advanced or metastatic treatment-refractory solid tumors

    Evaluation of cross-protection afforded by a Salmonella Choleraesuis vaccine against Salmonella infections in pigs under field conditions

    This field study investigated the efficacy of a Salmonella Choleraesuis live vaccine (Argus SC™) to reduce the number of infections with Salmonella. Twelve groups of about 380 pigs each were randomly allocated to either vaccination (V) or no vaccination (C). The vaccine was applied orally at 3 and 16 weeks. Forty pigs per group were blood sampled at 3, 10, 16 and 24 weeks to detect possible antibodies against Salmonella. The prevalence of Salmonella in the lymph nodes was the major variable. In the V groups, only 0.6 % of the lymph nodes were positive, whereas 7.2 % were positive in the C groups (p < 0.001). The percentage of seropositive pigs at 24 weeks (cut-off OD > 10) was 26 % and 9 % in the V and C groups, respectively (p < 0.001). The present study documented that vaccination with a live modified S. Choleraesuis vaccine is a useful tool to lower the prevalence of Salmonella in swine herds