448 research outputs found

    Occurrence, Persistence, and Expansion of Saltcedar (Tamarix Spp.) Populations in the Great Plains of Montana

    Saltcedar (Tamarix spp.), a shrub native to Eurasia, is associated with major alterations to wetland and riparian systems in the southwestern United States. Since the 1960s saltcedar has been naturalized in northern states of the U.S., where its growth potential and impacts are not well known. Here, we describe the occurrence, age, size, and relative cover of saltcedar populations in several river basins in central eastern Montana, USA, to identify potential patterns of spread across the region and changes in individual populations as they age. Stands were aged according to the oldest saltcedar individuals and were sampled for dominant plant cover and soil properties. Multiple introductions appear to have occurred in Montana, with the oldest stands occurring on the Bighorn River in southern Montana. Saltcedar absolute and relative cover and stand area increased significantly with stand age, while native tree and shrub relative cover remained low across all stand ages. These results suggest that saltcedar stands establish where woody natives are not abundant and that they persist and expand over time. Although soil salinity remained constant, soil pH decreased with saltcedar stand age, indicating a possible effect of organic matter inputs. An analysis of annual wood increment of saltcedar and sandbar willow (a native with analogous growth form) stems along a latitudinal gradient showed that stem growth of both species did not differ significantly among regions. Stem growth decreased with increasing elevation for both species, while growth responses to elevation did not differ between species. Our results show an increase in number of populations and continued viability of these populations. Mechanisms of saltcedar increases in this region are yet to be determined. Anthropogenic influences, such as saltcedar plantings, watershed alterations (e.g., river flow control), and habitat disturbances (e.g., cattle grazing or habitat clearing) may facilitate its spread in similar climates of the Great Plains.

    TACL: Interoperating asynchronous device APIs with task-based programming models

    Heterogeneous architectures have become commonplace in modern HPC systems. Eight of the world’s top ten supercomputers have accelerators, and the upcoming MareNostrum 5 will feature accelerated partitions. However, programming these heterogeneous systems is difficult, as users have to manually insert data transfer operations, kernel launches and synchronizations from the host system to its accelerators. This is even more challenging in distributed heterogeneous systems, as programmers have to coordinate these activities with internode communications between hosts. This work presents the Task-Aware Ascend Computing Language (TACL), which interoperates with the OmpSs-2 programming model and greatly simplifies kernel execution, data transfers and synchronizations between host and accelerators by naturally leveraging the dataflow execution model of OmpSs-2.

    Introducing the Task-Aware Storage I/O (TASIO) Library

    Task-based programming models are excellent tools to parallelize and seamlessly load balance an application workload. However, the integration of I/O intensive applications and task-based programming models is lacking. Typically, I/O operations stall the requesting thread until the data is serviced by the backing device. Because the core where the thread was running becomes idle, it should be possible to overlap the data query operation with either computation workloads or even more I/O operations. Nonetheless, overlapping I/O tasks with other tasks entails an extra degree of complexity currently not managed by programming models’ runtimes. In this work, we focus on integrating storage I/O into the tasking model by introducing the Task-Aware Storage I/O (TASIO) library. We test TASIO extensively with a custom benchmark for a number of configurations and conclude that it achieves speedups of up to 2x depending on the workload, although it might lead to slowdowns if not used with the right settings. This project is supported by the European Union's Horizon 2020 research and innovation programme under the grant agreement No 754304 (DEEP-EST), the Ministry of Economy of Spain through the Severo Ochoa Center of Excellence Program (SEV-2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P) and by the Generalitat de Catalunya (2017-SGR-1481). Also, the authors would like to acknowledge that the test environment (Cobi) was ceded by Intel Corporation in the frame of the BSC - Intel collaboration. Peer Reviewed. Postprint (author's final draft)
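    The idea of overlapping a blocking storage read with computation can be illustrated with a minimal sketch. This is not TASIO's API, which the abstract does not describe; it assumes a Python thread pool standing in for a tasking runtime, and the helper names `read_file`, `compute` and `overlapped` are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Blocking read: the calling thread stalls until the data arrives."""
    with open(path, "rb") as f:
        return f.read()


def compute(n):
    """Stand-in for a computation task: sum of squares below n."""
    return sum(i * i for i in range(n))


def overlapped(path, n):
    """Start the read in a background thread and compute while it is pending."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_file, path)  # I/O proceeds in the background
        result = compute(n)                     # useful work instead of idling
        data = pending.result()                 # wait for the read to complete
    return data, result


# Demonstration on a temporary 1 KiB file (hypothetical workload).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1024)
    path = tmp.name
try:
    data, total = overlapped(path, 1000)
finally:
    os.remove(path)
```

    Whether such overlap pays off depends on the workload and configuration, which is consistent with the abstract's note that wrong settings can cause slowdowns.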

    Porting the OmpSs programming model to the Argobots runtime system

    This project presents a new runtime that supports the OmpSs parallel programming model and is implemented with the Argobots library. The structure of the runtime, a system of plain and contiguous-region dependencies, and interoperability with MPI are designed and implemented. The main objective of this project is to develop a new runtime which supports the OmpSs programming model implemented with the Argobots runtime library. It designs and implements the basic structure of the runtime, the plain and linear region dependencies, a new scheduler, the interoperability wit

    Enhancing the interoperability between distributed-memory and task-based programming models

    Hybrid applications make it possible to exploit both inter- and intra-node parallelism; however, the programming models currently used are not designed to be combined. For this reason, we propose a generic mechanism to enhance the interoperability between distributed-memory and task-based programming models.

    Measuring traffic lane-changing by converting video into space–time still images

    Empirical data is needed in order to extend our knowledge of traffic behavior. Video recordings are used to enrich typical data from loop detectors. In this context, data extraction from videos becomes a challenging task. Setting up automatic video processing systems is costly and complex, and the accuracy achieved is usually not enough to improve traffic flow models. In contrast, “visual” data extraction by watching the recordings requires extensive human intervention. A semiautomatic video processing methodology to count lane changes on freeways is proposed. The method allows counting lane changes faster than with the visual procedure without falling into the complexities and errors of full automation. The method is based on converting the video into a set of space–time still images, from which lane changes are counted visually. This methodology has been tested at several freeway locations near Barcelona (Spain) with good results. A user-friendly implementation of the method is available on http://bit.ly/2yUi08M. Peer Reviewed. Postprint (published version)
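    A space–time still image can be sketched as follows. This is an illustrative construction, not the authors' implementation: it assumes grayscale frames held in a NumPy array and a line of interest that coincides with a single pixel row, and the function name `space_time_image` is hypothetical.

```python
import numpy as np


def space_time_image(frames, row):
    """Stack pixel row `row` of every frame into one still image.

    frames: array of shape (T, H, W), one grayscale frame per time step.
    Returns an array of shape (T, W): axis 0 is time, axis 1 is space.
    """
    frames = np.asarray(frames)
    if frames.ndim != 3:
        raise ValueError("expected frames with shape (T, H, W)")
    return frames[:, row, :].copy()


# Synthetic clip: 5 frames of 4x6 pixels. A single bright "vehicle" pixel
# moves down column 2, one row per frame, and leaves the scene after frame 3.
clip = np.zeros((5, 4, 6), dtype=np.uint8)
for t in range(4):
    clip[t, t, 2] = 255

# Watch row 2: the vehicle touches it only in frame 2.
image = space_time_image(clip, row=2)
```

    In the resulting image, each crossing of the watched line shows up as a mark at the time (row) and position (column) where it happened, so a person can count crossings from one picture instead of watching the whole recording.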

    Worksharing tasks: An efficient way to exploit irregular and fine-grained loop parallelism

    Shared memory programming models usually provide worksharing and task constructs. The former relies on the efficient fork-join execution model to exploit structured parallelism, while the latter relies on fine-grained synchronization among tasks and a flexible data-flow execution model to exploit dynamic, irregular, and nested parallelism. In applications that show both structured and unstructured parallelism, worksharing and task constructs can be combined. However, it is difficult to mix both execution models without penalizing the data-flow execution model. Hence, in many applications structured parallelism is also exploited using tasks to leverage the full benefits of a pure data-flow execution model. However, task creation and management might introduce a non-negligible overhead that prevents the efficient exploitation of fine-grained structured parallelism, especially on many-core processors. In this work, we propose worksharing tasks. These are tasks that internally leverage worksharing techniques to exploit fine-grained structured loop-based parallelism. The evaluation shows promising results on several benchmarks and platforms. This work is supported by the Spanish Ministerio de Ciencia, Innovación y Universidades (TIN2015-65316-P), by the Generalitat de Catalunya (2014-SGR-1051) and by the European Union’s Seventh Framework Programme (FP7/2007-2013) and the H2020 funding framework under grant agreement no. H2020-FETHPC-754304 (DEEP-EST). Peer Reviewed. Postprint (author's final draft)

    Advanced synchronization techniques for task-based runtime systems

    Task-based programming models like OmpSs-2 and OpenMP provide a flexible data-flow execution model to exploit dynamic, irregular and nested parallelism. Providing an efficient implementation that scales well with small granularity tasks remains a challenge, and bottlenecks can manifest in several runtime components. In this paper, we analyze the limiting factors in the scalability of a task-based runtime system and propose individual solutions for each of the challenges, including a wait-free dependency system and a novel scalable scheduler design based on delegation. We evaluate how the optimizations impact the overall performance of the runtime, both individually and in combination. We also compare the resulting runtime against state-of-the-art OpenMP implementations, showing equivalent or better performance, especially for fine-grained tasks. This project is supported by the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No. 754304 (DEEP-EST), by the Spanish Ministry of Science and Innovation (contract PID2019-107255GB and TIN2015-65316P) and by the Generalitat de Catalunya (2017-SGR-1414). Peer Reviewed. Postprint (author's final draft)

    On the adequacy of lightweight thread approaches for high-level parallel programming models

    High-level parallel programming models (PMs) are becoming crucial in order to extract the computational power of current on-node multi-threaded parallelism. The most popular PMs, such as OpenMP or OmpSs, are directive-based: the complexity of the hardware is hidden by the underlying runtime system, improving coding productivity. The implementations of OpenMP usually rely on POSIX threads (pthreads), offering excellent performance for coarse-grained parallelism and a perfect match with the current hardware. OmpSs is a task-oriented PM based on an ad hoc runtime solution called Nanos++; it is the precursor of the tasking parallelism in the OpenMP tasking specification. A recent trend in runtimes and applications points to leveraging massive on-node parallelism in conjunction with fine-grained and dynamic scheduling paradigms. In this paper we analyze the behavior of the OpenMP and OmpSs PMs on top of the recently emerged Generic Lightweight Threads (GLT) API. GLT exposes a common API for lightweight thread (LWT) libraries that offers the possibility of running the same application over different native LWT solutions. We describe the design details of those high-level PMs implemented on top of GLT and analyze different scenarios in order to assess where the use of LWTs may benefit application performance. Our work reveals those scenarios where LWTs outperform pthread-based solutions and compares the performance between an ad hoc solution and a generic implementation. The researchers from the Universitat Jaume I de Castelló were supported by project TIN2014-53495-R of the MINECO, Spain and FEDER, Spain, the Generalitat Valenciana fellowship programme, Spain Vali+d 2015. Antonio J. Peña is cofinanced by the Spanish Ministry of Economy and Competitiveness, Spain under Juan de la Cierva fellowship number IJCI-2015-23266. This work was partially supported by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research (SC-21), under contract DE-AC02-06CH11357. We gratefully acknowledge Enrique S. Quintana-Ortí (Universitat Jaume I) and Sangmin Seo (Samsung Corp.) for their advice in this work and the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory. Peer Reviewed. Postprint (author's final draft)