315 research outputs found

    On the nature of progress

    15th International Conference, OPODIS 2011, Toulouse, France, December 13-16, 2011. Proceedings. We identify a simple relationship that unifies seemingly unrelated progress conditions ranging from the deadlock-free and starvation-free properties common to lock-based systems, to non-blocking conditions such as obstruction-freedom, lock-freedom, and wait-freedom. Properties can be classified along two dimensions based on the demands they make on the operating system scheduler. A gap in the classification reveals a new non-blocking progress condition, weaker than obstruction-freedom, which we call clash-freedom. The classification provides an intuitively-appealing explanation why programmers continue to devise data structures that mix both blocking and non-blocking progress conditions. It also explains why the wait-free property is a natural basis for the consensus hierarchy: a theory of shared-memory computation requires an independent progress condition, not one that makes demands of the operating system scheduler.

    SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures

    Near-Data-Processing (NDP) architectures present a promising way to alleviate data movement costs and can provide significant performance and energy benefits to parallel applications. Typically, NDP architectures support several NDP units, each including multiple simple cores placed close to memory. To fully leverage the benefits of NDP and achieve high performance for parallel workloads, efficient synchronization among the NDP cores of a system is necessary. However, supporting synchronization in many NDP systems is challenging because they lack shared caches and hardware cache coherence support, which are commonly used for synchronization in multicore systems, and communication across different NDP units can be expensive. This paper comprehensively examines the synchronization problem in NDP systems, and proposes SynCron, an end-to-end synchronization solution for NDP systems. SynCron adds low-cost hardware support near memory for synchronization acceleration, and avoids the need for hardware cache coherence support. SynCron has three components: 1) a specialized cache memory structure to avoid memory accesses for synchronization and minimize latency overheads, 2) a hierarchical message-passing communication protocol to minimize expensive communication across NDP units of the system, and 3) a hardware-only overflow management scheme to avoid performance degradation when hardware resources for synchronization tracking are exceeded. We evaluate SynCron using a variety of parallel workloads, covering various contention scenarios. SynCron improves performance by 1.27× on average (up to 1.78×) under high-contention scenarios, and by 1.35× on average (up to 2.29×) under low-contention real applications, compared to state-of-the-art approaches. SynCron reduces system energy consumption by 2.08× on average (up to 4.25×). Comment: To appear in the 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA-27)

    Interaction of Mechatronic Modules in Distributed Technological Installations

    The article deals with the real-time interaction of mechatronic devices through events and messages. Distributed network devices must interact to coordinate their work, including synchronization when implementing a distributed algorithm. An approach to developing a distributed control system (DCS) for mechatronic devices based on the IEC 61499 standard is analyzed. Using only a LAN for interaction is not always justified, since messages transmitted over a LAN do not provide transmission determinism. To eliminate this problem, a fast local network is needed that carries out network communication without using the resources of the main computer and hardware (e.g., based on the model of terminal machines). It is proposed to implement LAN controllers on a field-programmable gate array (FPGA) platform. Data-strobe coding (DS coding) with LVDS signal levels was used to keep the transmitted data intact and improve the overall reliability of the systems.

    Parallel-FST: A feature selection library for multicore clusters

    Funded for open access publication: Universidade da Coruña/CISUG. [Abstract]: Feature selection is a subfield of machine learning focused on reducing the dimensionality of datasets by performing a computationally intensive process. This work presents Parallel-FST, a publicly available parallel library for feature selection that includes seven methods which follow a hybrid MPI/multithreaded approach to reduce their runtime when executed on high performance computing systems. Performance tests were carried out on a 256-core cluster, where Parallel-FST obtained speedups of up to 229x for representative datasets and was able to analyze a 512 GB dataset, which was not previously possible with a sequential counterpart library due to memory constraints. This research was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00/AEI/10.13039/501100011033), by the Ministry of Universities of Spain under grant FPU20/00997, and by Xunta de Galicia and FEDER funds of the EU (CITIC, Centro de Investigación de Galicia accreditation 2019-2022, ref. ED431G 2019/01; Consolidation Program of Competitive Reference Groups, ED431C 2021/30). Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431C 2021/30

    Asynchronous programming in the abstract behavioural specification language

    Chip manufacturers are rapidly moving towards so-called manycore chips with thousands of independent processors on the same silicon real estate. Current programming languages can only leverage the potential power by inserting code with low level concurrency constructs, sacrificing clarity. Alternatively, a programming language can integrate a thread of execution with a stable notion of identity, e.g., in active objects. Abstract Behavioural Specification (ABS) is a language for designing executable models of parallel and distributed object-oriented systems based on active objects, and is defined in terms of a formal operational semantics which enables a variety of static and dynamic analysis techniques for the ABS models. The overall goal of this thesis is to extend the asynchronous programming model and the corresponding analysis techniques in ABS. Algorithms and the Foundations of Software technology

    Compiling and optimizing spreadsheets for FPGA and multicore execution

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. "September 2007." Includes bibliographical references (p. 102-104). A major barrier to developing systems on multicore and FPGA chips is the lack of an easy-to-use development environment. This thesis presents the RhoZeta spreadsheet compiler and Catalyst optimization system for programming multiprocessors and FPGAs. Any spreadsheet frontend may be extended to work with RhoZeta's multiple interpreters and behavioral abstraction mechanisms. RhoZeta synchronizes a variety of cell interpreters acting on a global memory space. RhoZeta can also compile a group of cells to multithreaded C or Verilog. The result is an easy-to-use interface for programming multicore microprocessors and FPGAs. A spreadsheet environment presents parallelism and locality issues of modern hardware directly to the user and allows for a simple global memory synchronization model. Catalyst is a spreadsheet graph rewriting system based on performing behaviorally invariant guarded atomic actions while a system is being interpreted by RhoZeta. A number of optimization macros were developed to perform speculation, resource sharing and propagation of static assignments through a circuit. Parallelization of a 64-bit serial leading-zero-counter is demonstrated with Catalyst. Fault tolerance macros were also developed in Catalyst to protect against dynamic faults and to offset costs associated with testing semiconductors for static defects. A model for partitioning, placing and profiling spreadsheet execution in a heterogeneous hardware environment is also discussed. The RhoZeta system has been used to design several multithreaded and FPGA applications including a RISC emulator and a MIDI controlled modular synthesizer. By Amir Hirsch. M.Eng.