Hard real-time performances in multiprocessor-embedded systems using ASMP-Linux
Multiprocessor systems, especially those based on multicore or multithreaded processors, and new operating system architectures can satisfy the ever-increasing computational requirements of embedded systems. ASMP-LINUX is a modified, high-responsiveness, open-source hard real-time operating system for multiprocessor systems, capable of providing high real-time performance while keeping the code simple and without impacting the performance of the rest of the system. Moreover, ASMP-LINUX does not require code changes or application recompiling/relinking. To assess the performance of ASMP-LINUX, benchmarks have been performed on several hardware platforms and configurations.
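ASMP-LINUX achieves its isolation through kernel modifications that this abstract does not detail. As a rough, user-space approximation on a stock Linux kernel (not ASMP-LINUX's mechanism), a process can be confined to a single processor with the standard CPU-affinity calls; combining this with the `isolcpus` boot parameter keeps other tasks off that CPU. The helper name `pin_to_last_cpu` is hypothetical, and the sketch is Linux-only.

```python
import os


def pin_to_last_cpu(pid: int = 0) -> int:
    """Restrict `pid` (0 = the calling process) to the highest-numbered CPU
    it is currently allowed to run on, and return that CPU's number."""
    allowed = os.sched_getaffinity(pid)  # set of CPU ids the process may use
    target = max(allowed)
    os.sched_setaffinity(pid, {target})  # confine the process to one CPU
    return target


if __name__ == "__main__":
    cpu = pin_to_last_cpu()
    print(f"pinned to CPU {cpu}; affinity is now {sorted(os.sched_getaffinity(0))}")
```

Affinity alone does not stop interrupts or kernel threads from running on the chosen CPU, which is part of what a dedicated real-time OS design such as ASMP-LINUX aims to address.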
Towards Work-Efficient Parallel Parameterized Algorithms
Parallel parameterized complexity theory studies how fixed-parameter tractable (fpt) problems can be solved in parallel. Previous theoretical work focused on parallel algorithms that are very fast in principle, but did not take into account that when we only have a small number of processors (between 2 and, say, 1024), it is more important that the parallel algorithms are work-efficient. In the present paper we investigate how work-efficient fpt algorithms can be designed. We review standard methods from fpt theory, like kernelization, search trees, and interleaving, and prove trade-offs for them between work efficiency and runtime improvements. This results in a toolbox for developing work-efficient parallel fpt algorithms.
Comment: Prior full version of the paper that will appear in Proceedings of the 13th International Conference and Workshops on Algorithms and Computation (WALCOM 2019), February 27 - March 02, 2019, Guwahati, India. The final authenticated version is available online at https://doi.org/10.1007/978-3-030-10564-8_2
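The abstract names kernelization and search trees as standard fpt methods. As an illustration only, here is the textbook Vertex Cover instance of both (Buss's kernel followed by a bounded search tree with at most 2^k leaves), written sequentially. Vertex Cover is our choice of example problem; the paper's parallel algorithms and its work/runtime trade-offs are not reproduced here.

```python
from typing import Dict, Optional, Set, Tuple

Edge = Tuple[int, int]


def buss_kernel(edges: Set[Edge], k: int) -> Optional[Tuple[Set[Edge], int, Set[int]]]:
    """Buss kernelization for Vertex Cover.

    Returns (reduced_edges, remaining_budget, forced_vertices), or None when the
    reduction rules already prove that no cover of size <= k exists.
    """
    edges = set(edges)
    forced: Set[int] = set()
    changed = True
    while changed and k >= 0:
        changed = False
        degree: Dict[int, int] = {}
        for u, v in edges:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        for vertex, d in degree.items():
            if d > k:
                # A vertex with more than k incident edges is in every cover of size <= k.
                forced.add(vertex)
                edges = {e for e in edges if vertex not in e}
                k -= 1
                changed = True
                break  # degrees changed; recompute them
    # After reduction each vertex covers at most k edges, so > k*k edges means "no".
    if k < 0 or len(edges) > k * k:
        return None
    return edges, k, forced


def search_tree(edges: Set[Edge], k: int) -> bool:
    """Bounded search tree: branch on an uncovered edge, at most 2^k leaves."""
    if not edges:
        return True
    if k == 0:
        return False
    u, v = next(iter(edges))
    # Either u or v is in the cover. The two branches are independent
    # subproblems, which is what a parallel version could hand to different processors.
    return (search_tree({e for e in edges if u not in e}, k - 1)
            or search_tree({e for e in edges if v not in e}, k - 1))


def has_vertex_cover(edges: Set[Edge], k: int) -> bool:
    kernel = buss_kernel(edges, k)
    return kernel is not None and search_tree(kernel[0], kernel[1])


if __name__ == "__main__":
    triangle = {(0, 1), (1, 2), (0, 2)}
    print(has_vertex_cover(triangle, 1), has_vertex_cover(triangle, 2))  # prints: False True
```

Interleaving, the third method the abstract mentions, would re-apply the kernel at every node of the search tree instead of only once at the root.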
Portable, scalable, per-core power estimation for intelligent resource management
Performance, power, and temperature are now all first-order design constraints. Balancing power efficiency, thermal constraints, and performance requires some means to convey data about real-time power consumption and temperature to intelligent resource managers. Resource managers can use this information to meet performance goals, maintain power budgets, and obey thermal constraints. Unfortunately, obtaining the required machine introspection is challenging. Most current chips provide no support for per-core power monitoring, and when support exists, it is not exposed to software. We present a methodology for deriving per-core power models using sampled performance counter values and temperature sensor readings. We develop application-independent models for four different (four- to eight-core) platforms, validate their accuracy, and show how they can be used to guide scheduling decisions in power-aware resource managers. Model overhead is negligible, and estimations exhibit 1.1%-5.2% per-suite median error on the NAS, SPEC OMP, and SPEC 2006 benchmarks (and 1.2%-4.4% overall)
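The abstract does not give the form of the per-core models. A minimal sketch, assuming a plain linear model fitted by least squares, shows how sampled counter rates and a temperature reading can be turned into a power estimate. The feature choice (IPC, last-level-cache misses per kilo-instruction, temperature), the coefficients, and the sample data are all hypothetical and synthetic, not the authors' model or measurements.

```python
from typing import List, Sequence


def fit_linear_model(features: Sequence[Sequence[float]], power: Sequence[float]) -> List[float]:
    """Least-squares fit of power ~ w0 + w1*x1 + ... + wn*xn via the normal equations."""
    rows = [[1.0, *x] for x in features]  # prepend an intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, power))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; cannot fit")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Estimated power for one sample of (hypothetical) features."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


if __name__ == "__main__":
    # Hypothetical per-core samples: (IPC, LLC misses per kilo-instruction, temperature in C)
    samples = [(0.5, 2.0, 45.0), (1.2, 5.0, 50.0), (2.0, 1.0, 60.0),
               (0.8, 12.0, 55.0), (1.5, 8.0, 65.0), (0.3, 20.0, 40.0)]
    # Synthetic "measured" watts from made-up coefficients, for demonstration only
    watts = [5.0 + 8.0 * ipc + 0.3 * mpki + 0.05 * t for ipc, mpki, t in samples]
    w = fit_linear_model(samples, watts)
    print("weights:", [round(c, 3) for c in w])
    print("estimate for (1.0, 4.0, 52.0):", round(predict(w, (1.0, 4.0, 52.0)), 3))
```

Once fitted offline, evaluating such a model is a handful of multiply-adds per core per sample, which is consistent with the abstract's point that model overhead can be negligible for a resource manager.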
Independent Sets in Restricted Line of Sight Networks
Line of Sight (LoS) networks were designed to model wireless networks in settings which may contain obstacles restricting visibility of sensors. A grap
Sparse parameterized problems
Sparse languages play an important role in classical structural complexity theory. In this paper we introduce a natural definition of sparse problems for parameterized complexity theory. We prove an analog of Mahaney's theorem: there is no sparse parameterized problem which is hard for the t-th level of the W hierarchy, unless the W hierarchy itself collapses up to level t. The main result is proved for the most general form of parametric many:1 reducibility, where the parameter functions are not assumed to be recursive. This provides one of the few instances in parameterized complexity theory of a full analog of a major classical theorem. The proof involves not only the standard technique of left sets, but also substantial circuit combinatorics to deal with the problem of small weft, and a diagonalization to cope with potentially nonrecursive parameter functions. The latter techniques are potentially of interest for further explorations of parameterized complexity analogs of classical structural results.
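For reference, the classical notions that the parameterized result is modeled on can be stated as follows. This is the standard textbook definition and theorem, not the paper's parameterized definition, which the abstract does not spell out.

```latex
\text{A language } L \subseteq \Sigma^{*} \text{ is \emph{sparse} if there is a polynomial } p \text{ such that}
\qquad \bigl|\{\, x \in L : |x| \le n \,\}\bigr| \le p(n) \quad \text{for all } n \ge 0.

\textbf{Mahaney's theorem:}\quad \text{if some sparse set is NP-hard under polynomial-time many-one reductions, then } \mathrm{P} = \mathrm{NP}.
```

In the parameterized analog described above, "NP-hard" is replaced by hardness for the t-th level of the W hierarchy, and "P = NP" by a collapse of the W hierarchy up to level t.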
Real-time cache management framework for multi-core architectures
Multi-core architectures are shaking the fundamental assumption in real-time systems that the worst-case execution time (WCET) used to compute the schedulability of the complete system can be calculated on individual tasks. This is not true even in an approximate sense on a modern multi-core chip, due to interference caused by hardware resource sharing. In this work we propose a complete framework that (1) analyzes and profiles task memory access patterns and (2) uses a novel kernel-level cache management technique to enforce a deterministic cache allocation of the most frequently accessed memory areas. In this way, we provide a powerful tool to address the main sources of interference in a system where the last-level cache is shared among two or more CPUs. The technique has been implemented on commercial hardware, and our evaluations show that it can significantly improve the predictability of a given set of critical tasks.
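The profiling step can be illustrated with a small sketch: count accesses per page in a memory trace and admit the hottest pages while respecting how many pages of the same cache "color" a shared, physically indexed last-level cache can hold at once. The trace format (a list of physical addresses), the 4 KiB page size, the color count, and the associativity are all hypothetical parameters, and page coloring is used here as a common illustrative technique; this is not the paper's kernel-level mechanism.

```python
from collections import Counter
from typing import Dict, Iterable, List

PAGE_SIZE = 4096   # assumed 4 KiB pages
NUM_COLORS = 16    # hypothetical: LLC size / (associativity * PAGE_SIZE)
WAYS = 8           # hypothetical LLC associativity


def page_color(page: int) -> int:
    """Cache color of a physical page, i.e. which group of LLC sets it maps to."""
    return page % NUM_COLORS


def select_hot_pages(trace: Iterable[int], max_ways_per_color: int = WAYS) -> List[int]:
    """Pick the most frequently accessed pages that can all stay resident at once.

    Pages of the same color compete for the same cache sets, so at most
    `max_ways_per_color` of them can be cached simultaneously. Pages are
    admitted greedily in order of decreasing access count until their color is full.
    """
    counts = Counter(addr // PAGE_SIZE for addr in trace)  # accesses per page
    used: Dict[int, int] = {}
    selected: List[int] = []
    for page, _count in counts.most_common():
        color = page_color(page)
        if used.get(color, 0) < max_ways_per_color:
            used[color] = used.get(color, 0) + 1
            selected.append(page)
    return selected


if __name__ == "__main__":
    trace = ([0 * PAGE_SIZE + 8] * 5 + [16 * PAGE_SIZE] * 4
             + [32 * PAGE_SIZE + 100] * 3 + [1 * PAGE_SIZE + 4] * 2)
    print(select_hot_pages(trace, max_ways_per_color=2))
```

In the example trace, pages 0, 16, and 32 share color 0; with room for only two pages of that color, page 32 is skipped even though it is hotter than page 1.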
A global operating system for HPC clusters
Modern supercomputers consist of clusters of thousands of independent nodes interconnected through fast networks. These nodes run independent operating system kernels, so synchronization among them is left to user-mode programs. As a result, temporal synchronization of the nodes is a daunting task. On the other hand, HPC cluster applications often require rather strict temporal synchronization for activities such as performance analysis, application debugging, or data checkpointing. Therefore, the performance of an HPC parallel application may be severely impaired by the lack of temporal synchronization among the activities of the cluster nodes; this severely limits the scalability of such architectures. In this paper we introduce CAOS, an extension of the Linux kernel that aims to address the temporal synchronization problems of modern HPC clusters. We describe the general ideas behind CAOS and discuss some details of a possible implementation. We also illustrate some experiments performed on a prototype implementation of CAOS that includes a centralized network time tick, which allows a master node to synchronize the activities of all other nodes in the cluster, and a task scheduler tailored for HPC applications. These experiments, performed on a modern HPC cluster, show that this new component has no measurable impact on the efficiency of the nodes while reducing OS noise and providing better performance prediction. An implementation of CAOS based on this component can achieve a significant gain in synchronization, global control, and scalability of the cluster.
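To make the role of a centralized tick concrete, the toy simulation below (not CAOS code) compares two hypothetical ways of starting a periodic activity on every node: counting periods on each node's own drifting clock, versus reacting to the k-th tick broadcast by a master. Clock drifts (in ppm), the period, and the jitter bound are made-up parameters; a fixed network latency is ignored because it shifts all nodes equally.

```python
import random
from typing import Sequence


def activity_spread(drifts_ppm: Sequence[float], period_s: float, k: int,
                    global_tick: bool, max_jitter_s: float = 0.0, seed: int = 0) -> float:
    """True-time spread (max - min) of when the nodes start their k-th periodic activity."""
    rng = random.Random(seed)  # seeded so results are reproducible
    starts = []
    for d in drifts_ppm:
        if global_tick:
            # The node reacts to the master's k-th tick, sent at true time k*period_s,
            # after a bounded random network jitter.
            starts.append(k * period_s + rng.uniform(0.0, max_jitter_s))
        else:
            # The node counts k periods on its own clock, which runs (1 + d*1e-6)
            # times as fast as true time, so local time k*period_s is reached at:
            starts.append(k * period_s / (1.0 + d * 1e-6))
    return max(starts) - min(starts)


if __name__ == "__main__":
    drifts = [-50.0, 0.0, 50.0]
    for k in (1_000, 100_000):
        free = activity_spread(drifts, 0.01, k, global_tick=False)
        ticked = activity_spread(drifts, 0.01, k, global_tick=True, max_jitter_s=1e-5)
        print(f"k={k}: free-running spread {free:.6f} s, tick-driven spread {ticked:.6f} s")
```

With drifts of -50, 0, and 50 ppm and a 10 ms period, the free-running nodes end up roughly 0.1 s apart after 100,000 periods, while the tick-driven spread stays within the jitter bound no matter how long the cluster runs.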
- …