Evaluating spintronics-compatible implementations of Ising machines
The commercial and industrial demand for solving hard combinatorial
optimization problems pushes forward the development of efficient solvers. One of
them is the Ising machine, which can solve combinatorial problems mapped to
Ising Hamiltonians. In particular, spintronic hardware implementations of Ising
machines can be very efficient in terms of area and performance, and are
relatively low-cost considering the potential to create hybrid CMOS-spintronic
technology. Here, we perform a comparison of coherent and probabilistic
paradigms of Ising machines on several hard Max-Cut instances, analyzing their
scalability and performance at the software level. We show that probabilistic Ising
machines outperform coherent Ising machines in terms of the number of
iterations required to reach the solution of the problem. Nevertheless, high-frequency
spintronic oscillators with sub-nanosecond synchronization times
could be very promising as ultrafast Ising machines. In addition, considering
that a coherent Ising machine performs better on Max-Cut problems because of the
absence of the linear term in the Ising Hamiltonian, we introduce a procedure
to encode Max-3SAT into Max-Cut. We foresee potential synergistic interplay between
the two paradigms.
Comment: 26 pages, 6 Figures, submitted for publication in Phys. Rev. Applied
(to be presented at INTERMAG 2023 in Japan)
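The Max-Cut mapping and the probabilistic Ising machine compared above can be illustrated with a small software sketch. It assumes the commonly used p-bit update m_i = sgn(tanh(beta * I_i) - r), with r drawn uniformly from (-1, 1), and a linear annealing schedule for beta; the function names and the parameters `sweeps`, `beta_max`, and `seed` are illustrative choices, not the authors' implementation. Max-Cut maps to couplings J_ij = -w_ij with no linear term h_i, so the cut equals (W - H)/2, where W is the total edge weight and H is the Ising energy.

```python
import math
import random
from itertools import product


def cut_value(weights, spins):
    """Max-Cut objective: total weight of edges whose endpoints have opposite spins."""
    return sum(w * (1 - spins[i] * spins[j]) / 2 for (i, j), w in weights.items())


def ising_couplings(weights, n):
    """Map Max-Cut to Ising couplings J_ij = -w_ij with no linear (h) term.

    With H(s) = -sum_{i<j} J_ij s_i s_j, the cut equals (W - H) / 2,
    so minimizing H maximizes the cut.
    """
    J = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        J[i][j] = J[j][i] = -w
    return J


def pbit_maxcut(weights, n, sweeps=500, beta_max=3.0, seed=0):
    """Probabilistic Ising machine sketch: sequential p-bit updates with annealed beta."""
    rng = random.Random(seed)
    J = ising_couplings(weights, n)
    m = [rng.choice((-1, 1)) for _ in range(n)]
    best_spins, best_cut = list(m), cut_value(weights, m)
    for t in range(sweeps):
        beta = beta_max * (t + 1) / sweeps  # linear annealing schedule (assumption)
        for i in range(n):
            field = sum(J[i][j] * m[j] for j in range(n))  # local field I_i; h_i = 0
            # p-bit rule: m_i = sgn(tanh(beta * I_i) - r), r ~ Uniform(-1, 1)
            m[i] = 1 if math.tanh(beta * field) > rng.uniform(-1.0, 1.0) else -1
        cut = cut_value(weights, m)
        if cut > best_cut:  # keep the best configuration seen so far
            best_spins, best_cut = list(m), cut
    return best_cut, best_spins


def brute_force_maxcut(weights, n):
    """Exhaustive reference solution, feasible only for tiny graphs."""
    return max(cut_value(weights, s) for s in product((-1, 1), repeat=n))


ring = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0, (4, 0): 1.0}
print(pbit_maxcut(ring, 5)[0], brute_force_maxcut(ring, 5))
```

Because h_i = 0 for Max-Cut, the local field depends only on the couplings; a problem such as Max-3SAT that produces linear terms would need the h_i contribution added to `field`.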
Driving Big Data – Integration and Synchronization of Data Sources for Artificial Intelligence Applications with the Example of Truck Driver Work Stress and Strain Analysis
This paper addresses big data analysis and data quality, focusing on the specific field of time synchronization. As a highly relevant use case, big data analysis of work stress and strain factors in driving professions is outlined. Drivers experience work stress and strain due to trends such as traffic congestion, time pressure, and worsening working conditions. Although truck drivers form a large professional group, with 2.5 million in the US and 3.5 million in the EU, scientific analysis of their work stress and strain factors is scarce. Driver shortage is growing into a large-scale economic and societal challenge, especially for small businesses. Empirical investigations require big data approaches drawing on sources such as physiological, truck, traffic, weather, planning, or accident data. Such analyses require accurate data, especially with respect to time synchronization. Awareness among researchers and practitioners is key, and first solution approaches are provided, connecting to many further machine learning and big data applications.
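Time synchronization across sources such as physiological and truck data often comes down to aligning streams recorded on different clocks. The following hypothetical sketch, not the paper's method, assumes a known constant clock offset between two sources and matches each reference timestamp to the nearest offset-corrected sample within a tolerance; the function name, the offset, the tolerance, and the sample values are illustrative.

```python
from bisect import bisect_left


def align_nearest(ref_times, samples, offset=0.0, tolerance=0.5):
    """Align `samples` to `ref_times` by nearest timestamp.

    ref_times: sorted timestamps of the reference stream (e.g. physiological data).
    samples: (timestamp, value) pairs from a second source, sorted by timestamp,
             recorded on a clock that lags the reference clock by `offset` seconds.
    Returns one value per reference timestamp, or None when no sample lies
    within `tolerance` seconds after offset correction.
    """
    corrected = [t + offset for t, _ in samples]
    aligned = []
    for t in ref_times:
        k = bisect_left(corrected, t)
        best = None  # (distance, value) of the closest candidate so far
        for idx in (k - 1, k):  # the nearest neighbour is one of these two
            if 0 <= idx < len(corrected):
                dist = abs(corrected[idx] - t)
                if dist <= tolerance and (best is None or dist < best[0]):
                    best = (dist, samples[idx][1])
        aligned.append(None if best is None else best[1])
    return aligned


heart_rate_times = [0.0, 1.0, 2.0, 3.0]
truck_speed = [(-2.1, 30), (-1.0, 32), (0.05, 35), (5.0, 40)]
print(align_nearest(heart_rate_times, truck_speed, offset=2.0))  # [30, 32, 35, None]
```

Without the offset correction, the same call matches only one of the four timestamps, which shows how an unnoticed clock offset silently degrades the joined data. Returning `None` instead of a distant sample keeps such gaps explicit for downstream machine learning.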
Combination of IoT framework with liquid software
To mass-deploy and manage IoT applications, an IoT framework was developed by TUT. The capabilities of this framework have been extended to include liquid functionalities. To limit the extra work an IoT programmer has to add to their applications, the liquid functionalities were added to the generic, application-independent code rather than to the application-specific code. To limit power consumption, a polling technique was introduced to check for changes in the state of the applications. To limit data communication, two mechanisms were created to communicate state changes between applications: one uses a peer-to-peer topology and the other a master-slave topology. Synchronization collisions are resolved using timestamps.
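Timestamp-based collision resolution can be sketched as a last-writer-wins rule. This minimal example assumes that each state change carries the sending device's identifier and a timestamp from reasonably synchronized clocks; the `StateUpdate` type, the `resolve` and `converge` functions, and the device-ID tie-break are hypothetical and not the TUT framework's API.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class StateUpdate:
    device_id: str    # hypothetical identifier of the device that made the change
    timestamp: float  # when the change was made (assumes roughly synchronized clocks)
    state: dict = field(default_factory=dict)  # application state snapshot


def resolve(local: StateUpdate, incoming: StateUpdate) -> StateUpdate:
    """Last-writer-wins: keep the later update; on an exact timestamp tie,
    prefer the larger device_id so every device picks the same winner."""
    if incoming.timestamp != local.timestamp:
        return incoming if incoming.timestamp > local.timestamp else local
    return incoming if incoming.device_id > local.device_id else local


def converge(updates):
    """Fold all received updates into a single agreed-upon state."""
    return reduce(resolve, updates)


a = StateUpdate("dev-a", 10.0, {"led": "on"})
b = StateUpdate("dev-b", 12.5, {"led": "off"})
c = StateUpdate("dev-c", 12.5, {"led": "blink"})
print(converge([a, b, c]).state)  # {'led': 'blink'}
```

The deterministic tie-break matters in a peer-to-peer topology: every device applies the same rule to the same set of updates and reaches the same state regardless of arrival order, whereas in a master-slave topology the master alone could apply the rule.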
A network of four IoT devices was used to test the speed of the liquid functionalities, as well as the amount of communication between the devices when synchronized. It was found that cloning takes marginally longer than migrating or forking, that liquid transfer speeds are strongly influenced by the presence of a resources folder, and that communication between devices works as predicted. To limit power consumption when initiating a liquid transfer, a new way of initiating the transfer is discussed, which shifts this power cost to the RR rather than the IoT device. Data communication can be limited by saving all synchronized applications on the device instead of using a syncID.
Accelerating MPI collective communications through hierarchical algorithms with flexible inter-node communication and imbalance awareness
This work presents and evaluates algorithms for MPI collective communication operations on high performance systems. Collective communication algorithms are investigated extensively, and a universal algorithm to improve the performance of MPI collective operations on hierarchical clusters is introduced. This algorithm exploits shared-memory buffers for efficient intra-node communication while still allowing the use of unmodified, hierarchy-unaware traditional collectives for inter-node communication. The universal algorithm performs well across a variety of collectives, improving upon both the MPICH and the Cray MPT algorithms. Speedups average 15x to 30x for most collectives, with improved scalability up to 65536 cores.
Further novel improvements are also proposed for inter-node communication. By using algorithms that take advantage of multiple senders from the same shared memory buffer, an additional speedup of 2.5x can be achieved. The discussion also evaluates special-purpose extensions to improve intra-node communication. These extensions return a shared memory or copy-on-write protected buffer from the collective, which reduces or completely eliminates the second phase of intra-node communication.
The second part of this work improves the performance of MPI collective communication operations in the presence of imbalanced process arrival times. High performance collective communications are crucial for the performance and scalability of applications, and imbalanced process arrival times are common in these applications. A micro-benchmark is used to investigate the nature of process imbalance with perfectly balanced workloads and to understand the nature of inter- versus intra-node imbalance. These insights are then used to develop imbalance-tolerant reduction, broadcast, and alltoall algorithms, which minimize the synchronization delay observed by early-arriving processes.
These algorithms have been implemented and tested on a Cray XE6 using up to 32k cores with varying buffer sizes and levels of imbalance. Results show speedups over MPICH averaging 18.9x for reduce, 5.3x for broadcast, and 6.9x for alltoall in the presence of high, but not unreasonable, imbalance.
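The two-level structure of the universal algorithm, shared-memory combination within a node followed by an unmodified collective between nodes, can be illustrated with a pure-Python simulation of an allreduce. This is a toy model under stated assumptions, not the author's implementation and not MPI code: a dict stands in for each node's shared-memory buffer, `flat_allreduce` stands in for an unmodified hierarchy-unaware inter-node collective, and all names are hypothetical. The final read-back step corresponds to the second intra-node phase that the proposed extensions reduce or eliminate.

```python
import operator
from typing import Callable, Dict, List, Sequence

Op = Callable[[float, float], float]


def flat_allreduce(values: Sequence[float], op: Op) -> List[float]:
    """Stand-in for an unmodified, hierarchy-unaware collective: every
    participant receives op folded over all inputs."""
    total = values[0]
    for v in values[1:]:
        total = op(total, v)
    return [total] * len(values)


def hierarchical_allreduce(values: Sequence[float], node_of: Sequence[int], op: Op) -> List[float]:
    """values[r] is rank r's contribution and node_of[r] the node hosting rank r."""
    # Phase 1 (intra-node): ranks combine into their node's shared buffer,
    # modelled here as one dict entry per node.
    shared: Dict[int, float] = {}
    for rank, v in enumerate(values):
        node = node_of[rank]
        shared[node] = v if node not in shared else op(shared[node], v)
    # Phase 2 (inter-node): one leader per node runs the unmodified collective.
    nodes = sorted(shared)
    for node, result in zip(nodes, flat_allreduce([shared[n] for n in nodes], op)):
        shared[node] = result
    # Phase 3 (intra-node): every rank reads the result from its node's buffer.
    return [shared[node_of[rank]] for rank in range(len(values))]


contributions = [1, 2, 3, 4, 5, 6]
placement = [0, 0, 0, 1, 1, 1]  # hypothetical: 6 ranks on 2 nodes
print(hierarchical_allreduce(contributions, placement, operator.add))  # [21, 21, 21, 21, 21, 21]
```

Only one participant per node enters the inter-node phase, so in this example the inter-node collective runs over 2 values instead of 6, which is the source of the scalability benefit the hierarchical approach targets.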