In the world of supercomputers, the large number of processors makes it necessary to minimize the inefficiencies of parallelization, which, from the point of view of Amdahl's law, appear as a sequential part of the program. The recently suggested figure of merit is applied to a recently presented supercomputer, and the timeline of the "Top 500" supercomputers is scrutinized using the metric. It is demonstrated that, in addition to computing performance and power consumption, the new supercomputer also excels in the efficiency of parallelization. Based on the suggested merit, a "Moore-law"-like observation is derived for the timeline of the parallelization efficacy of supercomputers.
Introduction
Supercomputers are ranked (TOP500.org (2016)) according to their parameter "Rmax (TFlop/s)", which depends on two factors: how many processors they comprise and how effectively these processors are put together. Increasing only the number of processors is useless, as Amdahl pointed out early (Amdahl, G. M. (1967)): "the effort expended on achieving high parallel processing rates is wasted unless it is accompanied by achievements in sequential processing rates of very nearly the same magnitude".
Most supercomputer users do not use all available processors; they are rather interested in the efficiency of parallelization of their own programs.
Finding a proper figure of merit has always been the subject of serious debate (see Sun and Gustafson (1991)). The recently introduced figure of merit, the effective parallelization (see Végh et al. (2016)), appears to be a good measure not only of the effectiveness of parallelized software execution, but also of the engineering ingenuity of parallelizing hardware operation; it therefore allows characterizing the timeline of supercomputer development itself.
THE MERIT α_eff
According to Amdahl (Amdahl, G. M. (1967)), the speedup can be expressed as
$$S = \frac{1}{(1-\alpha) + \alpha/k} \tag{1}$$
where k is the number of processors working in parallel, α is the parallelizable fraction of the total (single-processor, sequential) execution time, and S is the measurable speedup. The same relation can also be expressed (see Végh et al. (2016)) in the form
$$\alpha = \frac{k}{k-1}\,\frac{S-1}{S} \tag{2}$$
The first form reflects an architectural view, the second an empirical one: no matter what causes the (apparently) sequential part, the (1 − α) part decreases the parallelism, so the α value derived from the measured speedup, the effective parallelization α_eff, can be used to quantify the goodness of the implementation of parallelization.
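As an illustration of Equ. (1) and (2), the following minimal Python sketch evaluates Amdahl's speedup for a given α and k, and recovers α from a speedup value. The function names `amdahl_speedup` and `alpha_from_speedup` are hypothetical helpers introduced here, not code from the original work; the sketch only checks that the second form inverts the first.

```python
def amdahl_speedup(alpha: float, k: int) -> float:
    """Speedup on k processors, Equ. (1): S = 1 / ((1 - alpha) + alpha / k)."""
    return 1.0 / ((1.0 - alpha) + alpha / k)


def alpha_from_speedup(speedup: float, k: int) -> float:
    """Effective parallelization from a measured speedup, Equ. (2):
    alpha = k / (k - 1) * (S - 1) / S."""
    if k < 2:
        raise ValueError("k must be at least 2 to estimate alpha")
    return (k / (k - 1)) * (speedup - 1.0) / speedup


if __name__ == "__main__":
    k = 1024
    s = amdahl_speedup(0.999, k)  # illustrative alpha, not a measured value
    print(f"S = {s:.1f}")
    print(f"alpha recovered from S: {alpha_from_speedup(s, k):.6f}")
```

Note how strongly a tiny sequential part limits the speedup: with α = 0.999, the 1024 processors deliver a speedup of only about half their number.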
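In general, the efficiency E = S/k (in the case of supercomputers, E = Rmax/Rpeak) is used, but it cannot serve as a single parameter describing the efficacy of the implementation, because it also depends on the number of processors. When several processors are used, one of them performs the sequential calculation while the others are waiting (i.e., they use the same amount of time). So, when calculating the speedup, one calculates
$$S = \frac{\big((1-\alpha)+\alpha\big)\,k}{(1-\alpha)\,k + \alpha} = \frac{k}{k(1-\alpha)+\alpha} \tag{3}$$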
hence the efficiency is
$$E = \frac{S}{k} = \frac{1}{k(1-\alpha)+\alpha} \tag{4}$$
This explains the behavior of the diagram S/k as a function of k: the more processors, the lower the efficiency, and the larger (1 − α), the lower the reachable speedup. Moreover, k/S = 1/E = k(1 − α) + α is a linear function of the number of processors, with slope (1 − α). Hence α can be estimated from the speedup data, even for individual regions, without knowing the execution time on a single processor (which, for technical reasons, is the usual situation with supercomputers).
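The linearity of k/S in k suggests a simple estimator, sketched below in Python under the assumption that α does not depend on k: fit a straight line to k/S versus k and read (1 − α) off the slope. Ordinary least squares is our choice of fitting method, not necessarily the one used in the original work, and `efficiency` and `serial_fraction_from_slope` are hypothetical helper names.

```python
def efficiency(alpha: float, k: int) -> float:
    """Parallel efficiency, Equ. (4): E = S/k = 1 / (k(1 - alpha) + alpha)."""
    return 1.0 / (k * (1.0 - alpha) + alpha)


def serial_fraction_from_slope(ks, speedups):
    """Estimate (1 - alpha) as the least-squares slope of k/S against k.

    Since k/S = k(1 - alpha) + alpha is linear in k, the slope is (1 - alpha).
    Assumes alpha is the same for every processor count in the data.
    """
    ys = [k / s for k, s in zip(ks, speedups)]
    n = len(ks)
    mean_k = sum(ks) / n
    mean_y = sum(ys) / n
    cov = sum((k - mean_k) * (y - mean_y) for k, y in zip(ks, ys))
    var = sum((k - mean_k) ** 2 for k in ks)
    return cov / var


if __name__ == "__main__":
    ks = [16, 64, 256, 1024]
    # Speedups generated from Equ. (4) with alpha = 0.99 (illustrative, not measured).
    speedups = [k * efficiency(0.99, k) for k in ks]
    for k, s in zip(ks, speedups):
        print(f"k={k:5d}  S={s:8.2f}  E={s / k:.3f}")
    print(f"estimated 1 - alpha = {serial_fraction_from_slope(ks, speedups):.4f}")
```

The printed table also shows the falling efficiency with growing k, which is why E alone, without k, is not a sufficient figure of merit.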
Notice also that, through Equ. (4), S/k is equally suitable for describing the parallelization efficiency of a setup, provided the number of processors is also known. From Equ. (4),
$$\alpha_{eff} = \frac{E\,k - 1}{E\,(k-1)} \tag{5}$$
and the corresponding (1 − α_eff) is the experimentally determined serial fraction of Karp and Flatt (1990). This quantity, of course, assumes that α is independent of the number of processors. Its numerical value equals the value calculated using differences over the full range of processors, so it is not displayed separately in Fig. 1. In line with the need mentioned above, supercomputer technology focuses on decreasing the (apparently) sequential part (1 − α), so this quantity, rather than α itself, is shown on the diagrams.
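A short Python sketch can check the equivalences stated above: (1 − α_eff) computed from the efficiency E = S/k and k agrees with the Karp–Flatt serial fraction (1/S − 1/k)/(1 − 1/k), and, for data that follow Amdahl's law exactly, with the value obtained from the difference of k/S between two processor counts. The helper names are hypothetical, and the numbers are illustrative rather than measured.

```python
def alpha_eff(efficiency: float, k: int) -> float:
    """Equ. (5): alpha_eff = (E*k - 1) / (E*(k - 1)), with E = S/k (= Rmax/Rpeak)."""
    return (efficiency * k - 1.0) / (efficiency * (k - 1))


def karp_flatt(speedup: float, k: int) -> float:
    """Karp-Flatt experimentally determined serial fraction (1/S - 1/k) / (1 - 1/k)."""
    return (1.0 / speedup - 1.0 / k) / (1.0 - 1.0 / k)


def serial_fraction_between(k1: int, s1: float, k2: int, s2: float) -> float:
    """(1 - alpha) from the difference of k/S between two processor counts."""
    return (k2 / s2 - k1 / s1) / (k2 - k1)


if __name__ == "__main__":
    k, s = 64, 50.0  # illustrative numbers, not measurements
    print(f"1 - alpha_eff = {1.0 - alpha_eff(s / k, k):.6f}")
    print(f"Karp-Flatt    = {karp_flatt(s, k):.6f}")
```

If α really varies with k, the per-point values and the two-point difference will disagree, which is a quick way to test the constant-α assumption on real scaling data.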
Characterizing the effect of the communication method in SoC
As mentioned, Amdahl's model has only two categories: everything that does not perform useful computational work but needs time contributes to the sequential part. One such contribution is the internal communication between the cores inside a chip. In their work, de Macedo Mourelle et al. […] steadily rises with increasing processor numbers; in the region typical for supercomputers, it is no longer usable.
Characterizing supercomputer architecture
In supercomputers, the "sequential part" has a technically different origin, but it has the same effect on (1 − α_eff). The recent Chinese supercomputer (Fu et al. (2016)) also provided performance data, from which the diagrams in Fig. 2 were derived. Compare these values (and consider the different scales!) to the former supercomputer data (Karp and Flatt (1990)) shown in Fig. 2; the change is impressive. The new Chinese supercomputer is not only good in energy consumption and raw computing power; the coordination of its parallel work is also excellently organized (the scale is the same as in Fig. 1, where inside-chip organization takes place, although the benchmark there is different).
Characterizing the supercomputer timeline
When comparing the performance scales, one sees an impressive change in performance. (Not fully detailed) data covering the "supercomputer age" are available on TOP500.org (2016), so, using the Rmax and Rpeak data together with the number of processors and Equ. (5), (1 − α) can be calculated as a function of time and ranking, see Fig. 3. It appears that (1 − α) changes in an exponential-like way, both with time and with the ranking in a given year. To establish a more quantitative description, it is worth deriving a timeline for the past 24 years. In Fig. 4, the (1 − α) values of the top 3 supercomputers are displayed as a function of time. The figure also contains the diagram of the best (1 − α) in each year, which confirms that high computing performance strongly correlates with the efficiency of parallelization. This development path (independently of technology, manufacturer, and number and type of processors) appears to show a semi-logarithmic behavior, and only part of the tendency is caused by the Moore observation. It can be used to forecast the expected behavior of performance in the coming years, and its validity may provoke debates like those around the Moore observation.
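The timeline analysis can be sketched in Python as two steps: compute (1 − α) for each list entry from Rmax, Rpeak and the processor count k via Equ. (5), then fit a straight line to log10(1 − α) versus year, which is one way to quantify the semi-logarithmic behavior. Several assumptions apply: k is taken to be the "total cores" figure of a TOP500 entry, Rmax and Rpeak must be in the same units, and ordinary least squares on the logarithm is our choice of fit, not necessarily the author's. The demonstration uses a synthetic series, not TOP500 data.

```python
import math


def serial_fraction(rmax: float, rpeak: float, cores: int) -> float:
    """(1 - alpha) from Equ. (5) with E = Rmax/Rpeak and k = cores (assumed).

    Equivalent to (1/E - 1) / (k - 1); Rmax and Rpeak must share units.
    """
    e = rmax / rpeak
    return (1.0 / e - 1.0) / (cores - 1)


def fit_exponential_trend(years, values):
    """Least-squares fit of log10(value) = a + b * year; returns (a, b).

    A constant b means (1 - alpha) changes by a factor of 10**b per year,
    i.e. a straight line on a semi-logarithmic plot.
    """
    logs = [math.log10(v) for v in values]
    n = len(years)
    mean_year = sum(years) / n
    mean_log = sum(logs) / n
    b = sum((x - mean_year) * (y - mean_log) for x, y in zip(years, logs)) / sum(
        (x - mean_year) ** 2 for x in years
    )
    a = mean_log - b * mean_year
    return a, b


if __name__ == "__main__":
    # Synthetic series only (NOT TOP500 data): (1 - alpha) shrinking 10x per decade.
    years = list(range(1993, 2017))
    values = [1e-3 * 10 ** (-0.1 * (y - 1993)) for y in years]
    a, b = fit_exponential_trend(years, values)
    print(f"fitted improvement factor per decade: {10 ** (-10 * b):.2f}")
```

Applied to real lists, the same fit could be run separately on the best entry of each year and on the top 3 entries to compare their slopes.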
Conclusions
The recently introduced figure of merit "effective parallelization" can also be used excellently to characterize the quality of hardware implementation. In addition to qualifying manual or compiler-optimized parallelization, it can qualify the effect of the inter-core communication method in SoC and characterize the "goodness" of a supercomputer implementation. Since a single figure of merit describing their performance can be attached to supercomputers, the timeline of the development of supercomputing technology can be described. Interestingly enough, the timeline of the introduced parameter follows a tendency similar to the Moore "law".
