Accelerating Green Computing with Hybrid Asymmetric Multicore
  Architectures and Safe Parallelism by Mogale, Hope et al.
Hope Mogale * 1 Michael Esiefarienrhe * 1 2 Naison Gasela 2 Lucia Letlonkane 3
Abstract
In this paper we present a novel strategy for accelerating green
computing by adopting the Hybrid Asymmetric Multicore Architectures
(HAMA) model with Safe Parallelism. Most modern computing is serial
and contributes to the global footprint of energy consumption. These
impacts are most visible in server farms and cloud computing
platforms, where the majority of the world's information resides.
The strategy we present can help reduce the global footprint of
energy consumption caused by computing. Through our strategy we show
that by adopting HAMA and utilizing safe parallelism, energy
consumption per computation can be minimized.
1. Introduction
Multicore microprocessors have been evolving since their inception
in the early 2000s. However, they have not been redesigned in a
manner that is truly eco-friendly and energy efficient.
Microprocessors remain hazardous and not eco-friendly, and the core
of the problem lies in how they are manufactured as semiconductor
devices. The current process of fabricating modern semiconductor
devices is not energy efficient according to research in (Gutowski
et al., 2009). Multicore architectures, when fully utilized, offer
great advantages in computation speed over traditional monolithic
processors. These parallel architectures provide enough computing
power to support fast internet access, demanding gaming, and video
streaming while multitasking at the same time.
*Equal contribution 1North-West University, Mafikeng, South
Africa 2North-West University, Mafikeng, South Africa 3North-
West University, Mafikeng, South Africa. Correspondence to:
Hope Mogale <Hope.Mogale@ieee.org>.
Proceedings of the 2nd International Conference on advances
in Big Data, Computing and Data Communication Systems
(icABCD), Drakensberg, South Africa. Copyright 2019 by the
author(s).
The only problem is that programmers and application designers were
slow to adopt the new style of application development called
parallel programming (Pacheco, 2011); hence multicore technology and
its underlying parallel architecture saw slow adoption rates among
application developers. Programmers and application developers argue
that parallel programming is not as straightforward (McCool et al.,
2012; Diaz et al., 2012) as traditional serial programming. To use
parallel programming as a style of development, developers have to
know how to utilize parallel patterns, how objects compose, how to
avoid races, and how to obtain deterministic output from code
(Robison, 2013). When programming for multicore architectures or any
parallel architecture, one has to know the limits of the platform and
how to test that code is safe and respects the laws of parallel
programming, such as the fundamental principle governing parallel
computation known as Amdahl's Law (Amdahl, 1967). Amdahl's Law states
that not all work can be done in parallel, and research (Hill &
Marty, 2008) has shown that Amdahl's Law still holds in the multicore
era. There exists a taxonomy under which all computers can be
classified and grouped according to their architectures. It was first
documented by Flynn (Flynn, 1972) and has since become the de facto
standard for classifying computer architectures, including parallel
ones. Researchers (Snyder, 1988) have proposed extending the taxonomy
to accommodate other classes which may appear in future. Flynn's
taxonomy classifies computers into four classes, namely SISD, SIMD,
MISD, and MIMD, which differ in whether they have a single data
stream (SISD, MISD) or multiple data streams (SIMD, MIMD). Parallel
computers and parallel architectures fall within the multiple data
stream classes, and these can be further divided into two categories:
shared memory or distributed memory. When microprocessors adopted the
multicore scheme they fell into the multiple data stream classes.
However, not much has been developed to take advantage of their
underlying architectures, and research (Hill & Marty, 2008) shows
that much can still be done to improve utilization and reduce the
problem of dark silicon (Esmaeilzadeh et al.,
arXiv:1909.08978v1 [cs.DC] 19 Sep 2019
2011). We aim to research and develop microprocessor architecture
designs which are eco-friendly and suitable for promoting green
computing, with energy efficiency as our main emphasis. The
technology treadmill continues to advance at a staggering rate. As
consumers continue to drive the technology treadmill, the size of the
transistors on microchips keeps shrinking, allowing devices such as
wireless headsets to have embedded microchips. As this trend
continues, problems arise in the semiconductor and microprocessor
industry. The energy budget spent on today's microprocessors has
reached levels that are hard to sustain. Transistors cannot shrink
forever, and this suggests the end of Moore's law for silicon.
Researchers are now searching for a candidate to replace silicon, in
what is referred to as the post-silicon era. The most important
question is what that substitute will be. That remains to be answered
by time, while in the meantime the quest for better architectures for
multicore microprocessors continues. The research presented in this
paper is part of that quest. The remainder of this paper is
structured as follows. Section 2 presents work related to our
research. Section 3 presents our HAMA with Safe Parallelism together
with the experimental results and energy efficiency analysis. Lastly,
Section 4 presents the conclusion and future work.
2. Related Work
We live in an age where every personal computer and other major
technology products such as smart phones, smart watches and wireless
earphones have microprocessor chips. Accompanying these chips are
parallel architectures that make high performance computing
capabilities available to them. For over five decades (Mack, 2011)
these microprocessor chips have been governed by silicon and Moore's
Law. This is now coming to an end, and it is important to investigate
how green the future will be without silicon and its heavy ecological
footprint. A substantial amount of research has been published on
what the future will look like without silicon. In this section we
identify, discuss and critique the studies most relevant to our
research. Fischetti et al. (Fischetti et al., 2013) undertook a
theoretical study to find out whether silicon can withstand gate
leakage currents below 10 nm. Their study applies scaling rules
extracted from the 2011 International Technology Roadmap for
Semiconductors (ITRS), together with stricter scaling rules from the
literature that are confirmed by simulations of 5-nm gate-length
III-V FETs. By employing local empirical pseudopotentials they were
able to show that the gate current in the ON-state reaches worrisome
values at gate lengths of about 5 nm. Kwon, a senior researcher at
Samsung, the world's largest microchip producer, provided a notable
contribution in (Kwon, 2011), which details how to make the
semiconductor industry power efficient and how microchips can be
produced in an eco-friendly manner to protect the planet. In this
study he argues that, to save mankind from a life-threatening
environmental crisis caused by non-eco-friendly semiconductor
technologies, the industry is expected to convert to eco-friendly
technologies that preserve the environment. Kozawa et al. (Kozawa et
al., 2014) investigated the extendibility of chemically amplified
resist processes to the sub-10-nm half-pitch node, assuming the use
of extreme ultraviolet lithography, which places greater demands on
the energy budget of a foundry. Furthermore, they advise that
although sub-10-nm fabrication is considered feasible, a significant
increase in the acid generation concentration and the development of
related material technologies are required. Hence, this study
indicates that as shrinking continues, more energy will be consumed
and more resources will have to be utilized, which can have a
negative outcome on the environment. We take all these studies into
account when designing our hybrid asymmetric multicore architecture,
and we have used the lessons they outline to make our architecture
more power efficient.
3. HAMA with Safe Parallelism
3.1. Hybrid Asymmetric Multicore Architecture
(HAMA)
In earlier work by the authors of this paper (Mogale et al., 2018), a
full description is given of a complete Hybrid Asymmetric Multicore
Architecture called DOMINO. This paper continues that research and
adopts the same definition of a Hybrid Asymmetric Multicore
Architecture. As described in (Mogale et al., 2018), the Hybrid
Asymmetric Multicore Architecture utilizes passive cores that are
designed not to stay powered on at all times, in order to save
energy. Passive cores utilize a self-timing mechanism that is
deterministic in nature and ensures that the cores are powered off
after completion of work. To avoid computation overheads these cores
utilize structured parallelism, which is fully compositional and
highly deterministic, by adopting parallel patterns.
Fig. 1. Quad Core Topology With Two Cores Powered.
As seen in figure 1, the workload is divided evenly among the passive
cores using a designated parallel pattern as a guide, since parallel
patterns exhibit safe parallelism and ensure determinism. Using
parallel patterns such as a stencil, a map or a farm pattern, we can
synchronize the duration of the timer with the size of the problem in
order to determine the computation duration. We are aware that this
depends entirely on Amdahl's law, and we adopt an asymmetric design
coupled with asynchronous cores, which can be passive, to promote
energy efficiency.
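The behaviour described above can be sketched as a toy software model. This is only an illustration under stated assumptions, not the DOMINO hardware design: the class `PassiveCore`, the helpers `split_evenly` and `dispatch`, and the fixed `cost_per_item` are hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PassiveCore:
    """Toy model of a passive core that is powered only while it has work."""
    core_id: int
    powered: bool = False
    busy_time: float = 0.0  # total time spent powered on (arbitrary units)

    def run(self, n_items: int, cost_per_item: float) -> None:
        # Assumption: the self-timer is derived from the problem size, so the
        # powered-on window is known before the work starts.
        timer = n_items * cost_per_item
        self.powered = True
        self.busy_time += timer
        self.powered = False  # power off as soon as the timer expires


def split_evenly(n_items: int, n_cores: int) -> List[int]:
    """Divide n_items into n_cores chunk sizes that differ by at most one."""
    base, extra = divmod(n_items, n_cores)
    return [base + (1 if i < extra else 0) for i in range(n_cores)]


def dispatch(n_items: int, cores: List[PassiveCore], cost_per_item: float = 1.0) -> None:
    """Hand each core its chunk; cores without work are never powered on."""
    for core, chunk in zip(cores, split_evenly(n_items, len(cores))):
        if chunk:
            core.run(chunk, cost_per_item)


cores = [PassiveCore(i) for i in range(2)]
dispatch(10, cores)
```

With two cores and ten items, each core is powered for five time units and is off again afterwards, which mirrors the two-powered-cores situation of figure 1.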
3.2. Safe Parallelism with Parallel Patterns
For parallelism to be considered safe it must be structurally defined
with objects that compose, and program code has to be fully
deterministic, without races or dependencies which can cause
undesired computation overheads. Later in this paper we will
demonstrate that parallelism is expensive in terms of energy
consumption per computation for a typical multicore CPU. There exist
several Algorithmic Skeleton Frameworks (AsKF), also known as
parallel patterns, which can be used to counter dark silicon and
parallel computation problems (McCool et al., 2012), (Cole, 1989)
which occur frequently in multicore architecture topologies. These
skeletons or parallel patterns, seen in figure 2, are also known as
idioms in parallel computing and help maintain structured parallelism
for fine-grained computing (Robison, 2013). They also help ensure
that computation not only composes but is also deterministic at all
times. Several skeletons exist in the serial programming world but
are insufficient for parallel computation. The most commonly known
skeletons in the serial world are for, if, and while. These are not
effective for parallel computation and do not suit parallel
architectures well, which has led researchers to develop new forms of
skeletons to address this problem. We discuss them briefly below:
Fig 2. Illustration of Parallel Patterns (McCool et al., 2012)
3.2.1. MAP PATTERN
The map pattern is used for performing an operation on every element
of a collection, and therefore comes in handy in applications which
utilize collections. In a map pattern, a serial iteration is replaced
by iterations that are independent by nature and have no
dependencies on each other. As illustrated in figure 3, the map
pattern utilizes an elemental function with a known number of
iterations to ensure composition and to eliminate the problem of
non-deterministic computation. The map pattern is suitable for
multicore architectures because each strand of computation in the map
pattern sequence can be mapped to a core for parallel execution.
Fig 3. Illustration of Map Pattern (McCool et al., 2012)
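As a minimal sketch of the map pattern, the example below applies an elemental function to every element of a collection with Python's standard `concurrent.futures` module. The function `square` and the helper `parallel_map` are illustrative names chosen here; a thread pool is used only to keep the example portable, and CPU-bound work in CPython would normally use a process pool instead.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    """Elemental function: the result depends only on its own input element."""
    return x * x


def parallel_map(func, items, workers: int = 4):
    # Executor.map returns results in input order, so the output is
    # deterministic regardless of which worker finishes first.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


data = list(range(8))
result = parallel_map(square, data)
```

Because no iteration reads another iteration's result, the parallel output matches the serial list comprehension `[square(x) for x in data]` exactly.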
The map pattern is very suitable for embarrassingly parallel problems
and can be nested with other patterns (Aldinucci et al., 2016),
(Sheshikala et al., 2016) to create a more powerful pattern for
computation. A map pattern can be combined with a reduce pattern to
form a map-reduce pattern, which can help enhance parallel
computation. For example, in a map-reduce pattern the mapper takes
〈x1, y1〉 as input and emits 〈x2, y2〉, which is shuffled and sorted
and then given as input to the reducer as 〈x2, y2〉, which in turn
generates 〈x3, y3〉 as the final output. We discuss the reduce
pattern next.
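The key-value flow of a map-reduce pattern can be illustrated with a small word count. This is a sequential sketch of the data flow only; the function names `mapper`, `shuffle_and_sort` and `reducer` and the sample input are assumptions made for illustration.

```python
from collections import defaultdict
from functools import reduce


def mapper(key, value):
    """<x1, y1> -> list of <x2, y2>: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in value.split()]


def shuffle_and_sort(pairs):
    """Group intermediate values by key and order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """<x2, [y2, ...]> -> <x3, y3>: sum the counts collected for one word."""
    return key, reduce(lambda a, b: a + b, values)


# Hypothetical input: line number -> line of text.
lines = {0: "green cores save energy", 1: "passive cores save more energy"}
intermediate = [pair for k, v in lines.items() for pair in mapper(k, v)]
counts = [reducer(k, vs) for k, vs in shuffle_and_sort(intermediate)]
```

Each call to `mapper` and each call to `reducer` is independent of the others, which is what allows both phases to be distributed over cores.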
3.2.2. REDUCE PATTERN
The reduce pattern is an idiom which combines all items in a
collection into one output. In a reduce (or reduction) pattern over a
collection of k items, two adjacent items x and y of that collection
can be chosen and combined to form a collection of k − 1 items.
The reduce pattern works well for applications such as matrix
multiplication and Monte Carlo simulation. It is used extensively in
many algorithms because of its associative properties. In summary it
can be thought of as P = X1 ⊕ X2 ⊕ ... ⊕ Xn, where Xi represents the
ith item in the collection. If we assume that the items of the data
collection are of type c, then we can use a binary function which is
associative in nature. This function can be represented as
⊕ : c × c → c, provided the function carries no dependencies.
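A minimal sketch of a reduction, assuming an associative operator, is the pairwise (tree) reduction below. The function name `tree_reduce` is illustrative; the levels are evaluated sequentially here, but every pair within a level is independent and could be assigned to a separate core.

```python
import operator
from functools import reduce


def tree_reduce(op, items):
    """Combine adjacent pairs level by level until one value is left.

    The result equals a left-to-right fold only when `op` is associative.
    """
    items = list(items)
    if not items:
        raise ValueError("cannot reduce an empty collection")
    while len(items) > 1:
        # Each adjacent pair is combined independently of the others.
        paired = [op(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:  # an unpaired last item is carried to the next level
            paired.append(items[-1])
        items = paired
    return items[0]


values = [3, 1, 4, 1, 5, 9, 2, 6]
total = tree_reduce(operator.add, values)
serial_total = reduce(operator.add, values)
```

Since only adjacent items are combined, the operator needs to be associative but not commutative; string concatenation, for example, keeps its order.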
3.2.3. STENCIL PATTERN
A stencil pattern is a data pattern which behaves like a map pattern,
with the primary difference being that the elemental function can
access not only a single item in a collection but also the items in
its neighbourhood. Thus a stencil's output is a function of
neighbourhoods of elements in a collection. A stencil pattern is a
variation of the gather pattern and is commonly used in image
processing applications, which normally utilize two-dimensional
arrays for storing the pixels of bitmaps and manipulate those pixels
to achieve the desired output, as depicted in figure 4.
Fig 4. Illustration of the Stencil Pattern (McCool et al., 2012)
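To make the neighbourhood access concrete, the sketch below applies a 3x3 mean filter to a small two-dimensional image. The function name `box_blur` and the choice to ignore neighbours that fall outside the image are assumptions made for this illustration.

```python
def box_blur(image):
    """3x3 mean filter: each output pixel is a function of its neighbourhood.

    The function reads only from `image` and writes only to `out`, so every
    output pixel can be computed independently of the others.
    Neighbours outside the image are ignored, so edges average fewer samples.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = sum(neighbours) / len(neighbours)
    return out


image = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
blurred = box_blur(image)
```

Writing to a separate output array is the design choice that keeps the stencil free of races: no iteration ever reads a value that another iteration has modified.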
3.2.4. FARM PATTERN
The farm pattern, or farm idiom, operates similarly to the map
pattern; however, the size of the collection is not known in advance.
The farm pattern is also suitable for embarrassingly parallel
computations. One of the primary differences between the farm pattern
and the map pattern is that the map pattern is a data pattern while
the farm pattern is a stream pattern.
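The difference from a map can be sketched with a generator as the input stream, whose length is not known to the consumer. The helper `farm`, the `batch` parameter and the `sensor_readings` stream are hypothetical names introduced for this illustration.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def farm(worker, stream, workers: int = 4, batch: int = 8):
    """Apply `worker` to the items of a stream of unknown length.

    Items are pulled in small batches, so the stream is never materialised
    in full; results are yielded in the order the inputs arrived.
    """
    iterator = iter(stream)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            chunk = list(itertools.islice(iterator, batch))
            if not chunk:
                break
            yield from pool.map(worker, chunk)


def sensor_readings():
    """Hypothetical input stream that ends at a point the consumer cannot predict."""
    value = 1
    while value < 200:
        yield value
        value *= 3


processed = list(farm(lambda x: x + 1, sensor_readings()))
```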
3.3. Karatsuba Polynomial Multiplication Experiment
3.3.1. EXPERIMENTAL SETUP
We have implemented and optimized the Karatsuba polynomial
multiplication algorithm, which is a fast multiplication algorithm.
In our experiments this algorithm is used to perform 256
multiplications of polynomials of degree ten thousand. Our target
platforms are the older Haswell microarchitecture and the newer Kaby
Lake microarchitecture, both in Core i7 microprocessors. The Kaby
Lake die is manufactured at 14 nm while Haswell is manufactured at
22 nm. For both nodes we have optimized the Karatsuba algorithm
described in Algorithm 1 to be executed in four variants: serial,
vectorized, parallel, and parallel-vectorized. For the parallel and
parallel-vectorized variants we have optimized the algorithm to
utilize the aforementioned parallel patterns as a strategy. We have
timed the computations on both Haswell and Kaby Lake. The results
obtained on the Haswell processor are shown in figure 5.
Algorithm 1. Karatsuba Polynomial Multiplication
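As an illustration of the technique, a textbook recursive Karatsuba multiplication of polynomials stored as coefficient lists might look like the following. It is a minimal serial sketch and not the optimized serial, vectorized or parallel implementations measured in this paper; the helper names and the `cutoff` parameter are assumptions.

```python
def poly_add(a, b):
    """Coefficient-wise sum of two polynomials (lowest degree first)."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]


def poly_sub(a, b):
    """Coefficient-wise difference a - b."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0) for i in range(n)]


def schoolbook(a, b):
    """Quadratic-time reference multiplication, also used as the base case."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def karatsuba(a, b, cutoff=16):
    """Multiply two polynomials using three half-size products per level."""
    if not a or not b:
        return []
    n = max(len(a), len(b))
    if n <= max(cutoff, 1):
        return schoolbook(a, b)
    result_len = len(a) + len(b) - 1
    a = a + [0] * (n - len(a))  # pad to a common length
    b = b + [0] * (n - len(b))
    h = n // 2
    a0, a1 = a[:h], a[h:]  # a = a0 + x^h * a1
    b0, b1 = b[:h], b[h:]
    z0 = karatsuba(a0, b0, cutoff)
    z2 = karatsuba(a1, b1, cutoff)
    # (a0 + a1)(b0 + b1) - z0 - z2 gives the middle term with one product.
    z1 = poly_sub(poly_sub(karatsuba(poly_add(a0, a1), poly_add(b0, b1), cutoff), z0), z2)
    out = [0] * (2 * n - 1)
    for i, c in enumerate(z0):
        out[i] += c
    for i, c in enumerate(z1):
        out[i + h] += c
    for i, c in enumerate(z2):
        out[i + 2 * h] += c
    return out[:result_len]  # drop zeros introduced by padding


product = karatsuba([1, 2], [3, 4], cutoff=1)  # (1 + 2x)(3 + 4x)
```

The three recursive calls that produce `z0`, `z2` and the middle product are independent of one another, which is the property a parallel variant can exploit.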
3.3.2. KARATSUBA ON HASWELL
As can be seen from the screenshot in figure 5, the performance of
the algorithm depends on how well it was optimized. Both the parallel
and the parallel/vectorized variants perform very well compared to
the serial variant, which in this case we can deem suboptimal for
speedup and performance. As figure 5 shows, the parallel and
parallel/vectorized variants improved performance with speedups of
3.85x and 3.97x respectively, both with a minimum time span of about
3.2 seconds against a time span of about 12 seconds for both the
serial and the vectorized variants.
Fig 5. Screenshot of Karatsuba on Haswell
As visible in figure 5, our modified Karatsuba algorithm first runs
the serial variant, then the vectorized variant, followed by the
parallel and parallel/vectorized variants. If we look at CPU
utilization in terms of clock speed, we see how heavily the algorithm
loads the CPU. For the serial and vectorized variants the utilization
is not heavy, but for the parallel and parallel-vectorized variants
CPU clock utilization is intense on all cores, as visible in figure 6
below.
Fig 6. CPU utilization of Karatsuba on Haswell
For the sake of performance, this is good. However, it is wise to
note that CPUs running at full speed consume a lot of energy. If we
look at the CPU temperature profile for the aforementioned experiment
of Karatsuba on Haswell, we notice that temperatures rise as soon as
parallelism is utilized. This prompts us to come up with a better
approach which promotes eco-friendly parallelism that is safe.
3.3.3. KARATSUBA ON KABY LAKE
We have also tested Karatsuba running on Kaby Lake, which is Intel's
most recent state-of-the-art microprocessor. These microprocessors
are designed to be efficient and powerful, and since they are built
on a 14 nm process they support fanless designs. We ran our algorithm
in its serial variant and in its optimized vectorized, parallel and
vector-parallel variants. The performance we obtained can be seen in
the screenshot of figure 7.
Fig 7. Screenshot of Karatsuba on Kaby Lake
We see from figure 7 that the results are quite different from
Haswell, which featured a full 8x hyper-threaded multicore
microprocessor at 22 nm. The serial variant of the algorithm took
17.875 s, for a baseline speedup of 1.00x, while the vectorized
variant took 17.422 s, a speedup of 1.03x, which is slightly better
than its serial counterpart. Further, we see that the parallel
variant roughly halved the time, taking only 8.516 s to complete and
achieving a speedup of 2.10x.
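The speedups quoted above follow directly from the measured times, since speedup is the serial time divided by the time of the variant. A short check, using only the three times reported for Kaby Lake (the dictionary and function names are illustrative):

```python
def speedup(t_serial: float, t_variant: float) -> float:
    """Speedup of a variant relative to the serial baseline."""
    return t_serial / t_variant


# Times in seconds as reported for Karatsuba on Kaby Lake.
kaby_lake_times = {"serial": 17.875, "vectorized": 17.422, "parallel": 8.516}

speedups = {
    name: round(speedup(kaby_lake_times["serial"], t), 2)
    for name, t in kaby_lake_times.items()
}
```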
Fig 8. CPU utilization of Karatsuba on Kaby Lake
From figure 8 we can see that the clock speed profile of Karatsuba
running on Kaby Lake is quite different from that of Haswell. This is
because at 14 nm the core architecture has changed, albeit with a few
similarities. One has to take into account that the clock speeds of
the two processors are not the same, but the computation distribution
pattern remains similar. We see that the last part, which features
vectorized parallelism, is also flat. The reason performance is good
and approaches the ideal time for the vectorized algorithm is that
our implementation combines data parallelism, by utilizing arrays,
with task parallelism, by allowing compute-intensive parts of our
algorithm such as polynomial computation to run in parallel. However,
as mentioned before, while parallelism reduces maximum computation
time, it increases utilization per core, as visible in figure 8. The
corresponding increase in voltage can be seen in figure 9, which
presents CPU voltage per core for Karatsuba running on Kaby Lake.
Fig 9. Kaby Lake CPU Voltage per core
If we look at figures 8 and 9 we notice a few similarities. This is
because the CPU voltage per core correlates with core clock speed:
the higher the clock speed, the higher the voltage per core, and that
is the drawback of multicore designs. The clock speed depends on the
overall thread activity experienced by the CPU. Total thread activity
plays an important part in CPU performance, because if the workload
is not balanced then performance deficiencies may be experienced. In
this research we promote the use of safe parallelism and Hybrid
Asymmetric Multicores as a strategy to accelerate green computing. We
recognize that parallelism alone is not enough, and that parallelism
on conventional multicore designs is detrimental because it increases
energy use per core even though it reduces computation time, as
visible in figure 5 and figure 7. To counter this energy-use-per-core
trap we recommend adoption of our hybrid asymmetric multicore
designs, which feature low-powered asynchronous cores. If all
parallel activity is run on these passive asynchronous cores, we
believe that energy efficiency per computation can be much improved,
since the asynchronous cores have no clock and feature low power
coupled with a globally asynchronous, locally synchronous strategy.
As brief evidence for this we provide a simplified energy analysis
below.
3.3.4. ENERGY EFFICIENCY ANALYSIS
We adopt the corollaries of Amdahl's law described in (Hill & Marty,
2008) for the theoretical analysis and evaluation of our designs. We
take into consideration the following theorem proposed in (Yao et
al., 2009), which states that if the speedup of an asymmetric
multicore is expressed as follows:
\[
\mathrm{Speedup}_{\mathrm{asymmetric}}(f, n, r) =
\frac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f}{\mathrm{perf}(r) + n - r}}
\tag{3.1}
\]
it follows that if $\mathrm{perf}(r) = r^c$, $0 < c < 1$, then it
holds that:
• If fn−1 (1−c)c ≤ n2 , then the maximum of speedup oc-
curs at r = 1 and the speedup is a decreasing function
of r
• If fn ≥ n1−c then it is clear that the maximum speedup
occurs at r = n and it is an increasing function of r
• Lastly, If fn−1 (1−c)c ≤ n2 and cf ≤ n1−c, then the
maximum speedup will occur at a unique interval r0 ∈
(1, n).
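Equation 3.1 can be explored numerically to see where the maximum falls for a given configuration. The sketch below evaluates the formula for integer values of r and picks the best one by brute force; it does not implement the closed-form conditions of the theorem. The default c = 0.5 corresponds to the perf(r) = sqrt(r) assumption used by Hill & Marty, and the function names are illustrative.

```python
def perf(r: float, c: float = 0.5) -> float:
    """Sequential performance of a core built from r base-core resources."""
    return r ** c


def speedup_asymmetric(f: float, n: int, r: int, c: float = 0.5) -> float:
    """Equation 3.1: one large core of size r plus (n - r) base cores."""
    p = perf(r, c)
    return 1.0 / ((1.0 - f) / p + f / (p + n - r))


def best_r(f: float, n: int, c: float = 0.5) -> int:
    """Integer r in [1, n] that maximises Equation 3.1 (exhaustive search)."""
    return max(range(1, n + 1), key=lambda r: speedup_asymmetric(f, n, r, c))


n = 64
r_star = best_r(0.9, n)
peak = speedup_asymmetric(0.9, n, r_star)
```

Two sanity checks follow from the formula itself: at r = 1 it reduces to Amdahl's law for n identical cores, and at r = n it equals perf(n) regardless of f.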
Since we focus on asymmetric design to improve performance, we take
the aforementioned into consideration when analyzing the performance
of our designs. However, one important aspect that we are very much
concerned about is energy efficiency, since the goal of this paper is
to analyze and determine how to promote green computing by adopting
HAMA and Safe Parallelism.
If we take into account that energy efficiency can be improved with
parallelization, then we can deduce a strategic axiom which states
that:
• Processors can be run at an arbitrary clock frequency, subject to a
capped maximum frequency which we call $F_{max}$.
• The speedup k that one can achieve with near-ideal parallelism of
f = 0.99, in correlation with processor speed and scaling, is subject
to $1 \le k \le \frac{1}{s + p/N}$, where s and p are the serial and
parallel fractions of the work and N is the number of processors,
based on Amdahl's law as defined in (Amdahl, 1967).
• Lastly, while the bound above holds, we optimistically assume that
the average computation span $k_{comp}$ shrinks as the level of
parallelization approaches the ideal f = 0.99. That is, the greater
the value of f, the smaller $k_{comp}$ becomes, and since $k_{comp}$
is reduced, less energy is spent on an average computation.
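The upper bound from Amdahl's law in the second statement can be evaluated directly. The sketch below assumes the parallel fraction is f and the serial fraction is 1 − f; the function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Upper bound on speedup from Amdahl's law: 1 / (s + p / N)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


f = 0.99
bounds = {n: amdahl_speedup(f, n) for n in (1, 2, 4, 8, 16)}
```

Even with f = 0.99, the serial fraction of 0.01 caps the achievable speedup below 100 no matter how many cores are added, which is why the span of a computation cannot shrink indefinitely.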
It is worth noting that the last statement of our axiom takes into
account the fact that parallelization increases both temperature and
voltage per core, as seen in figure 9 and in figure 10, which
presents the temperature profile of Karatsuba running on Haswell.
Fig 10. Temperature Profile of Karatsuba on Haswell
3.4. Hybrid Architecture Performance Drawbacks
We would have liked to conclude this paper by telling researchers
that there are no drawbacks to our designs. However, this can never
be the case. The following are the drawbacks we have identified,
which we will be working on in future research. We also invite fellow
researchers to lend a hand where possible, since research is a
continuous team effort.
• Core Computation Synchronization - Since our design features both
synchronous and asynchronous cores on a die, synchronization
introduces a delay per computation. Even though this delay can be
minimal at times, it becomes visible as the workload increases.
• Compatibility Issues - As of today most microprocessors utilize
clocks, and because operating systems have been programmed to use
this clock, our design may not yet be compatible with many platforms.
Because of the hybrid nature of the design, only the synchronous
cores may be recognized, causing performance to collapse.
• Sequential Synchronization - Our design is optimized mostly for
parallel computation, hence the adoption of asynchronous cores. We
believe that our design will be underutilized if it is adopted for
serial computation, and since only the synchronous active cores may
be active at a given time, performance will be greatly reduced.
To model power consumption for the last case we use the variable D to
refer to our asymmetric core. Note that D is a special case that only
arises when there is a need. From the corollary of Amdahl's law
described in equation 3.1, n is the number of processors and f is the
fraction of computation that can be parallelized (0 ≤ f ≤ 1). To
model power consumption for D we use k as the fraction of power
consumed by a processor while idle (0 ≤ k ≤ 1). Further, we assume
that when our chip D is in superscalar mode it consumes a power of 1.
Under Amdahl's law (Amdahl, 1967), the power consumed by the active
processor during the sequential phase is 1, while the remaining
(n − 1) processors consume (n − 1)k. Thus we can assume that during
the sequential computing phase the power our D processor will consume
is:
Sequential = 1+(n−1)k(n2 )
We note that this may not always be the case and, as mentioned above,
D is a special case for sequential computation, which was not a
particular focus of our research goals.
4. Conclusion and Future Work
In this paper we presented our novel strategy for accelerating green
computing by adopting the Hybrid Asymmetric Multicore Architectures
(HAMA) model with Safe Parallelism. We have provided evidence which
argues that parallelism alone is not enough to promote energy
efficiency. Most modern computing is serial and contributes to the
global footprint of energy consumption. These impacts are most
visible in server farms and cloud computing platforms, where the
majority of the world's information resides. Through our strategy we
have shown that by adopting HAMA and utilizing safe parallelism,
energy consumption per computation can be greatly reduced. We have
also outlined the drawbacks of our strategy, which we hope to
overcome in future work.
References
Aldinucci, Marco, Danelutto, Marco, Drocco, Maurizio,
Kilpatrick, Peter, Misale, Claudia, Pezzi, G Peretti, and
Torquati, Massimo. A parallel pattern for iterative sten-
cil+ reduce. The Journal of Supercomputing, pp. 1–16,
2016.
Amdahl, Gene M. Validity of the single processor approach
to achieving large scale computing capabilities. In Pro-
ceedings of the April 18-20, 1967, spring joint computer
conference, pp. 483–485. ACM, 1967.
Cole, Murray I. Algorithmic skeletons: structured manage-
ment of parallel computation. Pitman London, 1989.
Diaz, Javier, Munoz-Caro, Camelia, and Nino, Alfonso. A
survey of parallel programming models and tools in the
multi and many-core era. Parallel and Distributed Sys-
tems, IEEE Transactions on, 23(8):1369–1386, 2012.
Esmaeilzadeh, Hadi, Blem, Emily, Amant, Renee St,
Sankaralingam, Karthikeyan, and Burger, Doug. Dark
silicon and the end of multicore scaling. In Computer Ar-
chitecture (ISCA), 2011 38th Annual International Sym-
posium on, pp. 365–376. IEEE, 2011.
Fischetti, Massimo V, Fu, Bo, and Vandenberghe,
William G. Theoretical study of the gate leakage current
in sub-10-nm field-effect transistors. IEEE Transactions
on Electron Devices, 60(11):3862–3869, 2013.
Flynn, Michael J. Some computer organizations and their
effectiveness. IEEE transactions on computers, 100(9):
948–960, 1972.
Gutowski, Timothy G, Branham, Matthew S, Dahmus, Jef-
frey B, Jones, Alissa J, Thiriez, Alexandre, and Sekulic,
Dusan P. Thermodynamic analysis of resources used in
manufacturing processes. Environmental science & tech-
nology, 43(5):1584–1590, 2009.
Hill, Mark D and Marty, Michael R. Amdahl’s law in the
multicore era. 2008.
Kozawa, Takahiro, Santillan, Julius Joseph, and Itani,
Toshiro. Feasibility study of sub-10-nm half-pitch fab-
rication by chemically amplified resist processes of ex-
treme ultraviolet lithography: I. latent image quality pre-
dicted by probability density model. Japanese Journal of
Applied Physics, 53(10):106501, 2014.
Kwon, Oh-Hyun. Eco-friendly semiconductor technolo-
gies for healthy living. In Solid-State Circuits Confer-
ence Digest of Technical Papers (ISSCC), 2011 IEEE In-
ternational, pp. 22–28. IEEE, 2011.
Mack, Chris A. Fifty years of moore’s law. IEEE Transac-
tions on semiconductor manufacturing, 24(2):202–207,
2011.
McCool, Michael, Reinders, James, and Robison, Arch.
Structured parallel programming: patterns for efficient
computation. Elsevier, 2012.
Mogale, Hope, Esiefarienrhe, Michael, Gasela, Naison,
and Letlonkane, Lucia. Introducing domino: An eco-
friendly asynchronous hybrid multicore architecture for
green computing. In 2018 International Conference on
Advances in Big Data, Computing and Data Communi-
cation Systems (icABCD), pp. 1–7. IEEE, 2018.
Pacheco, Peter. An introduction to parallel programming.
Elsevier, 2011.
Robison, Arch D. Composable parallel patterns with intel
cilk plus. Computing in Science & Engineering, 15(2):
0066–71, 2013.
Sheshikala, M, Rao, D Rajeswara, and Prakash, R Vijaya.
Parallel approach for finding co-location pattern–a map
reduce framework. Procedia Computer Science, 89:341–
348, 2016.
Snyder, Lawrence. A taxonomy of synchronous parallel
machines. Technical report, DTIC Document, 1988.
Yao, Erlin, Bao, Yungang, Tan, Guangming, and Chen,
Mingyu. Extending amdahl’s law in the multicore era.
ACM SIGMETRICS Performance Evaluation Review, 37
(2):24–26, 2009.
