Design of High Performance and Low Power Simultaneous Multi-Threaded Processor by Arora, Krishan et al.
International Journal of Electrical and Computer Engineering (IJECE) 
Vol. 3, No. 3, June 2013, pp. 423~428 
ISSN: 2088-8708  423 
  
Journal homepage: http://iaesjournal.com/online/index.php/IJECE 




Krishan Arora*, Paramveer Singh Gill*, Parul Mehra** 
* Assistant Professor, Department of Electronics and Electrical Engineering, Lovely Professional University, Punjab 
** PhD Scholar, Department of Commerce, CMJ University, Meghalaya 
 
 
Article Info  ABSTRACT 
Article history: 
Received Mar 31, 2013 
Revised May 14, 2013 
Accepted May 27, 2013 
 
 In this paper, we present the design of a High Performance Multi-Threaded 
Processor. Processing of high-quality images is essential in applications 
such as HDTV, gaming and multimedia, which require great processing power 
with low power consumption. This can be achieved with multi-threaded 
processors, which optimally utilize the Functional Units (FUs). Their 
processing speed is as good as that of multi-core processors, with less area. 
A conflict resolver (CR) is designed for scheduling the instructions, which 
involves the allocation of FUs. Since data-move instructions are the majority 
in any program, the corresponding logic blocks are replicated, which further 
improves the speed of execution. We illustrate the design for a two-threaded 
processor; however, it is possible to extend the design to any number of threads.







Single Threaded Processor 




Department of Electronics and Electrical Engineering, 




1. INTRODUCTION  
Rapid developments in entertainment, gaming, medical imaging and HDTV have made electronic systems design 
more and more complex. Such systems require very high-speed processing, while their power consumption is also 
being pushed to its limits. The solution to these conflicting requirements lies in efficient and effective embedded 
processor design. One promising method is “Simultaneous Multithreading” (SMT), which takes super-threading to 
the next level in high-performance processor design. SMT is super-threading without the restriction that all the 
instructions issued by the front end on each clock cycle must come from the same thread. In SMT, instructions from 
different threads execute simultaneously, so the functional units are better utilized. Scheduling is managed by a 
special hardware unit called the Conflict Analyzer, which examines part of the opcode, namely its 3 most 
significant bits.  
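The per-cycle decision made by this scheduling unit can be sketched in Python. This is a minimal model under stated assumptions, not the authors' Verilog design: the 8-bit opcode width, the hypothetical mapping `FU_CLASS` from the 3 most significant bits to functional-unit classes, the fixed priority given to Thread-1, and the rule that data-transfer instructions never conflict (because their logic is replicated per thread) are all assumptions made for illustration.

```python
# Minimal sketch of a two-thread conflict check; NOT the authors' Verilog design.
# Assumptions (hypothetical): 8-bit opcodes, the 3 MSBs select a functional-unit
# class, Thread-1 has fixed priority, and data-transfer logic is replicated per
# thread so data moves never conflict.

OPCODE_BITS = 8  # assumed opcode width

# Hypothetical mapping from the 3-bit opcode field to a functional-unit class.
FU_CLASS = {
    0b000: "transfer",
    0b001: "transfer",
    0b010: "adder",
    0b011: "adder",
    0b100: "multiplier",
    0b101: "logic",
    0b110: "shifter",
    0b111: "shifter",
}

REPLICATED = {"transfer"}  # units of which each thread owns a private copy


def fu_field(opcode: int) -> int:
    """Return the 3 most significant bits of an OPCODE_BITS-wide opcode."""
    return (opcode >> (OPCODE_BITS - 3)) & 0b111


def resolve(op1: int, op2: int) -> tuple[bool, bool]:
    """Return (issue_thread1, issue_thread2) for the current cycle.

    Thread-1 always issues; Thread-2 stalls only when both opcodes
    need the same shared (non-replicated) functional unit.
    """
    fu1 = FU_CLASS[fu_field(op1)]
    fu2 = FU_CLASS[fu_field(op2)]
    conflict = fu1 == fu2 and fu1 not in REPLICATED
    return True, not conflict
```

For example, `resolve(0b01000000, 0b01100000)` returns `(True, False)` because both opcodes map to the shared adder, while two data moves such as `resolve(0b00000001, 0b00000010)` both issue. A fixed priority keeps the check to one comparison of two 3-bit fields; a round-robin priority would be a fairer alternative.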
Although SMT might seem like a large departure from the conventional, process-switching multithreading done 
on a single-threaded CPU, it actually does not add much complexity to the hardware. This is achieved by dividing 
the processor's architectural resources into two types: replicated and shared.   
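The split between replicated and shared resources can be pictured as a data structure. The Python sketch below is an illustration under assumptions, not the authors' design: it treats a program counter (assumed), the CPU register file and the data-transfer logic as replicated per thread, and the remaining functional units as one shared pool. The class names, register count and unit names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical resource model of an SMT core; names and sizes are assumed.

NUM_REGISTERS = 8  # assumed size of each thread's CPU register file


@dataclass
class ThreadContext:
    """Replicated resources: one instance per hardware thread."""
    pc: int = 0  # per-thread program counter (assumed)
    registers: list[int] = field(default_factory=lambda: [0] * NUM_REGISTERS)
    transfer_busy: bool = False  # each thread owns its data-transfer logic


@dataclass
class SMTCore:
    """One pool of shared functional units plus one context per thread."""
    shared_units: tuple[str, ...]
    contexts: list[ThreadContext]


def make_core(num_threads: int) -> SMTCore:
    """Build a core; only the context list grows with the thread count."""
    return SMTCore(
        shared_units=("adder", "multiplier", "logic", "shifter"),
        contexts=[ThreadContext() for _ in range(num_threads)],
    )
```

Calling `make_core(2)` yields two independent register files that share a single set of four functional units. A dual-core design, by contrast, would duplicate the functional units along with the contexts, which is where the area difference between the two approaches comes from.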
 
 
2. RESEARCH METHOD 
The potential for achieving a significant increase in throughput on a superscalar by using 







































































































































































exists. In [8] 

























































ocessor [3] p































































































































































Figure 2. Architecture ...






































































































































































 

























































































































As the graphs above show, the gate count and power consumption of the Dual-Core Processor are double those of 
the Single-Threaded Processor. The Multi-Threaded Processor, however, saves considerable area and power compared 
to the Dual-Core Processor. This is mainly because in the Multi-Threaded Processor the functional units are not 
replicated but shared, while in the Dual-Core Processor they are replicated. 




Figure 6. Performance with respect to dependencies among the threads 
 
 
Here we compare the performance of the Single-Threaded Processor and the Multi-Threaded Processor on the 
basis of instructions executed per clock cycle, for a 50-instruction program per thread (i.e. 100 instructions 
in total). For 0% and 2% conflicts, our Multi-Threaded Processor achieves 197% of the performance of the 
Single-Threaded Processor (i.e. nearly that of two Single-Threaded Processors). As the percentage of conflicts 
increases, the performance of the Multi-Threaded Processor decreases nearly linearly. For 50 instructions per 
thread, the maximum possible conflict rate is 94%, at which the Multi-Threaded Processor achieves 104.63%. 
Because most instructions in a program are data-transfer and jump instructions, in most cases the conflict rate 
varies from 30% to 60%, and the performance of the Multi-Threaded Processor correspondingly varies from 
156.15% to 126.88% of that of the Single-Threaded Processor. 
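The trend reported above can be approximated with a simple idealized model, sketched below in Python. The model is an assumption for illustration, not the authors' measurement method: every instruction takes one cycle, the Single-Threaded Processor runs the two threads back to back in 2N cycles, and each conflict stalls Thread-2 for exactly one cycle, so the Multi-Threaded Processor needs N + c cycles for c conflicts. Relative performance is then 2N / (N + c), with the conflict rate taken as c / N. The function names are hypothetical.

```python
# Idealized speedup model (assumption, not the paper's simulation):
# one cycle per instruction, and each conflict stalls Thread-2 for one cycle.


def relative_performance(n_per_thread: int, conflicts: int) -> float:
    """Multi-threaded throughput relative to single-threaded, as a ratio.

    Single-threaded: both threads run back to back -> 2 * n cycles.
    Multi-threaded: threads overlap, each conflict adds one stall -> n + c cycles.
    """
    if not 0 <= conflicts <= n_per_thread:
        raise ValueError("conflicts must be between 0 and n_per_thread")
    single_cycles = 2 * n_per_thread
    multi_cycles = n_per_thread + conflicts
    return single_cycles / multi_cycles


def conflicts_from_rate(n_per_thread: int, rate: float) -> int:
    """Convert a conflict rate (fraction of one thread's instructions) to a count."""
    return round(rate * n_per_thread)


if __name__ == "__main__":
    n = 50
    for rate in (0.0, 0.30, 0.60, 0.94):
        c = conflicts_from_rate(n, rate)
        print(f"{rate:.0%} conflicts -> {relative_performance(n, c):.1%}")
```

With N = 50 the script prints 200.0%, 153.8%, 125.0% and 103.1% for conflict rates of 0%, 30%, 60% and 94%. These idealized values follow the same downward trend as the reported 197%, 156.15%, 126.88% and 104.63%, but they are not expected to match exactly, since the model ignores every overhead other than conflict stalls.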
 
 
6.  EXPERIMENTATION 
The processor is modelled in Verilog HDL and its functionality is verified using ModelSim-XE 




7.  FUTURE SCOPE 
The presented Multi-Threaded processor architecture can be extended to any number of threads by 
suitably redesigning the CR and replicating the transfer logic and CPU registers once per thread. The principal 
limitation on the number of threads is the number of functional units used in the design. If the number of 
threads exceeds the number of functional units, the threads have to wait longer to obtain a particular functional 
unit to execute their instructions, so the performance of the Multi-Threaded processor will degrade greatly. There 
is also scope to combine Multi-Core and Multi-Threaded processors, in which there will 




[1] Dean M Tullsen, Susan J Eggers, Henry M Levy. Simultaneous Multithreading: Maximizing On-Chip Parallelism. 
ISCA'95, Santa Margherita Ligure, Italy. 1995: 392–403. 
[2] Susan J Eggers, Joel S Emer, Henry M Levy, Jack L Lo, Rebecca L Stamm, Dean M Tullsen. Simultaneous 
Multithreading: A Platform for Next-Generation Processors. IEEE Micro. 1997: 12-17. 
[3] Haitham Akkary, Michael A Driscoll. A Dynamic Multithreading Processor. 31st Annual ACM-IEEE International 

































[4] O’Melveny, Myers LLP, Kayamba, Inc. A Massively Multithreaded Packet Processor. Presented at NP2: Workshop 
on Network Processors, held in conjunction with The 9th International Symposium on High-Performance Computer 
Architecture, Anaheim, California. 2003: 1-11.  
[5] P Leadbitter, D Page, NP Smart. Nondeterministic Multithreading. IEEE Transactions on Computers. 2007; 56(7): 
992-998. 
[6] Venkatesan Packirisamy, Shengyue Wang, Antonia Zhai, Wei-Chung Hsu, Pen-Chung Yew. Supporting 
Speculative Multithreading on Simultaneous Multithreaded Processor. Y Roberts et al. (Eds.): HiPC. 2006: 148-158. 
[7] Carlos Madriles, Carlos García-Quiñones, Jesús Sánchez, Pedro Marcuello, Dean M Tullsen. Mitosis: A 
Speculative Multithreaded Processor Based on Precomputation Slices. IEEE Transactions on Parallel and Distributed 
Systems. 2008; 19(7): 914-925.  
[8] Nicholas Ma, Naraig Manjikian, Subramania Sudharsanan. Modeling and Simulation of Multicore Multithreaded 
Processor Architecture in SystemC. CCECE. 2008: 1155-1160. 
[9] David Burgart. Modeling Multi-Threaded Processors. White paper TeamQuest. 2008: 1-10. 
 
 
BIOGRAPHIES OF AUTHORS  
 
 
Krishan Arora is presently working as Assistant Professor in the Department of Electronics and 
Electrical Engineering at Lovely Professional University, Phagwara (Punjab). He 
completed his B.Tech (Electrical & Electronics Engg.) at Punjab Technical 
University, Punjab and his M.Tech (Electrical Engg.) at Punjab Technical University, 





Paramveer Singh Gill is presently working as Assistant Professor in the Department of 
Electronics and Electrical Engineering at Lovely Professional University, Phagwara (Punjab). 
He completed his B.Tech (Electronics & Communication Engg.) at Punjab Technical 








Parul Mehra is a PhD scholar in the Department of Commerce at CMJ University, Meghalaya. She 
completed her Master of Commerce at Hindu College, Amritsar and her Bachelor of 
Education, with specialization in Commerce and Economics, at Khalsa College of Education, 
Amritsar. She cleared UGC-NET with specialization in Commerce in December 2011 on her first 
attempt. 
 
 
