What's the Problem
Long-latency operations such as memory accesses, remote reads, and synchronization operations may take 10 to 100 cycles, forcing a traditional processor to sit idle until the result arrives
Main Idea
Multithreaded processors multiplex the execution of a number of concurrent threads onto the hardware in order to hide latencies. When a long-latency operation occurs in one thread, another thread begins execution.
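The latency-hiding idea can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of any real machine: every thread computes for `run_cycles` cycles and then issues one long-latency operation lasting `latency` cycles, and context switches are assumed to cost nothing. The function name and all parameters are hypothetical.

```python
def utilization(num_threads, run_cycles, latency, total_cycles=10_000):
    """Fraction of cycles in which the processor does useful work.

    Assumed model: each thread computes for `run_cycles` cycles, then waits
    `latency` cycles for a long-latency operation. Switching is free.
    """
    ready_at = [0] * num_threads            # cycle at which each thread can run again
    remaining = [run_cycles] * num_threads  # work left before the next stall
    busy = 0
    current = 0
    for cycle in range(total_cycles):
        if ready_at[current] > cycle:
            # The current thread is waiting: look for another ready thread.
            ready = [t for t in range(num_threads) if ready_at[t] <= cycle]
            if not ready:
                continue                    # every thread is waiting: idle cycle
            # Pick the thread that has been ready the longest.
            current = min(ready, key=lambda t: ready_at[t])
        busy += 1
        remaining[current] -= 1
        if remaining[current] == 0:
            # The thread issues its long-latency operation and must wait.
            ready_at[current] = cycle + 1 + latency
            remaining[current] = run_cycles
    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 5):
        print(n, "thread(s):", utilization(n, run_cycles=10, latency=40))
```

With these assumed numbers (10 cycles of work, 40 cycles of latency), a single thread keeps the processor busy for one fifth of the time, while five threads are enough to cover the latency completely.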
Multiple Hardware Contexts

Characteristic latency
Multithread Models
Coarse-grained (block interleaving)
Executes a single thread until it reaches certain situations
Fine-grained (cycle-by-cycle interleaving)
The processor switches each cycle to a different thread
Multiple-issue (simultaneous)
Integrate multithreaded mechanism into superscalar architectures
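The difference between the first two models can be seen in the order in which instructions are issued. The sketch below is illustrative only: threads are plain lists of instruction labels, the label "miss" stands in for whatever event triggers a switch in the coarse-grained model, and latencies are ignored. The simultaneous (multiple-issue) model is not covered. All names are hypothetical.

```python
def coarse_grained(threads):
    """Run one thread until it issues a 'miss', then switch to the next thread."""
    queues = [list(t) for t in threads]
    order = []
    current = 0
    while any(queues):
        if not queues[current]:
            current = (current + 1) % len(queues)  # thread finished: move on
            continue
        op = queues[current].pop(0)
        order.append((current, op))
        if op == "miss":                           # triggering event: switch
            current = (current + 1) % len(queues)
    return order


def fine_grained(threads):
    """Switch to a different thread every cycle (round-robin)."""
    queues = [list(t) for t in threads]
    order = []
    current = 0
    while any(queues):
        if queues[current]:
            order.append((current, queues[current].pop(0)))
        current = (current + 1) % len(queues)
    return order


if __name__ == "__main__":
    threads = [["a", "miss", "b"], ["c", "d", "miss"]]
    print(coarse_grained(threads))
    print(fine_grained(threads))
```

For the two example threads, the coarse-grained order keeps thread 0 running until its miss, whereas the fine-grained order alternates between the threads on every cycle.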
Coarse-grained Multithreading
The triggering event in a block interleaving (coarse-grained) model can be classified as follows:
Coarse-grained Example - Sparcle
Cycle-by-cycle Interleaving
An instruction of a thread can be fed into the pipeline after the previous instruction of that thread has completed. This eliminates pipeline hazards, so the processor pipeline can be very simple.
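One consequence of this rule is that the pipeline stays full only if enough threads are available. The sketch below assumes a strict round-robin order, a pipeline in which an instruction completes `pipeline_depth` cycles after it is issued, and that a thread may issue only once its previous instruction has completed. The function name and parameters are hypothetical, and the depth of 8 used in the example is an arbitrary choice.

```python
def interleaved_utilization(num_threads, pipeline_depth, cycles=1000):
    """Fraction of issue slots filled under cycle-by-cycle interleaving.

    Assumption: a thread may issue only after its previous instruction
    has left the pipeline, i.e. `pipeline_depth` cycles after issue.
    """
    free_at = [0] * num_threads   # cycle at which each thread may issue again
    issued = 0
    current = 0
    for cycle in range(cycles):
        if free_at[current] <= cycle:
            issued += 1
            free_at[current] = cycle + pipeline_depth
        # Otherwise the slot is left empty (a bubble).
        current = (current + 1) % num_threads  # switch thread every cycle
    return issued / cycles


if __name__ == "__main__":
    for n in (1, 4, 8):
        print(n, "thread(s):", interleaved_utilization(n, pipeline_depth=8))
```

Under these assumptions, eight threads fill every slot of an eight-stage pipeline, four threads fill half of them, and a single thread uses only one slot in eight. No two instructions of the same thread are ever in the pipeline together, which is why no hazard-detection logic is needed in this model.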
Prospects for Success
To date (1995), there have been no successful multithreaded machines because of the:
Extra cost
Complexity of hardware
Dearth of tools for extracting thread-level parallelism
Today, however, there are many solutions to these problems:
Advanced semiconductor technologies
Advanced CAD tools for circuit design
Advanced software tools for exploiting parallelism
