In this paper, we report a different approach to designing a scalable, fast parallel counter with improved component and transistor counts. Simulation tests are carried out over a wide range of input conditions to validate the design. The main advantages of this scalable counter are low power consumption, in the milliwatt (mW) range, and operating speed in the GHz range. The proposed design is modular, so it can easily be extended to larger counters. Repeated use of basic building blocks, namely a 3-bit synchronous parallel counter, a simple D flip-flop and a 2-bit synchronous parallel counter with an enable signal, makes the counter design simple and modular. The logic uses early overflow states to enable all the blocks in the architecture concurrently at the system clock. The pipelined structures, together with the early-overflow-based logic, ensure correct functioning of all building blocks without ripple effects. The design is implemented using Microwind and Digital Schematics (DSCH) in 0.12 µm technology. Simulation shows a total power consumption of 0.164 mW at a clock speed of 1 GHz.
INTRODUCTION
Counters are used in almost all digital circuits and systems, such as frequency synthesizers, measuring systems, analogue-to-digital converters and a wide range of circuits used in communication systems [1]. Counters are also used as basic building blocks for more advanced digital logic circuits [2], [3]. High-speed parallel counters find numerous applications in arithmetic operations, including neural networks and the triggering of nuclear instruments [3], [4]. The key features required vary greatly with the application. Some cases require counters with long counting width and high count frequency. It is highly desirable to design counters whose performance is independent of counting width, while in other cases synchronous high-speed parallel counters are in demand [5]. A major challenge in designing fast counters is that wide, fast counters require much more chip area, since area grows along with speed [6]. The desirable key features of counters implemented at VLSI scale include relatively constant counting time with increased counting speed, digital output and stable VLSI implementation [7]. Several counters with large counting width have been designed, implemented and reported in the literature [8], [9]. Most of these designs enable higher-order blocks by ANDing the overflow states of lower-order blocks, which ultimately increases design complexity. These designs have also performed poorly on counting-frequency requirements [1], [2]. These limitations have been the guiding force for the improved design proposed in this paper.
FAST MODULAR COUNTER DESIGN
In this section we present the main architecture of a modular, scalable counter that operates at a high clock frequency. First, a representative block diagram is presented; the detailed architecture is then described together with an analysis of its performance.
Block Diagram
The block diagram of the scalable high-speed parallel counter for a 16-bit width is shown in figure 1. As shown in figure 1, the counting path consists, in sequence, of a synchronous 3-bit parallel counter, a pipelining structure built from simple D flip-flops, and synchronous 2-bit parallel counters. The state look-ahead path supplies the early overflow signals that the counting path requires for proper operation of the counter. A common clock and a common reset signal are connected to both paths, which results in a synchronous counter [8].
Design of Architecture
The architecture of the scalable high-speed parallel counter for a 16-bit counter width is shown in figure 2. Figure 2 shows the counting path, which consists of three different circuit blocks labelled BLK1, BLK2 and BLK3. BLK2 acts as a pipelining structure between BLK1 and the first BLK3, and then between successive BLK3s. In the state look-ahead path, the states of BLK1 are selected and then pipelined. These pipelined states are used to enable the BLK3s, and the level of pipelining depends on the position of each BLK3 in the counting path. In the state look-ahead path, the leftmost columns, consisting of inverters, 3-input AND gates and BLK2s, perform the selection and pipelining of the early overflow states of BLK1. The successive columns in the state look-ahead path perform the ANDing of the selected pipelined states with the overflow states of the BLK3s at the corresponding positions in the counting path. To enable the i-th BLK3 in the counting path, the enabling signal must be generated from the state look-ahead path up to the (i-1)-th BLK3.
A detailed circuit diagram of circuit block BLK1 is shown separately in figure 3. It is a synchronous 3-bit parallel counter. The outputs Q0 and Q1 of BLK1 are taken directly as the LSBs, and the circuit also generates an additional output, EN1, which is subsequently used as an enabling signal. The EN1 signal is the input to circuit block BLK2. All three outputs of BLK1 are used for selecting early overflow states in the state look-ahead path. The overall width of the counter can be increased to any number of bits, depending on the number of output bits of BLK1.
The circuit diagram of circuit block BLK2 is shown in figure 4. It is a simple D flip-flop that acts as a delay element in the circuit. BLK2 is used in both the counting path and the state look-ahead path. In the counting path, the output of BLK2 is applied to BLK3 as the block enabling signal ENS. Thus, in the counting path BLK2 pipelines the enable signals of the BLK3s, and in the state look-ahead path it pipelines the early overflow states.
The circuit diagram of BLK3 is shown in figure 5. It is a synchronous 2-bit parallel counter. BLK3 produces the outputs Q0 and Q1, which form the MSBs of the counter. An additional output, EN3, is generated by ANDing its outputs Q0 and Q1 with the YC signal, which is generated by the state look-ahead path. The signal EN3 is pipelined again by BLK2 and applied as the input ENS to the next block. The overflow states of each block are predetermined and the output states of BLK1 are decoded; that is, a particular output state is selected to enable a particular circuit block after the appropriate number of delay elements. The state look-ahead path selects different states of BLK1 and then delays (pipelines) those states according to the order of the BLK3s in the counting path. For this purpose, enable supporting signals must be generated.
To enable the highest-order blocks in the counter design, the selected states of BLK1 must be repeatedly delayed and ANDed.
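The enabling scheme described above can be illustrated with a small cycle-level behavioural model. This is only a sketch under stated assumptions, not the circuit of figure 2: the function name `simulate` and its parameters are hypothetical, all bits of the low-order counter are used as LSBs (whereas the paper takes only two LSBs from BLK1), and each 2-bit module gets its own chain of D flip-flops. The model selects an early state of the low-order counter, delays it through registers, and ANDs it stage by stage with the overflow state (binary 11) of the lower 2-bit modules, so no AND gate needs more than three inputs.

```python
def simulate(low_bits=3, n_upper=3, cycles=2000):
    """Behavioural sketch of a counter with pipelined early-overflow enables.

    low_bits : width of the free-running low-order counter (assumed BLK1 role)
    n_upper  : number of 2-bit enabled counters (assumed BLK3 role)
    Returns the list of counter values observed at each clock cycle.
    """
    max0 = (1 << low_bits) - 1
    # The look-ahead depth is limited by the number of low-order states.
    assert 1 <= n_upper <= max0

    c0 = 0                                   # low-order counter state
    upper = [0] * n_upper                    # 2-bit module states
    # pipe[i] is the D flip-flop chain (assumed BLK2 role) for module i.
    pipe = [[0] * (i + 1) for i in range(n_upper)]
    values = []

    for _ in range(cycles):
        value = c0
        for i, u in enumerate(upper):
            value |= u << (low_bits + 2 * i)
        values.append(value)

        # Enables seen at this clock edge are the last register of each chain.
        enables = [pipe[i][i] for i in range(n_upper)]

        # Next register contents, computed from the current state only.
        new_pipe = []
        for i in range(n_upper):
            # Stage 0 selects an early state of the low-order counter,
            # i + 1 cycles before its terminal count.
            chain = [int(c0 == max0 - (i + 1))]
            # Later stages AND the delayed signal with the overflow
            # state (binary 11) of the next lower 2-bit module.
            for j in range(1, i + 1):
                chain.append(pipe[i][j - 1] & int(upper[j - 1] == 3))
            new_pipe.append(chain)

        # Synchronous update of every block on the same clock edge.
        upper = [(u + e) & 3 for u, e in zip(upper, enables)]
        c0 = (c0 + 1) & max0
        pipe = new_pipe

    return values
```

Comparing the returned sequence with a plain binary count over the same number of cycles is a quick way to check that the delayed enables arrive at the right clock edge for every module.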
CIRCUIT OPERATION
The entire circuit operation of counting can be explained by using the concept of counter state equations. The state equations are explained in the following section. 
State Equations
Here the 3-bit outputs of BLK1 are represented by y 1-3, y 1-2 and y 1-1. Although three bits are available, only the two LSBs are used for the 16-bit counting operation. The state equations for the 6-bit counter are given as -
For an 8-bit counter, the state equations may be given by -
For a 10-bit counter, the state equations are given as follows -
For a 12-bit or even higher bit counter, the state equations use count state y 4 
Similarly, for a 14-bit counter, the state equations are given by -
Finally, for a 16-bit counter, the state equations are given by -
Design of Clock
For proper operation, the path delays of the counting path and the state look-ahead path must be less than the clock period of the counter [1]. We assume that the access times of BLK1 and BLK3 are essentially equal. Let $T_{clock}$ be the clock period of the counter, $T_{Block}$ the access time of block circuit BLK1 or BLK3, $T_{3\text{-}AND}$ the gate delay of a 3-input AND gate, and $T_{s\text{-}h}$ the combined set-up and hold time of the D flip-flop. The following condition on $T_{clock}$ must then be satisfied -
$$T_{clock} > T_{Block} + T_{3\text{-}AND} + T_{s\text{-}h}$$
The clock period depends mainly on the block access time.
Hence, any change in the design, such as increasing the number of output bits of the blocks or using different D flip-flops, will change the clock period substantially. This condition is very stringent and must be kept in mind while designing the circuits or blocks for fast counters [10], [11]. In this work, the clock is designed with these delays in mind.
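The clock condition above can be checked numerically with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and the delay values in the usage note are assumed placeholders, not figures measured in this work.

```python
def min_clock_period(t_block, t_and3, t_sh):
    """Lower bound on the clock period: block access time plus the
    3-input AND gate delay plus the flip-flop set-up and hold time.
    All arguments must use the same time unit."""
    return t_block + t_and3 + t_sh


def meets_timing(t_clock, t_block, t_and3, t_sh):
    """True when the clock period strictly exceeds the summed path delay."""
    return t_clock > min_clock_period(t_block, t_and3, t_sh)


def max_clock_frequency_ghz(t_block_ns, t_and3_ns, t_sh_ns):
    """Upper bound on the clock frequency in GHz for delays given in ns."""
    return 1.0 / min_clock_period(t_block_ns, t_and3_ns, t_sh_ns)
```

For example, with assumed delays of 0.5 ns, 0.1 ns and 0.2 ns, the bound on the period is about 0.8 ns, so a 1 ns clock (1 GHz) would satisfy the condition, while any increase in the block access time eats directly into that margin.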
International Journal of Applied Information Systems (IJAIS) -ISSN : 2249-0868 Foundation of Computer Science FCS, New York, USA Volume 1-No. 9, April 2012 -www.ijais.org

SIMULATION RESULT ANALYSIS
The performance of the scalable high-speed parallel counter is verified by simulating the circuit over a wide range of input conditions. The simulation tests are carried out mainly with the Microwind and Digital Schematic (DSCH) software tools [9]. Microwind is Electronic Design Automation (EDA) software that conveniently integrates the front end and back end of VLSI chip design. The circuit layout can be generated in Microwind either from Verilog code files of the circuit design or by entering the schematic layout directly. The latter requires checking for design-rule violations. Microwind can also measure the area used by the circuit design. In this work, DSCH is mainly used for designing the circuit schematics and for generating timing waveform diagrams.
Power Estimation
We simulated a 16-bit scalable parallel counter circuit which was designed using Microwind 3. 
Transistor Count and Area Estimation
The transistor counts obtained from simulation of the 16-bit counter are shown in Table 2. The table lists the number of transistors used by each block and by the logic gates and inverters, including the circuits of the state look-ahead path. A total of 1018 transistors are used to implement the counter. Table 3 shows the total number of transistors for counters of sizes from 8 bits to 16 bits, together with the corresponding chip area requirements. Area utilization was analyzed for 0.12 µm technology. From the increase in transistor count with counter size, we conclude that each 2-bit increment in counter size increases the transistor count by a factor of approximately 1.28 and the area by a factor of approximately 1.26. The analysis also shows that, since the counter design contains a large number of D flip-flops, the chip area can be reduced further by using advanced D flip-flops with a much lower transistor count, thereby improving overall design performance. The remaining results are summarized graphically in figure 6 for quick visual interpretation.
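The reported growth factor can be turned into a rough projection for other counter widths. The sketch below is an extrapolation under a purely geometric growth assumption, anchored at the reported 16-bit total of 1018 transistors; the function name is hypothetical, and the projected values are not simulation results and do not replace the measured entries of Table 3.

```python
def projected_transistors(width_bits, base_width=16, base_count=1018,
                          growth=1.28):
    """Project the transistor count for a given counter width, assuming
    the count grows by `growth` for every 2-bit increment in width.

    base_width and base_count anchor the projection at the reported
    16-bit design; everything else is an assumed geometric trend.
    """
    steps = (width_bits - base_width) / 2   # number of 2-bit increments
    return base_count * growth ** steps
```

Under this assumption an 18-bit counter would need roughly 1300 transistors. The same function with `growth=1.26` and an area figure as the base value gives the corresponding area projection.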
Timing Waveform Diagram Analysis
The timing waveform diagram from the simulation tests of the scalable high-speed parallel counter is shown in figure 7.
The timing waveforms of the counter show fluctuations at various signal levels. Figure 7 also shows that the least significant bits are subject to more fluctuations than the most significant bits.
The LSB Y1 has a frequency of CLK/2, Y2 has a frequency of CLK/4, and so on. The counter therefore provides signals at several frequencies below the clock frequency. As can be seen in figure 7, the initial counting state is set to an arbitrary example value of [0 0 0 0 0 1 1 1 0 1 1 1 1 0 1 0]. The subsequent counting can be easily verified from the waveforms. We simulated the 16-bit circuit at a constant clock frequency of 1 GHz for various technologies.
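The frequency division along the output bits follows directly from binary counting: bit k toggles at the clock frequency divided by 2 to the power k. A minimal sketch, with a hypothetical function name and the 1 GHz simulation clock as the default:

```python
def bit_frequency_hz(k, f_clk_hz=1e9):
    """Frequency of output bit Y_k of a binary counter, where k = 1 is
    the LSB: each bit runs at half the frequency of the bit below it."""
    if k < 1:
        raise ValueError("bit index k starts at 1 for the LSB")
    return f_clk_hz / (2 ** k)
```

With a 1 GHz clock this gives 500 MHz for Y1 and 250 MHz for Y2, and the most significant bit of a 16-bit counter runs at about 15.26 kHz.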
CONCLUSION
In this paper we have reported the analysis and results of the simulated design and verification of a scalable high-speed parallel counter. We have tested and verified the design for count sizes from 8 bits to 16 bits. The special features of this counter design that emerged from the testing and verification process are its modularity and its pipelining structure. Because the design is modular, a counter of any larger bit size can be implemented without much additional complexity or design effort. To achieve this, we only need to add BLK3s and BLK2s in appropriate numbers at appropriate positions. The positioning of the BLK2s (D flip-flops) has a strong bearing on the counter design and on its performance. The introduction of BLK2 removes the need for AND gates with large fan-in to enable the higher-order count bits, which is a clear advantage. Once the maximum count of a given design is reached, the counter width can be increased further simply by increasing the width of BLK1. Since all circuit blocks of the counting path except BLK1 are preceded by a BLK2 (D flip-flop), all of them are enabled with a constant delay. Hence there is no mismatch of delays. Since the counter has a binary output, no detector circuits are needed at the output. The results give some important findings in terms of transistor count and chip area requirements. The transistor counts are comparable to results available in the literature. Some rough patterns have emerged for a possible relationship between transistor count, power consumption and technology size. These findings may lead to concrete physical laws and principles if verified with higher precision and accuracy in the software tools used for implementation.
Future work will involve further refining the results and designing improved, more complex circuits and systems.
ACKNOWLEDGEMENT
