

The objective of this assignment is to apply your skills at dynamic logic design to build a fast full adder.

## **Readings:**

- Read DEC's Alpha 21164 Clocking Methodology
- Read Skew-Tolerant Domino Circuits
- Skim sections 5.5.8-5.5.15 of W&E (this is one of the weaker parts of W&E)

## **1.0 Clocking Overhead**

You are considering one of three clocking schemes for a static implementation of the FemptoHAL processor: flip-flops, transparent latches, or pulsed latches. You expect that the adder self-bypass loop (see the project description document) will be the most critical path.

Suppose that the maximum delay of the adder is 8 FO4 delays, the result mux is 2 FO4 delays, and the bypass mux is 2 FO4 delays. Suppose the system has clock skew of 2 FO4 delays.<sup>1</sup>

**a**) Suppose that a flip-flop has a setup time of 1.5 FO4 delays, a hold time of 0 FO4 delays, a maximum propagation delay of 1.5 FO4 delays, and a minimum propagation delay of 0.5 FO4 delays. What is the cycle time of the adder self-bypass path built using a flip-flop at the start of the cycle?

**b**) Suppose that a transparent latch has a setup time of 1 FO4 delay, a hold time of 0 FO4 delays, a maximum propagation delay of 1.5 FO4 delays, and a minimum propagation delay of 0.5 FO4 delays. What is the cycle time of the adder self-bypass path built using transparent latches in each half-cycle? You may assume that the logic can be broken into pieces that conveniently fit between each latch.

**c**) Suppose that a pulsed latch has the same delays as a transparent latch. Also suppose you use the shortest pulse necessary to hide all overhead. What is the cycle time of the adder self-bypass path built using a pulsed latch at the beginning of the cycle? How wide must your pulse be?

**d**) Using the result of Jobwork 3, compute the maximum operating frequency (in GHz) of each of the three designs (a-c). Which do you prefer? Why?

<sup>1.</sup> Self-bypass paths see less skew than many other paths because the path starts and ends at the same sequential element. Nevertheless, we'll conservatively budget the same skew to all elements on the chip.
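As a starting point for checking your own arithmetic, the max-delay constraints for the three schemes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a worked solution: it uses the usual textbook forms of the constraints, it treats the single "maximum propagation delay" quoted for each element as both its clock-to-Q delay (`t_pcq`) and its D-to-Q delay (`t_pdq`), and the function names and the numbers in the usage lines are hypothetical values chosen only for illustration, not the values from this problem.

```python
# All delays are in FO4 units. Names and example values are illustrative
# assumptions, not taken from the assignment.

def flop_cycle_time(t_logic, t_pcq, t_setup, t_skew):
    """Flip-flop: the cycle pays clock-to-Q, setup, and skew on top of logic."""
    return t_pcq + t_logic + t_setup + t_skew

def two_phase_latch_cycle_time(t_logic, t_pdq):
    """Two transparent latches per cycle: data passes through both latches.
    Assumes setup and skew are small enough to be hidden by time borrowing."""
    return t_logic + 2 * t_pdq

def pulsed_latch_cycle_time(t_logic, t_pdq, t_pcq, t_setup, t_skew, t_pw):
    """Pulsed latch: overhead is D-to-Q if the pulse is wide enough,
    otherwise the flip-flop-like term reduced by the pulse width."""
    return t_logic + max(t_pdq, t_pcq + t_setup - t_pw + t_skew)

def min_pulse_width_to_hide_overhead(t_pdq, t_pcq, t_setup, t_skew):
    """Narrowest pulse for which only the D-to-Q delay remains as overhead."""
    return t_pcq + t_setup + t_skew - t_pdq

# Hypothetical example: 20 FO4 of logic, 2 FO4 latch/flop delay,
# 1 FO4 setup, 3 FO4 skew.
t_pw = min_pulse_width_to_hide_overhead(t_pdq=2, t_pcq=2, t_setup=1, t_skew=3)
print("flip-flop   :", flop_cycle_time(20, 2, 1, 3))
print("latches     :", two_phase_latch_cycle_time(20, 2))
print("pulse width :", t_pw)
print("pulsed latch:", pulsed_latch_cycle_time(20, 2, 2, 1, 3, t_pw))
```

Substituting the delays from parts a through c, and summing the adder and mux delays to get the logic delay, lets you compare your hand calculation against the sketch.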

## 2.0 Min-Delay

If a cycle contains too little logic, a result may race through and violate the hold time of the clocked element at the end of the cycle.

Using the same delays from the previous question, find the minimum necessary contamination delay through logic in any cycle for:

a) flip-flop system

b) transparent latch system

c) pulsed latch system using the pulse width you chose in problem 1
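The min-delay constraints can be sketched the same way. This is a rough sketch assuming the standard textbook hold-time inequalities; it treats the "minimum propagation delay" of each element as its clock-to-Q contamination delay (`t_ccq`), and for transparent latches it assumes the bound applies to the logic in each half-cycle with a hypothetical `t_nonoverlap` parameter for any gap between the two phases (zero for complementary clocks). The function names and example numbers are illustrative assumptions, not the values from this problem.

```python
# All delays are in FO4 units. A result of 0 means no extra logic
# contamination delay is required.

def flop_min_cd(t_hold, t_skew, t_ccq):
    """Flip-flop: logic must cover hold time plus skew, less clock-to-Q."""
    return max(0, t_hold + t_skew - t_ccq)

def two_phase_latch_min_cd(t_hold, t_skew, t_ccq, t_nonoverlap=0):
    """Transparent latches: per half-cycle bound; nonoverlap between the
    phases relaxes the requirement."""
    return max(0, t_hold + t_skew - t_ccq - t_nonoverlap)

def pulsed_latch_min_cd(t_hold, t_skew, t_ccq, t_pw):
    """Pulsed latch: the latch stays open for the pulse width, so the
    pulse width is added to the hold requirement."""
    return max(0, t_hold + t_pw + t_skew - t_ccq)

# Hypothetical example: 1 FO4 hold, 3 FO4 skew, 1 FO4 contamination
# delay through the clocked element, 4 FO4 pulse.
print("flip-flop   :", flop_min_cd(1, 3, 1))
print("latches     :", two_phase_latch_min_cd(1, 3, 1))
print("pulsed latch:", pulsed_latch_min_cd(1, 3, 1, t_pw=4))
```

Note how the pulse width that removes max-delay overhead shows up directly as additional min-delay burden in the pulsed latch case.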

## 3.0 Skew-tolerant Domino

Now you're ready to do a domino implementation of the FemptoHAL processor. You expect to achieve a cycle time of 10 FO4 delays! To minimize clocking overhead, you choose to use 4-phase skew-tolerant domino circuits with 50% duty cycle clocks. The hold time of your domino gates is 0 FO4 delays. HINT: some of these questions can be answered with equations in the skew-tolerant domino paper, while some don't match the assumptions of equations in the paper. Once you understand the concepts in the paper, you should be able to derive the correct equations.

**a**) If you budget a global skew of 2 FO4 delays between any points on the chip, how much time is available for borrowing into the next phase?

**b**) If you require that no two connected gates in a phase have a local skew of more than 1 FO4 delay between them, how much time is available for borrowing into the next phase?

**c**) Perhaps controlling the global skew to less than 2 FO4 delays in all parts of the chip will be too expensive. How much global skew can the chip tolerate before it may experience min-delay problems if the contamination delay through any phase is at least 0.5 FO4 delays?
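For the overlapping-clock questions, it can help to compute the phase overlap explicitly. The sketch below assumes N evenly spaced clock phases with a common duty cycle, so consecutive phases overlap by the evaluation time minus the phase spacing, and it assumes the overlap is divided among hold time, skew, and time borrowing as in the skew-tolerant domino paper. The function names and example numbers are hypothetical, and, as the hint in this section warns, not every part of the problem matches these assumptions (for example, local versus global skew, or the min-delay case), so treat this as a starting point rather than a formula to apply blindly.

```python
# All times are in FO4 units. Names and example values are illustrative
# assumptions, not taken from the assignment.

def phase_overlap(t_cycle, n_phases, duty_cycle=0.5):
    """Overlap between consecutive phases: each phase evaluates for
    duty_cycle * t_cycle and the next phase starts t_cycle / n_phases later."""
    return duty_cycle * t_cycle - t_cycle / n_phases

def borrow_budget(t_cycle, n_phases, t_hold, t_skew, duty_cycle=0.5):
    """Overlap left for time borrowing after budgeting hold time and skew
    (assumed split: overlap = hold + skew + borrow)."""
    return phase_overlap(t_cycle, n_phases, duty_cycle) - t_hold - t_skew

# Hypothetical example: 16 FO4 cycle, 4 phases, 50% duty cycle,
# 0.5 FO4 hold, 1.5 FO4 skew.
print("overlap:", phase_overlap(16, 4))
print("borrow :", borrow_budget(16, 4, t_hold=0.5, t_skew=1.5))
```

Which skew value belongs in `t_skew` depends on whether the two gates in question see the global or the local skew budget; deciding that is the point of parts a and b.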