# **Logical Effort**\*:

#### **Designing for Speed on the Back of an Envelope**

#### **David Harris**

David Harris@hmc.edu

#### Harvey Mudd College Claremont, CA

# Introduction Chip designers face a bewildering array of choices. ☐ What is the best circuit topology for a function? ☐ How large should the transistors be? How many stages of logic give least delay? Logical Effort is a method of answering these questions: Uses a very simple model of delay ☐ Back of the envelope calculations and tractable optimization ☐ Gives new names to old ideas to emphasize remarkable symmetries Who cares about logical effort? ☐ Circuit designers waste too much time simulating and tweaking circuits High speed logic designers need to know where time is going in their logic CAD engineers need to understand circuits to build better tools **Logical Effort David Harris** Page 3 of 38

| Outline                                                                                                      |              |              |  |
|--------------------------------------------------------------------------------------------------------------|--------------|--------------|--|
| ☐ Introduction ☐ Delay in a Logic Gate ☐ Multi-stage Logic Netwo ☐ Choosing the Best Num ☐ Example ☐ Summary |              |              |  |
| Logical Effort                                                                                               | David Harris | Page 2 of 38 |  |



<sup>\*</sup> Based on a book by Ivan Sutherland, Bob Sproull, and David Harris published by Morgan Kauffman, 1999

#### **Outline**

☐ Introduction

Delay in a Logic Gate

☐ Multi-stage Logic Networks

☐ Choosing the Best Number of Stages

Example

■ Summary

Logical Effort David Harris Page 5 of 38



# **Delay in a Logic Gate**

Let us express delays in a process-independent unit:

$$d = \frac{d_{abs}}{\tau}$$

 $\tau \approx 20 \, \text{ps}$ in 0.25  $\mu \text{m}$ technology

Delay of logic gate has two components:

effort delay, a.k.a. stage effort f = f + p parasitic delay

Effort delay again has two components:



electrical effort is sometimes called "fanout"

Page 6 of 38

Page 8 of 38

- Logical effort describes relative ability of gate topology to deliver current (defined to be 1 for an inverter)
- Electrical effort is the ratio of output to input capacitance

Logical Effort David Harris



**Computing Logical Effort** 



Logical Effort David Harris

#### **A Catalog of Gates**

Table 1: Logical effort of static CMOS gates

| 0.1.1       | Number of inputs |     |     |     |      |          |
|-------------|------------------|-----|-----|-----|------|----------|
| Gate type   | 1                | 2   | 3   | 4   | 5    | n        |
| inverter    | 1                |     |     |     |      |          |
| NAND        |                  | 4/3 | 5/3 | 6/3 | 7/3  | (n+2)/3  |
| NOR         |                  | 5/3 | 7/3 | 9/3 | 11/3 | (2n+1)/3 |
| multiplexer |                  | 2   | 2   | 2   | 2    | 2        |
| XOR, XNOR   |                  | 4   | 12  | 32  |      |          |

#### Table 2: Parasitic delay of static CMOS gates

| Gate type         | Parasitic delay    |  |  |
|-------------------|--------------------|--|--|
| inverter          | p <sub>inv</sub>   |  |  |
| n-input NAND      | np <sub>inv</sub>  |  |  |
| n-input NOR       | np <sub>inv</sub>  |  |  |
| n-way multiplexer | 2np <sub>inv</sub> |  |  |
| 2-input XOR, XNOR | 4np <sub>inv</sub> |  |  |

p<sub>inv</sub> ≈ 1parasitic delays depend on diffusion capacitance

Logical Effort David Harris Page 9 of 38

# Example

Estimate the delay of a fanout-of-4 (FO4) inverter:



Logical Effort: g = Electrical Effort: h = Parasitic Delay: p = Stage Delay: d =

Logical Effort David Harris Page 11 of 38

#### **Example**

Estimate the frequency of an *N*-stage ring oscillator:

g =



Electrical Effort: h = Parasitic Delay: p = Stage Delay: d = Oscillator Frequency: F = Oscillator Frequency

Logical Effort:

Logical Effort David Harris Page 10 of 38

#### Outline

☐ Introduction

Delay in a Logic Gate

Multi-stage Logic Networks

☐ Choosing the Best Number of Stages

Example

☐ Summary

Logical Effort David Harris

Page 12 of 38

# **Multi-stage Logic Networks**

Logical effort extends to multi-stage networks:



$$g_1 = 1$$
  $g_2 = 5/3$   $g_3 = 4/3$   $g_4 = 1$   
 $h_1 = x/10$   $h_2 = y/x$   $h_3 = z/y$   $h_4 = 20/2$ 

- ☐ Path Logical Effort:
- Don't define
- $H = \frac{C_{out \, (path)}}{C_{in \, (path)}}$   $H = \prod_i h_i$  because we don't know  $h_i$  until the design is done

Can we write F = GH?

☐ Path Effort:

■ Path Electrical Effort:

**Logical Effort David Harris** Page 13 of 38

## **Delay in Multi-stage Networks**

We can now compute the delay of a multi-stage network:

- $D_F = \sum_i f_i$ ☐ Path Effort Delay:
- $P = \sum p_i$ ☐ Path Parasitic Delay:
- $D = \sum d_i = D_F + P$ ☐ Path Delay:

We can prove that delay is minimized when each stage bears the same effort:

$$\hat{f} = g_i h_i = F^{1/N}$$

Therefore, the minimum delay of an N-stage path is:

This is a key result of logical effort. Lowest possible path delay can be found without even calculating the sizes of each gate in the path.

Page 15 of 38 **Logical Effort David Harris** 

## **Branching Effort**

No! Consider circuits that branch:



**Logical Effort** David Harris Page 14 of 38

#### **Determining Gate Sizes**

Gate sizes can be found by starting at the end of the path and working backward.

At each gate, apply the capacitance transformation:

$$C_{in_i} = \frac{C_{out_i} \cdot g_i}{\hat{f}}$$

☐ Check your work by verifying that the input capacitance specification is satisfied at the beginning of the path.

**David Harris Logical Effort** Page 16 of 38

# **Example**

Select gate sizes y and z to minimize delay from A to B

Logical Effort:

Electrical Effort:

H =

Branching Effort:

Path Effort:

Best Stage Effort:

Delay:

D =

Work backward for sizes:

z =

*y* =

**Logical Effort** 

**David Harris** 

Page 17 of 38

#### **Outline**

- ☐ Introduction
- Delay in a Logic Gate
- Multi-stage Logic Networks
- **Choosing the Best Number of Stages**
- Example
- Summary

**Logical Effort** David Harris Page 18 of 38

## **Choosing the Best Number of Stages**

How many stages should a path use?

- ☐ Delay is not always minimized by using as few stages as possible
- ☐ Example: How to drive 64 bit datapath with unit-sized inverter

Initial driver Datapath load 2.8

 $D = NF^{1/N} + P = N(64)^{1/N} + N$  assuming polarity doesn't matter

**Logical Effort David Harris** Page 19 of 38

#### **Derivation of the Best Number of Stages**

Suppose we can add inverters to the end of a path without changing its function.

 $\Box$  How many stages should we use? Let  $\hat{N}$  be the value of N for least delay.

Logic Block: 
$$n_1$$
 stages Path effort  $F$ 

N- $n_1$  extra inverters

$$D = NF^{1/N} + \sum_{1}^{n_1} p_i + (N - n_1) p_{inv}$$

$$\frac{\partial D}{\partial N} = -F^{1/N} ln \left( F^{1/N} \right) + F^{1/N} + \rho_{inv} = 0$$

Define  $\rho = F^{1/\hat{N}}$  to be the best stage effort. Substitute and simplify:

$$p_{inv} + \rho (1 - ln\rho) = 0$$

**Logical Effort David Harris** Page 20 of 38

# **Best Number of Stages** (continued) $p_{inv} + \rho (1 - ln\rho) = 0$ has no closed form solution. Neglecting parasitics (i.e. $p_{inv} = 0$ ), we get the familiar result that $\rho = 2.718$ (e) $\Box$ For $p_{inv} = 1$ , we can solve numerically to obtain $\rho = 3.59$ How sensitive is the delay to using exactly the best number of stages? $\frac{\hat{D}(N)}{\hat{D}(\hat{N})}$ I like to use $\rho = 4$ 1.51 1.26 Better use too many stages than too few. 0.25 $\square$ 2.4 < $\rho$ < 6 gives delays within 15% of optimal -> we can be sloppy Logical Effort **David Harris** Page 21 of 38

# Outline Introduction Delay in a Logic Gate Multi-stage Logic Networks Choosing the Best Number of Stages Example Summary Logical Effort David Harris Page 22 of 38

#### **Example** Let's revisit Ben Bitdiddle's decoder problem using logical effort: 32 bits $a < 3:0 > \overline{a} < 3:0 >$ 4:16 Decoder 16 words Decoder specification: Register File 16 ☐ 16 word register file ☐ Each word is 32 bits wide ☐ Each bit presents a load of 3 unit-sized transistors $\Box$ True and complementary inputs of address bits *a*<3:0> are available ☐ Each input may drive 10 unit-sized transistors Ben needs to decide: ☐ How many stages to use? ☐ How large should each gate be? ☐ How fast can the decoder operate? **David Harris** Page 23 of 38 **Logical Effort**

| Example: Number of Stages |                                  |                   |
|---------------------------|----------------------------------|-------------------|
| low many stages should    | d Ben use?                       |                   |
| Effort of decoders is     | dominated by electrical and b    | ranching portions |
| Electrical Effort:        | H =                              |                   |
| Branching Effort:         | <b>B</b> =                       |                   |
| we neglect logical effor  | rt (assume G = 1),               |                   |
| Path Effort:              | <b>F</b> =                       |                   |
| emember that the best     | stage effort is about $\rho = 4$ |                   |
|                           | where of stages is: $N = 1$      |                   |
| Herice, the best hun      | Tibel of Stages is. IV =         |                   |
|                           |                                  |                   |
|                           |                                  |                   |
|                           |                                  |                   |
|                           |                                  |                   |
|                           |                                  |                   |

# **Example: Gate Sizes & Delay**

Lets try a 3-stage design using 16 4-input NAND gates with G =



- $\Box$  Actual path effort is: F =
- $\Box$  Therefore, stage effort should be: f =
- $\Box$  Gate sizes: z = y =
- ☐ Path delay:

Logical Effort David Harris Page 25 of 38

#### **Outline**

D =

- ☐ Introduction
- ☐ Delay in a Logic Gate
- ☐ Multi-stage Logic Networks
- ☐ Choosing the Best Number of Stages
- ☐ Example
- Summary

Logical Effort David Harris Page 27 of 38

#### **Example: Alternative Decoders**

**Table 3: Comparison of Decoder Designs** 

| Design                                     | Stages | G    | Р  | D    |
|--------------------------------------------|--------|------|----|------|
| NAND4; INV                                 | 2      | 2    | 5  | 29.8 |
| INV; NAND4; INV                            | 3      | 2    | 6  | 22.1 |
| INV; NAND4; INV; INV                       | 4      | 2    | 7  | 21.1 |
| NAND2; INV; NAND2; INV                     | 4      | 16/9 | 6  | 19.7 |
| INV; NAND2; INV; NAND2; INV                | 5      | 16/9 | 7  | 20.4 |
| NAND2; INV; NAND2; INV; INV; INV           | 6      | 16/9 | 8  | 21.6 |
| INV; NAND2; INV; NAND2; INV; INV; INV      | 7      | 16/9 | 9  | 23.1 |
| NAND2; INV; NAND2; INV; INV; INV; INV; INV | 8      | 16/9 | 10 | 24.8 |

We underestimated the best number of stages by neglecting the logical effort.

- ☐ Logical effort facilitates comparing different designs before selecting sizes
- ☐ Using more stages also reduces G and P by using multiple 2-input gates
- Our design was about 10% slower than the best

Logical Effort David Harris Page 26 of 38

#### **Summary**

Table 4: Key Definitions of Logical Effort

| Term              | Stage expression             | Path expression                            |
|-------------------|------------------------------|--------------------------------------------|
| Logical effort    | g (seeTable 1)               | $G = \prod g_i$                            |
| Electrical effort | $h = \frac{C_{out}}{C_{in}}$ | $H = \frac{C_{out (path)}}{C_{in (path)}}$ |
| Branching effort  | n/a                          | $B = \prod b_i$                            |
| Effort            | f = gh                       | F = GBH                                    |
| Effort delay      | f                            | $D_F = \sum f_i$                           |
| Number of stages  | 1                            | N                                          |
| Parasitic delay   | p (seeTable 2)               | $P = \sum p_i$                             |
| Delay             | d = f + p                    | $D = D_F + P$                              |

Logical Effort David Harris Page 28 of 38

#### **Method of Logical Effort**

Logical effort helps you find the best number of stages, the best size of each gate, and the minimum delay of a circuit with the following procedure:

- $\Box$  Compute the path effort: F = GBH
- ☐ Estimate the best number of stages:  $\hat{N} \approx log_{A}F$ 
  - Estimate the minimum delay:  $D = \hat{N} F^{1/\hat{N}} + P$
- ☐ Sketch your path using the number of stages computed above
- ☐ Starting at the end, work backward to find transistor sizes:

$$C_{in_i} = \frac{C_{out_i} \cdot g_i}{\hat{f}}$$

Logical Effort David Harris Page 29 of 38

#### Conclusion

Logical effort is a useful concept for thinking about delay in circuits:

- ☐ Facilitates comparison of different circuit topologies
- ☐ Easily select gate sizes for minimum delay
- ☐ Circuits are fastest when effort delays of each stage are equal and about 4
- ☐ Path delay is insensitive to modest deviations from optimal sizes

Some further results from logical effort include:

- Logical effort can be applied to domino, pass gate, and other logic families
- Logic gates can be skewed to favor one input or edge at the cost of another
- ☐ While the logical effort of a multiplexer is independent of the number of inputs, parasitic delay increases with size, so 4-way multiplexers are best
- ☐ Circuits that fork should equalize delays between legs of the fork

A book on Logical Effort will be available in Feb. 1999 from Morgan Kaufmann

http://www.mkp.com/Logical Effort

Logical Effort David Harris Page 31 of 38

#### **Limitations of Logical Effort**

Logical effort is not a panacea. Some limitations include:

#### ☐ Chicken & egg problem

how to estimate G and best number of stages before the path is designed

#### ☐ Simplistic delay model

neglects effects of input slopes

#### ☐ Interconnect

iteration required in designs with branching and non-negligible wire C or RC same convergence difficulties as in synthesis / placement problem

#### ☐ Maximum speed only

optimizes circuits for speed, not area or power under a fixed speed constraint

Logical Effort David Harris Page 30 of 38

#### **Delay Plots**



- $\Box$  d = f + p = gh + p
- Delay increases with electrical effort
- ☐ More complex gates have greater logical effort and parasitic delay

Logical Effort David Harris Page 32 of 38

#### **Example**

Estimate the frequency of an *N*-stage ring oscillator:



Logical Effort:  $g \equiv 1$ 

 $h = \frac{C_{out}}{C_{in}} = 1$ Electrical Effort:

 $p = p_{inv} \approx 1$ Parasitic Delay:

d = gh + p = 2Stage Delay:

Oscillator Frequency:  $F = \frac{1}{2Nd_{abs}} = \frac{1}{4N\tau}$ 

A 31 stage ring oscillator in a 0.25 µm process oscillates at about 400 MHz.

Logical Effort **David Harris** Page 33 of 38

# **Branching Effort**

No! Consider circuits that branch:



Introduce new kind of effort to account for branching within a network:

 $b = \frac{C_{on path} + C_{off path}}{C_{on path}}$ ■ Branching Effort:

 $B = \prod b_i$ ☐ Path Branching Effort:

Note:

Now we can compute the path effort:

 $\prod h_i = BH \neq H$ in circuits that branch

■ Path Effort: F = GBH

**Logical Effort David Harris** Page 35 of 38

#### **Example**

Estimate the delay of a fanout-of-4 (FO4) inverter:



Logical Effort:

 $h = \frac{C_{out}}{C_{in}} = 4$ Electrical Effort:

 $p = p_{inv} \approx 1$ Parasitic Delay:

d = gh + p = 5Stage Delay:

The FO4 inverter delay is a useful metric to characterize process performance.

1 FO4 delay =  $5\tau$ 

This is about 100 ps in a 0.25 µm process.

**Logical Effort** David Harris Page 34 of 38

#### **Example**

Select gate sizes y and z to minimize delay from A to B

 $G = (4/3)^3$ Logical Effort:

 $H = \frac{C_{out}}{C_{in}} = 4.5$ Electrical Effort:

 $B=2\bullet 3=6$ Branching Effort: F = GHB = 64

 $\hat{f} = F^{1/3} = 4$ Best Stage Effort:

Path Effort:

Work backward for sizes:

 $z = \frac{4.5C \cdot (4/3)}{4} = 1.5C$ 

 $D = 3 \cdot 4 + 3 \cdot 2 = 18$   $y = \frac{3z \cdot (4/3)}{4} = 1.5C$ Delay:

**Logical Effort David Harris** Page 36 of 38

# **Example: Number of Stages**

How many stages should Ben use?

☐ Effort of decoders is dominated by electrical and branching portions

 $\Box \quad \text{Electrical Effort:} \qquad H = \frac{32 \bullet 3}{10} = 9.6$ 

 $\Box$  Branching Effort: B = 8 because each address input

controls half the outputs

If we neglect logical effort,

Remember that the best stage effort is about  $\rho = 4$ 

 $\Box$  Hence, the best number of stages is:  $N = log_4 76.8 = 3.1$ 

Let's try a 3-stage design

Logical Effort David Harris Page 37 of 38

#### **Example: Gate Sizes & Delay**

Lets try a 3-stage design using 16 4-input NAND gates with  $G = 1 \cdot 2 \cdot 1 = 2$ 



Actual path effort is:

$$F = 2 \cdot 8 \cdot 9.6 = 154$$
 Close to

Therefore, stage effort should be:  $f = (154)^{1/3} = 5.36$  4, so f is reasonable

$$\Box$$
  $z = 96 \cdot 1/5.36 = 18$   $y = 18 \cdot 2/5.36 = 6.7$ 

$$\Box$$
  $D = 3f + P = 3 \cdot 5.36 + 1 + 4 + 1 = 22.1$ 

Logical Effort David Harris Page 38 of 38