

NEIL H. E. WESTE DAVID MONEY HARRIS

FOURTH EDITION

CMOS

# Lecture 13: Datapath Functional Units

### Outline

- Comparators
- Shifters
- Multi-input Adders
- Multipliers

### Comparators

- **D** 0's detector: A = 00...000
- □ 1's detector: A = 11...111
- $\Box Equality comparator: A = B$
- □ Magnitude comparator: A < B

# 1's & 0's Detectors

- □ 1's detector: N-input AND gate
- □ 0's detector: NOTs + 1's detector (N-input NOR)



# **Equality Comparator**

- □ Check if each bit is equal (XNOR, aka equality gate)
- □ 1's detect on bitwise equality



**18: Datapath Functional Units** 

# **Magnitude Comparator**

- Compute B A and look at sign
- □ B A = B + ~A + 1
- For unsigned numbers, carry out is sign bit



**18: Datapath Functional Units** 

# Signed vs. Unsigned

#### □ For signed numbers, comparison is harder

- C: carry out
- Z: zero (all bits of A B are 0)
- N: negative (MSB of result)
- V: overflow (inputs had different signs, output sign  $\neq$  B)
- S: N xor V (sign of result)

| Relation   | Unsigned Comparison    | Signed Comparison                 |
|------------|------------------------|-----------------------------------|
| A = B      | Ζ                      | Ζ                                 |
| $A \neq B$ | Z                      | Z                                 |
| A < B      | $C \cdot \overline{Z}$ | $\overline{S} \cdot \overline{Z}$ |
| A > B      | $\overline{C}$         | S                                 |
| $A \leq B$ | C                      | $\overline{S}$                    |
| $A \ge B$  | $\overline{C} + Z$     | S + Z                             |

**18: Datapath Functional Units** 

### Shifters

#### □ Logical Shift:

- Shifts number left or right and fills with 0's
  - 1011 LSR 1 = 1011 LSL1 =
- □ Arithmetic Shift:
  - Shifts number left or right. Rt shift sign extends
    - 1011 ASR1 = 1011 ASL1 =
- Rotate:
  - Shifts number left or right and fills with lost bits
    - 1011 ROR1 = 1011 ROL1 =

### **Funnel Shifter**

- A funnel shifter can do all six types of shifts
- Selects N-bit field Y from 2N–1-bit input
  - Shift by k bits  $(0 \le k \le N)$
  - Logically involves N N:1 multiplexers



### **Funnel Source Generator**

| Shift Type              | $Z_{2N-2:N}$       | <i>Z<sub>N - 1</sub></i> | Z <sub>N-2:0</sub> | Offset |
|-------------------------|--------------------|--------------------------|--------------------|--------|
| Rotate Right            | $A_{N-2:0}$        | $A_{N-1}$                | $A_{N-2:0}$        | k      |
| Logical Right           | 0                  | $A_{N-1}$                | $A_{N-2:0}$        | k      |
| Arithmetic Right        | sign               | $A_{N-1}$                | $A_{N-2:0}$        | k      |
| Rotate Left             | A <sub>N-1:1</sub> | $A_0$                    | $A_{N-1:1}$        | k      |
| Logical/Arithmetic Left | A <sub>N-1:1</sub> | $A_0$                    | 0                  | k      |

# **Array Funnel Shifter**

#### N N-input multiplexers

- Use 1-of-N hot select signals for shift amount
- nMOS pass transistor design (V<sub>t</sub> drops!)



**18: Datapath Functional Units** 

CMOS VLSI Design 4th Ed.

Ζ

S

凤

# **Logarithmic Funnel Shifter**

□ Log N stages of 2-input muxes

No select decoding needed



#### **32-bit Logarithmic Funnel** Wider multiplexers reduce delay and power Operands > 32 bits introduce datapath irregularity 32 Source 31 0 arith Generator Bits Source Generator shif 0-31left Z. 63 Source 31 Generator Bits 32-62 Mux 4:1 Decod 39 k 38 0 mux4 5 8:1 D 8 31 mux8 32 **18: Datapath Functional Units** CMOS VLSI Design 4th Ed. 13

#### **Barrel Shifter**

- Barrel shifters perform right rotations using wraparound wires.
- □ Left rotations are right rotations by N k = k + 1 bits.
- □ Shifts are rotations with the end bits masked off.

### **Logarithmic Barrel Shifter**





**18: Datapath Functional Units** 

CMOS VLSI Design <sup>4th Ed.</sup>



# **Multi-input Adders**

□ Suppose we want to add k N-bit words

- Ex: 0001 + 0111 + 1101 + 0010 = 10111

#### □ Straightforward solution: k-1 N-input CPAs

Large and slow



# **Carry Save Addition**

□ A full adder sums 3 inputs and produces 2 outputs

- Carry output has twice weight of sum output

- □ N full adders in parallel are called *carry save adder* 
  - Produce N sums and N carry outs



**18: Datapath Functional Units** 

CMOS VLSI Design <sup>4th Ed.</sup>

# **CSA Application**

#### □ Use k-2 stages of CSAs

- Keep result in carry-save redundant form
- Final CPA computes actual result



**18: Datapath Functional Units** 



- Produce N M-bit partial products
- Sum these to produce M+N-bit product

#### **General Form**

Multiplicand:  $Y = (y_{M-1}, y_{M-2}, ..., y_1, y_0)$ Multiplier:  $X = (x_{N-1}, x_{N-2}, ..., x_1, x_0)$ 

**D** Product: 
$$P = \left(\sum_{j=0}^{M-1} y_j 2^j\right) \left(\sum_{i=0}^{N-1} x_i 2^i\right) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} x_i y_j 2^{i+j}$$

multiplicand У<sub>5</sub>  $y_4 \quad y_3 \quad y_2 \quad y_1$ y<sub>0</sub> multiplier X<sub>4</sub> X<sub>3</sub> Х<sub>5</sub> х<sub>2</sub> **x**<sub>1</sub> X<sub>0</sub>  $x_0y_5 x_0y_4 x_0y_3 x_0y_2 x_0y_1$ x<sub>0</sub>y<sub>0</sub>  $x_1y_5 \quad x_1y_4 \quad x_1y_3 \quad x_1y_2 \quad x_1y_1 \quad x_1y_0$ partial  $x_2y_5 x_2y_4 x_2y_3 x_2y_2 x_2y_1 x_2y_0$ products  $x_{3}y_{5}$   $x_{3}y_{4}$   $x_{3}y_{3}$   $x_{3}y_{2}$   $x_{3}y_{1}$   $x_{3}y_{0}$  $\mathbf{x}_4 \mathbf{y}_5 \quad \mathbf{x}_4 \mathbf{y}_4$  $x_4y_3 \quad x_4y_2 \quad x_4y_1 \quad x_4y_0$  $x_5y_3$ x<sub>5</sub>y<sub>2</sub>  $X_5 y_5$ x<sub>5</sub>y<sub>4</sub>  $\mathbf{x}_5 \mathbf{y}_1 \quad \mathbf{x}_5 \mathbf{y}_0$ product **p**<sub>10</sub> p<sub>8</sub> р<sub>7</sub>  $p_5$  $p_4$  $p_3$  $\mathbf{p}_2$  $p_1$  $\mathbf{p}_0$ p<sub>11</sub> p<sub>9</sub>  $\mathsf{p}_6$ 

#### **18: Datapath Functional Units**

CMOS VLSI Design <sup>4th Ed.</sup>

# **Dot Diagram**

#### Each dot represents a bit



### **Array Multiplier**



**18: Datapath Functional Units** 

CMOS VLSI Design <sup>4th Ed.</sup>

# **Rectangular Array**

#### □ Squash array to fit rectangular floorplan



**18: Datapath Functional Units** 

# **Fewer Partial Products**

- Array multiplier requires N partial products
- If we looked at groups of r bits, we could form N/r partial products.
  - Faster and smaller?
  - Called radix-2<sup>r</sup> encoding
- □ Ex: r = 2: look at pairs of bits
  - Form partial products of 0, Y, 2Y, 3Y
  - − First three are easy, but 3Y requires adder ⊗

# **Booth Encoding**

- Instead of 3Y, try –Y, then increment next partial product to add 4Y
- □ Similarly, for 2Y, try –2Y + 4Y in next partial product

| Inputs            |          | Partial Product   | Booth Selects |            |                   |         |
|-------------------|----------|-------------------|---------------|------------|-------------------|---------|
| x <sub>2i+1</sub> | $x_{2i}$ | <sup>x</sup> 2i−1 | $PP_i$        | $SINGLE_i$ | $\text{DOUBLE}_i$ | $NEG_i$ |
| 0                 | 0        | 0                 | 0             | 0          | 0                 | 0       |
| 0                 | 0        | 1                 | Y             | 1          | 0                 | 0       |
| 0                 | 1        | 0                 |               |            |                   |         |
| 0                 | 1        | 1                 |               |            |                   |         |
| 1                 | 0        | 0                 |               |            |                   |         |
| 1                 | 0        | 1                 |               |            |                   |         |
| 1                 | 1        | 0                 |               |            |                   |         |
| 1                 | 1        | 1                 |               |            |                   |         |

18: Datapath Functional Units

### **Booth Hardware**

- □ Booth encoder generates control lines for each PP
  - Booth selectors choose PP bits



27

# **Sign Extension**

Partial products can be negative

- Require sign extension, which is cumbersome
- High fanout on most significant bit



# **Simplified Sign Ext.**

□ Sign bits are either all 0's or all 1's

- Note that all 0's is all 1's + 1 in proper column
- Use this to reduce loading on MSB



**18: Datapath Functional Units** 

# **Even Simpler Sign Ext.**

#### □ No need to add all the 1's in hardware

- Precompute the answer!



# **Advanced Multiplication**

- □ Signed vs. unsigned inputs
- Higher radix Booth encoding
- Array vs. tree CSA networks