IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

# Sequential Element Timing Parameter Definition Considering Clock Uncertainty

## David Money Harris

Abstract—When the conventional method of defining sequential element timing parameters is used in conjunction with the conventional method of accounting for clock uncertainty in timing analysis, the results are overly pessimistic because, when clock uncertainty is nonzero, the element can never be simultaneously critical for both setup time and clock-to-Q delay. This brief shows that the actual sequencing overhead of conventional flip-flops is 0.5–1 fanout-of-4 (FO4) inverter delays shorter than conventional models predict. High-performance flip-flops with a modest transparency window can be up to 2 FO4 delays faster. Although the exact overhead depends on the clock uncertainty, for typical uncertainties the timing parameters are well approximated by the minimum setup time and minimum clock-to-Q delay.

Index Terms-Clock skew, flip-flops, timing analysis.

#### I. INTRODUCTION

Synchronous sequential digital circuits are built from combinational logic separated by clocked sequencing elements, such as flip-flops, transparent latches, or pulsed latches. Static timing analysis estimates the maximum clock frequency based on the timing parameters of the combinational logic and sequencing elements. The relevant timing parameters of a flip-flop are its setup time  $t_{setup}$  and clock-to-Q delay  $t_{cq}$ . These parameters are conventionally defined to minimize their sum,  $t_{dq} = t_{setup} + t_{cq}$ . Unfortunately, this definition has two weaknesses. First, in systems with typical amounts of clock uncertainty, the definition significantly overestimates sequencing overhead. Second, the definition does not uniquely characterize the behavior of transparent latches, pulsed latches, and flavors of flip-flops with soft edges and is particularly pessimistic for such elements. This brief explains the limitations of conventional timing parameter definitions and suggests choosing their minimum values individually to better correlate with the actual behavior of digital circuits.

#### II. SEQUENTIAL ELEMENT TIMING PARAMETERS

Fig. 1 shows the maximum-delay timing constraints on a path between two flip-flops. Each flip-flop receives a clock with uncertainty ( $t_{skew}$ ) in its arrival time. This uncertainty accounts for temporal and spatial variations unknown at design time, including phase-locked loop jitter, cycle-to-cycle and spatial voltage variations in the clock distribution network, process variation in the clock distribution network, and modeling inaccuracies. In the worst case, clk1 arrives at flip-flop F1 late. The flip-flop output settles to its new value after  $t_{cq}$ , and the logic output settles after a further  $t_{logic}$ . That output must be stable at least  $t_{setup}$  before the next clock edge at F2, which, in the worst case, arrives early.

Therefore, the clock cycle time  $T_c$  must be at least [1]

$$T_c \ge t_{\rm cq} + t_{\rm logic} + t_{\rm setup} + t_{\rm skew} \tag{1}$$

Manuscript received December 2, 2013; revised September 1, 2014; accepted October 21, 2014.

The author is with Broadcom Corporation, Irvine, CA 92617 USA (e-mail: daharris@broadcom.com).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2014.2364991



Fig. 1. Maximum delay timing constraints.



Fig. 2. Delay versus rising input arrival time for an ordinary flip-flop.

The terms unrelated to the combinational logic can be grouped into the sequencing overhead  $t_{\text{overhead}} = t_{\text{cq}} + t_{\text{setup}} + t_{\text{skew}}$ .
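As a quick numeric check of (1) and of the overhead grouping, the minimal Python sketch below plugs in the rising-edge parameters reported for the conventional flip-flop in Table I (conventional characterization), the 3-FO4 skew budget discussed in Section V, and a hypothetical 20-FO4 logic delay that is assumed purely for illustration. All quantities are in FO4 delays.

```python
def min_cycle_time(t_cq, t_logic, t_setup, t_skew):
    """Eq. (1): smallest clock period T_c that meets the max-delay constraint."""
    return t_cq + t_logic + t_setup + t_skew


def sequencing_overhead(t_cq, t_setup, t_skew):
    """t_overhead: the part of T_c that is not available to combinational logic."""
    return t_cq + t_setup + t_skew


# CONV rising-edge values from Table I (min t_setup + t_cq), a 3-FO4 skew budget,
# and a hypothetical 20-FO4 logic path (an assumption, not from the brief).
t_cq, t_setup, t_skew, t_logic = 4.4, 1.9, 3.0, 20.0
overhead = sequencing_overhead(t_cq, t_setup, t_skew)
t_c = min_cycle_time(t_cq, t_logic, t_setup, t_skew)
print(f"t_overhead = {overhead:.1f} FO4, T_c >= {t_c:.1f} FO4")
```

Every FO4 delay removed from  $t_{setup} + t_{cq}$  by a less pessimistic characterization shortens the achievable  $T_c$  by the same amount, which is why the definitions examined below matter.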

Variables with capital subscripts describe the curve of delay versus input arrival time. The clock-to-Q delay ( $t_{CQ}$ ) of a flip-flop depends on how long before the rising edge of the clock the data input settles ( $t_{DC}$ ). Fig. 2 shows this relationship for a typical flip-flop in units of fanout-of-4 (FO4) inverter delays. If the data arrives early enough (large  $t_{DC}$ ), the clock-to-Q delay reaches a constant minimum value  $t_{cqmin}$ . As the data arrives closer to the clock edge, the clock-to-Q delay begins to increase and eventually rises toward an asymptote at  $t_{DC} = t_{setupmin}$ , where the flip-flop fails to capture the data correctly. The D-to-Q delay ( $t_{DQ} = t_{DC} + t_{CQ}$ ) is the total time from when D settles until Q settles. For large  $t_{DC}$ , where  $t_{CQ}$  is constant,  $t_{DQ}$  has a slope of 1 with respect to  $t_{DC}$ ; it reaches a minimum at the point where  $t_{CQ}$  has a slope of -1. Flip-flop timing parameters,  $t_{setup}$  and  $t_{cq}$ , are conventionally characterized at this point of minimum  $t_{DQ}$  [2].
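The conventional characterization can be sketched numerically. The Python below assumes a hypothetical, illustrative clock-to-Q curve  $t_{CQ}(t_{DC}) = t_{cqmin} + A/(t_{DC} - t_{setupmin})$  (the constants `T_CQMIN`, `T_SETUPMIN`, and `A` are assumptions, not data from Fig. 2). It grid-searches for the minimum of  $t_{DQ}$ ; for this model the minimum lies analytically at  $t_{DC} = t_{setupmin} + \sqrt{A}$ , which is exactly where the slope of  $t_{CQ}$  is -1.

```python
# Hypothetical flip-flop model (FO4 units), chosen for illustration only:
# t_CQ equals T_CQMIN for early data and rises toward a vertical asymptote at T_SETUPMIN.
T_CQMIN, T_SETUPMIN, A = 4.0, 1.0, 0.25


def t_cq_of(t_dc):
    """Clock-to-Q delay as a function of data-to-clock time t_DC (t_DC > T_SETUPMIN)."""
    return T_CQMIN + A / (t_dc - T_SETUPMIN)


def t_dq_of(t_dc):
    """D-to-Q delay: t_DQ = t_DC + t_CQ."""
    return t_dc + t_cq_of(t_dc)


def conventional_point(span=5.0, n=100_000):
    """Grid-search the t_DC that minimizes t_DQ; return (t_setup, t_cq) at that point."""
    grid = (T_SETUPMIN + span * i / n for i in range(1, n + 1))  # skip the asymptote itself
    t_setup = min(grid, key=t_dq_of)
    return t_setup, t_cq_of(t_setup)


t_setup, t_cq = conventional_point()
# Central-difference slope of t_CQ at the conventional point (expected to be about -1).
slope = (t_cq_of(t_setup + 1e-6) - t_cq_of(t_setup - 1e-6)) / 2e-6
print(f"t_setup = {t_setup:.3f}, t_cq = {t_cq:.3f}, t_CQ slope there = {slope:.3f}")
```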

Salman *et al.* [3] and Kahng and Lee [4] showed that setup time, hold time, and clock-to-Q delay are actually interdependent and that this relationship can be exploited to reduce pessimism. This brief takes advantage of the interdependence of setup time, clock-to-Q delay, and clock skew to further reduce pessimism.

1063-8210 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.




Fig. 3. Setup and clock-to-Q are never simultaneously critical.



Fig. 4. Delay versus rising input arrival time for a HPST flip-flop.

When clock uncertainty is introduced, it is physically impossible for a flip-flop to be critical for both setup time and clock-to-Q delay simultaneously. Fig. 3 shows the timing for a particular flip-flop. When its clock arrives early, the setup time is critical, but the clock-to-Q delay may be pushed out by up to  $t_{skew}$ . Likewise, when the clock arrives late, the clock-to-Q delay is critical, but the setup time may increase by up to  $t_{skew}$ .
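The Python sketch below makes this argument concrete under the same kind of hypothetical curve  $t_{CQ}(t_{DC}) = t_{cqmin} + A/(t_{DC} - t_{setupmin})$  (the constants are illustrative assumptions). Data settles just in time to meet the conventional setup time at the earliest clock. The sketch then sweeps the clock arrival across the  $t_{skew}$  window. A later clock gives the data more time ( $t_{DC}$  grows by the clock delay), so  $t_{CQ}$  shrinks, and the worst actual Q settling time stays below the conventional budget of latest clock plus  $t_{cq}$ .

```python
# Hypothetical flip-flop model (FO4 units), chosen for illustration only.
T_CQMIN, T_SETUPMIN, A = 4.0, 1.0, 0.25
T_SETUP_CONV, T_CQ_CONV = 1.5, 4.5  # minimum-t_DQ (conventional) point of this model


def t_cq_of(t_dc):
    """Clock-to-Q delay as a function of data-to-clock time t_DC."""
    return T_CQMIN + A / (t_dc - T_SETUPMIN)


def q_settle_worst(t_skew, steps=101):
    """Worst Q settling time when the clock can land anywhere in [0, t_skew].

    The data input settles T_SETUP_CONV before the earliest clock edge (time 0),
    so it just meets the conventional setup time when the clock is early.
    """
    t_d = -T_SETUP_CONV
    worst = float("-inf")
    for i in range(steps):
        t_clk = t_skew * i / (steps - 1)
        worst = max(worst, t_clk + t_cq_of(t_clk - t_d))  # clock arrival + clock-to-Q
    return worst


t_skew = 2.0
assumed = t_skew + T_CQ_CONV      # conventional budget: latest clock + conventional t_cq
actual = q_settle_worst(t_skew)   # setup and clock-to-Q are never both at their worst
print(f"assumed {assumed:.2f} FO4, actual worst {actual:.2f} FO4")
```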

Skew-tolerant (ST) flip-flops have a transparency window during which data can flow unimpeded from D to Q [5]. Examples of ST flip-flops include pulsed latches and flip-flops with overlapping master and slave clocks. Fig. 4 shows a representative delay curve for a high-performance ST flip-flop. Again, if the data arrives early enough before the flip-flop becomes transparent, the clock-to-Q delay reaches a constant minimum value. As the data arrives later and enters the transparency window, the data propagates through the flip-flop with a constant D-to-Q delay and an increasing clock-to-Q delay. Finally, if the data arrives too late, the delay increases rapidly until the flip-flop fails to capture the data. In an ST flip-flop,  $t_{DQ}$  does not exhibit a unique minimum, so the conventional characterization is ill-defined. However,  $t_{cqmin}$  and  $t_{setupmin}$  are still well-defined.
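To see why the minimum-$t_{DQ}$ point is ill-defined for an ST flip-flop, the Python sketch below uses a hypothetical piecewise delay curve shaped like the description above: constant  $t_{CQ}$  for early data, a flat  $t_{DQ}$  inside the transparency window, and a steep rise for late data. All constants, and the shape of the late-arrival rise, are illustrative assumptions rather than values read from Fig. 4. A conventional minimum search returns a whole interval of tied  $t_{DC}$  values spanning the window instead of a single point.

```python
# Hypothetical piecewise model of an ST flip-flop (FO4 units); illustrative only.
T_CQMIN = 4.0                   # clock-to-Q when data arrives before the window opens
T_DQ_TRANSP = 3.0               # constant D-to-Q delay while the flip-flop is transparent
T_OPEN = T_DQ_TRANSP - T_CQMIN  # t_DC at which data first enters the window (-1.0)
T_CLOSE = -2.0                  # t_DC below which the delay starts to rise
T_FAIL = -2.5                   # t_DC at which capture fails (vertical asymptote)
A = 0.25                        # hypothetical steepness of the late-arrival rise


def t_dq_of(t_dc):
    """D-to-Q delay versus data-to-clock time t_DC for the hypothetical ST flip-flop."""
    if t_dc >= T_OPEN:      # early data: Q waits for the clock edge
        return t_dc + T_CQMIN
    if t_dc >= T_CLOSE:     # transparent: D-to-Q is flat
        return T_DQ_TRANSP
    # Too late: delay rises toward failure at T_FAIL (continuous at T_CLOSE).
    return T_DQ_TRANSP + A * (T_CLOSE - t_dc) / (t_dc - T_FAIL)


grid = [round(T_FAIL + 0.001 * k, 6) for k in range(1, 5001)]
best = min(t_dq_of(t) for t in grid)
ties = [t for t in grid if t_dq_of(t) <= best + 1e-9]  # every t_DC attaining the minimum
print(f"min t_DQ = {best:.3f} FO4, attained for t_DC in [{min(ties):.3f}, {max(ties):.3f}]")
```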

#### III. TEST CIRCUITS

Fig. 5 shows the three scannable flip-flops under study. The conventional flip-flop [Fig. 5(a)] uses a transmission-gate multiplexer to choose between the data input (D) and the scan input. It has a master-slave pair of clocked transmission gates operated by complementary clocks and tristate feedback to sustain the state nodes. Several designers have pointed out the benefits of overlapping the master and slave clocks to create a soft edge that hides clock skew and reduces dead time [5]–[8]. Such an ST flip-flop [Fig. 5(b)] adds two inverters (four clocked transistors) to delay the clock to the master, creating a brief transparency window when the master and slave are both transparent. The high-performance ST (HPST) flip-flop [Fig. 5(c)] performs a clock-AND operation to remove one of the series transmission gates in the *D*-to-*Q* path. The clock-AND delay also creates a transparency window. Note that transparency windows result in greater hold times.

Fig. 5. Scannable flip-flops. (a) Conventional. (b) ST. (c) HPST.

It is important to consider ST flip-flops because conventional characterization is particularly pessimistic for these flops.

#### IV. RESULTS

The three flip-flops were extracted from layout and simulated using HSPICE in a commercial 28-nm process, assuming worst-case processing at 0.9 V and -40 °C. Clock and data edge rates and output loading were chosen to model an FO4 environment on the inputs and outputs, and the delays were normalized to an FO4 inverter delay. The clock paths were tuned to give a 2–3 FO4 delay between the rising edges of  $\phi$  and  $\phi_d$ . As with pulsed latches, increasing this delay increases the transparency window and the amount of clock skew that can be hidden, but also increases the hold time. Timing parameters differed for the rising and falling edges and are suffixed with r or f, accordingly.

Fig. 6. Delay versus input arrival time for scannable flip-flops.

TABLE I. Flip-Flop Timing Parameters (in FO4 Delays)

| Flop                                                  | $t_{setupr}$ | $t_{cqr}$ | $t_{dqr}$ | $t_{setupf}$ | $t_{cqf}$ | $t_{dqf}$ |
|-------------------------------------------------------|--------------|-----------|-----------|--------------|-----------|-----------|
| *Minimize $t_{setup} + t_{cq}$*                       |              |           |           |              |           |           |
| CONV                                                  | 1.9          | 4.4       | 6.3       | 3.8          | 4.1       | 7.9       |
| ST                                                    | -0.6         | 6.5       | 5.9       | 0.8          | 5.0       | 5.8       |
| HPST                                                  | -0.3         | 4.7       | 4.4       | 0.6          | 4.3       | 4.9       |
| *Minimize $t_{setupmin}$ and $t_{cqmin}$ separately*  |              |           |           |              |           |           |
| CONV                                                  | 1.6          | 4.3       | 5.9       | 3.1          | 3.9       | 7.0       |
| ST                                                    | -1.6         | 4.9       | 3.3       | -0.1         | 3.7       | 3.6       |
| HPST                                                  | -0.9         | 4.0       | 3.1       | 0.2          | 3.6       | 3.8       |

Fig. 6 plots flip-flop *D*-to-*Q* delays versus input arrival time for both clock edges of all three flip-flops. Note that timing is always measured with respect to the input clock  $\phi$ , not the delayed clock. Table I summarizes the timing parameters for the three flip-flops shown in Fig. 5. As expected, the ST and HPST flip-flops are faster than the conventional flip-flop because the overlap between clocks reduces the dead time at the clock edge. The HPST flip-flop is generally fastest because it has one fewer transmission gate on the critical path from *D* to *Q*. In every case,  $t_{setupmin}$  and  $t_{cqmin}$  are smaller than  $t_{setup}$  and  $t_{cq}$ . The pessimism reduction is particularly large for the ST and HPST flip-flops because of the transparency window.
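The size of the pessimism reduction can be read directly off Table I. The short Python sketch below copies the  $(t_{setup}, t_{cq})$  pairs from the table, recomputes  $t_{dq} = t_{setup} + t_{cq}$  for both characterizations, and reports the difference per flip-flop and clock edge in FO4 delays. The dictionary layout and function names are illustrative choices, not part of any tool flow described in this brief.

```python
# Table I values in FO4 delays: (t_setup, t_cq) for the rising (r) and falling (f) edges.
CONVENTIONAL = {  # characterized at minimum t_setup + t_cq
    "CONV": {"r": (1.9, 4.4), "f": (3.8, 4.1)},
    "ST":   {"r": (-0.6, 6.5), "f": (0.8, 5.0)},
    "HPST": {"r": (-0.3, 4.7), "f": (0.6, 4.3)},
}
SEPARATE_MIN = {  # t_setupmin and t_cqmin chosen separately
    "CONV": {"r": (1.6, 4.3), "f": (3.1, 3.9)},
    "ST":   {"r": (-1.6, 4.9), "f": (-0.1, 3.7)},
    "HPST": {"r": (-0.9, 4.0), "f": (0.2, 3.6)},
}


def t_dq(params):
    """Sum of setup time and clock-to-Q delay for one (t_setup, t_cq) pair."""
    t_setup, t_cq = params
    return t_setup + t_cq


def pessimism_reduction():
    """FO4 delays removed from t_dq by the separate-minimum characterization."""
    return {
        flop: {
            edge: round(t_dq(CONVENTIONAL[flop][edge]) - t_dq(SEPARATE_MIN[flop][edge]), 1)
            for edge in ("r", "f")
        }
        for flop in CONVENTIONAL
    }


for flop, by_edge in pessimism_reduction().items():
    print(flop, by_edge)
```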

Timing parameters are a function of the output load and of the clock and data slew rates. Our team has used Cadence Liberate to characterize the flip-flops across this range of conditions with the proposed parameter definitions. We have generated complete Liberty timing files and used them for synthesis and static timing analysis of ARM processors, obtaining performance and power results consistent with the reduced pessimism.

#### V. IMPACT OF FINITE CLOCK UNCERTAINTY

Characterizing a flip-flop at the  $t_{setupmin}$  and  $t_{cqmin}$  points provides a good approximation when clock uncertainty is large compared with the skew tolerance of the flip-flop. This section examines the impact of finite clock uncertainty, which bounds the useful width of the transparency window.



Fig. 7. Setup time derivation with clock skew for HPST rising edge.



Fig. 8. Timing parameters as a function of clock skew for HPST rising edge.

To minimize sequencing overhead in the face of clock uncertainty, the *D*-to-*Q* delay should be the same whether the clock arrives at its earliest or latest point. Fig. 7 shows how to apply this to the HPST graphically. The curves and minimum values are the same as those in Fig. 4 and the impact of skew is shown in gray. At the earliest clock arrival,  $t_{DC1} = t_{setup}$ . At the latest clock arrival,  $t_{DC2} = t_{setup} + t_{skew}$ . Pick  $t_{setup}$  so that  $t_{DQ}$  is the same for both  $t_{DC1}$  and  $t_{DC2}$  and measure  $t_{cq}$  at  $t_{DC2}$ . With a  $t_{skew}$  budget of 2 FO4 delays, Fig. 7 shows that  $t_{setupr} = -0.9$  and  $t_{cqr} = 4.0$ . This agrees with the  $t_{setupmin}$  and  $t_{cqmin}$  values in Table I to the nearest 0.1 FO4 delay.

Formally, let  $t_{CQ}(t_{DC})$  be a nonincreasing function describing the clock-to-Q delay as a function of the time by which the input arrives before the clock, so that  $t_{DQ}(t_{DC}) = t_{CQ}(t_{DC}) + t_{DC}$ . Choose the  $t_{setup}$  that uniquely satisfies

$$t_{\rm CQ}(t_{\rm setup}) = t_{\rm CQ}(t_{\rm setup} + t_{\rm skew}) + t_{\rm skew} \tag{2}$$

and define

$$t_{\rm cq} = t_{\rm CQ}(t_{\rm setup} + t_{\rm skew}). \tag{3}$$
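Equations (2) and (3) can be solved numerically for any measured, nonincreasing  $t_{CQ}(t_{DC})$ . The Python sketch below applies bisection to the residual of (2) using a hypothetical illustrative curve  $t_{CQ}(t_{DC}) = t_{cqmin} + A/(t_{DC} - t_{setupmin})$ ; the constants are assumptions, not data from Fig. 7. For this particular curve, (2) reduces to  $x(x + t_{skew}) = A$  with  $x = t_{setup} - t_{setupmin}$ , which gives a closed form against which the bisection can be checked.

```python
# Hypothetical flip-flop model (FO4 units), chosen for illustration only.
T_CQMIN, T_SETUPMIN, A = 4.0, 1.0, 0.25


def t_cq_of(t_dc):
    """Nonincreasing clock-to-Q delay t_CQ(t_DC), defined for t_DC > T_SETUPMIN."""
    return T_CQMIN + A / (t_dc - T_SETUPMIN)


def skew_aware_params(t_skew, iters=200):
    """Solve Eq. (2) for t_setup by bisection; return (t_setup, t_cq) with t_cq from Eq. (3)."""
    if t_skew <= 0:
        raise ValueError("Eq. (2) is trivially satisfied at t_skew = 0; use the min-t_DQ point")

    def residual(s):
        # Zero when t_DQ is equal for the earliest (t_DC = s) and latest (t_DC = s + t_skew) clock.
        return t_cq_of(s) - (t_cq_of(s + t_skew) + t_skew)

    lo, hi = T_SETUPMIN + 1e-9, T_SETUPMIN + 10.0  # residual > 0 at lo and < 0 at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid  # root lies to the right
        else:
            hi = mid
    t_setup = 0.5 * (lo + hi)
    return t_setup, t_cq_of(t_setup + t_skew)  # Eq. (3)


t_setup, t_cq = skew_aware_params(2.0)
print(f"t_setup = {t_setup:.4f} FO4, t_cq = {t_cq:.4f} FO4")
```

Bisection is used because, for a nonincreasing convex  $t_{CQ}$  such as this one, the residual decreases monotonically in  $t_{setup}$ , so a sign change brackets the unique root.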

Solving (2) and (3), or using the graphical method of Fig. 7, gives the  $t_{cq}$  and corresponding  $t_{setup}$  plotted in Fig. 8 as a function of the budgeted  $t_{skew}$ . When  $t_{skew} = 0$ , the proposed method is equivalent to the conventional method. As the budgeted skew increases above 1 FO4,  $t_{cq}$  rapidly approaches the  $t_{cqmin}$  horizontal asymptote because the clock-to-Q delay is insensitive to the data arrival time as long as the data arrives somewhat before the clock. Similarly,  $t_{setup}$  settles near  $t_{setupmin}$  when the skew exceeds 2 FO4. If the transparency window were wider, the timing parameters would approach their minimum values at larger skew budgets.
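The qualitative trend of Fig. 8 can be reproduced with the same hypothetical curve  $t_{CQ}(t_{DC}) = t_{cqmin} + A/(t_{DC} - t_{setupmin})$  (illustrative constants, not the HPST data). For this curve, (2) and (3) have the closed-form solution used in the Python sketch below. Sweeping the skew budget shows both parameters moving from the conventional minimum-$t_{DQ}$ point at  $t_{skew} = 0$  toward  $t_{setupmin}$  and  $t_{cqmin}$ . How quickly they converge depends entirely on the assumed curve.

```python
import math

# Hypothetical flip-flop model (FO4 units), chosen for illustration only.
T_CQMIN, T_SETUPMIN, A = 4.0, 1.0, 0.25


def skew_aware_params(t_skew):
    """Closed-form (t_setup, t_cq) from Eqs. (2)-(3) for t_CQ = T_CQMIN + A/(t_DC - T_SETUPMIN).

    With x = t_setup - T_SETUPMIN, Eq. (2) becomes A/x = A/(x + t_skew) + t_skew,
    i.e., x * (x + t_skew) = A for t_skew > 0, and Eq. (3) gives t_cq = T_CQMIN + x.
    At t_skew = 0 the formula returns the limit x = sqrt(A), the min-t_DQ point.
    """
    x = (math.sqrt(t_skew * t_skew + 4.0 * A) - t_skew) / 2.0
    return T_SETUPMIN + x, T_CQMIN + x


for t_skew in (0.0, 0.5, 1.0, 2.0, 3.0):
    t_setup, t_cq = skew_aware_params(t_skew)
    print(f"t_skew = {t_skew:.1f}: t_setup = {t_setup:.3f}, t_cq = {t_cq:.3f} "
          f"(excess over minima: {t_cq - T_CQMIN:.3f})")
```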

In a nontrivial design,  $t_{skew}$  is typically about 10% of the cycle time [2], or 3 FO4 for a reasonably aggressive ASIC with a 30 FO4 period. Even a path from a flip-flop back to itself experiences significant skew from cycle-to-cycle voltage variations and jitter [9]. Hence, for most designs,  $t_{setupmin}$  and  $t_{cqmin}$  are excellent approximations to the best parameters considering clock uncertainty. However, if the skew is exceptionally well controlled or the transparency window of a flip-flop is widened to hide additional skew, the finite clock uncertainty should be accounted for explicitly using (2) and (3).

#### VI. CONCLUSION

This brief has shown that, when clock uncertainty is at least a few gate delays, characterizing a flip-flop at the minimum setup time and minimum clock-to-Q delay separately predicts its performance more accurately than the traditional method of characterizing it at the point that minimizes the sum of setup time and clock-to-Q delay. For ordinary flip-flops, the proposed method indicates cycle times 0.5–1 FO4 inverter delays shorter than the traditional method. Moreover, the conventional method is especially pessimistic when applied to ST flip-flops. The proposed method properly captures the benefits of transparency in ST flip-flops, indicating cycle-time improvements of up to 2 FO4 delays when the master and slave clocks overlap by a two-inverter delay.

#### ACKNOWLEDGMENT

The author would like to thank the anonymous reviewers and C. Lutkemeyer for helpful conversations.

#### REFERENCES

- [1] S. H. Unger and C. Tan, "Clocking schemes for high-speed digital systems," *IEEE Trans. Comput.*, vol. 35, no. 10, pp. 880–895, Oct. 1986.
- [2] N. Weste and D. Harris, CMOS VLSI Design, 4th ed. Boston, MA, USA: Addison-Wesley, 2011.
- [3] E. Salman, A. Dasdan, F. Taraporevala, K. Kucukcakar, and E. Friedman, "Exploiting setup-hold-time interdependence in static timing analysis," *IEEE Trans. Comput.-Aided Design*, vol. 26, no. 6, pp. 1114–1125, Jun. 2007.
- [4] A. B. Kahng and H. Lee, "Timing margin recovery with flexible flip-flop timing model," in *Proc. 15th Int. Symp. Quality Electron. Design*, Mar. 2014, pp. 496–503.
- [5] N. Nedovic, V. G. Oklobdzija, and W. W. Walker, "A clock skew absorbing flip-flop," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2003, pp. 342–497.
- [6] H. Warnock *et al.*, "Circuit design techniques for a first-generation cell broadband engine processor," *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1692–1706, Aug. 2006.
- [7] V. Joshi, D. Blaauw, and D. Sylvester, "Soft-edge flipflops for improved timing yield: Design and optimization," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design*, Nov. 2007, pp. 667–673.
- [8] M. Wieckowski *et al.*, "Timing yield enhancement through soft edge flip-flop based design," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2008, pp. 543–546.
- [9] D. Chinnery and K. Keutzer, Closing the Gap Between ASIC & Custom: Tools and Techniques for High-Performance ASIC Design. New York, NY, USA: Springer-Verlag, 2002.