## Predicting the Worst-Case Voltage Violation in a 3D Power Network

Wanping Zhang<sup>1,2</sup>, Wenjian Yu<sup>3</sup>, Xiang Hu<sup>2</sup>, Amirali Shayan<sup>2</sup>, A. Ege Engin<sup>4</sup>, and Chung-Kuan Cheng<sup>2</sup>

<sup>1</sup>wanpingz@qualcomm.com, Qualcomm Inc. 5775 Morehouse Dr., San Diego, CA, U.S.A

<sup>2</sup>{w7zhang, x2hu, amirali, ckcheng}@ucsd.edu, UC San Diego, La Jolla, CA, U.S.A

<sup>3</sup>yu-wj@tsinghua.edu.cn, Tsinghua University, Beijing 100084, China

<sup>4</sup>aengin@mail.sdsu.edu, San Diego State University, San Diego, CA 92182, U.S.A

#### ABSTRACT

This paper proposes an efficient method to predict the worst case of voltage violation caused by multi-domain clock gating in a three-dimensional (3D) on-chip power network, considering leakage current. We first describe the 3D Power Distribution Network (PDN) structure, which includes on-chip inductance and through-silicon vias (TSVs). We then introduce an analysis flow based on a superposition technique. Next, we propose a general model to identify the worst-case gating pattern and the maximum violation area with arbitrary leakage current. For low power wireless chips, we introduce a simplified model, which treats the leakage as a DC current. We formulate these two models with integer linear programming (ILP). The ILP based method is significantly faster than a conventional method based on enumeration. The experimental results also show that the noise contributed by leakage current is not negligible.

#### **Categories and Subject Descriptors**

J.6 [COMPUTER-AIDED ENGINEERING]: Computer-aided design (CAD); B.2.2 [Performance Analysis and Design Aids]: Worst-case analysis

#### **General Terms**

Algorithms, Performance, Design.

#### Keywords

Power Networks, Worst Case Violation Prediction, Clock Gating, Leakage, Integer Linear Programming.

#### **1. INTRODUCTION**

Three-dimensional (3D) stacking technology is a promising direction to increase integration density, improve system performance and reduce production cost [14]. However, the supply voltage is reduced while the active and leakage currents keep increasing, which produces more power supply noise. These voltage variations have an adverse impact on chip, package and board performance, such as longer signal delay and even logic failure [1]. On the other hand, clock gating and multi-domain design are widely used in low power implementation [9]. Therefore, it is very important to predict the clock gating pattern which leads to the worst-case voltage violation.

*SLIP'09*, July 26–27, 2009, San Francisco, California, USA. Copyright 2009 ACM 978-1-60558-576-5/09/07...\$10.00.

Much work has been done to predict the worst-case voltage variation of power networks. Bai et al. proposed the MIMAX algorithm to generate a tight upper bound on the maximum macro-block current envelope, which leads to the maximum voltage drop [3]. Shi et al. introduced an algorithm to predict the worst-case logical timing correlations among the cells which cause the voltage resonance [4]. Lin et al. proposed a full-chip vectorless approach for dynamic power integrity analysis [5]. Zhang et al. described an integer linear programming formulation to find the worst voltage violation [9].

First, these previous works focused on traditional 2D power networks to analyze the noise problem. This paper extends the worst-case noise prediction problem to a more comprehensive model for 3D PDN structures, which includes on-chip parasitic inductance and TSVs [13].

Secondly, some of the above works [3-5] do not consider the case with multiple clock domains. Clock gating with multiple clock domains is an efficient technique to reduce unnecessary power dissipation by disabling the clock for a circuit module [6]. A certain clock gating pattern may induce the resonance phenomena with large voltage noise. The circuit modules working for some clock cycles and gated for other cycles also increase the complexity of analyzing the power network.

With the continuous shrinking of feature size, leakage is becoming a major challenge for VLSI design [7]. However, some previous works [8, 9] neglect the leakage current, which leads to the underestimation of power consumption and noise. In this paper, we take the leakage into consideration to predict the worst voltage violation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

We consider the worst case with the maximum voltage violation area [1, 2], which presents the accumulating effect of the noise. The violation area at node *i* is defined to be:

$$A_{i} = \int_{0}^{T} \max(V_{\min} - v_{i}(t), 0) dt \quad , \tag{1}$$

where $V_{\min}$ is the minimum allowed voltage, and $v_i(t)$ is the voltage response at node *i*.
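As a minimal sketch of how Eq. (1) can be evaluated on sampled data, the integral may be approximated by a Riemann sum over uniformly spaced samples. The function name `violation_area`, the sample values and the 0.1 ns time step below are illustrative assumptions, not values from the experiments.

```python
def violation_area(samples, v_min, dt):
    """Riemann-sum approximation of Eq. (1) for uniformly spaced samples.

    samples: node voltages v_i(t) in volts
    v_min:   minimum allowed voltage in volts
    dt:      spacing between samples (here in ns)
    """
    # Only the part of the waveform below v_min contributes to the area.
    return sum(max(v_min - v, 0.0) * dt for v in samples)


# Hypothetical waveform sampled every 0.1 ns with V_min = 0.95 V.
waveform = [1.00, 0.96, 0.93, 0.91, 0.94, 0.97, 1.00]
area = violation_area(waveform, v_min=0.95, dt=0.1)  # in V*ns
```

Only the three samples below 0.95 V contribute, so samples that stay above $V_{\min}$ never offset the violation.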

This paper follows the superposition technique in [9] to deal with the voltage response considering multi-domain clock gating, and extends it to 3D PDN. To predict the worst case of gating patterns, we propose two models with integer linear programming (ILP) formulations and solve them efficiently with the commercial tool CPLEX [10]. The first model is general and includes arbitrary leakage current sources. For low power wireless chip analysis, a simplified model is used, which treats the leakage as a DC current. The proposed ILP-based method shows a large speedup over the enumeration method.

#### 2. PROBLEM FORMULATION

In this section, we first introduce the models for the 3D on-chip power network with multiple clock domains. Then the analysis flow based on superposition is discussed. Finally, we describe the general model and the simplified model to predict the worst-case voltage violation considering clock gating and leakage current.

#### 2.1 Power Network Model with Multi-Domain Clock Gating



Figure 1. 3D power network model with multi-domain clock gating.

The 3D power network model with multi-domain clock gating is shown in Fig. 1. The power network is divided into nine domains. The clock controlling signal "1" or "0" for each domain indicates whether the transistors in this domain work or sleep in the current clock cycle. In working mode, the current consists of dynamic and leakage current, while in idle or sleep mode there is only leakage current. We assume the working frequencies of all domains are the same, and the current waveform repeats in each cycle. In low power design, the circuit is always divided into several domains, and the clock gating of each domain is independent of the others.

The extracted 3D power network model [15] is shown in Fig. 2. Each stacking layer is modeled by a mesh structure with parasitic resistance and inductance. A capacitor is placed at each intersection node of the mesh to represent the on-chip parasitic capacitance and the decoupling capacitor. The behavior of the transistors is modeled by piecewise linear (PWL) current sources. TSVs, which connect adjacent layers, are modeled as series RL elements with values from the extraction results. The voltage source represents the nominal voltage supply from the package and is connected to the center of the bottom layer mesh through an on-chip voltage regulator module.

The objective is to predict the sequence of the clock gating signal for each domain that causes the maximum voltage violation at the observation nodes. This gives the most pessimistic estimate of the on-chip power network noise to help the stability analysis.

#### 2.2 Analysis Flow Based on Superposition

Due to resonance in the circuit, the voltage variation fluctuates for several cycles before reaching the steady state. As shown in Fig. 3, if the circuit works in the first cycle, the voltage variation takes four cycles ($n_k = 4$, where $n_k$ is explained in Fig. 6) to return to zero. Assuming the current profile remains unchanged, if the circuit works again in the second cycle, the variation waveform corresponding to the current source in the second cycle will be the same but shifted by one cycle. Based on the above analysis, the overall variation waveform in one cycle is the superposition of the four parts.

Figure 2. Extracted 3D power network model with on-chip inductance and TSVs.

Figure 3. Superposition of voltage response.

Figure 4. Analysis flow in one domain considering leakage current.
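The superposition of shifted single-cycle responses can be sketched as follows. This is an illustrative sketch under stated assumptions: the response to one active cycle is dissected into $n_k$ per-cycle segments, and the function name `superimpose`, the indexing convention and the sample values (in mV) are hypothetical.

```python
def superimpose(segments, enabled):
    """Steady-state variation waveform over one observed cycle.

    segments[i][j]: variation at sample j during the (i+1)-th cycle after
                    the circuit worked for a single cycle.
    enabled[i]:     True if the circuit worked i cycles before the
                    observed cycle (i = 0 is the observed cycle itself).
    """
    m = len(segments[0])
    total = [0.0] * m
    for seg, on in zip(segments, enabled):
        if on:  # a gated cycle contributes nothing in this sketch
            for j in range(m):
                total[j] += seg[j]
    return total


# Hypothetical response that dies out after n_k = 4 cycles, m = 3 samples.
segments = [[30, 50, 20], [-10, -20, -5], [5, 8, 2], [-1, -2, 0]]
always_on = superimpose(segments, [True, True, True, True])
alternating = superimpose(segments, [True, False, True, False])
```

In this toy data the alternating pattern gives a larger peak than the always-on pattern, because the negative second segment no longer cancels part of the first. This is the kind of pattern-dependent effect the worst-case search has to capture.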

We first consider the case with one clock domain, and then extend it to multiple domains. The analysis flow considering leakage current for one domain is described in Fig. 4. The voltage variation waveforms at the observation node are calculated with the active current source and the leakage current source each working in one clock cycle. The voltage variation may take $n_k$ cycles to reach the steady state, and the dissected waveforms in these $n_k$ cycles need to be superimposed. For each cycle, if the clock is enabled, the voltage variation caused by the active current is selected; if the clock is disabled, the voltage variation caused by the leakage current is chosen. The goal is to determine the clock gating pattern for each cycle that makes the superimposed variation waveform in one cycle have the maximum violation area.

Figure 5. Analysis with multiple domains.

- *P*: the number of domains;
- $n_k$: the number of cycles that the voltage variation waveform takes to reach the steady state in the *k*th domain;
- *n*: the number of cycles needed in the superposition of all the domains, which is $\sum_{k=1}^{P} n_{k}$;
- *m*: the number of voltage response samples in each cycle;
- $V_{dd}$: nominal voltage;
- $V_{\min}$: minimal voltage requirement. A voltage below this value is considered a violation;
- $V_{ij}^{A}, V_{ij}^{L}$: voltage responses at the *j*th sampling point of the *i*th cycle $(1 \le i \le n, 1 \le j \le m)$ for the active and leakage current respectively. The active current includes the dynamic and leakage currents;
- $\tilde{V}_{ij}^{A}, \tilde{V}_{ij}^{L}$: voltage variation from $V_{dd}$ for the active and leakage current, i.e. $V_{dd} - V_{ij}^{A}$ and $V_{dd} - V_{ij}^{L}$;
- $\tilde{V}_{ij}^{D}$: voltage variation from $V_{dd}$ with only dynamic current;
- *cutoff*: the maximum allowed value of $\tilde{V}_{ij}$. If $\tilde{V}_{ij}$ is larger than *cutoff*, a voltage violation occurs;
- $d_j$: time interval between adjacent sample points;
- *M*: a sufficiently large constant.

Figure 6. Parameters description for ILP formulations.

If there are P domains, the total voltage variation for the observation node is the summation of the variation contributed by each domain. Assume the working frequencies of all the domains are the same which means the period T is the same. We superimpose these dissected variation waveforms from each domain as shown in Fig. 5. Then the clock gating pattern ("1" or "0") for each domain needs to be determined to maximize the violation area of the overall superimposed variation waveform. The enumeration method exhaustively tries all the possible clock gating patterns.
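The enumeration baseline can be sketched as below. This is a minimal illustration rather than the implementation used in the experiments: the function names, the data layout (one row of samples per gated cycle, with the rows of all domains concatenated) and the toy numbers in mV are assumptions.

```python
from itertools import product


def violation_area(variation, cutoff, dt):
    """Area of the variation waveform above the cutoff."""
    return sum(max(v - cutoff, 0.0) * dt for v in variation)


def enumerate_worst_case(active, leakage, cutoff, dt):
    """Try every gating pattern and return (best_pattern, best_area).

    active[i][j], leakage[i][j]: variation from V_dd at sample j for
    cycle slot i when the clock is enabled or disabled, respectively.
    """
    n, m = len(active), len(active[0])
    best_pattern, best_area = None, -1.0
    for pattern in product((0, 1), repeat=n):  # 2**n candidates
        total = [
            sum(active[i][j] if pattern[i] else leakage[i][j]
                for i in range(n))
            for j in range(m)
        ]
        area = violation_area(total, cutoff, dt)
        if area > best_area:
            best_pattern, best_area = pattern, area
    return best_pattern, best_area


# Hypothetical data: n = 3 cycle slots, m = 2 samples, cutoff = 50 mV.
active = [[40, 30], [-20, 10], [15, -5]]
leakage = [[2, 2], [1, 1], [0, 0]]
pattern, area = enumerate_worst_case(active, leakage, cutoff=50, dt=1.0)
```

The loop visits $2^n$ patterns, which is why this approach is only practical for a small number of domains.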

#### 2.3 Models to Predict the Worst-Case Violation Area Considering Leakage Current

With the reduction of power supply voltage and the increase of operating frequency, threshold voltages have to scale aggressively, which results in higher subthreshold leakage currents [11]. In addition, the gate leakage is becoming larger with the reduction of the gate oxide thickness [7]. Therefore, we cannot ignore the leakage current in noise estimation.

In the general model, both the dynamic current and the leakage current can have arbitrary waveforms. When the clock is enabled, the overall active current includes both dynamic and leakage current. When the clock is disabled, i.e. in sleep or idle mode, there is only leakage current. In order to take the leakage effect into consideration when the clock is disabled, we need two voltage variations, contributed by the active current and the leakage current respectively, as in Fig. 4. The two waveforms are superimposed: when the clock is enabled, we select the dissected waveform from the active response; otherwise, we select the dissected waveform from the leakage response. The overall superimposed voltage waveform then accounts for both active and leakage current effects.

This general model covers both dynamic and leakage current effects when analyzing the worst-case violation area. For general processors such as IBM processors, the leakage currents vary in active and idle mode because of circuit activity and temperature [7]. Computing the voltage responses for active and leakage current separately allows the proposed model to handle this case.

The simplified model considers the leakage current to be a DC constant. Unlike general processors, whose leakage current takes a large portion of the active current (e.g. 30%), low power wireless chips have a limited percentage of leakage (e.g. 1%), and the overall current is small (e.g. 300mA) [12]. Therefore, the leakage current can be approximated as a DC constant, and we assume that its value stays the same in both active and idle mode. We can then simplify the general model to deal with the DC leakage current sources. Since the voltage response to DC current sources is still a DC constant, the voltage variation contributed by leakage is simply a DC bias. This simplifies the ILP formulation, as explained in the next section.

#### 3. ILP BASED ALGORITHM

We propose an Integer Linear Programming (ILP) based method to determine the clock gating pattern for the worst-case voltage violation area. The objective function is to maximize the violation area of the superimposed voltage variation waveforms. The decision variables are the clock gating signals for every cycle in each domain. Thus, this problem can be solved optimally by a commercial ILP solver such as CPLEX [10]. We describe the formulations for both the general model and the simplified model below.

#### 3.1 General Model

The parameters used in the proposed models are listed in Fig. 6. We sample the voltage waveform in each cycle at *m* time points, whose intervals are $d_j$ seconds $(1 \le j \le m)$. The following variables are used in the ILP:

- $x_i^A \in \{0,1\}$, $x_i^L \in \{0,1\}$, $1 \le i \le n$: binary variables to indicate the status of the clock gating signal for the *i*th cycle. If the clock is enabled, $x_i^A$ is "1", $x_i^L$ is "0", and the dissected waveform from $\tilde{V}_{ij}^A$ will be selected. If the clock is disabled, $x_i^A$ is "0", $x_i^L$ is "1", and the dissected waveform from $\tilde{V}_{ij}^L$ will be selected.
- $y_j \in \{0,1\}, 1 \le j \le m$: binary variables to indicate whether the *j*th voltage sample violates the allowed amount. These are intermediate variables used to compute the violation amount at the *j*th sampling point.
- $u_j \in [0,\infty), 1 \le j \le m$: continuous auxiliary variables to represent the total violated amount for the *j*th voltage sample.

The ILP formulation is then presented as follows:

Maximize:

$$\sum_{j=1}^{m} d_j u_j$$

Subject to:

$$y_{j} \cdot M \ge \sum_{i=1}^{n} \tilde{V}_{ij}^{A} x_{i}^{A} + \sum_{i=1}^{n} \tilde{V}_{ij}^{L} x_{i}^{L} - cutoff, \ 1 \le j \le m \tag{2}$$

$$(y_{j}-1) \cdot M \le \sum_{i=1}^{n} \tilde{V}_{ij}^{A} x_{i}^{A} + \sum_{i=1}^{n} \tilde{V}_{ij}^{L} x_{i}^{L} - cutoff, \ 1 \le j \le m \tag{3}$$

$$u_{j} \le \sum_{i=1}^{n} \tilde{V}_{ij}^{A} x_{i}^{A} + \sum_{i=1}^{n} \tilde{V}_{ij}^{L} x_{i}^{L} - cutoff + M(1 - y_{j}), \ 1 \le j \le m \tag{4}$$

$$u_j \le M \cdot y_j, \ 1 \le j \le m \tag{5}$$

$$x_i^A + x_i^L = 1, \ 1 \le i \le n \tag{6}$$

The objective is the total violation area, which is the sum of the areas over all sampling points. Constraints (2) and (3) define $y_j$: (2) forces $y_j$ to be 1 if $\sum_{i=1}^{n} \tilde{V}_{ij}^{A} x_i^{A} + \sum_{i=1}^{n} \tilde{V}_{ij}^{L} x_i^{L} > cutoff$, which means the cutoff is violated at this point and its area should be counted as violation; (3) forces $y_j$ to be 0 if $\sum_{i=1}^{n} \tilde{V}_{ij}^{A} x_i^{A} + \sum_{i=1}^{n} \tilde{V}_{ij}^{L} x_i^{L} < cutoff$. Constraints (4) and (5) bound $u_j$ using $y_j$: $u_j \le \sum_{i=1}^{n} \tilde{V}_{ij}^{A} x_i^{A} + \sum_{i=1}^{n} \tilde{V}_{ij}^{L} x_i^{L} - cutoff$ when $y_j = 1$ according to (4), and $u_j \le 0$ when $y_j = 0$ according to (5). Since the objective function is maximized, constraints (4) and (5) are equivalent to the following conditional assignment: $u_j = \sum_{i=1}^{n} \tilde{V}_{ij}^{A} x_i^{A} + \sum_{i=1}^{n} \tilde{V}_{ij}^{L} x_i^{L} - cutoff$ if $y_j = 1$, and $u_j = 0$ otherwise. Constraint (6) ensures that exactly one of the active waveform and the leakage waveform is selected for each cycle.

The above formulation presents the violation area maximization problem with only 2n+m binary variables. Thus it can be efficiently solved by CPLEX.
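The role of the big-M constraints can be checked numerically with a small sketch. For a fixed total variation $s$ at one sampling point, the hypothetical helper `best_u` below tries both values of $y_j$ and returns the largest $u_j$ permitted by constraints (2)-(5); it should equal $\max(s - cutoff, 0)$ provided $M$ exceeds $|s - cutoff|$. The numeric values are illustrative only.

```python
def best_u(s, cutoff, big_m):
    """Largest u_j allowed by constraints (2)-(5) for a fixed variation s."""
    best = None
    for y in (0, 1):
        if y * big_m < s - cutoff:        # constraint (2) violated
            continue
        if (y - 1) * big_m > s - cutoff:  # constraint (3) violated
            continue
        upper = min(s - cutoff + big_m * (1 - y),  # constraint (4)
                    big_m * y)                     # constraint (5)
        if upper < 0:                     # u_j >= 0 could not be satisfied
            continue
        best = upper if best is None else max(best, upper)
    return best


# Hypothetical values in mV: cutoff = 50, M = 1000.
results = {s: best_u(s, cutoff=50, big_m=1000) for s in (30, 50, 70, 120)}
```

When $s$ equals the cutoff exactly, both values of $y_j$ are feasible, but $u_j$ is 0 in either case, so the objective is unaffected.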

#### **3.2 Simplified Model**

In the simplified model, the DC leakage current exists in both active and idle mode, and it contributes a DC bias $V_{DC}$ to the voltage response obtained with the dynamic current.

The ILP formulation for simplified model is presented as follows:

Maximize:

$$\sum_{j=1}^{m} d_j u_j$$

Subject to:

$$y_j \cdot M \ge \sum_{i=1}^n \tilde{V}_{ij}^D x_i - cutoff + V_{DC}, \ 1 \le j \le m \tag{7}$$

$$(y_j - 1) \cdot M \le \sum_{i=1}^n \tilde{V}_{ij}^D x_i - cutoff + V_{DC}, \ 1 \le j \le m \tag{8}$$

$$u_{j} \le \sum_{i=1}^{n} \tilde{V}_{ij}^{D} x_{i} - cutoff + V_{DC} + M(1 - y_{j}), \ 1 \le j \le m \tag{9}$$

$$u_j \le M \cdot y_j, \ 1 \le j \le m \tag{10}$$

Because the leakage is simplified to a DC constant, we just add the DC bias $V_{DC}$ to the dynamic voltage response in constraints (7)-(9), instead of combining two voltage variation curves as in constraints (2)-(4).
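One way to see how the simplified constraints relate to the general ones is the following sketch of a derivation. It rests on two assumptions that are not stated explicitly above: the network is linear, so that $\tilde{V}_{ij}^{A} = \tilde{V}_{ij}^{D} + \tilde{V}_{ij}^{L}$ when the leakage is identical in both modes, and the superimposed leakage response equals the DC bias at every sampling point, i.e. $\sum_{i=1}^{n} \tilde{V}_{ij}^{L} = V_{DC}$. Substituting $x_i^L = 1 - x_i^A$ from constraint (6) and writing $x_i = x_i^A$:

$$\begin{aligned}
\sum_{i=1}^{n} \tilde{V}_{ij}^{A} x_i^{A} + \sum_{i=1}^{n} \tilde{V}_{ij}^{L} x_i^{L}
&= \sum_{i=1}^{n} \left(\tilde{V}_{ij}^{A} - \tilde{V}_{ij}^{L}\right) x_i^{A} + \sum_{i=1}^{n} \tilde{V}_{ij}^{L} \\
&= \sum_{i=1}^{n} \tilde{V}_{ij}^{D} x_i + V_{DC}
\end{aligned}$$

Under these assumptions, constraints (2)-(4) reduce term by term to constraints (7)-(9), and the $x_i^L$ variables together with constraint (6) are no longer needed.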

#### 4. EXPERIMENTAL RESULTS

We implement the ILP based method with ILOG CPLEX 9.1.10 [10]. We also implement an enumeration method for comparison, which exhaustively tries all possible clock gating patterns. The experiments are run on a 3.2GHz Pentium 4 machine with 1GB memory.

TABLE I

COMPUTATIONAL RESULTS WITH 1% LEAKAGE

| Test case | # Clock domains | T_enum. (s) | T_ILP (s) | A_ILP (mV·ns) |
|-----------|-----------------|-------------|-----------|---------------|
| 1         | 4               | 21          | <0.1      | 51.78         |
| 2         | 6               | N.A.        | <0.1      | 46.05         |
| 3         | 8               | N.A.        | <0.1      | 56.87         |
| 4         | 10              | N.A.        | <0.1      | 62.67         |

Figure 7. Voltage responses with each domain working respectively.

The test cases are simplified industrial power networks for low power wireless chips. Therefore, we apply the simplified model to estimate the worst case voltage violation area. These power networks are of mesh structures with on-chip R, C and inductive components from package. The dynamic current for the whole chip is about 360mA, and the percentage of the leakage current varies from 1% to 5%. The VDD is 1V, and the cutoff to determine violation is 5% of VDD which is 0.05V. The number of clock domains in the test cases varies from 4 to 10. A node at the center of a clock domain is selected as the observation point, whose voltage response is simulated. The number of cycles required for superimposition is 6.



Figure 8. Worst case voltage violation area.

We first demonstrate the proposed ILP based method on the four-domain example with 1% leakage in Table I. Fig. 7 shows the voltage responses with each domain working respectively. The worst-case clock gating pattern given by the proposed algorithm in this example is {110011, 110001, 110011, 110011}, with each group corresponding to one clock domain. The worst-case violation area under that pattern is displayed in Fig. 8. The dotted line is the worst-case voltage response, the VDD is 1V, and the cutoff to determine violation is 0.05V, which is represented by the dashed line. The area below this dashed line is the violation area, whose value is 51.78 mV·ns.

Then we compare the computational time between the enumeration method ("T\_enum.") and the proposed ILP based method ("T\_ILP"). The total leakage current is 1%. The number of decision variables is proportional to the number of domains, so the complexity of the enumeration method grows exponentially with the number of domains. Hence, it only works for cases with a small number of clock domains. For the four-domain case, the enumeration method takes 21 seconds, which is over 200 times slower than the ILP based method. The proposed ILP based method works efficiently for complicated cases with more domains, and provides an optimal solution. The simulation time is not included in the computational time of Table I.

TABLE II

VIOLATION AREA COMPARISON WITH DIFFERENT LEAKAGE PERCENTAGE FOR A FOUR-DOMAIN NETWORK

| % of Leakage | Violation area (mV·ns) | % Violation area increased |
|--------------|------------------------|----------------------------|
| No leakage   | 51.685                 | 0                          |
| 1%           | 51.778                 | 0.18%                      |
| 2%           | 51.873                 | 0.36%                      |
| 3%           | 51.971                 | 0.55%                      |
| 4%           | 52.069                 | 0.74%                      |
| 5%           | 52.171                 | 0.94%                      |
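The exponential growth of the enumeration search space can be made concrete with a short calculation. This sketch assumes one binary gating decision per domain per cycle and six cycles per domain, which matches the six-bit groups in the reported worst-case pattern. The helper name `pattern_count` is illustrative.

```python
CYCLES_PER_DOMAIN = 6  # cycles used for superposition in the test cases


def pattern_count(num_domains, cycles=CYCLES_PER_DOMAIN):
    """Number of gating patterns an exhaustive search must visit."""
    return 2 ** (num_domains * cycles)


# Domain counts of the four test cases in Table I.
counts = {p: pattern_count(p) for p in (4, 6, 8, 10)}
growth_4_to_6 = counts[6] // counts[4]
```

Under this assumption the four-domain case has $2^{24}$ (about 16.8 million) patterns, and adding two more domains multiplies the count by 4096. This is consistent with the enumeration results being unavailable beyond four domains.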

We also show the worst-case violation area with different percentages of leakage in Table II. The test case is the first one in Table I, which is a four-domain power network. If we do not consider leakage current, the violation area is $51.685 \text{ mV} \cdot \text{ns}$. When the percentage of leakage is increased from 1% to 5%, which are typical rates in low power wireless chips, the violation area keeps increasing, as shown in the second and third columns of Table II. For general processors, the leakage could take a larger percentage of the total current (e.g. 30%), and therefore the violation area considering leakage would be much larger than the area without considering it. This shows the importance of a worst-case violation area analysis model that takes the leakage current into consideration.
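The percentage increases in Table II can be reproduced from the reported areas with a short calculation. The variable names are illustrative; the numbers are taken directly from the table.

```python
baseline = 51.685  # violation area without leakage, in mV*ns

# Reported violation area (mV*ns) for each leakage percentage.
areas = {1: 51.778, 2: 51.873, 3: 51.971, 4: 52.069, 5: 52.171}

# Relative increase over the no-leakage baseline, in percent.
increase = {
    pct: round(100.0 * (area - baseline) / baseline, 2)
    for pct, area in areas.items()
}
```

The increase is close to linear in the leakage percentage over this range, at roughly 0.19 percentage points per 1% of leakage.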

#### 5. CONCLUSIONS

We propose an ILP based method to predict the worst-case clock gating pattern and the maximum violation area in a 3D power network. We take the leakage current into consideration, and provide two models: one for the general case, and a simplified one for the power network analysis of low power wireless chips. The proposed method is efficient and can be applied to complicated cases with multiple domains. We also show the contribution of leakage current to the violation area, and conclude that it is not negligible.

#### 6. Acknowledgments

The authors would like to acknowledge the support of NSF CCF-0811794 and California MICRO Program.

#### 7. REFERENCES

- [1] Z. Qi, H. Li, S. X.-D. Tan, et al., "Fast decap allocation algorithm for robust on-chip power delivery", in *Proc. ISQED 2005*, pp. 542-547.
- [2] J. Fan, I. Liao, S. X.-D. Tan, et al., "Localized on-chip power delivery network optimization via sequence of linear programming", in *Proc. ISQED 2006*, pp. 272-277.
- [3] G. Bai, S. Bobba, and I. N. Hajj, "Simulation and optimization of the power distribution network in VLSI circuits", in *Proc. IEEE/ACM Int. Conf. Computer Aided Des.*, Nov. 2000, pp. 481-486.
- [4] J. Shi, Y. Cai, S. X.-D. Tan, and X. Hong, "Efficient early stage resonance estimation techniques for C4 package", in *Proc. Asia South Pacific Design Automation Conf.*, Jan. 2006, pp. 826-831.
- [5] S. Lin, M. Nagata, K. Shimazake, K. Satoh, M. Sumita, H. Tsujikawa, and A.T. Yang, "Full-chip vectorless dynamic power integrity analysis and verification against 100uV/100ps-resolution measurement", in *Proc. IEEE Custom Integr. Circuits Conf.*, Oct. 2004, pp. 509-512.
- [6] H. Li, S. Bhunia, et al., "Deterministic clock gating for microprocessor power reduction", in *Proc. Int. Symp. High-Performance Computer Architecture 2003*, pp. 113-122.
- [7] H. Su, F. Liu, A. Devgan, E. Acar, S. Nassif, "Full chip leakage estimation considering power supply and temperature variations", in *Proc. ISLPED 2003*, pp. 78-83.
- [8] W. Zhang, L. Zhang, et al., "Fast power network analysis with multiple clock domains", in *Proc. IEEE International Conference on Computer Design (ICCD) 2007*, pp. 456-463.
- [9] W. Zhang, Y. Zhu, W. Yu, et al., "Finding the worst voltage violation in multi-domain clock gated power network", in *Proc. IEEE/ACM Design, Automation & Test in Europe (DATE) 2008*, pp. 537-540.
- [10] "ILOG CPLEX: High performance software for mathematical programming and optimization", http://www.ilog.com/products/cplex/
- [11] S. Borkar, "Low power design challenges for the decade", in *Proc. Asia South Pacific Design Automation Conf.*, 2001, pp. 293-296.
- [12] M. Bickerstaff, L. Davis, C. Thomas, D. Garrett, C. Nicol, "A 24Mb/s radix-4 LogMap turbo decoder for 3GPP-HSDPA mobile wireless", in *Proc. ISSCC 2003*, pp. 150-151.
- [13] G. Huang, M. Bakir, A. Naeemi, H. Chen, and J. D. Meindl, "Power delivery for 3d chip stacks: Physical modeling and design implication", in *Proc. IEEE EPEP 2007*, pp. 205-208.
- [14] D. J. Mountain, "Analyzing the value of using three-dimensional electronics for a high-performance computational system", *IEEE Transactions on Advanced Packaging*, Volume 31, Issue 1:107-117, Feb. 2008.
- [15] A. Shayan, X. Hu, et al., "3D Power Distribution Network Co-design for Nanoscale Stacked Silicon IC", in *Proc. EPEP 2008*, pp. 11-14.