# Finding the Worst Voltage Violation in Multi-Domain Clock Gated Power Network

Wanping Zhang<sup>1,2</sup>, Yi Zhu<sup>2</sup>, Wenjian Yu<sup>2,3</sup>, Ling Zhang<sup>2</sup>, Rui Shi<sup>2</sup>, He Peng<sup>2</sup>, Zhi Zhu<sup>1</sup>, Lew Chua-Eoan<sup>1</sup>, Rajeev Murgai<sup>4</sup>, Toshiyuki Shibuya<sup>5</sup>, Nuriyoki Ito<sup>6</sup>, Chung-Kuan Cheng<sup>2</sup>

<sup>1</sup>{wanpingz, zzhu, lewc}@qualcomm.com, Qualcomm Inc. 5775 Morehouse Dr., San Diego, CA, U.S.A

<sup>2</sup>{w7zhang, y2zhu, w1yu, lizhang, rshi, hepeng, ckcheng}@ucsd.edu, UC San Diego, La Jolla, CA, U.S.A

<sup>3</sup>yu-wj@tsinghua.edu.cn, Tsinghua University, Beijing 100084, China

<sup>4</sup>murgai@fla.fujitsu.com, Fujitsu Laboratories of America, Inc., Sunnyvale, CA, U.S.A.

<sup>5</sup>shibu@jp.fujitsu.com, Fujitsu Laboratories LTD, Kawasaki, Japan

<sup>6</sup>ito.nuriyoki@jp.fujitsu.com, Fujitsu Limited, Kawasaki, Japan

# Abstract

This paper proposes an efficient method to find the worst-case voltage violation caused by multi-domain clock gating in an on-chip power network. We first present the voltage response under an arbitrary multi-domain clock gating pattern, using a superposition technique. Then, an integer linear programming (ILP) formulation is proposed to identify the worst-case gating pattern and the maximum violation area. The ILP based method is significantly faster than a conventional method based on enumeration. The experimental results are also compared with the gating pattern that induces the peak voltage variation, which shows that the latter largely underestimates the overall violation area.

# **1. Introduction**

With aggressive technology scaling, the power/ground (P/G) network has become one of the major concerns in VLSI design. IR drops and simultaneous switching noise lead to supply voltage variations, which have an adverse impact on chip, package and board performance, such as longer signal propagation delay, false logic switching, and logic failure [3]. Therefore, it is becoming important to analyze and optimize the power network efficiently [2].

Clock gating with multiple clock domains is an efficient technique to reduce unnecessary power dissipation by disabling the clock to a circuit [4]. However, certain clock gating patterns may induce resonance with large voltage noise. Because a sub-circuit works for some clock cycles and is gated for others, analyzing the power network also becomes more complex. Identifying the clock gating pattern that leads to the worst voltage variation is therefore a challenging research topic, and its solution gives the noise margin that the system must accommodate.

There are two different definitions of the worst case. One is the peak noise: Zhang et al. proposed a method to identify the clock gating pattern that leads to the peak voltage variation [1]. The other is the maximum violation area [2, 3], which describes the accumulated effect of the noise. Decap budgeting algorithms have been proposed to minimize the violation area at node *i*, which is defined as:

$$A_{i} = \int_{0}^{T} \max(V_{\min} - v_{i}(t), 0) \, dt \tag{1}$$

where $V_{\min}$ is the minimum allowed voltage and $v_i(t)$ is the voltage at node *i*.
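As a rough numerical illustration of (1), the sketch below approximates the violation area from a sampled waveform with the trapezoidal rule. The function name `violation_area` and the sample values are assumptions made for illustration; they are not taken from the experiments.

```python
def violation_area(times, volts, v_min):
    """Approximate A = integral of max(v_min - v(t), 0) dt (trapezoidal rule)."""
    # Clip the deficit at zero so only samples below v_min contribute.
    deficit = [max(v_min - v, 0.0) for v in volts]
    area = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        area += 0.5 * (deficit[k - 1] + deficit[k]) * dt
    return area


# Hypothetical waveform: time in ns, voltage in V.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
volts = [1.00, 0.90, 0.80, 0.90, 1.00]
area = violation_area(times, volts, v_min=0.9)  # in V*ns
```

Only the dip below 0.9 V contributes, so the result is a small triangle of area close to 0.1 V∙ns.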

In this paper, we consider the worst case in terms of maximum violation area. We first calculate the voltage response with the current sources working for a single clock cycle. Then, the voltage response under a multi-domain clock gating pattern is obtained by a superposition technique. To find the worst-case gating pattern, an integer linear programming (ILP) formulation is proposed and solved with the commercial tool CPLEX. The proposed ILP-based method shows a large speedup over the enumeration method. Experimental results are also compared with the pattern leading to the peak voltage variation, which reveals that the latter largely underestimates the overall violation area.

# 2. Problem Statement

The model of a power network with multiple clock domains is shown in Fig. 1. The network is a mesh structure of R, C, and L elements. The behavior of transistors is modeled with current sources. The waveform of each current source is a piecewise linear (PWL) function and is assumed to be the same in every cycle. The multi-domain clock gating technique divides the circuit into multiple clock domains. For each domain there is a clock controlling signal, whose value indicates whether the transistors in the domain work or sleep in the current clock cycle. The sequence of clock controlling signals is called the clock gating pattern. The clock gating patterns of different domains are independent of each other. In Fig. 1, four clock domains and their clock gating patterns are shown.

The goal of this work is to determine the clock gating patterns that cause the maximum voltage violation area at given observation nodes of the network. Since the gating signal in each domain and each cycle can be either enabled or gated, a conventional way to search for the pattern with maximal violation area is enumeration. After simulating the network with all possible current source distributions corresponding to the gating patterns, the one that induces the maximal violation area can be obtained. However, this method is computationally expensive, because we need to examine $2^n$ patterns, where *n* is the number of involved clock cycles of all the domains.
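The enumeration baseline can be sketched as follows. This is a minimal illustration, not the C implementation used in the experiments: it assumes that the voltage drop contributed by each gating bit at each sample point is already available in a hypothetical table `drop[i][j]`, that the sample intervals are given in `d`, and that a drop above a hypothetical threshold `cutoff` counts as a violation. All numbers are made up.

```python
from itertools import product


def worst_pattern_by_enumeration(drop, d, cutoff):
    """Try all 2^n gating patterns and return the one with the largest area."""
    n, m = len(drop), len(d)
    best_x, best_area = None, -1.0
    for x in product((0, 1), repeat=n):  # 2^n candidate patterns
        area = 0.0
        for j in range(m):
            total = sum(drop[i][j] * x[i] for i in range(n))
            area += d[j] * max(total - cutoff, 0.0)
        if area > best_area:
            best_x, best_area = x, area
    return best_x, best_area


# Hypothetical data: 3 gating bits, 2 sample points (drops in V, intervals in ns).
drop = [[0.08, 0.03], [0.06, 0.09], [-0.05, -0.02]]
d = [1.0, 1.0]
best_x, best_area = worst_pattern_by_enumeration(drop, d, cutoff=0.1)
```

In this toy instance the third bit contributes a negative drop, so the worst pattern leaves it gated. The loop body runs $2^n$ times, which is what makes the approach impractical as *n* grows.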



Fig. 1 Power network with multi-domain clock [1].

# 3. Find the Worst Voltage Violation

If all current sources in one clock domain only work in one cycle, the voltage response will fluctuate for several cycles before reaching steady state, due to the resonance in circuit. It can then be utilized to obtain the voltage response for multiple cycles with a clock gating pattern [1]. This is briefly introduced in Section 3.1. Then, the ILP formulation to identify the clock gating pattern causing the maximum violation area is introduced.

## 3.1. Voltage Response for the Situation with Multi-Domain Clock Gating

If the waveform of current source *i* within the first cycle is denoted by  $f_i(t)$ , its waveform within *k* cycles considering clock gating can be expressed as:

$$g_i(t) = \sum_{l=0}^{k-1} b_l f_i(t-lT), \quad i = 1, \cdots, q \tag{2}$$

where the sequence $\{b_l\}$ represents the clock gating pattern, and *q* is the total number of current sources. If the clock domain is enabled in the *l*th cycle, $b_l = 1$; otherwise $b_l = 0$.

For the case that the current sources are the only input of circuit and they only work for the first cycle, we use  $y_0(t)$  to denote the voltage response at a given node. Then, the voltage waveform corresponding to current sources working for *k* cycles of a gating pattern becomes [1]:

$$y(t) = \sum_{l=0}^{k-1} b_l \cdot y_0(t - lT). \tag{3}$$

This is because of the linearity of the circuit model.
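Equation (3) can be sketched on a sampled waveform as follows. The helper `superpose`, the sampling rate, and the values of `y0` are assumptions chosen for illustration only.

```python
def superpose(y0, b, samples_per_cycle):
    """Return samples of y(t) = sum_l b[l] * y0(t - l*T) on a uniform grid."""
    # y0 is taken to be zero outside the samples provided.
    length = (len(b) - 1) * samples_per_cycle + len(y0)
    y = [0.0] * length
    for l, bit in enumerate(b):
        if bit:  # gated cycles (bit == 0) contribute nothing
            shift = l * samples_per_cycle
            for k, value in enumerate(y0):
                y[shift + k] += value
    return y


# Hypothetical single-cycle response: 2 samples per cycle, settled after 2 cycles.
y0 = [-0.04, 0.02, -0.01, 0.0]
b = [1, 1, 0]  # enabled in cycles 0 and 1, gated in cycle 2
y = superpose(y0, b, samples_per_cycle=2)
```

Each enabled cycle adds one shifted copy of `y0`, so the cost grows with the number of enabled cycles rather than requiring a new circuit simulation per pattern.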

Fig. 2 shows an example of $y_0(t)$. With a clock cycle of 5 ns, the waveform takes 6 cycles to reach the steady state (i.e., 0). This property greatly simplifies the calculation of y(t) and enables our ILP based method to predict the worst-case clock gating pattern.



Fig. 2 Voltage response when circuit works for only one cycle.

If  $y_0(t)$  reaches its steady state after *p* cycles, the *l* in (3) needs to satisfy 0 < t - lT < pT to contribute to y(t) with non-zero value of  $y_0(t - lT)$ . That is,

$$\frac{t}{T} - p < l < \frac{t}{T}. \tag{4}$$

So, we need to check at most *p* clock gating values (values of $b_l$) to calculate y(t). These *p* cycles are the cycle the time point belongs to and the preceding *p*-1 ones.
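As a worked instance of (4), take the values of the Fig. 2 example, $T = 5$ ns and $p = 6$, and an arbitrarily chosen time point $t = 32$ ns (the time point is an assumption for illustration):

$$\frac{32}{5} - 6 < l < \frac{32}{5} \;\Longrightarrow\; 0.4 < l < 6.4 \;\Longrightarrow\; l \in \{1, 2, 3, 4, 5, 6\}.$$

Only the six gating values $b_1, \ldots, b_6$ can contribute to $y(32\,\text{ns})$: cycle 6, which contains the time point, and the five cycles before it. The response started in cycle 0 has already settled to zero.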



Fig. 3 Superposition of voltage response

The superposition idea for calculating y(t) is illustrated with the example shown in Fig. 2. For this example, we depict the waveforms for the 6 cycles separately and arrange them from top to bottom in Fig. 3. With the help of these 6 waveforms, the voltage response in a given clock cycle can be easily obtained. First, we check the clock gating values for the 6 cycles of concern, which correspond to the six waveforms in Fig. 3, respectively. For each cycle with the clock enabled, we keep the corresponding waveform; the waveforms of gated cycles are discarded. Finally, adding all kept waveforms together gives y(t) in the specified clock cycle.

The waveform of $y_0(t)$ can be simulated with the scheme proposed in [1]. Through the vector fitting technique and logarithmic-scaled frequency sampling, power networks with as many as $10^6$ nodes can be simulated with high efficiency.

The above derivation only considers the current sources. The resulting waveform y(t) must be added to the DC bias voltage, which accounts for the effect of the supply voltage. If there are multiple clock domains, we can similarly superimpose the waveforms induced by the different domains to get the response at a node.
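A minimal sketch of this multi-domain case follows, assuming each domain has its own single-cycle response at the observed node and its own gating pattern. The names `node_voltage` and `v_dc` and all numbers are hypothetical.

```python
def node_voltage(v_dc, responses, patterns, samples_per_cycle):
    """Add the DC bias and the gated, shifted responses of every domain."""
    n_cycles = max(len(b) for b in patterns)
    length = (n_cycles - 1) * samples_per_cycle + max(len(r) for r in responses)
    v = [v_dc] * length  # start from the DC bias voltage
    for y0, b in zip(responses, patterns):  # one (response, pattern) per domain
        for l, bit in enumerate(b):
            if bit:
                for k, dv in enumerate(y0):
                    v[l * samples_per_cycle + k] += dv
    return v


# Two hypothetical domains, one sample per cycle, DC bias of 1.0 V.
responses = [[-0.05, 0.02], [-0.03, 0.01]]
patterns = [[1, 0], [1, 1]]
v = node_voltage(1.0, responses, patterns, samples_per_cycle=1)
```

Because the circuit model is linear, the contributions of the domains simply add, so no joint simulation of the domains is needed in this sketch.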

## 3.2. ILP Formulation and Solution

We formulate the violation area maximization problem as an integer linear program, where the objective function and constraints are linear expressions in integer decision variables. Thus it can be solved optimally by a commercial ILP solver, such as ILOG CPLEX [5].

CPLEX employs the branch-and-bound technique to search for the optimal solution of the given problem. An enumeration tree is generated where each node represents a value of an integer decision variable. At each intermediate node in the search tree, an upper bound (for a maximization problem) is derived by relaxing the undetermined integer variables and solving the corresponding linear program. The derived bounds, together with sophisticated cutting-plane and heuristic algorithms, are used to prune the search space efficiently. Thus, the branch-and-bound algorithm runs much faster than pure enumeration, though the complexity remains exponential.

Consequently, this approach is suitable for problems that require optimal solutions and whose real instances are of moderate scale. ILP is suitable for our problem, since the accuracy of the worst-case identification is important and the number of clock domains is usually limited.

The ILP formulation has a great impact on the performance of the algorithm. Reducing the number of integer variables is always preferred. Binary variables are better than general integer variables, as they help reduce the search space. In the formulation we derived, the number of variables is linear in the problem size and all integer decision variables are binary.

Our ILP formulation contains the parameters (constants as inputs) shown in Fig. 4. We sample the voltage waveform in each cycle with *m* time points, whose intervals are $d_j$ seconds $(1 \le j \le m)$.

- *n*: the number of cycles needed in superposition of all the domains;
- *m*: the number of voltage response samples in each cycle;
- $V_{dd}$: nominal voltage;
- $V_{\min}$: minimal voltage requirement. A voltage below this value is considered a violation;
- $V_{ij}$: voltage response in the *i*th cycle at the *j*th sampling point in that cycle, where $1 \le i \le n$ and $1 \le j \le m$;
- $\tilde{V}_{ij}$: voltage drop from $V_{dd}$, i.e. $V_{dd} - V_{ij}$;
- *cutoff*: the largest allowed voltage drop. If the voltage drop is larger than *cutoff*, a voltage violation occurs;
- $d_j$: time interval between adjacent sample points;
- *M*: a sufficiently large constant.

Fig. 4 Parameters description.

The following variables are used in the ILP:

- $x_i \in \{0,1\}, 1 \le i \le n$: binary variables to indicate the status of the clock gating signal for the *i*th cycle. These are the solution of clock gating.
- $y_j \in \{0,1\}, 1 \le j \le m$: binary variables to indicate whether the *j*th voltage sample violates the allowed amount. These are intermediate variables used to compute the objective function of violation area.
- $u_j \in [0,\infty), 1 \le j \le m$: continuous auxiliary variables to represent the total violated amount for the *j*th voltage sample. Note that they are continuous variables, and therefore need not be searched with the branch-and-bound algorithm.

The ILP formulation is then presented as follows:

Maximize:

$$\sum_{j=1}^{m} d_j u_j$$

Subject to:

$$y_j \cdot M \ge \sum_{i=1}^{n} \tilde{V}_{ij} x_i - cutoff, \quad 1 \le j \le m \tag{5}$$

$$(y_j - 1) \cdot M \le \sum_{i=1}^{n} \tilde{V}_{ij} x_i - cutoff, \quad 1 \le j \le m \tag{6}$$

$$u_j \le \sum_{i=1}^{n} \tilde{V}_{ij} x_i - cutoff + M (1 - y_j), \quad 1 \le j \le m \tag{7}$$

$$u_j \le M \cdot y_j, \quad 1 \le j \le m \tag{8}$$

The objective is the total violation area, which is the sum of the area at each sample point. Constraints (5) and (6) describe the property of $y_j$: (5) forces $y_j$ to be 1 if $\sum_i \tilde{V}_{ij} x_i > cutoff$, i.e. the cutoff is violated at this point and the area should be counted; (6) forces $y_j$ to be 0 if $\sum_i \tilde{V}_{ij} x_i < cutoff$. Constraints (7) and (8) restrict $u_j$ by using $y_j$: $u_j \le \sum_i \tilde{V}_{ij} x_i - cutoff$ when $y_j = 1$ according to (7), and $u_j \le 0$ when $y_j = 0$ according to (8). Since the objective function is maximized, constraints (7) and (8) are actually equivalent to the following conditional assignment: $u_j = \sum_i \tilde{V}_{ij} x_i - cutoff$ if $y_j = 1$, and $u_j = 0$ otherwise.

The above formulation expresses the violation area maximization problem with only n+m binary variables. Thus it can be efficiently solved by CPLEX.
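The role of the big-*M* constraints can be checked numerically for a fixed gating pattern. The sketch below does not call an ILP solver; for a given $x$ it tries both values of $y_j$, keeps those that satisfy (5) and (6), and takes the largest $u_j$ allowed by (7) and (8). The function name, the table `drop` (standing in for $\tilde{V}_{ij}$), and the numbers are assumptions for illustration.

```python
def best_u_for_pattern(drop, x, cutoff, big_m):
    """For fixed x, return the y_j and maximal u_j permitted by (5)-(8)."""
    n, m = len(drop), len(drop[0])
    ys, us = [], []
    for j in range(m):
        excess = sum(drop[i][j] * x[i] for i in range(n)) - cutoff
        best = None
        for y in (0, 1):
            ok5 = y * big_m >= excess          # constraint (5)
            ok6 = (y - 1) * big_m <= excess    # constraint (6)
            if not (ok5 and ok6):
                continue
            # Largest u_j allowed by (7) and (8) for this y_j.
            u = min(excess + big_m * (1 - y), big_m * y)
            if u >= 0 and (best is None or u > best[1]):  # u_j >= 0 by definition
                best = (y, u)
        ys.append(best[0])
        us.append(best[1])
    return ys, us


# Hypothetical data: 2 gating bits, 2 sample points, drops in V.
drop = [[0.08, 0.03], [0.06, 0.02]]
ys, us = best_u_for_pattern(drop, x=(1, 1), cutoff=0.1, big_m=10.0)
```

At the first sample the total drop of 0.14 V exceeds the cutoff, so $y_1 = 1$ and $u_1$ equals the excess of about 0.04 V; at the second sample the drop stays below the cutoff, so $y_2 = 0$ and $u_2 = 0$. This matches $u_j = \max(\sum_i \tilde{V}_{ij} x_i - cutoff, 0)$, provided *M* is larger than any possible excess.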

| Test case | # Clock domains | T_enum. (s) | T_ILP (s) | A_peak (V∙ns) | A_ILP (V∙ns) | Underestimation (%) |
|-----------|-------------------|----------------|--------------|------------------|-----------------|------------------------------|
| 1         | 4                 | 21             | 0.19         | 0.508            | 0.605           | 19.1                         |
| 2         | 4                 | 21             | 0.08         | 1.145            | 1.284           | 12.1                         |
| 3         | 6                 | N.A.           | 0.14         | 0.551            | 0.618           | 12.2                         |
| 4         | 6                 | N.A.           | 0.13         | 1.177            | 1.305           | 10.9                         |
| 5         | 8                 | N.A.           | 0.44         | 0.514            | 0.605           | 17.7                         |
| 6         | 8                 | N.A.           | 0.14         | 1.145            | 1.283           | 12.1                         |
| 7         | 10                | N.A.           | 0.5          | 0.614            | 0.674           | 9.8                          |
| 8         | 10                | N.A.           | 0.2          | 1.39             | 1.54            | 10.8                         |

Table 1. Computational results

# 4. Experimental Results

We implement the ILP based method with ILOG CPLEX 9.1.10. The enumeration method and the method to identify the peak voltage variation are implemented in C. The experiments are run on a 3.2 GHz Pentium 4 machine with 1 GB of memory.

The test cases are simplified industrial power networks with a mesh structure, consisting of on-chip R and C components and inductive components from a package, with about 5000 nodes. The number of clock domains in the test cases varies from 4 to 10. A node at the center of a clock domain is selected as the observation point, whose voltage response is simulated. In the experiment, the waveform $y_0(t)$ for each clock domain is similar to that shown in Fig. 2, so the number of cycles required for superposition is 6. The computational times of the ILP based method and the enumeration method for identifying the worst-case clock gating pattern are listed in Table 1.

We first compare the computational time of the enumeration method ("T\_enum.") and the proposed ILP based method ("T\_ILP"). The enumeration method only works for cases with a small number of clock domains. For the first two cases, the enumeration method takes 21 seconds, which is roughly 110 and 260 times slower than the ILP based method (0.19 s and 0.08 s, respectively). Our ILP based method works efficiently for complicated cases with more domains, and provides an optimal solution. The computational time in Table 1 does not include the simulation time. The proposed method is applied to the simulation results, which can be obtained by an efficient algorithm even for circuits with millions of nodes [1].

The results of finding the maximum peak noise are then compared with those of the maximum violation area. The violation areas given by both methods are shown in columns 5 and 6 of Table 1. The percentage of underestimation is shown in the last column, with an average of 13.09%. Fig. 5 shows the results of voltage violation for test case 1 using both algorithms. $V_{dd}$ is 1 V and $V_{\min}$ is 0.9 V. The red dotted curve and the blue solid curve are the voltage responses obtained with the algorithm that maximizes peak noise ("MaxPeak") and with the proposed algorithm, respectively. The violation area obtained by the proposed method is depicted as the shaded area, which is remarkably larger than that of MaxPeak. The worst violation area clock gating pattern given by the proposed algorithm in this example is {110110, 110110, 101101, 101101}, with each group for a clock domain.
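The figures quoted from Table 1 can be rechecked with a few lines of arithmetic. The sketch below assumes that the underestimation is measured relative to the peak-based area, i.e. $(A_{ILP} - A_{peak}) / A_{peak}$; this definition is inferred from the table values rather than stated in the text.

```python
# (test case, T_enum in s or None, T_ILP in s, A_peak, A_ILP), copied from Table 1.
rows = [
    (1, 21.0, 0.19, 0.508, 0.605),
    (2, 21.0, 0.08, 1.145, 1.284),
    (3, None, 0.14, 0.551, 0.618),
    (4, None, 0.13, 1.177, 1.305),
    (5, None, 0.44, 0.514, 0.605),
    (6, None, 0.14, 1.145, 1.283),
    (7, None, 0.50, 0.614, 0.674),
    (8, None, 0.20, 1.390, 1.540),
]

# Underestimation in percent, rounded to one decimal as in the table.
under = [round(100.0 * (a_ilp - a_peak) / a_peak, 1)
         for (_, _, _, a_peak, a_ilp) in rows]
average = sum(under) / len(under)

# Speedup of the ILP method where the enumeration time is available.
speedups = {case: t_enum / t_ilp
            for (case, t_enum, t_ilp, _, _) in rows if t_enum is not None}
```

Under this assumed definition the per-case values agree with the last column of Table 1 and their mean is close to the reported 13.09%. The speedups for test cases 1 and 2 come out at roughly 110 and 260.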

# 5. Conclusion and Future Work

In this paper, we propose an efficient method, based on an ILP formulation, to identify the worst-case clock gating pattern and the maximum violation area. In the test cases where enumeration is feasible, the proposed method is more than 100 times faster than the conventional enumeration-based method, and up to about 260 times faster. Future work includes using the method to help designers estimate the worst-case noise and to optimize decap allocation.



Fig. 5 Comparison of voltage violation between the MaxPeak and MaxArea algorithms.

# Acknowledgment

The authors would like to acknowledge the support of NSF CCF-0618163 and California MICRO Program.

# References

[1] W. Zhang, L. Zhang, et al., "Fast power network analysis with multiple clock domains", *ICCD 2007*.

[2] J. Fan, I. Liao, S. X.-D. Tan, et al., "Localized on-chip power delivery network optimization via sequence of linear programming", *ISQED 2006*, pp. 272-277.

[3] Z. Qi, H. Li, S. X.-D. Tan, et al., "Fast decap allocation algorithm for robust on-chip power delivery", *ISQED 2005*, pp. 542-547.

[4] H. Li, S. Bhunia, et al., "Deterministic clock gating for microprocessor power reduction", *Int. Symp. High-Performance Computer Architecture 2003*, pp. 113-122.

[5] http://www.ilog.com/products/cplex/