# Analysis and Optimization of Low-Power Passive Equalizers for CPU–Memory Links

Ling Zhang, Wenjian Yu, Senior Member, IEEE, Yulei Zhang, Student Member, IEEE, Renshen Wang, Alina Deutsch, Fellow, IEEE, George A. Katopis, Fellow, IEEE, Daniel M. Dreps, James Buckwalter, Member, IEEE, Ernest S. Kuh, Life Fellow, IEEE, and Chung-Kuan Cheng, Fellow, IEEE

Abstract—Several types of low-power passive equalizers are investigated and optimized in this paper. The equalizer topologies include T-junction, parallel R-C, and series R-L structures. These structures can be inserted at either the driver or the receiver side, at the chip or package level, to improve the channel bandwidth of central processing unit (CPU)-memory links. Using the eye area as the objective function to be maximized, we optimize these equalizers for the CPU-memory interconnection of an IBM POWER6 system with and without practical constraints on the RLC parameter values. An efficient optimization flow combined with an algorithm predicting the worst case eye diagram is proposed and employed to optimize 42 equalizer schemes. Simulation results show that, without any equalizer, the data eye is closed at a bit rate of 6.4 Gb/s, while the equalized schemes can work at a bit rate of 8 Gb/s. Significant improvements in eye height and jitter are observed with little power overhead. Simulation results also show the sensitivity of the equalization schemes to the RLC values and the effect of coupling noise.

*Index Terms*—Central processing unit-memory link, eye diagram, low power, optimization, passive equalization.

#### I. INTRODUCTION

THE power and performance of packaging-level interconnects have become crucial for optimized system performance. While multicore architectures increase on-chip computing capability, inter-chip communication bandwidth must expand to accommodate this processing demand. Meanwhile, minimizing signaling power is becoming an ever greater challenge, since many conventional approaches that improve performance also increase system power consumption. Therefore, a low-power signaling scheme is necessary.

Manuscript received December 7, 2008; revised November 5, 2010; accepted March 1, 2011. Date of current version September 21, 2011. This work was supported in part by the National Science Foundation (NSF) under Program NSF CCF-1017864. The work of W. Yu was supported in part by the NSF of China under Grant 61076034. Recommended for publication by Associate Editor M. Cases upon evaluation of reviewers' comments.

L. Zhang was with the Department of Computer Science and Engineering, University of California, San Diego, CA 92037 USA. She is now with Broadcom Corporation, San Diego, CA 92127 USA (e-mail: flizhang@cs.ucsd.edu).

W. Yu is with the Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China (e-mail: yu-wj@tsinghua.edu.cn).

Y. Zhang and J. Buckwalter are with the Department of Electrical and Computer Engineering, University of California, San Diego, CA 92037 USA (e-mail: y1zhang@ece.ucsd.edu; buckwalter@ece.ucsd.edu).

R. Wang and C.-K. Cheng are with the Department of Computer Science and Engineering, University of California, San Diego, CA 92037 USA (e-mail: rewang@cs.ucsd.edu; ckcheng@cs.ucsd.edu).

A. Deutsch is with the IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 USA (e-mail: deutsch@us.ibm.com).

G. A. Katopis is with the IBM System Group, Poughkeepsie, NY 12601 USA (e-mail: katopis@us.ibm.com).

D. M. Dreps is with the IBM System and Technology Group, Austin, TX 78758 USA (e-mail: drepsdm@us.ibm.com).

E. S. Kuh is with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720 USA (e-mail: kuh@eecs.berkeley.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCPMT.2011.2157149

An important approach to combating inter-symbol interference (ISI) is equalization. The concept of equalization was introduced in the 1920s in [1] and [2]. In [3], a constant-*R* ladder network (shown in Fig. 1) was described, which behaves as an equalizer. The ladder satisfies the condition $Z_1Z_2 = R^2$; when it is terminated with resistance *R*, its input impedance is also *R*, so multiple ladders can be cascaded. In practice, $Z_1$ and $Z_2$ are implemented with RLC components. Because of the low-pass characteristic of typical transmission lines, high-frequency components of the input signal suffer more attenuation than low-frequency components, which reduces the bandwidth and causes ISI. Passive equalizers generally act as high-pass filters and therefore compensate for this frequency-dependent attenuation. This alleviates ISI by enhancing the bandwidth and, in turn, improves the overall capacity.

In 2005, an adaptive passive equalizer based on an RLC T-junction network was introduced and shown to have better power efficiency than active equalizers [4]. Shin *et al.* from Intel proposed three passive equalizers for the driver side in [5]; the equalization schemes include T-junction and parallel R-C structures. They demonstrated that a 90-mV eye opening at 10 Gb/s is feasible for a 19-in-long differential pair with a 1.2-V supply voltage. Guo *et al.* analyzed equalization schemes using an inductor and a high-impedance line at the receiver side in [6]. They optimized and implemented the schemes for an ideal 38-in-long printed circuit board trace, where the eye opening ranged from 170 to 190 mV with a 0.8-V supply voltage at a 5-Gb/s data rate.

In this paper, we investigate several simple and effective passive equalizer components, namely, T-junction, parallel R-C, and serial R-L structures. These components can work at both the driver and receiver sides for central processing unit (CPU)–memory links. Their combinations suggest various equalization schemes. For the CPU–memory interconnection of an IBM POWER6 system, the passive equalization schemes were optimized to obtain the maximum eye area. Simulation results show that, with low power consumption, the equalized scheme can greatly improve the eye quality when higher data rate and crosstalk are considered.



Fig. 1. Constant-R ladder: input impedance is R when  $Z_1Z_2 = R^2$ .



Fig. 2. Structure of the CPU-memory link.

The main contributions of this paper are as follows. 1) The equalization effects of the T-junction, parallel R-C, and serial R-L components at the driver and receiver sides are analyzed, and several claims are derived to guide equalizer design. 2) An efficient optimization flow combined with an algorithm predicting the worst case eye diagram is proposed and applied to design the passive equalizers for the CPU–memory link of an IBM POWER6 system. 3) Various passive equalizer schemes are analyzed and compared, and simulation results show significant performance improvement with little power overhead. 4) With size limits considered, the optimized passive equalizers are shown to be insensitive to variations in the RLC parameters and robust to crosstalk.

#### II. CPU-MEMORY LINKS IN IBM POWER6 SYSTEM

We simulate the passive equalizer schemes on the CPU–memory link of an IBM POWER6 system. IBM introduced POWER6 microprocessor-based systems in 2007. The dual-core microprocessor is fabricated in a 65-nm silicon-on-insulator process and contains over 700 M transistors. It can operate at frequencies above 5 GHz for high-performance applications and consumes less than 100 W for low-power applications [7]. Because of these two modes of operation, both speed and power are important design considerations for the POWER6 I/O circuitry and a challenge for the corresponding interconnection design.

According to [8], more than 800 wires come off the processor chip, as dictated by system performance and scaling requirements. The total I/O bandwidth is around 300 Gb/s. The links between CPU and memory have bit rates of up to 3.2 Gb/s/wire for single-ended signaling and 6.4 Gb/s/wire for differential pairs.

Each POWER6 chip includes two integrated memory controllers [9]. A memory controller supports up to four parallel channels, each of which can be connected through an off-chip interface to 1–4 buffer chips daisy-chained together. A channel supports a 2-byte read data path, a 1-byte write data path, and a command path that operates four times faster than the DRAM frequency, which is up to 800 MHz. For some system configurations, buffer chips are mounted on the system board through industry-standard dual inline memory module (DIMM) cards. We use the off-chip CPU–memory links as the test case for the equalizer schemes. Simulation results show that the signal quality is improved with little power overhead.

Fig. 3. Equalization components (a) R-L, (b) R-C, and (c) T-junction.

The channel is a 20-in-long differential pair with 50-Ω characteristic impedance, and we test it at a data rate of 6.4 Gb/s. The representative critical path of the channel, from the chip carrier through the card and board to the memory module, is modeled and analyzed. The model takes all the fan-out, connector, and via-array discontinuities into account. Fig. 2 shows a schematic of the CPU–memory link, where the possible equalizers are drawn as dashed blocks. Because of manufacturing limits at the board level, the reflections at the package-to-card-trace and DIMM-trace-to-memory-module transitions are more pronounced. The driver-side equalizer can be placed either on-chip or at the package level, while the receiver-side equalizer is placed near either the port RXPKG or the output. We observe waveforms at the input and output ports and at the two internal ports TXPKG and RXPKG, as shown in Fig. 2.

#### III. PASSIVE EQUALIZATION COMPONENTS AND SCHEMES

Three basic equalizer components are considered, i.e., R-L, R-C, and T-junction, as shown in Fig. 3. To preserve the constant  $Z_0$  property, the RLCG components in T-junction satisfy [10]

$$\frac{R}{G} = Z_0^2, \frac{L}{C} = Z_0^2.$$
(1)
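As a numerical sanity check of the constant-$Z_0$ property in (1), the sketch below solves the nodal equations of a T-junction terminated in $Z_0$. It assumes (our reading, not a detail stated in the text) that Fig. 3(c) is a bridged-T: two series arms equal to $Z_0$, a bridging arm made of a parallel conductance $G$ and capacitance $C$, and a shunt arm $Z_2 = R + sL$ from the midpoint to ground. With $G$ and $C$ chosen from (1), the input impedance should equal $Z_0$ and the voltage transfer should equal $Z_2/(Z_0 + Z_2)$, the $s_{21}$ given in (16). The function names and element values are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve the complex system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def bridged_t(freq_hz, z0, r_shunt, l_shunt):
    """Return (Z_in, V_out/V_in) of the ASSUMED bridged-T topology terminated in z0.

    The bridging-arm G and C follow eq. (1): R/G = Z0^2 and L/C = Z0^2.
    """
    s = 2j * math.pi * freq_hz
    g_bridge = r_shunt / z0**2
    c_bridge = l_shunt / z0**2
    y_series = 1 / z0                      # each of the two series arms is Z0 (assumed)
    y_bridge = g_bridge + s * c_bridge     # parallel G-C from input to output
    y_shunt = 1 / (r_shunt + s * l_shunt)  # Z2 = R + sL from midpoint to ground
    y_load = 1 / z0                        # matched termination
    # Nodal admittance matrix, node order: input, midpoint, output.
    y = [
        [y_series + y_bridge, -y_series, -y_bridge],
        [-y_series, 2 * y_series + y_shunt, -y_series],
        [-y_bridge, -y_series, y_series + y_bridge + y_load],
    ]
    # Inject 1 A into the input node, so the input-node voltage equals Z_in.
    v_in, _, v_out = solve_linear(y, [1 + 0j, 0j, 0j])
    return v_in, v_out / v_in


if __name__ == "__main__":
    Z0, R, L = 50.0, 20.0, 5e-9  # hypothetical element values
    for f in (0.0, 1e9, 5e9):
        z_in, h = bridged_t(f, Z0, R, L)
        z2 = R + 2j * math.pi * f * L
        print(f"{f/1e9:4.1f} GHz: Z_in = {z_in:.6f}, |H| = {abs(h):.4f}, "
              f"|Z2/(Z0+Z2)| = {abs(z2 / (Z0 + z2)):.4f}")
```

Because $Z_1 Z_2 = Z_0^2$ holds at every frequency under (1), the printed $Z_{in}$ should stay at $Z_0$ while $|H|$ rises with frequency, which is the high-pass, reflection-free behavior the text attributes to the T-junction.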

The ladder structure is not considered in this paper because the ladder is equivalent to the T-junction in terms of transfer function at both the driver and receiver sides, but consumes more power than the T-junction when used at the driver side. This can be verified by comparing their input impedances. For the ladder used at the driver, we assume that the source resistance is $Z_0$ and $R = Z_0$. A parallel R-C ($R_d$, $C_d$) is used to implement $Z_1$, and $Z_2$ is derived from $Z_2 = R^2/Z_1$. After some algebra, its input impedance at dc can be written as

$$R_{in}^{DC} = Z_0 + \frac{Z_0}{1+r}, r = \frac{2R_d^2}{(2R_d + Z_0)Z_0}.$$
 (2)

TABLE I LABELS AND USAGES OF TOPOLOGIES

| Label   | Topology                      | Driver resistance | Load resistance |
|---------|-------------------------------|-------------------|-----------------|
| M       | Match (no equalizer)          | $Z_0$             | $Z_0$           |
| S       | R-L (on-chip)                 | N.A.              | Infinity        |
| P       | R-C (on-chip)                 | 10 Ω              | $R_L^{1}$       |
| $T_m^c$ | Matched on-chip T-junction    | $Z_0$             | $Z_0$           |
| $T_m^p$ | Matched off-chip T-junction   | $Z_0$             | $Z_0$           |
| $T_u^c$ | Unmatched on-chip T-junction  | 10 Ω              | $R_L$           |
| $T_u^p$ | Unmatched off-chip T-junction | 10 Ω              | $R_L$           |

$^{1}R_{L}$ is a variable determined by the optimization flow.

TABLE II GROUPS OF SCHEMES ACCORDING TO THE MATCHING CONDITIONS

| Group | Topologies at driver side    | Topologies at receiver side     |
|-------|------------------------------|---------------------------------|
| G1    | Matched: $M, T_m^c, T_m^p$   | Matched: $M, T_m^c, T_m^p$      |
| G2    | Unmatched: $P, T_u^p, T_u^c$ | Matched: $M, T_m^c, T_m^p$      |
| G3    | Matched: $M, T_m^c, T_m^p$   | Unmatched: $P, T_u^p, T_u^c, S$ |
| G4    | Unmatched: $P, T_u^p, T_u^c$ | Unmatched: $P, T_u^p, T_u^c, S$ |

Since $r > 0$, the input resistance in (2) is less than $2Z_0$, which is the corresponding value for the T-junction (the $Z_0$ source resistance plus its input resistance $Z_0$). Similarly, the ac input impedance of the ladder can be derived, and its magnitude is also less than $2Z_0$. Therefore, the ladder always draws more power than the T-junction.
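A short numerical illustration of (2): the sketch evaluates the ladder's dc input resistance (including the $Z_0$ source resistance) for a few hypothetical values of $R_d$ and compares the dc power drawn from an ideal source with that of a matched T-junction, whose corresponding total is $2Z_0$. The 1.1-V amplitude is borrowed from Section VI purely for illustration; the power ratio does not depend on it.

```python
def ladder_dc_input_resistance(r_d, z0):
    """Eq. (2): dc input resistance of the driver-side ladder, including the Z0 source."""
    r = 2 * r_d**2 / ((2 * r_d + z0) * z0)
    return z0 + z0 / (1 + r)


def t_junction_dc_input_resistance(z0):
    """Matched T-junction: Z0 source resistance plus Z_in = Z0, i.e., 2*Z0."""
    return 2 * z0


Z0 = 50.0
VS = 1.1  # illustrative source amplitude in volts; the power ratio does not depend on it
for r_d in (5.0, 25.0, 50.0, 100.0, 500.0):  # hypothetical R_d values in ohms
    r_ladder = ladder_dc_input_resistance(r_d, Z0)
    r_tee = t_junction_dc_input_resistance(Z0)
    # dc power from an ideal source is Vs^2 / R_total, so a smaller R_total draws more power
    ratio = (VS**2 / r_ladder) / (VS**2 / r_tee)
    print(f"R_d = {r_d:6.1f} ohm: R_in = {r_ladder:6.2f} ohm, P_ladder / P_T = {ratio:.3f}")
```

For every positive $R_d$ the ladder's total lies strictly between $Z_0$ and $2Z_0$, so the ratio printed in the last column is always above 1.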

The three types of lumped elements can be used at either side or both sides of the channel. For the R-L and R-C structures, only on-chip implementation is considered, because these unmatched structures generate excessive reflections and produce inferior results when used off-chip. The T-junction can be implemented both on-chip and off-chip because it is matched. We summarize and label the usage of the components in Table I. Column 3 lists the driver resistance when the equalizer is at the driver, while Column 4 gives the load resistance when the equalizer is at the receiver. For R-L, the driver resistance is not applicable because R-L is never used at the driver side, and the load resistance is infinity because R-L serves as the load itself and no external load is needed. For R-C, a 10-Ω driver resistance is considered, and the load resistance is treated as a variable $R_L$, which is determined by our optimization flow [as shown in (31)]. For the on-chip and off-chip T-junctions, we explore both the matched (labeled $T_m^c$ and $T_m^p$) and unmatched cases (labeled $T_u^c$ and $T_u^p$). For the matched case, both the driver and load resistances are $Z_0$; for the unmatched case, the conditions are the same as for R-C. The matched structure without any equalizer is labeled M for reference.

Fig. 4. Block diagrams of equalizer at (a) driver and (b) receiver.

Given these seven basic topologies, many different schemes can be formed by combining them at the driver and receiver sides. We group the schemes according to the matching condition at both sides, as shown in Table II. For example, schemes in Group 1 have a matched driver and receiver. Group 1 includes M + M (match at driver + match at receiver), $M + T_m^c$ (match at driver + on-chip T-junction at receiver), $M + T_m^p$ (match at driver + off-chip T-junction at receiver), $T_m^c + M$, $T_m^c + T_m^c$, $T_m^c + T_m^p$, $T_m^p + M$, $T_m^p + T_m^c$, and $T_m^p + T_m^p$, for a total of 9 schemes. Combining topologies in the same way gives 9, 12, and 12 schemes in Groups 2, 3, and 4, respectively, or 42 schemes in total.
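The grouping of Table II can be checked by enumeration: each scheme pairs one driver-side topology with one receiver-side topology, and the counts should add up to the 42 schemes quoted in the abstract. The ASCII labels (`Tmc` for $T_m^c$, and so on) are our own shorthand.

```python
from itertools import product

# ASCII shorthand (ours): Tmc = T_m^c, Tmp = T_m^p, Tuc = T_u^c, Tup = T_u^p.
MATCHED = ["M", "Tmc", "Tmp"]
UNMATCHED_DRIVER = ["P", "Tup", "Tuc"]
UNMATCHED_RECEIVER = ["P", "Tup", "Tuc", "S"]  # R-L (S) is used only at the receiver

# Table II: (driver-side topologies, receiver-side topologies) for each group.
GROUPS = {
    "G1": (MATCHED, MATCHED),
    "G2": (UNMATCHED_DRIVER, MATCHED),
    "G3": (MATCHED, UNMATCHED_RECEIVER),
    "G4": (UNMATCHED_DRIVER, UNMATCHED_RECEIVER),
}

schemes = {
    group: [f"{drv} + {rcv}" for drv, rcv in product(drivers, receivers)]
    for group, (drivers, receivers) in GROUPS.items()
}
counts = {group: len(names) for group, names in schemes.items()}
print(counts, "total:", sum(counts.values()))
```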

Because the matching condition has a dominant effect on the eye diagram, it largely determines the performance of each group. With a matched driver and receiver, only slight reflections exist, so the jitter is small. With an unmatched driver or receiver, reflections affect the height of the eye.

#### IV. ANALYSIS OF THE EQUALIZATION TOPOLOGIES

In this section, we analyze and compare the topologies adopted in these schemes. The *s*-parameters, input impedance, and voltage gain are derived for each topology. We then illustrate the transfer functions and compare the topologies. Finally, three claims summarize the section.

The voltage and impedance terms used in the following subsections follow Fig. 4. For simplicity, discontinuities are not considered and the channel is treated as $Z_0$. A source impedance $Z_g$ connects the driver equalizer (if used) to the driver output, and a load impedance $Z_L$ terminates the receiver equalizer (if used). $V_s$ is the injected source voltage, $V_1$ and $V_2$ are the input and output voltages of the driver equalizer, $V_3$ is the output voltage for a matched termination, and $V_4$ and $V_{out}$ are the input and output voltages of the receiver equalizer. The input reflections $\Gamma_{in}$ and impedances $Z_{in}$, derived below for each topology, are defined at the port between $V_1$ and ground for the driver equalizer and between $V_4$ and ground for the receiver equalizer. Voltage gains are written in terms of $V_s$ for the driver equalizer and $V_3$ for the receiver equalizer, so that the voltage effect of equalization can be readily observed. In the following derivations, superscripts on voltages and input impedances indicate the topology.

#### A. R-L Structure (Only Used at Receiver)

The S matrix of R-L can be written as

$$S_{R-L} = \begin{bmatrix} \Gamma_0 & 1 + \Gamma_0 \\ 1 + \Gamma_0 & \Gamma_0 \end{bmatrix}$$
(3)

where

$$\Gamma_0 = \frac{\frac{Z_{R-L} Z_0}{Z_{R-L} + Z_0} - Z_0}{\frac{Z_{R-L} Z_0}{Z_{R-L} + Z_0} + Z_0} = -\frac{Z_0}{2Z_{R-L} + Z_0}, \quad Z_{R-L} = R + sL.$$
(4)

The input impedance of R-L is

$$Z_{in}^{R-L} = Z_{R-L} \tag{5}$$

and the R-L output voltage is

$$V_{out}^{R-L} = \frac{Z_{R-L}}{Z_{R-L} + Z_0} V_3.$$
 (6)
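A minimal sketch of (6), using hypothetical values $R_t = 20\ \Omega$ and $L_t = 5$ nH (not optimized results): the magnitude of $V_{out}^{R-L}/V_3$ starts at $R_t/(R_t + Z_0)$ at dc and rises toward 1 as $sL_t$ dominates, which is the high-pass behavior the R-L termination provides.

```python
import math


def rl_receiver_gain(freq_hz, r_t, l_t, z0=50.0):
    """Eq. (6): V_out / V_3 for the R-L termination Z_RL = R_t + s*L_t."""
    z_rl = r_t + 2j * math.pi * freq_hz * l_t
    return z_rl / (z_rl + z0)


# Hypothetical element values, for illustration only.
R_T, L_T = 20.0, 5e-9
for f in (0.0, 1e8, 1e9, 3.2e9, 1e10):
    g = abs(rl_receiver_gain(f, R_T, L_T))
    print(f"{f/1e9:6.2f} GHz: |V_out/V_3| = {g:.3f} ({20 * math.log10(g):6.2f} dB)")
```

Since $|Z_{R-L}|^2 = R_t^2 + \omega^2 L_t^2$ and $|Z_{R-L} + Z_0|^2 = (R_t + Z_0)^2 + \omega^2 L_t^2$, the gain magnitude increases monotonically with frequency.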

#### B. R-C Structure

The S matrix of R-C is

$$S_{R-C} = \begin{bmatrix} \Gamma_0 & (1+\Gamma_0)\frac{Z_0}{Z_{R-C}+Z_0} \\ (1+\Gamma_0)\frac{Z_0}{Z_{R-C}+Z_0} & \Gamma_0 \end{bmatrix}$$
(7)

where

$$\Gamma_0 = \frac{Z_{R-C}}{Z_{R-C} + 2Z_0}, \quad Z_{R-C} = \frac{R}{sRC + 1}.$$
 (8)

1) *R-C at Receiver:* In this case, the load reflection is

$$\Gamma_L = \frac{Z_L - Z_0}{Z_L + Z_0}$$
(9)

and the input reflection of R-C is

$$\Gamma_{in}^{R-C} = \frac{Z_{in} - Z_0}{Z_{in} + Z_0} = 1 - \frac{2Z_0(1 - \Gamma_L)}{2Z_0 + (1 - \Gamma_L)Z_{R-C}}.$$
 (10)

The input impedance of R-C can be written as


$$Z_{in}^{R-C} = Z_{R-C} + Z_L = Z_0 \frac{1+\Gamma_L}{1-\Gamma_L} + Z_{R-C}.$$
 (11)

The output voltage of R-C is

$$V_{out}^{R-C} = \frac{Z_L}{Z_{R-C} + Z_L + Z_0} V_3.$$
 (12)

2) *R-C at Driver:* For a matched channel, the load reflection (at port  $V_2$ )  $\Gamma_L = 0$ , and the input reflection (at port  $V_1$ ) of R-C is

$$\Gamma_{in}^{R-C} = s_{11} = \frac{Z_{R-C}}{Z_{R-C} + 2Z_0}.$$
(13)

The input impedance of R-C is

$$Z_{in}^{R-C} = Z_{R-C} + Z_0. \tag{14}$$

The R-C output voltage  $V_2$  can be written as

$$V_2^{R-C} = \frac{Z_0}{Z_{in} + Z_g} V_s.$$
 (15)

#### C. T-Junction

The S matrix of the T-junction is

$$S_T = \begin{bmatrix} 0 & \frac{Z_2}{Z_0 + Z_2} \\ \frac{Z_2}{Z_0 + Z_2} & 0 \end{bmatrix}$$
(16)

where $Z_2 = R+sL$ [as shown in Fig. 3(c)] is the impedance of the serial R-L component. Notice that $s_{11}$ and $s_{22}$ of the T-junction are zero when the channel is matched; the T-junction then introduces no reflection and presents an impedance of $Z_0$.

1) *T-Junction at Receiver:* When the load impedance is $Z_L$, the load reflection $\Gamma_L$ is the same as (9), and the input reflection of the T-junction (at port $V_4$) is

$$\Gamma_{in}^{T} = s_{12}s_{21}\Gamma_{L} = \left(\frac{Z_{2}}{Z_{0} + Z_{2}}\right)^{2}\Gamma_{L}.$$
 (17)

The input impedance of the T-junction is

$$Z_{in}^{T} = \frac{1 + \Gamma_L \left(\frac{Z_2}{Z_2 + Z_0}\right)^2}{1 - \Gamma_L \left(\frac{Z_2}{Z_2 + Z_0}\right)^2} Z_0.$$
 (18)



Fig. 5. Simulated transfer functions of R-L (S), channel with R-L (M+S), and channel without R-L (M+M). The -3-dB bandwidth is 2.65 GHz.

The transfer function of the T-junction without reflection is

$$H_T = \frac{V_{out}^+}{V_4^+} = s_{21} = \frac{Z_2}{Z_0 + Z_2}.$$
 (19)

Including the reflected wave $V_{out}^-$, the voltage at the output is

$$V_{out}^T = V_3 \frac{s_{21}}{2} (1 + \Gamma_L).$$
 (20)

For on-chip unmatched T-junction  $T_u^c$ , the output voltage is determined by the termination. For the other three types of T-junctions,  $\Gamma_L = 0$  and  $V_{out} = V_3 s_{21}/2$ .

2) *T-Junction at Driver:* In this case, both the  $\Gamma_L$  (at port  $V_2$ ) and  $\Gamma_{in}$  (at port  $V_1$ ) are zero, and

$$Z_{in}^T = Z_0. \tag{21}$$

Therefore,  $V_2$  of the T-junction is

$$V_2^T = \frac{s_{21}Z_0}{Z_0 + Z_g} V_s.$$
(22)

For on-chip unmatched T-junction  $T_u^c$ ,  $Z_g = 10 \Omega$  (Table I, row 7), so  $V_2 = 5V_s s_{21}/6$ . For the other three types of T-junctions,  $Z_g = Z_0$ , and  $V_2 = V_s s_{21}/2$ .

#### D. Structure Comparison

In this section, we compare the transfer function, output voltage, and input reflection of the different structures and summarize the results with three claims.

Figs. 5–7 show the simulated transfer functions<sup>1</sup> of the three structures themselves (dashed lines) and of the channels with (dash-dotted lines) and without (solid lines) equalizers. The high-pass filter effect of R-L, R-C, and T-junction is clearly visible. The 3-dB bandwidth of the original channel is 0.65 GHz, and the equalizers greatly improve it. For R-L and R-C, the transfer functions show ripples at high frequencies because these two topologies are not matched and introduce reflections. In contrast, the transfer function of the T-junction is much smoother, indicating that it reduces the internal reflections caused by channel discontinuities.

<sup>1</sup>Transfer function of the equalizer is defined as  $V_2/V_1$  for the driver equalizer or  $V_{out}/V_4$  for the receiver equalizer. Transfer function of the channel is defined as  $V_{out}/V_1$ .



Fig. 6. Transfer functions of R-C (P), channel with R-C (P + M), and channel without R-C (M + M). The -3-dB bandwidth is 1.87 GHz.



Fig. 7. Simulated transfer functions of T-junction  $(T_m^c)$ , channel with T-junction  $(T_m^c + M)$ , and channel without T-junction (M + M). The -3-dB bandwidth is 3.05 GHz.

For a further comparison, we make the following assumption:

$$Z_2 Z_{R-C} = Z_{R-L} Z_{R-C} = Z_0^2$$
(23)

where $Z_2 = R + sL$ [as shown in Fig. 3(c)], and $Z_{R-C}$ and $Z_{R-L}$ are the impedances of the R-C and R-L components as defined in (8) and (4). When (23) is satisfied, the different structures have the same transfer function, which provides a fair basis for comparing voltages and reflections. Under this assumption, we obtain the following three claims.

*Claim 1:* At the driver side, R-C has a larger output voltage than the T-junction.

Substituting  $Z_2$  with  $Z_0^2/Z_{R-C}$ , the driver side output voltage of the T-junction in (22) can be rewritten as

$$V_2^T = \frac{Z_0}{Z_0 + Z_g + Z_{R-C} + \frac{Z_g Z_{R-C}}{Z_0}} V_s$$
(24)

while the driver side output voltage of R-C is

$$V_2^{R-C} = \frac{Z_0}{Z_0 + Z_g + Z_{R-C}} V_s \tag{25}$$

according to (15). Comparing (24) and (25), $V_2^T$ has an extra term $Z_g Z_{R-C}/Z_0$ in the denominator, so its magnitude is smaller than that of $V_2^{R-C}$. When the frequency approaches zero, $Z_{R-C}$ approaches *R*, and hence $V_2^T < V_2^{R-C}$. When the frequency goes to infinity, $Z_{R-C}$ becomes zero, so $V_2^T = V_2^{R-C} = Z_0 V_s / (Z_g + Z_0)$. Because the T-junction attenuates low frequencies more while delivering the same high-frequency output, this indicates that at the driver side the T-junction has a stronger ability to compensate for the high-frequency loss of the channel.
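A quick numerical check of Claim 1, evaluating (24) and (25) directly. The values $R = 50\ \Omega$ and $C = 2$ pF are hypothetical; $Z_g = 10\ \Omega$ is the driver resistance listed for the unmatched cases in Table I. The T-junction output should be smaller at every frequency, both should converge to $Z_0/(Z_0 + Z_g)$ at high frequency, and the T-junction should therefore show the larger high-to-low-frequency ratio.

```python
import math


def z_rc(freq_hz, r, c):
    """Eq. (8): impedance of the parallel R-C, Z_RC = R / (sRC + 1)."""
    s = 2j * math.pi * freq_hz
    return r / (s * r * c + 1)


def v2_rc(freq_hz, r, c, z0, zg):
    """Eq. (25): driver-side R-C output, V2 / Vs."""
    return z0 / (z0 + zg + z_rc(freq_hz, r, c))


def v2_t(freq_hz, r, c, z0, zg):
    """Eq. (24): driver-side T-junction output, V2 / Vs, with Z2 = Z0^2 / Z_RC per (23)."""
    zrc = z_rc(freq_hz, r, c)
    return z0 / (z0 + zg + zrc + zg * zrc / z0)


# Hypothetical R and C; Zg = 10 ohm as in Table I.
R, C, Z0, ZG = 50.0, 2e-12, 50.0, 10.0
for f in (0.0, 1e9, 5e9, 5e10):
    t, rc = abs(v2_t(f, R, C, Z0, ZG)), abs(v2_rc(f, R, C, Z0, ZG))
    print(f"{f/1e9:5.1f} GHz: |V2_T/Vs| = {t:.4f}, |V2_RC/Vs| = {rc:.4f}")
print("high-frequency limit Z0/(Z0+Zg) =", Z0 / (Z0 + ZG))
```

At dc these values give $|V_2^T/V_s| = 50/120$ and $|V_2^{R-C}/V_s| = 50/110$, while both approach $50/60$ at high frequency.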

*Claim 2:* At the receiver side, R-C and R-L have larger output voltages than the T-junction.

Again, substituting  $Z_2$  with  $Z_0^2/Z_{R-C}$ , the output voltage of the T-junction in (20) becomes

$$V_{out}^{T} = V_{3}(1 + \Gamma_{L})\frac{s_{21}}{2} = V_{3}(1 + \Gamma_{L})\frac{Z_{0}}{2Z_{0} + 2Z_{R-C}} \tag{26}$$

and the output voltage of R-C in (12) is

$$V_{out}^{R-C} = V_3(1+\Gamma_L) \frac{Z_0}{(1-\Gamma_L)Z_{R-C} + 2Z_0}.$$
 (27)

Since $\Gamma_L \ge -1$ always holds, $1 - \Gamma_L \le 2$, so the denominator of $V_{out}^T$ has a larger magnitude than the denominator of $V_{out}^{R-C}$. Therefore, R-C always has a larger output than the T-junction at the receiver side. Substituting $Z_{R-L}$ with $Z_0^2/Z_{R-C}$, the output voltage of R-L in (6) becomes

$$V_{out}^{R-L} = \frac{Z_0}{Z_{R-C} + Z_0} V_3.$$
(28)

Since $(1 + \Gamma_L)/2 \le 1$ always holds, R-L has a larger output than the T-junction. As *s* goes from 0 to infinity, $Z_{R-C}$ decreases from *R* to 0; in this limit, $V_{out}^T = V_{out}^{R-C} = V_3(1 + \Gamma_L)/2$ and $V_{out}^{R-L} = V_3$. This indicates that R-L has a larger output voltage at high frequencies.
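The inequalities of Claim 2 can be spot-checked numerically. The sketch below evaluates (26)–(28), all normalized to $V_3$ and written in terms of $Z_{R-C}$ from (8) under assumption (23), for a few real load reflections $\Gamma_L$ and frequencies; the element values are hypothetical.

```python
import math

Z0 = 50.0  # characteristic impedance in ohms


def z_rc(freq_hz, r, c):
    """Eq. (8): Z_RC = R / (sRC + 1)."""
    s = 2j * math.pi * freq_hz
    return r / (s * r * c + 1)


def v_out_t(zrc, gamma_l):
    """Eq. (26): T-junction receiver output divided by V_3."""
    return (1 + gamma_l) * Z0 / (2 * Z0 + 2 * zrc)


def v_out_rc(zrc, gamma_l):
    """Eq. (27): R-C receiver output divided by V_3."""
    return (1 + gamma_l) * Z0 / ((1 - gamma_l) * zrc + 2 * Z0)


def v_out_rl(zrc):
    """Eq. (28): R-L receiver output divided by V_3 (R-L is the load, so no Gamma_L)."""
    return Z0 / (zrc + Z0)


R, C = 50.0, 2e-12  # hypothetical R-C values
for gamma_l in (-0.5, 0.0, 0.5):
    for f in (0.0, 1e9, 1e10):
        zrc = z_rc(f, R, C)
        print(f"Gamma_L={gamma_l:+.1f}, {f/1e9:4.1f} GHz: "
              f"|T|={abs(v_out_t(zrc, gamma_l)):.3f}, "
              f"|R-C|={abs(v_out_rc(zrc, gamma_l)):.3f}, "
              f"|R-L|={abs(v_out_rl(zrc)):.3f}")
```

For example, at dc with $\Gamma_L = 0$ and $R = Z_0$, (26)–(28) give $1/4$, $1/3$, and $1/2$ of $V_3$ for the T-junction, R-C, and R-L, respectively.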

*Claim 3:* At the receiver side, R-C has a larger input reflection than the T-junction.

We rewrite (10) as

$$\Gamma_{in}^{R-C} = \Gamma_L + \frac{(1 - \Gamma_L)^2 Z_{R-C}}{2Z_0 + (1 - \Gamma_L) Z_{R-C}}$$
(29)

which means that the R-C structure amplifies the load reflection. In contrast, (17) shows that the input reflection of the T-junction is always smaller in magnitude than the load reflection, because $|s_{21}| = |Z_2/(Z_0 + Z_2)| < 1$; the T-junction therefore reduces reflections and alleviates discontinuity effects.
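Similarly, a numerical spot check of Claim 3: the sketch confirms that (10) and (29) are the same expression, that the R-C input reflection exceeds $\Gamma_L$ at dc, and that the T-junction input reflection from (17) never exceeds $|\Gamma_L|$. It takes $Z_2 = Z_0^2/Z_{R-C}$ from (23); the element values are hypothetical.

```python
import math

Z0 = 50.0  # characteristic impedance in ohms


def z_rc(freq_hz, r, c):
    """Eq. (8): Z_RC = R / (sRC + 1)."""
    s = 2j * math.pi * freq_hz
    return r / (s * r * c + 1)


def gamma_in_rc_eq10(zrc, gamma_l):
    """Eq. (10): input reflection of the receiver-side R-C."""
    return 1 - 2 * Z0 * (1 - gamma_l) / (2 * Z0 + (1 - gamma_l) * zrc)


def gamma_in_rc_eq29(zrc, gamma_l):
    """Eq. (29): the same quantity written as Gamma_L plus an added term."""
    return gamma_l + (1 - gamma_l) ** 2 * zrc / (2 * Z0 + (1 - gamma_l) * zrc)


def gamma_in_t(z2, gamma_l):
    """Eq. (17): input reflection of the receiver-side T-junction, s21^2 * Gamma_L."""
    s21 = z2 / (Z0 + z2)
    return s21 ** 2 * gamma_l


R, C = 50.0, 2e-12  # hypothetical R-C values
for gamma_l in (-0.5, 0.0, 0.5):
    zrc = z_rc(0.0, R, C)  # dc value of Z_RC
    z2 = Z0 ** 2 / zrc     # assumption (23)
    print(f"Gamma_L={gamma_l:+.1f}: R-C Gamma_in={gamma_in_rc_eq29(zrc, gamma_l).real:+.3f}, "
          f"T Gamma_in={gamma_in_t(z2, gamma_l).real:+.3f}")
```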

#### V. SEQUENTIAL QUADRATIC PROGRAMMING (SQP) OPTIMIZATION FLOW

From the discussion in Section IV, the equalizer parameters determine the scattering parameters and thus influence the input impedance and the output voltage. Therefore, for a given channel, there exist optimal values of these RLC parameters in terms of eye opening and jitter.

Since we want to maximize the eye opening and minimize jitter, we define the cost function as

$$f(x) = -V_{eye} \times (T_c - jitter)$$
(30)

where $x$ denotes the optimization variables and $T_c$ is the cycle time. $V_{eye}$ and *jitter* are the worst case eye-opening voltage and timing jitter, respectively. The magnitude of $f(x)$ is proportional to the open (white) area of the eye diagram when the eye is approximated as a quadrangle, an approximation that holds in our experiments.
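Plugging the $T_m^p + M$ row of Table IV into (30), with $T_c = 1/(6.4\ \text{Gb/s}) = 156.25$ ps, reproduces the tabulated $f = -23.75$, which suggests (our inference) that $f$ is reported in V·ps. Other rows agree to within a few hundredths, consistent with rounding of the tabulated $V_{eye}$. The function name below is hypothetical.

```python
def eye_cost(v_eye, jitter_ps, bit_rate_gbps=6.4):
    """Eq. (30): f = -V_eye * (T_c - jitter); with jitter in ps, f is in V*ps."""
    t_c_ps = 1e3 / bit_rate_gbps  # cycle time (unit interval): 156.25 ps at 6.4 Gb/s
    return -v_eye * (t_c_ps - jitter_ps)


# Row T_m^p + M of Table IV: V_eye = 0.178 V, jitter = 22.80 ps.
print(round(eye_cost(0.178, 22.80), 2))  # -23.75
```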

TABLE III Optimization Variables for Each Component

| Label   | Driver side | Receiver side   |
|---------|-------------|-----------------|
| M       | None        | None            |
| S       | N.A.        | $R_t, L_t$      |
| P       | $R_d, C_d$  | $R_t, C_t, R_L$ |
| $T_m^c$ | $R_d, C_d$  | $R_t, L_t$      |
| $T_m^p$ | $R_d, C_d$  | $R_t, L_t$      |
| $T_u^c$ | $R_d, C_d$  | $R_t, L_t, R_L$ |
| $T_u^p$ | $R_d, C_d$  | $R_t, L_t, R_L$ |

The variables in the optimization include all the independent RLC parameters in the equalizers and the load resistance $R_L$. The number of variables varies among schemes, as shown in Table III. We use the subscripts "d" and "t" to distinguish driver-side and receiver-side parameters. For instance, scheme $T_m^c + M$ has variables $R_d$ and $C_d$ at the driver side, which determine the R and C values in the parallel branch of the T-junction. Scheme $T_m^c + T_u^c$ has five variables: $R_d$ and $C_d$ at the driver side and $R_t$, $L_t$, and $R_L$ at the receiver side.

The problem formulation can be written as

$$\begin{array}{ll} \min & f(x) \\ \text{s.t.} & 0 \leq R_t \leq R_{\max} \\ & 0 \leq R_d \leq R_{\max} \\ & 0 \leq R_L \leq R_{\max} \\ & 0 \leq C_d \leq C_{\max} \\ & 0 \leq L_t \leq L_{\max} \\ & 0 \leq C_t \leq C_{\max} \end{array}$$
(31)

where  $R_{\text{max}}$ ,  $C_{\text{max}}$ , and  $L_{\text{max}}$  are upper bounds for the variables.
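The bookkeeping of Table III and the box constraints in (31) can be expressed as a small lookup, for example as a front end to a bounded optimizer. This is a sketch only: the ASCII labels are our shorthand, and the numerical bounds are hypothetical, since $R_{\max}$, $C_{\max}$, and $L_{\max}$ are not given here.

```python
# Optimization variables per topology, transcribed from Table III.
# ASCII shorthand (ours): Tmc = T_m^c, Tmp = T_m^p, Tuc = T_u^c, Tup = T_u^p.
DRIVER_VARS = {
    "M": [], "P": ["R_d", "C_d"], "Tmc": ["R_d", "C_d"], "Tmp": ["R_d", "C_d"],
    "Tuc": ["R_d", "C_d"], "Tup": ["R_d", "C_d"],
}  # S (R-L) is never used at the driver side
RECEIVER_VARS = {
    "M": [], "S": ["R_t", "L_t"], "P": ["R_t", "C_t", "R_L"],
    "Tmc": ["R_t", "L_t"], "Tmp": ["R_t", "L_t"],
    "Tuc": ["R_t", "L_t", "R_L"], "Tup": ["R_t", "L_t", "R_L"],
}


def variables_and_bounds(driver, receiver, r_max, c_max, l_max):
    """Return the variable names of a scheme and the box constraints of (31)."""
    names = DRIVER_VARS[driver] + RECEIVER_VARS[receiver]
    upper = {"R": r_max, "C": c_max, "L": l_max}  # keyed by the first letter of the name
    bounds = [(0.0, upper[name[0]]) for name in names]
    return names, bounds


# Hypothetical upper bounds: 200 ohm, 100 pF, 20 nH.
names, bounds = variables_and_bounds("Tmc", "Tuc", 200.0, 100e-12, 20e-9)
for name, (lo, hi) in zip(names, bounds):
    print(f"{lo} <= {name} <= {hi}")
```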

For a given scheme and set of variables, generating the eye diagram by circuit simulation with a pseudo-random bit sequence (PRBS) is very time consuming, especially inside an iterative optimization flow. A peak-distortion analysis method was proposed in [11] to estimate the worst case eye-opening voltage; it treats a general input bit sequence as a superposition of unit pulses. For a linear time-invariant system, the quality of the eye diagram can then be analyzed from the system's unit pulse response. The saturated ramp signal (also called the step signal) is more fundamental than the unit pulse, because a pulse can be composed of a rising step and a delayed falling step. In [12] and [13], the eye-opening voltage and timing jitter were estimated from the system's step response. However, those methods made assumptions about the number of local minima and maxima in the step-response waveform. An accurate prediction method based on the step response, which is suitable for a general step-response waveform and considers asymmetric signal transitions, was established in [14] and [15].
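To illustrate the peak-distortion idea of [11] (not the step-response prediction algorithm of [14] that the flow actually uses), the sketch below builds a unit pulse from a rising step and a falling step delayed by one unit interval, samples it at one-UI spacing, and takes the worst-case eye height as the main cursor minus the sum of the magnitudes of all ISI cursors, maximized over the sampling phase. The first-order step response $1 - e^{-t/\tau}$ with an ideal (zero rise time) input is a hypothetical stand-in for the HSPICE-simulated channel, and only eye height (not jitter) is estimated.

```python
import math


def step_response(t, tau):
    """Hypothetical channel: first-order step response 1 - exp(-t/tau) for t >= 0."""
    return 1.0 - math.exp(-t / tau) if t >= 0 else 0.0


def pulse_response(t, ui, tau):
    """Unit pulse = rising step minus a falling step delayed by one UI."""
    return step_response(t, tau) - step_response(t - ui, tau)


def worst_case_eye_height(ui, tau, n_cursors=50, n_phases=200):
    """Peak-distortion estimate for 0/1 NRZ: main cursor minus the sum of |ISI| cursors."""
    best = -math.inf
    for i in range(1, n_phases + 1):
        t0 = ui * i / n_phases  # sampling phase within the first UI of the pulse
        main = pulse_response(t0, ui, tau)
        isi = sum(abs(pulse_response(t0 + k * ui, ui, tau))
                  for k in range(-n_cursors, n_cursors + 1) if k != 0)
        best = max(best, main - isi)
    return best


print(worst_case_eye_height(1.0, 0.5))  # open eye
print(worst_case_eye_height(1.0, 2.0))  # negative value: eye closed
```

For this monotonic response the post-cursors telescope, so the estimate can be checked against the closed form $1 - 2e^{-T/\tau}$ (sampling at the end of the UI); a negative result means the worst-case eye is closed.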

For the design of CPU–memory links, the step-response-based method is suitable for predicting $V_{eye}$ and jitter, and the step responses provide more physical intuition for optimizing the equalizers. For each given scheme and set of variables, the step response is generated by an HSPICE transient simulation. Afterwards, the method in [14] is used to predict the worst case eye opening and jitter. Because the step-response waveform saturates quickly, the eye-diagram prediction takes much less time than a PRBS-based simulation.

Fig. 8. Optimization flow for equalizer parameters.

The relation between the cost function and the variables is complex and has no closed-form solution. We solve the problem with the SQP method, a state-of-the-art nonlinear programming method implemented in MATLAB. Based on [16] and [17], the method closely mimics Newton's method for constrained optimization. At each iteration, a quasi-Newton update is used to approximate the Hessian of the Lagrangian function; this approximation defines a quadratic programming subproblem whose solution gives the search direction for a line search. The SQP method relies on gradient information and may be sensitive to the starting point.

The overall optimization flow is illustrated in Fig. 8. The inputs are the type of equalization scheme and the initial design variables. The SQP flow accepts this information and, after a number of iterations, outputs the optimal design variables and the corresponding performance metrics. In each iteration, a SPICE netlist is first generated from the current design variables, and the SQP flow then calls HSPICE to perform a circuit simulation with a step input, in which the channel is described by an *s*-parameter model. The step response is then fed into the eye prediction algorithm to derive the worst case eye opening and jitter. From this eye quality, the SQP flow evaluates the cost function and determines the design variable values for the next iteration.

#### VI. SIMULATION RESULTS

We model the 20-in-long CPU–memory links with *s*-parameters and perform HSPICE simulations. The supply voltage is 1.1 V and the bit rate is 6.4 Gb/s with a rise/fall time of 45 ps. We implemented the SQP optimization flow in MATLAB and performed equalizer optimization. We compare the matched I/O results, in which all external and internal ports are matched with $100-\Omega$ differential impedance, with

TABLE IV Optimization Results of Group 1 without Size Limit

| Scheme          | $R_d$ (Ω) | $C_d$ (pF) | $R_t$ (Ω) | $L_t$ (nH) or $C_t$ (pF) | $V_{eye}$ (V) | Jitter (ps) | 3-dB BW (GHz) | $f$    |
|-----------------|-----------|------------|-----------|--------------------------|---------------|-------------|---------------|--------|
| M + M           | -         | -          | -         | -                        | closed        | -           | 0.65          | -      |
| $T_m^p + M$     | 56.79     | 4.65       | -         | -                        | 0.178         | 22.80       | 2.12          | -23.75 |
| $T_m^c + M$     | 57.76     | 4.54       | -         | -                        | 0.179         | 22.70       | 2.13          | -23.90 |
| $M + T_m^c$     | -         | -          | 43.28     | 11.35                    | 0.179         | 22.68       | 2.13          | -23.91 |
| $T_m^p + T_m^c$ | 11.46     | 74.23      | 58.29     | 10.59                    | 0.196         | 16.20       | 2.19          | -27.50 |
| $T_m^c + T_m^c$ | 11.46     | 74.23      | 58.29     | 10.59                    | 0.196         | 15.98       | 2.19          | -27.55 |
| $M + T_m^p$     | -         | -          | 43.28     | 11.35                    | 0.180         | 22.28       | 2.13          | -24.16 |
| $T_m^p + T_m^p$ | 12.43     | 66.23      | 47.76     | 8.16                     | 0.195         | 12.46       | 2.33          | -28.00 |
| $T_m^c + T_m^p$ | 12.43     | 66.23      | 47.76     | 8.16                     | 0.195         | 12.28       | 2.33          | -28.03 |



Fig. 9. Transfer functions of solutions in Table IV.



#### A. Optimization Results without Size Limit

Tables IV–VII give the optimization results for the 42 schemes. For each scheme, the tables list the optimal variable values, the corresponding eye height and jitter at the output port, the 3-dB bandwidth, and the cost function f. Figs. 9–14 illustrate the transfer functions and step responses of the different schemes. In Figs. 9, 11–13, and 16, the legends are ordered by the dc magnitude of the transfer functions. For example, in Fig. 11, scheme M + M, which has the largest dc transfer function, is at the top of the legend list, and scheme  $T_u^p + T_m^p$  is at the bottom since it has the smallest dc transfer function. In Figs. 10, 14, and 17, the legends are ordered by the dc voltage level after the transition.

Fig. 10. Step response of solutions in Table IV.

We observe that adopting equalizers opens the eye to between 0.12 and 0.31 V and raises the 3-dB bandwidth from 0.65 GHz to over 3 GHz. Generally speaking, matched components produce smaller jitter, whereas unmatched components produce larger eyes because of reflections. The following subsections compare the schemes group by group.

1) Schemes in Group 1: Table IV shows that the schemes in Group 1 fall into three categories according to the number of equalizers they use. If no equalizer is used (scheme M + M), there is no eye. If only one equalizer is used (schemes  $T_m^c + M$ ,  $T_m^p + M$ ,  $M + T_m^c$ , and  $M + T_m^p$ ), the eye opening is 178–180 mV and the jitter is 22–23 ps. If two equalizers are used, the eye opening reaches 195–196 mV with jitter below 17 ps.



| Scheme | Optimal solution | | | | Performance | | | |
|---|---|---|---|---|---|---|---|---|
| | $R_d$ ($\Omega$) | $C_d$ (pF) | $R_t$ ($\Omega$) | $L_t$ (nH) or $C_t$ (pF) | $V_{eye}$ (V) | Jitter (ps) | 3-dB BW (GHz) | f |
| P + M | 64.11 | 4.02 | - | - | 0.121 | 49.62 | 1.87 | -24.76 |
| $T_u^c + M$ | 79.63 | 2.98 | - | - | 0.263 | 37.66 | 2.64 | -31.25 |
| $T_u^p + M$ | 78.62 | 2.90 | - | - | 0.277 | 35.30 | 2.67 | -33.58 |
| $P + T_m^c$ | 27.75 | 3.22 | 64.48 | 17.52 | 0.260 | 34.52 | 1.98 | -31.66 |
| $T_u^c + T_m^c$ | 41.80 | 2.79 | 99.33 | 31.53 | 0.265 | 31.03 | 2.62 | -33.14 |
| $T_u^p + T_m^c$ | 45.78 | 2.36 | 101.11 | 31.45 | 0.270 | 27.42 | 2.65 | -34.84 |
| $P + T_m^p$ | 27.58 | 3.11 | 60.42 | 16.65 | 0.259 | 33.69 | 2.61 | -31.74 |
| $T_u^c + T_m^p$ | 37.52 | 2.78 | 84.69 | 25.29 | 0.264 | 30.97 | 2.62 | -33.08 |
| $T_u^p + T_m^p$ | 74.92 | 1.71 | 150.67 | 100 | 0.268 | 18.05 | 2.79 | -37.08 |

TABLE V Optimization Results of Group 2 without Size Limit



Fig. 11. Transfer functions of solutions in Table V.

With the matching condition,  $T_m^c$  is equivalent to  $T_m^p$  when used at the driver side [see (20) and (22)]. At the receiver side, comparing scheme  $T_m^p + T_m^c$  with  $T_m^p + T_m^p$ , and scheme  $T_m^c + T_m^c$  with  $T_m^c + T_m^p$ , shows that  $T_m^p$  is slightly better than  $T_m^c$  in terms of jitter. The schemes with  $T_m^p$  at the receiver side also have the highest 3-dB bandwidth (2.33 GHz). The reason, as mentioned in Section II, is that the reflections between the DIMM trace and the memory package (memory module) are larger than those between the memory package and the on-chip load, which can be fine-tuned to minimize mismatch. Therefore, an off-chip T-junction is better able to alleviate the discontinuities of the channel and reduce the jitter.

Fig. 9 shows the transfer functions of the different schemes. Without an equalizer, the 3-dB bandwidth is 0.65 GHz, whereas equalizers flatten the transfer function and extend the corner frequency beyond 1 GHz. However, each equalizer also reduces the low-frequency magnitude, and two equalizers reduce it further. This explains the difference in eye opening when different numbers of equalizers are used. Fig. 9 also shows that  $T_m^p$  at the receiver side has a lower and flatter frequency response than  $T_m^c$  at the receiver side, which is consistent with the smaller jitter of  $T_m^p$ .

Fig. 10 presents the step responses of the schemes in Group 1. The low-frequency component with one equalizer is larger than that with two, while the slew rates of their rising edges are very similar. As a result, the signal with one equalizer takes longer to settle, and its eye opening is slightly smaller.

Fig. 12. Transfer functions of solutions in Table VI.

The observations from Group 1 can be summarized as follows.

- a) Using  $T_m^c$  or  $T_m^p$  at both sides is better than using it at one side only. With one T-junction, the eye opening is about 180 mV and the jitter is around 23 ps; with two T-junctions, the eye opening is about 195 mV and the jitter is 12–16 ps.
- b)  $T_m^c$  and  $T_m^p$  are equivalent when used at the driver side.

2) Schemes in Group 2: In Group 2, the unmatched driver-side equalizer can be P,  $T_u^c$ , or  $T_u^p$ , while the matched receiver-side equalizer can be M,  $T_m^c$ , or  $T_m^p$ . Based on the analysis of Group 1, we expect  $T_m^c$  and  $T_m^p$  to be very similar at the receiver side and both to be better than M. This trend can be observed in Table V.

By examining Table V, we find that, in terms of parameter values and eye quality: 1)  $T_u^c + M$  is similar to  $T_u^p + M$ ; 2)  $P + T_m^c$  is similar to  $P + T_m^p$ ; and 3)  $T_u^c + T_m^c$  and  $T_u^p + T_m^c$  are similar to  $T_u^c + T_m^p$ .

Table V also shows that, for the same receiver-side structure,  $T_u^p$  is very similar to and slightly better than  $T_u^c$ , and  $T_u^c$  is better than *P*. T-junctions yield larger eyes and lower jitter than the parallel RC structure because, by Claim 2 and (24), their low-frequency response is smaller than that of RC; the higher low-frequency magnitude of RC means the signal takes longer to settle. This can be observed in Fig. 11. As frequency increases, the difference between the two transfer functions approaches zero, so the T-junction yields a flatter overall transfer function and a better eye. The difference between  $T_u^p$  and  $T_u^c$  at the driver side comes from the source resistance:  $Z_g = 50 \Omega$  in  $T_u^p$  and  $Z_g = 10 \Omega$  in  $T_u^c$ . Therefore, the off-chip T-junction ( $T_u^p$ ) has a smaller low-frequency response and a flatter transfer function over the whole frequency range.

| Scheme | Optimal solution | | | | | Performance | | | |
|---|---|---|---|---|---|---|---|---|---|
| | $R_d$ ($\Omega$) | $C_d$ (pF) | $R_t$ ($\Omega$) | $L_t$ (nH) or $C_t$ (pF) | $R_L$ ($\Omega$) | $V_{eye}$ (V) | Jitter (ps) | 3-dB BW (GHz) | f |
| M + P | - | - | 75.90 | 4.92 (pF) | 53.10 | 0.156 | 45.03 | 1.89 | -17.37 |
| $T_m^c + P$ | 57.67 | 4.68 | 0 | 23.68 (pF) | 57.67 | 0.186 | 24.62 | 2.13 | -24.48 |
| $T_m^p + P$ | 57.67 | 4.68 | 0 | 23.68 (pF) | 57.67 | 0.183 | 24.87 | 2.13 | -24.09 |
| $M + T_u^p$ | - | - | 32.70 | 9.13 | 99.20 | 0.209 | 28.50 | 2.39 | -26.65 |
| $T_m^p + T_u^p$ | 10.44 | 87.34 | 40.96 | 8.03 | 96.4 | 0.219 | 18.38 | 2.44 | -30.23 |
| $T_m^c + T_u^p$ | 9.94 | 92.36 | 40.54 | 8.15 | 99.6 | 0.220 | 18.18 | 2.45 | -30.38 |
| M + S | - | - | 23.12 | 3.71 | - | 0.222 | 46.81 | 2.56 | -24.30 |
| $T_m^p + S$ | 18.82 | 33.03 | 33.84 | 3.27 | - | 0.252 | 31.09 | 2.61 | -31.54 |
| $T_m^c + S$ | 38.30 | 8.47 | 51.07 | 2.83 | - | 0.252 | 30.94 | 2.41 | -31.58 |
| $M + T_u^c$ | - | - | 15.07 | 3.26 | 348.06 | 0.242 | 31.50 | 3.16 | -30.22 |
| $T_m^p + T_u^c$ | 9.64 | 99.61 | 29.45 | 5.06 | 499.93 | 0.262 | 25.81 | 2.75 | -34.23 |
| $T_m^c + T_u^c$ | 9.64 | 99.61 | 29.45 | 5.04 | 499.93 | 0.263 | 25.51 | 2.75 | -34.37 |

TABLE VI Optimization Results of Group 3 without Size Limit

Fig. 13. Transfer functions of solutions in Table VII.

The observations from Group 2 can be summarized as follows.

- a) At the driver side, T-junctions are better than P: the eye opening is improved by at least 5 mV, and the jitter is reduced by 7–15 ps.
- b) At the receiver side, T-junctions are better than a matched termination (M).
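To make the role of the source resistance in Group 2 more concrete, the sketch below computes the voltage reflection coefficient that a wave returning from the channel would see at the driver, using only the bare source resistances quoted above ( $Z_g = 50 \Omega$  for  $T_u^p$ ,  $Z_g = 10 \Omega$  for  $T_u^c$ ). It assumes a 50-Ω single-ended line (half of the 100-Ω differential impedance) and ignores the T-junction elements and all frequency dependence, so it is only a first-order illustration, not the channel model used in the optimization.

```python
def reflection_coefficient(z_term: float, z0: float) -> float:
    """Voltage reflection coefficient of a resistive termination z_term on a line of impedance z0."""
    return (z_term - z0) / (z_term + z0)


# Assumption: single-ended line impedance = 100-ohm differential / 2.
Z0 = 50.0

for label, z_g in (("T_u^p driver, Z_g = 50 ohm", 50.0),
                   ("T_u^c driver, Z_g = 10 ohm", 10.0)):
    gamma = reflection_coefficient(z_g, Z0)
    print(f"{label}: Gamma_source = {gamma:+.3f}")
```

Under these assumptions the 50-Ω source absorbs returning waves completely, whereas the 10-Ω source re-reflects about two thirds of the incident voltage with inverted sign, which is one way to see why the two unmatched driver-side variants behave differently.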




Fig. 14. Step responses of solutions in Table VII.

3) Schemes in Group 3: In Group 3, matched equalizers (M,  $T_m^c$ , and  $T_m^p$ ) are used at the driver side, and unmatched equalizers (P, S,  $T_u^c$ , and  $T_u^p$ ) are used at the receiver side. As in Group 2, Table VI shows that  $T_m^c$  and  $T_m^p$  are very similar at the driver side and both are better than M.

According to the parameter values and eye quality, schemes can be further grouped as follows: 1)  $T_m^c + P$  is similar to  $T_m^p + P$ ; 2)  $T_m^p + T_u^p$  is similar to  $T_m^c + T_u^p$ ; 3)  $T_m^p + S$  is similar to  $T_m^c + S$ ; and 4)  $T_m^p + T_u^c$  is similar to  $T_m^c + T_u^c$ .

Fig. 12 shows the frequency responses of these twelve schemes. Generally speaking, for each driver-side structure, the transfer functions with  $T_u^c$  and  $T_u^p$  at the receiver have higher corner frequencies, which gives them better eyes than the other schemes. Comparing the transfer functions of  $M + T_u^c$  and  $M + T_u^p$ :  $M + T_u^p$  has a larger low-frequency magnitude and a 3-dB bandwidth of around 1 GHz, while  $M + T_u^c$  has a smaller low-frequency magnitude, a 3-dB bandwidth of around 2 GHz, and a larger magnitude than  $M + T_u^p$  at 3.2 GHz (the Nyquist frequency, i.e., half of the 6.4-Gb/s bit rate). This explains why  $M + T_u^c$  has a larger eye opening. With  $T_m^c$  or  $T_m^p$  at the driver side,  $T_u^c$  likewise has a larger magnitude than  $T_u^p$  at 3.2 GHz.

| Scheme | Optimal solution | | | | | Performance | | | |
|---|---|---|---|---|---|---|---|---|---|
| | $R_d$ ($\Omega$) | $C_d$ (pF) | $R_t$ ($\Omega$) | $L_t$ (nH) or $C_t$ (pF) | $R_L$ ($\Omega$) | $V_{eye}$ (V) | Jitter (ps) | 3-dB BW (GHz) | f |
| P + P | 64.21 | 3.93 | 0.01 | 1.72 (pF) | 47.06 | 0.230 | 48.59 | 1.87 | -24.81 |
| $T_u^c + P$ | 67.85 | 2.39 | 26.66 | 26.96 (pF) | 53.56 | 0.264 | 28.79 | 2.72 | -33.62 |
| $T_u^p + P$ | 81.25 | 1.80 | 25.24 | 27.22 (pF) | 57.94 | 0.267 | 22.56 | 2.80 | -35.71 |
| $P + T_u^p$ | 20.95 | 0.52 | 17.92 | 5.32 | 162.48 | 0.264 | 43.40 | 2.76 | -29.76 |
| $T_u^c + T_u^p$ | 43.06 | 2.70 | 94.50 | 32.17 | 59.23 | 0.271 | 32.18 | 2.63 | -33.62 |
| $T_u^p + T_u^p$ | 39.54 | 2.40 | 79.31 | 59.48 | 23.16 | 0.277 | 29.32 | 2.65 | -35.16 |
| P + S | 58.00 | 4.12 | 46.73 | 2.13 | - | 0.280 | 48.23 | 1.99 | -30.27 |
| $T_u^c + S$ | 60.21 | 3.95 | 45.49 | 1.98 | - | 0.311 | 43.65 | 2.71 | -35.06 |
| $T_u^p + S$ | 64.32 | 3.36 | 43.82 | 2.14 | - | 0.328 | 41.67 | 2.79 | -37.57 |
| $P + T_u^c$ | 18.64 | 0.87 | 17.06 | 3.74 | 363.08 | 0.313 | 40.47 | 2.70 | -36.20 |
| $T_u^c + T_u^c$ | 43.07 | 2.69 | 94.63 | 32.12 | 59.30 | 0.271 | 32.42 | 2.63 | -33.56 |
| $T_u^p + T_u^c$ | 39.76 | 2.35 | 79.12 | 22.54 | 60.60 | 0.276 | 30.14 | 2.65 | -34.80 |

TABLE VII Optimization Results of Group 4 without Size Limit

TABLE VIII Optimization Results of Groups 1 and 2 with Size Limit

| Scheme | Optimal solution | | | | Performance | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | $R_d$ ($\Omega$) | $C_d$ (pF) | $R_t$ ($\Omega$) | $L_t$ (nH) or $C_t$ (pF) | $V_{eye}$ (V) | Jitter (ps) | 3-dB BW (GHz) | Power (mW) | f |
| M + M | - | - | - | - | closed | - | 0.65 | 7.86 | - |
| $M + T_m^c$ | - | - | 20.50 | 5.00 | 0.168 | 29.39 | 3.03 | 7.86 | -21.21 |
| $T_m^c + T_m^c$ | 0.006 | 1.66 | 20.12 | 4.96 | 0.168 | 29.87 | 3.05 | 7.86 | -21.23 |
| $P + T_m^c$ | 31.87 | 4.16 | 33.67 | 5.00 | 0.219 | 45.16 | 2.80 | 7.96 | -24.17 |
| P + M | 64.11 | 4.02 | - | - | 0.232 | 49.62 | 1.87 | 6.74 | -24.76 |
| $T_u^p + T_m^c$ | 119.09 | 1.93 | 499.91 | 5.00 | 0.242 | 33.53 | 2.96 | 15.75 | -29.67 |
| $T_u^p + M$ | 98.39 | 2.00 | - | - | 0.257 | 29.91 | 2.74 | 15.59 | -32.47 |

TABLE IX Optimization Results of Groups 3 and 4 with Size Limit

| Scheme | Optimal solution | | | | | Performance | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | $R_d$ ($\Omega$) | $C_d$ (pF) | $R_t$ ($\Omega$) | $L_t$ (nH) or $C_t$ (pF) | $R_L$ ($\Omega$) | $V_{eye}$ (V) | Jitter (ps) | 3-dB BW (GHz) | Power (mW) | f |
| M + P | - | - | 176.44 | 1.89 (pF) | 83.53 | 0.146 | 50.55 | 2.28 | 6.20 | -15.39 |
| $T_m^c + P$ | 136.46 | 1.96 | 0.07 | 0.96 (pF) | 69.31 | 0.178 | 35.17 | 3.15 | 10.43 | -21.50 |
| M + S | - | - | 23.12 | 3.71 | - | 0.222 | 46.81 | 2.65 | 8.32 | -24.30 |
| $T_m^c + S$ | 100.01 | 2.0 | 53.84 | 2.47 | - | 0.196 | 31.33 | 3.06 | 10.38 | -24.47 |
| $M + T_u^c$ | - | - | 15.53 | 3.75 | 204.38 | 0.236 | 32.52 | 3.21 | 7.81 | -29.24 |
| $T_m^c + T_u^c$ | 0 | 0 | 15.13 | 3.35 | 458.32 | 0.250 | 32.67 | 3.17 | 7.79 | -30.93 |
| P + P | 64.21 | 3.93 | 0.01 | 1.72 (pF) | 47.06 | 0.230 | 48.59 | 1.87 | 6.77 | -24.81 |
| P + S | 58.00 | 4.12 | 46.73 | 2.13 | - | 0.280 | 48.23 | 1.99 | 6.97 | -30.27 |
| $T_u^p + T_u^c$ | 59.88 | 1.5 | 33.98 | 4.84 | 257.82 | 0.271 | 38.49 | 3.07 | 15.37 | -32.00 |
| $T_u^p + P$ | 99.5 | 2.0 | 0 | 0.59 (pF) | 51.13 | 0.258 | 29.44 | 2.75 | 15.60 | -32.77 |
| $P + T_u^c$ | 18.64 | 0.87 | 17.06 | 3.74 | 363.08 | 0.313 | 40.47 | 2.97 | 8.77 | -36.20 |
| $T_u^p + S$ | 106.71 | 1.97 | 49.67 | 2.22 | - | 0.302 | 35.66 | 3.10 | 15.66 | -36.39 |

The transfer functions of M + P and M + S exhibit fluctuations above 30 MHz, which reduce the eye opening and increase jitter. For  $T_m^c + P$ ,  $T_m^c + S$ ,  $T_m^p + P$ , and  $T_m^p + S$ , the transfer functions are much smoother; as a result, both the eye opening and the jitter improve.

The observations from Group 3 can be summarized as follows.

- a) At the driver side, T-junctions are better than M: the eye opening is improved by 11–30 mV, and the jitter is reduced by 6–20 ps.
- b) At the receiver side, P has the smallest eye opening (below 190 mV).
- c) At the receiver side,  $T_u^c$  has the largest eye opening (above 240 mV).
- d) At the receiver side,  $T_u^p$  has the smallest jitter (18–28 ps).
- e) At the receiver side, S has the largest jitter (31–47 ps).

4) Schemes in Group 4: In Group 4, both the driver and receiver sides are unmatched, so reflections affect the performance. Structures  $T_u^p$ ,  $T_u^c$ , and P are used at both sides, whereas *S* is used only at the receiver side. As in Group 2,  $T_u^p$  is slightly better than  $T_u^c$  at the driver side. When *P* is at the driver side, the jitter is large because the transfer functions exhibit a local maximum at 10 MHz and a local minimum at 100 MHz (see Fig. 13).

Fig. 15. Eye diagrams at the outputs of scheme M + M and three representative schemes. (a) M + M: eye is closed. (b)  $M + T_m^c$ :  $V_{eye} = 0.19$  V, *Jitter* = 18.9 ps. (c) M + P:  $V_{eye} = 0.23$  V, *Jitter* = 24.5 ps. (d)  $P + T_u^c$ :  $V_{eye} = 0.39$  V, *Jitter* = 26.0 ps.

Fig. 16. Transfer functions of scheme M + M and three representative schemes.

As in the previous groups, the parameter values and eye quality allow the schemes to be grouped as follows: 1)  $T_u^c + P$  is similar to  $T_u^p + P$ ; 2)  $T_u^c + T_u^p$  is similar to  $T_u^p + T_u^p$ ; 3)  $T_u^c + S$  is similar to  $T_u^p + S$ ; and 4)  $T_u^c + T_u^c$  is similar to  $T_u^p + T_u^c$ .

For the same driver-side structure, Table VII shows that S has large jitter on average because it produces larger reflections and a higher low-frequency magnitude (see Fig. 14). The eye opening of P is smaller than that of S because, according to Claim 1, its low-frequency magnitude is lower while its reflections remain significant. Schemes  $T_u^c + T_u^p$ ,  $T_u^p + T_u^p$ ,  $T_u^c + T_u^c$ , and  $T_u^p + T_u^c$  are also very similar. As Table XI shows, schemes in Group 4 are the most sensitive to parameter variations because of their relatively large reflections.

The observations from Group 4 can be summarized as follows.

- a) At the driver side, P has large jitter (40–48 ps).
- b) At the receiver side, P has the smallest eye opening (230–267 mV).
- c) At the receiver side, S has the largest jitter (41–48 ps).
- d) Schemes  $T_u^c + T_u^p$ ,  $T_u^p + T_u^p$ ,  $T_u^c + T_u^c$ , and  $T_u^p + T_u^c$  are very similar.

Fig. 17. Step responses of scheme M + M and three representative schemes.

Fig. 18. Input impedances of scheme M + M and three representative schemes.

TABLE X CPU Time for Optimizing Scheme  $P + T_u^c$  with Size Limit

| Item | Value |
|---|---|
| CPU time for simulating one step response | $\sim 20$ s |
| No. of steps searched in the SQP algorithm | 984 |
| Total CPU time for optimizing one scheme | 20 967 s |

5) Summary: From the analysis of these four groups, we draw the following conclusions: 1) schemes in Group 1 have lower jitter due to the matching condition at both sides; 2) schemes in Group 4 have larger eye openings due to reflections; 3) when used at the driver side,  $T_m^c$  is very similar to  $T_m^p$ ; 4) when used at the receiver end,  $T_m^p$  has slightly lower jitter than  $T_m^c$ , and  $T_u^c$  is very similar to  $T_u^p$ ; and 5) when used at the receiver end, P yields a smaller eye opening, while S yields larger jitter.

TABLE XI Sensitivity of Eye Quality with Respect to Parameter Variations

| Scheme | $V_{eye}^{\max}$ (V) | $V_{eye}^{\min}$ (V) | $(V_{eye}^{\max}-V_{eye}^{\min})/V_{eye}^{\max}$ | $J_{\max}$ (ps) | $J_{\min}$ (ps) | $J_{\max}/T_c$ | $J_{\min}/T_c$ |
|---|---|---|---|---|---|---|---|
| $M + T_m^c$ | 0.170 | 0.147 | 14% | 40.1 | 25.6 | 25.7% | 16.4% |
| M + P | 0.149 | 0.115 | 23% | 63.7 | 48.0 | 40.8% | 30.7% |
| $P + T_u^c$ | 0.320 | 0.255 | 20% | 56.4 | 36.6 | 36.1% | 23.4% |



Fig. 19. Eye diagrams at the output considering the crosstalk effect. (a) M + M: eye is closed. (b)  $M + T_m^c$ :  $V_{eye} = 0.19$  V, Jitter = 19.4 ps. (c) M + P:  $V_{eye} = 0.24$  V, Jitter = 22.0 ps. (d)  $P + T_u^c$ :  $V_{eye} = 0.38$  V, Jitter = 26.2 ps.

#### B. Optimization Results with Size Limit and Further Discussion

This section discusses the results obtained when the physical size limits of the parameters are considered. Section VI-A shows that  $T_m^p$  and  $T_m^c$  have very similar effects in all schemes, as do  $T_u^c$  and  $T_u^p$ . We therefore simplify the experiments by merging similar schemes. Tables VIII and IX show the optimization results of the 19 schemes with size limits. Besides the optimal solutions, eye openings, jitters, 3-dB bandwidths, and cost functions, the tables also list the total power consumption.

Compared with the results in Tables IV–VII, the performance of most schemes becomes worse, except for P + M, M + S, P + P, and P + S. We choose three representative schemes for further experiments:  $M + T_m^c$  (the smallest jitter), M + P (the lowest power), and  $P + T_u^c$  (the largest eye opening and nearly the smallest cost function). The following subsections demonstrate the efficiency of the proposed optimization flow and the advantages of the resulting passive equalization schemes.

1) CPU Time of the Optimization Flow: The optimization flow runs on a Linux machine with a 2.8-GHz CPU and 2 GB of memory. Table X shows the CPU time for optimizing a typical scheme with size limits. Finding the optimal solution for this five-variable equalization scheme takes 5.8 h, and the HSPICE simulation of the step response is the speed bottleneck, accounting for more than 90% of the CPU time.
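The bottleneck figure can be checked against Table X with a few lines of arithmetic. The sketch assumes, as a simplification, that every searched SQP step costs exactly one ~20-s HSPICE step-response simulation; Table X gives the per-simulation time only approximately.

```python
# Figures from Table X; the per-simulation time is approximate (~20 s).
T_SIM_S = 20.0        # CPU time for one HSPICE step-response simulation
N_STEPS = 984         # number of steps searched by the SQP algorithm
T_TOTAL_S = 20967.0   # total CPU time for optimizing one scheme

# Assumption: one step-response simulation per searched step.
t_hspice_s = T_SIM_S * N_STEPS
hspice_share = t_hspice_s / T_TOTAL_S

print(f"HSPICE time : {t_hspice_s:.0f} s")
print(f"HSPICE share: {100 * hspice_share:.1f} % of total")
print(f"Total time  : {T_TOTAL_S / 3600:.2f} h")
```

Under this assumption HSPICE accounts for roughly 94% of the 20 967 s (about 5.8 h), consistent with the "more than 90%" stated above.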



Fig. 20. Eye diagrams of scheme  $P + T_u^c$  at higher data rate. (a) 8 Gb/s:  $V_{eye} = 180$  mV, Jitter = 28 ps. (b) 10 Gb/s:  $V_{eye} = 55$  mV, Jitter = 21 ps.

2) Eye Diagram Comparison of Different Schemes: Because the optimization relies on step-response-based analysis, we validate and compare the optimized schemes using actual eye diagrams and the measured eye openings and jitters, as shown in Fig. 15. The eye diagrams are simulated with a 300-bit PRBS input, and scheme M + M serves as the reference.
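The PRBS polynomial used for the eye-diagram simulations is not stated. As an assumption, the sketch below generates a 300-bit input with the common PRBS7 sequence (polynomial  $x^7 + x^6 + 1$ ) from a 7-bit Fibonacci linear-feedback shift register; a different polynomial or seed would give a different but statistically similar pattern.

```python
def prbs7(n_bits: int, seed: int = 0x7F) -> list[int]:
    """Return n_bits of PRBS7 (x^7 + x^6 + 1) from a 7-bit Fibonacci LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR locks up at all zeros")
    bits = []
    for _ in range(n_bits):
        # Feedback is the XOR of register bits 7 and 6 (1-indexed taps).
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


pattern = prbs7(300)
print("".join(map(str, pattern[:32])), "...")
```

A 300-bit record covers the 127-bit PRBS7 period about 2.4 times, so every 7-bit run pattern appears at least twice in the simulated waveform.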

Fig. 15 shows that  $M + T_m^c$ , which satisfies the matching condition at both the driver and the receiver sides, has the smallest jitter. As its transfer function in Fig. 16 shows, it also has the smallest low-frequency magnitude, and the transfer function is very smooth, which keeps the jitter very small. Fig. 15 also shows that  $P + T_u^c$  has the largest eye opening. Comparing the transfer functions of  $P + T_u^c$  and M + P,  $P + T_u^c$  has a larger magnitude both at 3.2 GHz and at low frequency, which explains its larger eye opening. Fig. 17 shows the step responses of these schemes:  $M + T_m^c$  is smooth and settles after 5 ns because both the driver and receiver sides are matched, whereas the other two schemes show small ripples due to reflections.

Fig. 17 also shows that  $P + T_u^c$  has the shortest rise time, which produces a larger eye opening, whereas  $M + T_m^c$  has the slowest rising edge and therefore the smallest eye opening.
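The link between rise time and eye opening can be illustrated with a generic peak-distortion estimate computed from a sampled step response, in the spirit of the step-response-based analysis in [11]. This sketch is not the worst-case eye prediction algorithm used in the optimization flow; it assumes NRZ 0/1 data and a linear channel, and the first-order RC step responses (with arbitrarily chosen time constants) are hypothetical stand-ins for the simulated responses in Fig. 17.

```python
import numpy as np


def worst_case_eye_height(step: np.ndarray, samples_per_ui: int) -> tuple[float, int]:
    """Peak-distortion estimate of the worst-case eye height for NRZ 0/1 data.

    step: uniformly sampled step response, starting before the edge and long
          enough for the response to settle.
    samples_per_ui: number of samples in one unit interval T.
    Returns (eye_height, sampling_phase_index); eye_height <= 0 means closed.
    """
    # Single-bit (pulse) response: p(t) = s(t) - s(t - T).
    delayed = np.concatenate([np.zeros(samples_per_ui), step[:-samples_per_ui]])
    pulse = step - delayed

    best_height, best_phase = -np.inf, 0
    for phase in range(samples_per_ui):
        cursors = pulse[phase::samples_per_ui]   # pulse samples spaced one UI apart
        main = int(np.argmax(cursors))           # main cursor at this phase
        isi = np.abs(cursors).sum() - abs(cursors[main])
        # Worst "1" picks up every negative ISI cursor and worst "0" every
        # positive one, so the gap between them is main - sum(|ISI|).
        height = float(cursors[main] - isi)
        if height > best_height:
            best_height, best_phase = height, phase
    return best_height, best_phase


def rc_step(tau_ps: float, ui_ps: float, samples_per_ui: int, n_ui: int) -> np.ndarray:
    """Hypothetical first-order (RC) step response, for illustration only."""
    dt = ui_ps / samples_per_ui
    t = np.arange(n_ui * samples_per_ui) * dt - 2 * ui_ps   # edge after 2 UI
    return 1.0 - np.exp(-np.clip(t, 0.0, None) / tau_ps)


if __name__ == "__main__":
    ui = 1e12 / 6.4e9            # 156.25 ps at 6.4 Gb/s
    for tau in (20.0, 60.0):     # fast vs. slow rising edge (illustrative values)
        h, _ = worst_case_eye_height(rc_step(tau, ui, 32, 60), 32)
        print(f"tau = {tau:4.0f} ps -> worst-case eye height = {h:.3f} (normalized)")
```

With these illustrative values, the slower edge yields a visibly smaller worst-case eye than the faster one, mirroring the qualitative trend described for Fig. 17.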

3) Total Power Comparison of Different Schemes: Tables VIII and IX show that scheme M + P consumes 6.2 mW, which is 21% lower than M + M (7.86 mW). Among the representative schemes,  $P + T_u^c$  has the highest power consumption (8.77 mW), a power overhead of about 11% relative to M + M. By definition, the complex power delivered to the channel input can be written as

$$P = \frac{1}{2}VI^* = \frac{V \cdot V^*}{2Z_{in}^*} = \frac{|V|^2}{2|Z_{in}|e^{-j\phi}}$$
(32)

where V and I are the input voltage and current phasors, and  $\phi$  is the phase of the complex input impedance  $Z_{in}$ . The total power consumption measured in simulation is the real power,  $\mathrm{Re}\{P\} = |V|^2\cos\phi/(2|Z_{in}|)$ , so for the same input voltage the power is determined by the input impedance. Fig. 18 shows the input-impedance magnitudes of the four schemes: the largest impedance results in the lowest power consumption, and vice versa.
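A short numerical check of (32) and of the percentages quoted above. The impedances in the first loop are illustrative values, not read from Fig. 18, and the 1.1-V amplitude is simply the supply voltage reused as a convenient phasor magnitude; the power figures in the second part are those listed in Tables VIII and IX.

```python
import cmath
import math


def real_power(v_peak: float, z_in: complex) -> float:
    """Average power Re{(1/2) V I*} for a sinusoid of peak amplitude v_peak into Z_in."""
    i = v_peak / z_in                          # phasor current
    return 0.5 * (v_peak * i.conjugate()).real


def real_power_polar(v_peak: float, z_mag: float, phi_rad: float) -> float:
    """Same quantity from (32): |V|^2 cos(phi) / (2 |Z_in|)."""
    return v_peak**2 * math.cos(phi_rad) / (2.0 * z_mag)


# Illustrative impedances (hypothetical): same phase, doubled magnitude.
for z in (cmath.rect(50.0, 0.3), cmath.rect(100.0, 0.3)):
    p = real_power(1.1, z)
    assert math.isclose(p, real_power_polar(1.1, abs(z), cmath.phase(z)))
    print(f"|Z_in| = {abs(z):5.1f} ohm -> P = {1e3 * p:.2f} mW")

# Relative power of two representative schemes versus M + M (Tables VIII, IX).
p_mm, p_mp, p_putc = 7.86, 6.20, 8.77   # mW
print(f"M + P saves     {100 * (p_mm - p_mp) / p_mm:.1f} %")
print(f"P + T_u^c costs {100 * (p_putc - p_mm) / p_mm:.1f} % extra")
```

Doubling  $|Z_{in}|$  at a fixed phase halves the real power, and the tabulated values give a 21.1% saving for M + P and an 11.6% overhead for  $P + T_u^c$ .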

4) Sensitivity of Eye Quality: To study the sensitivity of eye quality to parameter variations, we perturb the RLGC values by  $\pm 15\%$  and simulate the resulting range of eye opening and jitter. Table XI summarizes the sensitivity results of the representative schemes. The largest eye-opening fluctuation is 23%, for scheme M + P, and the largest jitter variation relative to the cycle time is 13%, for scheme  $P + T_u^c$ . This is due to the unmatched nature of the parallel R-C structure, which makes reflections more severe when the parameters vary.
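The derived columns of Table XI, and the 23% and 13% figures above, can be reproduced from the raw extremes. The sketch assumes that the cycle time  $T_c$  is the unit interval at 6.4 Gb/s (156.25 ps), an inference that reproduces the tabulated percentages, and takes the jitter variation over cycle time to be  $(J_{\max}-J_{\min})/T_c$ .

```python
BIT_RATE = 6.4e9
T_C_PS = 1e12 / BIT_RATE   # assumed cycle time = unit interval, 156.25 ps

# (V_eye_max [V], V_eye_min [V], J_max [ps], J_min [ps]) from Table XI
TABLE_XI = {
    "M + T_m^c": (0.170, 0.147, 40.1, 25.6),
    "M + P":     (0.149, 0.115, 63.7, 48.0),
    "P + T_u^c": (0.320, 0.255, 56.4, 36.6),
}


def sensitivity(v_max: float, v_min: float, j_max: float, j_min: float,
                t_c: float = T_C_PS) -> dict[str, float]:
    """Relative eye-opening fluctuation and jitter metrics normalized to T_c."""
    return {
        "eye_fluct": (v_max - v_min) / v_max,
        "jmax_over_tc": j_max / t_c,
        "jmin_over_tc": j_min / t_c,
        "jitter_var_over_tc": (j_max - j_min) / t_c,
    }


for scheme, row in TABLE_XI.items():
    s = sensitivity(*row)
    print(f"{scheme:10s} eye {100 * s['eye_fluct']:4.1f}%  "
          f"Jmax/Tc {100 * s['jmax_over_tc']:4.1f}%  "
          f"Jmin/Tc {100 * s['jmin_over_tc']:4.1f}%  "
          f"dJ/Tc {100 * s['jitter_var_over_tc']:4.1f}%")
```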

5) *Effect of Crosstalk:* To study the crosstalk effect on the representative schemes, we consider eight neighboring lines (four on the right and four on the left) switching simultaneously with the input pattern "0101...". Fig. 19 shows the output eye diagrams with crosstalk. Comparing Figs. 15 and 19 shows that the equalization schemes are robust against crosstalk.

6) Eye Diagrams for 8 and 10 Gb/s Bit Rates: For the three representative schemes, we run the optimization flow at 8 Gb/s (rise/fall time of 36 ps) and 10 Gb/s (rise/fall time of 29 ps) to determine the optimal parameter values. The two schemes with only one equalizer,  $M + T_m^c$  and M + P, have no open eye at 8 Gb/s. Scheme  $P + T_u^c$  works very well at 8 Gb/s; its eye diagrams are shown in Fig. 20. Note that at 10 Gb/s the eye opening drops below 100 mV, which may cause data-recovery problems.
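The rise/fall times quoted for the three bit rates scale with the unit interval. The sketch below computes the unit interval, the Nyquist frequency (half the bit rate, e.g., the 3.2 GHz used in the transfer-function comparisons at 6.4 Gb/s), and the rise-time-to-UI ratio; reading this as "rise time held at a fixed fraction of the UI" is an inference from the numbers, not a statement from the setup.

```python
def link_timing(bit_rate_gbps: float, rise_ps: float) -> dict[str, float]:
    """Unit interval, Nyquist frequency, and rise-time fraction for an NRZ link."""
    ui_ps = 1e3 / bit_rate_gbps             # 1 / (bit rate), in ps
    return {
        "ui_ps": ui_ps,
        "nyquist_ghz": bit_rate_gbps / 2,   # fundamental of a 0101... pattern
        "rise_over_ui": rise_ps / ui_ps,
    }


for rate, t_rise in ((6.4, 45.0), (8.0, 36.0), (10.0, 29.0)):
    t = link_timing(rate, t_rise)
    print(f"{rate:4.1f} Gb/s: UI = {t['ui_ps']:6.2f} ps, "
          f"Nyquist = {t['nyquist_ghz']:.1f} GHz, t_r/UI = {t['rise_over_ui']:.3f}")
```

All three cases give a rise time of roughly 29% of the UI, so the edge rate tightens in proportion to the bit rate while the channel loss at the Nyquist frequency grows.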

#### VII. CONCLUSION

In this paper, a set of low-power passive equalizers, including T-junction, parallel R-C, and series R-L structures, was investigated. With equalizers inserted at the driver or the receiver side, the *s*-parameters, transfer functions, and voltage reflections of the channel were analyzed and compared, and the relationship between power and input impedance was shown. An efficient optimization flow combined with an algorithm predicting the worst-case eye diagram was proposed; it obtains the RLC parameters of the equalizers that maximize the eye area at the channel output.

With the equalizers inserted at the chip or package level, 42 equalization schemes were investigated for CPU–memory links, and the optimization flow was applied to the CPU–memory interconnection of an IBM POWER6 system. Simulation results show that schemes with both the driver and receiver matched have smaller jitter, while schemes with neither side matched have larger eye openings. At 6.4 Gb/s, the worst-case eye height of the equalized system (scheme  $P + T_u^c$ , i.e., a parallel on-chip RC structure at the driver side and an unmatched on-chip T-junction at the receiver side) can exceed 300 mV at a power cost of 8.8 mW. Scheme  $M + T_m^c$  (matched at the driver side, matched on-chip T-junction at the receiver side) yields the minimum jitter of 29 ps at 7.9 mW, the same power consumed by the scheme without any equalizer. Simulation results also demonstrate that the equalized system (scheme  $P + T_u^c$ ) can operate at 8 Gb/s and that the schemes are not sensitive to parameter variations and crosstalk.

#### REFERENCES

- [1] H. A. Affel, "Equalization of carrier transmissions," U.S. Patent 1 511 013, 1924.
- [2] H. W. Bode, "Attenuation equalizer," U.S. Patent 2 096 027, 1936.
- [3] E. S. Kuh and D. O. Pederson, *Principles of Circuit Synthesis*. New York: McGraw-Hill, 1959.
- [4] R. Sun, J. Park, F. O'Mahony, and C. P. Yue, "A low-power, 20-Gb/s continuous-time adaptive passive equalizer," in *Proc. IEEE Int. Symp. Circuit Syst.*, vol. 2, May 2005, pp. 920–923.
- [5] J. Shin and K. Aygun, "On-package continuous-time linear equalizer using embedded passive components," in *Proc. IEEE Electr. Perform. Electron. Packag.*, Atlanta, GA, Oct. 2007, pp. 147–150.
- [6] W.-D. Guo, F.-N. Tsai, G.-H. Shiue, and R.-B. Wu, "Reflection enhanced compensation of lossy traces for best eye-diagram improvement using high-impedance mismatch," *IEEE Trans. Adv. Packag.*, vol. 31, no. 3, pp. 619–626, Aug. 2008.
- [7] J. Friedrich, B. McCredie, N. James, B. Huott, B. Curran, E. Fluhr, G. Mittal, E. Chan, Y. Chan, D. Plass, S. Chu, H. Le, L. Clark, J. Ripley, S. Taylor, J. Dilullo, and M. Lanzerotti, "Design of the POWER6 microprocessor," in *Proc. IEEE Int. Solid State Circuits Conf.*, San Francisco, CA, Feb. 2007, pp. 96–97.
- [8] D. Dreps, "The 3rd generation of IBM's elastic interface on POWER6," presented at *Hot Chips*, Aug. 2007.
- [9] H. Q. Le, W. J. Starke, J. S. Fields, F. P. O'Connell, D. Q. Nguyen, B. J. Ronchetti, W. M. Sauer, E. M. Schwarz, and M. T. Vaden, "IBM POWER6 microarchitecture," *IBM J. Res. Develop.*, vol. 51, no. 6, pp. 639–662, Nov. 2007.
- [10] J. E. Storer, *Passive Network Synthesis*. New York: McGraw-Hill, 1957.
- [11] B. K. Casper, M. Haycock, and R. M. Mooney, "An accurate and efficient analysis method for multi-Gb/s chip-to-chip signaling schemes," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2002, pp. 54–57.
- [12] H. Zhu, C. Cheng, A. Deutsch, and G. Katopis, "Predicting and optimizing jitter and eye-opening based on bitonic step response," in *Proc. IEEE Electr. Perform. Electron. Packag.*, Oct. 2007, pp. 155–158.
- [13] L. Zhang, W. Yu, and C. K. Cheng, "Low power passive equalizer optimization using tritonic step response," in *Proc. IEEE/ACM Des. Autom. Conf.*, Anaheim, CA, Jun. 2008, pp. 570–573.
- [14] R. Shi, W. Yu, Y. Zhu, and C. K. Cheng, "Efficient and accurate eye diagram prediction for high speed signaling," in *Proc. IEEE/ACM Int. Conf. Comput.-Aid. Des.*, Nov. 2008, pp. 655–661.
- [15] W. Yu, R. Shi, and C.-K. Cheng, "Accurate eye diagram prediction based on step response and its application to low-power equalizer design," *IEICE Trans. Electron.*, vol. E92-C, no. 4, pp. 444–452, Apr. 2009.
- [16] M. C. Biggs, "Constrained minimization using recursive quadratic programming: Some alternative subproblem formulations," in *Toward Global Optimization*, L. C. W. Dixon and G. P. Szego, Eds. Amsterdam, The Netherlands: North-Holland, 1975, pp. 341–349.
- [17] M. J. D. Powell, "A fast algorithm for nonlinearly constrained optimization calculations," in *Numerical Analysis* (Lecture Notes in Mathematics), vol. 630, G. A. Watson, Ed. New York: Springer-Verlag, 1978.
- [18] K. C. Chiang, C. H. Lai, A. Chin, T. J. Wang, H. F. Chiu, J.-R. Chen, and S. P. McAlister, "Very high-density (23 fF/μm<sup>2</sup>) RF MIM capacitors using high-κ TaTiO as the dielectric," *IEEE Electron Dev. Lett.*, vol. 26, no. 10, pp. 728–730, Oct. 2005.
- [19] H. Kim, J.-O. Plouchart, N. Zamdmer, N. Fong, L.-H. Lu, Y. Tan, K. A. Jenkins, M. Sherony, R. Groves, M. Kumar, and A. Ray, "High-performance 3-D on-chip inductors in SOI CMOS technology for monolithic RF circuit applications," in *Proc. IEEE Rad. Freq. Integr. Circuits Symp.*, Jun. 2003, pp. 591–594.



Ling Zhang received the B.S. degree in electronic engineering and the M.S. degree in computer engineering, both from Tsinghua University, Beijing, China, in 2002 and 2004, respectively, and the Ph.D. degree from the Department of Computer Science and Engineering, University of California-San Diego, San Diego, in 2009.

She joined Broadcom Corporation, San Diego, in 2009, where she is currently a Staff Scientist in the Wireless Connectivity Group. Her current research interests include on-chip and off-chip high-performance interconnect analysis and optimization, low-skew clock network distribution, on-chip global routing techniques, and innovative logic structures.

Wenjian Yu (S'01–M'04–SM'10) received the B.S. and Ph.D. degrees in computer science from Tsinghua University, Beijing, China, in 1999 and 2003, respectively.

He joined Tsinghua University in 2003, where he is an Associate Professor with the Department of Computer Science and Technology. He visited the Computer Science and Engineering Department, University of California-San Diego, San Diego, several times between September 2005 and January 2008. His current research interests include parasitic extraction, modeling and simulation of interconnects, and a broad range of numerical methods.

Dr. Yu was a Technical Program Committee Member of the Association for Computing Machinery/IEEE Asia South-Pacific Design Automation Conference in 2005, 2007, and 2008, and of the International Workshop on System Level Interconnect Prediction in 2009. He was the recipient of the Distinguished Ph.D. Award from Tsinghua University in 2003. He has served as a reviewer for the IEEE TRANSACTIONS

Yulei Zhang (S'08) received the B.E. degree in electrical engineering from Tsinghua University, Beijing, China, in 2007, and the M.S. degree in electrical and computer engineering from the University of California-San Diego (UCSD), San Diego, in 2009. He is currently pursuing the Ph.D. degree in the Department of Electrical and Computer Engineering, UCSD.

He was an intern with the Bluetooth Integrated Circuit Design Group, Broadcom Corporation, San Diego, CA, in 2009. His current research interests include the design and optimization of high-speed low-power on-chip/off-chip interconnects and low-power clock distribution network design.

Renshen Wang received the B.E. degree from Tsinghua University, Beijing, China, in 2005, and the M.S. and Ph.D. degrees from the University of California, San Diego, in 2007 and 2010, respectively, all in computer science.

He is currently a Research and Development Engineer with the Placement and Route Division, Mentor Graphics, Wilsonville, OR, focusing on floor planning. His current research interests include computer-aided detection algorithms on floor planning, chip-packaging routing, and on-chip communication.

Alina Deutsch (M'83–SM'92–F'99) received the B.S. degree in electrical engineering from Columbia University, New York, NY, in 1971, and the M.S. degree in electrical engineering from Syracuse University, Syracuse, NY, in 1976.

She was with the IBM T. J. Watson Research Center, Yorktown Heights, NY, from 1971 until her retirement in 2009, after 38 years. She worked in several areas, including testing of semiconductor and magnetic bubble memory devices. She designed unique lossy transmission line configurations and developed unique high-frequency high-impedance coaxial probes, as well as a novel short-pulse measurement technique for characterization of resistive transmission lines. She was a Research Staff Member working on the design, analysis, and measurement of packaging and very large scale integration chip interconnections for future digital processor and communication applications. Her work involved 3-D modeling, signal integrity and noise simulation, and testing of a large range of package lossy transmission lines, from printed-circuit boards, cables, and connectors to thin-film wiring on multichip modules and on-chip wiring. She also managed the Interconnect and Packaging Analysis Project, which developed advanced electromagnetic field-solver codes. She has authored 46 papers published in refereed technical journals, has given numerous invited and tutorial talks, and holds 16 patents.

Dr. Deutsch received the Outstanding Technical Achievement, Research Division, and S/390 Division Team Awards from IBM in 1990, 1993, 1996, 1999–2003, 2005, 2006, and 2009. She co-chaired the IEEE Topical Meeting on Electrical Performance of Electronic Packaging for four years, was Technical Program Co-Chair for the International Microelectronics and Packaging Society Next Generation Integrated Circuit (IC) and Package Design Workshop for three years, co-chaired the Components, Packaging & Manufacturing Technology (CPMT) Society Future Directions in IC and Package Design workshop for six years, served as Guest Editor of the IEEE TRANSACTIONS ON ADVANCED PACKAGING for five years, and was an Associate Editor of the IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES. She is a member of Tau Beta Pi and Eta Kappa Nu. She served as an elected member of the IEEE CPMT Society Board of Governors from 2000 to 2002 and as Vice-Chair of the CPMT Society Electrical Design, Modeling, and Simulation Technical Committee.

George A. Katopis (S'71–M'74–SM'99–F'02) is a Distinguished Engineer Emeritus with the IBM System Group, Poughkeepsie, NY, where he spent 35 years as a Technical Leader and Strategist for the first- and second-level packaging solutions for the IBM Z-servers. He is the author of over 100 publications in the fields of his interests, which include switching noise prediction and containment, signal integrity, electrical modeling of packaging structures, and cost and architecture of multilevel packages. He holds ten patents in the above areas.



Daniel Dreps received the B.S.E.E. degree from Michigan State University, East Lansing, in 1983.

He is now a Distinguished Engineer with IBM Systems and Technology Group, Armonk, NY. During his IBM career, he has designed and developed transistor models, fiber-optic links, application-specific integrated circuit technology custom elements, and high-speed serial links for IBM servers. He has published several research papers and holds more than 40 patents in areas of interconnect and server design. His current research interests include high-speed link development and application for the entire range of IBM servers.



James F. Buckwalter (S'01–M'06) received the B.S. degree in electrical engineering from the California Institute of Technology, Pasadena, CA, in 1999, the M.S. degree in electrical engineering from the University of California at Santa Barbara, Santa Barbara, in 2001, and the Ph.D. degree in electrical engineering from the California Institute of Technology, in 2006.

He was a Research Scientist with Telcordia Technologies, Piscataway, NJ, from 1999 to 2000. In 2004, he was with the IBM T. J. Watson Research Center, Yorktown Heights, NY. In 2006, he joined Luxtera, Carlsbad, CA, where he developed high-speed circuits for optical interconnects. In July 2006, he joined the faculty of the University of California-San Diego, La Jolla, where he is an Assistant Professor of electrical engineering.

Dr. Buckwalter was the recipient of the IBM Ph.D. Fellowship in 2004, the Defense Advanced Research Projects Agency Young Faculty Award in 2007, and the National Science Foundation CAREER Award in 2011.



**Ernest S. Kuh** (S'49–M'57–F'65–LF'94) received the B.S. degree from the University of Michigan, Ann Arbor, in 1949, the M.S. degree from Massachusetts Institute of Technology, Cambridge, in 1950, the Ph.D. degree from Stanford University, Palo Alto, CA, in 1952, the Doctor of Engineering, Honoris Causa, from Hong Kong University of Science and Technology, Kowloon, Hong Kong, in 1997, and the Doctor of Engineering degree from the National Chiao Tung University, Hsinchu, Taiwan, in 1999.

He is the William S. Floyd, Jr. Professor Emeritus in Engineering and a Professor in the Graduate School of the Department of Electrical Engineering and Computer Sciences (EECS), University of California, Berkeley. He joined the EECS Department Faculty in 1956. From 1968 to 1972, he served as Chair of the Department, and from 1973 to 1980, as Dean of the College of Engineering. From 1952 to 1956, he was a member of the Technical Staff at Bell Telephone Laboratories, Murray Hill, NJ.

Prof. Kuh has received numerous awards and honors, including the American Society for Engineering Education Lamme Medal, the IEEE Centennial Medal, the IEEE Education Medal, the IEEE Circuits and Systems Society Award, the IEEE Millennium Medal, the C&C Prize in 1996, and the Electronic Design Automation Consortium Phil Kaufman Award in 1998. He is a member of the National Academy of Engineering, the Academia Sinica, and a foreign member of the Chinese Academy of Sciences. He is a Fellow of the American Association for the Advancement of Science.



**Chung-Kuan Cheng** (S'82–M'84–SM'95–F'00) received the B.S. and M.S. degrees in electrical engineering from the National Taiwan University, Taipei, Taiwan, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1984.

He was a Senior Computer Aided Design Engineer at Advanced Micro Devices Inc., Sunnyvale, CA, from 1984 to 1986. He joined the University of California-San Diego (UCSD), San Diego, in 1986, where he is a Professor in the Computer Science and Engineering Department and an Adjunct Professor in the Electrical and Computer Engineering Department. He served as a Chief Scientist at Mentor Graphics Corporation, Wilsonville, OR, in 1999. His current research interests include medical modeling and analysis, network optimization, and design automation on microelectronic circuits.

Prof. Cheng was an Associate Editor of the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN (TCAD) from 1994 to 2003. He was a recipient of the Best Paper Award, IEEE TCAD, in 1997 and 2002, the NCR Excellence in Teaching Award, School of Engineering, UCSD, in 1991, and the IBM Faculty Award in 2004, 2006, and 2007. He was appointed as an Honorary Guest Professor of Tsinghua University from 2002 to 2008.