New Game, New Goal Posts: A Recent History of Timing Closure
Andrew B. Kahng
UCSD CSE and ECE Departments
[email protected]
http://vlsicad.ucsd.edu
A. B. Kahng, Timing Closure, DAC-2015 Session 12

What is Timing Closure?
  • Most critical phase of modern system-on-chip implementation
  • No timing closure = no tapeout
  • Timing closure is the end result of
      - Years of methodology/script/signoff development
      - Months of block- and top-level final physical implementation
      - Weeks of final pass, including manual noise and DRC fixes
  • Changes
      - Process/device technology
      - Modeling standards
      - EDA tooling
      - Design methodology
      - Signoff criteria
  • Demand for innovations in timing closure

Agenda
  • Timing Closure and New Contexts
  • Example Challenges
  • Example Near-Term Mitigations
  • Futures and Conclusions

Traditional View of Timing Closure
Source: N. MacDonald, Broadcom Corp., Timing Closure in Deep Submicron Designs, 2010 DAC Knowledge Center article
[Figure: flow from top-level netlist / SPEF and block-level netlist / SPEF, through static timing analysis for all modes / corners, breakdown of timing violations on a per-block basis, and manual repair of timing failures, looping for about 5 iterations until timing is closed.]

Operations permitted at each iteration (in order of preference):
  (1) Vt swap, resizing, buffer insertion, NDR changes, useful skew
  (2) Vt swap, resizing, buffer insertion, NDR changes
  (3) Vt swap, resizing, buffer insertion
  (4) Vt swap, resizing
  (5) Vt swap

Violation classes addressed for each iteration (in order of priority):
  (1) Electrical rule violations
  (2) Noise violations
  (3) Setup violations
  (4) Hold violations

Context I: Race to End of Roadmap
  • Paper model to v1.0 SPICE model: ~12 months @N10
  • Many near-term red bricks: ArF, Cu, low-k
  • Foundry-fabless dynamics: who gives up margin?
  • Time constants limit design-manufacturing co-evolution
  • Mismatches among these time constants
      - (Years) Tech development, app market definition, architecture/front-end design
      - (Months) RTL-to-GDS implementation, reliability qualification
      - (Weeks) Fab latency, cycles of yield learning, design re-spins, mask flows
      - (Days) Process tweaks, design ECOs
  • Model-hardware miscorrelation
  • Model guardbanding
  • Faster node enablement is challenging!

Context II: Low-Power Grand Challenge
  • Drivers: green datacenters, cloud, big data, mobility, Internet of Things
  • Low power = high complexity: multiple supply voltages, power and clock gating, DVFS, MTCMOS, multi-Lgate

  • Increased timing closure burden

Recent History
[Figure: timeline of concerns added to timing closure, by technology node (90nm, 65nm, 45/40nm, 28nm, 20nm, 16/14nm, 10nm, 7nm). Items shown, in transcript order: temperature inversion, max transition, dynamic IR, PBA, fixed-margin spec, noise, EM, MCMM, multi-patterning, MOL and BEOL R, MIS, cell-POCV, physically aware timing ECO, AOCV / POCV, min implant, LVF, BTI, BEOL and MOL variations, signoff criteria with AVS, SoC complexity, fill effects, layout rules.]

Changes I
  • Rise of MOL and BEOL resistivity, variability impacts
  • Multi-patterning, BEOL corner explosion
  • Criticality of margin reduction
  • Higher-dimensional delay/slew modeling; color-aware P&R + signoff
  • Liberty Variation Format (LVF) shows reduced pessimism
[Figure: cross-section of the device, MOL and BEOL stack (Fin, Poly, M0A, M0G, V0, M1, V1, M2, Vint, Mint, M3), with inter-layer dielectric spacing and inter-metal dielectric labeled.]

Changes II
  • Rapid, near-universal adoption of adaptivity (e.g., AVS)
      - Setup violation becomes hazy; removes the DC part of timing margin
[Figure: AVS loop with performance monitor, control block, supply voltage and circuit.]
  • Path-based analysis (PBA) with SI enabled is needed earlier in the flow
      - Runtime, license cost overheads
      - PBA has >4x the runtime of GBA
[Figure: runtime (s) of PBA vs. GBA to find the top 10K timing paths with SI enabled (28 FDSOI), for the JPEG and AES designs.]
  • See: http://vlsicad.ucsd.edu/Publications/Conferences/311/c311.pdf and http://vlsicad.ucsd.edu/Publications/Conferences/325/c325.pdf

New Game, New Goal Posts?
  • Old: 1 mode; setup-hold; SI; Cw only; NLDM
  • New:
      - Technology and design enablement: SPICE; ITF; library/IP; testchips
      - Design synthesis/opt: architecture; RTL; SP&R; timing/noise ECOs
      - Analysis: MIS; SHPR; SI; PBA; -dynamic
      - Modeling: LVF; BEOL/MOL σs; lib groups
      - Signoff: yield vs. slack; MCMM; TBC; AVS; corner vs. flat margins
  • Timing closure now involves: MCMM; cell-POCV / LVF; dynamic IR; wide/exploding corners, corner reduction, cross-corners (BEOL Cw, Ccw, RCw, temp, VDD); flat margin selection; noise closure; aging/AVS

Agenda: Timing Closure and New Contexts; Example Challenges; Example Near-Term Mitigations; Futures and Conclusions

Multi-Input Switching
  • Multi-input switching (MIS) = more than one input switches at the same time
  • Conventional timing libraries consider only single-input switching (SIS)
  • MIS can significantly change arc delays
  • Need a more comprehensive timing model
[Figure: FO3 stage delay (s), axis from 0 to 3.0E-11, for rise_MIS, rise_SIS, fall_MIS and fall_SIS at normal VDD and 80% VDD. Technology: 28FDSOI. Design: chained NAND2 gates with FO3.]

BEOL Multi-Patterning Impacts

[Figure: spacer-based multi-patterning of an Mx metal layer, showing mandrel, spacer, line-end cuts, line-end extensions and floating fill wires, with mandrel width Mwidth, mandrel space Mspace and spacer width Swidth.]
  • Wire1 width = Mwidth
  • Wire2 width = Mspace - 2*Swidth

Placement-Sizing Interference
  • New interferences between post-layout optimization and P&R
  • Rules for device layers (FEOL) become considerably more complex and restrictive
      - Minimum implant width rules for implant regions
      - Minimum notch and jog width rules for oxide diffusion (OD)
[Figure: row of HVT and LVT cells with cell boundaries and OD shapes.]

Placement-Sizing Interference (cont.)
  • Drain-to-drain abutment (DDA)
[Figure: abutting cells showing drain (D) and source (S) terminals, poly, active region, cell boundary and power/ground connection.]
[Figure: example solution, annotated with a DDA violation, min implant width violations and a min jog/notch width violation.]
  • Intertwine the historically separate tasks of P&R and post-route optimization

Corner Explosion
  • Operating modes: nominal, turbo, LP1, LP2
[Figure: Vdd over lifetime for the NOM and Turbo modes.]
  • FE corners: FF, FFG, FS, SF, TT, SSG, SS
[Figure: FE corners (SS, SSG, TT, FFG, FF) ordered by transistor speed.]
  • BE corners: C-worst, Cc-worst, RC-best
[Figure: BEOL cross-section (M1, M2, M3, inter-layer and inter-metal dielectric) with width W, spacing S, thickness T and height H labeled.]

                Typical    C-best    C-worst    RC-best    RC-worst
  $\Delta W$    typical    min       max        max        min
  $\Delta T$    typical    min       max        max        min
  $\Delta H$    typical    max       min        max        min

  • Temp corners: temperature inversion corners
  • Split corners: memory, logic rails with synch interfaces

Agenda: Timing Closure and New Contexts; Example Challenges; Example Near-Term Mitigations; Futures and Conclusions

I. Improved Variation Modeling
  • Monte Carlo path delay simulation shows an asymmetric path delay distribution under process variation
  • Need separate $\sigma$ values for setup and hold analysis
  • LVF can handle such non-Gaussian distributions
[Figure: asymmetric path delay distribution, from [Rithe et al.].]
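
The case for separate setup and hold sigmas can be illustrated with a small Monte Carlo experiment. The sketch below is not from the talk: it assumes a hypothetical lognormal path-delay model (the distribution, its parameters and the helper name `one_sided_sigmas` are all illustrative) and derives an early and a late sigma from the quantiles that would sit at -3 sigma and +3 sigma for a Gaussian.

```python
import random
import statistics

def one_sided_sigmas(delays):
    """Return (early_sigma, late_sigma) from the 0.135% and 99.865% quantiles.

    For a Gaussian these quantiles sit at -3 sigma and +3 sigma, so the two
    values would match; for a skewed distribution they differ.
    """
    ordered = sorted(delays)
    n = len(ordered)
    median = statistics.median(ordered)
    low = ordered[int(0.00135 * (n - 1))]
    high = ordered[int(0.99865 * (n - 1))]
    return (median - low) / 3.0, (high - median) / 3.0

# Hypothetical path: 500 ps nominal delay with right-skewed (lognormal) variation.
rng = random.Random(2015)
samples = [500.0 * rng.lognormvariate(0.0, 0.08) for _ in range(50_000)]

early_sigma, late_sigma = one_sided_sigmas(samples)
print(f"early (hold) sigma  = {early_sigma:.1f} ps")
print(f"late  (setup) sigma = {late_sigma:.1f} ps")
```

With a right-skewed distribution the late sigma comes out larger than the early one, so a single symmetric sigma would be either pessimistic for hold or optimistic for setup.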

II. Tightened BEOL Corners (TBC) [ICCD14]
  • Conventional signoff:
      1. Routed design
      2. Timing analysis using conventional BEOL corners (CBC)
      3. If violations remain, ECO using CBC and re-analyze; otherwise done
  • Our work:
      1. Routed design
      2. Classify timing-critical paths into $G_{TBC}$ and $G_{CBC}$
      3. $G_{TBC}$ paths: timing analysis using TBC; if violations remain, ECO using TBC and re-analyze
      4. $G_{CBC}$ paths: timing analysis using CBC; if violations remain, ECO using CBC and re-analyze
      5. Done when no violations remain

Pessimism in Conventional BEOL Corners (CBC)
  • Assumption: a max (setup) path $p_j$ is safe when the delay evaluated at a given CBC is larger than the nominal delay + $3\sigma_j$:
      $d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})$
  • For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC:
      $\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$
      $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$, with $Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$
  • A small $\alpha_j$ implies there is large pessimism
[Figure: path delay distribution showing $d_j(Y_{typ})$, the $\pm 3\sigma_j$ points and $d_j(Y_{CBC})$; the gap between $+3\sigma_j$ and $d_j(Y_{CBC})$ is the pessimism.]

Scaling Factor $\alpha$ vs. Delay Variation @ Cw, RCw
  • Paths with small $\Delta d_{rcw}$ and $\Delta d_{cw}$ have large $\alpha$
      - E.g., there are $\alpha_j > 0.6$ when (($\Delta d_{rcw}$ < 3%) AND ($\Delta d_{cw}$ < 3%))
  • Identify paths for tightened BEOL corners based on $\Delta d_{rcw}$ and $\Delta d_{cw}$
[Figure: $\alpha_j$ plotted against $\Delta d(Y_{rcw})/d(Y_{typ})$ and $\Delta d(Y_{cw})/d(Y_{typ})$.]
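
The ratio of 3-sigma delay variation to the corner delay shift can be computed directly per path. The sketch below is a minimal illustration, assuming the ratio is written alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ)); the path names, sigmas and delays are hypothetical values chosen for the example, not data from the talk.

```python
def alpha(sigma_j, d_cbc, d_typ):
    """alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    delta = d_cbc - d_typ
    if delta <= 0:
        raise ValueError("corner delay must exceed typical delay")
    return 3.0 * sigma_j / delta

# Hypothetical paths: (name, sigma_j, delay at typical, delay at C-worst), in ps.
paths = [
    ("p1", 4.0, 400.0, 440.0),
    ("p2", 6.0, 400.0, 424.0),
    ("p3", 9.0, 400.0, 425.0),
]

for name, sigma, d_typ, d_cw in paths:
    a = alpha(sigma, d_cw, d_typ)
    # Safety assumption: the corner delay covers nominal delay + 3 sigma.
    covered = d_cw >= d_typ + 3.0 * sigma
    print(f"{name}: alpha = {a:.2f}, corner covers 3-sigma: {covered}")
```

Here p1 has a small alpha (the corner is far beyond 3 sigma, so it is pessimistic for that path), while p3 has alpha above 1.0, meaning this corner alone would underestimate its variation.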

Practical Filter for TBC-Amenable Paths
  • $G_{tbc}$ = paths which can be safely signed off using tightened corners:
      (path with $\Delta d_{cw}$ larger than $A_{cw}$) OR (path with $\Delta d_{rcw}$ larger than $A_{rcw}$)
[Figure: paths plotted by $\Delta d(Y_{cw})/d(Y_{typ})$ and $\Delta d(Y_{rcw})/d(Y_{typ})$, with thresholds $A_{cw}$ and $A_{rcw}$.]

Benefits of Tightened BEOL Corners
  • #Timing violations reduced by 24% to 100% [Moore's Law: 1% / week!]
  • WNS and TNS are reduced by up to 100ps and 53ns
  • TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. #paths which use TBC
[Figure: WNS (ns), TNS (ns) and #timing violations for CBC, TBC-0.5, TBC-0.6 and TBC-0.7 on the LEON, NETCARD and SUPERBLUE12 designs.]

III. Flexible FF Timing Margin Recovery [ISQED14]
  • Setup time, hold time and clock-to-q (c2q) delay of a FF: values are interdependent, but NOT fixed
  • Flexible FF timing model can exploit operating (function/test) modes
  • Free pessimism reduction in STA
  • Goal: find the best {setup, hold, c2q} for each FF instance
  • Sequential LP: setup-c2q optimization, hold-c2q optimization
[Figure: setup-hold-c2q flexible model (c2q1 ... c2qn) vs. setup-hold-c2q fixed model.]

[Figure: c2q-setup-hold surface, with setup-c2q and hold-c2q tradeoff curves.]

Flexible Timing Model Reduces Pessimism
  • Independent datapaths in PBA: using a fixed FF timing model loses performance optimization opportunity
[Figure: example with flip-flops FF1, FF2 and FF3 and datapath delays of 460ps, 470ps and 480ps. Each flip-flop can trade c2q against setup (10ps vs. 20ps); path totals are annotated "500ps", "500ps!" and "520ps?".]

Improved Timing Signoff Flow
  1. Netlist (and SPEF, if routed)
  2. Extract path timing information
  3. LP formulation with flexible flip-flop timing model
  4. Solve sequential LP (STA_FTmax, STA_FTmin) to obtain a solution
  5. Annotate new timing model for each flip-flop
  6. Timing signoff with annotated timing

Takeaways
  • Fix timing violations for free
  • 48ps average improvement of slack over 5 designs in a foundry 65nm technology
  • Better exploitation of disjoint cycles/modes
  • More accurate modeling of setup-hold-c2q tradeoff
  • Circuit optimization should natively exploit FF timing model flexibility
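
The benefit of letting each flip-flop pick its own point on the setup-c2q tradeoff can be seen on a three-flop ring. The sketch below is a simplified illustration, not the sequential LP from the talk: it swaps in exhaustive enumeration over two discrete (setup, c2q) operating points per flip-flop and ignores hold. The clock period, operating points and path delays are hypothetical, loosely following the magnitudes on the slide.

```python
from itertools import product

CLOCK_PERIOD = 500  # ps, hypothetical

# Hypothetical (setup, c2q) operating points available to every flip-flop, in ps.
OPTIONS = {"A": (10, 20), "B": (20, 10)}

# Hypothetical datapaths: (launch FF, capture FF, combinational delay in ps).
PATHS = [("FF1", "FF2", 480), ("FF2", "FF3", 470), ("FF3", "FF1", 460)]
FFS = ["FF1", "FF2", "FF3"]

def worst_slack(choice):
    """Worst setup slack when each FF uses the operating point named in `choice`."""
    slacks = []
    for launch, capture, delay in PATHS:
        c2q = OPTIONS[choice[launch]][1]      # launching FF contributes c2q
        setup = OPTIONS[choice[capture]][0]   # capturing FF contributes setup
        slacks.append(CLOCK_PERIOD - (c2q + delay + setup))
    return min(slacks)

# Fixed model: every flip-flop is characterized at the same point "A".
fixed = {ff: "A" for ff in FFS}

# Flexible model: try every assignment and keep the one with the best worst slack.
best = max(
    (dict(zip(FFS, combo)) for combo in product(OPTIONS, repeat=len(FFS))),
    key=worst_slack,
)

print("fixed model worst slack:", worst_slack(fixed), "ps")
print("flexible model choice  :", best, "worst slack:", worst_slack(best), "ps")
```

In this toy case the fixed model fails the 480ps path by 10ps, while the flexible assignment closes all three paths with no netlist change. Enumeration is exponential in the number of flip-flops, which is why a real flow needs an LP-style formulation.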

IV. Better Signoff Definition [DATE13]
  • VBTI: voltage for BTI-aging estimation
  • Vlib: supply voltage for timing library characterization
  • Vfinal: Vdd of a circuit with AVS at end-of-lifetime
  • Chicken & egg loop:
      - VBTI sets the BTI degradation $|\Delta V_t|$, which together with Vlib gives the derated library for circuit implementation and signoff
      - Circuit implementation depends on VBTI and Vlib
      - VBTI and Vlib depend on aging during AVS (Vfinal)
      - Vfinal (BTI degradation and AVS) depends on the circuit

Observations and Heuristics
  • Observation #1: Vfinal is not sensitive to cells along the timing-critical path
  • Observation #2: $\Delta V_t$ with a constant Vfinal throughout lifetime ≈ $\Delta V_t$ with adaptive Vdd
  • Heuristic #1: use the average of critical path replicas to estimate Vfinal (Vheur)
  • Heuristic #2: approximate Vdd in AVS by constant Vheur
  • Solve the chicken & egg loop by having VBTI = Vlib = Vheur ≈ Vfinal

Experimental Results: A Knee Point
  • Optimistic aging library: large power penalty
  • Overly pessimistic aging library: large area penalty
  • Ignore AVS: larger area
  • Our method finds the knee point for a balanced area and power tradeoff

               Low Vlib                      High Vlib
  Low VBTI     Slower circuit, less aging    Faster circuit, less aging
  High VBTI    Slower circuit, more aging    Faster circuit, more aging

  • Experiment setup: DC/AC BTI @ 125C; 32nm PTM technology; 4 benchmark circuit implementations

Agenda: Timing Closure and New Contexts; Example Challenges; Example Near-Term Mitigations; Futures and Conclusions

Food for Thought
  • EDA tool innovation in the timing closure space has been helpful
      - E.g., physically-aware ECO, dynamic IR-aware STA
  • Process and device innovation will continue to challenge timing closure
      - Actual foundry-specific metal fill early in design
      - Process enhancement (e.g., air gap)
      - Self-heating from high current density in FinFET
  • What about SoC-level design closure complexity?
      - Better timing budgeting, constraints evolution, coordination of top- vs. block-level effort

Look Out For
  • Margin becomes scarcer
      - Low-hanging fruit is being rapidly harvested
      - Critical: better analysis accuracy, model-hardware correlation at extreme modes
  • BEOL + MOL + multi-patterning
      - Resistance scaling, pitch scaling, variation: a delicate balancing act
      - Need better modeling and corner definition
      - Bring together library, placement, routing, STA
  • Variation modeling
      - Statistical SPEF
      - LVF, unified model of PVT variation (reduce #libraries!)
  • Signoff
      - Wide adoption of adaptivity (e.g., AVS) with new signoff criteria/goals
      - Design-specific tightened corners
      - Cross corners (FSG, SFG)
  • Thermal and stress? 3D integration!

Thanks to
  • Rob Aitken, for inviting this talk
  • Christian Lutkemeyer, Isadore Katz, Sorin Dobre, Tuck-Boon Chan, Kwangok Jeong, Nancy MacDonald and John Redmond, for discussions and inputs
  • UCSD VLSI CAD Laboratory students: Hyein Lee, Jiajia Li, Mulong Luo, Yaping Sun, Wei-Ting Jonas Chan

THANK YOU!

BACKUP SLIDES

Delay Variation
  • Some paths have $\alpha$ > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
      - Where RC-worst is the dominant corner: $\Delta$delay at RC-worst > $\Delta$delay at C-worst
      - Where C-worst is the dominant corner: $\Delta$delay at C-worst > $\Delta$delay at RC-worst
      - C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
      - $\alpha$ < 1.0: delay variations are covered by the RC-worst corner
[Figure: $\alpha$ plotted over $\Delta$delay at C-worst, $[d(Y_{cw}) - d(Y_{typ})] / d(Y_{typ})$, and $\Delta$delay at RC-worst, $[d(Y_{rcw}) - d(Y_{typ})] / d(Y_{typ})$.]

Aging Signoff Corner with AVS
  • Timing signoff: ensure the circuit meets its performance target under PVT variations and aging
  • Conventional signoff approach:
      - Analyze circuit timing at worst-case corners
      - Fix timing violations, re-run timing analysis
  • What is the Vdd signoff corner for aging + AVS?

  Circuit performance model (Vlib) across, BTI model (VBTI) down:

                     Low Vdd (Vlib)                                  High Vdd (Vlib)
  Low Vdd (VBTI)     Slower circuit, less aging: ?                   Faster circuit, less aging: too optimistic
  High Vdd (VBTI)    Slower circuit, more aging: too pessimistic     Faster circuit, more aging: ?

Minimum Implant Area Constraint
  • Small feature sizes cannot be patterned with ArF
[Figure: k1 factor (0 to 0.7) vs. technology node (130nm, 90nm, 65nm, 45nm, 32nm, 22nm, 14nm, 10nm) for 436nm, 365nm, 248nm and 193nm (ArF) lithography at NA from 0.5 to 1.35, with 2D and 1D practical limits marked. Source: L. Liebmann and A. Torres, DAC, 2011.]
  • One example of challenges: control of implant area
      - Minimum implant area is constrained
      - A narrow cell cannot be sandwiched between cells of a different Vt
[Figure: a narrow Vt1 cell between two Vt2 cells, violating the min implant area constraint.]
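
The "sandwiched narrow cell" rule lends itself to a simple row scan. The sketch below is an assumption-laden illustration rather than a foundry rule deck: it treats a placed row as a list of (Vt, width) cells, merges adjacent same-Vt cells into implant runs, and flags interior runs narrower than a minimum width. The function name, the width units (placement sites) and the example row are hypothetical.

```python
from itertools import groupby

def implant_violations(row, min_width):
    """Return (start_index, vt, width) for each interior same-Vt run narrower than min_width.

    `row` is a left-to-right list of (vt, width) cells. Runs touching either end
    of the row are skipped (assumption: what lies beyond the row end is unknown).
    """
    runs = []
    index = 0
    for vt, cells in groupby(row, key=lambda cell: cell[0]):
        cells = list(cells)
        runs.append((index, vt, sum(width for _, width in cells)))
        index += len(cells)
    return [
        run
        for i, run in enumerate(runs)
        if 0 < i < len(runs) - 1 and run[2] < min_width
    ]

# Hypothetical row: widths in placement sites, minimum implant width of 4 sites.
row = [("HVT", 6), ("LVT", 2), ("HVT", 5), ("HVT", 3), ("LVT", 4), ("LVT", 1), ("HVT", 8)]
print(implant_violations(row, min_width=4))  # prints [(1, 'LVT', 2)]
```

A Vt swap during timing ECO changes the `vt` field of one cell, which can split or shrink a run and create a new violation. That coupling is why sizing and placement can no longer be treated as separate steps.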
