Design Productivity Crisis - VAST lab

Design Productivity Crisis - VAST lab

Task 1091.001: Highly Scalable Placement by Multilevel Optimization Task Leaders: Jason Cong (UCLA CS) and Tony Chan (UCLA Math) Students with Graduation Dates: Michalis Romesis (UCLA CS, March 2005 ---graduated) Kenton Sze (UCLA Math, July 2006 --- graduated) Min Xie (UCLA CS, September 2006 --- graduated) Guojie Luo (UCLA CS, September 2010) Research Staff: Joe Shinnerl, UCLA CS Industrial Liaisons Patrick McGuinness, Freescale Semiconductor, Inc. Natesan Venkateswaran, IBM Corporation Amit Chowdhary, Intel Corporation 20/1/30 UCLA VLSICAD LAB 2 Task Description and Anticipated Result Highly scalable multilevel, multiheuristic placement algorithms that address the critical placement needs of nanometer designs: scalability multi-constraint optimization --- timing, routability, power, manufacturability, etc. support of mixed-sized placement and incremental design. Quantitative study of the optimality and scalability of placement algorithms Construction of synthetic benchmarks with known optima to identify the

deficiencies of existing methods Our goal is to achieve one-process-generation benefit through innovation of physical-design technologies, especially placement. 20/1/30 UCLA VLSICAD LAB 3 Task Deliverables Report on new placement benchmarks with known optimal or near optimal solutions for all major objectives and constraints. Scalability and optimization studies on existing placement techniques (Completed 3-Nov-2003) Experiments and reports on the applicability of integrated AMG-based weighted aggregation and weighted interpolation. Improvement measured on both PEKO examples and industrial examples from SRC member companies ( Completed 1-Jun2004) Experiments and reports on multiheuristic, multilevel relaxation and the scalable incorporation of complex constraints into the enhanced multilevel framework. Improvement measured on both PEKO and industrial examples (Completed 1-Jun2005) A highly scalable placement tool that (i) supports multi-constraint optimization, mixed-sized placement, and incremental design and (ii) produces best-of-class results for both PEKO and industrial examples from SRC member companies (Completed 1-Jun-2006) Final report summarizing research accomplishments and future direction (PlannedOct-31, 2006) 20/1/30 UCLA VLSICAD LAB 4 Accomplishments in the Past Year

1. Improvements in mPL for routing density control [Best quality, ISPD 2006 contest] 2. Thermal-Driven Placement 3. Heterogeneous Placement 20/1/30 UCLA VLSICAD LAB 5 Relative Wirelength A Brief History of mPL mPL 1.0 [ICCAD00] UNIFORM CELL SIZE ESC Clustering Goto relaxation mPL 1.1 FC clustering Partitioning added to legalization mPL 2.0 RDFL relaxation Primal-dual netlist pruning mPL 3.0 [ICCAD03] QRS relaxation AMG interpolation Multiple V cycles NON-UNIFORM CELL SIZE 2000

20/1/30 2001 2002 mPL 5.0 mPL 4.0 Improved DP Backtracking V cycle 2003 2004 UCLA VLSICAD LAB Multilevel force directed Mixed-size capability mPL 6.0 Enhanced Routability handling 2005 2006 year 6 mPL: Generalized Force-Directed Placement Use of accurate objective functions [Bertsekas, 82, Naylor et al, 01] Optimization-based bin-density constraint formulation min

W ( x) s .t . ( x) , 1 1 where ( x ) d ( x ), c . Iterative Uzawa solver W ( x k 1 ) k ( x k ) ( k 1 ) ij ( k ) ij ( ij ) is a generalized force Multilevel for better runtime and wirelength 20/1/30 UCLA VLSICAD LAB 7 Accomplishments in the Past Year 1. Improvements in mPL for routing density control [Best quality, ISPD 2006 contest] 2. Thermal-Driven Placement 3. Heterogeneous Placement 20/1/30 UCLA VLSICAD LAB 8

Core Engine for Density Control Initial Finest Problem Final Placement coarsening One V cycle with comparable quality Minimum perturbation in the last stages of GFD interpolation coarsening Significant speed up without losing solution quality interpolation coarsening interpolation Coarsest Problem Overall scheme Routing density handling Residual density in each bin Even distribution of dummy density into bins Cell area inflation for better convergence GFD with Density Control Minimun perturbation 20/1/30 UCLA VLSICAD LAB 9

Macro Spreading Need area density below target value A1 w H [Nam, ISPD06] w2 Target distance between neighboring w1 W A2 fij macros A1 A2 w w1 w2 H : target density Spreading represented as objective n n min dxi dyi i 1 n fx ij fyij

i 1 i 1, j i dxi and dyi : perturbation 20/1/30 Hij x fxij and fyij : piece-wise linear function UCLA VLSICAD LAB 10 Experiment Results on ISPD06 mPL6 produces the best solution quality using ISPD06 routability-driven metric 20/1/30 UCLA VLSICAD LAB 11 Demonstration of mPL6 http://cadlab.cs.ucla.edu/cpmo/videos/mPL6-density.wmv 20/1/30 UCLA VLSICAD LAB 12 Accomplishments in the Past Year 1. Improvements in mPL core engine for mixed-size global placement 2. Thermal-Driven Placement 3. Heterogeneous Placement 20/1/30

UCLA VLSICAD LAB 13 Motivation High power density due to technology scaling Problems caused by high temperature Hot spots become more harmful Higher temperature Higher leakage power More heat Previously negligible effects become first-order effects Difficult estimation for power, timing, etc 20/1/30 UCLA VLSICAD LAB 14 Thermal Model P Cxy Cz Tj,1 Tj,4 Ti Tj,3 Tj,2 One layer mesh to model the substrate j (Ti - Tj) Cxy + (Ti Tsink) Cz = Pi

Cxy, Cz are the thermal conductance for the substrate and the heat sink Tsink Solved by Fast DCT Solve T from CT = P, given C and P Diagonalize C = T is the discrete cosine matrix is a diagonal matrix T = -1-1 P 20/1/30 UCLA VLSICAD LAB 15 Formulation & Solution minimize WL( x ) subject to i ( x) i ( x) (Nonoverlap) Ti ( x) ti ( x) Tdes (Temperatu re) Implement i(x) and ti(x) with filler cells and filler power without area Tdes is a given by user Solved by Uzawa Algorithm WL( x ( k 1) ) (i k ) i ( x ( k 1) ) i( k )Ti ( x ( k 1) ) 0 i i

(i k 1) (i k ) ( i ( x ( k 1) ) ) i( k 1) i( k ) (Ti ( x ( k 1) ) Tdes ) As additional thermal-aware GFD following a WL-driven V-Cycle 20/1/30 UCLA VLSICAD LAB 16 Experiment Results on IBM-FastPlace circuit T_even ibm01 ibm02 ibm03 ibm04 ibm05 ibm06 ibm07 ibm08 ibm09 ibm10 ibm11 ibm12 ibm13 ibm14 ibm15 ibm16 ibm17 ibm18 Average 60 60 60 60 60 60 60 60

60 60 60 60 60 60 60 60 60 60 20/1/30 Final Initial T WL T WL Qual impr WL incr 68.65 67.78 68.99 77.00 66.89 68.37 69.93 70.42 70.14 71.07 67.90 72.20 66.37 69.56

66.76 71.87 76.38 73.69 70.22 1.62E+006 3.62E+006 4.80E+006 5.94E+006 9.41E+006 4.90E+006 8.22E+006 9.38E+006 9.44E+006 1.79E+007 1.42E+007 2.25E+007 1.72E+007 3.31E+007 3.95E+007 4.48E+007 6.16E+007 4.29E+007 60.34 60.30 60.38 61.21 60.38 60.42 60.62 61.80 60.47 60.89 60.69 61.74 60.87 61.18 60.73 61.03 61.33 61.19 60.87

1.71E+006 3.82E+006 5.08E+006 6.45E+006 9.99E+006 5.01E+006 8.72E+006 9.59E+006 9.88E+006 1.85E+007 1.50E+007 2.38E+007 1.78E+007 3.47E+007 4.02E+007 4.58E+007 6.42E+007 4.51E+007 96.1% 96.1% 95.7% 92.9% 94.4% 95.0% 93.8% 82.7% 95.3% 91.9% 91.3% 85.7% 86.3% 87.6% 89.1% 91.4% 91.9% 91.3% 91.6% 1.06 1.05 1.06 1.08

1.06 1.02 1.06 1.02 1.05 1.03 1.05 1.06 1.03 1.05 1.02 1.02 1.04 1.05 1.05 UCLA VLSICAD LAB Quality improvement (Tinit T final ) (Tinit Teven ) Teven is the ideal temperature with the same total power Max. on-chip temperature: Tinit after Step 1 Tfinal = Tdes after Step More than 90% quality improvement within 5% WL increase 17 Accomplishments in the Past Year 1. Improvements in mPL for routing density control [1st quality, ISPD 2006 contest] 2. Thermal-Driven Placement 3. Heterogeneous Placement

20/1/30 UCLA VLSICAD LAB 18 Motivation Need for placement on array type chips with pre-fabricated resources FPGA Structured ASIC Need for heterogeneous capability Memory, DSP, etc Block on sites of the same type 20/1/30 UCLA VLSICAD LAB 19 Related Work Academia VPR [Betz & Rose 97], PATH [Kong 02], SPCD [Chen & Cong 04,05], PPFF [Maidee et al, 03], CAPRI [Gopalakrishnan et al, 06] [ Most comparisons to out-dated tools No heterogeneous capability Industry Quartus II [Altera Corp.], ISE [Xilinx Inc.] Proprietary chips only Techniques not publicly documented 20/1/30

UCLA VLSICAD LAB 20 Heterogeneous Placement by mPL-H First analytical placer for heterogeneous placement Framework based on mPL6 [Chan et al, 05] Multiple layered placement DSP M-RAM One logical layer for each resource Forbidden regions blocked by obstacles LAB Uniform wirelength computation 20/1/30 Filler cells on each layer UCLA VLSICAD LAB 21 Demonstration of mPL-H http://cadlab.cs.ucla.edu/cpmo/videos/mPL-H.wmv 20/1/30

UCLA VLSICAD LAB 22 Experiment Setting Verilog netlist Quartus_map Quartus_fitter Clustered .vqm netlist Chip type mPL-H .xml Stratix Description .qsf placement Quartus_router 20/1/30 UCLA VLSICAD LAB 23 Wirelength Comparison circuit #LAB fip_risc8 mux64_16bit mux8_128bit oc_cordic_p2r oc_cordic_r2p oc_aes_core oc_aes_core_inv oc_aquarius oc_cfft_1024x12

oc_des_des3area oc_des_des3perf oc_des_perf_opt oc_fpu oc_mem_ctrl oc_mips oc_oc8051 oc_video_compression_systems_dct oc_video_compression_systems_jpeg oc_wb_dma os_blowfish oc_ethernet Avg. 219 150 141 111 157 181 183 646 191 120 1569 550 793 387 387 331 4440 3924 396 168 265 20/1/30 Quartus 5.0 PWL RWL 6322 15828 4560

9464 3608 7328 2982 6852 4239 8848 16362 27100 18639 30288 32684 78616 5915 12256 8273 15588 62982 116312 17443 31548 26570 64324 16502 38428 18550 43916 13640 31172 165835 423292 139076 370064 25614 57128 27023 43832 12013 24812 1.00 1.00 PWL 5872

4582 3541 2786 3889 15537 17962 31703 5757 8139 59611 17096 24583 16808 18639 13332 172902 142234 24999 22569 11935 mPL-H ratio RWL 0.93 15304 1.00 9728 0.98 6556 0.93 6264 0.92 8260 0.95 25944 0.96 30680 0.97 78280 0.97 11988 0.98 15472

0.95 118028 0.98 32476 0.93 63628 1.02 39132 1.00 43260 0.98 30520 1.04 436708 1.02 368528 0.98 57704 0.84 39804 0.99 25128 0.97 WL still important for architecture evaluation mPL-H is 3% better in HPWL, and 2% better in routed WL than Quartus II v5.0 UCLA VLSICAD LAB ratio 0.97 1.03 0.89 0.91 0.93 0.96 1.01 1.00 0.98 0.99

1.01 1.03 0.99 1.02 0.99 0.98 1.03 1.00 1.01 0.91 1.01 0.98 24 1800 1600 1400 1200 1000 800 600 400 200 0 Quartus II 5.0 20/1/30 mPL-H 44 40 15 69 64 6 39

6 38 7 26 5 19 1 18 1 15 7 14 1 #LAB 11 1 runtime(s) Runtime Comparison mPL-H can be 2X faster than Quartus II v5.0 when the circuit becomes sufficiently large UCLA VLSICAD LAB 25 Overall Accomplishments Over the Funding Period circuit ibm01 ibm02 ibm03 ibm04 ibm05 ibm06 ibm07

ibm08 ibm09 ibm10 ibm11 ibm12 ibm13 ibm14 ibm15 ibm16 ibm17 ibm18 Avg. mPL3 WL runtime(s) 1.88E+06 211 3.77E+06 456 5.45E+06 501 6.34E+06 565 1.06E+07 876 5.88E+07 756 9.62E+06 1000 9.77E+06 1578 1.10E+07 1243 1.98E+07 2017 1.67E+07 1568 2.51E+07 2124 2.04E+07 2281 3.74E+07 3776

4.57E+07 5289 5.10E+07 5891 7.07E+07 7022 4.86E+07 7127 1.00 1.00 20/1/30 mPL5 WL runtime(s) 1.65E+06 63 3.62E+06 128 4.58E+06 110 5.71E+06 147 9.96E+06 158 5.12E+07 197 8.18E+06 260 9.28E+06 379 9.24E+06 373 1.74E+07 444 1.40E+07 439 2.23E+07 510 1.65E+07 570 3.14E+07 1095

3.88E+07 1375 4.34E+07 1532 6.15E+07 1685 4.13E+07 1853 0.87 0.26 circuit adaptec1 adaptec2 adaptec3 adaptec4 bigblue1 bigblue2 bigblue3 bigblue4 Avg. mPL5 WL runtime(s) 8.72E+07 5296 1.05E+08 5691 2.65E+08 7015 2.29E+08 6433 1.12E+08 3454 2.01E+08 7145 4.33E+08 32825 9.53E+08 36343 1.20 1.31

34% reduction in WL over 3 years One technology generation advancement UCLA VLSICAD LAB mPL6 WL runtime(s) 7.79E+07 2894 9.20E+07 2995 2.14E+08 9353 1.94E+08 8812 9.68E+07 3636 1.52E+08 10207 3.44E+08 13564 8.29E+08 30540 1.00 1.00 26 Technology Transfer in 2006 Discussions at conferences and workshops ASPDAC 2006, Yokohama, Japan ISPD 2006, San Jose, USA DAC 2006, San Francisco, USA Benchmark Releases (PEKO-MS) http://cadlab.cs.ucla.edu/~pubbench mPL release: http://cadlab.cs.ucla.edu/src_686_mpl/

20/1/30 UCLA VLSICAD LAB 27 Software Download Record PEKO/PEKU [2002 now] More than 360 downloads SRC member companies Cadence, IBM, Intel, Mentor Graphics,etc. NON-SRC member companies Synopsys, Magma, Monterey Design, etc. Universities CMU, Michigan, MIT, UC Berkeley, UCSD, etc., mPL [2001 now] More than 480 downloads SRC member companies Cadence, Intel, Mentor Graphics,etc. NON-SRC member companies Synopsys, Magma, Intrinsity, Oasys, etc. Universities

20/1/30 CMU, Michigan, Stanford, UCSD, Natl Taiwan U., etc., UCLA VLSICAD LAB 28 Publications in 2006 Conference papers ASPDAC 2006: J. Cong, M. Xie, A Robust Detailed Placement for Mixed-size IC Designs. ISPD 2006: T. F. Chan, J. Cong, J. Shinnerl, K. Sze and M. Xie, mPL6: Enhanced Multilevel Mixed-size Placement. Thesis Kenton Sze, Multilevel Optimization for VLSI Circuit Placement. Min Xie, Constraint-Driven Large Scale Circuit Placement Algorithms. 20/1/30 UCLA VLSICAD LAB 29 Room for Further Improvement? mPL4 20/1/30 mPL5 Swirls are difficult to correct with localized refinement UCLA VLSICAD LAB 30

Recently Viewed Presentations

  • DNA - Pierre T.F. Riggs Football - Home

    DNA - Pierre T.F. Riggs Football - Home

    _Replication_: Copying DNA to make 2 identical DNA before cell Divides _Transcription__: Using DNA template to form single stranded RNA that leaves the nucleus and transports the information to the ribosomes _Translation____: Use the RNA message to produce specific protein...
  • CS 294-5: Statistical Natural Language Processing

    CS 294-5: Statistical Natural Language Processing

    Ordering of elimination of hidden variables can affect size of factors generated. Worst case: running time exponential in the size of the Bayes' net ... Evidence influences the choice of downstream variables, but not upstream ones (C isn't more likely...
  • Process and Process Models

    Process and Process Models

    Informal Approach to Analysis Data Flow Modeling Example DFD: Enrolling in a University Data flow diagrams Data flow diagrams Data flow diagrams… DFD Example DFD Conventions Data flow diagrams… Other Approaches to RA Requirements Specification Characteristics of an SRS Characteristics…
  • Priotization Process

    Priotization Process

    PROPOSED TIMETABLE. Rubrics by Tuesday, February 19. Special Meeting on . February 22, from 1 - 2 pm for presentations. Electronic votes by Sunday, February 24. Vote outcome to be presented at . Roundtable on February 25, 2013. Vote outcome...
  • Meditační workshop

    Meditační workshop

    [email protected] Meditace „Akt věnování pozornosti pouze jedné věci, ať už coby náboženská aktivita, či coby způsob pro dosažení zklidnění a relaxace." Cambridge Dictionary
  • Smoke Free Campus Effective January 1, 2008 Smoke

    Smoke Free Campus Effective January 1, 2008 Smoke

    Secondhand smoke increases a non-smoker's risk for heart disease and worsens symptoms of adults already suffering from asthma, allergies or bronchitis. UNC supports this expanded dimension of the no-smoking policy because of its tremendous health benefits for the entire University...
  • High throughput crystallography à la SGC

    High throughput crystallography à la SGC

    28 March 2007 Bootstrap in refinement Gábor Bunkóczi Bootstrap - basics Bootstrap - aims Bootstrap - algorithm Bootstrap - implementation Bootstrap - results Bootstrap - development Bootstrap in refinement Gábor Bunkóczi Bootstrap - basics Bootstrap - aims Bootstrap - algorithm...
  • Photo Album

    Photo Album

    To assist the Northern Rivers community to access legal advice and representation to advance animal welfare through the legal system, and engage in law reform and community education activities to improve the legal protections for animals. Objectives •Act as a...