CROW: A Low-Cost Substrate for Improving DRAM Performance, Energy Efficiency, and Reliability

Hasan Hassan, Minesh Patel, Jeremie S. Kim, A. Giray Yaglikci, Nika Mansouri Ghiasi, Saugata Ghose, Nandita Vijaykumar, Onur Mutlu

Summary
Source code available in July: github.com/CMU-SAFARI/CROW
Challenges of DRAM scaling:
  • High access latency: bottleneck for improving system performance/energy
  • Refresh overhead: reduces performance and consumes high energy
  • Exposure to vulnerabilities (e.g., RowHammer)
Copy-Row DRAM (CROW): introduces copy rows into a subarray
[Figure labels: regular rows, copy rows, sense amplifiers (SA), regular row decoder, CROW decoder]
The benefits of a copy row:
  • Efficiently duplicating data from a regular row to a copy row
  • Quick access to a duplicated row
  • Remapping a regular row to a copy row
CROW is a flexible substrate with many use cases:
  • CROW-cache & CROW-ref (20% speedup and 22% less DRAM energy)
  • Mitigating RowHammer
  • We hope CROW enables many other use cases going forward

Outline
  1. DRAM Operation Basics
  2. The CROW Substrate
     • CROW-cache: Reducing DRAM Latency
     • CROW-ref: Reducing DRAM Refresh
     • Mitigating RowHammer
  3. Evaluation
  4. Conclusion

DRAM Organization
[Figure labels: CPU, Memory Controller, Memory Bus, DRAM Subarray, DRAM Row, DRAM Cell, Sense Amplifier]

Accessing DRAM
[Figure labels: DRAM Subarray, DRAM Row, DRAM Cell, Sense Amplifier]
Commands shown: Activate, Read, Precharge

Challenges of DRAM Scaling
  1. Access latency
  2. Refresh overhead
  3. Exposure to vulnerabilities

Our Goal
We want a substrate that enables the duplication and remapping of data within a subarray.

The Components of CROW
[Figure labels: DRAM Subarray, regular rows, copy rows, sense amplifiers (SA), regular row decoder, CROW decoder; CROW-table inside the Memory Controller]

CROW Operation 1: Row Copy
[Figure: the Memory Controller issues ACT-c (copy) to a DRAM subarray with regular rows, copy rows, sense amplifiers (SA), a regular row decoder, and a CROW decoder]

Row Copy: Steps
[Figure annotations: source row, destination row, Sense Amplifier]
  1. Activation of the source row
  2. Charge sharing
  3. Beginning of restoration
  4. Activation of the destination row
  5. Restoration of both rows to source data
Enables quickly copying a regular row into a copy row.

CROW Operation 2: Two-Row Activation
[Figure: the Memory Controller issues ACT-t (two row) to a DRAM subarray with regular rows, copy rows, sense amplifiers (SA), a regular row decoder, and a CROW decoder]

Two-Row Activation: Steps
[Figure annotations: both cells charged or both discharged; fast; Sense Amplifier]
  1. Activation of two rows
  2. Charge sharing
  3. Restoration
Enables fast access to data that is duplicated across a regular row and a copy row.
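The two CROW operations above can be illustrated with a small functional model. This is only a sketch under stated assumptions: the `Subarray` class, its field names, and the method names `act_c` and `act_t` are hypothetical, rows are modeled as lists of bits, and no charge or timing behavior is simulated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subarray:
    """Toy functional model of one subarray (hypothetical, not from the slides)."""
    regular_rows: List[List[int]]
    copy_rows: List[List[int]] = field(default_factory=list)

    def act_c(self, src: int, dst: int) -> List[int]:
        """ACT-c: activate regular row `src`, then copy row `dst`.

        The sense amplifiers hold the source data, so restoration writes
        the source data into both rows.
        """
        sensed = list(self.regular_rows[src])  # steps 1-3: source row is sensed
        self.copy_rows[dst] = list(sensed)     # steps 4-5: destination row takes source data
        return sensed

    def act_t(self, row: int, dst: int) -> List[int]:
        """ACT-t: activate a regular row and its duplicate copy row together."""
        if self.regular_rows[row] != self.copy_rows[dst]:
            # The slides state that both cells are charged or both discharged,
            # so this model rejects rows that do not hold identical data.
            raise ValueError("ACT-t requires both rows to hold the same data")
        return list(self.regular_rows[row])


subarray = Subarray(regular_rows=[[1, 0, 1, 1], [0, 0, 1, 0]], copy_rows=[[0, 0, 0, 0]])
subarray.act_c(src=0, dst=0)       # duplicate regular row 0 into copy row 0
data = subarray.act_t(row=0, dst=0)  # later access activates both rows together
```

The model only captures the data-level effect: after ACT-c the copy row duplicates the source row, and ACT-t is valid only for such a duplicated pair.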

CROW-cache
Problem: High access latency
Key idea: Use copy rows to enable low-latency access to most-recently-activated regular rows in a subarray
CROW-cache combines:
  • Row copy: copy a newly activated regular row into a copy row
  • Two-row activation: activate the regular row and copy row together on the next access
Reduces activation latency by 38%

CROW-cache Operation
[Figure: Memory Controller with a Request Queue and a CROW-table (entry: copy row 0 -> row X); DRAM subarray with regular rows, copy rows, sense amplifiers (SA), a regular row decoder, and a CROW decoder; commands ACT-c and ACT-t]
Request Queue: load row X, [bank conflict], load row X
First load of row X:
  1. CROW-table miss
  2. Allocate a copy row
  3. Issue ACT-c (copy)
Second load of row X:
  1. CROW-table hit
  2. Issue ACT-t (two row)
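The miss and hit paths above can be sketched as a lookup in a small table. This is a minimal sketch, not the authors' implementation: the class name `CrowTable`, the `activate` method, and the least-recently-used replacement policy are assumptions chosen to match the "most-recently-activated rows" idea, and DRAM-side timing is ignored.

```python
from collections import OrderedDict


class CrowTable:
    """Hypothetical per-subarray CROW-table: regular row -> copy row index."""

    def __init__(self, num_copy_rows: int = 8):
        self.num_copy_rows = num_copy_rows
        self.entries = OrderedDict()  # ordered from least to most recently activated

    def activate(self, row: int):
        """Return the (command, copy_row) pair to issue when activating `row`."""
        if row in self.entries:
            # CROW-table hit: the row is already duplicated, so use ACT-t.
            self.entries.move_to_end(row)
            return ("ACT-t", self.entries[row])
        # CROW-table miss: allocate a copy row, then copy with ACT-c.
        if len(self.entries) < self.num_copy_rows:
            copy_row = len(self.entries)  # a copy row is still unused
        else:
            # Assumption: evict the least recently activated entry.
            _, copy_row = self.entries.popitem(last=False)
        self.entries[row] = copy_row
        return ("ACT-c", copy_row)


table = CrowTable(num_copy_rows=8)
first = table.activate(42)   # miss: ("ACT-c", 0)
second = table.activate(42)  # hit:  ("ACT-t", 0)
```

With 8 copy rows per subarray, as in the evaluated configuration, the table stays small enough to track in the memory controller.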

Takeaway: the second activation of row X (the CROW-table hit) is faster.

CROW-ref

Problem: Refresh has high overheads. Weak rows lead to a high refresh rate.
  • Weak row: at least one of the row's cells cannot retain data correctly when the refresh rate is decreased
Key idea: Safely reduce the refresh rate by remapping a weak regular row to a strong copy row
CROW-ref uses:
  • Row copy: copy a weak regular row to a strong copy row
CROW-ref eliminates more than half of the refresh requests

CROW-ref Operation
[Figure labels: Retention Time Profiler; subarray rows marked strong or weak; sense amplifiers (SA)]
  1. Perform retention time profiling
  2. Remap weak rows to strong copy rows
  3. On ACT, check the CROW-table
  4. If remapped, activate a copy row
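The four steps above can be sketched as a remap table that is built once from profiling results and consulted on every activation. This is a hedged sketch: the function names `build_remap_table` and `on_activate` are hypothetical, the profiler output is assumed to be a plain list of weak row indices, and falling back to the default refresh rate when a subarray has more weak rows than copy rows is an assumption.

```python
def build_remap_table(weak_rows, num_copy_rows=8):
    """Steps 1-2: map each profiled weak regular row to a copy row.

    Returns None when there are more weak rows than copy rows; in this
    sketch that means the subarray keeps its default refresh rate.
    """
    unique_weak_rows = sorted(set(weak_rows))
    if len(unique_weak_rows) > num_copy_rows:
        return None
    return {row: copy_row for copy_row, row in enumerate(unique_weak_rows)}


def on_activate(row, remap_table):
    """Steps 3-4: check the table and choose which row to activate."""
    if remap_table and row in remap_table:
        return ("copy", remap_table[row])
    return ("regular", row)


remap_table = build_remap_table(weak_rows=[5, 300])
target_for_weak = on_activate(5, remap_table)    # ("copy", 0)
target_for_strong = on_activate(7, remap_table)  # ("regular", 7)
```

The sketch assumes the copy rows themselves are strong, as the slide's wording "strong copy row" suggests.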

How many weak rows exist in a DRAM chip?

Identifying Weak Rows
DRAM Retention Time Profiler
  • REAPER [Patel+, ISCA17]
  • PARBOR [Khan+, DSN16]
  • AVATAR [Qureshi+, DSN15]
  • At system boot or during runtime
Weak cells are rare [Liu+, ISCA13]
  • Weak cell: retention < 256 ms
  • ~1000/2^38 (32 GiB) failing cells
[Chart: probability (0% to 100%) versus weak rows in a subarray (1, 2, 4, 8); data labels in order: 9.90E-01, 3.10E-01, 3.30E-04, 3.30E-11]
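One way to reason about how many weak rows to expect is a binomial model over the failing-cell ratio quoted above. The sketch below is an illustration under assumptions, not the calculation behind the chart: it assumes weak cells are independent and uniformly distributed, assumes 8 KiB rows (not stated on the slides), uses the 512-row subarray from the evaluated configuration, and computes a per-subarray probability only, so its values are not expected to match the chart's data labels. The function names are hypothetical.

```python
import math


def p_weak_row(ber: float, cells_per_row: int) -> float:
    """Probability that a row contains at least one weak cell."""
    # 1 - (1 - ber)^cells, computed in a numerically stable way.
    return -math.expm1(cells_per_row * math.log1p(-ber))


def p_more_than(n: int, p_row: float, rows_per_subarray: int = 512) -> float:
    """Probability that one subarray has more than n weak rows (binomial tail)."""
    return sum(
        math.comb(rows_per_subarray, k)
        * p_row ** k
        * (1.0 - p_row) ** (rows_per_subarray - k)
        for k in range(n + 1, rows_per_subarray + 1)
    )


BER = 1000 / 2 ** 38          # failing-cell ratio quoted on the slide
CELLS_PER_ROW = 8 * 1024 * 8  # assumption: 8 KiB per row

p_row = p_weak_row(BER, CELLS_PER_ROW)
for n in (1, 2, 4, 8):
    print(n, p_more_than(n, p_row))
```

Under these assumptions the tail probability falls quickly as n grows, which is the intuition behind provisioning only a handful of copy rows per subarray.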

Takeaway: a few copy rows are sufficient to halve the refresh rate.

Mitigating RowHammer
[Figure labels: victim row, aggressor row, victim row; sense amplifiers (SA); activate, precharge]
Key idea: remap victim rows to copy rows
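The key idea above can be illustrated with a small remapping sketch. The slides state only that victim rows are remapped to copy rows; everything else here is an assumption made for illustration: the class name `RowHammerGuard`, detecting an aggressor by counting activations against a `threshold`, and treating the two physically adjacent rows as the victims (as the figure's victim/aggressor/victim layout suggests).

```python
from collections import Counter


class RowHammerGuard:
    """Hypothetical detector that remaps the neighbors of a heavily activated row."""

    def __init__(self, threshold: int, num_rows: int = 512, num_copy_rows: int = 8):
        self.threshold = threshold          # assumed activation-count trigger
        self.num_rows = num_rows
        self.num_copy_rows = num_copy_rows
        self.activations = Counter()
        self.remap = {}                     # victim regular row -> copy row

    def on_activate(self, row: int):
        """Count the activation and return any victim rows newly remapped."""
        self.activations[row] += 1
        if self.activations[row] < self.threshold:
            return []
        newly_remapped = []
        for victim in (row - 1, row + 1):
            in_range = 0 <= victim < self.num_rows
            has_free_copy_row = len(self.remap) < self.num_copy_rows
            if in_range and victim not in self.remap and has_free_copy_row:
                self.remap[victim] = len(self.remap)
                newly_remapped.append(victim)
        return newly_remapped


guard = RowHammerGuard(threshold=3)
for _ in range(3):
    remapped = guard.on_activate(10)  # third activation remaps rows 9 and 11
```

A real design would also need to copy the victim data into the copy row (for example with the row copy operation) before redirecting accesses; the sketch tracks only the mapping.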

Methodology
Simulator
  • DRAM Simulator (Ramulator [Kim+, CAL15]): https://github.com/CMU-SAFARI/ramulator
  • Source code available in July: github.com/CMU-SAFARI/CROW
Workloads
  • 44 single-core workloads: SPEC CPU2006, TPC, STREAM, MediaBench
  • 160 multi-programmed four-core workloads, formed by randomly choosing from the single-core workloads
  • Execute at least 200 million representative instructions per core
System Parameters
  • 1- or 4-core system with 8 MiB LLC
  • LPDDR4 main memory
  • 8 copy rows per 512-row subarray

CROW-cache Results
[Charts: speedup and normalized DRAM energy for single-core and four-core workloads; speedup data labels 7.5% and 7.1%; energy data labels 8.2% and 6.9%]
CROW-cache improves single-/four-core performance and energy
* with 8 copy rows and a 64 Gb DRAM chip (sensitivity in paper)

CROW-ref Results
[Charts: speedup and normalized DRAM energy for single-core and four-core workloads; speedup data labels 11.9% and 7.1%; energy data labels 7.8% and 17.2%]
CROW-ref significantly reduces the performance and energy overhead of DRAM refresh
* with 8 copy rows and a 64 Gb DRAM chip (sensitivity in paper)

Combining CROW-cache and CROW-ref
[Charts: speedup and normalized DRAM energy for CROW-(cache+ref) and for Ideal CROW-cache + no refresh, single-core and four-core workloads; speedup data labels 20% and 17%; energy data labels 23% and 22%]
CROW-(cache+ref) provides more performance and DRAM energy benefits than each mechanism alone

Hardware Overhead
For 8 copy rows and 16 GiB DRAM:
  • 0.5% DRAM chip area
  • 1.6% DRAM capacity
  • 11.3 KiB memory controller storage
CROW is a low-cost substrate

Other Results in the Paper
Performance and energy sensitivity to:

  • Number of copy rows per subarray
  • DRAM chip density
  • Last-level cache capacity
CROW-cache with prefetching
CROW-cache compared to other in-DRAM caching mechanisms:
  • TL-DRAM [Lee+, HPCA13]
  • SALP [Kim+, ISCA12]

Conclusion
Source code available in July: github.com/CMU-SAFARI/CROW
Challenges of DRAM scaling:
  • High access latency: bottleneck for improving system performance/energy
  • Refresh overhead: reduces performance and consumes high energy
  • Exposure to vulnerabilities (e.g., RowHammer)
Copy-Row DRAM (CROW): introduces copy rows into a subarray
[Figure labels: regular rows, copy rows, sense amplifiers (SA), regular row decoder, CROW decoder]
The benefits of a copy row:
  • Efficiently duplicating data from a regular row to a copy row
  • Quick access to a duplicated row
  • Remapping a regular row to a copy row
CROW is a flexible substrate with many use cases:
  • CROW-cache & CROW-ref (20% speedup and 22% less DRAM energy)
  • Mitigating RowHammer
  • We hope CROW enables many other use cases going forward

Backup Slides

Latency Reduction with MRA

Mitigating RowHammer
[Figure labels: victim row, aggressor row, victim row; sense amplifiers (SA); activate, precharge]
Key idea: remap victim rows to copy rows

CROW-cache Performance
[Charts: single-core and four-core speedup for CROW-1, CROW-8, CROW-64, CROW-128, and Ideal CROW-cache (100% Hit Rate); workload labels include h264-dec, libq, stream-cp, mcf, lbm, zeus, tpch2, leslie3d, HHHH, and AVERAGE; data labels 6.6%, 7.5%, 7.1%, 0.7%]

CROW-ref Performance
[Chart: speedup for 8 Gbit, 16 Gbit, 32 Gbit, and 64 Gbit chips, single-core and four-core workloads; data labels 11.9% and 7.1%]

CROW-ref Energy Savings
[Chart: normalized DRAM energy for 8 Gbit, 16 Gbit, 32 Gbit, and 64 Gbit chips, single-core and four-core workloads; data labels 17.2% and 7.8%]

Speedup - CROW-cache Single-core

Speedup - CROW-cache Four-core

Energy CROW-cache

Comparison to TL-DRAM and SALP
[Charts: speedup, normalized DRAM energy, and chip area overhead]

Slide on RLTL

Speedup CROW-ref

Energy CROW-ref

CROW-cache + ref

CROW-table Organization

tRCD vs tRAS

MRA Area Overhead

DRAM Charge over Time
[Figure: charge level of the cell and the sense amplifier over time, for data 1 and data 0, through ACT, sensing, restore, R/W, and PRE (precharge); the row is ready to access after tRCD and ready to precharge after tRAS]
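The two timing parameters in the charge figure above can be read as simple constraints on when commands may follow an ACT. The sketch below is illustrative only: the class name `BankTiming`, the command strings, and the cycle counts in the usage lines are hypothetical and are not taken from the slides or from a specific DRAM standard.

```python
from dataclasses import dataclass


@dataclass
class BankTiming:
    t_rcd: int  # cycles from ACT until the row is ready to access (R/W)
    t_ras: int  # cycles from ACT until the row is ready to precharge (PRE)

    def earliest_cycle(self, command: str, act_cycle: int) -> int:
        """Earliest cycle at which `command` may follow an ACT issued at `act_cycle`."""
        if command in ("RD", "WR"):
            return act_cycle + self.t_rcd
        if command == "PRE":
            return act_cycle + self.t_ras
        raise ValueError(f"unknown command: {command}")

    def can_issue(self, command: str, act_cycle: int, now: int) -> bool:
        """True if `command` respects tRCD or tRAS at cycle `now`."""
        return now >= self.earliest_cycle(command, act_cycle)


timing = BankTiming(t_rcd=18, t_ras=42)  # hypothetical cycle counts
read_ok = timing.can_issue("RD", act_cycle=0, now=18)
precharge_ok = timing.can_issue("PRE", act_cycle=0, now=30)
```

In this framing, a mechanism that makes sensing faster lets reads start earlier after ACT, while restoration time bounds how soon the row can be precharged.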
