Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency
Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Todd C. Mowry, Phillip B. Gibbons, Michael A. Kozuch

Executive Summary
- Main memory is a limited shared resource
- Observation: significant data redundancy in memory
- Idea: compress data in main memory
- Problem: how to avoid inefficiency in address computation?
- Solution: Linearly Compressed Pages (LCP), fixed-size cache-line-granularity compression
- Results:
  1. Increases memory capacity (62% on average)
  2. Decreases memory bandwidth consumption (24%)
  3. Decreases memory energy consumption (4.9%)
  4. Improves overall performance (13.9%)

Potential for Data Compression
Significant redundancy in in-memory data, e.g., values such as 0x00000000, 0x0000000B, 0x00000003, 0x00000004 share most of their bits.
How can we exploit this redundancy?
- Main memory compression helps
- Provides the effect of a larger memory without making it physically larger

Challenges in Main Memory Compression
1. Address Computation
2. Mapping and Fragmentation

Challenge 1: Address Computation
[Figure: locating a 64B cache line by address offset within an uncompressed page vs. a compressed page]

Challenge 2: Mapping and Fragmentation
[Figure: virtual-to-physical address mapping when the physical page has an unknown compressed size (? KB); fragmentation]

Outline
- Motivation & Challenges
- Shortcomings of Prior Work
- LCP: Key Idea
- LCP: Implementation
- Evaluation
- Conclusion and Future Work

Key Parameters in Memory Compression
- Compression ratio
- Address computation latency
- Decompression latency
- Complexity and cost
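Challenge 1 can be made concrete with a toy sketch (illustrative Python of my own, not from the paper; `LINE_SIZE` and the example sizes are made up): in an uncompressed page, the offset of cache line i is a single multiply, but with variable-size compressed lines it depends on the sizes of all preceding lines, forcing a serial walk over per-line metadata.

```python
LINE_SIZE = 64  # uncompressed cache line size in bytes

def offset_uncompressed(line_idx):
    # One shift/multiply; this is what hardware does today.
    return line_idx * LINE_SIZE

def offset_variable_compressed(line_idx, compressed_sizes):
    # With variable-size compressed lines, line i's offset depends on the
    # sizes of ALL previous lines: a serial walk over per-line metadata.
    return sum(compressed_sizes[:line_idx])

sizes = [12, 64, 30, 7, 64, 21, 64, 12]      # example per-line compressed sizes
print(offset_uncompressed(5))                # 320
print(offset_variable_compressed(5, sizes))  # 12+64+30+7+64 = 177
```

LCP's key idea, described next, sidesteps this serial walk by fixing the compressed size of every line within a page.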
Shortcomings of Prior Work
Prior mechanisms, compared on the four key parameters:
- IBM MXT [IBM J.R.D. '01]: 2X compression ratio, but high decompression latency (64 cycles) and high complexity and cost
- Robust Main Memory Compression [ISCA05]: good compression ratio and lower decompression latency, but address computation latency remains high
- LCP (our proposal): competitive compression ratio with low address computation latency, low decompression latency, and low complexity and cost
[Figure: a compressed page with metadata (M) and exception (E) regions; naively, a read needs one access for the metadata and a second for the data]
How to avoid 2 accesses? Use a metadata (MD) cache.

Key Ideas: Summary
- Fixed compressed size per cache line
- Metadata (MD) cache

LCP Overview
- Page table entry extension: the PTE stores the compression type and size (fixed encoding)
- OS support for multiple page sizes: 4 memory pools (512B, 1KB, 2KB, 4KB)
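A runnable toy model of the fixed-size idea (all names, sizes, and the trivial zero-stripping compressor here are my own inventions, not LCP's actual encoding): because every compressible line occupies a fixed slot, its address is a single multiply, and lines that do not fit are kept uncompressed in the page's exception storage, located via metadata.

```python
COMP_SLOT = 4  # bytes per fixed compressed slot (illustrative, not LCP's real layout)
LINE = 8       # uncompressed line size for this toy example

def compress(line):
    # Toy "compression": strip trailing zero bytes; None means "exception".
    body = line.rstrip(b"\x00")
    return body if len(body) <= COMP_SLOT else None

def decompress(slot):
    # Restore the stripped trailing zeros.
    return slot.ljust(LINE, b"\x00")

def make_page(lines):
    page = {"slots": b"", "exceptions": {}}
    for i, line in enumerate(lines):
        c = compress(line)
        if c is None:
            page["exceptions"][i] = line              # exception storage
            c = b""
        page["slots"] += c.ljust(COMP_SLOT, b"\x00")  # one fixed-size slot per line
    return page

def read_line(page, i):
    # Key idea: a fixed slot size makes the slot address a simple multiply.
    if i in page["exceptions"]:                       # flagged by per-line metadata
        return page["exceptions"][i]
    return decompress(page["slots"][i * COMP_SLOT:(i + 1) * COMP_SLOT])

lines = [b"\x01\x02" + b"\x00" * 6, b"ABCDEFGH", b"\x00" * 8]
page = make_page(lines)
assert [read_line(page, i) for i in range(3)] == lines
```

Here line 1 (`b"ABCDEFGH"`) does not fit a 4-byte slot, so it lands in the exception region, while the other two lines are read back with a single indexed slot access.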
[Figure: LCP operation. The processor and memory controller (with MD cache) sit above DRAM and disk; a 4KB page is stored compressed (e.g., as 1KB or 2KB) in DRAM. Steps: 1. initial page compression, 2. cache line read, 3. cache line writeback]

Handling Page Overflows
- An overflow happens after a writeback, when all slots in the exception storage are already taken
[Figure: compressed data, metadata (M), and exception storage slots E0, E1, E2]
- Two possible scenarios:
  - Type-1 overflow: requires a larger physical page size (e.g., 2KB instead of 1KB)
  - Type-2 overflow: requires decompression and a full uncompressed physical page (e.g., 4KB)
- Overflows happen infrequently: roughly once per ~2M instructions

Compression Algorithms
Key requirements:
- Low hardware complexity
- Low decompression latency
- High effective compression ratio

Candidates:
- Frequent Pattern Compression [ISCA04]: uses simplified dictionary-based compression
- Base-Delta-Immediate Compression [PACT12]: exploits the low dynamic range in the data

Base-Delta Encoding [PACT12]
32-byte uncompressed cache line: 0xC04039C0 0xC04039C8 0xC04039D0 ... 0xC04039F8
Base: 0xC04039C0; 1-byte deltas: 0x00 0x08 0x10 ... 0x38
12-byte compressed cache line (20 bytes saved)
BDI [PACT12] has two bases:
1. A zero base (for narrow values)
2. An arbitrary base (the first non-zero value in the cache line)
- Simple hardware: arithmetic and comparison
- Fast decompression: vector addition
- Effective: good compression ratio

LCP-Enabled Optimizations
- Memory bandwidth reduction: four adjacent compressed 64B cache lines fetched in 1 transfer instead of 4
- Zero pages and zero cache lines: handled separately in the TLB (1 bit) and in the metadata (1 bit per cache line)
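The Base-Delta encoding above can be sketched in a few lines (my simplification: a single arbitrary base with 1-byte unsigned deltas; real BDI also has a zero base and several delta widths):

```python
def bd_compress(values, delta_bytes=1):
    # Use the first value as the base; the line is compressible if every
    # element's delta from the base fits in `delta_bytes`.
    base = values[0]
    deltas = [v - base for v in values]
    if all(0 <= d < (1 << (8 * delta_bytes)) for d in deltas):
        return base, deltas
    return None  # not compressible with this base/width

def bd_decompress(base, deltas):
    # Decompression is just a vector add: base + delta per element.
    return [base + d for d in deltas]

line = [0xC04039C0 + 8 * i for i in range(8)]  # the slide's example values
base, deltas = bd_compress(line)
assert bd_decompress(base, deltas) == line
# 4B base + 8 x 1B deltas = 12B, vs. 32B uncompressed: 20 bytes saved
```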
Methodology
- Simulator: x86 event-driven, based on Simics
- Workloads (32 applications): SPEC2006 benchmarks, TPC, Apache web server
- System parameters:
  - L1/L2/L3 cache latencies from CACTI [Thoziyoor+, ISCA08]
  - 512KB - 16MB L2 caches
  - DDR3-1066, 1 memory channel
- Metrics:
  - Performance: instructions per cycle, weighted speedup
  - Capacity: effective compression ratio
  - Bandwidth: bytes per kilo-instruction (BPKI)
  - Energy: memory subsystem energy

Evaluated Designs
- Baseline: no compression
- RMC: Robust Main Memory Compression [ISCA05] (RMC) with FPC [ISCA04]
- LCP-FPC: LCP framework with FPC
- LCP-BDI:
  LCP framework with BDI [PACT12]
- LZ: Lempel-Ziv compression (per page)

Effect on Memory Capacity
32 SPEC2006, database, and web workloads; 2MB L2 cache
[Figure: compression ratio. Baseline: 1.00, RMC: 1.59, LCP-FPC: 1.52, LCP-BDI: 1.62; LZ is also shown]
LCP-based designs achieve compression ratios competitive with prior work.
Effect on Bus Bandwidth
32 SPEC2006, database, and web workloads; 2MB L2 cache
[Figure: normalized BPKI, lower is better. Baseline: 1.00, RMC: 0.79, LCP-FPC: 0.80, LCP-BDI: 0.76]
LCP-based designs significantly reduce bandwidth (24%), due to data compression.

Effect on Performance
[Figure: performance improvement (0% to 16%) of RMC, LCP-FPC, and LCP-BDI on 1-, 2-, and 4-core systems]
LCP-based designs significantly improve performance over RMC.

Effect on Memory Subsystem Energy
[Figure: normalized memory subsystem energy of the evaluated designs]
The LCP framework is more energy efficient than RMC.

Effect on Page Faults
32 SPEC2006, database, and web workloads; 2MB L2 cache
[Figure: normalized number of page faults, Baseline vs. LCP-BDI, for 256MB, 512MB, 768MB, and 1GB DRAM]
The LCP framework significantly decreases the number of page faults (up to 23% on average for 768MB).

Other Results and Analyses in the Paper
- Analysis of page overflows
- Compressed page size distribution
- Compression ratio over time
- Number of exceptions (per page)
- Detailed single-/multi-core evaluation
- Comparison with stride prefetching: performance and bandwidth

Conclusion
- Old idea: compress data in main memory
- Problem: how to avoid inefficiency in address computation?
- Solution: a new main memory compression framework called LCP (Linearly Compressed Pages)
- Key idea: fixed size for compressed cache lines within a page
- Evaluation:
  1. Increases memory capacity (62% on average)
  2. Decreases bandwidth consumption (24%)
  3. Decreases memory energy consumption (4.9%)
  4. Improves overall performance (13.9%)

Backup Slides

Large Pages (e.g., 2MB or 1GB)
- Split large pages into smaller 4KB sub-pages, compressed individually
- A 64-byte metadata chunk for every sub-page
[Figure: a large page laid out as compressed 2KB sub-pages with a metadata region (M)]
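The sub-page arithmetic above works out as follows (a simple sanity check of my own, not from the paper):

```python
LARGE_PAGE = 2 * 1024 * 1024   # 2MB large page
SUBPAGE = 4 * 1024             # each 4KB sub-page is compressed individually
MD_CHUNK = 64                  # 64-byte metadata chunk per sub-page

n_subpages = LARGE_PAGE // SUBPAGE   # 512 sub-pages
md_total = n_subpages * MD_CHUNK     # 32 KB of metadata per 2MB page
print(n_subpages, md_total)          # 512 32768
```

That is about 1.6% metadata overhead per 2MB large page.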
Physically Tagged Caches
[Figure: the core issues a virtual address; the TLB performs address translation on the critical path; L2 cache lines are tagged with the physical address]

Changes to Cache Tagging Logic
- Before: cache lines tagged with p-base
- After: cache lines tagged with p-base and c-idx
  - p-base: physical page base address
  - c-idx: cache line index within the page

Analysis of Page Overflows
[Figure: Type-1 overflows per instruction, log scale from 1E-08 to 1E-03, across benchmarks (e.g., gcc, lbm) with the geometric mean]

Frequent Pattern Compression
Idea: encode cache lines based on frequently occurring patterns, e.g., the first half of a word is zero.
Frequent patterns:
- 000: all zeros
- 001: first half zeros
- 010: second half zeros
- 011: repeated bytes
- 100: all ones
- 111: not a frequent pattern

Example encodings: 0x00000001 -> 001 (stored as 0x0001); 0x00000000 -> 000; 0xFFFFFFFF -> 100; 0xABCDEFFF -> 111 (stored in full).
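The pattern table above can be turned into a toy word encoder (the 3-bit prefixes follow the slide; the payload handling is my simplification of FPC [ISCA04], which additionally sign-extends small values):

```python
def fpc_encode(word):
    # Encode one 32-bit word as (3-bit prefix, payload bytes).
    b = word.to_bytes(4, "big")
    if word == 0:
        return ("000", b"")       # all zeros
    if word == 0xFFFFFFFF:
        return ("100", b"")       # all ones (checked before "repeated bytes")
    if word <= 0xFFFF:
        return ("001", b[2:])     # first half zeros
    if word & 0xFFFF == 0:
        return ("010", b[:2])     # second half zeros
    if len(set(b)) == 1:
        return ("011", b[:1])     # repeated bytes
    return ("111", b)             # not a frequent pattern: store in full

print(fpc_encode(0x00000001))  # ('001', b'\x00\x01')
print(fpc_encode(0xABABABAB))  # ('011', b'\xab')
```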