Shared memory architectures
Multiple CPUs (or cores).
One memory with a global address space; it may consist of many modules.
All CPUs access all memory through the global address space.
All CPUs can make changes to the shared memory.
Are changes made by one processor visible to all other processors?
Data parallelism or function parallelism?
How to connect CPUs and memory?
One large memory: on the same side of the interconnect (mostly a bus). Every memory reference has the same latency: uniform memory access (UMA).
Many small memories: local and remote memory. Memory latency differs between local and remote references: non-uniform memory access (NUMA).
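To make "memory latency differs" concrete, the expected latency of a reference on a NUMA machine can be modeled as a weighted mix of local and remote latencies. The Python sketch below is a hypothetical two-level model; the function name and the 100 ns / 300 ns latencies in the usage loop are illustrative assumptions, not measurements of any machine.

```python
def average_latency_ns(local_fraction: float, local_ns: float, remote_ns: float) -> float:
    """Expected latency of one memory reference in a simple two-level NUMA model.

    local_fraction: share of references served by the node's local memory.
    UMA is the special case local_ns == remote_ns (placement no longer matters).
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies: 100 ns for local memory, 300 ns for remote memory.
for f in (1.0, 0.9, 0.5):
    print(f"{f:.0%} local -> {average_latency_ns(f, 100, 300):.0f} ns")
```

With these assumed numbers, 90% local references give 120 ns on average, and the average drifts toward the remote latency as locality drops, so on NUMA machines where data is placed matters.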
UMA shared memory architecture (mostly bus-based MPs)
Many CPUs and memory modules connect to the bus.
Dominates the server and enterprise market and is moving down to the desktop.
Faster processors began to saturate the bus; then bus technology advanced.
Today, bus-based systems come in a range of sizes, from desktops to large servers (symmetric multiprocessor (SMP) machines).
Front-side bus (FSB) bandwidth in Intel systems
Processor | FSB clock | Transfers per clock | Bus width | Bandwidth
Pentium D | 133-200 MHz | 4 | 64-bit | 6400-12800 MB/s
NUMA shared memory architecture
Identical processors, but a processor takes different times to access different parts of the memory.
Often built by physically linking SMP machines (e.g., SGI Origin 2000, up to 512 processors).
The current generation of SMP interconnects (Intel Common System Interface (CSI) and AMD HyperTransport) has this flavor, but the processors are close to each other.
(Figure: Various SMP hardware organizations)
Cache coherence problem
Because caches hold copies of memory, different processors may see different values for the same memory location. In the example, processors see different values for u after event 3. With a write-back cache, memory may also hold stale data. This happens frequently and is unacceptable to applications.
Bus snoopy cache coherence protocols
Memory: centralized, with uniform access time and a bus interconnect.
Example: all Intel MP machines, such as diablo.
Bus snooping idea
Send all requests for data to all processors (through the bus).
Processors snoop to see whether they have a copy and respond accordingly.
The cache listens to both the CPU and the bus: the state of a cache line may change because of (1) a CPU memory operation or (2) a bus transaction (a remote CPU's memory operation).
Requires broadcast, since the caching information is at the processors.
A bus is a natural broadcast medium.
The bus (a centralized medium) also serializes requests.
Dominates small-scale machines.
Types of snoopy bus protocols
Write-invalidate protocols: on a write to shared data, an invalidate is sent on the bus (all caches snoop and invalidate their copies).
Write-broadcast protocols (typically write-through): on a write to shared data, the write is broadcast on the bus, and processors snoop and update any copies.
An example snoopy protocol (MSI)
Invalidation protocol, write-back cache.
Each block of memory is in one state:
clean in all caches and up to date in memory (shared),
dirty in exactly one cache (exclusive), or
not in any cache.
Each cache block is in one state:
Shared: the block can be read.
Exclusive (modified): the cache has the only copy; it is writable and dirty.
Invalid: the block contains no data.
Read misses cause all caches to snoop the bus (a bus transaction).
A write to a shared block is treated as a miss (it needs a bus transaction).
(Figure: MSI protocol state machine for CPU requests)
(Figure: MSI protocol state machine for bus requests)
(Figure: MSI protocol state machine, combined)
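Replaying the cache coherence problem under MSI shows how invalidation removes the stale reads. The Python sketch below assumes one-word blocks, no evictions, and atomic bus transactions; the class names and the `BusRd`/`BusRdX` transaction labels are hypothetical (common textbook usage, not taken from these slides), and the event sequence for u (P1 reads, P3 reads, P3 writes 7, then P1 and P2 read) is an assumed reconstruction of the example.

```python
# Minimal MSI write-invalidate snooping sketch: write-back caches on one shared bus.
# Assumptions (illustration only): one word per block, no evictions, atomic bus
# transactions. Class and method names are hypothetical, not from a real simulator.

M, S, I = "M", "S", "I"  # Modified (the slides' "Exclusive"), Shared, Invalid


class Bus:
    def __init__(self, memory):
        self.memory = memory  # address -> value
        self.caches = []
        self.log = []  # (cpu_id, transaction, address) in bus order

    def broadcast(self, requester, op, addr):
        """Serialize one transaction and let every other cache snoop it."""
        self.log.append((requester.cpu_id, op, addr))
        for cache in self.caches:
            if cache is not requester:
                cache.snoop(op, addr)


class Cache:
    def __init__(self, bus, cpu_id):
        self.bus = bus
        self.cpu_id = cpu_id
        self.lines = {}  # address -> [state, value]
        bus.caches.append(self)

    def state(self, addr):
        return self.lines.get(addr, [I, None])[0]

    # CPU-side requests
    def read(self, addr):
        if self.state(addr) == I:
            # Read miss: BusRd. A cache holding the block in M flushes it while
            # snooping, so memory is up to date when we load from it.
            self.bus.broadcast(self, "BusRd", addr)
            self.lines[addr] = [S, self.bus.memory[addr]]
        return self.lines[addr][1]

    def write(self, addr, value):
        if self.state(addr) != M:
            # Write miss, or write to a Shared block: BusRdX invalidates other copies.
            self.bus.broadcast(self, "BusRdX", addr)
        self.lines[addr] = [M, value]  # write-back: memory is not updated yet

    # Bus-side (snooped) requests
    def snoop(self, op, addr):
        st = self.state(addr)
        if st == I:
            return
        if st == M:
            self.bus.memory[addr] = self.lines[addr][1]  # flush dirty data
        self.lines[addr][0] = S if op == "BusRd" else I


def replay_u_example():
    """Assumed event order: P1 reads u, P3 reads u, P3 writes 7, P1 and P2 read u."""
    memory = {"u": 5}
    bus = Bus(memory)
    p1, p2, p3 = Cache(bus, 1), Cache(bus, 2), Cache(bus, 3)
    seen = [p1.read("u"), p3.read("u")]
    p3.write("u", 7)
    seen += [p1.read("u"), p2.read("u")]
    return seen, memory, bus.log
```

Here P1 and P2 both read 7 after P3's write: the `BusRdX` invalidated P1's copy, and P3 flushed its dirty block to memory when it snooped P1's later `BusRd`.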
Some snooping cache variations
Basic protocol: three states (MSI).
The states can be refined to reduce bus transactions in some cases:
Berkeley protocol: five states (M split into owned exclusive and owned shared).
Illinois protocol: five states.
MESI protocol: four states (the exclusive state split into Modified and Exclusive); used by Intel MP systems.
Multiple levels of caches
Most processors today have on-chip L1 and L2 caches.
Transactions on the L1 cache are not visible to the bus (coherence would need a separate snooper for L1, which would be expensive).
Typical solution: maintain the inclusion property between the L1 and L2 caches, so that all bus transactions relevant to L1 are also relevant to L2; it is then sufficient for the L2 controller alone to snoop the bus.
Coherence transactions are propagated up the hierarchy.
Large shared memory multiprocessors
The interconnection network is usually not a bus.
There is no broadcast medium, so caches cannot snoop.
This needs a different kind of cache coherence protocol.
Basic idea
Use an idea similar to the snoopy bus.
Snoopy bus with the MSI protocol: a cache line has three states (M, S, and I); whenever a cache coherence operation is needed, we tell the bus (the central authority).
CC protocol for large SMPs: a cache line has three states; whenever a cache coherence operation is needed, we tell the central authority, which
serializes the access, and
performs the cache coherence operations using point-to-point communication.
The central authority needs to know who has a copy of the cache line; this information is stored in the directory.
Cache coherence for large SMPs
Use a directory entry for each cache line to track the state of every block in the caches.
The directory can also track the state of all memory blocks, so directory size = O(memory size).
A distributed directory is needed, because a centralized directory becomes the bottleneck.
Who is the central authority for a given cache line?
Such machines are typically called cc-NUMA multiprocessors.
(Figure: ccNUMA multiprocessors)
Directory-based cache coherence protocols
Similar to the snoopy protocol: three states.
Shared: one or more processors have the data; memory is up to date.
Uncached: the block is not valid in any cache.
Exclusive: one processor has the data; memory is out of date.
The directory must track:
the cache state, and
which processors have the data when it is in the shared state, recorded as a bit vector (bit set to 1 if a particular processor has a copy) or as an ID and bit-vector combination.
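A full-bit-vector directory entry for one memory block might look like the Python sketch below; the class, method, and function names are hypothetical. It also computes the storage cost: with one presence bit per processor, each entry grows with the processor count, while for a fixed processor count the whole directory grows linearly with memory, consistent with directory size = O(memory size).

```python
from enum import Enum


class DirState(Enum):
    UNCACHED = "Uncached"    # not valid in any cache
    SHARED = "Shared"        # one or more caches hold clean copies; memory up to date
    EXCLUSIVE = "Exclusive"  # exactly one cache holds a dirty copy; memory stale


class DirectoryEntry:
    """Hypothetical full-bit-vector directory entry for one memory block."""

    def __init__(self, num_procs: int):
        self.num_procs = num_procs
        self.state = DirState.UNCACHED
        self.presence = 0  # bit i is 1 if processor i has a copy

    def add_sharer(self, p: int) -> None:
        # Read miss served: p joins the sharers. If the block was Exclusive, the
        # caller is assumed to have fetched the dirty data from the old owner,
        # which keeps a (now clean) shared copy.
        self.presence |= 1 << p
        self.state = DirState.SHARED

    def set_owner(self, p: int) -> None:
        # Write miss served: all other copies are assumed already invalidated.
        self.presence = 1 << p
        self.state = DirState.EXCLUSIVE

    def holders(self) -> list:
        return [p for p in range(self.num_procs) if (self.presence >> p) & 1]


def presence_bit_overhead(num_procs: int, block_bytes: int) -> float:
    """Presence bits per block divided by data bits per block (state bits ignored)."""
    return num_procs / (block_bytes * 8)
```

For example, 64 processors with 64-byte blocks need 64 presence bits per 512 data bits (12.5% overhead); at 256 processors the same blocks need 50%, one motivation for more compact encodings such as the ID and bit-vector combination.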
Directory-based cache coherence protocols
No bus, and we do not want to broadcast.
Typically three processors (nodes) are involved:
Local node: where a request originates.
Home node: where the memory location of an address resides (this is the central authority for the page).
Remote node: has a copy of the cache block (exclusive or shared).
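How the local, home, and remote nodes interact on a write miss can be sketched as a home-node handler. This is a minimal Python illustration, not a complete protocol: the function name, the message names `Invalidate`, `FetchInvalidate`, and `DataReply`, and the dictionary-based directory are assumptions, and acknowledgements, races, and transient states are ignored.

```python
def home_write_miss(directory, requester, addr):
    """Hypothetical home-node handler for WriteMiss(requester, addr).

    `directory` maps addr -> (state, set of holders) with states "Uncached",
    "Shared", "Exclusive". Returns the point-to-point messages the home node
    sends, as (message, destination, addr) tuples, and updates the entry.
    """
    state, holders = directory.get(addr, ("Uncached", set()))
    messages = []
    if state == "Shared":
        # Invalidate every other sharer: only the listed holders, no broadcast.
        for p in sorted(holders - {requester}):
            messages.append(("Invalidate", p, addr))
    elif state == "Exclusive" and holders != {requester}:
        # The single remote owner must return the dirty block and drop its copy.
        (owner,) = holders
        messages.append(("FetchInvalidate", owner, addr))
    # The requester receives the block and becomes the exclusive owner.
    messages.append(("DataReply", requester, addr))
    directory[addr] = ("Exclusive", {requester})
    return messages
```

Calling it with requester L on a block shared by P1, P2, and P3 yields three invalidations and a data reply and leaves the block exclusive at L, the same sequence as the protocol-in-action example.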
(Figure: Directory protocol messages example)
Directory-based CC protocol in action
Local node (L): WriteMiss(P, A) to the home node.
Home node: the cache line is in the shared state at processors P1, P2, and P3.
Home node to P1, P2, P3: invalidate(P, A).
Home node: the cache line is now in the exclusive state at processor L.
Summary
Shared memory architectures: UMA and NUMA
Bus-based systems and interconnect-based systems
Cache coherence problem
Cache coherence protocols