









## **Memory Hierarchy** Technology cost / GB Access time SRAM ~ \$10,000 ~ 1 ns Cache Speed DRAM ~ \$100 ~ 100 ns Main Memory Hard Disk ~ \$1 ~ 10,000,000 ns Virtual Memory Size Copyright © 2007 Elsevier



# Memory Performance Example 1 A program has 2,000 load and store instructions 1,250 of these data values found in cache The rest are supplied by other levels of memory hierarchy What are the miss and hit rates for the cache?



## Memory Performance Example 2

- Suppose processor has 2 levels of hierarchy: cache and main memory
- $t_{\text{cache}} = 1$  cycle,  $t_{MM} = 100$  cycles

Copyright © 2007 Elsevier

• What is the AMAT of the program from Example 1?



## Memory Performance Example 2 • Suppose processor has 2 levels of hierarchy: cache and main memory • $t_{cache} = 1$ cycle, $t_{MM} = 100$ cycles • What is the AMAT of the program from Example 1 when using this memory system? AMAT = $t_{cache} + MR_{cache}(t_{MM})$ = [1 + 0.375(100)] cycles = 38.5 cycles ELEVIER

## Gene Amdahl, 1922 -

• Amdahl's Law: the effort spent on increasing the performance of a subsystem is wasted unless the subsystem affects a large percentage of the overall performance

Cofounded three companies, including one called Amdahl

Corporation in 1970



Copyright © 2007 Elsevier

Copyright © 2007 Elsevier

## **Cache Design Questions**

- What data is held in the cache?
- How is data found?
- What data is replaced?

We'll focus on data loads, but stores follow same principles

```
LSEVIER
```

## Cache

Copyright © 2007 Elsevier

### A safe place to hide things

- Highest level in memory hierarchy
- Fast (typically ~ 1 cycle access time)
- Ideally supplies most of the data to the processor
- Usually holds most recently accessed data

## What data is held in the cache?

- Ideally, cache anticipates data needed by processor and holds it in cache
- But impossible to predict future
- So, use past to predict future temporal and spatial locality:
  - Temporal locality: if processor accesses data not held in cache, copy data from lower level of hierarchy into cache. Next time, data is available in cache.
  - Spatial locality: copy neighboring data into cache too. Block size = number of bytes copied into cache at once.

Copyright © 2007 Elsevier

































| Cache Organization Recap |                                                                                                                                                                                                                            |                         |                           |      |  |  |  |  |  |
|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|---------------------------|------|--|--|--|--|--|
|                          | <ul> <li>Capacity: <i>C</i></li> <li>Block size: <i>b</i></li> <li>Number of blocks in cache: <i>B</i> = <i>C/b</i></li> <li>Number of blocks in a set: <i>N</i></li> <li>Number of Sets: <i>S</i> = <i>B/N</i></li> </ul> |                         |                           |      |  |  |  |  |  |
|                          |                                                                                                                                                                                                                            | Number of Ways          | Number of S               | Sets |  |  |  |  |  |
|                          | Organization                                                                                                                                                                                                               | ( <i>N</i> )            | (S = B/N)                 | )    |  |  |  |  |  |
|                          | Direct Mapped                                                                                                                                                                                                              | (N)<br>1                | (S = B/N) B               | )    |  |  |  |  |  |
|                          | Direct Mapped<br>N-Way Set Associative                                                                                                                                                                                     | (N) $1$ $1 < N < B$     | (S = B/N) $B$ $B / N$     | )    |  |  |  |  |  |
|                          | Direct Mapped<br>N-Way Set Associative<br>Fully Associative                                                                                                                                                                | (N) $1$ $1 < N < B$ $B$ | (S = B/N) $B$ $B / N$ $1$ |      |  |  |  |  |  |

## **Capacity Misses**

Copyright © 2007 Elsevier

- Cache isn't big enough to hold all data of interest at one time
- If the cache is full and program tries to access data X that is not in cache, cache must evict data Y to make room for X
- Capacity miss occurs if program then tries to access Y again
- X will be placed in a particular set based on its address
- In a direct mapped cache, there is only one place to put X
- In an associative cache, there are multiple ways where X could go in the set.
- How to choose Y to minimize chance of needing it again?
- Least recently used (LRU) replacement: the least recently used block in a set is evicted when the cache is full.

8-<34;











## Multilevel Caches Larger caches have lower miss rates But also have longer access times Expand the memory hierarchy to multiple levels of caches Level 1: small and fast (e.g. 16 KB, 1 cycle) Level 2: larger and slower (e.g. 256 KB, 2-6 cycles) Even more levels are possible



## **Virtual Memory**

- Gives the illusion of a bigger memory without the high cost of DRAM
- Each program uses virtual addresses
  - The entire virtual address space is stored on a hard disk, but DRAM acts as a cache for the most commonly accessed parts
  - These virtual addresses are translated by the CPU into *physical* addresses in DRAM. If the data is not in DRAM, it is fetched from the hard disk.
- Each program can have its own virtual to physical mapping
  - Two programs can use the same virtual address for different data
  - Programs don't need to be aware that other ones are running
  - One program (or virus) can't corrupt the memory used by another

8-<42>

- This is called memory protection

Copyright © 2007 Elsevier





Copyright © 2007 Elsevier

Physical memory acts as cache for virtual memory

| Cache        | Virtual Memory      |
|--------------|---------------------|
| Block        | Page                |
| Block Size   | Page Size           |
| Block Offset | Page Offset         |
| Miss         | Page Fault          |
| Tag          | Virtual Page Number |



## Virtual Memory Definitions Page size: amount of memory transferred from hard disk to DRAM Address translation: determining the physical address from the virtual address Page table: lookup table used to translate virtual addresses to physical addresses









| Virtual Memory Example                                                 |                                                                                                                                                               |                                           |  |  |  |  |  |
|------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------|--|--|--|--|--|
| What is the physical address of virtual address 0x247C?                | Virtual<br>Page<br>Number                                                                                                                                     |                                           |  |  |  |  |  |
|                                                                        | 0x7FFFF000 - 0x7FFFFFF<br>0x7FFFE000 - 0x7FFFFFF<br>0x7FFFD000 - 0x7FFFDFFF<br>0x7FFFC000 - 0x7FFFFFFFF<br>0x7FFFB000 - 0x7FFFBFFF<br>0x7FFFA000 - 0x7FFFAFFF | 7FFFF<br>7FFFD<br>7FFFC<br>7FFFB<br>7FFFB |  |  |  |  |  |
| Physical<br>Page<br>Number Physical Addresses                          | 0x7FFF9000 - 0x7FFF9FFF                                                                                                                                       | 7FFF9                                     |  |  |  |  |  |
| 7FFF 0x7FFF000 - 0x7FFFFFF<br>7FFE 0x7FFE000 - 0x7FFEFFF               | 0x00005000 - 0x00005FFF<br>0x00004000 - 0x00004FFF<br>0x00003000 - 0x00003FFF                                                                                 | 00005<br>00004<br>00003                   |  |  |  |  |  |
| 0001 0x0001000 - 0x0001FFF<br>0x0000000 - 0x0000FFF<br>Physical Memory | 0x00002000 - 0x00002FFF<br>0x00001000 - 0x00001FFF<br>0x00000000 - 0x00000FFF<br>Virtual Memory                                                               | 00002                                     |  |  |  |  |  |
| © 2007 Elsevier 0 2007 Elsevier 8-<51>                                 |                                                                                                                                                               |                                           |  |  |  |  |  |

















## **Translation Lookaside Buffer (TLB)**

- Use a translation lookaside buffer (TLB)
  - Small cache of most recent translations
  - Reduces number of memory accesses required for most loads/stores from two to one



## **Memory Protection**

Copyright © 2007 Elsevie

- Multiple programs (processes) run at once
- Each process has its own page table
- Each process can use entire virtual address space without worrying about where other programs are
- A process can only access physical pages mapped in its page table – can't overwrite memory from another process

## **Virtual Memory Summary**

**Example Two-Entry TLB** 

470

=

Entry 1

Page Number Page Numbe

1 0x7FFFD 0x0000

Physical

Physical

Address

Virtual

Entry 0

Page Number Page Numbe

Physical

0y7EEE

ľ12

Virtual

0x7FFF 47C

Virtual Page Numbe

Virtual

Address

Copyright © 2007 Elsevier

- Virtual memory increases capacity
- A subset of virtual pages are located in physical memory
- A page table maps virtual pages to physical pages – this is called address translation
- A TLB speeds up address translation
- Using different page tables for different programs provides memory protection

Copyright © 2007 Elsevier

## Memory-Mapped Input/Output (I/O)

Copyright © 2007 Elsevie

- Processor accesses I/O devices (like keyboards, monitors, printers) just like it accesses memory
- Each I/O device assigned one or more address
- When that address is detected, data is read from or written to I/O device instead of memory
- A portion of the address space dedicated to I/O devices (for example, addresses 0xFFFF0000 to 0xFFFFFFFF in reserved segment of memory map)



Copyright © 2007 Elsevie



 Selects between memory and I/O devices as source of data sent to the processor



















| Softwa                        | ire l                                     | Driv                                                       | er for th                                                                                          | е           | Speech Chip                                                                                                                                                  |
|-------------------------------|-------------------------------------------|------------------------------------------------------------|----------------------------------------------------------------------------------------------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                               |                                           |                                                            |                                                                                                    |             |                                                                                                                                                              |
| init:                         | addi<br>addi<br>lui<br>addi               | \$t1,<br>\$t2,<br>\$t3,<br>\$t4,                           | \$0, 1<br>\$0, 20<br>0x1000<br>\$0, 0                                                              | #<br>#<br># | <pre>\$t1 = 1 \$t2 = array size * 4 \$t3 = array base address \$t4 = 0 (array index)</pre>                                                                   |
| start:<br>loop:               | sw<br>lw<br>beq                           | \$t1,<br>\$t5,<br>\$0,                                     | 0xFF04(\$0)<br>0xFF08(\$0)<br>\$t5, loop                                                           | #<br>#<br># | ALD = 1<br>\$t5 = SBY<br>loop until SBY == 1                                                                                                                 |
|                               | add<br>lw<br>sw<br>sw<br>addi<br>beq<br>i | \$t5,<br>\$t5,<br>\$t5,<br>\$0,<br>\$t4,<br>\$t4,<br>\$t4, | <pre>\$t3, \$t4<br/>0(\$t5)<br/>0xFF00(\$0)<br/>0xFF04(\$0)<br/>\$t4, 4<br/>\$t2, done<br/>t</pre> | # # # # # # | <pre>\$t5 = address of allophone<br/>\$t5 = allophone<br/>ALD = 0 to initiate speech<br/>increment array index<br/>last allophone in array?<br/>repeat</pre> |
| done:<br>Copyright © 2007 Els | sevier                                    |                                                            |                                                                                                    |             | 8-<76>                                                                                                                                                       |
|                               |                                           |                                                            |                                                                                                    |             | ELSEVIER                                                                                                                                                     |









## 





## Course Summary You have learned about: Combinational and sequential logic Schematic and HDL design entry Digital building blocks: adders, ALUs, multiplexers, decoders, memories, etc. Assembly language – computer architecture Processor design – microarchitecture Memory system design The world is an increasingly digital place You have the tools to design and build powerful digital circuits that will shape our world Use this power wisely and for good!