# Chapter Virtual Memory

1



| 2 Introduction | Memory                                                                                                                                                                                                                                                                                                                                     |  |
|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
|                | Baby had an instruction set where the programmer<br>wrote absolute addresses.<br>If you wanted to add a line all the subsequent lines<br>would have the wrong number.<br>Any jump backs – say for loops would also need<br>modification.<br>Wrote the code – created the opcodes and keyed<br>them in.<br>Corrections were time consuming. |  |
|                | Any subroutine was also a problem. How do you<br>write a subroutine which can be linked in with an<br>arbitrary programme?                                                                                                                                                                                                                 |  |
| Program        | Assembler to automate recalculating the offsets<br>etc.<br>Relative addressing.<br>Start at 0, all commands at an offset.<br>On loading merely add start address to all offsets                                                                                                                                                            |  |
|                |                                                                                                                                                                                                                                                                                                                                            |  |

That is **not** how it would have been described.

## Relocatable

Such code is called re-locatable. Makes code development much easier (possible?)

Can think of it as a virtual address space for the subroutine  $0-173_8$  which is mapped onto real memory from  $326_8-531_8$  code is called re-locatable.

# **Memory Limit**

Early machines had a small address space. An even smaller amount of memory. You wrote your programs to fit in the available **physical** memory.

If it wouldn't fit you simplified the programme. Found clever ways of saving an instruction. But the limit was a real hard limit.

#### **4** Overlays

# Address mapping

People tried to write self modifying code, but it was never more than a (very small) minority. The real solution to create overlays.



That is **not** how it would have been described. The programmer must identify when different sections of code are required and arrange for the sections to be loaded at the appropriate time.

| <b>5</b> Overlays | Larger than physical memory<br>It was not seen as a single virtual address space<br>which was mapped into a smaller physical space.<br>There were a number of constraints:<br>a)Variables required outside the overlay had to be<br>defined elsewhere, in a permanently resident |
|-------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Program           | section<br>b)With multiple overlays you need to make sure<br>that all overlays are resident at the appropriate<br>times. (Including the call into the section)<br>c)You needed a root section which was always                                                                   |
| Program           | d)The values in the overlay would not be present if<br>it was overwritten and reloaded.                                                                                                                                                                                          |
| Program           | e)You needed to be careful about the state of registers.                                                                                                                                                                                                                         |
| Program           | Experience did not translate between different machines.                                                                                                                                                                                                                         |
|                   |                                                                                                                                                                                                                                                                                  |

| <b>6</b> Overlays         | Example                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |                                          |
|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------|
| Permanent         Section | Real time system to control an experiment.          0.0-0.7s beam of particles interact to create events.         State of detectors need to be monitored.         0.7-2.3s gap between beam. Study of output from the detectors during the last <b>burst</b> .         repeat for 2 hours.         10 minute gap         repeat for three weeks         A slight advantage in that there is an external signal which indicates the start and end of beam.         Hard to write, hard to test and verify correct operation.         It would hang approximately once every 24 hours.         Simple in that temporal localisation was enforced by environment. | Taking data.<br>ensuring data<br>quality |
|                           |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | <b> </b>                                 |



## Virtual memory terminology

Virtual memory shares many concepts with cache, but different terms.

The unit which is swapped in and out of memory is called a *page* or *block* 

A cache miss becomes a page fault or (address fault)

The processor produces *virtual addresses* translated to *physical addresses* in memory *Memory mapping* or *address translation* Special place on disk – *the page file* Excessive *faulting* leads to repeated replacement of the pages in memory *disk thrashing* May lead to very little work being undertaken.

Cache replacement is implemented in hardware Virtual memory strategies are OS determined. Miss penalty is higher so the algorithm can be more complex.



| Parameter     | 1 <sup>st</sup> Level<br>cache   | Virtual Memory                       |
|---------------|----------------------------------|--------------------------------------|
| Unit size     | 16-128 bytes                     | Up to 8TB                            |
| Hit Time      | 1-3 cycles                       | 100-200 cycles                       |
| Miss penalty  | 8-200 cycles                     | 1-10 million<br>cycles               |
| Access time   | 6-60 cycles                      | 0.8-8M cycles                        |
| Transfer time | 2-40 cycles                      | 0.2-2M cycles                        |
| Miss Rate     | 0.1-10%                          | 10 <sup>-6</sup> -10 <sup>-3</sup> % |
| Miss Time     | 0.1%-22%                         | 0.5%-50%                             |
| Mapping       | 25-45 physical<br>to 14-20 cache | 32-64 bit virtual-<br>25-45 physical |

Processor address space determines the size of virtual memory – hence importance of 64 bit. VM can be fixed size **pages** or variable size **segments**. Pages lead to inefficient memory usage as the pages may contain many empty cells. Modern machines base their system on pages although the may allow multi-page transfer, which are also called segments.

## **10** Segment v Page **VM v**

|                    | Page                     | Segment                                                               |  |
|--------------------|--------------------------|-----------------------------------------------------------------------|--|
| Words per address  | 1                        | 2                                                                     |  |
| Programmer visible | Invisible                | Possibly visible                                                      |  |
| Replace a block    | Trivial                  | Computationally hard                                                  |  |
| Memory efficiency  | Internal gaps            | Gaps between segments<br>too small to fill                            |  |
| Disk traffic       | Yes. Choice of page size | No -small segments are<br>inefficient. Access time ><br>transfer time |  |

High page fault penalty, so optimise for low page faulting.

## 11 details



The memory divided into **frames** (or page frames) The compiled programmes are divided into **pages** 

They are of course the same size.

A problem is now apparent.

Compile code contains memory locations: location of data words, targets of jumps and subroutine calls.

**12** Addresses

The programme with addresses starting at some arbitrary location. This is the logical address of the instruction or data word.

When a page is copied into a frame, the locations are all at a physical address in memory.

Need to translate from logical to physical. Add the base address of the physical location to the logical location.

Like the overlay when we bring a page in from disk it may go into a new place.

## Modern OS use **demand paging**

A page is in only bought in off disk when it is required.

Access to a page not in memory generates a **page fault** (cf cache miss)

It will overwrite a page already in memory **page replacement** for which we need an algorithm analogous to cache replacement. Dynamic version of the relocatable loader

#### 13 Translation\*

The size of the

logical address

Virtual memory

space may be larger than the physical memory Physical address of memory



All programs share the same physical address space Machine Language programs must be aware of machine organisation. (compilers for HLL) Program can access any machine resource.

Programs run in a standard virtual address space. Address translation managed by hardware which maps virtual address to physical.

#### Address translation supports:

multi-threaded programming.

Stacks may be allowed to grow in disjoint physical memory, but in contiguous virtual memory

#### Protection

protection of sensitive areas of memory protection of threads from each other Kernel data protected from user programs **security Sharing** 

Sharing of memory between processes (threads) Speedy interchange between programs

Memory acts as a cache for disk

Security is difficult to enforce

Address translation adds complexity, but with numerous benefits

## 14 Translation



Both real and virtual memory is addressed in pages.

Processor generates virtual addresses, memory addressed are translated into physical address – which may be in memory or on disk



Address in page same physical and virtual. In this case  $2^{12}$ 

Size of virtual memory is not the same as physical.

VAX 11/780 had typically 1-4Mbyte of physical memory, but 4.3Gbyte virtual. Reading a large amount of data into "memory" (say a large array) and then process it without data read.



| <b>15</b> Page Table* | Finding the page                                                                                                                                                                       |  |
|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
|                       | Page fault is very expensive, transferring data<br>from disk is thousands of times slower than even<br>from main memory – fractions of a percent<br>reductions in rate are worthwhile. |  |
|                       | Complex algorithms can be used to track page usage.                                                                                                                                    |  |
|                       | we don't want conflict misses<br>so Fully associative placement.                                                                                                                       |  |
|                       | But search to find which frame has the required page is too time consuming                                                                                                             |  |
|                       | 4Gbytes memory – which is not large<br>4Kbyte pages – which is standard                                                                                                                |  |
|                       | 1 million comparisons – even 1 comparison per<br>clock operation gives ¼ millisecond for a location<br>which is <b>in</b> memory                                                       |  |
|                       |                                                                                                                                                                                        |  |





## **18** Replacement | **Replacing pages**

LRU is the algorithm of choice. Full LRU means structure update on each access

Approximate LRU: reference or use bit set on access. But cleared periodically. Page with bit=0 can be replaced.

## Writes

Write through is impractical – as is write buffer. For disks access time > transfer time.

Copy pages not items (spatial locality). Update page in memory – set **dirty bit** 

If page is replaced then only write out before replacing.

If possible choose a page without the dirty bit set.

| 19 Page Table | Problems with the page table                                                                                                                                              |
|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|               | <ul> <li>32 virtual address – 4KByte pages ⇒ 2<sup>20</sup> entries</li> <li>4bytes per entry</li> <li>4 Mbyte – page table.</li> <li>1 page table per process</li> </ul> |
|               | Currently my machine has 77 processes. – 300 Mbytes<br>1 Gbyte – memory 30% on the page table.                                                                            |
|               | Solution                                                                                                                                                                  |
|               | Page the page tables.                                                                                                                                                     |
|               | Tables sit in the OS virtual space (not process)                                                                                                                          |
|               | A page in memory must have its part of the page table<br>in memory.                                                                                                       |
|               | There are parts of memory which are non-paged.<br>Data in these areas cannot be swapped out or<br>overwritten.                                                            |
|               | Some OS pages are in non-paged memory – don't<br>want to have to wait for some parts of the OS                                                                            |
|               |                                                                                                                                                                           |
|               |                                                                                                                                                                           |

#### **20** Translation

#### Flow diagram for address translation



This is very good, but it has caused a problem.

When a physical address is required, it needs one memory access to get the instruction/data.

With a virtual address you need to access the page table to get the physical address and then access the physical address.

The memory access time has just doubled. To get round this problem a special cache is provided which holds part of the page table.

## Translation lookaside buffer TLB



## **Replacing pages**

Address translation is likely to be used again very shortly; both temporal locality and spatial locality.

Keep recently used translations in a special cache **Translation-lookaside buffer** 

dirty bit, valid bit must be available here.



#### **22** Operation

## Flow diagram for address translation



A diagram of what happens when the cpu wants to translate a logical address into a physical address.

First check the TLB.

Then need to access the page pointed to by the TLB.

**23** TLB

## **TLB Miss**

TLB Hit – pass address and set dirty bit for write TLB miss:

page in memory but not in TLB. Load address into TLB, copy dirty bit back to the page table for the replaced page into and restart

page nor in memory. Throw an exception, handle in software or hardware.

Typical values Miss rate 0.01-1%

TLB size 16-512 entries Block size 1-2 page table entries Hit time 0.5-1 clock cycles Miss penalty 10-100 cycles

Storage and replacement strategy has to be defined for the TLB.

| 24 Page Table | Problems with the page table                                                                                                                                                                       |                 |
|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|
|               | 32 virtual address – 4KByte pages $\Rightarrow 2^{20}$ entries<br>4bytes per entry                                                                                                                 |                 |
|               | 4 Mbyte – page table.<br>1 page table per process                                                                                                                                                  | Many techniques |
|               | Currently my machine has 77 processes. – 300 Mbytes<br>1 Gbyte – memory 30% on the page table.                                                                                                     | implications    |
|               | <b>Solutions</b><br>Limit register restricts the size of the table. Grow page<br>table as required. Addresses can only grow in one<br>direction.                                                   |                 |
|               | Systems want to grow from high memory down and low<br>memory up. So two pages tables and two registers. If<br>memory is filled sparsely page table will grow large.                                |                 |
|               | Make the page table the size of memory – placing the virtual address via a hashing function. <b>Inverted page table.</b>                                                                           |                 |
|               | <b>Multi-Level pages tables</b><br>1 level is at segment (multi-page level) pointing to page<br>table if segment is used. Useful for sparse allocation<br>(matrix problems) – decode more complex. |                 |
|               |                                                                                                                                                                                                    | Virtual Mem     |



| <b>26</b> Review* | Review                                                       |                                                                                                                                                                                                                                                                                         |                                                                             |                                                                        |                                                                                                                 |
|-------------------|--------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
|                   | Placing a b<br>Finding a                                     | olock : associativity: Direct; n-way; f<br>block                                                                                                                                                                                                                                        | ull                                                                         |                                                                        |                                                                                                                 |
| Associativity     |                                                              | Location method                                                                                                                                                                                                                                                                         |                                                                             | Tag                                                                    | comparisons                                                                                                     |
| Direct mapped     |                                                              | Index                                                                                                                                                                                                                                                                                   | 1                                                                           |                                                                        |                                                                                                                 |
| n-way se          | t associative                                                | Set index, then search entries within the set                                                                                                                                                                                                                                           |                                                                             | t n                                                                    |                                                                                                                 |
| Fully ass         | ociative                                                     | Search all entries                                                                                                                                                                                                                                                                      |                                                                             | #ent                                                                   | ries                                                                                                            |
| Full look         | up table                                                     | 0                                                                                                                                                                                                                                                                                       |                                                                             |                                                                        |                                                                                                                 |
|                   | Replacing<br>Writing to<br>Misses: C<br>C<br>C<br>Miss ameli | a block: LRU - random<br>a block: write through: simpler need<br>write back<br>Compulsory; increase block size<br>Capacity misses; increase size –<br>Conflict miss: increase assoc -<br>collision)<br>ioration<br>equested word first<br>it under miss and miss under miss<br>orefetch | ds buffe<br>Inc<br>Ir<br>Ir<br>Large<br>slow:te<br>Cache<br>fast m<br>Depen | er<br>creas<br>icrea<br>icrea<br>icrea<br>men<br>echr.<br>echr.<br>emo | se miss penalty<br>ase access time<br>ase access time<br>nories are<br>nology and size<br>parently large<br>ory |

Depends on locality