Salsa - Multiprogramming in Nachos

Topics

Nachos

Virtual Memory in Nachos

To understand how multiple processes execute in a system, think about how many students can work together by logging on to the same machine. It is the responsibility of the OS to coordinate their execution.

[ Source code in code/userprog| Lab| Quiz ]

Stock Nachos assumes only a single user program exists at any given time. Thus, mainMemory contains instructions for only one user program starting at location mainMemory . Thus, there is a one to one mapping between virtual and physical addresses - virtual address N maps to physical address N.

For the multiprogramming lab , you will need to add multiprogramming support to Nachos. This raises several new issues. First, now that we can have more than one process in memory, not all of them will begin from location mainMemory . Thus, we need a way to allocate memory to processes. Second, we need to coordinate access to the console device - only one process should be able to write to the console at any time. Third, understanding and implementing Exec and Join system calls becomes necessary.

Memory management

The run time mapping from virtual to physical addresses is done by the memory-management unit (MMU) in operating systems, which is a hardware device. A number of different schemes are used - Contiguous allocation, Paging, Segmentation, and Segmentation with paging. Here, we first describe contiguous allocation and paging and then discuss how to implement these in Nachos.

Contiguous allocation

This is one of the simplest memory allocation schemes. In this scheme, the instructions for the process are stored contiguously - in one block. Memory is divided into partitions, each containing exactly one process. Initially, all memory is available for user processes, and is considered as one large block of available memory. The operating system keeps a table indicating which parts of memory are available and which are occupied. When a process arrives and needs memory, the OS searches for a hole large enough for the process. If such a hole is available, memory is allocated to the process. When a process terminates, it releases its memory, which the operating system may then allocate to another process.

Thus, at any time, there is a set of holes, of various sizes, scattered in memory. When a process arrives, the operating system has to choose which hole to allocate to the process. Three strategies are commonly used for this purpose:

First-fit Allocate the first hole that is big enough. Search for a hole stops as soon as one big enough for the process is found.
Best-fit Allocate the smallest hole that is big enough. The whole set of holes needs to be examined. This strategy produces the smallest leftover hole.
Worst-fit Allocate the largest hole. Again, the whole set of holes need to be searched. This strategy produces the largest leftover hole which may be more useful than the smaller leftover hole from a best-fit approach.

Simulations have shown that first-fit and best-fit are better than worst-fit in terms of decreasing time and increasing storage utilization. All the strategies described above suffer from external fragmentation . This exists when there is enough free memory to allocate to a process but is spread out in non-contiguous holes - storage is fragmented into a number of small holes, enough to load a process but spread out in memory. One solution to this problem is compaction . This would shuffle memory so that all the free blocks would be combined together in one large block.

Another problem that may occur with contiguous allocation is internal fragmentation . If a process is allocated more memory than it needs because there was no hole available that was an exact fit, there is some memory that the process has that it does not require. Internal fragmentation is thus internal to a process.

Paging

Paging permits the physical address space of a process to be non-contiguous, thus allowing a process to be allocated physical memory wherever it is available. Physical memory is divided into fixed-sized blocks called frames . Logical memory is divided into equal-sized blocks called pages. When a process is to be loaded into memory, its pages are loaded into any available memory frames. A page table is maintained for each process. This contains the base address of each page in memory.

Every address generated by the CPU is divided into two parts - the page number and page offset . The page number is used to index into the page table to get the base address. This is combined with the page offset to determine the physical address. Note that with this scheme, there is no external fragmentation because any free frame can be allocated to a process that needs it. We may, however, still have internal fragmentation. If the memory requirements of a process are not exact multiples of the page size, the last frame allocated will not be completely full.

Each operating system has its own method of storing page tables. Most allocate a page table for each process. A pointer to the page table is stored with the other register values in the PCB. When the scheduler starts or switches to a process, it reloads the user registers and the correct page-table values from the page table for the process.

Implementation in Nachos - the AddrSpace class

To implement multiprogramming, you can provide any memory management scheme. For now, all virtual addresses map directly to physical addresses. To implement multiprogramming, you must change the way Nachos loads processes into memory. To see how this is done, see the constructor function of the AddrSpace class in addrspace.cc. Note the two private variables - pageTable and numPages . The first is a page table for the process and the second contains the number of pages the process occupies in memory. Since each process has a unique address space, there is a page table for each process.

    pageTable = new TranslationEntry[numPages];
    for (i = 0; i < numPages; i++) {
        pageTable[i].virtualPage = i; // for now, virtual page # = phys page #
        pageTable[i].physicalPage = i;
        pageTable[i].valid = TRUE;
        pageTable[i].use = FALSE;
        pageTable[i].dirty = FALSE;
        pageTable[i].readOnly = FALSE;  // if the code segment was entirely on 
                                        // a separate page, we could set its 
                                        // pages to be read-only
    }

The above code fragment from the AddrSpace constructor, creates a page table for the process. Note that for each page the process occupies in memory, virtual page number is the same as the physical page. To implement multiprogramming, you must change this. Your memory management scheme must figure out what the physical page number for each virtual page is and fill it in.

To see how the MIPS simulator knows where to locate the instructions for a process, note the private variables pageTable and pageTableSize in the Machine class (machine.h ). When the scheduler switches between two processes, it calls a routine RestoreState(), defined in the AddrSpace class ( addrspace.cc). This routine initializes the pageTable and pageTableSize of the machine class to the page table for the process. Thus, the simulator's data structures are properly initialized to point to those of the process before it starts executing the instructions for that process.