Salsa - File system

Topics

Nachos

Networking

[ Source code in code/filesys | Quiz ]

A file system provides the mechanism for on-line storage of and access to information belonging to the OS and its users. Here, we first discuss the file system in very general terms and then explain how it is implemented in Nachos. Note that this is not an exhaustive discussion of a file system; it is rather an overview providing information sufficient for you to understand the Nachos file system.

The logical storage unit for information is files. These are mapped by the OS into physical devices. In general, they are treated as a sequence of bits, bytes, lines or records - the meaning is defined by the file's creator. Information about all the files is kept in a directory structure that resides on secondary storage along with the files.

The interface that the OS provides for the file system comprises, among other things, of operations to manipulate files, e.g., creating, opening, writing or truncating a file. Many of these file operations involving searching the directory structure for the entry associated with the file. Since searching the file entry for each operation performed on the file can be expensive, the OS maintains a small table containing information about all open files (the open-file table). When a file operation is requested, an index into this file table is used to get the appropriate entry. When the file is no longer being used, it is closed by the process and the operating system removes its entry from the open file table.

Each process also maintains a table of all the files that it has opened. This stores information about how the file(s) are being used, e.g., the current location of the file pointer. Each entry in this table points to the corresponding entry in the open-file table.

Organization of Files

To manage the data or information that the file system manages, it is organized in different levels. First, the file system is broken into partitions - typically, a disk on a system contains at least one partition, which is a low level structure in which files and directories reside. They may be used to provide separate areas within one disk, each being treated as a separate storage device, or partitions may be larger than a disk in order to allow grouping of disks in one logical structure. Each partition contains information about files it contains. This information is kept in a device directory which stores information such as name, location, size and type, for all files on that partition.

The directories themselves may be organized in several ways, the simplest being the single-level directory. All files are contained in the same directory. The two-level directory structure allows each user to have their own directory. The tree-structured directory is a generalization of the two-level structure, containing multiple levels.

File System Implementation

File Implementation The most important issue in file storage is keeping track of which disk blocks go with which file. Different operating systems use different methods - contiguous allocation and linked list allocation are important to know. In the former, each file is stored as a contiguous block of data on the disk, in the latter, the file is kept as a linked list of disk blocks - the first word of each block is used as a pointer to the next one. UNIX uses i-nodes to keep track of which blocks belong to each file. An i-node is a table that lists the attributes and disk addresses of the file's blocks. The first few disk addresses are stored in the i-node itself, so for small files, all the necessary information is in the i-node itself which is fetched from disk to main memory when the file is opened. For larger files, one of the addresses in the i-node is the address of a disk block called a single indirect block which contains additional disk addresses. If this is insufficient, another address called the double indirect block may contain the address of a block that contains a list of single indirect blocks.

Directory Implementation The simplest method is to use a linear list of file names with pointers to the data blocks. This requires a linear search to find a particular entry. Hash tables are also used by some operating systems - a linear list stores the directory entries but a hash function based on some computation from the file name returns a pointer to the file name in the list. Thus, directory search time is greatly reduced.

In UNIX, each entry in the directory contains just a file name and its i-node number. When a file is opened, the file system takes the file name and locates its disk blocks. The i-node is read into memory and kept there until the file is closed.

Implementation in Nachos

The following figure shows how the Nachos file system is implemented. There are five classes that describe it. All the classes are defined and implemented in code/filesys. The FileSystem class is defined in the file filesys.h(cc) and contains routines to manage the file system. Each file in the file system has a file header (equivalent to an i-node in UNIX), a number of data blocks and an entry in the file system directory. The FileSystem class contains routines to create, open and delete a file. The BitMap class (bitmap.h(cc)) in code/userprog) is used to keep track of free disk sectors. There is also a directory that stores all the file names and their file headers.

The FileHeader class (filehdr.h) is used to manage the file header (similar to an i-node in UNIX). The file header itself is used to locate where on disk the file's data is stored. The DirectoryEntry class represents a file in the directory. It contains the name of the file and where the file's header is located on disk. The Directory class defines a directory and some operations that can be performed on it, e.g., adding or removing an entry from the directory. The OpenFile class defines data structures and some routines to perform operations on open files, e.g., read, write, seek etc.

For implementing file system calls, you can use the routines defined in class FileSystem for opening and creating files. These will return a pointer to an OpenFile object. For operating on that object, e.g., reading or writing to it, you can use the routines defined in class OpenFile.

One important thing to note in the source code is that the Nachos software comes with two versions of the file system - there is a stub version and a real version. The former just redefines the Nachos file system operations on the UNIX file system on the machine running the Nachos simulation. The reason for providing this is that the latter implementation does not provide all the functionality that a real file system does. For example, there is no synchronization for concurrent accesses to files; there is a restriction on the file size; and only a limited number of files can be added to the system. Fixing all these problems is part of a lab; this course however, does not require the lab. Since the lab on system calls and multiprogramming does require the use of a full-fledged file system, the stub version provided is used by those labs. The stub version is defined in filesys.h and openfile.h.

Source code in code/filesys
Quiz

Last modified on Thursday, 29-May-97 17:03:35 EDT