CMSC 412 Project 5
GOSFS Filesystem
Part a: due Sunday, December 2, 6:00pm
Part b: due Tuesday, December 11, 6:00pm
1. Introduction
GeekOS currently has a filesystem of type PFAT
residing in IDE disk 0 (disk image diskc.img).
The goal of this project is to develop a new filesystem type,
called GOSFS,
and to create an instance of it in IDE disk 1 (disk image diskd.img).
You can continue to use the existing PFAT filesystem
to load user programs to test your GOSFS filesystem implementation.
Here, we use the term filesystem type to mean a set of rules
for laying out directories and files in a disk
and for adding, modifying and deleting them.
We use the term filesystem to mean an instance
of a filesystem type.
For example,
PFAT is a filesystem type
(whose rules are implemented in pfat.c);
diskc.img is a PFAT filesystem.
GOSFS is another filesystem type
(whose rules you will implement in gosfs.c);
your diskd.img will be a GOSFS filesystem.
Note
-
Part a:
No need to implement single-indirect or double-indirect blocks
(i.e., assume that file size is less than 8*4KB).
Implement everything else.
-
Part b:
Implement single-indirect and double-indirect blocks.
-
Discussion slides (OLD)
-
Text
(depending on your edition,
chapter numbers may differ):
File-system interface (chapter 11),
File-system implementation (chapter 12).
2. Context: VFS and blockdev
GeekOS has a virtual filesystem (VFS) onto which
PFAT and GOSFS (and other types of) filesystems are "mounted".
VFS acts as a wrapper to these mounted filesystems,
allowing them to be accessed by users in a uniform manner.
Initially VFS consists of just an empty root directory ("/").
At the end of OS initializing,
the PFAT filesystem in IDE 0 is mounted onto VFS at position
"/c",
after which user programs can access the PFAT filesystem
as the VFS subdirectory "/c".
Similarly,
your GOSFS filesystem can be mounted at another point
(say "/d") and accessed.
GeekOS also has a virtual block device into which
any block-structured storage device (e.g., IDE, floppy) can be
registered.
The block device acts as a wrapper to these registered storage devices,
allowing them to be accessed in a uniform manner.
The users in this case are other parts of the kernel
(filesystems, paging, etc.).
The figure below illustrates the context.
User programs invoke system calls,
which invoke functions in vfs.c,
which invoke functions in pfat.c or gosfs.c,
which invoke functions in blockdev.c,
which in turn invoke functions in ide.c.
You have to implement the functions in in gosfs.c.
The others are already implemented.
You will need to understand how vfs.c and pfat.c work
and how to use blockdev.c.
The pfat functions that are called by vfs have names
of the form PFAT_<function>
(see pfat.c).
Similarly, the gosfs functions that are called by vfs
have names of the form GOSFS_<function>
(see gosfs.c).
Note that vfs does not call these functions by name.
Rather vfs gets pointers to these functions at run time
(when a filesystem type is registered,
when a filesystem is mounted, etc.).
There are also pfat and gosfs functions that call vfs functions,
for example,
to register a filesystem type (Register_Filesystem),
to allocate a File struct (Allocate_File).
3. GOSFS
GOSFS provides a much richer interface than PFAT.
The directories of a GOSFS filesystem form a tree.
A directory can hold up to 36 entries (files/directories).
A directory entry (file or directory) has a name
of up to 64 characters (bytes),
including the null at the end.
A full path to a file is at most 1024 characters.
Directories and files can be created and deleted.
Existing files can opened for reading and writing.
A raw disk can be formatted to hold a GOSFS filesystem.
(In contrast, PFAT is read-only, has only one directory,
its file names are at most 11 characters,
and there is no format capability.)
Internally, GOSFS uses a block size of 4KB (just like PFAT).
Recall that an IDE disk in GeekOS has a block size of 512 bytes.
Thus in an IDE disk with a GOSFS filesystem,
each gosfs block is stored in 8 successive disk blocks.
Gosfs block 0 (stored in disk blocks 0-7)
is the so-called superblock;
it holds info about this particular filesystem
(disk size, free blocks, root directory's location, etc.).
Gosfs blocks 1 and higher contain files and directories or are free.
You should keep track of free gosfs blocks using a bitvector.
A library called bitset is provided
(see bitset.h and bitset.c) that manages a bitvector
and find bits that are 0 (i.e. corresponding to free gosfs blocks).
Note that one bit in the bitvector corresponds to a gosfs (4KB) block
(i.e., 8 disk blocks).
So a bitvector of is 8192 bits (1024 bytes) can keep track
of a disk of size 8192 * 4KB = 32 MB.
To read or write to a file on disk,
the file has to be first opened
(see Open or GOSFS_Open below).
This creates a File struct containing
a cache of (some blocks of) the file,
the current read/write position in the file,
etc.
A library called bufcache is provided
that does the cacheing.
You can make use of it if you want.
3.1. GOSFS functions
Here is an informal look at some gosfs functions.
See gosfs.c for a more complete list of functions.
Except for Init_GOSFS, all the functions below are called by vfs.
(As mentioned above, these gosfs functions may call vfs functions.)
-
Init_GOSFS():
Register the GOSFS filesystem type with vfs.
(Sets pointers in vfs to some GOSFS_ functions.)
-
GOSFS_Format(...):
Format a block device with an empty GOSFS filesystem,
i.e., creates superblock and empty root directory.
The call identifies the block device
(which should have been previously registered with blockdev).
-
GOSFS_Mount(...):
Mount a GOSFS filesystem.
The call identifies a block device
(expected to have a GOSFS filesystem)
and the place in vfs where it is to be mounted.
The return contains info of that filesystem
and pointers to some GOSFS_ functions).
-
GOSFS_Open(...):
Returns a File struct corresponding
to a file in a mounted GOSFS filesystem.
The call identifies the mounted filesystem
and the path to the file in the filesystem.
The file is now open:
it can be read and written.
This call is also used to create a file.
-
GOSFS_Close(...):
Close an open file.
-
GOSFS_Read(...):
Read a specified part of an open file.
-
GOSFS_Write(...):
Write a specified part of an open file.
-
GOSFS_Create_Directory(...):
Create a directory entry.
-
-
3.2. User process file descriptors
Each user process will have a file descriptor table that keeps track
of the files that the process currently has open.
Every user process should be able to have up to 10 files open at once.
The file descriptors for a user process are kept in the
files[MAX_OPEN_FILES] array in struct User_Context.
If a process has less than 10 files open
then some of the entries in the table are free
(field openFile.fsType equal to FS_TYPE_NONE).
The file descriptor management is already implemented
(see function Open() in vfs.c).
3.3. Filenodes and Directory structure
The internal structure of a GOSFS directory is defined in gosfs.h.
Each directory in GOSFS takes up a single gosfs block.
It is an array of 36 GOSFS filenodes (GOSFSfileNode),
one for every possible entry (file or subdirectory).
(The limit of 36 ensures that the array fits in a single 4KB block.)
A filenode has the following fields:
name of the entry;
size of the entry's data;
whether the entry is active;
whether the entry is a directory;
and an array of pointers to gosfs blocks containing the entry's data.
(There are also some fields (isSetUid, acls) not relevant for project 5.)
The array of pointers is called blocks.
It is used as follows.
-
For a filenode of a directory,
only block[0] is used
(because the data of the directory fits in one gosfs block).
-
For a filenode of a file,
the number of blocks entries used depends on the size of the file.
A gosfs block that holds file data is referred to as a direct block.
You will use indexed allocation to track
the direct blocks of the file.
-
Entries 0 through 7 point to eight direct blocks,
holding the first 8*4KB of data of the file.
-
Entry 8 points to a so-called single-indirect block,
which is a gosfs block containing pointers to direct blocks.
-
Entry 9 points to a so-called double-indirect block,
which is a gosfs block containing pointers to single-indirect blocks.
The superblock and root directory have no associated GOSFSfileNode.
Every other directory and every file has an associated GOSFSfileNode.
3.4. Disk Layout
A guideline is provided above.
Gosfs block 0 is called SUPERBLOCK,
and contains filesystem housekeeping data.
Gosfs blocks ≥ 1 contain files and directories or are empty.
-
The Magic number at the very beginning should be 0xbeebee,
indicating that the disk contains a GOSFS filesystem.
If you try to mount a drive and you don't find the magic number,
return error.
-
Root Dir Pointer holds the number of the gosfs block
containing the root directory.
-
Size is the size of the disk in 4KB blocks.
(32M / 4K = 8K for the example above)
-
Free Blocks Bitmap is Size bits large,
that is, Size/8 bytes large.
(8K / 8 = 1K for the example above).
Every gosfs block has an associated bit.
When you do a Format(), you make a raw disk usable with GOSFS. That is:
-
Get the drive's Size; convert it to number of gosfs blocks.
IDE_getNumBlocks() in ide.c tells you the number of disk blocks.
-
Figure out Free Blocks Bitmap size; mark them all free.
-
Create a gosfs block containing a valid but empty directory.
That will be the root directory. Make Root Dir Pointer point to it.
-
Mark the superblock and the root directory block as used
in the Free Blocks Bitmap.
-
If everything went OK, write the Magic.
The disk is now ready to be mounted and used.
Keep in mind that the superblock and root directory
have no associated GOSFSfileNode.
4. VFS System Calls
Here are the system calls concerning the virtual filesystem (VFS).
As you see,
the semantics of these calls is very similar to that in UNIX.
Keep in mind that each call vectors through the VFS layer
before invoking a corresponding GOSFS_ or PFAT_ function.
You have to implement the GOSFS_ functions.
The PFAT_ functions are already implemented;
look in pfat.c for a complete implementation of a filesystem.
int Mount(char *dev, char *prefix, char *fstype)
:
-
System call: SYS_MOUNT
-
Return 0 on success.
-
Return a negative value on failure:
-
a filesystem is already mounted under prefix.
-
a parameter has an illegal value.
-
filesystem settings (on device) has invalid magic or version fields
or block size is not a multiple of 512 bytes.
(No need to check for other things,
e.g., the number and start location,
the total number of blocks, can be arbitrary.)
|
int Open(char *path, int permissions)
:
-
System call: SYS_OPEN
-
Return new file descriptor number on success.
-
Return a negative value on failure:
-
path excluding final name does not exist.
-
path does not exist,
path excluding final name exists,
permissions does not include O_CREATE.
-
O_WRITE and O_CREATE not allowed for directories;
use CreateDirectory instead.
-
Comments
|
int Open_Directory(char *path)
:
-
System call: SYS_OPEN_DIRECTORY
-
Return new file descriptor number on success.
-
Return a negative value on failure:
-
path does not exist
-
does not end in a directory.
|
int Close(int fd)
:
-
System call: SYS_CLOSE
-
Return 0 on success.
-
Return a negative value on failure:
-
fd is not within 0-9.
-
fd is not an open file.
|
int Delete(char *path)
:
-
System call: SYS_DELETE
-
Return 0 on success.
-
Return a negative value on failure:
-
path does not exist.
-
path ends in non-empty directory.
-
Comments
-
If Delete is called and the file is still open in other threads
or even in the thread that called Delete,
all subsequent operations on that file except Close() should fail.
|
int Read(int fd, char *buffer, int length)
:
-
System call: SYS_READ
-
If success and fd is an open file:
-
length is the number of bytes to read.
-
return number of bytes read.
It can be less than length
(e.g., reading close to end of file).
-
increase filePos appropriately.
-
If success and fd is an open directory:
length is the number of dirEntries to read.
-
return number of dirEntries read.
It can be less than length.
-
increase filePos appropriately.
-
data put into buffer should be formatted
as an array of dirEntry structs.
(dirEntry is defined in fileio.h.)
Return a negative value on failure:
-
fd not within 0-9.
-
fd is not an open file or directory.
-
fd was not open with O_READ flag.
|
int Read_Entry(int fd, struct VFS_Dir_Entry *dirEntry)
:
-
System call: SYS_READ_ENTRY
-
Return 1 on success.
-
Return a negative value on failure:
-
fd is not a directory
-
file pointer is at end of directory.
|
int Write(int fd, char *buffer, int length)
:
-
System call: SYS_WRITE
-
length is number of bytes to write.
-
Return number of bytes written and increase filePos
on success.
"Grow on write":
allocate blocks "on the fly" if past end of file.
-
Return a negative value on failure:
-
fd not within 0-9.
-
fd is not an open file.
-
fd was not open with O_WRITE flag.
-
fd is a directory.
|
int Stat(char *path, VFS_File_Stat *stat)
:
-
System call: SYS_STAT
-
Return 0 on success.
-
Return a negative value on failure:
-
file is not found or is not readable.
|
int FStat(int fd, VFS_File_Stat *stat)
:
-
System call: SYS_FSTAT
-
Return 0 on success.
-
Return a negative value on failure:
-
fd not within 0-9 or not an open file.
|
int Seek(int fd, int offset)
:
-
System call: SYS_SEEK
-
offset is an absolute position;
can be equal to fileSize.
-
Return 0 on success.
-
Return a negative value on failure:
-
fd not within 0-9 or not an open file or offset > fileSize.
|
int CreateDirectory(char *path)
:
-
System call: SYS_CREATEDIR
-
Return 0 on success.
-
Create directories recursively if needed,
e.g. CreateDirectory("/d/d1/d2/d3/d4") creates d1 inside of d,
d2 inside of d1, etc., if they don't exist already.
This operation should be atomic:
either the whole directory chain is created
or no directory is created.
-
Return a negative value on failure:
-
path already exists (ending in file or directory).
-
regular file encountered on the path to the final name.
|
int Format(int device, char *filesystem_type)
:
-
System call: SYS_FORMAT
-
Return 0 on success.
-
Need only handle GOSFS (i.e., filesystem_type "gosfs").
-
Return a negative value on failure:
-
illegal value for device
(it must work with IDE 1, higher is optional)
-
device is in use, i.e. mounted.
-
Comments:
Don't need to format in init code;
so you can save your data between sessions.
|
5. Notes
You do not need to consider situations where two processes have the same file open. You do not need to consider situations where one process opens the same file twice without closing it in between.
Too allow you to cache information, the VFS layer includes a Sync function. When the Sync function is called, all changed state needs to be saved to disk (i.e. the machine can be rebooted after it). You may choose to make all operations synchronous, in that case sync will be a no-op.
If a read() is called on a directory, the data returned should be in the form of an array of dirEntry structures. The length argument and the return value will indicate the number of entries to read and the number of entries that were read, rather than the number of bytes.
Make sure your Mount() works well, so that we can test your project.
If we cannot Mount() a GOSFS filesystem, we cannot grade your project.
You might also want to mount "/d" automatically in Main()
to speed up your testing,
but the code you submit should not mount "/d" automatically.
However, "/c" should be mounted automatically in Main(), as usual.
You should support disk sizes of at least 32 MB.
More than 32 MB is optional.
Following the procedure described in the
"How to create an arbitrary size big diskd.img" section above,
in your submitted project,
when someone types gmake, a 32 MB file should be created.
You should support file sizes of at least 5 MB
(double-indirect threshold crossed, yes).
More than 5 MB is optional.
6. Testing
As you saw at the top,
in src/user there are some programs that can be used to test
your file management syscalls: cp,c, ls.c, mkdir.c, mount.c, p5test.c.