LMP1: I/O and Filesystems
=========================
Welcome to LMP1, the first long MP. LMP1 is the first stage of a project aimed
at creating a simple yet functional networked filesystem. In this MP, you will
learn about and use POSIX file system calls, while subsequent LMPs will
introduce memory management, messaging, and networking functionality. If you
implement all parts of this MP correctly, you will be able to reuse your code
for future MPs.
This first LMP concentrates on the file I/O portion of the project.
Specifically, you will implement a custom filesystem and test its performance
using a filesystem benchmark. A benchmark is an application used to test the
performance of some aspect of the system. We will be using Bonnie, a real
filesystem benchmark, to test various performance aspects of the filesystem we
implement.
LMP1 consists of four steps:
1. Read the code; run the Bonnie benchmark and the LMP1 test suite.
2. Implement Test Suite 1 functionality, encompassing basic file I/O operations.
3. Implement Test Suite 2-4 functionality (directory operations, file
creation/deletion, and recursive checksumming).
4. Modify Bonnie to use your client-server file I/O methods.
Code structure
--------------
The code for this project is structured according to the client-server
model. The client code (filesystem benchmark) will interact with the
server (filesystem) only through interface functions defined in
fileio.h:
int file_read(char *path, int offset, void *buffer, size_t bufbytes);
int file_info(char *path, void *buffer, size_t bufbytes);
int file_write(char *path, int offset, void *buffer, size_t bufbytes);
int file_create(char *path, char *pattern, int repeatcount);
int file_remove(char *path);
int dir_create(char *path);
int dir_list(char *path, void *buffer, size_t bufbytes);
int file_checksum(char *path);
int dir_checksum(char *path);
These functions represent a simple interface to our filesystem. In Steps 2 and
3 of this MP, you will write the code for functions implementing this interface,
replacing the stub code in fileio.c. In Step 4, you will modify a Bonnie method
to use this interface, rather than calling the normal POSIX I/O functions
directly. The purpose of Step 4 is to help test our implementation.
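As a rough illustration of how an offset-based call such as file_read() can
sit on top of POSIX I/O, the sketch below opens a file, seeks to the requested
offset, and issues a single read(). The helper name sketch_read_at is
hypothetical, and returning -1 on every error is an assumption; the comments
in fileio.h define the behavior your implementation must follow.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of fileio.h): read up to bufbytes bytes
 * from the file at `path`, starting `offset` bytes into the file.
 * Returns the number of bytes read, or -1 on any error (an assumed
 * convention for this sketch only). */
int sketch_read_at(const char *path, int offset, void *buffer, size_t bufbytes)
{
    assert(path != NULL && buffer != NULL);
    if (offset < 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    /* Move the file position to the requested offset. */
    if (lseek(fd, (off_t)offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    /* A single read() may return fewer bytes than requested. */
    ssize_t n = read(fd, buffer, bufbytes);
    close(fd);
    return (int)n;
}
```

A caller would pass a path, a byte offset, and a buffer, for example
sketch_read_at("data.txt", 6, buf, 5), and then check the return value before
using the buffer contents.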
Step 1: Understanding the code
------------------------------
1. Compile the project, execute Bonnie and the test framework.
Note: you may need to add execute permissions to the .sh files using
the command “chmod +x *.sh”.
Try the following:
make
./lmp1
(this runs the Bonnie benchmark – it may take a little while)
./lmp1 -test suite1
(run Test Suite 1 – this has to work for stage1)
make test
(run all tests – this has to work for stage2)
2. Read through the provided .c and .h files and understand how this
project is organized:
bonnie.c – a version of the filesystem benchmark
fileio.c – file I/O functions to be implemented
fileio.h – declaration of file I/O functions
restart.c – restart library (available for use in fileio.c)
restart.h – declaration of restart library functions
util.c – useful utility functions
util.h – declaration of useful utility functions and macros
In particular, pay close attention to the comments in fileio.h and
bonnie.c. You should understand what each of the following functions
in bonnie.c does before undertaking the remainder of the MP:
fill_file_char()
file_read_rewrite()
file_read_rewrite_block()
fill_file_block()
fill_read_getc()
file_read_chunk()
newfile()
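The file list above mentions a restart library that is available for use in
fileio.c. One common job of such a library is to retry a system call that was
interrupted by a signal. The sketch below shows that pattern for read(). The
name sketch_read_retry is hypothetical, and whether restart.c offers an
equivalent function is an assumption; check restart.h for what the library
actually provides.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: call read() again whenever it fails only because
 * a signal interrupted it (errno == EINTR). Any other result, including
 * a short read or end of file (0), is returned to the caller unchanged. */
ssize_t sketch_read_retry(int fd, void *buf, size_t size)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, size);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

Note that this only handles interruption before any data is transferred. A
read that returns fewer bytes than requested is still a successful read, and
the caller decides whether to ask for the rest.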
Step 2: Basic I/O operations
----------------------------
Implement file_read, file_write and file_info operations in fileio.c.
If done correctly, your code should pass all suite1 tests:
./lmp1 -test suite1
Running tests…
1.read ::pass
2.info ::pass
3.write ::pass
Test Results:3 tests,3 passed,0 failed.
IMPORTANT: fileio.c is the only file you should modify for this step.
Step 3: More filesystem operations
----------------------------------
Implement file and directory operations for suite2 (dir_create and
dir_list), suite3 (file_create and file_remove), and suite4
(file_checksum and dir_checksum).
You can test each operation and test suite individually:
./lmp1 -test dirlist
./lmp1 -test suite2
All tests should now pass:
make test
Running tests…
1.read ::pass
2.info ::pass
3.write ::pass
4.dirlist ::pass
5.dircreate ::pass
6.remove ::pass
7.create ::pass
8.filechecksum ::pass
9.dirchecksum ::pass
Test Results:9 tests,9 passed,0 failed.
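For the directory operations in Step 3, the POSIX calls opendir(), readdir(),
and closedir() are the usual building blocks. The sketch below walks a
directory and copies each entry name into a caller-supplied buffer. The helper
name sketch_list_dir, the one-name-per-line output format, and the -1 error
return are all assumptions made for illustration; the format that dir_list()
must produce is specified in fileio.h, not here.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write the names of all entries in directory
 * `path` into `buffer`, one per line, as a NUL-terminated string.
 * Returns the number of entries written, or -1 if the directory cannot
 * be opened or the buffer is too small (assumed conventions). */
int sketch_list_dir(const char *path, char *buffer, size_t bufbytes)
{
    assert(path != NULL && buffer != NULL);
    if (bufbytes == 0)
        return -1;

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    size_t used = 0;
    int count = 0;
    struct dirent *entry;

    buffer[0] = '\0';
    while ((entry = readdir(dir)) != NULL) {
        size_t len = strlen(entry->d_name);

        /* Need room for the name, a newline, and the terminating NUL. */
        if (used + len + 2 > bufbytes) {
            closedir(dir);
            return -1;
        }
        memcpy(buffer + used, entry->d_name, len);
        used += len;
        buffer[used++] = '\n';
        buffer[used] = '\0';
        count++;
    }
    closedir(dir);
    return count;
}
```

On most systems readdir() also reports the "." and ".." entries, so a real
implementation has to decide whether to include or skip them.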
Step 4: Performance testing
---------------------------
In this step, we will change parts of Bonnie to use our filesystem
interface.
Modify the function file_read_rewrite_block() in bonnie.c to call your
fileio.c functions instead of POSIX I/O operations. When answering the
questions below, use this modified version of bonnie.c.
Before making this change, it’s a good idea to write pseudocode comments
for what each part of file_read_rewrite_block() does, so that you
understand the code and can perform the change correctly. There may
not be an exact one-to-one correspondence between our filesystem
interface and the POSIX commands.
Note: In future LMPs, we will be using the fileio.h interface in a
similar manner, but we will call the functions remotely, via a message
queue.
Questions
---------
Q1. Briefly explain what the following code from bonnie.c does:
if ((words = read(fd, (char *) buf, Chunk)) == -1) …
Q2. Is the above an example of a block read or a character read? What
is the value of the variable ‘words’ if the read succeeds? Fails?
Q3. Explain the meaning of the flag value (O_CREAT | O_WRONLY |
O_APPEND) for the POSIX function open().
Q4. Run Bonnie. What is being measured by each test function?
Q5. Look at the summary results from the Bonnie run in Q4. Does Bonnie
measure latency, throughput or something else? Justify your answer.
Q6. Compare character reads with block reads using Bonnie. Which is
faster? Why do you think this is the case?
Q7. Copy and paste the performance measures output when running Bonnie
benchmarks in a local directory and again in an NFS-mounted directory.
Is one kind of disk access noticeably slower over the network, or are
all tests significantly slower?
Your home directory may be an NFS mount, whereas /tmp and /scratch are local
disks. To test your code in /tmp, do the following:
mkdir /tmp/your_username
cp lmp1 /tmp/your_username
cd /tmp/your_username
./lmp1
(record the output)
cd
rm -fr /tmp/your_username
Q8. How does Bonnie handle incomplete reads, e.g., due to interruptions
from signals? Justify why Bonnie’s approach is good or bad for a
filesystem benchmark program.
Q9. By now you should be very familiar with the self-evaluation test
harness we provide for the MPs. Examine the function test_file_read()
in lmp1_tests.c, which tests your file_read() function from Step 2.
What does this test check for, specifically? You may want to copy and
paste the code for this function in your answer, and annotate each
quit_if or group of related quit_ifs with a comment.
LMP2: Memory Management
=======================
This machine problem will focus on memory. You will implement your own
version of malloc() and free(), using a variety of allocation strategies.
You will be implementing a memory manager for a block of memory. You will
implement routines for allocating and deallocating memory, and keeping track of
what memory is in use. You will implement four strategies for selecting in
which block to place a newly requested memory block:
1) First-fit: select the first suitable block with smallest address.
2) Best-fit: select the smallest suitable block.
3) Worst-fit: select the largest suitable block.
4) Next-fit: select the first suitable block after
the last block allocated (with wraparound
from end to beginning).
Here, “suitable” means “free, and large enough to fit the new data”.
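To make the four strategies concrete, here is a minimal sketch that picks a
block from a plain array of block descriptors. Everything in it is
hypothetical (struct demo_block, enum demo_strategy, demo_choose); the
structure provided in mymem.c is a linked list and may look quite different.
Only the selection rule is illustrated, not splitting or bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified block descriptor (array order = address order). */
struct demo_block {
    size_t size;
    int is_free;
};

enum demo_strategy { DEMO_FIRST, DEMO_BEST, DEMO_WORST, DEMO_NEXT };

/* Return the index of the block chosen for a request of `request` bytes,
 * or -1 if no block is suitable. `last` is the index of the most recently
 * allocated block and is used only by next-fit (pass -1 if none). */
int demo_choose(const struct demo_block *blocks, int count, size_t request,
                enum demo_strategy strategy, int last)
{
    assert(blocks != NULL || count == 0);

    int chosen = -1;
    /* Next-fit starts just after `last`; the others start at index 0. */
    int start = (strategy == DEMO_NEXT && count > 0) ? (last + 1) % count : 0;

    for (int step = 0; step < count; step++) {
        int i = (start + step) % count;   /* wraps from end to beginning */

        if (!blocks[i].is_free || blocks[i].size < request)
            continue;                     /* not suitable */
        if (strategy == DEMO_FIRST || strategy == DEMO_NEXT)
            return i;                     /* first suitable block wins */
        if (chosen == -1
            || (strategy == DEMO_BEST && blocks[i].size < blocks[chosen].size)
            || (strategy == DEMO_WORST && blocks[i].size > blocks[chosen].size))
            chosen = i;
    }
    return chosen;
}
```

With free blocks of sizes 100, 30, 200, and 40 (and one allocated block of
size 50 in second position), a 25-byte request selects the 100-byte block
under first-fit, the 30-byte block under best-fit, and the 200-byte block
under worst-fit. Ties in this sketch go to the block with the lower address.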
Here are the functions you will need to implement:
initmem():
Initialize memory structures.
mymalloc():
Like malloc(), this allocates a new block of memory.
myfree():
Like free(), this deallocates a block of memory.
mem_holes():
How many free blocks are in memory?
mem_allocated():
How much memory is currently allocated?
mem_free():
How much memory is NOT allocated?
mem_largest_free():
How large is the largest free block?
mem_small_free():
How many small unallocated blocks are currently in memory?
mem_is_alloc():
Is a particular byte allocated or not?
We have given you a structure to use to implement these functions. It is a
doubly-linked list of blocks in memory (both allocated and free blocks). Every
malloc and free can create new blocks, or combine existing blocks. You may
modify this structure, or even use a different one entirely. However, do not
change function prototypes or files other than mymem.c.
IMPORTANT NOTE: Regardless of how you implement memory management, make sure
that there are no adjacent free blocks. Any such blocks should be merged into
one large block.
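The merging rule can be sketched on a small doubly-linked list. The node type
and function names below are hypothetical and are not the structure supplied
in mymem.c. The sketch also assumes that every node was obtained from
malloc() and that the list is not circular.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node describing one block of the managed memory. */
struct demo_node {
    struct demo_node *prev;
    struct demo_node *next;
    size_t size;
    int is_free;
};

/* Fold node->next into node and release the now-unused list node. */
static void demo_absorb_next(struct demo_node *node)
{
    struct demo_node *victim = node->next;

    node->size += victim->size;
    node->next = victim->next;
    if (victim->next != NULL)
        victim->next->prev = node;
    free(victim);
}

/* Mark `node` free and merge it with any free neighbor on either side.
 * Returns the node that now describes the merged free block. */
struct demo_node *demo_free_and_merge(struct demo_node *node)
{
    assert(node != NULL);
    node->is_free = 1;

    if (node->next != NULL && node->next->is_free)
        demo_absorb_next(node);
    if (node->prev != NULL && node->prev->is_free) {
        node = node->prev;
        demo_absorb_next(node);
    }
    return node;
}
```

Because the list never contains two adjacent free blocks before the call,
checking one neighbor on each side is enough. Any other pointer into the list,
such as a next-fit cursor, would need updating if it referred to a node that
was absorbed.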
We have also given you a few functions to help you monitor what happens when you
call your functions. Most important is the try_mymem() function. If you run
your code with “mem -try <args>”, it will call this function, which you can use
to demonstrate the effects of your memory operations. These functions have no
effect on test code, so use them to your advantage.
Running your code:
After running “make”, run
1) “mem” to see the available tests and strategies.
2) “mem -test <test> <strategy>” to test your code with our tests.
3) “mem -try <args>” to run your code with your own tests
(the try_mymem function).
You can also use “make test” and “make stage1-test” for testing. “make
stage1-test” only runs the tests relevant to stage 1.
As in previous MPs, running “mem -test -f0 …” will allow tests to run even
after previous tests have failed. Similarly, using “all” for a test or strategy
name runs all of the tests or strategies. Note that if “all” is selected as the
strategy, the 4 tests are shown as one.
One of the tests, “stress”, runs an assortment of randomized tests on each
strategy. The results of the tests are placed in “tests.out”. You may want to
view this file to see the relative performance of each strategy.
Stage 1
-------
Implement all the above functions, for the first-fit strategy. Use “mem -test
all first” to test your implementation.
Stage 2
-------
A) Implement the other three strategies: worst-fit, best-fit, and next-fit. The
strategy is passed to initmem(), and stored in the global variable “myStrategy”.
Some of your functions will need to check this variable to implement the
correct strategy.
You can test your code with “mem -test all worst”, etc., or test all 4 together
with “mem -test all all”. The latter command does not test the strategies
separately; your code passes the test only if all four strategies pass.
Questions
=========
1) Why is it so important that adjacent free blocks not be left as such? What
would happen if they were permitted?
2) Which function(s) need to be concerned about adjacent free blocks?
3) Name one advantage of each strategy.
4) Run the stress test on all strategies, and look at the results (tests.out).
What is the significance of “Average largest free block”? Which strategy
generally has the best performance in this metric? Why do you think this is?
5) In the stress test results (see Question 4), what is the significance of
“Average number of small blocks”? Which strategy generally has the best
performance in this metric? Why do you think this is?
6) Eventually, the many mallocs and frees produce many small blocks scattered
across the memory pool. There may be enough space to allocate a new block, but
not in one place. It is possible to compact the memory, so all the free blocks
are moved to one large free block. How would you implement this in the system
you have built?
7) If you did implement memory compaction, what changes would you need to make
in how such a system is invoked (i.e. from a user’s perspective)?
8) How would you use the system you have built to implement realloc? (Brief
explanation; no code)
9) Which function(s) need to know which strategy is being used? Briefly explain
why this/these and not others.
10) Give one advantage of implementing memory management using a linked list
over a bit array, where every bit tells whether its corresponding byte is
allocated.