CMSC216 Lab11: File Stats and Timing Strides
- Due: 11:59pm Mon 25-Nov-2024 on Gradescope
- Approximately 1.00% of total grade
CODE DISTRIBUTION: lab11-code.zip
CHANGELOG: Empty
1 Rationale
This lab covers two independent but short concepts in separate problems.
Build systems are essential to program development to automate
repetitive tasks such as compiling code, running tests, and tracking
changes which would necessitate running such tasks. The make
build
system is among the oldest build system and is ubiquitous across Unix
systems making it a useful to know a few basics about it.
The second exercise introduces the use of clock()
on a problem
involving structs
that are arranged differently in memory. The
timing allows one to observe how struct
layout affects performance
which reinforces the central idea of efficient memory access:
sequential access patterns are faster than better than strided or
random access patterns.
Associated Reading / Preparation
Background material and sample codes are provided in this document. Those interested in more detail might consider the following online resources:
- Managing Projects with GNU Make, Third Edition by Robert
Mecklenburg. A now-free book that covers the most common Make system
implementation by the GNU Project. The subtitle "The Power of GNU
Make for Building Anything" resonates with Prof Kauffman as this
document was built by running
make
: it's not just for code. - Links resulting from internet searches like "comparison of build systems" which will contrast Make to other newer alternatives such as CMake, Apache Ant, Gradle, and others that appear in any listing of build automatition software.
- In addition to
clock()
which is among the oldest time-measuring functions in the standard C library, there are many variations available which allow measuring what kind of time is desired.gettimeofday()
which returns Real Time (Wall Time) rather than CPU timeclock_gettime()
which accepts parameters which control which of CPU or Real time is desiredgetrusage()
which provides resource usage in of the running process including CPU time, memory allocated, etc. CPU time is split between the User and System times.
Grading Policy
Credit for this exercise is earned by completing the code/asnwers here
and submitting a Zip of the work to Gradescope. Students are
responsible to check that the results produced locally via make test
are reflected on Gradescope after submitting their completed
Zip. Successful completion earns 1 Engagement Point.
Lab Exercises are open resource/open collaboration and students are encouraged to cooperate on labs. Students may submit work as groups of up to 5 to Gradescope: one person submits then adds the names of their group members to the submission.
See the full policies in the course syllabus.
2 Codepack
The codepack for the HW contains the following files:
File | Description | |
---|---|---|
QUESTIONS.txt |
EDIT | Questions to answer: fill in the multiple choice selections in this file. |
demo-makefile-little |
Study | Problem 1: Example Makefiles to study |
demo-makefile-big |
Study | Problem 1: Example Makefiles to study |
prob1-makefile/Makefile |
EDIT | Problem 1: Template Makefile to complete for problem 1 |
clock_demo.c |
Provided | Problem 2 demo on how the clock() function is used |
struct_stride.c |
EDIT | Problem 2 code used to observe CPU timing differences |
Makefile |
Build | Enables make test and make zip |
QUESTIONS.txt.bk |
Backup | Backup copy of the original file to help revert if needed |
QUESTIONS.md5 |
Testing | Checksum for answers in questions file |
test_quiz_filter |
Testing | Filter to extract answers from Questions file, used in testing |
test_lab11.org |
Testing | Tests for this lab |
testy |
Testing | Test running scripts |
3 Problem 1 Background: Makefiles
Building programs often involves compiling separate source files and
eventually combining them through linking to create an executable.
There are often associated tasks such as running automated tests,
creating distribution files, and generating documentation that may be
supported by the build system as well. Among the oldest build systems
still in use is make
and its associated Makefiles
. This problem
covers the basics of how to create a Makefile
.
Structure of Makefiles
The basic structure of a Makefile
involves rules which relate
targets, dependencies, and commands. Simple Makefiles
comprise a
list of rules.
# comment above Rule 1 target1 : dependency1A depency1B command1A command1B command1C # comment above Rule 2 target2 : dependency2A command2A command2B # comment above Rule 3 target3 : dependency3A dependency3B dependency3B command3A command3B
When invoked, make target3
will load the contents of the Makefile
and examine the relationship between target3
and its
dependencies. If make
determines it is necessary, it may run
command3A, command3B
in sequence to "update" target3
. In other
cases, make
may determine that there is no need to run additional
commands as target3
is already up to date. Similarly, running make
target1
will possibly run command1A, command1B, command1C
.
NOTE: Commands associated with a rule MUST BE INDENTED WITH A TAB. This is a notoriously bad "feature" of make which can occasionally cause problems. Be aware: leading whitespace for commands should be TAB characters.
The most common kind of target is a file that should be created such
as a .o
file or an executable program. The directory
demo-makefile-little
has an example Makefile
showing this:
1: # Simle Makefile to build program main_prog 2: 3: # first rule in the file is the default: typing `make` is the same as 4: # typing `make main_prog` 5: 6: # rule 1 7: main_prog : main_func.o func_01.o 8: gcc -o main_prog main_func.o func_01.o 9: 10: # rule 2 11: main_func.o : main_func.c 12: gcc -c main_func.c 13: 14: # rule 3 15: func_01.o : func_01.c 16: gcc -c func_01.c 17: 18: # rule 4 19: clean : 20: rm -f *.o 21: rm -f main_prog
The rules that appear are
main_prog
depends on themain_func.o
andfunc_01.o
; the program is created by invoking GCC on these filesmain_func.o
depends onmain_func.c
which is created by running GCCfunc_01.o
depends onfunc_01.c
and is created by running GCCclean
is a "phony" target with no dependencies which will remove compile artifacts by using therm
command
When one types make
or make main_prog
in that directory, the
following will appear:
>> cd demo-makefile-little/ >> make gcc -c main_func.c gcc -c func_01.c gcc -o main_prog main_func.o func_01.o
Observe that make
detects that the two dependencies for Rule 1 are
incomplete: main_func.o
and func_01.o
are not present. To create
them, it will generate them as targets according to for the rules that
have them as targets, Rule 2 and Rule 3. After running the commands
for these two rules, the commands for Rule 1 are executed to generate
the program.
Variables and Shortcuts in Makefiles
Makefiles support a large number of features such as explicit
variables, automatic variables, pattern-based rules, and many
others. A few of these are described in the sample file
demo-makefile-little/Makefile-shortcuts
. It accomplishes the same
task as the Makefile
but utilizes some features which are useful to
know about. The comments in this file describe some of these features.
1: # Simle Makefile to build program main_prog 2: 3: # first rule in the file is the default: typing `make` is the same as 4: # typing `make main_prog` 5: 6: # rule 1 7: main_prog : main_func.o func_01.o 8: gcc -o main_prog main_func.o func_01.o 9: 10: # rule 2 11: main_func.o : main_func.c 12: gcc -c main_func.c 13: 14: # rule 3 15: func_01.o : func_01.c 16: gcc -c func_01.c 17: 18: # rule 4 19: clean : 20: rm -f *.o 21: rm -f main_prog
By default, typing make
will search for a file named Makefile
but
the following invocation allows using a specific file other than that:
>> make -f Makefile-shortcuts
Dependency Detection
Makefiles are useful for larger projects to automatically detect
dependencies which must be updated. The example in
demo-makefile-big/
shows a "larger" project with more files. Observe
the following:
>> make # COMPILE 1 gcc -c main_func.c gcc -c func_01.c gcc -c func_02.c gcc -c func_03.c gcc -c func_04.c gcc -c func_05.c gcc -c func_06.c gcc -c func_07.c gcc -c func_08.c gcc -c func_09.c gcc -c func_10.c gcc -c func_11.c gcc -c func_12.c gcc -c func_13.c gcc -c func_14.c gcc -c func_15.c gcc -c func_16.c gcc -c func_17.c gcc -c func_18.c gcc -c func_19.c gcc -c func_20.c gcc -o main_prog main_func.o func_01.o func_02.o func_03.o func_04.o func_05.o func_06.o func_07.o func_08.o func_09.o func_10.o func_11.o func_12.o func_13.o func_14.o func_15.o func_16.o func_17.o func_18.o func_19.o func_20.o >> make # COMPILE 2 make: Nothing to be done for 'all'. >> rm func_12.o # delete a .o >> touch func_05.c # make func_05.c look like it has been edited >> make # COMPILE 3 gcc -c func_05.c # only out of sync or gcc -c func_12.c # missing targets are regenerated gcc -o main_prog main_func.o func_01.o func_02.o func_03.o func_04.o func_05.o func_06.o func_07.o func_08.o func_09.o func_10.o func_11.o func_12.o func_13.o func_14.o func_15.o func_16.o func_17.o func_18.o func_19.o func_20.o
The lesson above is in COMPILE 3
: make detects that most of the .o
files are in sync with their source files. Only the missing / out of
sync targets have their rules applied to regenerated them and update
the exectuable program.
4 Problem 2 Background: The clock()
Function
The code block below illustrates the basic usage pattern for the
clock()
function.
#include <time.h> // for clock() and clock_t { clock_t begin = clock(); // current cpu moment Perform computation that takes a while; clock_t end = clock(); // later cpu moment double cpu_time = // convert into seconds ((double) (end-begin)) / CLOCKS_PER_SEC; printf("Elapsed CPU Time: %f second\n", cpu_time); }
A few caveats are worth pointing out.
- The
clock_t
type is often a large integer type likeunsigned long
which is why one can perform subtraction using it. Don't rely on this being the case and just use the type indicated as shown. clock()
itself returns a number corresponding to the number of CPU "ticks" which have occurred while the program runs. This requires conversion into the number of seconds shown above. It makes use of theCLOCKS_PER_SECOND
constant which is included viatime.h
.- The time computed by this method is equivalent to the
user
time reported by thetime
utility: it is how much CPU time the user program has used. It is relevant to timing computational loops but is complemented by "wall time" which requires use of different timing functions likegettimeofday()
to compute. WARNING: Timing code runs is inherently noisy and will vary from one run to the next.
clock()
is reliable for timing computations that take around 0.001 seconds (a thousandth of a second). For times shorter than that, the variations in timing will likely be nearly as large as the total time which makes timing shorter activities unreliable.Adjust program parameters like the number of loop iterations so reported times are at least 1e-03 seconds. Ideally times should be larger, in the 1e-01 second range to be trustworthy.
5 QUESTIONS.txt File Contents
Below are the contents of the QUESTIONS.txt
file for the lab.
Follow the instructions in it to complete the QUIZ and CODE questions
for the lab.
_________________ LAB11 QUESTIONS _________________ Exercise Instructions ===================== Follow the instructions below to experiment with topics related to this exercise. - For sections marked QUIZ, fill in an (X) for the appropriate response in this file. Use the command `make test-quiz' to see if all of your answers are correct. - For sections marked CODE, complete the code indicated. Use the command `make test-code' to check if your code is complete. - DO NOT CHANGE any parts of this file except the QUIZ sections as it may interfere with the tests otherwise. - If your `QUESTIONS.txt' file seems corrupted, restore it by copying over the `QUESTIONS.txt.bk' backup file. - When you complete the exercises, check your answers with `make test' and if all is well, create a zip file with `make zip' and upload it to Gradescope. Ensure that the Autograder there reflects your local results. - IF YOU WORK IN A GROUP only one member needs to submit and then add the names of their group. QUIZ Problem 1: Questions on Makefiles ====================================== Review the presentation of `Makefile' basics in the lab description and analyze the examples provided in the demo directories `demo-makefile-little' and `demo-makefile-big'. Then answer the following questions. Command Execution ~~~~~~~~~~~~~~~~~ Which of the following best describes when the commands for a rule are run? - ( ) The commands for all targets in the Makefile are all run every time make is used - ( ) Only the commands associated with the target named on the command line are run; e.g. `make targ5` will run commands associated with targ5 - ( ) Commands for a given rule execute only if make detects a its target is needed and then only if the target is missing or older than its dependencies Rule Syntax ~~~~~~~~~~~ Which of the following Rules correctly lays out the syntax for target, dependencies, and commands in a Makefile. ,---- | # RULE A | target : dependency1 dependency2 | command1 | command2 | command3 | | # RULE B | target : dependency1 dependency2 | command1 | command2 | command3 | | # RULE C | dependency1 dependency2 : target | command1 | command2 | command3 | | # RULE D | target : | command1 | command2 | command3 | : dependency1 dependency2 `---- - ( ) RULE A - ( ) RULE B - ( ) RULE C - ( ) RULE D Variance in Makefiles ~~~~~~~~~~~~~~~~~~~~~ Check ALL that are true about Makefile features - ( ) Rules may have 0 dependencies - ( ) Rules may have 0 commands - ( ) Rules have no more than 1 command - ( ) Rules have no more than 1 dependency - ( ) Makefiles can set up explicit variables for use in rules - ( ) Makefiles provide a variety of automatic variables for use in rules - ( ) A Makefile can have deeply nested decencies: targA depends on targB depends on targC depends on targD etc. CODE Problem 2: prob1-makefile ============================== Complete the template in the subdirectory `prob1-makefile/Makefile' according to the comments there. The essential idea is to build the linked list program from an earlier lab and add a few other targets. Once you add rules for certain targets, you can test them interactively via commands like `make' and `make clean' in that directory. You can also run the automated tests of this via a problem test: ,---- | >> cd lab10-code | >> make test-prob1 # run tests for Makefile under prob1-makefile | ... `---- PROBLEM 2: clock_demo.c Program =============================== Demoers will walk through the `clock_demo.c' program to show how the `clock()' function is used in practice. Students should look carefully at the techniques used to time the two different sections of code and print that timing info. These will be needed to fill in the subsequent programs. Running the `clock_demo' program on the command line will produce results that look like the following: ,---- | >> make clock_demo | gcc -Wall -Werror -g -Og -Wall -Werror -g -Og clock_demo.c -o clock_demo | | >> ./clock_demo | usage: ./clock_demo <arrlen> <repeats> | | >> ./clock_demo 1000 1000 | Summing array length 1000 with 1000 repeats, ascending | Summing array length 1000 with 1000 repeats, descending | method: sum ascending CPU time: 2.3750e-03 sec sum: 499500 | method: sum descending CPU time: 1.5760e-03 sec sum: 499500 | | >> ./clock_demo 100000 1000 | Summing array length 100000 with 1000 repeats, ascending | Summing array length 100000 with 1000 repeats, descending | method: sum ascending CPU time: 6.6969e-02 sec sum: 704982704 | method: sum descending CPU time: 6.2286e-02 sec sum: 704982704 | | >> ./clock_demo 100000 10000 | Summing array length 100000 with 10000 repeats, ascending | Summing array length 100000 with 10000 repeats, descending | method: sum ascending CPU time: 6.2730e-01 sec sum: 704982704 | method: sum descending CPU time: 6.1995e-01 sec sum: 704982704 `---- PROBLEM 2 CODE: `struct_stride.c' Program ========================================= The provided `struct_stride.c' program has a number of TODO items in it related to timing several computations and reporting their results. It is best to complete these items first and then attempt to answer the quiz questions as some questions require running the program and observing timing results. Use the lab's description of how the `clock()' function works to complete TODO items and print the results. When completed, the program can be run as show below: ,---- | >> ./struct_stride | usage: ./struct_stride <arr_length> <num_iters> | | >> ./struct_stride 10000000 100 | method: int_field_base CPU time: 1.2345e-01 sec sum: 0 | method: arr_field_base CPU time: 1.2345e-01 sec sum: 0 | method: int_field_optm CPU time: 1.2345e-01 sec sum: 0 | method: arr_field_optm CPU time: 1.2345e-01 sec sum: 0 `---- NOTE: the timing information has intentionally been changed to read 1.2345e-01 as calculating this timing information is part of the lab. PROBLEM 2 QUIZ: Timing `struct_stride.c' Runs ============================================= NOTE: timing code varies from one machine to the next. The answers below were tested on GRACE and appear to be stable but system load may affect the outcome. Relative Speed of Struct Layouts ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ After adding in calls to `clock()' and code the print times, run the `struct_strid' program. Run the program with a large array and a modest amount of array iterations such as using the following parameters: ,---- | ./struct_stride 6000000 100 `---- Examine the times reported. Which option below reflects the relative speeds of the layout/algorithm combinations? ,---- | ------SLOWEST--------------------------------------------FASTEST----- | - ( ) arr_field_base > arr_field_optm > int_field_base > int_field_optm | - ( ) int_field_base > int_field_optm > arr_field_base > arr_field_optm | - ( ) arr_field_base > int_field_base > arr_field_optm > int_field_optm | - ( ) int_field_base > arr_field_base > int_field_optm > arr_field_optm `---- Order of Memory Access ~~~~~~~~~~~~~~~~~~~~~~ Below are several memory layouts of A/B elements to consider. ------------------------------------------------------------------------- Byte Offset +00 +04 +08 +12 +16 +80 +84 +88 +92 +96 LAYOUT1 A0 A1 A2 A3 A4 ... B0 B1 B2 B3 B4 ... ------------------------------------------------------------------------- ------------------------------------------------------------------- Byte Offset +00 +04 +08 +12 +16 +20 +24 +28 32 +36 LAYOUT 2 A0 B0 A1 B1 A2 B2 A3 B3 A4 B4 ... ------------------------------------------------------------------- For each of following, indicate the best suited option. The `int_field_base' approach code that is timed.. - ( ) Uses memory LAYOUT 1 and visit elements in the order A0, B0, A1, B1, A2, B2, etc. - ( ) Uses memory LAYOUT 1 and visit elements in the order A0, A1, A2, ... B0, B1, B2, etc. - ( ) Uses memory LAYOUT 2 and visit elements in the order A0, B0, A1, B1, A2, B2, etc. - ( ) Uses memory LAYOUT 2 and visit elements in the order A0, A1, A2, ... B0, B1, B2, etc. The `arr_field_base' approach code that is timed.. - ( ) Uses memory LAYOUT 1 and visit elements in the order A0, B0, A1, B1, A2, B2, etc. - ( ) Uses memory LAYOUT 1 and visit elements in the order A0, A1, A2, ... B0, B1, B2, etc. - ( ) Uses memory LAYOUT 2 and visit elements in the order A0, B0, A1, B1, A2, B2, etc. - ( ) Uses memory LAYOUT 2 and visit elements in the order A0, A1, A2, ... B0, B1, B2, etc. The `int_field_optm' approach code that is timed.. - ( ) Uses memory LAYOUT 1 and visit elements in the order A0, B0, A1, B1, A2, B2, etc. - ( ) Uses memory LAYOUT 1 and visit elements in the order A0, A1, A2, ... B0, B1, B2, etc. - ( ) Uses memory LAYOUT 2 and visit elements in the order A0, B0, A1, B1, A2, B2, etc. - ( ) Uses memory LAYOUT 2 and visit elements in the order A0, A1, A2, ... B0, B1, B2, etc. The `arr_field_optm' approach code that is timed.. - ( ) Uses memory LAYOUT 1 and visit elements in the order A0, B0, A1, B1, A2, B2, etc. - ( ) Uses memory LAYOUT 1 and visit elements in the order A0, A1, A2, ... B0, B1, B2, etc. - ( ) Uses memory LAYOUT 2 and visit elements in the order A0, B0, A1, B1, A2, B2, etc. - ( ) Uses memory LAYOUT 2 and visit elements in the order A0, A1, A2, ... B0, B1, B2, etc. int_field_base VS arr_field_base ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Examine the differences between the two types of structs that are used in `struct_stride.c' called `int_field_t' and `arr_field_t'. Now examine the first 2 code blocks that use these structs called, `int_field_base' and `arr_field_base'. Both involve arrays and structs which store an equal number of positive and negative integers. However, they differ in the overall layout of those integers. Both use loops sum the "a" numbers first then sum the "b" numbers, then combine them for the total sum. Which of the following are possible explanations for the timing difference between `int_field_base' and `arr_field_base'? - ( ) `int_field_base' must perform more loop iterations than `arr_field_base' which will making it slower. - ( ) `arr_field_base' uses more memory to store then number than `int_field_base' and this additional memory increases speed. - ( ) `int_field_base' has a memory layout that is ABABABAB so when adding A elements, there is a "stride" through memory. `arr_field_base' has a layout like AAAAABBBBB so adding elements has no stride. - ( ) `int_field_base' uses struct field access. The assembly instructions to access array fields are slower than the assembly instructions that access array elements. This makes `arr_field_base' faster due to its use of plain integer arrays. BASE vs OPTM versions ~~~~~~~~~~~~~~~~~~~~~ The last two layout/algorithm sections are labeled "optm" as they perform a simple code transformation from their "base" version. Select ALL of the items below that are accomplished with this transformation. - ( ) Fewer loop checks/increments are needed as there is only one loop instead of 2. - ( ) The number of loop iterations is lowered for all loops in the optm version. - ( ) The memory stride is eliminated for the int_field_optm as both a/b elements are added each iteration. - ( ) The algorithmic complexity is reduced from O(N^2) in the "base" to O(N) in the "optm" version.
Submission
Follow the instructions at the end of Lab01 if you need a refresher on how to upload your completed exercise zip to Gradescope.