Using ReFrame for reproducible and portable performance benchmarking
In this tutorial you will set up the excalibur-tests benchmarking framework on an HPC system, build and run example benchmarks, create a new benchmark and explore benchmark data.
Installing the Framework
Set up python environment
This tutorial is run on ARCHER2; you should have signed up for a training account before starting. It can be run on other HPC systems with a batch scheduler, but that will require making some changes to the config. The requirements are gcc 4.5, git 2.39 and python 3.7 or later (for the later parts you also need make, autotools, cmake and spack, but these can be installed locally).
First load the system python module.
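For example (assuming the system python is provided by a module called cray-python; adjust to the module available on your system):
module load cray-python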
Change to work directory
On ARCHER2, the compute nodes do not have access to your home directory, therefore it is important to install everything in a work file system. Change to the work directory with
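For example (the project code below is a placeholder; substitute your own):
cd /work/<project code>/<project code>/$USER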
If you are tempted to use a symlink here, ensure you use cd -P when changing directory. ARCHER2 compute nodes cannot read from /home, only /work, so not completely following symlinks can result in a broken installation.
Clone the git repository
In the work directory, clone the excalibur-tests repository with
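Assuming the upstream repository on GitHub:
git clone https://github.com/ukri-excalibur/excalibur-tests.git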
Create a virtual environment
Before proceeding to install the software, we recommend creating a python virtual environment to avoid clashes with other installed python packages. You can do this with
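For example, to create and activate an environment called excalibur-env (the name used in the sample output later in this tutorial):
python3 -m venv excalibur-env
source excalibur-env/bin/activate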
You should now see the name of the environment in parentheses in your terminal prompt, for example:
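(excalibur-env) $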
You will have to activate the environment each time you log in. To deactivate the environment, run deactivate.
Install the excalibur-tests framework
Now we can use pip to install the package in the virtual environment. Update pip to the latest version with
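pip install --upgrade pip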
Then install the excalibur-tests package. We use the editable flag -e because later in the tutorial you will edit the repository to develop a new benchmark. We also include the optional dependencies with [post-processing]; we will need those in the postprocessing section.
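Assuming the repository was cloned into the current (work) directory:
pip install -e './excalibur-tests[post-processing]'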
Set configuration variables
Configure the framework by setting these environment variables
export RFM_CONFIG_FILES="$(pwd)/excalibur-tests/benchmarks/reframe_config.py"
export RFM_USE_LOGIN_SHELL="true"
Install and configure spack
Finally, we need to install the spack package manager. The framework will use it to build the benchmarks. Clone spack and configure your shell to use it with
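A sketch, assuming the upstream Spack repository and its bundled shell-integration script:
git clone https://github.com/spack/spack.git
source ./spack/share/spack/setup-env.sh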
Spack should now be in the default search path.
Check installation was successful
You can check everything has been installed successfully by verifying that spack and reframe are on your path and that the path to the ReFrame config file is set correctly:
$ spack --version
0.22.0.dev0 (88e738c34346031ce875fdd510dd2251aa63dad7)
$ reframe --version
4.4.1
$ ls $RFM_CONFIG_FILES
/work/d193/d193/tk-d193/excalibur-tests/benchmarks/reframe_config.py
Environment summary
If you log out and back in, you will have to run some of the above commands again to recreate your environment. These are (from your work directory):
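A sketch, assuming the module, environment and directory names used above:
module load cray-python
source excalibur-env/bin/activate
export RFM_CONFIG_FILES="$(pwd)/excalibur-tests/benchmarks/reframe_config.py"
export RFM_USE_LOGIN_SHELL="true"
source ./spack/share/spack/setup-env.sh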
Run Sombrero Example
You can now use ReFrame to run benchmarks from the benchmarks/examples and benchmarks/apps directories. The basic syntax is
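reframe -c <path/to/benchmark> -r
where -c points at the directory containing the benchmark and -r tells ReFrame to run the checks it finds there.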
System-specific flags
In addition, on ARCHER2, you have to provide the quality of service (QoS) type for your job to ReFrame on the command line with -J. Use the "short" QoS to run the sombrero example with
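reframe -c benchmarks/examples/sombrero/ -r -J'--qos=short' --performance-report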
Output sample
$ reframe -c benchmarks/examples/sombrero/ -r -J'--qos=short' --performance-report
[ReFrame Setup]
version: 4.3.0
command: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-env/bin/reframe -c benchmarks/examples/sombrero/ -r -J--qos=short'
launched by: tk-d193@ln03
working directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-tests'
settings files: '<builtin>', '/work/d193/d193/tk-d193/excalibur-tests/benchmarks/reframe_config.py'
check search path: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-tests/benchmarks/examples/sombrero'
stage directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-tests/stage'
output directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/excalibur-tests/output'
log files: '/tmp/rfm-u1l6yt7f.log'
[==========] Running 4 check(s)
[==========] Started on Fri Jul 7 15:47:45 2023
[----------] start processing checks
[ RUN ] SombreroBenchmark %tasks=2 %cpus_per_task=2 /de04c10b @archer2:compute-node+default
[ RUN ] SombreroBenchmark %tasks=2 %cpus_per_task=1 /c52a123d @archer2:compute-node+default
[ RUN ] SombreroBenchmark %tasks=1 %cpus_per_task=2 /c1c3a3f1 @archer2:compute-node+default
[ RUN ] SombreroBenchmark %tasks=1 %cpus_per_task=1 /52e1ce98 @archer2:compute-node+default
[ OK ] (1/4) SombreroBenchmark %tasks=1 %cpus_per_task=2 /c1c3a3f1 @archer2:compute-node+default
P: flops: 0.67 Gflops/seconds (r:1.2, l:None, u:None)
[ OK ] (2/4) SombreroBenchmark %tasks=1 %cpus_per_task=1 /52e1ce98 @archer2:compute-node+default
P: flops: 0.67 Gflops/seconds (r:1.2, l:None, u:None)
[ OK ] (3/4) SombreroBenchmark %tasks=2 %cpus_per_task=2 /de04c10b @archer2:compute-node+default
P: flops: 1.27 Gflops/seconds (r:1.2, l:None, u:None)
[ OK ] (4/4) SombreroBenchmark %tasks=2 %cpus_per_task=1 /c52a123d @archer2:compute-node+default
P: flops: 1.24 Gflops/seconds (r:1.2, l:None, u:None)
[----------] all spawned checks have finished
[ PASSED ] Ran 4/4 test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted)
[==========] Finished on Fri Jul 7 15:48:23 2023
Log file(s) saved in '/tmp/rfm-u1l6yt7f.log'
Benchmark output
You can find build and run logs in the output/ directory of a successful benchmark. They record how the benchmark was built by spack and run by ReFrame.
While the benchmark is running, the log files are kept in the stage/ directory. They remain there if the benchmark fails to build or run.
You can find the performance log file from the benchmark in perflogs/. The perflog records the captured figures of merit, environment variables and metadata about the job.
Create a Benchmark
In this section you will create a ReFrame benchmark by writing a python class that tells ReFrame how to build and run an application and collect data from its output.
For simplicity, we use the STREAM benchmark. It is a simple memory bandwidth benchmark with minimal build dependencies.
If you've already gone through the ReFrame tutorial, some of the steps in creating the STREAM benchmark are repeated. However, pay attention to the Create a Test Class and Add Build Recipe steps.
How ReFrame works
When ReFrame executes a test it runs a pipeline of the following stages: setup, compile, run, sanity, performance and cleanup.
You can customise the behaviour of each stage or add a hook before or after each of them. For more details, read the ReFrame pipeline documentation.
Getting started
To get started, open an empty .py file where you will write the ReFrame class, e.g. stream.py. Save the file in a new directory, e.g. excalibur-tests/benchmarks/apps/stream.
Include ReFrame modules
The first thing you need to do is import a few modules from ReFrame. These should be available if the installation step was successful.
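These are the conventional ReFrame imports used by the snippets below:
import reframe as rfm
import reframe.utility.sanity as sn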
Create a Test Class
ReFrame has built-in support for the Spack package manager.
In the following we will use the custom class SpackTest we created for our benchmarks module, which provides a tighter integration with Spack and reduces the boilerplate code you'd otherwise have to include.
The data members and methods detailed in the following sections should be placed inside this class.
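A minimal skeleton (the import path for SpackTest is an assumption; check the existing apps in the repository for the exact location):
from benchmarks.modules.utils import SpackTest

@rfm.simple_test
class StreamBenchmark(SpackTest):
    # the data members and methods from the following sections go here
    ...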
Add Build Recipe
We prefer installing packages via spack whenever possible. In this exercise, the spack package for stream already exists in the global spack repository.
The SpackTest base class takes care of setting up spack as the build system ReFrame uses. We only need to instruct ReFrame to install version 5.10 of the stream spack package with the openmp variant.
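In the class body this can be expressed as a single attribute (the same spec string appears in the parametrised example later):
spack_spec = 'stream@5.10 +openmp'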
Note that we did not specify a compiler. Spack will use a compiler from the spack environment. The complete spec is recorded in the build log.
Add Run Configuration
The ReFrame class tells ReFrame where and how to run the benchmark. We want to run one task on a full ARCHER2 node, using 128 OpenMP threads so that the whole node is used.
valid_systems = ['*']
valid_prog_environs = ['default']
executable = 'stream_c.exe'
num_tasks = 1
num_cpus_per_task = 128  # a full ARCHER2 node has 128 cores
time_limit = '5m'
use_multithreading = False  # do not use simultaneous multithreading
Add environment variables
Environment variables can be added to the env_vars attribute.
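For example, to run one OpenMP thread per allocated core (a sketch; the exact variables you need depend on the application):
env_vars['OMP_NUM_THREADS'] = f'{num_cpus_per_task}'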
Add Sanity Check
The rest of the benchmark follows the Writing a Performance Test ReFrame Tutorial. First we need a sanity check that ensures the benchmark ran successfully. A function decorated with the @sanity_function decorator is used by ReFrame to check that the test ran successfully. The sanity function can perform a number of checks; in this case we want to match a line of the expected standard output.
@sanity_function
def validate_solution(self):
    return sn.assert_found(r'Solution Validates', self.stdout)
Add Performance Pattern Check
To record the performance of the benchmark, ReFrame should extract a figure of merit from the output of the test. A function decorated with the @performance_function decorator extracts or computes a performance metric from the test's output.
In this example, we extract four performance variables, namely the memory bandwidth values for each of the "Copy", "Scale", "Add" and "Triad" sub-benchmarks of STREAM, where each of the performance functions uses the sn.extractsingle() utility function. For each of the sub-benchmarks we extract the "Best Rate MB/s" column of the output (see below) and convert it to a float.
@performance_function('MB/s', perf_key='Copy')
def extract_copy_perf(self):
    return sn.extractsingle(r'Copy:\s+(\S+)\s+.*', self.stdout, 1, float)

@performance_function('MB/s', perf_key='Scale')
def extract_scale_perf(self):
    return sn.extractsingle(r'Scale:\s+(\S+)\s+.*', self.stdout, 1, float)

@performance_function('MB/s', perf_key='Add')
def extract_add_perf(self):
    return sn.extractsingle(r'Add:\s+(\S+)\s+.*', self.stdout, 1, float)

@performance_function('MB/s', perf_key='Triad')
def extract_triad_perf(self):
    return sn.extractsingle(r'Triad:\s+(\S+)\s+.*', self.stdout, 1, float)
Run Stream Benchmark
You can now run the benchmark in the same way as the previous sombrero example
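For example, pointing -c at the directory containing your stream.py:
reframe -c excalibur-tests/benchmarks/apps/stream/ -r -J'--qos=short' --performance-report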
Sample Output
$ reframe -c excalibur-tests/benchmarks/examples/stream/ -r -J'--qos=short'
[ReFrame Setup]
version: 4.4.1
command: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo/demo-env/bin/reframe -c excalibur-tests/benchmarks/examples/stream/ -r -J--qos=short'
launched by: tk-d193@ln03
working directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo'
settings files: '<builtin>', '/work/d193/d193/tk-d193/ciuk-demo/excalibur-tests/benchmarks/reframe_config.py'
check search path: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo/excalibur-tests/benchmarks/examples/stream'
stage directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo/stage'
output directory: '/mnt/lustre/a2fs-work3/work/d193/d193/tk-d193/ciuk-demo/output'
log files: '/tmp/rfm-z87x4min.log'
[==========] Running 1 check(s)
[==========] Started on Thu Nov 30 14:50:21 2023
[----------] start processing checks
[ RUN ] StreamBenchmark /8aeff853 @archer2:compute-node+default
[ OK ] (1/1) StreamBenchmark /8aeff853 @archer2:compute-node+default
P: Copy: 1380840.8 MB/s (r:0, l:None, u:None)
P: Scale: 1369568.7 MB/s (r:0, l:None, u:None)
P: Add: 1548666.1 MB/s (r:0, l:None, u:None)
P: Triad: 1548666.1 MB/s (r:0, l:None, u:None)
[----------] all spawned checks have finished
[ PASSED ] Ran 1/1 test case(s) from 1 check(s) (0 failure(s), 0 skipped, 0 aborted)
[==========] Finished on Thu Nov 30 14:51:13 2023
Log file(s) saved in '/tmp/rfm-z87x4min.log'
Interpreting STREAM results
With default compile options, STREAM uses arrays of 10 million elements. On a full node, the default array size fits into cache and the benchmark does not report the correct memory bandwidth. Therefore the numbers from this tutorial are not comparable with other published results.
To avoid caching, increase the array size at build time by adding e.g. stream_array_size=64000000 to the spack spec.
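For example:
spack_spec = 'stream@5.10 +openmp stream_array_size=64000000'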
Parametrized tests
You can pass a list to the parameter() built-in function in the class body to create a parametrized test. You cannot access the individual parameter values within the class body, so any reference to them should be placed in the appropriate function, for example __init__().
Example: Parametrize the array size
array_size = parameter(int(i) for i in [4e6, 8e6, 16e6, 32e6, 64e6])

def __init__(self):
    self.spack_spec = f"stream@5.10 +openmp stream_array_size={self.array_size}"
[----------] start processing checks
[ RUN ] StreamBenchmark %array_size=64000000 /bbfd0e71 @archer2:compute-node+default
[ RUN ] StreamBenchmark %array_size=32000000 /e16f9017 @archer2:compute-node+default
[ RUN ] StreamBenchmark %array_size=16000000 /abc01230 @archer2:compute-node+default
[ RUN ] StreamBenchmark %array_size=8000000 /51d83d77 @archer2:compute-node+default
[ RUN ] StreamBenchmark %array_size=4000000 /8399bc0b @archer2:compute-node+default
[ OK ] (1/5) StreamBenchmark %array_size=32000000 /e16f9017 @archer2:compute-node+default
P: Copy: 343432.5 MB/s (r:0, l:None, u:None)
P: Scale: 291065.8 MB/s (r:0, l:None, u:None)
P: Add: 275577.5 MB/s (r:0, l:None, u:None)
P: Triad: 247425.0 MB/s (r:0, l:None, u:None)
[ OK ] (2/5) StreamBenchmark %array_size=16000000 /abc01230 @archer2:compute-node+default
P: Copy: 2538396.7 MB/s (r:0, l:None, u:None)
P: Scale: 2349544.5 MB/s (r:0, l:None, u:None)
P: Add: 2912500.4 MB/s (r:0, l:None, u:None)
P: Triad: 2886402.8 MB/s (r:0, l:None, u:None)
[ OK ] (3/5) StreamBenchmark %array_size=8000000 /51d83d77 @archer2:compute-node+default
P: Copy: 1641807.1 MB/s (r:0, l:None, u:None)
P: Scale: 1362616.5 MB/s (r:0, l:None, u:None)
P: Add: 1959382.9 MB/s (r:0, l:None, u:None)
P: Triad: 1940497.3 MB/s (r:0, l:None, u:None)
[ OK ] (4/5) StreamBenchmark %array_size=64000000 /bbfd0e71 @archer2:compute-node+default
P: Copy: 255622.4 MB/s (r:0, l:None, u:None)
P: Scale: 235186.0 MB/s (r:0, l:None, u:None)
P: Add: 204853.9 MB/s (r:0, l:None, u:None)
P: Triad: 213072.2 MB/s (r:0, l:None, u:None)
[ OK ] (5/5) StreamBenchmark %array_size=4000000 /8399bc0b @archer2:compute-node+default
P: Copy: 1231355.3 MB/s (r:0, l:None, u:None)
P: Scale: 1086783.2 MB/s (r:0, l:None, u:None)
P: Add: 1519446.0 MB/s (r:0, l:None, u:None)
P: Triad: 1548666.1 MB/s (r:0, l:None, u:None)
[----------] all spawned checks have finished
[ PASSED ] Ran 5/5 test case(s) from 5 check(s) (0 failure(s), 0 skipped, 0 aborted)
[==========] Finished on Thu Nov 30 14:34:48 2023
Reference values
ReFrame can automate checking that the results fall within an expected range. We can use it in our previous example of increasing the array size to avoid caching. You can set a different reference value for each perf_key in the performance function. For example, set the test to fail if it falls outside of ±25% of the values obtained with the largest array size.
The performance reference tuple consists of the reference value, the lower and upper thresholds expressed as fractional numbers relative to the reference value, and the unit of measurement. If any of the thresholds is not relevant, None may be used instead. Also, the units in this reference variable are entirely optional, since they were already provided through the @performance_function decorator.
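A sketch, using the archer2 system name from the framework config and reference values rounded from the largest-array-size run above:
reference = {
    'archer2': {
        'Copy':  (260000, -0.25, 0.25, 'MB/s'),
        'Scale': (235000, -0.25, 0.25, 'MB/s'),
        'Add':   (205000, -0.25, 0.25, 'MB/s'),
        'Triad': (215000, -0.25, 0.25, 'MB/s'),
    },
}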
Useful Reading
ReFrame
- ReFrame Documentation
- ReFrame tutorials
- Libraries of ReFrame tests