This testbed draws on the world-leading HPC systems development, deployment and operational skills housed within the Cambridge Research Computing Service to build a next-generation, high-performance PCIe Gen 4 solid-state I/O testbed. It supports a range of file systems and I/O technologies, including Lustre, Intel DAOS, BeeGFS and HDF5, on state-of-the-art solid-state storage hardware. The system uses the latest Intel PCIe Gen 4 NVMe drives and the new Intel PCIe Gen 4 Optane Data Centre Persistent Memory. The project will see the deployment in April 2021 of the UK’s fastest HPC storage testbed, delivering over 500 GB/s of bandwidth and over 20 million IOPS of raw I/O performance. This performance can be made available to applications through leading HPC file systems such as Lustre, Intel DAOS or BeeGFS, as well as through lower-level direct I/O protocols.
The solution is also expected to be one of the fastest HPC storage solutions in the world and to rank highly in the worldwide IO500 list. Intel DAOS is of particular interest because it has been developed from the ground up to provide a persistent file system that uses both NVMe drives and Optane Data Centre Persistent Memory. DAOS is still at the proof-of-concept stage but has been shown to deliver far higher performance than traditional parallel file systems. The project is supported directly by Intel through hardware, staff effort and close co-design work with the Intel engineers developing the DAOS file system. The system will be Intel’s largest DAOS testbed.
In addition to the I/O hardware and the various file system technologies, the testbed is configured with comprehensive system-level telemetry monitoring, provided by the UKRI-funded Scientific OpenStack middleware layer, combined with a range of more specialised application I/O profiling tools. The UK Scientific OpenStack is a world-leading HPC middleware layer developed at Cambridge and funded by over four years of investment from STFC, EPSRC and MRC. System I/O telemetry combined with application-level I/O profiling is vital if we are to fully exploit emerging I/O and file system technologies, because it helps application developers understand how to implement the most efficient I/O mechanisms within their application code. Without such tools, developers cannot see how best to use the new I/O platforms.
Access to the testbed systems
The ExCALIBUR H&ES testbeds are prioritised for access by ExCALIBUR projects, but are also available for use by the wider UK research community. Contact the ExCALIBUR H&ES programme office to discuss your requirements. Please note that the testbeds are offered on a best-efforts basis rather than on a service footing, as befits their experimental status.