During the 2019-07-08/09 maintenance window for Spartan HPC, we ran extensive benchmarking with IO500 on our CephFS cluster. Here are the details.

## Infrastructure details
### Networking

* Mellanox SN2100 leaf switches for the Ceph nodes
* Mellanox SN2700 leaf switches for the client GPGPU nodes
* Mellanox SN2700 spine switches
* 2x100G from each leaf to the spines, 4x100G between the spines
### Ceph cluster

All Ceph nodes run RHEL 7.6 with the elrepo kernel-lt kernel (4.4.135-1.el7.elrepo.x86_64) and Mellanox OFED 4.3-3.0.2.1.
* mon[1-5]: 1x 10-core Xeon v4 2.4GHz, 64GB of RAM, 2x 25GbE Mellanox
* mds[1-3]: 2 active, 1 standby; each: 1x 6-core Xeon v4 3.4GHz, 512GB of RAM, 2x 25GbE Mellanox
* NLSAS data pool:
    * 36 OSD nodes, 16 drives each (576 drives in total), a mix of 8TB and 10TB NLSAS drives
    * Each node has 1x NVMe card (Intel P3700 or Optane 900P) holding the WAL (2GB) and RocksDB (10GB) for each OSD
    * Per node: 1x 10-core Xeon v4 2.4GHz, 128GB of RAM, 2x 25GbE Mellanox
    * Replicated, 3 copies
    * Fullness: ~60%
* SSD data pool:
    * 16 OSD nodes, 8x SanDisk 8TB BSSD drives each over 12Gb SAS (IF150 unit), 128 drives in total
    * Each node has 2x NVMe cards (Optane 900P) holding the WAL (4GB) and RocksDB (40GB) for each OSD
    * Per node: 2x 16-core Xeon v4 2.6GHz, 128GB of RAM, 2x 25GbE Mellanox
    * Erasure coded 4+2 (k=4, m=2; see the pool-creation sketch after this list)
    * Fullness: ~73%
* Metadata pool:
    * On 10 of the 16 SSD OSD nodes
    * Each node has 1x NVMe (Optane 900P 480GB) partitioned into 4, with each partition running an OSD (40 NVMe OSDs in total)
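
For readers who want to map this layout onto actual commands, here is a minimal sketch of the pool and filesystem setup it implies. Pool names, placement-group counts and the filesystem name are illustrative assumptions, not values from our cluster.

```
# Sketch only: pool names, PG counts and the fs name are assumptions.
# Erasure-code profile matching the 4+2 SSD data pool.
ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host

# Replicated (3-copy) NLSAS data pool and erasure-coded SSD data pool.
ceph osd pool create fs_data_nlsas 4096 4096 replicated
ceph osd pool set fs_data_nlsas size 3
ceph osd pool create fs_data_ssd 2048 2048 erasure ec-4-2
ceph osd pool set fs_data_ssd allow_ec_overwrites true   # needed for CephFS on an EC pool

# Metadata pool, filesystem creation, and attaching the second data pool.
ceph osd pool create fs_metadata 512 512 replicated
ceph fs new cephfs fs_metadata fs_data_nlsas
ceph fs add_data_pool cephfs fs_data_ssd
```

Directories can then be steered to one data pool or the other through the file layout attribute (`ceph.dir.layout.pool`), which is the usual way to split a filesystem across an NLSAS and an SSD pool like this.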
## Compiling IO500

We run IO500 through Slurm on Spartan and compile it using Spartan's module system.
```
cd io-500-dev
./utilities/prepare.sh
```
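`prepare.sh` fetches and builds the IOR, mdtest and pfind binaries that IO500 drives, so a compiler and an MPI library need to be on the path first; on Spartan these come from modules. A rough sketch of the build step, with placeholder module names rather than the exact Spartan module versions:

```
# Placeholder module names; load whichever compiler/MPI modules the cluster provides.
module load gcc openmpi

cd io-500-dev
./utilities/prepare.sh
ls bin/    # ior, mdtest and pfind should appear here after a successful build
```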
## Preparing scripts