New Server – NVMe Issues

My current server is somewhat aged. I bought it new in July 2014 with a 6-core Xeon E5-2630L, 32 GB RAM and 4x 3.5″ hot-swappable drives. Gladly I had the opportunity to extend the memory to 128 GB RAM at no additional cost by using memory from my ex-employer. It also has 4x 2 TB WD Red HDDs with 5400 rpm hooked up to the SATA backplane, but unfortunately only two of them are SATA-3 with 6 Gbit/s.

The new server is a used/refurbished Supermicro server with 2x 14-core Xeon E5-2683 and 256 GB RAM and 4x 3.5″ hot-swappable drives. It also came with a Hardware-RAID SAS/SATA 8-port controller with BBU. I also ordered two slim drive kits (MCP-220-81504-0N & MCP-220-81506-0N) to be able to use 2x 3.5″ slots for rotational HDDs as a cheap storage. Right now I added 2x 128 GB Supermicro SATA DOMs, 4x WD Red 4 TB SSDs and a Sonnet Fusion 4×4 Silent and 4x 1 TB Seagate Firecuda 520 NVMe disks.

And here the issue starts:

The NVMe should be capable of 4-5 GB/s, but they are connected to a PCIe 3.0 x16 port via the Sonnet Fusion 4×4, which itself features a PCIe bridge, so bifurbacation is not necessary.

When doing some tests with bonnie++ I get around 1 GB/s transfer rates out of a RAID10 setup with all 4 NVMes. In fact, regardless of the RAID level there are only transfer rates of about 1 – 1.2 GB/s with bonnie++. (All software RAIDs with mdadm.)

But also when constructing a RAID each NVMe gives around 300-600 MB/s in sync speed – except for one exception: RAID1.

Regardless of how many NVMe disks in a RAID1 setup the sync speed is up to 2.5 GB/s for each of the NVMe disks. So the lower transfer rates with bonnie++ or other RAID levels shouldn’t be limited by bus speed nor by CPU speed. Alas, atop shows upto 100% CPU usage for all tests. I even tested

In my understanding RAID10 should perform similar to RAID1 in terms of syncing and better and while bonnie++ tests (up to 2x write and 4x read speed compared to a single disk).

For the bonnie++ tests I even made some tests that are available here. You can find the test parameters listed in the hostname column: Baldur is the hostname, then followed by the layout (near-2, far-2, offset-2), chunk size and concurrency of bonnie++. In the end there was no big impact of the chunk size of the RAID.

So, now I’m wondering what the reason for the “slow” performance of those 4x NVMe disks is? Bus speed of the PCIe 3.0 x16 shouldn’t be the cause, because I assume that the software RAID will need to transfer the blocks in RAID1 as well as in RAID10 over the bus. Same goes for the CPU: the amount of CPU work should be roughly the same for RAID1 and for RAID10. RAID10 should even have an advantage because the blocks only need to be synced to 2 disks in a stripe set.

Bonnie++ tests are a different topic for sure. But when testing reading with dd from the md-devices I “only” get around 1-1.5 GB/s as well. Even when using LVM RAID instead of LVM on top of md RAID.

All NVMe disks are already set to 4k and IO scheduler is set to mq-deadline.

Is there anything I could do to improve the performance of the NVMe disks? On the other head, pure transfer rates are not that important to a server that runs a dozen of VMs. Here the improved IOPS performance over rotation disks is a clear performance gain. But I’m still curious if I could get maybe 2 GB/s out of a RAID10 setup with the NVMe disks. Then again having two independent RAID1 setups for MariaDB and for PostgreSQL databases might be a better choice over a single RAID10 setup?

2 thoughts on “New Server – NVMe Issues

  1. Some time back (2-3 years?) I had similar problems. It turned out that the mapping and unmapping of the buffers with the iommu was the limiting factor. After I turned off the iommu on the linux command line, the transfer rate of my ssds rose from something like 0.8 to 1.8 GB/s. I don’t know if this is still valid with current linux kernels, but you may want to give it a try if you don’t need the iommu.

    1. I tested with both iommu and intel_iommu set to off but both showed no increase in transfer rate. But it was worth a try…
      Otherwise it would be good to know what exactly you have changed…

Leave a Reply

Your email address will not be published.