This week I’m onsite at HPE Discover, and I’ve had some interesting discussions with folks from HPE about NVMe and their choices around array architecture in 3PAR and Nimble. HPE announced Memory-Driven Flash, which is available for 3PAR arrays now and coming to Nimble arrays in 2019. Memory-Driven Flash adds storage class memory as a caching layer within the existing array architectures, built on Intel Optane storage class memory in an add-in card (AIC) form factor. Neither 3PAR nor Nimble is using the NVDIMM form of Intel Optane in this release, though that difference should not be a detriment in the current designs.
For HPE’s product management, the talking points are around latency analysis and pinpointing the bottlenecks within the IO pipeline. Both product management teams (3PAR and Nimble) insist that their current bottleneck is not the backend storage, but rather the front-end controllers in today’s arrays. So simply enabling NVMe-based drives in the array does little to address customers’ real concern, which is lower-latency storage IO. With the controllers as the bottleneck, much of that ceiling comes from the extra processing the controllers do for data services.
In testing, the teams found that using storage class memory as a caching layer – which the company showed as a technical preview in 3PAR at Discover last year – greatly accelerated workloads and reduced latency. The released products are based on the same design as that technical preview.
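To see why a caching tier moves the needle, a bit of back-of-the-envelope arithmetic helps. The sketch below is purely illustrative – the roughly 10-microsecond cache hit and 200-microsecond backend read times are my own assumed numbers, not HPE figures – but it shows how quickly average read latency drops as the hit rate climbs.

```python
"""Back-of-the-envelope look at a storage-class-memory read cache.

Purely illustrative arithmetic, not 3PAR's design or HPE's numbers: the
cache-hit and backend service times are assumptions chosen only to show
the shape of the effect.
"""


def avg_read_latency_us(hit_rate: float,
                        cache_hit_us: float = 10.0,
                        backend_us: float = 200.0) -> float:
    """Expected read latency for a given cache hit rate (all times in usec)."""
    return hit_rate * cache_hit_us + (1.0 - hit_rate) * backend_us


if __name__ == "__main__":
    for hit_rate in (0.0, 0.5, 0.8, 0.95):
        print(f"hit rate {hit_rate:>4.0%}: ~{avg_read_latency_us(hit_rate):5.1f} usec average read")
```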
So why not also enable NVMe drives at this point? HPE says it is waiting for the standards and protocol maturity around NVMe before moving to NVMe drives – and because the backend is not where it sees latency introduced in the storage stack. They reiterate the point that arrays are constrained not at the backend drives – and that is an industry-wide assertion, as I understood it – but at the controllers.
NVMe as a protocol still lacks multipathing, failover, and other essential resiliency features taken for granted with SCSI. The short answer is simply that SCSI is mature and robust, while NVMe is new and still has a lot of growing to do. From my own testing of an NVMe over Fabrics array, driver support is spotty at best, with just a few Linux distributions offering NVMe over Fabrics support. Microsoft is not even offering driver support for NVMe over Fabrics at this time. The lack of multipathing means that NVMe over Fabrics LUNs cannot be used for things like VMware VMFS datastores or clustering use cases – many of the high-profile, high-importance use cases typically associated with super-fast, low-latency storage arrays.
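For anyone curious what that host-side support looks like in practice, here is a minimal sketch of the kind of readiness check involved. It assumes a Linux host with the standard nvme-cli tooling, and it simply probes for the fabrics transport modules and the native NVMe multipath parameter – exactly the things that vary from one distribution and kernel to the next.

```python
#!/usr/bin/env python3
"""Probe a Linux host's readiness for NVMe over Fabrics.

A minimal, assumption-laden sketch: the module names and sysfs path below
are standard Linux conventions, but whether they exist at all depends on
the distribution and kernel, which is the point.
"""
import os
import shutil
import subprocess


def module_available(name: str) -> bool:
    """Return True if the kernel module is known to modinfo on this kernel."""
    try:
        result = subprocess.run(
            ["modinfo", name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:  # modinfo itself is missing
        return False
    return result.returncode == 0


def main() -> None:
    # nvme-cli provides the userspace commands used to discover and connect fabric targets.
    print("nvme-cli installed:", shutil.which("nvme") is not None)

    # Transport modules for RDMA and Fibre Channel fabrics, plus the core fabrics glue.
    for mod in ("nvme-fabrics", "nvme-rdma", "nvme-fc"):
        print(f"kernel module {mod} available:", module_available(mod))

    # Native NVMe multipathing (CONFIG_NVME_MULTIPATH) exposes this parameter
    # on newer kernels; on older distro kernels the file simply is not there.
    multipath_param = "/sys/module/nvme_core/parameters/multipath"
    print("native NVMe multipath parameter present:", os.path.exists(multipath_param))


if __name__ == "__main__":
    main()
```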
That led me to ask about their data services adding latency that might otherwise be avoided in future iterations of their arrays. The product managers agreed, and said that was the thinking behind the introduction of adaptive data services in the Nimble platform. By making data services selective, you can get the full performance potential of the hardware, while still having the choice to enable those services where they improve the cost-effectiveness of the array.
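To make the trade-off concrete, here is a toy model – emphatically not Nimble’s implementation – of per-volume data service toggles. Every service enabled on a volume adds controller-side work to each write, which is exactly the overhead that selective data services let you opt out of when raw latency matters more than capacity efficiency.

```python
"""Toy model of selective data services on a controller write path.

Purely illustrative: per-volume flags turn inline dedupe fingerprinting and
compression on or off, and the benchmark shows the extra controller-side
work each write incurs when they are enabled.
"""
import hashlib
import time
import zlib
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    dedupe: bool = False       # content hashing for duplicate detection
    compression: bool = False  # inline compression before the write lands


def write(vol: Volume, block: bytes) -> bytes:
    """Simulate the controller-side processing for a single write."""
    if vol.dedupe:
        hashlib.sha256(block).digest()   # fingerprint for the dedupe index
    if vol.compression:
        block = zlib.compress(block)     # inline compress the payload
    return block                         # hand off to the cache/backend


def bench(vol: Volume, blocks: int = 20000, size: int = 8192) -> float:
    """Average microseconds of controller work per write for this volume."""
    data = bytes(range(256)) * (size // 256)
    start = time.perf_counter()
    for _ in range(blocks):
        write(vol, data)
    return (time.perf_counter() - start) / blocks * 1e6


if __name__ == "__main__":
    for vol in (
        Volume("raw-performance"),                                    # services off
        Volume("capacity-optimized", dedupe=True, compression=True),  # services on
    ):
        print(f"{vol.name}: ~{bench(vol):.1f} usec of controller work per 8 KiB write")
```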