I may have already mentioned that one of our projects for the year is to transition our corporate ESX cluster from 2U hardware onto blades. The move to blade architecture does not come without some concerns and caveats. We feel that blades are a good fit in our case for this particular cluster (we run several ESX clusters). Our VMware View deployment is our first production ESX workload on blade hardware, and we have learned a few things from it that might be helpful.
Multiple VLANs on a single NIC
When using Virtual Connect, the default configuration sets VLAN tagging support to “Tunnel VLAN Tags.” The mode is self-explanatory: the “Ethernet Network” chosen and assigned to the server profile in Virtual Connect is the only network visible to the blade. For most blade users this setting works fine, and many ESX deployments might be OK with it. But many ESX deployments require multiple VLANs to be presented on the same physical NIC, and the “Tunnel VLAN Tags” mode does not allow for this.
To allow multiple VLANs on a single NIC, you must log in to Virtual Connect, expand Ethernet Settings, select Advanced Settings, and change the VLAN Tagging Support from “Tunnel VLAN Tags” to “Map VLAN Tags.” Map VLAN Tags exposes a Multiple VLANs option in the network assignment drop-down under your server profile. Once you select Multiple VLANs, a new window appears where you can select as many VLANs as you need exposed to the server. The ESX host is then required to tag its traffic on these NICs.
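For the ESX side of “Map VLAN Tags,” here is a minimal sketch of the kind of VLAN-tagged port group configuration involved, written with the vSphere Python SDK (pyVmomi) rather than the VI Client we actually used. The vCenter address, host name, vSwitch, port group names, and VLAN IDs are all made-up examples, not our configuration.

```python
# Sketch only: pyVmomi assumed installed; all names/credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab-only: skip cert validation
si = SmartConnect(host='vcenter.example.com', user='admin', pwd='secret', sslContext=ctx)

# Locate the blade's ESX host and its network configuration manager
host = si.RetrieveContent().searchIndex.FindByDnsName(dnsName='esx-blade01.example.com',
                                                      vmSearch=False)
netsys = host.configManager.networkSystem

# With "Map VLAN Tags" in Virtual Connect, the ESX host does the tagging,
# so each port group on the vSwitch carries the VLAN ID it should use.
for pg_name, vlan_id in [('VM-Network-10', 10), ('VM-Network-20', 20)]:
    spec = vim.host.PortGroup.Specification(name=pg_name,
                                            vlanId=vlan_id,
                                            vswitchName='vSwitch1',
                                            policy=vim.host.NetworkPolicy())
    netsys.AddPortGroup(portgrp=spec)

Disconnect(si)
```

The point is simply that once Virtual Connect maps the VLANs through, the tagging responsibility moves to the ESX port groups.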
Number of nodes per enclosure
Thou shalt not run more than four nodes of an ESX cluster in the same blade enclosure. No, it’s not the 11th commandment, but it is an important rule to know. Thanks to Duncan Epping’s article on the topic, we discovered a major implementation hazard for ESX on blade architecture without having to experience it firsthand. We did have our own enclosure failure, however, which made us aware that we could have been affected. The pitfall is that HA has primary and secondary nodes in a cluster. An ESX HA cluster can have up to five primary nodes, but never more. The first five nodes in the HA cluster become primaries, and these roles never get reassigned if a primary node fails. The primary nodes are responsible for directing the HA activities, so you don’t want all of your HA primary nodes running in the same enclosure. If all five HA primary nodes are running in the same enclosure and it fails, you will not get the desired HA restarts on the other ESX nodes in your cluster. Duncan’s article gives a great overview of the HA clustering architecture and sheds light on a little-known consideration.
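If you want to sanity-check your own layout, a trivial script against a hand-maintained host-to-enclosure map is enough. The sketch below is plain Python with an invented mapping; it does not query Virtual Connect, the Onboard Administrator, or vCenter.

```python
# Sketch: flag enclosures holding more than 4 nodes of a single ESX cluster.
# The host-to-enclosure mapping here is hypothetical and maintained by hand.
from collections import Counter

MAX_NODES_PER_ENCLOSURE = 4

cluster_hosts = {
    'esx-blade01': 'enclosure-A',
    'esx-blade02': 'enclosure-A',
    'esx-blade03': 'enclosure-A',
    'esx-blade04': 'enclosure-A',
    'esx-blade05': 'enclosure-A',   # fifth node in one enclosure -> risk
    'esx-blade06': 'enclosure-B',
}

for enclosure, count in Counter(cluster_hosts.values()).items():
    if count > MAX_NODES_PER_ENCLOSURE:
        print(f'{enclosure}: {count} cluster nodes here -- all five HA primaries '
              f'could land in one enclosure; spread the cluster out')
```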
Service Console & VMotion
Perhaps my favorite thing about Virtual Connect and ESX is the ability to creatively configure the Service Console and VMotion using just two NICs while providing redundancy and isolation as needed for these functions. Let’s look at the best practices:
- Service Console should have two NICs teamed for redundancy of this network link.
- VMotion should have its own dedicated NIC for the best performance of VMotion traffic.
So, what makes Virtual Connect well suited to this?
First, Virtual Connect is redundant on the VC-Ethernet side. We can create a single “Shared Uplink Set” carrying both Service Console and VMotion tagged traffic. The stacking link between the two VC-Ethernet modules allows traffic on NIC0 to be rerouted through Bay 2 if the uplink on Bay 1 is down. As long as both VC-Ethernet modules are functioning, the stacking links will be utilized.
Second, we can use ESX NIC Teaming failover settings to keep the traffic separate except when a failure occurs. If you lost a VC-Ethernet module, ESX would fail your Service Console and VMotion traffic over onto the remaining NIC, which connects to the other VC-Ethernet module.
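To make that failover behavior concrete, here is a hedged pyVmomi sketch of the per-port-group override: explicit failover order with one active and one standby NIC per port group, mirrored between Service Console and VMotion. The vSwitch, vmnic, VLAN, and credential values are assumptions about a typical two-NIC blade layout, not our exact configuration.

```python
# Sketch: per-port-group active/standby NIC order on a two-NIC vSwitch.
# Assumed layout: vmnic0 -> VC-Ethernet in Bay 1, vmnic1 -> VC-Ethernet in Bay 2.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def pg_spec(name, vlan_id, active_nic, standby_nic):
    """Port group spec with explicit failover order: one active NIC, one standby."""
    teaming = vim.host.NetworkPolicy.NicTeamingPolicy(
        policy='failover_explicit',
        nicOrder=vim.host.NetworkPolicy.NicOrderPolicy(activeNic=[active_nic],
                                                       standbyNic=[standby_nic]))
    return vim.host.PortGroup.Specification(
        name=name, vlanId=vlan_id, vswitchName='vSwitch0',
        policy=vim.host.NetworkPolicy(nicTeaming=teaming))

ctx = ssl._create_unverified_context()          # lab-only: skip cert validation
si = SmartConnect(host='vcenter.example.com', user='admin', pwd='secret', sslContext=ctx)
host = si.RetrieveContent().searchIndex.FindByDnsName(dnsName='esx-blade01.example.com',
                                                      vmSearch=False)
netsys = host.configManager.networkSystem

# Service Console rides vmnic0 and VMotion rides vmnic1; each only falls back
# to the other NIC (and the other VC-Ethernet module) when its own path is lost.
netsys.UpdatePortGroup(pgName='Service Console',
                       portgrp=pg_spec('Service Console', 10, 'vmnic0', 'vmnic1'))
netsys.UpdatePortGroup(pgName='VMotion',
                       portgrp=pg_spec('VMotion', 20, 'vmnic1', 'vmnic0'))
Disconnect(si)
```

The design choice is simply that isolation comes from the preferred NIC assignments, and redundancy comes from the standby entries, so you get both out of two NICs.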
There are a lot of options in this space, and this is my high-level implementation. I think it’s a great approach and can’t find many trade-offs. In a blade environment where NICs are at a premium, this is a wonderful solution.
(By the way, I didn’t devise this SC/VMotion configuration – it’s something someone else posted, but I can’t find the original blog post to give you credit… If it was you, please let me know.)