Fault Tolerance (FT) is a new feature that VMware introduced to the world at VMworld 2008. It is a continuous availability solution for use with some virtual machines. The following notes were compiled from session BC2621 at VMworld, which introduced the forthcoming FT feature from VMware’s Application vServices. The feature should be available sometime in 2009.
FT will protect a VM against hardware failure with zero downtime and zero data loss. It does this by running a secondary copy of the VM simultaneously on a second ESX host, executing every instruction and every input in lockstep with the primary VM. In the event of a failure, the secondary VM becomes the primary within a matter of seconds, preserving state and without dropping any connections to the virtual machine. All traffic is redirected to the secondary. In addition, once the secondary assumes the role of primary, it spawns a new secondary instance on another ESX host and brings full fault tolerance back to the virtual server.
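The lockstep idea can be illustrated with a toy model. This is only a sketch of the concept as described in the session, not VMware's implementation: `ToyVM`, `ToyFTPair`, and the host names are all hypothetical, and the "VM" is just a deterministic state machine whose state depends only on the inputs it has seen.

```python
from dataclasses import dataclass


@dataclass
class ToyVM:
    """A deterministic toy 'VM': same inputs in the same order give the same state."""
    host: str
    state: int = 0
    processed: int = 0

    def apply(self, value: int) -> None:
        # Any deterministic update would do; this one is arbitrary.
        self.state = (self.state * 31 + value) % 1_000_003
        self.processed += 1


class ToyFTPair:
    """Hypothetical model of a primary/secondary pair kept in lockstep."""

    def __init__(self, primary_host: str, secondary_host: str) -> None:
        self.primary = ToyVM(primary_host)
        self.secondary = ToyVM(secondary_host)

    def deliver(self, value: int) -> None:
        # Every input seen by the primary is also fed to the secondary.
        self.primary.apply(value)
        self.secondary.apply(value)

    def fail_primary(self, spare_host: str) -> None:
        # The secondary takes over with its state intact...
        self.primary = self.secondary
        # ...and a new secondary is started on another host from that state.
        self.secondary = ToyVM(spare_host, self.primary.state, self.primary.processed)


pair = ToyFTPair("esx-a", "esx-b")
for value in (3, 1, 4):
    pair.deliver(value)
state_before_failure = pair.primary.state
pair.fail_primary("esx-c")
```

After the simulated failure, the VM on `esx-b` is the primary and holds exactly the state the old primary had, and a fresh secondary on `esx-c` starts from that same state, which mirrors the "spawns a new secondary instance" step.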
The idea of continuous access is very impressive and, like the HA feature, should make continuous access much easier to adopt within the enterprise. One of the things VMware touted during the introduction is that they believe this will really increase the number of applications that can be protected this way. Other solutions on the market are usually cost-prohibitive for all but the most mission-critical applications.
This particular solution is also very appealing because it requires no modification to the application to support this technology. The application does not need to be made cluster aware or altered in any way, because the fault tolerance is all accomplished at the virtualization layer. In the same way that applications don’t have to be altered to run in a VM, applications will not need to be altered for FT.
There are some requirements for running FT in its first incarnation. First, FT will only support uni-processor VMs, at least in the beginning. It requires an HA/DRS cluster and VMotion to work (more on this later). One good thing to note is that there isn’t an extra storage cost for the secondary VM. It uses the same virtual hard disks as the primary on the shared VMFS, so that too is a requirement – shared VMFS of some sort.
The FT feature doesn’t come without some cost. In addition to these requirements, the secondary VM will be alive and running, so it will consume the same amount of resources on a second ESX server as the primary VM. This means you probably will not want to protect every VM. FT will also require a dedicated NIC on the cluster for FT logging, running at 1 Gbit/s or faster. Lastly, the FT feature will increase latency on the VM, albeit very slightly, because of the work needed to keep the two VMs in lockstep.
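The requirements and costs above can be written down as a simple checklist. The following is a minimal sketch based only on what was said in the session; `ClusterInfo`, `VMInfo`, `ft_blockers`, and `secondary_footprint` are hypothetical names invented for illustration and do not correspond to any VMware API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClusterInfo:
    ha_drs_enabled: bool
    vmotion_enabled: bool
    shared_vmfs: bool
    ft_logging_nic_gbits: float  # 0 if there is no dedicated FT logging NIC


@dataclass
class VMInfo:
    name: str
    vcpus: int
    memory_mb: int


def ft_blockers(vm: VMInfo, cluster: ClusterInfo) -> List[str]:
    """Return the reasons this VM could not be FT-protected (empty list = OK)."""
    blockers = []
    if vm.vcpus != 1:
        blockers.append("only uni-processor VMs are supported")
    if not cluster.ha_drs_enabled:
        blockers.append("an HA/DRS cluster is required")
    if not cluster.vmotion_enabled:
        blockers.append("VMotion is required")
    if not cluster.shared_vmfs:
        blockers.append("shared VMFS storage is required")
    if cluster.ft_logging_nic_gbits < 1.0:
        blockers.append("a dedicated FT logging NIC of at least 1 Gbit/s is required")
    return blockers


def secondary_footprint(protected: List[VMInfo]) -> Tuple[int, int]:
    """Extra (vCPUs, memory in MB) consumed by the secondaries.

    Each secondary uses the same compute resources as its primary;
    storage is not counted because the disks are shared.
    """
    return (sum(vm.vcpus for vm in protected),
            sum(vm.memory_mb for vm in protected))


cluster = ClusterInfo(ha_drs_enabled=True, vmotion_enabled=True,
                      shared_vmfs=True, ft_logging_nic_gbits=1.0)
db = VMInfo("db01", vcpus=1, memory_mb=4096)
smp = VMInfo("app02", vcpus=2, memory_mb=8192)
```

Here `ft_blockers(db, cluster)` returns an empty list, `ft_blockers(smp, cluster)` flags the two-vCPU VM, and `secondary_footprint([db])` shows that protecting `db01` costs another vCPU and another 4096 MB somewhere else in the cluster.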
To support this technology, VMware is working closely with processor makers. FT will need HV-compatible processors (I think this is not the same as HVM – somebody tell me for sure…), which were introduced with Intel’s Harpertown and AMD’s Barcelona chips. So the feature will only be available on the newest processors, which rules out the quad-core Xeons in the blades we purchased at the end of last year. Bummer.
FT will not support thin provisioning. In fact, per the demo, if a VM is thin provisioned (that is, if it is created from a linked clone rather than a full virtual HD), then enabling FT will automatically force the disk to be converted to a full virtual hard disk.
VMotion is supported with FT virtual machines, though we were warned that you probably don’t want to do this often. You may want to set affinity to a particular ESX host for the primary FT VM to prevent this, but it will work if invoked. Storage VMotion, on the other hand, is not supported with FT virtual machines, because both VMs are accessing the same physical files on the storage.
The speaker gave us a few key points for picking candidates for FT:
- Applications that run well on a single processor.
- Applications which can tolerate higher latency.
- Applications with medium bandwidth requirements (under 600 Mbit/s).
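The speaker's three criteria can be turned into a quick screening function. This is only a sketch: `AppProfile` and `is_ft_candidate` are hypothetical names, the 600 figure is taken from the session and read here as megabits per second, and how you decide whether an application "runs well" on one CPU or tolerates extra latency is left to your own measurements.

```python
from dataclasses import dataclass

# Upper bound quoted in the session, interpreted here as Mbit/s (an assumption).
MAX_BANDWIDTH_MBITS = 600


@dataclass
class AppProfile:
    name: str
    runs_well_on_one_cpu: bool
    tolerates_extra_latency: bool
    peak_bandwidth_mbits: float


def is_ft_candidate(app: AppProfile) -> bool:
    """True only if the application meets all three criteria from the session."""
    return (app.runs_well_on_one_cpu
            and app.tolerates_extra_latency
            and app.peak_bandwidth_mbits < MAX_BANDWIDTH_MBITS)


apps = [
    AppProfile("license-server", True, True, 20),
    AppProfile("file-server", True, True, 900),
    AppProfile("trading-engine", True, False, 100),
]
candidates = [app.name for app in apps if is_ft_candidate(app)]
```

With these made-up profiles, only `license-server` passes: `file-server` exceeds the bandwidth limit and `trading-engine` cannot tolerate the added latency.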
Additional Reading from Other Blogs:
- Fault Tolerant VMs in VI: Operations and Best Practices – Scott Lowe
- VMware VDC-OS bring fault tolerance to VMs – InfoWorld