Tag: VMware

Path failures on ESX4 with HP storage

by Philip Sellers April 8, 2010

Since we began upgrading our clusters to ESX4, we have been seeing strange “failed physical path” messages in our vmkernel logs. I don’t normally post unless I know the solution to a problem, but in this case I’ll make an exception. Our deployment has been delayed and plagued by the storage issues that I mentioned in an earlier post. Even though we have fixed our major problems, the following type of error has persisted.

Our errors look like this:

vmkernel: 19:18:05:07.991 cpu6:4284)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x410005101140) to NMP device “naa.6001438005de88b70000a00002250000” failed on physical path “vmhba0:C0:T0:L12” H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

vmkernel: 19:18:05:07.991 cpu6:4284)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device “naa.6001438005de88b70000a00002250000” state in doubt; requested fast path state update…
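
To make these lines easier to compare across hosts, here is a minimal Python sketch that pulls the fields out of a “failed on physical path” line like the first one above. It assumes the field layout shown in our logs (other ESX builds may differ) and accepts straight or curly quotes. The opcode names are standard SCSI (0x2a is WRITE(10)); the host-status names are an assumption based on the Linux SCSI host-byte values that the H: codes appear to follow, so check them against VMware’s documentation before relying on them. The class and function names are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the "Command ... failed on physical path ..." vmkernel line.
# Layout taken from our own ESX4 logs; treat it as an assumption for other builds.
FAILED_PATH_RE = re.compile(
    r'Command (?P<cmd>0x[0-9a-fA-F]+) \(\S+\) to NMP device '
    r'["“](?P<device>[^"”]+)["”] failed on physical path '
    r'["“](?P<path>[^"”]+)["”] '
    r'H:(?P<host>0x[0-9a-fA-F]+) D:(?P<dev>0x[0-9a-fA-F]+) P:(?P<plugin>0x[0-9a-fA-F]+)'
)

# Runtime path name: adapter, channel, target, LUN (e.g. vmhba0:C0:T0:L12).
RUNTIME_NAME_RE = re.compile(
    r'(?P<adapter>vmhba\d+):C(?P<channel>\d+):T(?P<target>\d+):L(?P<lun>\d+)'
)

# Standard SCSI opcodes for the two most common I/O commands.
SCSI_OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

# ASSUMPTION: host-status names borrowed from the Linux SCSI host byte values.
HOST_STATUS = {
    0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY",
    0x3: "TIME_OUT", 0x5: "ABORT", 0x8: "RESET",
}


@dataclass
class PathFailure:
    device: str
    adapter: str
    channel: int
    target: int
    lun: int
    command: str
    host_status: str
    device_status: int
    plugin_status: int


def parse_failed_path(line: str) -> Optional[PathFailure]:
    """Return the decoded fields of a failed-path line, or None for other lines."""
    m = FAILED_PATH_RE.search(line)
    if m is None:
        return None
    rt = RUNTIME_NAME_RE.fullmatch(m["path"])
    if rt is None:
        return None
    opcode = int(m["cmd"], 16)
    host = int(m["host"], 16)
    return PathFailure(
        device=m["device"],
        adapter=rt["adapter"],
        channel=int(rt["channel"]),
        target=int(rt["target"]),
        lun=int(rt["lun"]),
        command=SCSI_OPCODES.get(opcode, hex(opcode)),
        host_status=HOST_STATUS.get(host, hex(host)),
        device_status=int(m["dev"], 16),
        plugin_status=int(m["plugin"], 16),
    )
```

Applied to the first line above, this decodes a WRITE(10) to LUN 12 through vmhba0 with a non-zero host status and a GOOD device status (D:0x0), which suggests the host/path side rather than the disk itself is reporting the problem.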

After several support cases with VMware and HP, we are no closer to resolving the issue. VMware support, for its part, has done a good job of explaining what ESX is seeing and reacting to. HP support, on the other hand, has been circling the problem but has made little progress in diagnosing it. We have had a case open for several months, and our primary EVA resource at HP has repeatedly examined the EVAperf data and SSSU output that we sent for analysis. Neither has turned up anything, yet ESX continues to log the messages.

The errors in the log make sense to me: we are losing a path to a data disk (sometimes even a boot-from-SAN ESX OS disk!). Why HP cannot see anything in our Brocade switches or within the EVA is beyond me. Our ESX hosts, whether blade or rack-mounted, are seeing the problem across the board. The one cluster we waited to upgrade never saw these errors on ESX3.5, but sees them now on ESX4. Perhaps ESX4 is simply too sensitive when monitoring its storage, but I suspect it’s something else. The messages don’t seem to affect operation on the hosts, but they certainly make troubleshooting harder, because it is difficult to tell a real problem from just another failed-path message. Is anyone else seeing this?
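
Since the hard part is separating a path that keeps failing from occasional one-off messages, one rough way to triage is to count failures per device and path, and per adapter, across a log. The Python sketch below does that; the regex assumes the line format quoted earlier, and the threshold is a hypothetical knob you would tune for your own environment.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Device and runtime path from a "failed on physical path" line (straight or curly quotes).
PATH_FAIL_RE = re.compile(
    r'to NMP device ["“](?P<device>[^"”]+)["”] '
    r'failed on physical path ["“](?P<path>[^"”]+)["”]'
)


def count_path_failures(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count failures per (device, path) and per HBA (the vmhbaN prefix of the path)."""
    per_path: Counter = Counter()
    per_adapter: Counter = Counter()
    for line in lines:
        m = PATH_FAIL_RE.search(line)
        if m is None:
            continue  # not a failed-path line (e.g. the "state in doubt" warning)
        per_path[(m["device"], m["path"])] += 1
        per_adapter[m["path"].split(":")[0]] += 1
    return per_path, per_adapter


def noisy_paths(per_path: Counter, threshold: int) -> List[Tuple[str, str]]:
    """Return (device, path) pairs that failed at least `threshold` times, sorted."""
    return sorted(key for key, n in per_path.items() if n >= threshold)
```

Grouping by adapter is one way to see whether failures cluster on a single HBA (and, if each HBA is cabled to one fabric, a single fabric) or are spread across all of them.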

Networking Virtualization

vNetwork Distributed Switch challenges

by Philip Sellers April 8, 2010

Among the new technologies introduced with ESX4, I’m particularly impressed with the vNetwork Distributed Switch. We have chosen to introduce dvSwitches into our environment gradually and transition VMs over to them. The distributed switch gives us some new capabilities, such as centralized management, individual port assignments, port state retained across vMotions, port statistics and improved monitoring, and rate limiting. That said, the distributed vSwitch has posed some challenges, both in the transition to it and from a design perspective.

Transition
The transition to dvSwitches was an unexpected complication. Although the port group names were identical, vCenter presents standard and distributed port groups differently, so you could not simply VMotion a VM from an ESX3.5 node to an ESX4 node. As a workaround, I created temporary standard vSwitches on one node of the new ESX4 cluster as the destination for all my vMotions. Each of my dvSwitches on the ESX4 cluster had two uplinks, so I borrowed one uplink from each for the temporary standard vSwitches. This let me seamlessly move each VM’s network configuration from the standard port group to the distributed port group after it vMotioned. Although it took more steps and trouble, it allowed me to make the transition during the day while production workloads stayed online, with no impact to them.
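
The workaround above boils down to two checks and two steps per VM. As a rough illustration (a planning model, not a vSphere API script), the Python sketch below checks that every VM’s standard port group has an identically named port group on the staging node’s temporary standard vSwitch, which is what lets the vMotion go through, and that it has a distributed port group to move to afterward. The VM, host and port group names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class VM:
    name: str
    portgroup: str  # standard port group the VM uses on the ESX3.5 host


def plan_transition(
    vms: List[VM],
    staging_host: str,
    temp_portgroups: Set[str],
    dv_portgroup_for: Dict[str, str],
) -> List[str]:
    """Return the ordered steps, or raise if a VM cannot follow the two-hop path."""
    steps: List[str] = []
    for vm in vms:
        # Hop 1 needs a same-named standard port group on the staging node.
        if vm.portgroup not in temp_portgroups:
            raise ValueError(
                f"{vm.name}: no temporary standard port group {vm.portgroup!r} on {staging_host}"
            )
        # Hop 2 needs a distributed port group to reconnect the NIC to.
        if vm.portgroup not in dv_portgroup_for:
            raise ValueError(f"{vm.name}: no distributed port group mapped for {vm.portgroup!r}")
        steps.append(f"vMotion {vm.name} to {staging_host} on standard port group {vm.portgroup}")
        steps.append(
            f"reconnect {vm.name} from {vm.portgroup} to distributed port group "
            f"{dv_portgroup_for[vm.portgroup]}"
        )
    return steps
```

Validating everything before the first vMotion means a missing port group is caught up front rather than halfway through a daytime migration.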

Design Considerations
The vNetwork Distributed Switch has certainly saved me time, and it’s only been in my VMware environment for a short while. I like several things that the distributed switch allows, such as persistent ports, port statistics and the ability to maintain the distributed switch in one place across all nodes in my cluster. The downside to this centralized administration is an increased dependence on vCenter Server.

Rich Bramley of VM/ETC posted on the challenges of distributed switches with virtualized vCenter instances, largely the problem of a cyclical relationship that could leave you between “the vDS rock and a hard place,” as he puts it. He and I drew the same conclusions, and I have ultimately decided to keep a mix of distributed and standard vSwitches within my environment.

Perhaps I am overly cautious, but I run my service console, VMotion and FT logging traffic across VMware standard switches, and all VM traffic across the vNetwork Distributed Switch. The exception to the “all VM traffic” rule will be if we introduce a virtualized vCenter instance on our cluster. Because dvSwitches depend on vCenter, assigning a virtualized vCenter to a distributed port group is a potential disaster: the VM cannot access or bind to its port on the distributed vSwitch without vCenter running. Also of note, Rich reports that running vCenter on a distributed switch is NOT supported by VMware because of this cyclical relationship.
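
The split described above can be written down as a small rule set and checked against an inventory. In the Python sketch below, the traffic-to-switch map mirrors this post; the inventory format, VM names and the `design_violations` helper are hypothetical. It flags two things: infrastructure traffic placed on a distributed switch, and a vCenter VM with any NIC on a distributed port group, the cyclical dependency Rich describes.

```python
from typing import Dict, List, Set

STANDARD = "standard"
DISTRIBUTED = "distributed"

# Where each kind of traffic is meant to live, per the design above.
INTENDED = {
    "service console": STANDARD,
    "vmotion": STANDARD,
    "ft logging": STANDARD,
    "vm traffic": DISTRIBUTED,
}


def design_violations(
    traffic_switch: Dict[str, str],
    vm_nic_switches: Dict[str, List[str]],
    vcenter_vms: Set[str],
) -> List[str]:
    """Compare actual placement against INTENDED and the vCenter exception."""
    problems = []
    for traffic, want in INTENDED.items():
        have = traffic_switch.get(traffic)
        if have is not None and have != want:
            problems.append(f"{traffic} is on a {have} switch, expected {want}")
    # A virtualized vCenter must not depend on a dvSwitch that vCenter itself manages.
    for vm in sorted(vcenter_vms):
        if DISTRIBUTED in vm_nic_switches.get(vm, []):
            problems.append(f"{vm} (vCenter) has a NIC on a distributed port group")
    return problems
```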

Datacenter Storage Virtualization

Best practices for VMware ESX4 with HP EVA storage

by Philip Sellers April 7, 2010

(For PowerCLI commands to accomplish the same best practices, see this post).

HP provided us with its best practices document for ESX4 connected to an HP EVA array. There is a major change in ESX4: for the first time, ESX is ALUA (Asymmetric Logical Unit Access) aware. (See this post on Yellow Bricks for more detail about ALUA.) ALUA allows the array and ESX to determine the optimal paths (in the EVA’s case, the paths to the ports of the controller that manages the LUN) and to use those optimal paths until one isn’t available. This is important because it prevents the ESX host from flip-flopping between controllers.
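
Conceptually, ALUA path selection works like the Python sketch below: each path reports an access state, and the host sends I/O down the live active/optimized paths (for an EVA, those to the managing controller), falling back to active/non-optimized paths only when no optimized path is left. The four states used are part of the SCSI ALUA model; the path names are hypothetical, and this is a simplified model, not the code ESX’s storage plugins actually run.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AluaState(Enum):
    ACTIVE_OPTIMIZED = "active/optimized"
    ACTIVE_NON_OPTIMIZED = "active/non-optimized"
    STANDBY = "standby"
    UNAVAILABLE = "unavailable"


@dataclass(frozen=True)
class Path:
    name: str
    state: AluaState
    dead: bool = False


def usable_paths(paths: List[Path]) -> List[Path]:
    """Prefer live optimized paths; fall back to live non-optimized ones."""
    live = [p for p in paths if not p.dead]
    optimized = [p for p in live if p.state is AluaState.ACTIVE_OPTIMIZED]
    if optimized:
        return optimized
    return [p for p in live if p.state is AluaState.ACTIVE_NON_OPTIMIZED]
```

Because every host that asks gets the same answer from the array, hosts following this rule should in principle all converge on the same controller, which is why it avoids flip-flopping.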

In previous versions of ESX, the recommended path policy for the EVA was Fixed. In our case, we presented the same LUNs to the ESX3.5 and ESX4 hosts simultaneously, so some hosts used Fixed and others used the ESX4 default, MRU. This caused problems. After those initial issues, we backed away and presented one LUN at a time, performed our VMotions and then unpresented the LUN from the old cluster. This prevented any flapping between controllers.
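
The one-LUN-at-a-time procedure keeps a simple invariant: at any moment, at most one LUN is presented to both the old and new clusters, so at most one LUN can see mixed Fixed/MRU path policies. The Python sketch below models that sequence; the LUN and cluster names are hypothetical, and the present/vMotion/unpresent steps are only recorded, not executed against the array or vCenter.

```python
from typing import Dict, List, Set, Tuple


def migrate_one_lun_at_a_time(
    luns: List[str], old_cluster: str, new_cluster: str
) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Record present -> vMotion -> unpresent for each LUN, checking the overlap invariant."""
    presented: Dict[str, Set[str]] = {lun: {old_cluster} for lun in luns}
    steps: List[str] = []
    for lun in luns:
        presented[lun].add(new_cluster)
        steps.append(f"present {lun} to {new_cluster}")
        # Invariant: only the LUN currently being migrated is shared by both clusters.
        shared = [name for name, clusters in presented.items() if len(clusters) > 1]
        if shared != [lun]:
            raise RuntimeError(f"more than one LUN shared between clusters: {shared}")
        steps.append(f"vMotion VMs on {lun} from {old_cluster} to {new_cluster}")
        presented[lun].discard(old_cluster)
        steps.append(f"unpresent {lun} from {old_cluster}")
    return presented, steps
```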

General

Neglect…

by Philip Sellers April 7, 2010

Well, it’s a new quarter and I feel a big obligation to post something to the blog. I cannot believe it has been three months since my last post. I have several irons in the fire, but on the work front, I am glad to report that the vSphere upgrade has been completed and we are in the final stages of upgrading all our VMware Tools, drivers and virtual hardware. This has been a months-long transition, and I have several draft posts waiting to go out that I started as things came up during the upgrade. This project has kept me extremely busy, much to the detriment of the blog, but it has also provided a lot of good information that I want to pass along.

On a personal note, my wife and I are in the final stages of planning our new home, which we hope to begin building in the coming months. Any free time that I would have had to blog about my experiences has been consumed with house plans, builder meetings and other items to prepare for this major undertaking. My wife is attempting to chronicle our build on her blog, My Green Glasses. The house will be a certified green home, Energy Star certified and *possibly* LEED certified. We know that we will be close to meeting requirements for a LEED certification and we are looking at what additional things need to be done to make it happen and whether it is worth it or not.

Datacenter Storage Virtualization

Growing a Virtual RDM in ESX

by Philip Sellers January 11, 2010

Short version: To grow an RDM in virtual compatibility mode, you must VMotion the VM after performing your storage rescans to recognize the additional space. After the VMotion, the guest OS will see the additional space and be able to access it.
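
The order is what matters here. As a small illustration, the Python sketch below checks a recorded sequence of actions against that order: grow the LUN on the array, rescan storage, VMotion the VM, and only then use the new space in the guest. The step labels are hypothetical, and growing the LUN on the array first is implied rather than stated in the short version above.

```python
from typing import List, Optional

# Required order for growing a virtual-compatibility RDM, per the note above.
REQUIRED_ORDER = [
    "grow LUN on array",
    "rescan storage",
    "vMotion VM",
    "use new space in guest",
]


def check_rdm_grow(performed: List[str]) -> Optional[str]:
    """Return a description of the first problem found, or None if the order is right."""
    for step in REQUIRED_ORDER:
        if step not in performed:
            return f"missing step: {step}"
    # Sort the required steps by where each first appears in the recorded sequence.
    seen = sorted(REQUIRED_ORDER, key=performed.index)
    if seen != REQUIRED_ORDER:
        return "out of order: " + " -> ".join(seen)
    return None
```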

Datacenter Storage Virtualization

Carefully planning ESX4 and HP Storageworks EVA

by Philip Sellers January 6, 2010

As I mentioned in my post about lessons learned from our ESX4 rollout, we had a pretty serious hiccup with our storage and ESX systems in December while trying to bring up our ESX4 environment. The primary problem we uncovered was what I’ll call “controller ping-pong”.

An EVA normally has two controllers (maybe more; I’m not primarily a storage guy), and those handle all the requests received through the SAN. For every LUN, one controller is its master. Both controllers can receive requests for the LUN, but only the master actually services the access. If the controller on fabric A is the master but the controller on fabric B is receiving more requests, the EVA eventually transfers control of the LUN to fabric B, wherever the majority of requests are coming from.
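
To see why mismatched paths cause trouble, here is a deliberately simplified Python model of the behavior described above: after each sampling window, ownership of the LUN moves to whichever controller received strictly more requests. The windows, counts and the majority rule are assumptions for illustration; the EVA’s real transfer thresholds and timing are not modeled.

```python
from typing import Dict, List, Tuple


def simulate_ownership(
    windows: List[Dict[str, int]], initial_owner: str = "A"
) -> Tuple[List[str], int]:
    """Return the owning controller after each window and the number of transfers."""
    owner = initial_owner
    history: List[str] = []
    transfers = 0
    for counts in windows:
        busiest = max(counts, key=counts.get)
        # Transfer only when another controller strictly out-receives the owner.
        if busiest != owner and counts[busiest] > counts.get(owner, 0):
            owner = busiest
            transfers += 1
        history.append(owner)
    return history, transfers
```

With two groups of hosts favoring different fabrics and alternating load, such as `[{"A": 100, "B": 20}, {"A": 30, "B": 90}, {"A": 80, "B": 40}, {"A": 10, "B": 70}]`, ownership flips on every window after the first; when all hosts send to the same controller, it never moves.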

This behavior only becomes a problem if you have hosts configured to access the LUN on different fabrics. ESX4 is ALUA (asymmetric logical unit access) aware, meaning it should automatically determine the optimal path, including with an EVA. According to HP support, the EVA is supposed to answer an ALUA request for the optimal path by reporting the controller that is the master over the LUN.

If you, like us, have an ESX 3.5 cluster with preferred paths set up, you should proceed with caution. Apparently the ALUA information isn’t shared between clusters. If your clusters end up using different optimal paths, you could get controller ping-pong: requests are sent down both fabrics and the balance shifts between them, with more on fabric A and then more on fabric B, forcing the EVA to switch the LUN’s master controller each time.

So, while we are in a migratory state, I think my safest route is to configure the ESX4 hosts to use a preferred path, like the ESX3.5 cluster nodes. I hate to move away from the default ESX configuration, and this isn’t an official recommendation from HP support, but it certainly makes the most sense to define the paths being used (except in a failure).
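
If you go the preferred-path route, it is worth checking that every host, on both the ESX3.5 and ESX4 clusters, prefers the same controller for a given LUN. The Python sketch below takes a hypothetical inventory mapping (host, LUN) to the controller behind that host’s preferred path, however you collect it, and reports the LUNs where hosts disagree, which are the candidates for controller ping-pong.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def conflicting_luns(preferred: Dict[Tuple[str, str], str]) -> Dict[str, List[str]]:
    """Map each LUN whose hosts prefer different controllers to the sorted controller list."""
    controllers: Dict[str, Set[str]] = defaultdict(set)
    for (_host, lun), controller in preferred.items():
        controllers[lun].add(controller)
    return {
        lun: sorted(ctrls)
        for lun, ctrls in sorted(controllers.items())
        if len(ctrls) > 1
    }
```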

I post this because I feel like there have to be other HP Storageworks customers who have the same situation or have experienced something similar. I would love to hear from you…

Datacenter Storage Virtualization

Lessons learned from initial rollout of ESX4

by Philip Sellers December 21, 2009

Following my November upgrade of Flex-10 Virtual Connect on my blade enclosure, I have begun rolling out ESX4 on a new blade cluster and upgrading one existing cluster. I’ve learned quite a few lessons during the rollout.

Virtualization

Passed the VCP4 exam

by Philip Sellers December 21, 2009

As of 10am today, I am officially a VCP4. I still have to wait for my official packet to arrive from VMware, but I have done the hard part and passed the exam. For any VCP3s out there who still haven’t taken the test, there is still time to upgrade without the official course, as long as you take and pass the VCP4 exam before December 31. That was my major motivator: avoiding a $1,800 class. With relatively limited hands-on experience with vSphere and a little studying, I had no problems with the exam.

For anyone considering taking the exam, here is a quick list of the resources I used to study:

Simon Long’s VCP4 Practice Exam
VMware’s Official Documentation
* ESX and vCenter Server 4 Installation Guide (minimum system requirements, etc.)
* Configuration Maximums for vSphere 4
Scott Lowe’s Mastering VMware vSphere 4

Areas I’d suggest focusing on: system requirements, configuration maximums, new technologies (Data Recovery, vApps, Distributed vSwitches including vmkernel and service console ports on dvSwitches), and licensing and portfolio offerings. Don’t forget some general knowledge about View, Fusion, VMware Server 2 and VMware’s other products; those are good for a quick and easy couple of correct answers.

Virtualization

Remove dangling plug-in from vCenter Server

by Philip Sellers November 30, 2009

After completing our upgrade to vSphere’s version of vCenter Server, we had a few incompatible plugins left over from our VI3 install. Most of these were easy to remedy, especially the VMware-provided add-on products like Converter and Update Manager, which were on the same DVD media as vCenter Server. But if you have any third-party add-ins for vCenter, you may be left in a similar situation to ours.

We have tested many management products for ESX in our quest for the most appropriate product for our environment. One of those products integrated into vCenter through a plugin. Unfortunately, the plugin didn’t have a nice uninstaller like the VMware-provided plugins, so we were left with a red caution sign with an exclamation point.

After searching a bit, I came across this thread with a helpful solution – http://communities.vmware.com/message/1366557. Thanks to user virtualchic for the solution…

In a web browser, navigate to http://<<vCenter server name or IP>>/mob
Click on “content”
Click on “ExtensionManager”
Select and copy the name of the plugin you want to remove from the list of values located under Properties.
Click the UnregisterExtension option and a new window will appear.
Paste in the name of the plugin and click “Invoke Method” to remove it.
Close the pop-up window.
Refresh the “Managed Object Type: ManagedObjectReference:ExtensionManager” window and the plugin should be removed from the list.
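
The MOB steps above invoke the ExtensionManager’s UnregisterExtension method by hand. The same call can be scripted with pyVmomi, VMware’s Python SDK for the vSphere API. The sketch below is a minimal example, assuming pyVmomi is installed and that you already know the plugin’s extension key; the helper names are hypothetical, and depending on your pyVmomi version and certificate setup you may need to pass extra SSL options to SmartConnect.

```python
import getpass
from typing import List


def find_extension_keys(content, fragment: str) -> List[str]:
    """Return registered extension keys containing `fragment` (case-insensitive)."""
    frag = fragment.lower()
    return sorted(
        ext.key for ext in content.extensionManager.extensionList if frag in ext.key.lower()
    )


def unregister_extension(content, key: str) -> None:
    """Same call the MOB's UnregisterExtension page makes, with a presence check first."""
    registered = {ext.key for ext in content.extensionManager.extensionList}
    if key not in registered:
        raise KeyError(f"extension {key!r} is not registered")
    content.extensionManager.UnregisterExtension(key)


def remove_plugin(host: str, user: str, key: str) -> None:
    """Connect to vCenter with pyVmomi, unregister one extension, and disconnect."""
    from pyVim.connect import Disconnect, SmartConnect  # requires the pyvmomi package

    si = SmartConnect(host=host, user=user, pwd=getpass.getpass(f"Password for {user}: "))
    try:
        unregister_extension(si.RetrieveContent(), key)
    finally:
        Disconnect(si)
```

For example, `remove_plugin("vcenter.example.com", "administrator", "com.example.mgmtplugin")`, where all three values are placeholders. Running `find_extension_keys` first is a cheap way to confirm the exact key before removing anything.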

This successfully cleans up the list of plugins; however, I’m not sure whether it completely removes the plugin or just removes it from view…

Virtualization

VMworld coverage completed, finally

by Philip Sellers September 18, 2009

Well, I have finally gotten all my notes from VMworld out and posted on the site. I followed the conference with a family vacation on the West Coast, so my time to process and post these notes was limited. I attended several additional sessions, but these were the best of the sessions I attended and the ones I got the most information from. I hope they help you too…
