Yesterday I ran into a problem after shutting down a legacy SAN: when I rescanned my HBAs (the rescan timed out), I lost the connection to the host.
I tried restarting the usual management services from the console,
# service vmware-vpxa restart
# service mgmt-vmware restart
but I would get:
Stopping VMware ESX Server Management services:
VMware ESX Server Host Agent Watchdog [FAILED]
I found the following post, http://communities.vmware.com/message/1157237, and KB article, http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1005566
Both have you kill the hostd process and then rename or delete its PID files. Here are the contents of the KB:
Service mgmt-vmware restart may not restart hostd
- You are using the command service mgmt-vmware restart but it does not finish restarting hostd.
- The script gets stuck when stopping the service.
- The SSH session to the ESX host becomes unresponsive.
- hostd does not restart.
You must manually stop the stuck service and restart it.
To stop the service and restart it:
- Log in as root to the ESX host command line via the physical console or a KVM connection.
- Navigate to the /var/run/vmware directory:
# cd /var/run/vmware
- Run the following command to list the files vmware-hostd.PID and watchdog-hostd.PID:
# ls -l vmware-hostd.PID watchdog-hostd.PID
- Determine the Process ID (PID) of the management service. View the contents of the vmware-hostd.PID file:
# cat vmware-hostd.PID
For example:
[root@vmware]# cat vmware-hostd.PID
1191
- Use the resulting PID to kill the process.
Caution: Use the kill -9 command with care. It kills the process of the supplied PID without exception or confirmation.
# kill -9 <PID>
In this example you run kill -9 1191.
- Delete the vmware-hostd.PID and watchdog-hostd.PID files:
# rm vmware-hostd.PID watchdog-hostd.PID
- Restart the management service.
# service mgmt-vmware restart
This seemed to fix it, without any downtime.
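If you hit this often, the KB steps can be wrapped in a small shell function. The following is only a rough sketch, not something from the KB or from VMware: the names recover_hostd, run, RUN_DIR and DRY_RUN are my own, and it assumes the PID files live in /var/run/vmware as described above. It defaults to a dry run that only prints the commands, so you can check what it would do first.

```shell
#!/bin/sh
# Sketch of the KB procedure as a function; run as root on the ESX console.
# RUN_DIR and DRY_RUN are hypothetical knobs added for this example only.
RUN_DIR="${RUN_DIR:-/var/run/vmware}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = actually run them

run() {
    # Echo the command, and execute it only when DRY_RUN is 0
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

recover_hostd() {
    pid_file="$RUN_DIR/vmware-hostd.PID"
    if [ ! -f "$pid_file" ]; then
        echo "no $pid_file found; nothing to kill" >&2
        return 1
    fi

    pid=$(cat "$pid_file")
    # Refuse to continue unless the file holds a plain number,
    # since kill -9 acts on whatever PID it is given
    case "$pid" in
        ''|*[!0-9]*)
            echo "unexpected contents in $pid_file: $pid" >&2
            return 1
            ;;
    esac

    run kill -9 "$pid"                                    # kill the stuck hostd
    run rm -f "$pid_file" "$RUN_DIR/watchdog-hostd.PID"   # remove stale PID files
    run service mgmt-vmware restart                       # restart management service
}
```

The block only defines the functions. To use it, source the file and call recover_hostd, review the printed commands, then set DRY_RUN=0 and call it again to run them for real.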