For some weird reason, I have an ESXi 4.1 U2 cluster of HP BL460c G7 blade servers that seem to have a mind of their own. Well, at least their QLogic QMH2562 8Gb FC HBAs do. It all started after we installed the new blade chassis and built out the new cluster of ESXi boxes. At random intervals, on random boxes and random HBAs, they would drop their login to the Clariion CX4 array via Brocade 5900 switches. One time it would be HBA2 on server 1, another time HBA3 on server 5. Just weird, wacky stuff.
So I started with VMware support and opened a ticket. Almost immediately, they pointed fingers at EMC and told me I had to open a case with them before they would proceed any further. So we did. A little while later, EMC pointed squarely at VMware and said it was their issue, or that we should just buy PowerPath/VE and all would be right with the world. Meanwhile, I’m calling HP, who won’t let me open a case on software I did not purchase from them (which is total BS), and who told me to use their “forums” to get assistance. Lame.
OK, so everyone is pointing at everyone else, but we are getting nowhere, and the only way for us to get our dead/broken paths back is to reboot the boxes. They come back up with all 4 paths, but after a while the paths start disappearing again. I got pretty tired of cycling these boxes through maintenance mode and reboots just to resolve this issue, so I started my hunt for a better way. I had to dig really deep and put in just the right Googly words, but I finally came up with a duct-tape fix until a vendor comes back with something worthwhile.
Thanks so much to Cameron Kennedy and his blog post on how to force a Qlogic HBA to re-login to the array – no reboot needed! Here’s the gist of it –
- Find out which host has an HBA that’s offline by looking in the GUI for 2 broken or missing paths.
- SSH into the host and cd /proc/scsi/qla2xxx. There should be 2 files there, one for each HBA.
- Cat each file to see which one is offline – the last few lines end in “…. Offline”. The good HBA will say “… Online”.
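- Enter the following, replacing <hba#> with the number of the file that shows Offline (type the quotes as plain straight quotes, since curly quotes pasted from a web page will not work in the shell):
echo "scsi-qlalip" > /proc/scsi/qla2xxx/<hba#>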
- Watch the paths magically re-appear in a few seconds in the GUI. If they don’t, do a Rescan to force it.
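If you get tired of doing the cat-and-echo dance by hand, the steps above can be rolled into a small shell function. This is only a rough sketch, not something from Cameron's post or from any vendor: the function name relogin_offline_hbas, the optional directory argument, and the DRY_RUN switch are all made up for illustration, and it assumes the word "Offline" shows up in the last few lines of the HBA's file as described above. It defaults to a dry run that only reports what it finds.

```shell
#!/bin/sh
# Sketch only: scan the qla2xxx proc files and nudge any HBA that reports Offline.
# The function name, the directory argument and DRY_RUN are hypothetical helpers.

relogin_offline_hbas() {
    dir="${1:-/proc/scsi/qla2xxx}"   # location given in the steps above
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip if the directory is empty or missing
        # Assumption: an offline HBA has "Offline" in its last few lines.
        if tail -n 5 "$f" | grep -q 'Offline'; then
            echo "offline: $f"
            # Only write the re-login string when DRY_RUN is explicitly 0.
            if [ "${DRY_RUN:-1}" = "0" ]; then
                echo "scsi-qlalip" > "$f"
            fi
        fi
    done
}
```

Run relogin_offline_hbas on its own first to see which files it flags, then set DRY_RUN=0 and run it again to actually send the re-login. You would still want to check the paths in the GUI afterwards and do a Rescan if they don't come back.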
**Note: This is an ongoing support case with VMware and EMC. If I have updates, I’ll post them as I get them.