Recently, I encountered the "No healthy upstream" error in vCenter, which prevented me from logging in. Here’s how I diagnosed and fixed the issue.
Step 1: Resolving SSH Authentication Failure
Initially, when attempting to log in via SSH, I received the following error:
Too many authentication failures
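This error usually means the SSH client offered more keys than the server allows before it ever reached the password prompt. To bypass this, I disabled public key authentication for the session and logged in using the following command: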
ssh -o PubkeyAuthentication=no root@192.168.1.100
With public key authentication disabled, I was able to log in with the old password, although vCenter immediately prompted me to reset it.
Step 2: Checking Disk Space
After resetting the password, I noticed that the archive file system was 95% full. Although this wasn't directly causing the issue, I decided to clean up old files.
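A usage figure like the one above can be read with df. The following is a minimal sketch, assuming a POSIX df and awk are available; fs_usage_percent and warn_if_full are hypothetical helper names, and the default threshold of 90% is an arbitrary choice:

```shell
# Print the used-space percentage (without the % sign) of the file system holding a path.
fs_usage_percent() {
    # df -P keeps each file system on one line; column 5 of line 2 is the capacity, e.g. "95%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches the threshold (hypothetical default: 90).
# Returns 0 when below the threshold, 1 when at or above it, 2 when usage could not be read.
warn_if_full() {
    local path="$1" threshold="${2:-90}" used
    used=$(fs_usage_percent "$path")
    [ -n "$used" ] || return 2
    if [ "$used" -ge "$threshold" ]; then
        echo "WARNING: $path is ${used}% full"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

On the appliance this could be called as warn_if_full /storage/archive, assuming the archive partition is mounted at that path.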
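To list and then remove files older than 40 days, I ran the following from the directory being cleaned up:
# Preview the regular files last modified more than 40 days ago
find . -type f -mtime +40
# Delete them once the list looks right (safe for file names containing spaces)
find . -type f -mtime +40 -exec rm -- {} +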
Step 3: Diagnosing Certificate Issues
Despite the cleanup, the "No healthy upstream" error persisted. Upon further investigation, I found that an expired certificate was causing the issue.
To check certificate details, I ran:
for store in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do
    echo "STORE $store"
    /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store "$store" --text | grep -E "Alias|Not After"
done
This revealed that the MACHINE_SSL_CERT had expired.
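To make an expired entry easier to spot, a "Not After" line can be turned into a number of days remaining. This is a rough sketch, assuming GNU date (for the -d option) and the date format that OpenSSL-style text output typically uses; days_until_expiry is a hypothetical helper, not part of vecs-cli:

```shell
# Print the whole days left until the date in a "Not After" line (negative if already expired).
# $1: the line, or just the date string, e.g. "Not After : Jan 11 00:00:00 2024 GMT"
# $2: optional "now" as a Unix epoch, which is useful for testing; defaults to the current time
days_until_expiry() {
    local line="$1" now_epoch="${2:-$(date -u +%s)}"
    local not_after expiry_epoch
    # Strip everything up to and including the "Not After :" label, if present.
    not_after=$(printf '%s\n' "$line" | sed 's/^.*Not After *: *//')
    # GNU date parses the remaining timestamp; fail if it cannot.
    expiry_epoch=$(date -u -d "$not_after" +%s) || return 1
    echo $(( (expiry_epoch - now_epoch) / 86400 ))
}
```

A negative result means the certificate has already expired, and a small positive one is a cue to renew before the same error appears.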
Step 4: Regenerating the Expired Certificate
To fix this, I regenerated the MACHINE_SSL_CERT using the VMware Certificate Manager:
/usr/lib/vmware-vmca/bin/certificate-manager
I selected Option 3 (Replace Machine SSL certificate with VMCA Certificate) and provided the required details. The process generated a new certificate and restarted the necessary services.
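One way to confirm the replacement took effect is to read the validity dates of the certificate the server now presents. This is a sketch only, assuming the openssl command-line tool is installed on the machine running the check and that vCenter serves HTTPS on port 443; show_served_cert_dates is a hypothetical helper name:

```shell
# Print the subject and validity dates of the certificate served at host:port.
# $1: host name or IP address; $2: optional port (assumed default: 443)
show_served_cert_dates() {
    local host="$1" port="${2:-443}"
    # Redirecting stdin from /dev/null makes s_client exit once the handshake is done.
    openssl s_client -connect "${host}:${port}" </dev/null 2>/dev/null \
        | openssl x509 -noout -subject -dates
}
```

For the appliance in this article the call would be show_served_cert_dates 192.168.1.100, and the notAfter date in the output should now lie in the future.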
Conclusion
After completing these steps, the "No healthy upstream" error was resolved, and vCenter was fully operational again. If you encounter a similar issue, check for expired certificates early on; regenerating them may be all that is needed.