Fixing the "No Healthy Upstream" Error in vCenter



Recently, I encountered the "No healthy upstream" error in vCenter, which prevented me from logging in. Here’s how I diagnosed and fixed the issue.

Step 1: Resolving SSH Authentication Failure

Initially, when attempting to log in via SSH, I received the following error:

Too many authentication failures

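This error typically appears when the SSH client offers several keys (for example, from an SSH agent) and uses up the server's allowed authentication attempts before it ever prompts for a password. To get around it, I disabled public key authentication for that one connection: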

ssh -o PubkeyAuthentication=no root@192.168.1.100

With public key authentication disabled, I was able to log in with the old root password. The appliance then immediately prompted me to change it.

Step 2: Checking Disk Space

After resetting the password, I noticed that the archive file system was 95% full. Although this wasn't directly causing the issue, I decided to clean up old files.
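The usage figure above comes from checking the mounted file systems. As a minimal sketch (the helper name full_filesystems and the 90% threshold are illustrative assumptions, not part of the original procedure), the check can be scripted against the portable output of df -P:

```shell
# Print every mount point whose usage is at or above a threshold (default 90%).
# Reads `df -P` output on stdin; the function name is hypothetical.
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)                     # "95%" -> "95"
            if (use + 0 >= limit + 0) print $6, $5
        }'
}

# List file systems that are at least 90% full.
df -P | full_filesystems 90
```

The sketch assumes mount points without spaces, which holds for the standard appliance layout as far as I know.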

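To preview the files older than 40 days and then remove them, I ran the following from inside the archive directory. The first command only lists the matches; the second deletes them:

find . -type f -mtime +40

find . -type f -mtime +40 -print0 | xargs -0 -r rm --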

Step 3: Diagnosing Certificate Issues

Despite the cleanup, the "No healthy upstream" error persisted. Upon further investigation, I found that an expired certificate was causing the issue.

To check certificate details, I ran:

for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do
    echo "STORE $i"
    /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store "$i" --text | grep -E "Alias|Not After"
done

The "Not After" date for the entry in the MACHINE_SSL_CERT store was in the past, so the machine SSL certificate had expired.
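Comparing dates by eye is easy to get wrong when there are many stores. The following is a minimal sketch, not part of the original procedure, that reads the filtered "Alias" / "Not After" lines and prints only the entries that have already expired. The function name list_expired is hypothetical, and the sketch assumes GNU date (for date -d) and the "Label : value" line layout produced by the loop above.

```shell
# Read "Alias" / "Not After" lines on stdin and report expired entries.
# Assumes GNU date and lines shaped like:
#   Alias :   __MACHINE_CERT
#   Not After : Jan  1 00:00:00 2020 GMT
list_expired() {
    now=$(date -u +%s)
    alias_name="(unknown)"
    while IFS= read -r line; do
        case "$line" in
            *Alias*)
                # Keep the text after the first colon, with whitespace removed.
                alias_name=$(printf '%s\n' "${line#*:}" | tr -d '[:space:]')
                ;;
            *"Not After"*)
                # Keep the date after the first colon, minus leading whitespace.
                not_after=$(printf '%s\n' "${line#*:}" | sed 's/^[[:space:]]*//')
                # Skip lines whose date cannot be parsed.
                expiry=$(date -u -d "$not_after" +%s) || continue
                if [ "$expiry" -lt "$now" ]; then
                    printf 'EXPIRED %s (%s)\n' "$alias_name" "$not_after"
                fi
                ;;
        esac
    done
}

# Usage on the appliance (commented out so the sketch runs anywhere):
# /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store MACHINE_SSL_CERT --text \
#     | grep -E "Alias|Not After" | list_expired
```

Because the function only parses text, it can be tried out on saved output from the loop before running it on the appliance itself.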

Step 4: Regenerating the Expired Certificate

To fix this, I regenerated the MACHINE_SSL_CERT using the VMware Certificate Manager:

/usr/lib/vmware-vmca/bin/certificate-manager

I selected Option 3 (Replace Machine SSL certificate with VMCA Certificate) and provided the requested details when prompted. The tool generated a new certificate and restarted the necessary services.
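Once the services are back up, it is worth confirming that the new certificate is the one actually being served. A minimal sketch, assuming openssl is available and that the appliance answers HTTPS on port 443 of localhost: the hypothetical helper days_left turns the notAfter line printed by openssl x509 -enddate into a number of days remaining. It relies on GNU date.

```shell
# Convert a line such as "notAfter=Jan  1 00:00:00 2030 GMT" (read from stdin)
# into the number of whole days until expiry. Negative means already expired.
# Assumes GNU date for `date -d`; the function name is hypothetical.
days_left() {
    IFS= read -r line || [ -n "$line" ] || return 1
    not_after=${line#notAfter=}
    expiry=$(date -u -d "$not_after" +%s) || return 1
    now=$(date -u +%s)
    echo $(( (expiry - now) / 86400 ))
}

# Usage on the appliance (commented out so the sketch runs anywhere):
# echo | openssl s_client -connect localhost:443 2>/dev/null \
#     | openssl x509 -noout -enddate | days_left
```

A positive result indicates the served certificate is valid for that many more days; a negative result suggests the old certificate is still in use.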

Conclusion

After completing these steps, the "No healthy upstream" error was resolved, and vCenter was fully operational again. If you encounter a similar issue, checking for expired certificates and regenerating them is a good place to start.