We're doing some in-house testing of healing prior to demoing to a customer and are experiencing problems if the heal is performed after a Cloudify manager failover. The URL that the healed instance uses to download the agent package still uses the IP address of the old Cloudify manager, so that fails. As far as I can see, the URL is being retrieved from the "package_url" key of the cloudify_agent runtime property and I can confirm that that is unchanged by a manager failover.
We raised a similar problem in 2018 (https://support.cloudify.co/hc/en-us/requests/9771), and I can confirm that the patch that was delivered for that request looks to have been installed on the Cloudify Managers here (certainly I've confirmed that the changed files from that patch have been updated on the managers).
OS (CLI), HA cluster, cloud provider
Steps to reproduce:
It seems that this is a real issue, but I don't know why has it resurfaced...
This was originally fixed in where we made the package url be dynamically computed based on the current active manager ip. The patch was https://github.com/cloudify-cosmo/patchify/pull/26
Maybe that is set wrong (for some unknown reason), or the install script keeps using the previously-rendered package_url instead of the dynamically-computed url.
We need to repro and examine it closely, probably at least half a day to understand it. I don't know what then, either a workaround or another patch for... 4.5?
Eve, do you have a reproduction? a cluster on which that can be tested?
Lukasz, what information do you need in order to understand this? will access to the file system suffice?
Customer reported to have the same problem on Cloudify 4.6.
Apparently the download script of the agent doesn’t switch to the new leader.
Note that this is 4.x related only.