Executions stuck in force_cancelling if a failover happens during cancelling
1. Start an execution
2. Force failover
3. Cancel on the new master
It's also possible to trigger this by a badly timed reboot of the manager. We now have a script to set these workflows to a cancelled state which I will attach. It is tested with 22.214.171.124
not just that; it would be fine if the executions were cancelled, or terminated, or force cancelled after a failover - but they end up stuck in "force_cancelling", never actually reaching cancelled state.
However, it turns out, that this doesn't happen only in the cluster. All you need to do to trigger the bug, is to stop mgmtworker while a workflow is running, and cancel that execution afterwards.
Using a cluster and triggering a failover is just a special case of stopping mgmtworker while it was executing a failover...
isnt this the case for any workflow running during failover as we do not have a workflow replication mechanism yet?