1. Start an execution
2. Force failover
3. Cancel on the new master
isnt this the case for any workflow running during failover as we do not have a workflow replication mechanism yet?
not just that; it would be fine if the executions were cancelled, or terminated, or force cancelled after a failover - but they end up stuck in "force_cancelling", never actually reaching cancelled state.
However, it turns out, that this doesn't happen only in the cluster. All you need to do to trigger the bug, is to stop mgmtworker while a workflow is running, and cancel that execution afterwards.
Using a cluster and triggering a failover is just a special case of stopping mgmtworker while it was executing a failover...
It's also possible to trigger this by a badly timed reboot of the manager. We now have a script to set these workflows to a cancelled state which I will attach. It is tested with 18.104.22.168