Executions stuck in force_cancelling if a failover happens during cancelling


1. Start an execution
2. Force failover
3. Cancel on the new master


January 12, 2018, 11:59 AM

It's also possible to trigger this by a badly timed reboot of the manager. We now have a script to set these workflows to a cancelled state which I will attach. It is tested with

Łukasz Maksymczuk
January 11, 2018, 5:45 PM

not just that; it would be fine if the executions were cancelled, or terminated, or force cancelled after a failover - but they end up stuck in "force_cancelling", never actually reaching cancelled state.

However, it turns out, that this doesn't happen only in the cluster. All you need to do to trigger the bug, is to stop mgmtworker while a workflow is running, and cancel that execution afterwards.
Using a cluster and triggering a failover is just a special case of stopping mgmtworker while it was executing a failover...

Sivan Barzily
January 11, 2018, 4:53 PM

isnt this the case for any workflow running during failover as we do not have a workflow replication mechanism yet?


Łukasz Maksymczuk


Łukasz Maksymczuk


Bug Type

legacy bug

Target Version