Executions stuck in force_cancelling if a failover happens during cancelling

Description

1. Start an execution
2. Force failover
3. Cancel on the new master

Activity

Show:
Sivan Barzily
January 11, 2018, 4:53 PM

isnt this the case for any workflow running during failover as we do not have a workflow replication mechanism yet?

Łukasz Maksymczuk
January 11, 2018, 5:45 PM

not just that; it would be fine if the executions were cancelled, or terminated, or force cancelled after a failover - but they end up stuck in "force_cancelling", never actually reaching cancelled state.

However, it turns out, that this doesn't happen only in the cluster. All you need to do to trigger the bug, is to stop mgmtworker while a workflow is running, and cancel that execution afterwards.
Using a cluster and triggering a failover is just a special case of stopping mgmtworker while it was executing a failover...

geokala
January 12, 2018, 11:59 AM

It's also possible to trigger this by a badly timed reboot of the manager. We now have a script to set these workflows to a cancelled state which I will attach. It is tested with 4.1.1.1

Assignee

Łukasz Maksymczuk

Reporter

Łukasz Maksymczuk

Labels

Bug Type

legacy bug

Target Version

4.4

Severity

None

Affects versions

Configure