kill-cancel doesn't kill all processes

Description

The bug is here: https://github.com/cloudify-cosmo/cloudify-agent/blob/master/cloudify_agent/worker.py#L207

The issue is that when _stop_process kills the subprocess of cloudify.dispatch, cloudify.dispatch continues by popping the process which has just terminated from the _process_registery: https://github.com/cloudify-cosmo/cloudify-common/blob/master/cloudify/dispatch.py#L213.

The more processes that exist for an execution, the more likely it is that _process_registery is modified while worker.py is still iterating over it. Modifying _process_registery from another thread while it is being iterated over leads to unpredictable behavior (in my case, only about half of the processes were actually cancelled).
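
To illustrate the effect described above, here is a minimal single-threaded sketch. It assumes the registry behaves like a plain Python list, which may not match the real _process_registery, and the function name and process labels are hypothetical. Removing the current entry during iteration stands in for the dispatcher popping the process that was just killed.

```python
def kill_while_iterating(registry):
    """Simulate the bug: entries vanish from the registry while we loop over it."""
    killed = []
    for proc in registry:          # iterating over the live registry
        killed.append(proc)        # stand-in for killing the process
        registry.remove(proc)      # stand-in for the dispatcher popping it
    return killed


registry = ["p0", "p1", "p2", "p3", "p4", "p5"]
print(kill_while_iterating(registry))  # ['p0', 'p2', 'p4']
print(registry)                        # ['p1', 'p3', 'p5'] were never visited
```

In this simplified model exactly half of the entries are skipped, which is consistent with the roughly 50% cancellation rate observed, although the real interleaving between threads is not deterministic.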

The solution is to first collect all the thread objects into a list in worker.py, and only then iterate over that list of threads (instead of over the processes), calling start on each one.
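
A rough sketch of that approach follows. It is not the code from the linked PRs: stop_all, stop_process, and the list-based registry are hypothetical names used only for illustration, and joining the threads at the end is an added assumption.

```python
import threading


def stop_all(registry, stop_process):
    """Build every thread before starting any, so the registry is only
    read while nothing is modifying it."""
    # Snapshot step: no process has been killed yet, so the registry is stable.
    threads = [
        threading.Thread(target=stop_process, args=(proc,))
        for proc in list(registry)
    ]
    # From here on we iterate over our own list, not over the registry.
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # assumption: the caller wants to wait for completion


# Usage: stop_process mutates the registry, as the dispatcher would.
registry = ["p0", "p1", "p2", "p3", "p4", "p5"]
killed = []
lock = threading.Lock()


def stop_process(proc):
    with lock:
        killed.append(proc)
        registry.remove(proc)


stop_all(registry, stop_process)
print(sorted(killed))  # every process is handled
```

Because the loop that starts the threads no longer reads the registry, removals made by the dispatcher cannot cause entries to be skipped.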

Why Propose Close?

None

Activity

Mohammed Abuaisha
January 29, 2020, 9:00 AM

Let me know what you think of the following PRs related to this Jira:

  1. https://github.com/cloudify-cosmo/cloudify-common/pull/401/files

  2. https://github.com/cloudify-cosmo/cloudify-agent/pull/596/files (Updated PR)

 

Barak Azulay
January 29, 2020, 3:46 PM

I would also consider this one for the 5.0.5 patch (up to you); however, it has the same problem of affecting both the management worker and the agent.

Mohammed Abuaisha
February 2, 2020, 9:52 AM

All tests pass for the two PRs.

Mohammed Abuaisha
February 5, 2020, 12:45 PM

Let me know if we can merge this to master, as we have already started the 5.1.0 version.

Mohammed Abuaisha
February 19, 2020, 12:17 PM

This Jira can be moved to Done, as both the common changes and the agent changes are now merged to master.

Assignee

Mohammed Abuaisha

Reporter

Jonathan Abramsohn

Severity

High

Target Version

5.1

Premium Only

no

Found In Version

4.5

QA Owner

None

Bug Type

unknown

Customer Encountered

Yes

Customer Name

c458

Release Notes

yes

Priority

None

Priority

Unprioritized