Upgrade from 4.6 to 5.0.5 cluster is not working

Description

A snapshot restore of Cloudify 4.6 onto a 5.0.5 Cloudify cluster (3 Managers) failed when using the following method:
1. Running `cfy_manager stop` on manager_2 and manager_3.
2. Running a snapshot restore on manager_1.
3. Running `cfy_manager start` on manager_2 and manager_3.

  • Important note: This method worked on a restore from 5.0.5 to a 5.0.5 Cloudify cluster.

The failure was that when running `cfy_manager start` on manager_2 and manager_3, the 'cloudify-composer' and 'cloudify-mgmtworker' services didn't start. It's important to mention that, according to the logs, the restore itself succeeded on all managers.

To find out what the problem was, I ran more tests:
1. I tried the same procedure on an all-in-one 5.0.5 Cloudify Manager and hit the same issue: I restored a Cloudify 4.6 snapshot, the restore succeeded, but after running `cfy_manager stop` and then `cfy_manager start` I got the same error.

2. The same procedure succeeded when restoring a Cloudify 5.0.5 snapshot on an all-in-one Cloudify Manager 5.0.5.

I think that fixing the issue on an all-in-one manager will also fix it on the cluster; a rough scripted form of the all-in-one reproduction is sketched below.
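For reference, here is the all-in-one reproduction as a script. This is only a sketch: it assumes the `cfy` CLI profile already points at the 5.0.5 manager, and the snapshot ID and archive path are placeholders, not the actual artifacts from our AWS.

# Sketch of the all-in-one reproduction (placeholder snapshot ID/path).
import subprocess

SNAPSHOT_ID = "snap-4-6"                      # placeholder
SNAPSHOT_PATH = "cloudify_4_6_snapshot.zip"   # placeholder

def run(cmd, check=True):
    print("+", " ".join(cmd))
    return subprocess.run(cmd, check=check)

# 1. Upload and restore the 4.6 snapshot on the 5.0.5 manager.
run(["cfy", "snapshots", "upload", SNAPSHOT_PATH, "-s", SNAPSHOT_ID])
run(["cfy", "snapshots", "restore", SNAPSHOT_ID])
# Wait for the restore execution to complete before continuing
# (e.g. watch `cfy executions list`); per the logs it finishes successfully.

# 2. Restart the manager services - this is where the failure shows up.
run(["cfy_manager", "stop"])
run(["cfy_manager", "start"])

# 3. Check the two services that fail to come up.
run(["systemctl", "status", "cloudify-composer", "cloudify-mgmtworker"], check=False)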

Steps to Reproduce

Environment:
------------------------------------
Full Cloudify cluster: 3 PostgreSQL, 3 RabbitMQ, 3 Managers, 1 load-balancer

Steps to reproduce:
------------------
1. Run `cfy_manager stop` on manager_2 and manager_3.
2. Do a snapshot restore of a 4.6 snapshot (from our AWS) on manager_1.
3. Run `cfy_manager start` on manager_2 and manager_3.

Expected result:
---------------
All services should start and the restore should succeed.

Actual result:
-------------
'cloudify-composer' and 'cloudify-mgmtworker' services won't start.

Why Propose Close?

None

Activity

Yoni Itzhak
November 17, 2019, 5:34 PM

Hi ,

Barak suggested that I ask you about this issue.
After running the commands as described above, the journalctl of the cloudify-composer.service showed the following error:

Nov 17 16:07:49 cloudify systemd[1]: Started Cloudify Composer Service.
Nov 17 16:07:49 cloudify node[26894]: /opt/cloudify-composer/node_modules/log4js/lib/configuration.js:25
Nov 17 16:07:49 cloudify node[26894]: throw new Error(`Problem with log4js configuration: (${util.inspect(config, { depth: 5 })})` +
Nov 17 16:07:49 cloudify node[26894]: ^
Nov 17 16:07:49 cloudify node[26894]: Error: Problem with log4js configuration: ({ replaceConsole: true,
Nov 17 16:07:49 cloudify node[26894]: appenders:
Nov 17 16:07:49 cloudify node[26894]: [ { type: 'clustered',
Nov 17 16:07:49 cloudify node[26894]: appenders:
Nov 17 16:07:49 cloudify systemd[1]: cloudify-composer.service: main process exited, code=exited, status=1/FAILURE
Nov 17 16:07:49 cloudify systemd[1]: Unit cloudify-composer.service entered failed state.
Nov 17 16:07:49 cloudify systemd[1]: cloudify-composer.service failed.

 

Do you know what might be causing it?

You can find further logs in the attached folder (the left one).

 

Thanks in advance,

Yoni

Jakub Niezgoda
November 18, 2019, 11:21 AM
Edited

We checked that, and it looks like in Composer's case the problem is Composer's internal configuration, which is included in the snapshot. The configuration syntax changed between 4.6 and 5.0.5, so we now (in 5.0.5) cannot override the log4js.conf file with a file from the 4.6 version. An illustration of the kind of syntax change involved is sketched below.
Actually, we need to decide whether any parts of Composer's internal configuration should be included in the snapshot at all.

I think we need to discuss this at tomorrow's weekly. We need to create a solution for both Stage and Composer, as right now the internal configuration of both components is included in the snapshot (which can produce issues, e.g. )
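For illustration only (the real files are in the attached logs, not reproduced here): the traceback above shows an old-style log4js configuration, where `appenders` is a list and flags like `replaceConsole` are allowed, being loaded by a newer log4js that expects `appenders` keyed by name plus a `categories` section, and that throws on the old shape. Below is a rough rendering of the two shapes as Python dicts, just to show the difference; the concrete appender settings are placeholders, not Composer's actual values.

# Illustrative only - placeholder values, not Composer's actual configuration.

# Shape accepted by the older log4js bundled with Composer in 4.6:
# a top-level list of appenders plus flags such as replaceConsole.
OLD_STYLE_4_6 = {
    "replaceConsole": True,
    "appenders": [
        {"type": "clustered", "appenders": [{"type": "console"}]},
    ],
}

# Shape expected by the newer log4js bundled with Composer in 5.0.5:
# appenders keyed by name, plus a mandatory "categories" section.
NEW_STYLE_5_0_5 = {
    "appenders": {
        "out": {"type": "stdout"},
    },
    "categories": {
        "default": {"appenders": ["out"], "level": "info"},
    },
}

def looks_like_old_style(config):
    """Heuristic: detect a 4.6-era config that the newer log4js would reject."""
    return isinstance(config.get("appenders"), list) or "replaceConsole" in config

assert looks_like_old_style(OLD_STYLE_4_6)
assert not looks_like_old_style(NEW_STYLE_5_0_5)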

Barak Azulay
November 18, 2019, 4:47 PM

Please read the comments above.

Is there a post-snapshot-restore mechanism that could help solve this problem?

Inbal Amrani
November 18, 2019, 4:58 PM
Edited

I can't think of a suitable mechanism, but maybe we don't need to restore Composer's internal configuration at all? Or we could write a script for it, like restore-snapshot.py does for Stage.

Jakub Niezgoda
November 19, 2019, 10:39 AM
Edited

We had a discussion about how to solve the problem.

The decision is to change the snapshot restore mechanism to ignore the following directories (a rough sketch of one possible way to skip them follows the list):

  1. /opt/cloudify-composer/backend/conf

  2. /opt/cloudify-stage/conf
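For the discussion, here is a minimal sketch of how the restore flow could skip those two directories when extracting the snapshot archive. The function names and the archive layout (relative paths inside a zip) are assumptions for illustration, not the actual restore code.

# Sketch only - the real logic lives in the manager's snapshot restore code;
# names and archive layout here are assumptions.
import zipfile

# Directories whose contents should no longer be restored from the snapshot.
IGNORED_PREFIXES = (
    "opt/cloudify-composer/backend/conf",
    "opt/cloudify-stage/conf",
)

def _should_restore(name: str) -> bool:
    """Return False for entries under the ignored configuration directories."""
    path = name.lstrip("./")
    return not any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in IGNORED_PREFIXES
    )

def extract_snapshot(archive_path: str, destination: str) -> None:
    """Extract a snapshot archive, skipping Composer/Stage internal config."""
    with zipfile.ZipFile(archive_path) as archive:
        members = [name for name in archive.namelist() if _should_restore(name)]
        archive.extractall(path=destination, members=members)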

 

Assignee

Yoni Itzhak

Reporter

Yoni Itzhak

Labels

Severity

Medium

Target Version

5.0.5

Premium Only

no

Found In Version

5.0

QA Owner

None

Bug Type

regression bug

Customer Encountered

No

Customer Name

None

Release Notes

no

Priority

Medium

Sprint

None

Fix versions
