Old logs in the database are not deleted while the deployment is still alive

Description

Log entries in the database are deleted only when their deployment is deleted.
Consider the following use case:
A deployment runs the install workflow every minute and is never deleted, so its logs are never deleted either. After 6 months of running, the logs table is 10GB and the operations table is 6GB.
Assuming linear growth, if it continues to run for another 6 months we will reach 32GB for these tables alone.

I think our design assumed deployments get deleted, or sit idle, when it should have considered day-two operations.
We should archive a deployment's old logs after a certain amount of time, configurable by the customer, assuming the archive won't be part of the working database and won't affect its performance.

Steps to Reproduce

Environment:
OS (CLI), HA cluster, cloud provider
------------------------------------


Why Propose Close?

None

Activity

geokala
April 23, 2020, 9:33 AM

I think you'll find that the log file under /var/log/cloudify/mgmtworker/logs/<deployment id>.log is also filling up.

We probably need to address both of these somehow.

geokala
April 23, 2020, 9:41 AM

Anything we do on this will likely need to be manual, because otherwise we’d be doing an extra DB call on every addition of logs to the table.

Therefore, I suggest we provide something via the REST service that allows deleting logs older than a given date. This should be something we can backport for the user currently facing the issue.
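As a rough sketch of what such an endpoint would run underneath: parametrised DELETE statements against the logs and events tables. The table and column names below (`logs`, `events`, `reported_timestamp`) are assumptions about the manager's PostgreSQL schema, not confirmed in this issue.

```python
# Hypothetical sketch of the cleanup the proposed REST endpoint would perform.
# Table/column names are assumed, not taken from the actual schema.
from datetime import datetime

TABLES = ('logs', 'events')


def build_cleanup_queries(before):
    """Return (sql, params) pairs deleting rows older than `before`."""
    if not isinstance(before, datetime):
        raise TypeError('before must be a datetime')
    return [
        ('DELETE FROM {} WHERE reported_timestamp < %s'.format(table),
         (before,))
        for table in TABLES
    ]
```

These would then be executed with the manager's existing DB client (parametrised, to keep the date out of the SQL string itself).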

For the log files, as Jonathan mentioned on Slack, we need to check the logrotate config.
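If the logrotate config turns out not to cover the per-deployment files, a stanza along these lines would be a starting point; the rotation counts and sizes here are illustrative assumptions, not the manager's actual configuration.

```
# Illustrative logrotate stanza for per-deployment mgmtworker logs.
# All values below are placeholder assumptions.
/var/log/cloudify/mgmtworker/logs/*.log {
    daily
    rotate 7
    maxsize 100M
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```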

Jonathan Abramsohn
April 23, 2020, 10:18 AM

As discussed on Slack, I don’t think we should leave it to the customers’ free will, as most of them tend to forget “cleaning” chores.
If we do want to leave it to customers, we should at least notify them that the DB is getting too big and that they should run maintenance commands like this one.
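A "DB is getting too big" notification could be backed by a size check like the sketch below. Reading table sizes would use PostgreSQL's standard `pg_total_relation_size()`; the table names and the 10GB threshold are assumptions for illustration.

```python
# Sketch of a threshold check behind a hypothetical "DB too big" warning.
# On the manager this would be fed by:
#   SELECT pg_total_relation_size('logs'), pg_total_relation_size('events');
# Table names and threshold are assumptions.

def tables_over_threshold(sizes_bytes, threshold_gb=10):
    """Given table name -> size in bytes, return tables over the threshold."""
    limit = threshold_gb * 1024 ** 3
    return sorted(name for name, size in sizes_bytes.items() if size > limit)
```

A monitoring cron could run this periodically and emit a warning listing the offending tables.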

geokala
April 23, 2020, 11:10 AM

The problem is that we don’t currently have any mechanism for running routine maintenance tasks, so neither of these automations is likely to be great. I’ll think about other options, though.

geokala
April 23, 2020, 11:34 AM

After some discussion with , we could probably do something backported to 5.0.5 where we patch in a cron job on the managers that consults a file, /etc/cloudify/max_log_and_event_age.conf
That file would just contain a single number: the maximum age in days.

Then, the cron would run every day (for example), and if that configuration file exists it would delete any logs and events older than the defined number of days, logging any output to /var/log/cloudify/log_and_event_cleanup_cron.log

This log file would need rotating, and we’d still need to confirm rotation of /var/log/cloudify/mgmtworker/logs

The config file should not be created as part of the patch (so by default logs and events wouldn’t be deleted).
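The config-file handling described above could look roughly like this sketch: if the file is absent, nothing is deleted; otherwise its single number is the maximum age in days and anything older than the computed cutoff is eligible. Only the config path comes from the comment above; everything else is illustrative.

```python
# Sketch of the proposed daily cron job's config handling. Only the config
# path is from the proposal; the helper names are illustrative.
import os
from datetime import datetime, timedelta

CONF_PATH = '/etc/cloudify/max_log_and_event_age.conf'


def read_max_age_days(path=CONF_PATH):
    """Return the configured age in days, or None if the file is absent
    (in which case the cron deletes nothing, per the proposal)."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return int(f.read().strip())


def cutoff(max_age_days, now=None):
    """Logs and events older than this datetime are eligible for deletion."""
    now = now or datetime.utcnow()
    return now - timedelta(days=max_age_days)
```

The cron entry itself would then call this script daily and append its output to /var/log/cloudify/log_and_event_cleanup_cron.log, as described above.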

Note that this is just a rough approach; we should design it properly if it is prioritised for the near future.

Assignee

Unassigned

Reporter

Jonathan Abramsohn

Severity

High

Target Version

unscrubbed

Premium Only

no

Found In Version

4.6

QA Owner

geokala

Bug Type

legacy bug

Customer Encountered

Yes

Customer Name

c240

Release Notes

yes

Priority

None

Priority

Unprioritized