Log entries in the database are deleted only when their deployment is deleted.
Consider the following use case:
A deployment runs the install workflow every minute. The deployment is never deleted, so neither are its logs. After 6 months of running, the logs table is 10 GB and the operations table is 6 GB.
Assuming it keeps running for another 6 months at the same rate, these two tables alone will roughly double, to about 32 GB.
I think our design assumed deployments get deleted or sit idle, while it should have accounted for day-two operations.
We should archive a deployment's old logs after a certain amount of time, configurable by the customer, assuming the archive is not part of the working database and does not affect its performance.
Environment:
OS (CLI), HA cluster, cloud provider
------------------------------------
I think you'll find that the log file under /var/log/cloudify/mgmtworker/logs/<deployment id>.log is also filling up.
We probably need to address both of these somehow.
Anything we do here will probably have to be manual, because otherwise we’d be doing an extra DB call on every insert into the logs table.
Therefore, I suggest we provide something via the REST service that allows deleting logs older than a given date. This should be something we can backport for the user currently facing the issue.
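As a rough illustration of what the underlying cleanup could boil down to (a sketch only; the table and column names `logs`, `events` and `reported_timestamp` are assumptions rather than a confirmed schema, the DSN is a placeholder, and the real change would go through the REST service rather than a standalone script):

```python
# Sketch: delete log/event rows older than a cutoff date directly in PostgreSQL.
# Table/column names and connection details are assumptions, not the actual schema.
from datetime import datetime, timedelta, timezone

import psycopg2


def delete_old_rows(conn, cutoff):
    """Delete rows older than `cutoff` from the (assumed) logs and events tables."""
    with conn.cursor() as cur:
        for table in ("logs", "events"):
            # `table` comes from the fixed tuple above, so interpolating it is safe here.
            cur.execute(
                f"DELETE FROM {table} WHERE reported_timestamp < %s",
                (cutoff,),
            )
            print(f"{table}: deleted {cur.rowcount} rows older than {cutoff}")
    conn.commit()


if __name__ == "__main__":
    cutoff = datetime.now(timezone.utc) - timedelta(days=30)
    conn = psycopg2.connect("dbname=cloudify_db user=cloudify")  # placeholder DSN
    try:
        delete_old_rows(conn, cutoff)
    finally:
        conn.close()
```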
For the log files, as Jonathan mentioned on Slack, we need to check the logrotate config.
As discussed on Slack, I don’t think we should leave this to the customer’s free will, as most of them tend to forget “cleaning” chores.
If we do want to leave it to customers, we should at least notify them that the DB is getting too big and that they should run maintenance commands like this one.
The problem is that we don’t currently have any mechanism for running routine maintenance tasks, so either of these automations is probably not great. I’ll think about other options, though.
After some discussion with , we could probably do something backported to 5.0.5 where we patch in a cron job on the managers which consults a file, /etc/cloudify/max_log_and_event_age.conf.
That file would contain just a number: the maximum age in days.
Then the cron would run every day (for example), and if that configuration file exists it would delete any logs and events older than the defined number of days, logging any output to /var/log/cloudify/log_and_event_cleanup_cron.log.
This log file would need rotating, and we’d still need to confirm rotation of /var/log/cloudify/mgmtworker/logs
The config file should not be created as part of the patch (so by default logs and events wouldn’t be deleted).
Note that this is just a rough approach; we should design it properly if this is prioritised for the near future.
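To make that a bit more concrete, here is a minimal sketch of what the daily cleanup script could look like. Assumptions: the config file holds a single integer (max age in days), and the actual deletion is delegated to a hypothetical helper like the one sketched earlier; neither is an existing Cloudify component.

```python
#!/usr/bin/env python
# Sketch of the proposed daily cleanup cron job. The config file path, log file
# path, and the delegated delete_old_rows() helper are assumptions from the
# discussion above, not existing Cloudify components.
import logging
import os
from datetime import datetime, timedelta, timezone

CONF_PATH = "/etc/cloudify/max_log_and_event_age.conf"
LOG_PATH = "/var/log/cloudify/log_and_event_cleanup_cron.log"

logging.basicConfig(filename=LOG_PATH, level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")


def main():
    # Default behaviour: if the config file was never created, do nothing,
    # so no logs or events are ever deleted.
    if not os.path.isfile(CONF_PATH):
        logging.info("%s not found; skipping cleanup", CONF_PATH)
        return
    with open(CONF_PATH) as f:
        max_age_days = int(f.read().strip())
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    logging.info("Deleting logs and events older than %s (%s days)",
                 cutoff, max_age_days)
    # Delegate the actual deletion to the hypothetical helper sketched earlier:
    # delete_old_rows(connect_to_db(), cutoff)


if __name__ == "__main__":
    main()
```

The cron side would then just be a daily entry (e.g. under /etc/cron.d) invoking this script, and the script's own log file would need a logrotate entry as noted above.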