Syncthing stops syncing the cluster_statuses folder after the DB goes down

Description

First, we have a healthy cluster running.
I stop the patroni service on 2 of the DB nodes to simulate a DB cluster failure.
Then I start the patroni service on those 2 DB nodes so that the DB cluster becomes healthy again.
The problem is that, from this point on, the Syncthing service no longer syncs the cluster_statuses folder.
The cluster_statuses folder contains the status report files of all the nodes and should be synced among the managers.
This causes unexpected behaviour of the GET cluster-status endpoint.

Why Propose Close?

None

Activity

Barak Azulay
January 27, 2020, 4:18 PM

Any idea? Have you thought about the connection between the DB cluster being down and Syncthing?

geokala
January 28, 2020, 9:45 AM

OK, so from discussion, this is unlikely to be triggered in the wild. It is caused by:

  1. syncthing does not cope well when a file it is currently syncing gets changed.

  2. When patroni (but not etcd) is stopped on two nodes of the DB cluster, we will still have a leader DB up and running, because patroni can still get a leader lock on etcd.

  3. This means that each status reporter will attempt to put its data on the restservice, which will hang waiting to write the last-login-time.

  4. Then, when the DB is brought back up, the last attempted update will be sent, immediately followed by the next one.

  5. This will cause syncthing to see updates on a file it is currently syncing, breaking replication on that directory.
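
The sequence in points 3 to 5 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the actual status reporter code: the function name `delivery_times`, its parameters, and the fixed reporting schedule are all hypothetical, and time is modelled as plain numbers rather than real clock time. It assumes a reporter whose request blocks while the DB is unavailable, and which sends any update that came due in the meantime as soon as the blocked one completes.

```python
def delivery_times(interval, outage_start, outage_end, horizon):
    """Return the logical times at which status updates are written.

    Hypothetical model: one update is due every `interval` time units,
    starting at 0 and stopping before `horizon`. A request issued while
    the DB is down (outage_start <= t < outage_end) hangs until
    `outage_end`. Updates that came due while the reporter was blocked
    are sent immediately afterwards.
    """
    delivered = []
    now = 0
    next_due = 0
    while next_due < horizon:
        # The reporter cannot send before the update is due, and it
        # cannot go back in time if it was blocked past the due time.
        now = max(now, next_due)
        if outage_start <= now < outage_end:
            # The request hangs until the DB is available again.
            now = outage_end
        delivered.append(now)
        next_due += interval
    return delivered


def back_to_back(times):
    """Return the times at which two consecutive updates coincide."""
    return [b for a, b in zip(times, times[1:]) if a == b]
```

With a 5-unit interval and an outage from 12 to 22, `delivery_times(5, 12, 22, 40)` gives `[0, 5, 10, 22, 22, 25, 30, 35]`: two updates land at the same instant (22), which is the back-to-back write pattern that point 5 describes.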

 

Barak Azulay
January 29, 2020, 4:19 PM

After merging into master, please merge it into the 5.0.5-build branch as well, and only then move it to FIXED.

Inbal Amrani
February 2, 2020, 10:42 AM

From our discussion, the problem was that the use case the test checked is not realistic.
So I’m closing this issue and working on fixing the test in:

Assignee

Inbal Amrani

Reporter

Inbal Amrani

Labels

None

Severity

Medium

Target Version

5.0.5

Premium Only

yes

Found In Version

5.0

QA Owner

None

Bug Type

new feature bug

Customer Encountered

No

Customer Name

None

Release Notes

no

Priority

None

Priority

Blocker