Title updated. The original description is below; since this can definitely happen, it must be prevented from happening.
We need to know whether, and under what circumstances, the patroni standby DB can become a synchronous standby.
If it can do so during normal operation, this could significantly increase write latency, causing problems on the primary site, while obscuring the fact that the only thing keeping the primary site running is the presence of the secondary site.
Therefore, if it can become the synchronous replica then we will need to either document it or fix it so that this cannot happen, depending on the circumstances.
If you have two sites with a DB cluster on each, and patroni standby cluster replication set up:
Site A (primary): DB1A, DB2A, DB3A
Site B (standby; 70ms from Site A): DB1B, DB2B, DB3B
On site A, DB1A is the primary leader, DB2A is the synchronous standby, and DB3A is the async standby.
On site B, DB1B is the standby leader, replicating from DB1A. The other two are async replicas.
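The healthy state above can be sketched as a check over `pg_stat_replication`-style rows on DB1A. This is a minimal illustration only: the field names mirror the real view, but the node data, addresses, and helper function are made up for this sketch.

```python
# Sketch: classify rows shaped like pg_stat_replication output on DB1A.
# Field names mirror the real view; the data itself is illustrative.

def summarize_replication(rows):
    """Return (sync_standby, other_standbys) from pg_stat_replication-style rows.

    Each row is a dict with 'application_name', 'client_addr' and
    'sync_state' ('sync', 'async' or 'potential').
    """
    sync = [r for r in rows if r["sync_state"] == "sync"]
    others = [r for r in rows if r["sync_state"] != "sync"]
    return (sync[0] if sync else None), others

# Healthy state on site A: DB2A is sync, DB3A is async, and DB1B
# (the standby leader, on site B's subnet) is streaming asynchronously.
rows = [
    {"application_name": "db2", "client_addr": "10.0.1.2", "sync_state": "sync"},
    {"application_name": "db3", "client_addr": "10.0.1.3", "sync_state": "async"},
    {"application_name": "db1", "client_addr": "10.0.2.1", "sync_state": "async"},
]
sync, others = summarize_replication(rows)
print(sync["application_name"])  # db2
```

The failure described below is exactly this check coming back with a site-B node in the `sync` slot.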
By causing DB2A and DB3A to stop running, I have managed to make DB1A treat DB1B as its synchronous standby.
This will cause excessive lag on writes and disguise the (effectively) failed state of site A.
It is not clear whether this is likely to happen when any other nodes on Site A are still active.
The original task here was to determine whether it's possible for one of the standby DBs to become the synchronous replica, which it apparently is. I'm not sure how likely it is to happen in normal operation; determining that, and finding a way to avoid it, will require a bit more code and doc diving.
My intent is to deal with this once I have the rest of the clustering work sorted so that I can pass it on to Vladimir while I deal with this.
I’ve now observed this happening while the primary cluster is healthy. This will need to be addressed shortly.
Note: The above observation appears to have been when I was looking at the wrong DB node, so might be a false alarm. This still needs checking.
Confirmed this is still possible.
Unfortunately, it can't be made impossible: Postgres selects synchronous standbys by application_name, and due to the way we have to set up the clusters, we end up with the same application_names on both sides. That means Postgres itself can't filter out the standby-site node, even if Patroni could.
A new sync standby is selected whenever there is not currently a sync standby (e.g. because the current one disconnected for some reason).
The new sync standby will be the next server in the list of standbys that is connected and currently streaming.
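That selection rule can be sketched as follows. This models the behaviour described above (matching is purely by application_name against the configured name list), but the member names, the contents of the list, and which standby-site node holds the streaming connection are all illustrative assumptions, not our actual configuration.

```python
# Sketch of the sync-standby selection described above: matching is purely
# by application_name, so a standby-site node reusing the same name is
# indistinguishable from the primary-site node of that name.

def pick_sync_standby(sync_names, streaming):
    """Pick the streaming standby with the highest priority, i.e. the one
    whose application_name appears earliest in sync_names. 'streaming' is
    a list of dicts with 'application_name' (plus a 'site' tag for this
    sketch); ties keep connection order.
    """
    for name in sync_names:  # earlier names have higher priority
        for node in streaming:
            if node["application_name"] == name:
                return node
    return None

# Illustrative name list, e.g. synchronous_standby_names = 'FIRST 1 (node2, node3)'.
SYNC_NAMES = ["node2", "node3"]

# The site-A standbys have stopped; the only remaining streaming connection
# to the leader is from a site-B node that reuses the same application_name.
streaming = [{"application_name": "node2", "site": "B"}]

chosen = pick_sync_standby(SYNC_NAMES, streaming)
print(chosen["site"])  # B -- a standby-site node has been promoted to sync
```

Because the name list is the only filter, nothing on the Postgres side distinguishes the site-B connection from the site-A node it shadows.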
Given the way this works, there’s nothing trivial we can do to avoid this occurring.
We could try periodically restarting the standby side's connections, but that would amount to deliberately hard-resetting the replication connection on a frequent schedule; any rare errors that a reset can trigger would then become more common. This is therefore probably not the best way to approach it.
It looks like this needs to go into the caveats as something to monitor. If it does occur, we can document the need to either change the primary site's leader DB (under maintenance mode) or re-initialise the standby site.