Acceptance test for this is system-tests' test_replication[3], but unfortunately this is somewhat random.
What can happen is:
a node starts trying to follow the master
the master dies before the follow process is completed
that node is then elected the new master
a retry of the previous follow runs at the same time or after the promote, breaking that node's nginx config
Relevant log (from "that node") - attached as "cloudify-cluster.log"
Fixed by making sure that when handlers run, they run with the most recent data available.
Instead of retrying many times with the value the handler was originally called with, we refetch between handler calls, so that if the value in consul changes, subsequent retries use the new value.
This means that no longer should a handler be stuck trying to follow a master long after the master has already switched.