Right way (or best practices) to do failover

viggy28 · June 17, 2020, 7:23pm

Sorry if this topic has already been discussed.

I am wondering whether there is any best practice or guidance on how to do a failover. Say we have a cluster of 3 with 1 SR and 1 ASR. We failed over using failkeeper or by stopping the master, which promoted the SR to become the new master. However, the ASR couldn’t follow the newly elected master.

Is pgRewind the only way to roll back the leading ASR so that it can become an SR?

Any suggestions or thoughts are appreciated.

viggy28 · June 18, 2020, 5:44pm

It looks like https://github.com/sorintlab/stolon/issues/17 is the solution that could keep stolon from blowing up all the replicas during a failover.

Of course, getting this merged https://github.com/sorintlab/stolon/issues/601 would be a great first step.