I’m in a bit of trouble … I’m experiencing this startup error on one of my clusters:
```
2020-07-06 11:24:56.299812 [keeper.go:143] W | keeper: password file /etc/secrets/stolon/password permissions 01000000777 are too open. This file should only be readable to the user executing stolon! Continuing...
2020-07-06 11:24:56.300334 [keeper.go:1159] I | keeper: id: 79ffcec1
2020-07-06 11:24:56.300362 [keeper.go:1162] I | keeper: running under kubernetes.
time="2020-07-06T11:24:56Z" level=info msg="Stopping database"
2020-07-06 11:24:56.465437 [keeper.go:421] E | keeper: error getting pgstate: error getting pg state
2020-07-06 11:24:56.499052 [keeper.go:743] I | keeper: current pg state: master
2020-07-06 11:24:56.499083 [keeper.go:768] I | keeper: our cluster requested state is master
time="2020-07-06T11:24:56Z" level=info msg="Starting database"
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2020-07-06 11:24:56.671 UTC : [1-1] user=,db= LOG: database system was interrupted while in recovery at log time 2020-05-04 01:58:36 UTC
2020-07-06 11:24:56.671 UTC : [2-1] user=,db= HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2020-07-06 11:24:56.757 UTC : [3-1] user=,db= WARNING: recovery command file "recovery.conf" specified neither primary_conninfo nor restore_command
2020-07-06 11:24:56.757 UTC : [4-1] user=,db= HINT: The database server will regularly poll the pg_xlog subdirectory to check for files placed there.
2020-07-06 11:24:56.757 UTC : [5-1] user=,db= LOG: entering standby mode
2020-07-06 11:24:56.761 UTC : [6-1] user=,db= LOG: consistent recovery state reached at 4/4F34DC20
2020-07-06 11:24:56.761 UTC : [7-1] user=,db= LOG: record with zero length at 4/4F34DC20
2020-07-06 11:24:56.762 UTC : [1-1] user=,db= LOG: database system is ready to accept read only connections
```
It seems I need to get the database back in sync somehow. I’m not that familiar with Stolon, though I do have some PostgreSQL experience.
Any pointers on how to re-sync the nodes and repair the index?