Replication Error

I had this problem starting 2 keeper. After init, stolon-keeper-0 doesn’t seem to run.
Any idea what went wrong? Running on K8s 1.17.3 with static persistent volume.

/usr/local/bin/stolonctl -y --cluster-name=kube-stolon --store-backend=kubernetes --kube-resource-kind=configmap init

stolon-keeper-1

2020-04-01T02:38:53.389Z INFO cmd/keeper.go:1033 our keeper data is not available, waiting for it to appear
2020-04-01T02:38:58.398Z INFO cmd/keeper.go:1094 current db UID different than cluster data db UID {“db”: “”, “cdDB”: “f4ef814a”}
2020-04-01T02:38:58.399Z INFO cmd/keeper.go:1101 initializing the database cluster
The files belonging to this database system will be owned by user “stolon”.
This user must also own the server process.

The database cluster will be initialized with locale “en_US.utf8”.
The default database encoding has accordingly been set to “UTF8”.
The default text search configuration will be set to “english”.

Data page checksums are disabled.

creating directory /stolon-data/postgres … ok
creating subdirectories … ok
selecting dynamic shared memory implementation … posix
selecting default max_connections … 100
selecting default shared_buffers … 128MB
selecting default time zone … Etc/UTC
creating configuration files … ok
running bootstrap script … ok
performing post-bootstrap initialization … ok
initdb: warning: enabling “trust” authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
–auth-local and --auth-host, the next time you run initdb.
syncing data to disk … ok
Success. You can now start the database server using:

pg_ctl -D /stolon-data/postgres -l logfile start

2020-04-01T02:38:59.283Z INFO postgresql/postgresql.go:319 starting database
2020-04-01 02:38:59.299 UTC [64] LOG: starting PostgreSQL 12.2 (Debian 12.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2020-04-01 02:38:59.299 UTC [64] LOG: listening on IPv4 address “192.168.175.116”, port 5432
2020-04-01 02:38:59.301 UTC [64] LOG: listening on Unix socket “/tmp/.s.PGSQL.5432”
2020-04-01 02:38:59.328 UTC [65] LOG: database system was shut down at 2020-04-01 02:38:59 UTC
2020-04-01 02:38:59.336 UTC [64] LOG: database system is ready to accept connections
2020-04-01T02:38:59.523Z INFO cmd/keeper.go:1160 setting roles
2020-04-01T02:38:59.523Z INFO postgresql/postgresql.go:571 setting superuser password
2020-04-01T02:38:59.532Z INFO postgresql/postgresql.go:575 superuser password set
2020-04-01T02:38:59.532Z INFO postgresql/postgresql.go:578 creating replication role
2020-04-01T02:38:59.542Z INFO postgresql/postgresql.go:588 replication role created {“role”: “repluser”}
2020-04-01T02:38:59.545Z INFO postgresql/postgresql.go:384 stopping database
2020-04-01 02:38:59.547 UTC [64] LOG: received fast shutdown request
2020-04-01 02:38:59.548 UTC [64] LOG: aborting any active transactions
2020-04-01 02:38:59.551 UTC [64] LOG: background worker “logical replication launcher” (PID 71) exited with exit code 1
2020-04-01 02:38:59.551 UTC [66] LOG: shutting down
2020-04-01 02:38:59.570 UTC [64] LOG: database system is shut down
waiting for server to shut down… done
server stopped
2020-04-01T02:38:59.693Z INFO cmd/keeper.go:1457 our db requested role is master
2020-04-01T02:38:59.708Z INFO postgresql/postgresql.go:319 starting database
2020-04-01 02:38:59.718 UTC [84] LOG: starting PostgreSQL 12.2 (Debian 12.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2020-04-01 02:38:59.718 UTC [84] LOG: listening on IPv4 address “192.168.175.116”, port 5432
2020-04-01 02:38:59.720 UTC [84] LOG: listening on Unix socket “/tmp/.s.PGSQL.5432”
2020-04-01 02:38:59.748 UTC [85] LOG: database system was shut down at 2020-04-01 02:38:59 UTC
2020-04-01 02:38:59.756 UTC [84] LOG: database system is ready to accept connections
2020-04-01T02:38:59.925Z INFO cmd/keeper.go:1493 already master
2020-04-01T02:38:59.962Z INFO cmd/keeper.go:1626 postgres parameters not changed
2020-04-01T02:39:05.068Z INFO cmd/keeper.go:1457 our db requested role is master
2020-04-01T02:39:05.072Z INFO cmd/keeper.go:1493 already master
2020-04-01T02:39:05.103Z INFO cmd/keeper.go:1626 postgres parameters not changed
2020-04-01T02:39:05.104Z INFO cmd/keeper.go:1653 postgres hba entries not changed
2020-04-01 02:39:09.415 UTC [121] WARNING: skipping special file “./postgresql.auto.conf”
2020-04-01 02:39:09.426 UTC [122] ERROR: replication slot “stolon_ddb8d3d0” does not exist
2020-04-01T02:39:10.191Z INFO cmd/keeper.go:1457 our db requested role is master
2020-04-01T02:39:10.192Z INFO cmd/keeper.go:1493 already master
2020-04-01T02:39:10.203Z INFO cmd/keeper.go:961 creating replication slot {“slot”: “stolon_ddb8d3d0”}

stolon-keeper-0

2020-04-01T02:38:39.027Z INFO cmd/keeper.go:1001 no cluster data available, waiting for it to appear
2020-04-01T02:38:44.033Z INFO cmd/keeper.go:1001 no cluster data available, waiting for it to appear
2020-04-01T02:38:49.043Z INFO cmd/keeper.go:1033 our keeper data is not available, waiting for it to appear
2020-04-01T02:38:54.051Z INFO cmd/keeper.go:1033 our keeper data is not available, waiting for it to appear
2020-04-01T02:38:59.058Z INFO cmd/keeper.go:1039 no db assigned
2020-04-01T02:39:04.069Z INFO cmd/keeper.go:1039 no db assigned
2020-04-01T02:39:09.080Z INFO cmd/keeper.go:1094 current db UID different than cluster data db UID {“db”: “”, “cdDB”: “ddb8d3d0”}
2020-04-01T02:39:09.080Z INFO cmd/keeper.go:1248 resyncing the database cluster
2020-04-01T02:39:09.108Z INFO cmd/keeper.go:1273 database cluster not initialized
2020-04-01T02:39:09.131Z INFO cmd/keeper.go:878 syncing from followed db {“followedDB”: “f4ef814a”, “keeper”: “keeper1”}
2020-04-01T02:39:09.132Z INFO postgresql/postgresql.go:964 running pg_basebackup
pg_basebackup: initiating base backup, waiting for checkpoint to complete
WARNING: skipping special file “./postgresql.auto.conf”
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
0/24525 kB (0%), 0/1 tablespace (/stolon-data/postgres/backup_label )
pg_basebackup: error: could not send replication command “START_REPLICATION”: ERROR: replication slot “stolon_ddb8d3d0” does not exist

@myazid

If you refer to this error:

pg_basebackup: error: could not send replication command “START_REPLICATION”: ERROR: replication slot “stolon_ddb8d3d0” does not exist

It should work after the replication slot is created on the primary here:

2020-04-01T02:39:10.203Z INFO cmd/keeper.go:961 creating replication slot {“slot”: “stolon_ddb8d3d0”}

This is normal since stolon is a distributed system.

The standby logs repeated with the same error continuously.

pg_basebackup: error: could not send replication command “START_REPLICATION”: ERROR: replication slot “stolon_ddb8d3d0” does not exist

@myazid then there’s something really strange in your setup since the logs you provided says that the replication slot has been created, look like the standby is connecting to another postgres instance.

Please double check you configuration or provide a way to reproduce it.