Kubernetes store timeouts


We are having problems with the Kubernetes store. We experience random store timeouts, and therefore DB disconnects, in our cluster. It happens under load but also at random times during the night without any load.
Do you have an idea what could cause the disconnects without any load?

E0420 09:39:49.035064       1 leaderelection.go:331] error retrieving resource lock bms-databases/stolon-cluster-bms-postgres-stolon: Get context deadline exceeded (Client.Timeout exceeded while awaiting headers)

I saw the new --store-timeout parameter in v0.16. I'll try it out as soon as the parameter is available in the Helm chart.


@landor That means that your k8s API is responding slowly. Increasing the store timeout isn't a real fix, since the proxies will time out anyway after the proxy timeout interval, so you'd have to increase that as well. My suggestion is to use a different, dedicated store like etcd. See also the doc with all the other downsides of using the k8s API:
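
To illustrate the point about the store timeout and the proxy timeout, here is a rough sketch in Python. It is a simplified model, not stolon's actual logic: the function name, its parameters, and the numbers are hypothetical, and it assumes a single slow store read whose latency is compared against both timeouts.

```python
def connections_survive(api_latency_s: float,
                        store_timeout_s: float,
                        proxy_timeout_s: float) -> bool:
    """Simplified model (an assumption, not stolon code).

    A store read only helps the proxy if it both completes within the
    store timeout and arrives before the proxy timeout has elapsed.
    """
    request_succeeds = api_latency_s <= store_timeout_s
    proxy_still_fresh = api_latency_s < proxy_timeout_s
    return request_succeeds and proxy_still_fresh


# Hypothetical values: the k8s API takes 20s to answer.
# Raising only the store timeout to 30s does not help while the
# proxy timeout stays at 15s.
print(connections_survive(20, store_timeout_s=30, proxy_timeout_s=15))
# Raising both timeouts would tolerate the slow response, at the cost
# of reacting later to a real store outage.
print(connections_survive(20, store_timeout_s=30, proxy_timeout_s=30))
```

In this model the tolerated latency is bounded by the smaller of the two timeouts, which is why raising only one of them does not change the outcome.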

@sgotti Thanks for the info.
We are considering moving the Kubernetes etcd storage out of the main cluster where our workloads run.
What do you think about that?
Will it also fix our problem, or do you think we still need a dedicated etcd store for our stolon instances?
Thank you