What does the proxyTimeout config mean?

@sgotti

For example:
If proxyTimeout is set to 1m and the proxy checks every 5 seconds, does that mean the proxy will not close the connection when checks fail during that 1m?

@snailxr The proxy timeout is explained in the cluster spec doc.

It’s the interval within which a proxy check must complete successfully at least one time, or the proxy will close all connections to the master.

This is the “self fencing” mechanism used by stolon to avoid connections to an unelected old primary (maintaining consistency). The proxy will time out if, for whatever reason, it cannot complete a check. Usually this failure is due to errors reading the cluster data or writing its status to the store (e.g. the store is down or slow, there is a network partition, etc…).
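To make the timeout window concrete, here is a minimal Go sketch of the mechanism described above. It is a simplified model and not stolon’s actual code: `fenceAfter` and `outage` are hypothetical names, checks are assumed to start exactly once per check interval, and the clock is simulated. With the values from the question (1m timeout, 5s interval), individual failed checks do not close connections; only a full minute without any successful check does.

```go
package main

import (
	"fmt"
	"time"
)

const (
	checkInterval = 5 * time.Second // check interval from the question
	proxyTimeout  = time.Minute     // proxyTimeout from the question
)

// fenceAfter simulates one check per interval, starting at t=0. Every
// successful check moves the deadline to (time of check + timeout). It
// returns the simulated time at which the deadline passes without a
// successful check, or false if that never happens within the given checks.
func fenceAfter(results []bool, interval, timeout time.Duration) (time.Duration, bool) {
	deadline := timeout // the timer is armed when the proxy starts
	for i, ok := range results {
		now := time.Duration(i) * interval
		if now >= deadline {
			return deadline, true // timer fired: close all connections
		}
		if ok {
			deadline = now + timeout // a successful check resets the timer
		}
	}
	return 0, false
}

// outage builds a hypothetical check history: one success, then n
// consecutive failures, then one success.
func outage(n int) []bool {
	results := make([]bool, n+2)
	results[0] = true
	results[n+1] = true
	return results
}

func main() {
	// 10 failed checks (t=5s..50s), success again at t=55s: no fencing.
	_, fenced := fenceAfter(outage(10), checkInterval, proxyTimeout)
	fmt.Println("short outage fenced:", fenced)

	// 13 failed checks: the last success was at t=0, so the timer fires at 1m.
	at, fenced := fenceAfter(outage(13), checkInterval, proxyTimeout)
	fmt.Println("long outage fenced:", fenced, "at", at)
}
```

In a real proxy a failing check can take longer than the interval (for example when a store request blocks until its own timeout), so fewer checks fit into the window than this model suggests.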

The more you increase the proxy timeout, the longer the sentinel will wait for all the proxies to converge or disappear before setting the new proxy address in the cluster data. The default (15s) is usually a sane value, and it could also be decreased to a smaller timeout (while also decreasing the proxyCheckInterval). Increasing it is usually not useful and will slow down proxies opening connections to a newly elected primary if one of your proxies cannot update its state in the store.
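As an illustration of decreasing both values together, the following Go sketch builds a JSON fragment containing the two options named above. It is only a sketch under assumptions: `specPatch` and `scaledPatch` are hypothetical names, the 5s check interval is taken from the example in the question, and whether halving the values suits a given cluster is not something this thread establishes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

const (
	defaultCheckInterval = 5 * time.Second  // check interval from the question's example
	defaultProxyTimeout  = 15 * time.Second // default proxy timeout mentioned above
)

// specPatch holds only the two options discussed in this thread.
type specPatch struct {
	ProxyCheckInterval string `json:"proxyCheckInterval"`
	ProxyTimeout       string `json:"proxyTimeout"`
}

// scaledPatch divides both values by the same divisor, so the ratio between
// the timeout and the check interval stays the same.
func scaledPatch(divisor int) (string, error) {
	if divisor <= 0 {
		return "", fmt.Errorf("divisor must be positive, got %d", divisor)
	}
	d := time.Duration(divisor)
	p := specPatch{
		ProxyCheckInterval: (defaultCheckInterval / d).String(),
		ProxyTimeout:       (defaultProxyTimeout / d).String(),
	}
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	patch, err := scaledPatch(2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(patch)
}
```

The resulting fragment could then be applied to the cluster specification with whatever tooling you normally use to update it.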

@sgotti
Thanks for the reply!
We use a k8s configmap as the store. Is the request timeout to the k8s apiserver configurable?

Sorry, but I don’t understand your question. Can you give more detail?

@sgotti
Here are some logs showing the request timeouts:
|.034Z|INFO|cmd/proxy.go:286|proxying to master address|{“address”: “10.244.64.25:5432”}|
|---|---|---|---|---|
|2020-02-24T03:09:53.268Z|INFO|cmd/proxy.go:268|master address|{“address”: “10.244.64.25:5432”}|
|2020-02-24T03:09:53.659Z|INFO|cmd/proxy.go:286|proxying to master address|{“address”: “10.244.64.25:5432”}|
|2020-02-24T03:10:03.778Z|INFO|cmd/proxy.go:346|check function error|{“error”: “cannot get cluster data: failed to get latest version of configmap: Get https://10.96.0.1:443/api/v1/namespaces/collateral/configmaps/stolon-cluster-collateral-postgres-postgresql?timeout=5s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)”}|
|2020-02-24T03:10:14.474Z|INFO|cmd/proxy.go:346|check function error|{“error”: “cannot get cluster data: failed to get latest version of configmap: Get https://10.96.0.1:443/api/v1/namespaces/collateral/configmaps/stolon-cluster-collateral-postgres-postgresql?timeout=5s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)”}|
|2020-02-24T03:10:25.866Z|INFO|cmd/proxy.go:346|check function error|{“error”: “cannot get cluster data: failed to get latest version of configmap: Get https://10.96.0.1:443/api/v1/namespaces/collateral/configmaps/stolon-cluster-collateral-postgres-postgresql?timeout=5s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)”}|
|2020-02-24T03:10:41.870Z|INFO|cmd/proxy.go:346|check function error|{“error”: “cannot get cluster data: failed to get latest version of configmap: Get https://10.96.0.1:443/api/v1/namespaces/collateral/configmaps/stolon-cluster-collateral-postgres-postgresql?timeout=5s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)”}|
|2020-02-24T03:11:02.099Z|INFO|cmd/proxy.go:346|check function error|{“error”: “cannot get cluster data: failed to get latest version of configmap: Get https://10.96.0.1:443/api/v1/namespaces/collateral/configmaps/stolon-cluster-collateral-postgres-postgresql?timeout=5s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)”}|
|2020-02-24T03:11:33.555Z|INFO|cmd/proxy.go:346|check function error|{“error”: “cannot get cluster data: failed to get latest version of configmap: Get https://10.96.0.1:443/api/v1/namespaces/collateral/configmaps/stolon-cluster-collateral-postgres-postgresql?timeout=5s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)”}|
|2020-02-24T03:11:43.789Z|INFO|cmd/proxy.go:346|check function error|{“error”: “cannot get cluster data: failed to get latest version of configmap: Get https://10.96.0.1:443/api/v1/namespaces/collateral/configmaps/stolon-cluster-collateral-postgres-postgresql?timeout=5s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)”}|
|2020-02-24T03:12:00.332Z|INFO|cmd/proxy.go:346|check function error|{“error”: “cannot get cluster data: failed to get latest version of configmap: Get https://10.96.0.1:443/api/v1/namespaces/collateral/configmaps/stolon-cluster-collateral-postgres-postgresql?timeout=5s: context deadline exceeded”}|
|2020-02-24T03:12:18.086Z|INFO|cmd/proxy.go:346|check function error|{“error”: “cannot get cluster data: failed to get latest version of configmap: Get https://10.96.0.1:443/api/v1/namespaces/collateral/configmaps/stolon-cluster-collateral-postgres-postgresql?timeout=5s: dial tcp 10.96.0.1:443: connect: connection refused”}|
|2020-02-24T03:12:23.180Z|INFO|cmd/proxy.go:346|check function error|{“error”: “cannot get cluster data: failed to get latest version of configmap: Get https://10.96.0.1:443/api/v1/namespaces/collateral/configmaps/stolon-cluster-collateral-postgres-postgresql?timeout=5s: dial tcp 10.96.0.1:443: connect: connection refused”}|
|2020-02-24T03:12:28.180Z|INFO|cmd/proxy.go:346|check function error|{“error”: “cannot get cluster data: failed to get latest version of configmap: Get https://10.96.0.1:443/api/v1/namespaces/collateral/configmaps/stolon-cluster-collateral-postgres-postgresql?timeout=5s: dial tcp 10.96.0.1:443: connect: connection refused”}|
|2020-02-24T03:12:38.719Z|INFO|cmd/proxy.go:346|check function error|{“error”: “cannot get cluster data: failed to get latest version of configmap: Get https://10.96.0.1:443/api/v1/namespaces/collateral/configmaps/stolon-cluster-collateral-postgres-postgresql?timeout=5s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)”}|
|2020-02-24T03:12:47.685Z|INFO|cmd/proxy.go:346|check function error|{“error”: “cannot get cluster data: failed to get latest version of configmap: configmaps “stolon-cluster-collateral-postgres-postgresql” is forbidden: User “system:serviceaccount:collateral:collateral-postgres-postgresql” cannot get resource “configmaps” in API group “” in the namespace “collateral””}|
|2020-02-24T03:12:52.690Z|INFO|cmd/proxy.go:268|master address|{“address”: “10.244.64.25:5432”}|
|2020-02-24T03:12:52.699Z|INFO|cmd/proxy.go:286|proxying to master address|{“address”: “10.244.64.25:5432”}|
|2020-02-24T03:12:57.709Z|INFO|cmd/proxy.go:268|master address|{“address”: “10.244.64.25:5432”}|
|2020-02-24T03:12:57.722Z|INFO|cmd/proxy.go:286|proxying to master address|{“address”: “10.244.64.25:5432”}|

@snailxr Is the timeout caused by the k8s api being slow or just unreachable?

This looks like an RBAC issue…

@sgotti
2020-01-20T08:56:25.007Z INFO cmd/proxy.go:302 check function error {“error”: “failed to update proxyInfo: update failed: failed to get latest version of pod: Get https://10.233.0.1:443/api/v1/namespaces/xxx/pods/postgres-postgresql-proxy-5d8645f4cc-v2xc9: net/http: request canceled (Client.Timeout exceeded while awaiting headers)”}
2020-01-20T08:56:29.875Z INFO cmd/proxy.go:262 check timeout timer fired
2020-01-20T08:56:29.875Z INFO cmd/proxy.go:149 Stopping listening

Most of the time it is because the k8s API is slow. We are testing locally, so we want to increase the request timeout.

@snailxr Currently the store request timeout is hardcoded to 5 seconds, which is a lot for a request to the store.

Feel free to open a Pull Request on GitHub to make it configurable, but increasing it will only work around the real issue: your store is too slow.

Also, please note the possible downsides of the k8s store and why we suggest the etcdv3 store: stolon/architecture.md at master · sorintlab/stolon · GitHub


@sgotti
Is this the code for the request timeout?
DefaultStoreTimeout = 5 * time.Second in cluster.go
I think it should always be >= proxyTimeout, otherwise proxyTimeout will be useless.

@snailxr ProxyTimeout and StoreTimeout are different. You’re only thinking about your bad case, where the store is far too slow. Usually a store is quite fast (less than 1 second), and a small StoreTimeout is better because it allows retrying requests when they are blocked for some reason. Additionally, the store timeout is also used by the keeper and the sentinel, so it’s not only a proxy thing.

Plus, it depends on the store client. For example, the latest etcd client will block, retrying until the timeout, if the store is down, while other clients will just return the connection error.
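The point about a small StoreTimeout leaving room for retries can be sketched with some simple arithmetic in Go. This is a rough model under stated assumptions, not stolon’s implementation: `attemptsInWindow` is a hypothetical helper, every store request is assumed to block for its full timeout, and the next check is assumed to start one check interval after the previous one returns (the roughly 10 second spacing between the errors in the logs above is consistent with that, but it is not confirmed here).

```go
package main

import (
	"fmt"
	"time"
)

const (
	storeTimeout  = 5 * time.Second  // DefaultStoreTimeout quoted in the thread
	checkInterval = 5 * time.Second  // check interval from the original question
	proxyTimeout  = 15 * time.Second // default proxy timeout mentioned in the thread
)

// attemptsInWindow counts how many check attempts can start and finish within
// window when every attempt blocks for perRequest and the next attempt starts
// pause after the previous one returns.
func attemptsInWindow(perRequest, pause, window time.Duration) int {
	if perRequest <= 0 || pause < 0 {
		return 0
	}
	n := 0
	for start := time.Duration(0); start+perRequest <= window; start += perRequest + pause {
		n++
	}
	return n
}

func main() {
	// Small store timeout: a blocked request can be retried before the
	// proxy timeout expires.
	fmt.Println("store timeout 5s:", attemptsInWindow(storeTimeout, checkInterval, proxyTimeout))

	// Store timeout equal to the proxy timeout: a single blocked request
	// uses up the whole window and no retry gets a chance.
	fmt.Println("store timeout 15s:", attemptsInWindow(proxyTimeout, checkInterval, proxyTimeout))
}
```

Under these assumptions, raising the store timeout to the proxy timeout or beyond reduces the number of attempts in the window rather than making the proxy timeout more useful.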