Hello everybody... I have been struggling for over a week now to get my cross-namespace HTTPRoutes working.
I have the following setup (the very same as the official examples):
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: mgmt-internal-gw
  namespace: gateway
  annotations:
    cert-manager.io/issuer: cf-issuer
spec:
  gatewayClassName: gke-l7-rilb
  listeners:
  - name: http-listener
    port: 80
    protocol: HTTP
    hostname: "my.domain.com"
  - name: https-listener
    port: 443
    protocol: HTTPS
    hostname: "my.domain.com"
    allowedRoutes:
      namespaces:
        from: All
    tls:
      mode: Terminate
      certificateRefs:
      - name: internal-wildcard-cert
        kind: Secret
        group: ""
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-filter-redirect
  namespace: gateway
spec:
  parentRefs:
  - name: mgmt-internal-gw
    sectionName: http-listener
  hostnames:
  - "my.domain.com"
  rules:
  - filters:
    - type: RequestRedirect
      requestRedirect:
        scheme: https
        statusCode: 301
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: argocd-route
  namespace: system-argocd
spec:
  parentRefs:
  - name: mgmt-internal-gw
    namespace: gateway
    sectionName: https-listener
  hostnames:
  - "my.domain.com"
  rules:
  - matches:
    - path:
        value: /
        type: PathPrefix
    backendRefs:
    - name: argocd-argo-cd-server
      port: 8080
I can see all the routes and the Gateway itself healthy in the GCP console, but when I try to reach my domain I get a 503 "no healthy upstream". I can confirm that the service behind it (ArgoCD) is working fine: I can access it with a port-forward or from another pod via curl. If I deploy ArgoCD (or any other service) in the same namespace, so that the HTTPRoute, Gateway, and Service all live together, everything works well. I have also tried creating ReferenceGrants, but the result is the same. I am using Autopilot and have tried GKE versions 1.27, 1.28, and 1.29. The cluster is fully private, if that matters...
Do you have any suggestions about what could be missing in my setup?
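For completeness, a ReferenceGrant for a setup like this would look roughly as follows (a sketch; the object name is illustrative, and note that a ReferenceGrant governs cross-namespace route-to-backend references, while route-to-Gateway attachment is controlled by the listener's allowedRoutes):

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-ns-routes  # illustrative name
  namespace: system-argocd       # namespace of the referenced Services
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: gateway           # namespace where the referencing routes live
  to:
  - group: ""
    kind: Service
```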
Hi @adimitrov,
Welcome to the Google Cloud Community!
It may be challenging to troubleshoot this issue without more visibility, but if I understand correctly, you are receiving 503 "no healthy upstream" errors on your GKE Gateway.
HTTP 503 Service Unavailable with "no healthy upstream" means that either there are no hosts available to serve the traffic, or all hosts have failed the backend health checks.
The HealthCheckPolicy and GCPBackendPolicy resources must exist in the same namespace as the target Service or ServiceImport resource. You might want to check out this documentation.
Add the health check configuration to your YAML file, as this ensures that traffic is only routed to backends that are able to respond.
Here's a sample YAML from the documentation provided:
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: lb-healthcheck
  namespace: lb-service-namespace
spec:
  default:
    checkIntervalSec: INTERVAL
    timeoutSec: TIMEOUT
    healthyThreshold: HEALTHY_THRESHOLD
    unhealthyThreshold: UNHEALTHY_THRESHOLD
    logConfig:
      enabled: ENABLED
    config:
      type: PROTOCOL
      httpHealthCheck:
        portSpecification: PORT_SPECIFICATION
        port: PORT
        portName: PORT_NAME
        host: HOST
        requestPath: REQUEST_PATH
        response: RESPONSE
        proxyHeader: PROXY_HEADER
      httpsHealthCheck:
        portSpecification: PORT_SPECIFICATION
        port: PORT
        portName: PORT_NAME
        host: HOST
        requestPath: REQUEST_PATH
        response: RESPONSE
        proxyHeader: PROXY_HEADER
      grpcHealthCheck:
        grpcServiceName: GRPC_SERVICE_NAME
        portSpecification: PORT_SPECIFICATION
        port: PORT
        portName: PORT_NAME
      http2HealthCheck:
        portSpecification: PORT_SPECIFICATION
        port: PORT
        portName: PORT_NAME
        host: HOST
        requestPath: REQUEST_PATH
        response: RESPONSE
        proxyHeader: PROXY_HEADER
  targetRef:
    group: ""
    kind: Service
    name: lb-service
Make sure you review the restrictions and limitations before deploying GKE resources. You might also want to consider filing a ticket with our Google Support team, as they are well equipped to handle issues like these.
Hope you find this helpful.
Hello RFelizardo,
I do appreciate your help here 🙂 Per your suggestion and everything described in the documentation, I have also created the HealthCheckPolicy, GCPBackendPolicy, and GCPGatewayPolicy. Here are the specs:
---
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: lb-healthcheck
  namespace: system-argocd
spec:
  default:
    checkIntervalSec: 5
    timeoutSec: 5
    healthyThreshold: 3
    unhealthyThreshold: 3
    logConfig:
      enabled: true
    config:
      type: HTTPS
      # httpHealthCheck:
      #   port: 8080
      #   requestPath: /
      httpsHealthCheck:
        port: 8443
        requestPath: /
  targetRef:
    group: ""
    kind: Service
    name: argocd-argo-cd-server
---
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: my-backend-policy
  namespace: system-argocd
spec:
  default:
    sessionAffinity:
      type: CLIENT_IP
  targetRef:
    group: ""
    kind: Service
    name: argocd-argo-cd-server
---
apiVersion: networking.gke.io/v1
kind: GCPGatewayPolicy
metadata:
  name: my-gateway-policy
  namespace: default
spec:
  default:
    allowGlobalAccess: true
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: mgmt-internal-gw
Unfortunately, the behavior is still the same: I still get a 503...
I do not think it is a problem of insufficient nodes because, as I mentioned, if I do a port-forward or move the Service and Pods into the same namespace as the Gateway, everything works fine.
One interesting thing: I set up vanilla Kubernetes on Compute Engine VMs and tested the cross-namespace scenario there, and it works. So I guess it is a GKE thing... It is getting really frustrating, as GKE promises to be the easiest and slickest way to consume Kubernetes, however...
I am going to try support; fingers crossed the solution comes from their end.
Hello all, I am passing by to tell you that I found the solution myself. The solution is not described in any documentation (I don't know why...). Can someone here tell me how to give GCP feedback so the docs can be updated?
The solution:
- The exposed services must be of type NodePort, not ClusterIP as in the documentation; otherwise, the backend health checks do not pass. There is not even a meaningful alert in the GCP UI when browsing the Gateway or the routes. You need to go to the services, click the service you want to expose, go to its backends, expand the advanced options, and only then will you see a small alert that the backend cannot reach the service... Really annoying.
- The HealthCheckPolicy, GCPBackendPolicy, and GCPGatewayPolicy are NOT required (though they are of course best practice).
- The ReferenceGrant is not needed at all.
Good luck everyone!
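To apply the NodePort fix above, the backing Service just needs its type changed — a minimal sketch, assuming the Service name and port from this thread; the selector labels are illustrative and must match your actual ArgoCD deployment:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: argocd-argo-cd-server
  namespace: system-argocd
spec:
  # NodePort instead of ClusterIP so the load balancer's
  # backend health checks can reach the service
  type: NodePort
  selector:
    app.kubernetes.io/name: argo-cd-server  # illustrative; use your chart's labels
  ports:
  - name: http
    port: 8080
    targetPort: 8080
```

For an existing Service, `kubectl patch svc argocd-argo-cd-server -n system-argocd -p '{"spec":{"type":"NodePort"}}'` makes the same change in place.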