Cluster exceeding quota quickly!

Hi,

I am new to Kubernetes and I just created an Autopilot cluster in GKE. When I submit multiple pods to it (not deployments), it only runs one pod at a time. When I click on the other pods, I find this error: "0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.".

I tried deleting the Autopilot cluster and creating a standard cluster, but GCP was not able to create it because of insufficient CPU.

I do not know if this is relevant, but when I create a single Autopilot cluster, the quota "Persistent Disk SSD (GB)" becomes nearly full.

I feel like Kubernetes should be able to handle many pods, even without requesting additional quota, so what am I doing wrong?


That error usually means that Autopilot is still booting the nodes to hold the other Pods. Starting up a new node, especially on an empty cluster, takes some time, sometimes up to 7 minutes.

Also, Compute Engine resource quotas still apply to Autopilot nodes. If you don't have enough quota for the CPU and memory needed to run your Pods, the nodes won't get created and the Pods will stay stuck.

So I'd say try:

  • Waiting longer to see if the Pods deploy eventually
  • Checking your Compute Engine quotas to ensure that you have enough resources
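The quota check can also be scripted. Below is a minimal sketch that assumes you have saved the output of `gcloud compute regions describe REGION --format=json`, which includes a `quotas` list with `metric`, `limit`, and `usage` fields. The function name, the 80% threshold, and the sample numbers are hypothetical and only illustrate the idea.

```python
import json


def tight_quotas(region_json: str, threshold: float = 0.8):
    """Return (metric, usage, limit) for every quota at or above `threshold` of its limit."""
    region = json.loads(region_json)
    tight = []
    for quota in region.get("quotas", []):
        limit = quota.get("limit", 0)
        usage = quota.get("usage", 0)
        # Skip quotas with a zero limit to avoid dividing by zero.
        if limit > 0 and usage / limit >= threshold:
            tight.append((quota["metric"], usage, limit))
    return tight


# Hypothetical sample data, shaped like the "quotas" section of the gcloud output.
SAMPLE = json.dumps({
    "quotas": [
        {"metric": "CPUS", "limit": 8.0, "usage": 2.0},
        {"metric": "SSD_TOTAL_GB", "limit": 250.0, "usage": 200.0},
    ]
})

print(tight_quotas(SAMPLE))
```

With the sample data, only the SSD quota is reported, because it is the only one at 80% or more of its limit.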

Multiple pods almost never run at the same time, even when I wait.

I think the problem is the quota, but I am not running many pods, so why does it reach the limit so quickly?

This is the quota when nothing is running:

[Screenshot: mubarak1999_0-1715185825726.png]

And this is the quota when only 2 pods are submitted (one is running and the other is pending):

[Screenshot: mubarak1999_1-1715186013856.png]

From the looks of it, the node boot disk defaults to 100 GB (for both Autopilot and Standard). So every node gets a 100 GB Balanced Persistent Disk attached to it, and that disk counts against your quota.
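To see why the disk quota fills up so fast, here is a rough sketch of the arithmetic. The 100 GB boot disk size comes from the reply above; the 250 GB quota and the 30 GB alternative are hypothetical numbers chosen for illustration, and the function name is made up.

```python
def max_nodes_by_disk(disk_quota_gb: int, boot_disk_gb: int = 100, already_used_gb: int = 0) -> int:
    """How many node boot disks fit into the remaining disk quota."""
    free_gb = max(disk_quota_gb - already_used_gb, 0)
    return free_gb // boot_disk_gb


# Hypothetical 250 GB quota: only two 100 GB boot disks fit.
print(max_nodes_by_disk(250))                   # 2
# The same quota with hypothetical 30 GB boot disks leaves room for more nodes.
print(max_nodes_by_disk(250, boot_disk_gb=30))  # 8
```

So under these assumed numbers, the disk quota, not the number of pods, caps the cluster at two nodes.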

Then how are developers normally able to use Kubernetes to run many pods or create multiple clusters?

That actually brings me to another question. I am using Kubernetes so that my website users can submit scientific calculations to the cloud. Is Kubernetes actually suitable for that?

They'll typically have higher Persistent Disk quotas in their projects. I wonder if you could use Cloud Run for your use case though?

Sometimes a scientific job can take days to weeks. Cloud Run has a time limitation.

Why is the CPU limited to 8 in Google Cloud quotas? If each pod uses one core, how can I run many cores in this case?

[Screenshot: mubarak1999_0-1715195743661.png]

You can request a quota increase. The default quotas for new accounts are low in order to prevent nefarious uses, but you can request an increase via the UI, and reasonable increases are approved pretty quickly.
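As a rough way to decide what number to ask for, the sketch below multiplies the expected number of concurrent pods by the CPU each pod requests and adds some headroom. The function name, the 20% headroom, and the example numbers are all assumptions, and the estimate ignores system pods and per-node overhead.

```python
import math


def cpu_quota_estimate(concurrent_pods: int, cpu_per_pod: float, headroom_percent: int = 20) -> int:
    """Estimate the vCPU quota for a peak load, rounded up to a whole vCPU."""
    raw = concurrent_pods * cpu_per_pod
    # An integer percentage keeps the arithmetic free of float rounding surprises.
    return math.ceil(raw * (100 + headroom_percent) / 100)


# Hypothetical peak: 20 calculations at once, one core each.
print(cpu_quota_estimate(20, 1.0))  # 24
```

The result is only a starting point for the request; the real peak depends on how many users submit calculations at the same time.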

How much should I normally increase the CPU quota to, assuming I have a lot of users who will submit calculations via my website?

I have a similar issue.

I think each node that is spinning up has a large disk associated with it, which is using up all the quota. However, I can't find a way to reduce the disk size in the Autopilot config.

Have you tried creating a standard cluster?

I am trying that now.

When creating a node pool in a standard cluster, it is possible to set the disk size manually. The default is 100 GB.
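For reference, the same setting is available from the command line through the `--disk-size` flag of `gcloud container node-pools create`. The sketch below only builds and prints the command rather than running it. The pool, cluster, and zone names, the 30 GB size, the disk type, and the machine type are placeholder assumptions, not values from this thread.

```python
import shlex


def node_pool_create_command(pool, cluster, zone, disk_size_gb=30,
                             disk_type="pd-standard",
                             machine_type="e2-standard-2", num_nodes=1):
    """Build the gcloud argv for a node pool with a smaller boot disk."""
    return [
        "gcloud", "container", "node-pools", "create", pool,
        f"--cluster={cluster}",
        f"--zone={zone}",
        f"--disk-size={disk_size_gb}",   # boot disk size in GB (default is 100)
        f"--disk-type={disk_type}",
        f"--machine-type={machine_type}",
        f"--num-nodes={num_nodes}",
    ]


# Placeholder names; replace them with your own before running the command.
cmd = node_pool_create_command("small-disk-pool", "my-cluster", "us-central1-a")
print(shlex.join(cmd))
```

Copy the printed command into a shell, or pass `cmd` to `subprocess.run`, once the values match your project.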

For now, switching to a standard cluster solved my problems. I did not need to increase any quota. However, when creating the standard cluster, I needed to change some of the settings during the creation process; otherwise Google showed me errors related to the quota.

Well, I take that back 😓. It seems the pods ran not because of the standard cluster but because of the very low CPU requests for each pod. In other words, the issue is not solved.
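The effect of low requests can be sketched with simple arithmetic: the scheduler places pods according to their resource requests, so the number of pods that fit on a node is bounded by allocatable capacity divided by the request. The node capacity and request values below are hypothetical, and the function name is made up.

```python
def pods_per_node(alloc_cpu_m: int, alloc_mem_mib: int, req_cpu_m: int, req_mem_mib: int) -> int:
    """Pods that fit on one node, limited by whichever resource runs out first."""
    return min(alloc_cpu_m // req_cpu_m, alloc_mem_mib // req_mem_mib)


# Hypothetical node with 1930m CPU and 5900 MiB memory allocatable.
print(pods_per_node(1930, 5900, req_cpu_m=1000, req_mem_mib=1024))  # 1
print(pods_per_node(1930, 5900, req_cpu_m=100, req_mem_mib=128))    # 19
```

With a full-core request only one pod fits on this assumed node, while a 100m request lets many pods share it. That would explain why the pods ran without any quota change, although they then compete for the same cores.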
