Cluster exceeding quota quickly!

mubarak1999

Hi,

I am new to Kubernetes and I just created an Autopilot cluster in GKE. When I submit multiple pods to it (not deployments), it only runs one pod at a time. When I click on the other pods, I find this error: "0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod."

I tried deleting the Autopilot cluster and creating a Standard cluster, but GCP was not able to create it because of insufficient CPU.

I do not know if this is relevant, but when I create one Autopilot cluster, the quota "Persistent Disk SSD (GB)" becomes nearly full.

I feel like Kubernetes should be able to handle many pods, even without requesting additional quota, so what am I doing wrong?

shannduin

That error usually means that Autopilot is still booting the nodes to hold the other Pods. Starting up a new node, especially on an empty cluster, takes some time, sometimes up to 7 minutes.

Also, Compute Engine resource quotas still apply to Autopilot nodes. If you don't have enough quota for the CPU and memory needed to run your Pods, the nodes won't get created and the Pods will stay stuck.

So I'd say try:

Waiting for longer to see if the Pods deploy eventually
Checking your Compute Engine quotas to ensure that you have enough resources
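
As a rough illustration of the quota check, here is a minimal Python sketch that flags quotas close to their limit. It assumes the input is JSON shaped like the output of `gcloud compute regions describe REGION --format=json`, i.e. a `quotas` list whose entries have `metric`, `limit` and `usage` fields. The sample values, the 80% threshold and the function name `quotas_near_limit` are made up for the example.

```python
import json

# Illustrative sample only; real numbers come from your own project/region.
SAMPLE_REGION_JSON = """
{
  "quotas": [
    {"metric": "CPUS", "limit": 8.0, "usage": 2.0},
    {"metric": "SSD_TOTAL_GB", "limit": 500.0, "usage": 450.0}
  ]
}
"""


def quotas_near_limit(region_json, threshold=0.8):
    """Return (metric, usage, limit) for quotas at or above `threshold` of their limit."""
    region = json.loads(region_json)
    near = []
    for quota in region.get("quotas", []):
        limit = quota["limit"]
        # Skip quotas with a zero limit to avoid dividing by zero.
        if limit > 0 and quota["usage"] / limit >= threshold:
            near.append((quota["metric"], quota["usage"], limit))
    return near


if __name__ == "__main__":
    for metric, usage, limit in quotas_near_limit(SAMPLE_REGION_JSON):
        print(f"{metric}: {usage:g} of {limit:g} used")
```

If a disk or CPU metric shows up in this list while Pods are pending, quota is a likely reason the new nodes are not being created.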

mubarak1999

Multiple pods almost never run at the same time, even when I wait.

I think the problem is the quota, but I am not running many pods, so why does it reach the limit so quickly?

This is the quota when nothing is running:

And this is the quota when only 2 pods are submitted (one is running and the other is pending):

shannduin

The node boot disk defaults to 100GB from the looks of it (for both Autopilot and Standard). So every node gets a 100GB Balanced Persistent Disk attached to it, and that disk counts against your quota.
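
To make the arithmetic concrete, here is a small Python sketch of how many nodes fit under a disk quota when each node takes a 100GB boot disk. The 500GB quota used in the example is a hypothetical figure, not a value from this thread, and the function name `max_nodes_by_disk` is just for illustration. It also ignores any extra disks that Pods might claim.

```python
def max_nodes_by_disk(quota_gb, used_gb=0, boot_disk_gb=100):
    """Upper bound on additional nodes, given the remaining disk quota.

    quota_gb and used_gb are hypothetical inputs you would read from the
    quota page; boot_disk_gb is the 100GB default mentioned above.
    """
    free_gb = max(quota_gb - used_gb, 0)
    return free_gb // boot_disk_gb


if __name__ == "__main__":
    # With an assumed 500GB quota and nothing in use, only 5 nodes fit.
    print(max_nodes_by_disk(500))
    # With 300GB already taken by existing nodes, only 2 more fit.
    print(max_nodes_by_disk(500, used_gb=300))
```

This would explain why the quota looks nearly full after only a couple of Pods: the disk is consumed per node, not per Pod.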

mubarak1999

Then how are developers normally able to use Kubernetes to run many pods or create multiple clusters?

That actually brings me to another question. I am using Kubernetes so that my website users can submit scientific calculations to the cloud. Is Kubernetes actually suitable for that?

shannduin

They'll typically have higher Persistent Disk quotas in their projects. I wonder if you could use Cloud Run for your use case though?

mubarak1999

Sometimes a scientific job can take days to weeks. Cloud Run has a time limitation.

Why is the CPU limited to 8 in Google Cloud quotas? If each pod uses one core, how can I use many cores in this case?

garisingh

You can request a quota increase. The default quotas for new accounts are low in order to prevent nefarious uses. But you can request a quota increase via the UI and reasonable increases are approved pretty quickly.

mubarak1999

How much should I normally increase the CPU quota to, assuming I have a lot of users who will submit calculations via my website?

edlouth

I have a similar issue.

I think each node that is spinning up has a large disk associated with it, which is using up all the quota. However, I can't find a way to reduce the disk size in the Autopilot config.

mubarak1999

Have you tried creating a standard cluster?

edlouth

I am trying that now.

When creating a node pool in a Standard cluster, it is possible to set the disk size manually. The default is 100GB.

mubarak1999

For now, switching to a Standard cluster solved my problems. I did not need to increase any quota. However, when creating the Standard cluster, I needed to change some of the settings during the creation process; otherwise, Google would show me errors related to the quota.

mubarak1999

Well, I take that back 😓. It seems the pods ran not because of the Standard cluster but because of the very low CPU requests for each pod. In other words, the issue is not solved.
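
To show why the CPU request matters so much, here is a minimal Python sketch that builds a Pod manifest with explicit resource requests and estimates how many such Pods fit under the 8-CPU quota mentioned earlier. The Pod name, image and memory value are placeholders, and the helper names are made up for the example. The estimate is only an upper bound, since it ignores system Pods and per-node overhead.

```python
import json


def cpu_to_millicores(cpu):
    """Convert a Kubernetes CPU quantity such as "250m" or "1" to millicores."""
    if cpu.endswith("m"):
        return int(cpu[:-1])
    return round(float(cpu) * 1000)


def pod_manifest(name, image, cpu, memory):
    """Build a Pod manifest as a dict; kubectl accepts it serialized as JSON."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # The scheduler reserves this much per Pod, whether or
                    # not the container actually uses it.
                    "resources": {"requests": {"cpu": cpu, "memory": memory}},
                }
            ],
        },
    }


def max_pods_by_cpu(quota_vcpus, cpu_request):
    """Upper bound on Pods that fit in a CPU quota, ignoring system overhead."""
    return (quota_vcpus * 1000) // cpu_to_millicores(cpu_request)


if __name__ == "__main__":
    # "calc-job" and the image name are hypothetical placeholders.
    manifest = pod_manifest("calc-job", "example.invalid/calc:latest", "250m", "512Mi")
    print(json.dumps(manifest, indent=2))
    print(max_pods_by_cpu(8, "1"))     # one full core per Pod
    print(max_pods_by_cpu(8, "250m"))  # a quarter core per Pod
```

Lower requests let more Pods schedule under the same quota, but a calculation that really needs a full core will simply run slower, so the request should reflect what the job actually uses.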