Migrated to py3 and App Engine is much more expensive - Am I doing it wrong?

I have a long-running app on App Engine using py2.7. In the past I've used the smallest instance class; generally my app spins up about 10 instances to handle load maxing out around 40 requests a second.

Generally this cost me about $11 a day to run.

I've updated my code to py3, using Cloud NDB and a Redis memcache. I noticed the Python 3 app consumes a lot more memory, so I had to bump the instance class up to F2.

On py2.7 my app size was about 107 MB; on py3 it's 475 MB (the F2 instance class maxes out at 768 MB):

(screenshot)

Here's what my 10 instances look like:

(screenshot)

The instances start out already consuming a little over 600 MB. Over time memory grows, and I don't know exactly why, until I bump into the memory limit and the instance is killed. I need to track that down, but as far as I know I'm not doing anything to hold onto memory, so I'm wondering if there's a memory leak somewhere that I'm not responsible for (the Cloud NDB framework or something else).

Additionally, these instances are only serving 4-5 requests a second. That seems really low, and I'd hope that I don't need that many instances to serve 40 requests a second; my hope was that with a higher instance class and Python 3 threading, a single instance could handle much more. Obviously it depends on how expensive each request is. So yes, the instances cost more, which probably accounts for my increased costs, but they are also faster and have more memory, and I had hoped that fewer of them would be needed.

I'm curious whether, because I'm so close to the memory limit to start with, App Engine is leaning on that metric and deciding to spin up more instances than it really needs. I'm bummed that a forced update to py3 has made things so much more expensive. I'm an indie developer, so this change directly takes a few thousand dollars out of my pocket to run my server.

Now that I've done the py3 conversion and no longer rely on the legacy standard-environment services, I'm considering moving the server over to something like Linode to bring costs down (maybe). Obviously that's potentially a lot of work, and I'd love to avoid it if I can.

Any advice would be appreciated, or let me know if you need additional info.

Thanks!


Yes, Python 3 apps are larger than Python 2 apps and use more memory. Because of this, Google announced that it had increased the memory allocated to all of the instance classes.

Make sure you have a .gcloudignore file in the root of your application directory. The file should include an entry for your virtual env folder. If you don't do this and you tested your app locally (ran it without using dev_appserver.py) before deploying, then a deploy would also upload the virtual environment that was created when you ran your app, and that increases the size of the deployed app.

If you used the Datastore emulator or stored your test data in your application directory, make sure to include that folder in the .gcloudignore file as well, or else its contents will also get deployed, increasing the size of your deployed app.
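For example, a minimal .gcloudignore might look something like this (the folder names are placeholders; use whatever your virtual env and emulator data folders are actually called):

.gcloudignore
.git
.gitignore
__pycache__/
# local virtual environment - don't deploy it
env/
venv/
# local datastore emulator data
.datastore/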

Without seeing your code, one can only make guesses as to what is happening. Can you describe what your app does at a very high level? That might help with tips on how to possibly reduce your memory consumption. Also, can you use Google's memcache instead of a Redis memcache?

If you're not already doing this, look into writing/reading data to/from Datastore in batches. That can help cut down on the cost. Also look at using ndb tasklets.
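For instance, here's a rough sketch of both ideas with google.cloud.ndb (the model and key names are made up purely for illustration):

from google.cloud import ndb

def load_challenges(challenge_keys):
    # One batched lookup instead of calling key.get() in a loop.
    return ndb.get_multi(challenge_keys)

@ndb.tasklet
def load_user_and_challenge(user_key, challenge_key):
    # Start both gets so the RPCs run concurrently, then wait for each result.
    user_future = user_key.get_async()
    challenge_future = challenge_key.get_async()
    user = yield user_future
    challenge = yield challenge_future
    return user, challenge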


..... NoCommandLine ......
 https://nocommandline.com
A GUI for Google App Engine
    & Datastore Emulator

Thanks so much for the reply. I did not have a .gcloudignore and have added one. The size of the app went from 475 MB down to 240 MB, thanks for that. That will make deployments faster, which is great, but it doesn't affect the amount of memory the app uses once it's running on an instance (I wouldn't expect it to).

My app is called Upwords:

https://itunes.apple.com/us/app/upwords-free/id588252565?mt=8

https://play.google.com/store/apps/details?id=com.lonelystarsoftware.upwords

All data is stored in Datastore: user account models, etc. With each request the REST API sends the user account info so the user record can be retrieved. The client can ask for a list of challenges for the user, which gets sent back; when they open a challenge in the app, that challenge is sent; then the user plays a move and the challenge record is updated.

I use memcache a lot to store the challenge lists and challenge models. When a challenge is updated I delete the caches for that user and the opponent, so the next time around Datastore is accessed to refresh the challenge list and the cache. By far my most used endpoint is "/getChallenges", followed by "/getChallenge" and "/updateChallenge".
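Roughly, the pattern looks like this (heavily simplified, with made-up model and key names; not my exact code):

import json
import redis
from google.cloud import ndb

REDIS = redis.Redis(host="10.0.0.3", port=6379, decode_responses=True)  # placeholder host/port

class ChallengeModel(ndb.Model):  # stand-in for my real model
    players = ndb.StringProperty(repeated=True)
    status = ndb.StringProperty()

CHALLENGE_LIST_KEY = "challenges:%s"  # one cached list per user

def get_challenge_list(user_id):
    cached = REDIS.get(CHALLENGE_LIST_KEY % user_id)
    if cached is not None:
        return cached  # serve straight from the cache
    challenges = ChallengeModel.query(ChallengeModel.players == user_id).fetch()
    payload = json.dumps([c.to_dict() for c in challenges], default=str)
    REDIS.set(CHALLENGE_LIST_KEY % user_id, payload, ex=3600)
    return payload

def invalidate_challenge_lists(user_id, opponent_id):
    # When a challenge is updated, drop both players' cached lists so the
    # next request reloads from Datastore and re-caches.
    REDIS.delete(CHALLENGE_LIST_KEY % user_id, CHALLENGE_LIST_KEY % opponent_id)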

At this moment in time "/getChallenges" sees about 275 requests a minute, "/getChallenge" sees 121 requests a minute, and "/updateChallenge" sees 100 requests a minute.

There are lots of other endpoints for chatting, updating your profile in the challenge, challenging other players, etc.

I decided to try to get off the legacy services completely, and that's why I spun up a Redis instance. At the moment I am off all those legacy App Engine APIs, and I did a test running my server on a Linode box that talks to Datastore, task queues, etc., and it appears to work well. So that's my plan B if I can't get App Engine cheap enough; I'd rather not forklift everything off if I don't have to. At the moment it's costing 3x what my py2.7 version cost to run.

I did hook Cloud NDB up to Redis, but I'm not sure if that's working. I do it this way:

def ndb_wsgi_middleware(wsgi_app):
    def middleware(environ, start_response):
        with ndbClient.context(global_cache=global_cache):
            return wsgi_app(environ, start_response)

    return middleware

So rather than doing the "with context" throughout the code, I saw you can do it once here and all requests get that context. So far it appears to work; I'm not getting any errors that I'm aware of. The global_cache is defined like so:

global_cache = ndb.RedisCache.from_environment()
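For anyone following along, here's roughly how the whole thing fits together (a sketch assuming a Flask app; the names match my snippets above, but it isn't my exact file):

from flask import Flask
from google.cloud import ndb

ndbClient = ndb.Client()
# from_environment() builds the global cache from Redis settings in the
# environment, i.e. the REDIS_CACHE_URL variable set in app.yaml.
global_cache = ndb.RedisCache.from_environment()

def ndb_wsgi_middleware(wsgi_app):
    def middleware(environ, start_response):
        # Every request runs inside an NDB context that uses the Redis global cache.
        with ndbClient.context(global_cache=global_cache):
            return wsgi_app(environ, start_response)
    return middleware

app = Flask(__name__)
app.wsgi_app = ndb_wsgi_middleware(app.wsgi_app)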

I'm curious if this is less effective than the legacy memcache with NDB or if it works the same. I could easily move back to the App Engine memcache and, if I move to Linode or some other external server, use Redis then. Maybe I'll let it run today and see what resources it uses, then tomorrow switch back to the legacy memcache and see if it makes a difference.

thanks for taking a look!

Daniel


The reason I suggested Google's memcache was that I assumed it would be cheaper than a third-party product (I have no confirmation of this).

From your description, it looks like your large memory consumption triggers more instances, and it has already led you to a more expensive instance class. The focus can thus be on how to reduce your memory consumption.

- Check your code to see if there are places where you don't have to read data into memory but can access it when needed. For example, say you retrieve data or run a query: is it possible to use an iterator to access the data as needed instead of converting the entire result into a list which is then held in memory? (See the sketch after this list.)

- Can you use projection queries? These return only the fields of an entity you need instead of the entire entity, which is advantageous if your entities are large.

- Check your logs to see if you can figure out which call or process is consuming a lot of memory. You can pick a log entry and view the trace details
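A quick sketch of the first two ideas with google.cloud.ndb (ChallengeModel and its properties are made up for illustration):

from google.cloud import ndb

class ChallengeModel(ndb.Model):  # hypothetical model
    players = ndb.StringProperty(repeated=True)
    status = ndb.StringProperty()
    board = ndb.JsonProperty()

def challenge_statuses(user_id):
    query = ChallengeModel.query(ChallengeModel.players == user_id)

    # Iterating the query streams entities in batches instead of building
    # one big list that sits in memory:
    for challenge in query:
        pass  # process each challenge as it arrives

    # A projection query returns only the listed properties, which keeps each
    # result small when the full entity is large:
    return query.fetch(projection=[ChallengeModel.status])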


..... NoCommandLine ......
 https://nocommandline.com
A GUI for Google App Engine
    & Datastore Emulator

Hi there, thanks for sticking with this. I'm sure there are areas where I can use an iterator. I guess I assumed that if I load all the models into a list, iterate over it, etc., then when the request is done that memory would get freed anyway?

I'll look into the projection queries. My largest model has 55 properties, but the data in each is pretty small, so I don't know if that's considered large.

I "think" there's extra cost associated with talking to Datastore (more reads/writes), so maybe in my conversion I'm leaning less on memcache, but I don't think so. I'm wondering if the legacy NDB/memcache combo was automatically caching more than hooking Redis up to Cloud NDB does. All my custom caching is exactly the same as before; that's the only thing that's changed there. I'll have to do some tests rolling back to my old py2.7 server and see what the costs were. I may also turn off all my caching, clear the Redis cache, and see if anything is actually being written there automatically; I haven't confirmed that yet.


Hi,

I'm sorry to hear that your costs have gone up so drastically. I don't have any insight as to why that may be, but I wanted to say: if you're thinking of migrating, may I suggest looking at Cloud Run or even the App Engine flexible environment. Both of these could help you with the cost issue. You can get a preliminary idea using our pricing calculator.

I hope you're able to get to the root cause, though. If my memory serves me correctly, App Engine exports a CPU utilization metric; this is definitely one of the inputs to the scaling decisions that App Engine makes. So looking at that metric may give you some clues.

- Karolina

Thanks Karolina, I'll take a look at all these options

Keep in mind that App Engine Flex requires that at least one instance is always up, unlike App Engine Standard, which when set to automatic scaling can go down to zero instances when there's no traffic, which in turn saves money (you're not charged when no instance is running).


..... NoCommandLine ......
 https://nocommandline.com
A GUI for Google App Engine
    & Datastore Emulator

Thanks. Right, and that's fine; there's always someone playing the app, so it's pretty much never idle: at least 20 requests a second during the lighter times of day.

Hi dank, I also migrated several apps from py27 to py3. Unlike you, I'm still using the bundled services, so I'm not using Redis for memcache or a newer API to access Datastore. Keep that in mind when you read the following.

I can see that the footprint of my apps (when deployed) is many times bigger now (e.g. from 2 MB in py27 to 80 MB in py3), but that didn't seem to affect the memory the apps consume when running very much. In fact, my costs now are very similar to what they were before. I would suggest a few things:

1- Check the cost breakdown, so that you can pinpoint what's causing the expense. I don't see that in your question; it will look something like this:

(screenshot of the App Engine cost breakdown)

2- Sometimes weird things happen (e.g. on Nov 19 I had a big spike in Frontend Instances in that app; I think it was because of a bot crawling my site, but I'm not 100% sure). If you're not sure, you could send Google feedback right there, attaching screenshots, or try to get support; it's the question mark (?) in the top-right corner. Earlier this year, my expenses went up in several apps and I could see that it was because of Frontend Instances, although I couldn't explain it because traffic was the same and I hadn't changed anything to warrant it. I did the feedback thing (plus another thing I'll explain below), and a few days later the cost came back to normal, and a few weeks later I got a refund (a credit to my billing account) that I suppose was related to that.

3- If you see that the cost is related to the number of instances you have running (as seems to be the case), you can control that in your app's app.yaml file. Experiment with the scaling settings (instead of having the automatic defaults or zero for max_instances) and see what happens:


automatic_scaling:
  min_idle_instances: automatic
  max_idle_instances: automatic
  min_pending_latency: automatic
  max_pending_latency: automatic
  max_instances: 0


I was able to control things using that when I couldn't explain the increase in costs, but I'm not sure it's all available in the environment you're using now.

Please let us know how things go, as we might have to start using those APIs at some point... all the best.


Thanks everyone for taking an interest and offering suggestions. I've figured out a few more things, including some errors on my end. A recap of what I've found:

  • Originally on py2.7 I was using the F1 class. I was running at about $10-12 a day and had been running like that for years. Generally it would spin up 8-10 instances. I was making good use of memcache, with about a 95% hit ratio.
  • As a test I updated to the F2 instance class under py2.7. It bumped my costs to $18 a day, which adds up over a year, but the F1 instances were being killed due to memory after they ran for a bit. I still am not sure why I'm accumulating memory over time. The F2 instances are also being killed due to memory; it just takes a lot longer to happen. I don't know that it really affects my users when those instances are killed, so if I could stay on py2.7 I'd probably continue to run the F1 class, as it's a lot cheaper and in practice users don't notice the restarts.
  • I updated to py3 (no choice, of course, though I do like the flexibility of using any lib I want, etc.) and the memory requirements don't fit into the F1 class. I'm curious whether any py3 app fits into that memory footprint; I'm importing pretty standard stuff. So moving to py3 will cost me more.
  • Initially I reported py3 costs were 3x; it had been running at about $30-32 a day, which is way more than the $11 I started with. I converted off all the legacy services, even moving memcache to the Redis memcache. I thought that was a good thing to do so the server could be portable if I wanted... but the costs went way up!
  • After further inspection I noticed that while the frontend instance hour price did go up and accounted for some of the increase, the larger increase was Firestore Read Operations: they went from about 3M ops to 35M ops and of course cost a lot more! That had me inspecting whether the cloud.ndb lib was actually using the Redis memcache, and it turns out it wasn't. I had made a mistake: I set everything up right except I missed setting the REDIS_CACHE_URL env variable (there's a sketch of the app.yaml bit after this list). So while it was memcaching anything I was explicitly putting in there, NDB wasn't leveraging its automatic caching.
  • I fixed that issue along with a few other optimizations and ran a test: a day each with the legacy memcache and with the Redis memcache. On the legacy memcache the billing graph shows $17.25, and the next day on Redis it's $16.03. The difference is probably more due to variability in my server traffic, so I'd say they are about the same. That's still larger than my $11 a day, but I don't see how to downgrade the instance class, so I'll have to see if I can optimize further in other ways. I'm not clear whether the Redis cost is included in that number; I don't think it is, since it's not called out specifically. I'm guessing I'll be billed for it at the end of the month, which is projected to be $35 a month.
  • Looking at the 1 GB Redis memcache after a day, it's full (it filled up pretty quickly) and the hit rate is 95-97%.
  • So I think I have 2 options now:
    • Stick with the legacy memcache so I don't have to pay for the Redis instance and the virtual server that connects to it, and look at other ways of optimizing my server to see if I can bring costs down, starting with the most used and most expensive REST endpoints.
    • Increase the Redis memcache size. Since it's full and the hit ratio is so high, maybe a larger memcache would bring down costs even more? I don't know if that's enough to offset the cost of a larger memcache. I was going to try a 2 GB memcache, which would be double the price, about $70 a month. I imagine it would further bring down the Datastore reads and the instance hours, as there'd be less processing if more is found in the cache.
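For reference, the app.yaml piece I had missed looks roughly like this (the host and port are placeholders for your own Redis/Memorystore instance; Cloud NDB's RedisCache.from_environment() reads this variable):

env_variables:
  REDIS_CACHE_URL: "redis://10.0.0.3:6379"  # placeholder host/port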

That's where I'm at; I appreciate all the pointers. I'll look at some of the other strategies (projection queries, tasklets, using iterators rather than loading all the models into a list, seeing where else I can memcache) after this test. For now I want to keep things the same so I have apples-to-apples testing, and just experiment with memcache.


Thanks again for the help! Glad the biggest issue was a mistake on my end. I had the Redis env variables for using it generally, but was surprised I needed an additional one for NDB.

Well, to wrap this up: after a lot of testing and optimizations I went from about $18 a day using the F2 class on py3 down to $10! That includes paying for a 1 GB Redis memcache. My app's cache hit rate is now way up, around 95%, since I added caching, and more efficient caching, wherever I could. This cost is at or a little below what I was seeing on py2.7 with the lowest F1 instance class.

I did add a bool to my code to switch back to the legacy memcache to see how it performs that way; that would likely bring the cost down to around $8 if it's as good as the 1 GB Redis. But for some reason I'm seeing errors when I switch back to the legacy APIs. I have a top-level variable "USE_LEGACY_MEMCACHE", and the app switches imports and function calls between legacy and Redis depending on whether it's on or off.

if USE_LEGACY_MEMCACHE:
    from google.appengine.api import memcache
    from google.appengine.api import wrap_wsgi_app
    from google.appengine.ext import ndb

    REDIS_ACTIVE = False
else:
    from google.cloud import ndb
    import redis

    # Memcache
    REDIS = redis.Redis(host=REDIS_HOST, port=REDIS_PORT)

    # Datastore
    ndbClient = ndb.Client()
    global_cache = ndb.RedisCache.from_environment()

Then I wrapped my memcache get and set helpers to respect that variable and switch between the two types. But I'm getting errors, and it's not clear to me what's going on; it errors out when trying to .get() a key from the memcache. When I first set this up it worked great, then I iterated on my app for a while and now I can't switch back to legacy. I'm going to set it down for the moment; I'm happy with the progress so far and will look at this another time.

Thanks

D

When you used the legacy memcache, did you remember to wrap your WSGI object and also set app_engine_apis: true in your app.yaml file? See the documentation here.
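In case it helps, the two pieces look roughly like this (assuming a Flask app named app):

# In app.yaml (python3 runtime):
#   app_engine_apis: true
#
# In main.py:
from flask import Flask
from google.appengine.api import wrap_wsgi_app

app = Flask(__name__)
app.wsgi_app = wrap_wsgi_app(app.wsgi_app)  # enables the bundled services (memcache, ndb, etc.) at runtime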


..... NoCommandLine ......
 https://nocommandline.com
A GUI for Google App Engine
    & Datastore Emulator

I did remember to do both of those things. I haven't gotten around to debugging further, though; my app has been running at $9-10 a day with the Redis cost included, which is a little cheaper than pre-py3, so for the moment it's not an issue. At some point I'll look again, as I may be able to shave a couple of dollars a day if I can get legacy working again. I do like that the app is more portable now and easier to test locally without the legacy APIs.

I have also run into a similar issue.


We had our application deployed on the App Engine standard environment (Gen 1), built on the Python 2.7 runtime with the webapp2 framework.

Currently, we're in the process of migrating our framework from webapp2 to Flask and, with it, updating our Python runtime as well, from python27 to python310.

We've migrated the code base; however, we're facing some unexpected issues:

  • With webapp2, the App Engine build size was around 140-160 MB, but after migrating to Flask on python310, the size rose to around 480-500 MB.

  • Upon debugging, we found that it's the same size the project takes up locally on the machine. Did webapp2 apply compression when deploying to App Engine, and Flask isn't applying such compression?

  • Now, with the F1 instance class declared, the build deploys successfully, but our front end does not load. That's because our build size is about 480 MB, greater than the F1 memory limit, i.e. 384 MB in the second-gen runtime for the App Engine standard environment.

  • We did a few experiments to evaluate this rise in build size: adding requirements.txt to .gcloudignore reduced the size to 200 MB, and adding the static folder reduced it by a further 136 MB. But we cannot add these to the .gcloudignore, as they are required.

Also, for example, if we go live with such a build size and have, let's suppose, 20 instances on the F2 instance class, when will automatic scaling spin up another instance, and how will that in turn affect billing?

A few things I learned: I've actually optimized enough that it's now cheaper on py3 than py2.7, but in part that may be the nature of my app.

  • My app has a lot of repeated calls where my DB doesn't change at all, so I just leaned more heavily on memcache. My cache hit rate is around 96%. I actually now pay for a Redis memcache, and even with that cost I'm running cheaper. I'd like to try going back to the free legacy memcache, but I'm seeing lots of errors when I do, so I haven't tackled that yet.
  • The build size is different than the memory size, of course. My build size is bigger, but I don't think that's a big deal; I was also seeing my instance memory usage skyrocket when I went to py3. That caused me to have to jump to F2, which is double the cost of F1, and even then I didn't have a lot of headroom and instances kept getting killed for hitting the memory limit. My app doesn't need the extra horsepower, but it now needed the memory, as I wasn't fitting into the limits of F1.
  • Another thread (maybe you authored it, or saw it) mentioned that by default a py3 app will spin up something like 8 gunicorn workers to manage traffic, but you can tune that in the app.yaml. I added this line to my app.yaml:

entrypoint: gunicorn -b :$PORT -w 2 main:app

This dramatically reduced the memory required to run my app, from around 700 MB down to 220 MB, which allowed me to go back to F1 and save a lot of money. The -w 2 flag specifies how many workers to run; I guess the more you have, the more copies of the app are in memory. I am not noticing any difference in the performance of my app since I added this. Again, I'm sure it's all highly dependent on your app's needs.
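I haven't tried it myself, but gunicorn can also run threads inside each worker if you need more concurrency without paying the per-worker memory cost; something like this (treat it as a sketch, and make sure your handlers are thread-safe):

entrypoint: gunicorn -b :$PORT -w 2 --threads 4 main:app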

As an additional optimization, you can use the `--preload` flag, which loads the app before forking the worker processes. This can save a bit of additional instance memory, but it depends on the app. For us this saves about ~40 MB per instance:

entrypoint: gunicorn --bind=:$PORT --workers=2 --preload main:app

The migration from Python 2 to Python 3 and the changes in cost on App Engine might be influenced by various factors. It's essential to carefully examine your application, usage patterns, and the pricing model changes between Python 2 and Python 3 on App Engine.

Here are some aspects to consider:

  1. App Engine Pricing Changes: App Engine pricing can be affected by various factors such as instance class, request/response sizes, and data storage. Google Cloud Platform occasionally updates its pricing models, and these changes might impact your overall costs.

  2. Instance Class and Scaling: Different instance classes have different pricing. Review the instance classes you're using and consider if adjustments can be made based on your application's requirements. Additionally, examine how your application scales, as this can impact the number and type of instances running.

  3. Resource Utilization: Python 3 might have different resource utilization patterns compared to Python 2. Assess whether your application's resource usage (CPU, memory) has changed significantly after the migration.

  4. App Engine Flex vs. Standard: Depending on your application's requirements, you might be using either the standard environment or the flexible environment on App Engine. These environments have different pricing models, and your choice might impact costs.

  5. Third-Party Libraries and Dependencies: Ensure that all your third-party libraries and dependencies are compatible with Python 3. Sometimes, the changes in libraries or the need to use alternative libraries in Python 3 can impact the performance or resource usage of your application.

  6. Monitoring and Optimization: Regularly monitor your application's performance and resource usage using Google Cloud Monitoring or other tools. This can help identify areas for optimization and potential cost savings.

  7. Google Cloud Credits and Billing Support: If you have Google Cloud credits, make sure to check your billing support options. Google Cloud offers billing support to help customers understand and manage their costs.

  8. Consult Google Cloud Documentation and Support: Review the official Google Cloud documentation for the specific services you're using. Additionally, consider reaching out to Google Cloud Support for assistance. They can provide insights into your specific situation and offer guidance on optimizing costs.

Before making significant changes, it's crucial to thoroughly analyze the factors mentioned above and potentially consult with your development and operations teams to ensure that you are optimizing both the performance and cost aspects of your application on App Engine.

ChatGPT much?