How do I configure / find the current configuration of GCP Dataflow runners?

Hi,

In one of the projects we have at work, I have jobs that I can see under Cloud Dataflow. These jobs are triggered by the scheduler -> Pub/Sub topic -> Cloud Function.

Currently I get a deprecation notice about the Apache Beam SDK version (the runners use 2.14.0, while the latest is 2.31.0 or so).

Since I'm new to GCP, I can't find how the runners were configured or how I can update to the latest Apache Beam SDK. I can see that the Cloud Function (written in NodeJS) sets the number of workers by passing a Dataflow object with "maxNumWorkers". I'm not sure whether this is an attribute specific to our code or part of Google's API.

Would love some help here 🙂

Thanks in advance

Sivan

P.S. These are part of my Dataflow jobs:

[Screenshot: Screen Shot 2021-09-01 at 10.50.43.png]


glen_yu
Google Developer Expert

Hi!

 

So if I'm understanding this correctly, your Dataflow jobs are triggered by your Cloud Function? That would mean you're running Python, and the SDK version is based on whatever is in your Cloud Function. I would assume there's a requirements.txt file in your Cloud Function's Dataflow job definition, and that in it the SDK ("apache_beam" or "apache_beam[gcp]") is pinned to 2.14.0. If you update that to your desired version, I think that will resolve your issue with the SDK.

Thanks for taking the time to answer. 🙂

Actually, the function is written in NodeJS, and the package.json does not seem to include the SDK version. 😞

That's odd because the Apache Beam SDK is only available in Java or Python...

 

Without divulging too much about the Cloud Function, how does it interact with Dataflow? REST API calls? Maybe the API version it's using corresponds to an SDK version in the UI? I don't really know, I'm just guessing.

 

In any case, perhaps you can try making a small change (a comment or something?) to your NodeJS Cloud Function and see what happens when it rebuilds. If there's some implied dependency and the build just pulls the latest SDK package available at build time, that would explain why it's on 2.14.0: that was the latest version back then. In that case, rebuilding might make it pull the current latest, which is 2.32.0.

So after some digging, I can say that the Cloud Function is using a Dataflow template (which I still don't know how to generate or edit). This template contains, among other things, the Apache Beam SDK version:

"userAgent" : "Apache_Beam_SDK_for_Java/2.14.0",
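For anyone who wants to check this on their own template, here is a minimal sketch, assuming the template is a JSON file (for example, one downloaded from Cloud Storage) and that the SDK version appears as a string under a "userAgent" key, as in the line above. The exact nesting inside the file is not known here, so the helper simply searches the whole document. The name `findUserAgents` and the sample object are hypothetical.

```javascript
// Recursively collect every string value stored under a "userAgent" key.
// The search is deliberately generic because the template's exact layout
// is an assumption; only the key name comes from the thread.
function findUserAgents(node, found = []) {
  if (Array.isArray(node)) {
    for (const item of node) findUserAgents(item, found);
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === "userAgent" && typeof value === "string") {
        found.push(value);
      } else {
        findUserAgents(value, found);
      }
    }
  }
  return found;
}

// Hypothetical, heavily trimmed stand-in for a parsed template file.
// With a real file: JSON.parse(fs.readFileSync(path, "utf8")).
const sampleTemplate = {
  options: { userAgent: "Apache_Beam_SDK_for_Java/2.14.0" },
  steps: [],
};

const agents = findUserAgents(sampleTemplate);
console.log(agents);
```

If the value starts with "Apache_Beam_SDK_for_Java", the pipeline behind the template was presumably built with the Java SDK, regardless of the language of the function that launches it.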

You still have a good point about how my NodeJS runtime can be used with Beam when the only supported languages are Java/Python/Go. But I feel I'm getting closer, thanks to your answers 🙂

 

As for your suggestion, editing the function does not cause a "refresh" of the Beam version, probably because it's still using the same Dataflow template.
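To illustrate why redeploying the function would not change the SDK version, here is a rough sketch of what a NodeJS function launching a template could send to the Dataflow REST API (the `templates:launch` method). It only builds the request and makes no network call. The project, bucket, job, and parameter names are hypothetical. As far as I know, the REST API's runtime environment calls the worker cap `maxWorkers`, while `maxNumWorkers` is the name of the corresponding Beam pipeline option, so the attribute seen in the function may be mapped by the code or client library in use.

```javascript
// Build the URL and JSON body for a templates:launch call (no network I/O).
// Note that nothing here names a Beam SDK version: the request only points
// at an already-built template file via gcsPath.
function buildLaunchRequest({ projectId, region, templatePath, jobName, parameters, maxWorkers }) {
  const url =
    `https://dataflow.googleapis.com/v1b3/projects/${encodeURIComponent(projectId)}` +
    `/locations/${encodeURIComponent(region)}/templates:launch` +
    `?gcsPath=${encodeURIComponent(templatePath)}`;
  const body = {
    jobName,
    parameters, // template-specific runtime parameters
    environment: { maxWorkers }, // upper bound on autoscaled workers
  };
  return { url, body };
}

// All values below are hypothetical placeholders.
const request = buildLaunchRequest({
  projectId: "my-project",
  region: "us-central1",
  templatePath: "gs://my-bucket/templates/my-template",
  jobName: "scheduled-job",
  parameters: { inputFile: "gs://my-bucket/input.txt" },
  maxWorkers: 5,
});
console.log(request.url);
console.log(JSON.stringify(request.body, null, 2));
```

Under these assumptions, moving to a newer SDK would mean rebuilding the template itself from the pipeline's source with an updated Beam dependency, then pointing the function at the new template file.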

Hi, were you able to resolve this issue? It would be helpful if you could share more details.