Hello, I currently have a website on Compute Engine. Every time a user enters data into it, the site saves that customer's data in a .txt file and stores it in a certain folder in a Cloud Storage bucket. I am trying to have Dataflow read the files as they come in and load them into BigQuery. I got the schema to match, but every time I run the job it only reads the single file I select in Cloud Storage; it doesn't read multiple files, and it doesn't detect when a new file is added to that folder.
here is my bq schema:
[
  {"name": "fname", "type": "STRING", "mode": "NULLABLE"},
  {"name": "lname", "type": "STRING", "mode": "NULLABLE"},
  {"name": "email", "type": "STRING", "mode": "NULLABLE"},
  {"name": "creditcard", "type": "STRING", "mode": "NULLABLE"},
  {"name": "date", "type": "STRING", "mode": "NULLABLE"},
  {"name": "vanilla", "type": "STRING", "mode": "NULLABLE"},
  {"name": "vanilla_qty", "type": "STRING", "mode": "NULLABLE"},
  {"name": "chocolate", "type": "STRING", "mode": "NULLABLE"},
  {"name": "chocolate_qty", "type": "STRING", "mode": "NULLABLE"}
]
And here is how my data comes in, as one JSON object per line in a .txt file:
{"fname": "Jane", "lname": "Smith", "email": "jane.smith@example.com", "creditcard": "9876543210987654", "date": "2024-03-27", "vanilla": "yes", "vanilla_qty": 1, "chocolate": "yes", "chocolate_qty": 3}
I have tried both streaming templates in Dataflow, and I have tried the batch one as well. I also tried storing my data in a .json file, and that didn't work either. Not sure what the issue is.
I have replicated the issue and it works fine for me: every time I add a new .txt file, I can query the data in BigQuery. When setting up the job (in the console), be sure to use a wildcard input file pattern:
gs://bucketname/path/*.txt
(change the .txt extension depending on your use case)
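For reference, the same job can also be launched from the command line. This is only a sketch, assuming the classic "Cloud Storage Text to BigQuery (Stream)" template; every bucket, project, dataset, table, and file name below is a placeholder you would replace with your own:

```shell
# Launch the streaming Cloud Storage Text to BigQuery template (classic).
# The wildcard in inputFilePattern is what makes the job pick up new files.
gcloud dataflow jobs run txt-to-bq \
  --gcs-location gs://dataflow-templates/latest/Stream_GCS_Text_to_BigQuery \
  --region us-central1 \
  --parameters \
inputFilePattern=gs://bucketname/path/*.txt,\
JSONPath=gs://bucketname/schema.json,\
outputTable=my-project:my_dataset.orders,\
javascriptTextTransformGcsPath=gs://bucketname/transform.js,\
javascriptTextTransformFunctionName=transform,\
bigQueryLoadingTemporaryDirectory=gs://bucketname/tmp
```

Because this is a streaming job, it keeps running and watches the pattern for new files instead of reading a single file once.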
My Schema file content:
{
  "BigQuery Schema": [
    {"name": "fname", "type": "STRING", "mode": "NULLABLE"},
    {"name": "lname", "type": "STRING", "mode": "NULLABLE"},
    {"name": "email", "type": "STRING", "mode": "NULLABLE"},
    {"name": "creditcard", "type": "STRING", "mode": "NULLABLE"},
    {"name": "date", "type": "STRING", "mode": "NULLABLE"},
    {"name": "vanilla", "type": "STRING", "mode": "NULLABLE"},
    {"name": "vanilla_qty", "type": "STRING", "mode": "NULLABLE"},
    {"name": "chocolate", "type": "STRING", "mode": "NULLABLE"},
    {"name": "chocolate_qty", "type": "STRING", "mode": "NULLABLE"}
  ]
}
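The template also expects a JavaScript UDF that turns each input line into a JSON string matching the table schema. Here is a minimal sketch; the file name transform.js and function name transform are assumptions, and since the schema declares every column as STRING, the numeric *_qty values from the sample record are coerced to strings:

```javascript
// Hypothetical UDF (transform.js) for the Cloud Storage Text to BigQuery template.
// Each input line is one JSON record; numeric *_qty fields are converted to
// strings so they match the all-STRING BigQuery schema above.
function transform(line) {
  var obj = JSON.parse(line);
  if (obj.vanilla_qty !== undefined) {
    obj.vanilla_qty = String(obj.vanilla_qty);
  }
  if (obj.chocolate_qty !== undefined) {
    obj.chocolate_qty = String(obj.chocolate_qty);
  }
  return JSON.stringify(obj);
}
```

Without a conversion like this, rows where vanilla_qty or chocolate_qty arrive as JSON numbers may be rejected by a STRING column.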
Results in BQ:
Values were added every time I added a file.