Hello, I currently have a website on Compute Engine. Every time a user enters data into it, the site saves that customer's data in a .txt file and stores it in a certain folder in a Cloud Storage bucket. I am trying to have Dataflow read the files as they come in and load them into BigQuery. I got the schema to match, but every time I run the job it only reads the single file I select in Cloud Storage; it doesn't read multiple files, and it doesn't detect when a new file is added to that folder.
here is my bq schema:
[
  {"name": "fname", "type": "STRING", "mode": "NULLABLE"},
  {"name": "lname", "type": "STRING", "mode": "NULLABLE"},
  {"name": "email", "type": "STRING", "mode": "NULLABLE"},
  {"name": "creditcard", "type": "STRING", "mode": "NULLABLE"},
  {"name": "date", "type": "STRING", "mode": "NULLABLE"},
  {"name": "vanilla", "type": "STRING", "mode": "NULLABLE"},
  {"name": "vanilla_qty", "type": "STRING", "mode": "NULLABLE"},
  {"name": "chocolate", "type": "STRING", "mode": "NULLABLE"},
  {"name": "chocolate_qty", "type": "STRING", "mode": "NULLABLE"}
]
And here is how my data comes in, as one JSON object per line in a .txt file:
{"fname": "Jane", "lname": "Smith", "email": "jane.smith@example.com", "creditcard": "9876543210987654", "date": "2024-03-27", "vanilla": "yes", "vanilla_qty": 1, "chocolate": "yes", "chocolate_qty": 3}
I have tried both streaming templates in Dataflow, and I have tried the batch one as well. I also tried storing my data in a .json file, and that didn't work either. Not sure what the issue is.
I have replicated the issue and it works fine for me: every time I add a new .txt file, I can query the data in BigQuery. When setting up the job (in the console), be sure to use a wildcard input file pattern:
gs://bucketname/path/*.txt
(change the .txt extension depending on your use case)
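For reference, the same job can also be launched from the command line. This is only a sketch, assuming the classic "Cloud Storage Text to BigQuery (Stream)" template; every bucket, project, dataset, table, and file name below is a placeholder you would replace with your own:

```shell
# Launch the streaming Cloud Storage Text to BigQuery template (classic).
# The wildcard in inputFilePattern is what makes the job pick up new files.
gcloud dataflow jobs run txt-to-bq \
  --gcs-location gs://dataflow-templates/latest/Stream_GCS_Text_to_BigQuery \
  --region us-central1 \
  --parameters \
inputFilePattern=gs://bucketname/path/*.txt,\
JSONPath=gs://bucketname/schema.json,\
outputTable=my-project:my_dataset.orders,\
javascriptTextTransformGcsPath=gs://bucketname/transform.js,\
javascriptTextTransformFunctionName=transform,\
bigQueryLoadingTemporaryDirectory=gs://bucketname/tmp
```

Because this is a streaming job, it keeps running and watches the pattern for new files instead of reading a single file once.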
My Schema file content:
{
  "BigQuery Schema": [
    {"name": "fname", "type": "STRING", "mode": "NULLABLE"},
    {"name": "lname", "type": "STRING", "mode": "NULLABLE"},
    {"name": "email", "type": "STRING", "mode": "NULLABLE"},
    {"name": "creditcard", "type": "STRING", "mode": "NULLABLE"},
    {"name": "date", "type": "STRING", "mode": "NULLABLE"},
    {"name": "vanilla", "type": "STRING", "mode": "NULLABLE"},
    {"name": "vanilla_qty", "type": "STRING", "mode": "NULLABLE"},
    {"name": "chocolate", "type": "STRING", "mode": "NULLABLE"},
    {"name": "chocolate_qty", "type": "STRING", "mode": "NULLABLE"}
  ]
}
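The template also expects a JavaScript UDF that turns each input line into a JSON string matching the table schema. Here is a minimal sketch; the file name transform.js and function name transform are assumptions, and since the schema declares every column as STRING, the numeric *_qty values from the sample record are coerced to strings:

```javascript
// Hypothetical UDF (transform.js) for the Cloud Storage Text to BigQuery template.
// Each input line is one JSON record; numeric *_qty fields are converted to
// strings so they match the all-STRING BigQuery schema above.
function transform(line) {
  var obj = JSON.parse(line);
  if (obj.vanilla_qty !== undefined) {
    obj.vanilla_qty = String(obj.vanilla_qty);
  }
  if (obj.chocolate_qty !== undefined) {
    obj.chocolate_qty = String(obj.chocolate_qty);
  }
  return JSON.stringify(obj);
}
```

Without a conversion like this, rows where vanilla_qty or chocolate_qty arrive as JSON numbers may be rejected by a STRING column.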
Results in BQ:
Values were added every time I added a file.