Recently I embarked on building a simple Synapse pipeline, delivering blob data to an Azure SQL Database, with a view to proving out other related features of this Azure service. I had limited time and wanted to create a CSV file with random but structured data. I then decided I wanted to find out whether I could automate the process I was about to embark on, in a repeatable fashion.
So, I thought, can I utilise an AI-based service, without any setup overhead, to deliver what I was asking for?
Ideally, I want an on-demand AI service to which I can give a brief natural language description of the data structure, and which delivers a usable file in return.
I then want to take that file, push it into my pipeline and ultimately deliver it to my Azure SQL Database.
I asked ChatGPT the following:
From that, ChatGPT generated a Python routine to create the random data.
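The generated routine is not reproduced here. As a rough illustration of what a routine of this kind tends to look like, here is a minimal sketch using only the Python standard library. The column names, the value lists, the output file name and the function names `generate_rows` and `rows_to_csv` are all assumptions made for illustration, not the code ChatGPT produced.

```python
import csv
import io
import random

# Hypothetical columns and values, chosen only for illustration.
FIRST_NAMES = ["Alice", "Bob", "Chen", "Dana", "Emeka"]
LAST_NAMES = ["Smith", "Jones", "Patel", "Garcia", "Nowak"]
JOB_TITLES = ["Data Engineer", "Business Analyst", "Project Manager", "Accountant"]


def generate_rows(count, seed=None):
    """Return `count` random but structured rows as dictionaries."""
    rng = random.Random(seed)  # passing a fixed seed makes the output reproducible
    rows = []
    for employee_id in range(1, count + 1):
        rows.append({
            "employee_id": employee_id,
            "first_name": rng.choice(FIRST_NAMES),
            "last_name": rng.choice(LAST_NAMES),
            "job_title": rng.choice(JOB_TITLES),
            # Salary between 25,000 and 90,000 in steps of 500.
            "salary": rng.randrange(25000, 90001, 500),
        })
    return rows


def rows_to_csv(rows):
    """Serialise the rows to CSV text, starting with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Write 100 rows to a hypothetical output file.
    with open("test_data.csv", "w", newline="") as handle:
        handle.write(rows_to_csv(generate_rows(100)))
```

A script like this has to be saved and run locally before any file exists, which is the extra step I was hoping to avoid.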
For me, the generated routine is impressive, but when I compared it with what I am trying to deliver, it highlighted the following:
Pros:
Cons:
To be totally fair to ChatGPT, all of the above is my fault, as I was not specific enough in the way I asked for results…
Before taking the generated Python code and running it, I ask ChatGPT the following, shifting the emphasis from generating content to displaying it:
With the following returned:
This is great. I can now hit the Copy code button, paste the result into a text editor, save the file and pass it into my data pipeline. Come to think of it, that may be just as fiddly as running the generated Python code, but at least it's given me my realistic job titles! 😊
At this point, I'm thinking I now have a set of repeatable 'search' statements to pass to ChatGPT to generate further CSV test files…or do I?
I open up a separate ChatGPT session and ask the same last question, expecting a further lump of CSV to be displayed for me…
…but alas, it generates a further Python file.
My next thought is that ChatGPT needs the whole conversation to be repeated, not just the last statement.
I try this… but unfortunately it still doesn't give the original CSV-based results, and instead offers up further Python scripts. The Python files are great, and they look like they will work, but they are not the repeatable results I was after.
The key takeaway for me here is:
About the author
Geoff Sanderson
An experienced Data Specialist with over a decade of experience in data and 20-plus years in the technology industry. '80s music enthusiast and Yamaha QY70 mega fan.