(Note: I mostly talk about GPT here, but only because it's easier to write than "transformer language models" and GPT is the form most people know them in. The text is about them in general.)
GPT3, often confused with ChatGPT in the latest swarm of internet articles, has been all the rage in the tech buzzword world lately. Its treatment in the media over the last year or so has been off the charts, with some treating it as the miracle AI we have been waiting for. Everybody and their mom has been jumping on the bandwagon, creating the next copywriting tool, making it pass the bar exam or just using it to write their math homework.
Unfortunately, the quality of the generated content is usually mediocre. Even with better prompting, the generated text cannot be novel: the technology itself is based on "common denominators" in a way, parroting and remixing the texts it was trained on. So you can forget about becoming the next James Joyce in a few clicks; your writing will most likely end up looking like an average philosophy student's grandiose manifesto, with a bunch of words thrown in to impress the average reader, yet meaning nothing and bringing no satisfaction to the reader's gaze.
But, far off on the other side, people are finding some way more fun applications - GPT3 as a reducer, as a backend, as a translator or as a decompiler/deobfuscator - and these have much bigger practical value.
And for the last year or so, this has been tickling my mind: what are some actual use cases for the technology? Yes, generating articles or parroting back documentation is an obvious one. Fine-tuned models answering support questions is also a nice one, though it comes with its own 13 reasons why not.
But the transformations themselves - taking data in one form and returning it in another, processing it along the way or just translating it - unlock a large pool of uncaptured value.
Imagine being able to process a bunch of scraped or human data into a predefined format that aligns with your API's data format - or to put it more vividly, imagine your grandma sending a text "can you bring me 2 bottles of milk and a pack of eggs?", getting an answer "that will be 3.97, is that ok?" and someone showing up with 2 milk and eggs 15 minutes later (or sometimes 12 milks and 2 eggs because the model screwed up).
Behind the scenes, the text is actually fed into a model that transforms it into JSON in the format of:
"name": "Egg pack",
Which the latest 15-minute grocery delivery app can then consume to bring your grandma her milk (and rip her off with a $4 service fee, an $8 delivery fee and a $3 VC fee on the way).
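Here is a minimal Python sketch of what consuming that JSON safely might look like. Only the `name` field comes from the example above; the `items` wrapper, the `quantity` field, the `max_quantity` cap and the function name `parse_order` are all assumptions made for illustration. The idea is simply to treat the model's output like any other untrusted input before anyone gets sent out with a shopping bag, which is also how a 12-milks order could get caught:

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class OrderItem:
    name: str
    quantity: int


def parse_order(model_output: str, max_quantity: int = 10) -> List[OrderItem]:
    """Parse and sanity-check the JSON text a model returned for an order.

    Assumed (hypothetical) shape: {"items": [{"name": ..., "quantity": ...}]}.
    Raises ValueError on anything that does not fit.
    """
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc

    if not isinstance(data, dict) or not isinstance(data.get("items"), list):
        raise ValueError("expected an object with an 'items' list")

    items = []
    for raw in data["items"]:
        if not isinstance(raw, dict):
            raise ValueError(f"item is not an object: {raw!r}")
        name, quantity = raw.get("name"), raw.get("quantity")
        if not isinstance(name, str) or not name.strip():
            raise ValueError(f"item has no usable name: {raw!r}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(quantity, int) or isinstance(quantity, bool):
            raise ValueError(f"quantity is not an integer: {raw!r}")
        if not 1 <= quantity <= max_quantity:
            raise ValueError(f"suspicious quantity {quantity} for {name!r}")
        items.append(OrderItem(name=name.strip(), quantity=quantity))
    return items
```

An order that fails the check can go back to grandma as a "did you really mean 12?" question instead of straight to the courier.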
Even better things are possible with chaining different models:
Scrape a website, feed it into a model to remove unnecessary HTML, and feed the results into another model that transforms the contents into a format your APIs consume. Hell, why even bother with an API? Just feed the results into a model that is fine-tuned for translating to SQL queries and pump that sweet data oil in directly.
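Structurally, a chain like this is just a list of text-in, text-out steps. Here is a rough Python sketch under that assumption. The two stages are plain-Python stand-ins (a regex and a whitespace cleanup) where a real chain would call a model, and the names `run_chain`, `strip_html` and `collapse_whitespace` are made up for the example. Keeping a trace of every intermediate result is a design choice that helps when one link in the chain goes off the rails:

```python
import re
from typing import Callable, List, Tuple

# A stage takes text and returns text; in a real chain each one would wrap
# a call to a (possibly fine-tuned) model.
Stage = Callable[[str], str]


def run_chain(text: str, stages: List[Stage]) -> Tuple[str, List[Tuple[str, str]]]:
    """Feed `text` through the stages in order.

    Returns the final output plus a trace of (stage name, output) pairs,
    so a bad result can be traced back to the stage that produced it.
    """
    trace = []
    for stage in stages:
        text = stage(text)
        trace.append((stage.__name__, text))
    return text, trace


# Hypothetical stand-ins for the "remove unnecessary HTML" and
# "transform into our format" models described above.
def strip_html(text: str) -> str:
    return re.sub(r"<[^>]+>", " ", text)


def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


result, trace = run_chain("<p>Milk  <b>1.49</b></p>", [strip_html, collapse_whitespace])
```

Swapping a stand-in for a real model call should not change `run_chain` at all, as long as the stage still takes a string and returns a string.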
Want to check how much open bugs during full moons influence your user churn?
Well, what if your favorite analytics tool had a question box connected to a chain? First your question goes to a model that suggests which data to look for. Its suggestion is passed to another model that returns a query on your data lake, which is then evaluated for safety and executed. Finally the results, together with the original prompt, go into a code-generating model that returns the HTML needed to display that data.
Instead of having to torture your developers and designers with supporting infinite possible permutations of filters, chart designs and customisations, you can just leave it up to the model to generate them on the fly.
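The "evaluated for safety" step in that chain is the one I would least want to leave to vibes. As one possible sketch, assuming the data sits in SQLite (a stand-in here for whatever your data lake actually is), Python's `sqlite3` module lets you install an authorizer callback that rejects any statement that is not a plain read. The names `read_only_authorizer` and `run_generated_query` are made up for the example:

```python
import sqlite3

# Actions a read-only analytics query legitimately needs:
# the SELECT itself, column reads, and SQL functions such as count().
ALLOWED_ACTIONS = {
    sqlite3.SQLITE_SELECT,
    sqlite3.SQLITE_READ,
    sqlite3.SQLITE_FUNCTION,
}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Allow reads, deny everything else (INSERT, DROP, PRAGMA, ATTACH, ...)."""
    if action in ALLOWED_ACTIONS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def run_generated_query(conn: sqlite3.Connection, sql: str) -> list:
    """Execute model-written SQL with the read-only authorizer installed.

    A denied statement fails while it is being compiled, before it can
    touch any data, and surfaces as a sqlite3.Error.
    """
    conn.set_authorizer(read_only_authorizer)
    return conn.execute(sql).fetchall()
```

This only covers "can it change my data"; a query can still be wrong, slow, or nosy, so a real setup would likely add row limits, timeouts and a restricted set of tables on top.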
With enough fine-tuning (and a lot of human work to provide good data), transformer LLMs can help us achieve a lot of stuff that we have thought "unscalable" until now. Stuff that wasn't cost-efficient, or that needed a mechanical turk or a large swath of hardcoded assumptions to iron out the edge cases, can be achieved by using an oversized text mumbler-jumbler.
And yes, there are a lot of hallucinations, quite a few mistakes and a lot of accuracy issues in the way - one wrong word and the model could wind up in the crazy lane. But I'm not saying it's a perfect "do-all-be-all" technology, far from it. I'm saying it's a great "generic glue" layer we were missing from our toolbelt, one that could help us unlock more economic and data value than ever. With good training, error checking and proper chaining, we could conquer some problems that were insurmountable until now.
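To make "error checking" a bit less hand-wavy, here is a small Python sketch of one common pattern: validate what the model returned and, if it fails, ask again with the complaint attached. Everything in it is an assumption for illustration: `generate` stands in for whatever function calls your model, `validate` is any function that raises `ValueError` on bad output, and the wording of the retry prompt is invented.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    generate: Callable[[str], str],
    validate: Callable[[str], T],
    prompt: str,
    attempts: int = 3,
) -> T:
    """Call the model until its output passes validation or attempts run out."""
    last_error: Optional[ValueError] = None
    for _ in range(attempts):
        output = generate(prompt)
        try:
            return validate(output)
        except ValueError as exc:
            last_error = exc
            # Feed the complaint back so the next attempt can correct itself.
            prompt = f"{prompt}\n\nYour previous answer was rejected: {exc}"
    raise RuntimeError(f"no valid output after {attempts} attempts") from last_error
```

The retry cap matters: a model stuck in the crazy lane will happily stay there, so at some point the sane move is to give up and hand the case to a human.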
Even though the models of the current generation are like giant mainframes upon which we can only gaze with wonder, newer and smaller models are coming out at a rapid pace. And while we are still quite far away from having a small, easily tunable model that is good enough to cover a large swath of tasks with only a small amount of additional training, the next generation of programmers might grow up complaining that 'gpt install is-integer' ruined programming.