Introduction to Generative AI
Gen-AI, or generative AI, is the hot new topic in the market. From ChatGPT writing your assignments to DALL-E creating art, its complexity is growing as rapidly as its use cases. So let’s break down the technology.
We start by understanding AI.
Artificial Intelligence can be understood as a discipline or a field of study, just like physics and sociology. Like other subjects, it has a broad spectrum of topics under its belt. ML, or Machine Learning, is one of the subfields that we will be zooming in on. ML leans more on statistics than on classical computer science, but both are essential for its never-ending use cases.
Think of a machine learning model as a magic box: you put in a carrot (input data), rotate the box once (run the function) and pull out a rabbit (output data). The carrot is like any other carrot on the farm and the rabbit is nothing magical either; it is the structure and making of the magic box that needs a closer inspection.
There are numerous ML models (or magic boxes), but what is common across all of them is data. An ML model needs data like a car needs fuel; data is paramount. We categorise ML models based on the kind of data and how it is fed to the system. The main categories are Supervised, Unsupervised, Semi-supervised and Reinforcement learning. To put it in simple terms:
Supervised Learning Models are models that use labelled data.
A teacher shows this (assuming the students know how to count):
f(0) = 2
f(3) = 5
f(12) = 14
f(48) = 50
f(30) = 32
f(2) = 4
The student knows there’s an input inside ‘ f() ’ and an output after ‘ = ’. They notice that for every input the teacher is adding 2, so they can compute x + 2 for any new input based on the pattern they recognised.
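To make the student's reasoning concrete, here is a minimal sketch of how a program could recover the rule from the teacher's labelled pairs. It assumes the rule is a straight line and uses an ordinary least-squares fit; the names `examples`, `fit_line` and `predict` are made up for this illustration.

```python
# Labelled examples from the teacher: (input, output) pairs.
examples = [(0, 2), (3, 5), (12, 14), (48, 50), (30, 32), (2, 4)]

def fit_line(pairs):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# "Training": learn the rule from the labelled data.
slope, intercept = fit_line(examples)

def predict(x):
    """Apply the learned rule to an input the student has never seen."""
    return slope * x + intercept

print(round(slope, 6), round(intercept, 6))  # close to 1 and 2, i.e. x + 2
print(round(predict(100), 6))                # close to 102
```

The labels (the outputs after ‘ = ’) are what make this supervised: the fit is only possible because every input comes with its correct answer.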
Unsupervised Learning Models are models where the data is not labelled as input or output, so the model has to find the structure in the data on its own.
A teacher shows this to her students (assuming the students know how to count):
1 < 2
3 > 2
9 < 11
5 > 4
1 > 0
Although the students don’t understand what the symbols mean, they recognise the pattern and put ‘ > ’ when the left number is greater and ‘ < ’ when the right number is greater.
Semi-supervised learning is a mix of both: a small amount of labelled data is combined with a larger amount of unlabelled data.
Say the student is studying for a test and gets questions like “if x > 9 and x + y = 10, solve for x”. They will use a little bit of supervised learning (addition) and a little bit of unsupervised learning (greater than).
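A different, but very common, way to see unsupervised learning in code is clustering: the program gets numbers with no labels at all and has to discover the groups by itself. The sketch below is a tiny one-dimensional k-means with two groups. The function name `two_means` and the height values are invented for this illustration, and it assumes the list holds at least two distinct numbers.

```python
def two_means(values, iterations=10):
    """Split unlabelled numbers into two groups with a tiny 1-D k-means."""
    if len(set(values)) < 2:
        raise ValueError("need at least two distinct values")
    # Start with the two extremes as the group centres.
    low, high = min(values), max(values)
    for _ in range(iterations):
        # Assign every value to its nearest centre.
        group_a = [v for v in values if abs(v - low) <= abs(v - high)]
        group_b = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the average of its group.
        low = sum(group_a) / len(group_a)
        high = sum(group_b) / len(group_b)
    return sorted(group_a), sorted(group_b)

# Made-up, unlabelled heights in cm: nobody says who is a child or an adult.
heights_cm = [150, 152, 149, 181, 179, 183, 151, 180]
small, large = two_means(heights_cm)
print(small)
print(large)
```

Nobody told the program what the two groups mean; it only noticed that the numbers fall into two clumps, much like the students noticing a pattern without knowing what the symbols stand for.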
Let’s get an overview of Deep Learning.
Deep Learning is a subset of machine learning, built on artificial neural networks that are inspired by the human brain. It can be trained with supervised, unsupervised or semi-supervised methods. If you know a little neurology, you’d know that our brain is wired in a way that there are tens of billions of neurons, or nodes, creating a big mesh network, sending and receiving signals to understand and communicate better. (To be fair, it is far more complex than that, but humour me.) Deep Learning works like that too. We can classify these models into the Discriminative kind and the Generative kind.
Discriminative Deep Learning Models
Discriminative models are classification or prediction models. They are typically supervised. They learn the relationship between the features of data points and their labels. The output of such models is generally a number, a category, a probability or a class. For instance, a model that can predict whether a picture is of a cat or a dog is a discriminative model.
Generative Deep Learning Models
Generative Models are models that generate new content based on the data they were trained on. They are typically unsupervised and can generate output like natural language, an image, audio or video. For example, given enough pictures of cats, the model can generate a new picture of a cat.
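To contrast the two kinds, here is a toy sketch that uses plain numbers instead of pictures and simple statistics instead of a neural network. The discriminative part learns a threshold that separates cats from dogs and outputs a label; the generative part learns what cat weights look like and produces a new one. The data and every function name here are made up for illustration.

```python
import random
import statistics

# Made-up labelled data: body weight in kg and the animal it belongs to.
data = [(3.5, "cat"), (4.2, "cat"), (5.0, "cat"), (4.6, "cat"),
        (18.0, "dog"), (25.5, "dog"), (30.1, "dog"), (22.3, "dog")]

def fit_threshold(samples):
    """Discriminative: pick the weight threshold with the fewest mistakes."""
    weights = sorted(w for w, _ in samples)
    candidates = [(a + b) / 2 for a, b in zip(weights, weights[1:])]

    def errors(t):
        return sum((w > t) != (label == "dog") for w, label in samples)

    return min(candidates, key=errors)

def classify(weight, threshold):
    """Output is a class label, not new content."""
    return "dog" if weight > threshold else "cat"

def fit_gaussian(samples, label):
    """Generative: estimate the distribution of weights for one label."""
    values = [w for w, l in samples if l == label]
    return statistics.mean(values), statistics.stdev(values)

def generate(mean, stdev, rng):
    """Sample a new weight that was never in the training data."""
    return rng.gauss(mean, stdev)

threshold = fit_threshold(data)
print(classify(4.0, threshold))    # a label for an existing data point

cat_mean, cat_stdev = fit_gaussian(data, "cat")
rng = random.Random(0)             # fixed seed so reruns give the same sample
print(generate(cat_mean, cat_stdev, rng))  # a brand-new "cat weight"
```

The first model answers "which one is this?", the second answers "show me another one like these", which is the same split as between a cat-or-dog classifier and a cat picture generator.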
We will again zoom in a little more on Generative AI.
Gen-AI is a subset of Deep Learning. It uses artificial neural networks trained on large amounts of data, much of it unlabelled. Gen-AI generates new content based on the dataset provided to it. Generative AI, like other Machine Learning models, feeds on data. There are 2 kinds of data provided to the model:
- Training data: This is consumed for pre-training a generative transformer model, also called a foundation model, which recognises and adapts to patterns and learns “what to generate”. Think of it like dancing: a dancer watches and learns 10 different styles from thousands of videos. They now know the basic pattern of each dance form and can come up with their own choreography to new songs. The extensive videos used to study dance were the training data.
- Input data: This is primarily given in the form of prompts, small pieces of text that provide context and constraints for the new content to be generated. Going back to the dancing example, a prompt would be a person asking the dancer to perform a classical dance for 2 minutes.
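The split between training data and input data can be sketched with a deliberately tiny text generator. This is not how a real foundation model works internally; it is a simple word-pair (bigram) model, swapped in because it fits in a few lines. The names `train` and `generate` and the training sentence are invented for this example, and the prompt is assumed to be non-empty.

```python
import random
from collections import defaultdict

def train(corpus):
    """'Pre-training': record which word follows which in the training text."""
    followers = defaultdict(list)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers

def generate(model, prompt, max_words=8, seed=0):
    """Continue a non-empty prompt one word at a time using learned patterns."""
    rng = random.Random(seed)   # fixed seed so reruns give the same text
    output = prompt.split()
    for _ in range(max_words):
        options = model.get(output[-1])
        if not options:         # the model never saw this word: stop
            break
        output.append(rng.choice(options))
    return " ".join(output)

# Training data: what the model studies beforehand.
training_text = "the dancer learns the steps and the dancer performs the steps"
model = train(training_text)

# Input data: the prompt that steers what gets generated.
print(generate(model, "the dancer"))
```

The training text plays the role of the thousands of dance videos, and the prompt plays the role of the request made to the dancer. A prompt the model has never seen, such as a single unknown word, comes back unchanged because there is no learned pattern to continue from.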
Just like us humans, machines too misunderstand patterns and jump to wrong conclusions. And just like with humans, when a machine confidently gives a nonsensical or false answer, we say that the machine had “hallucinations”. Hallucinations occur largely due to 4 reasons:
- The training data was too little.
- The training data was noisy or dirty.
- The prompt data didn't have enough context.
- The prompt data didn't have enough constraints.
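The last two reasons are about the prompt, so here is a small sketch of what adding context and constraints can look like in practice. `build_prompt` is a hypothetical helper written for this article, not part of any model's API; it only assembles a piece of text that you would then send to a model.

```python
def build_prompt(task, context=None, constraints=None):
    """Assemble a prompt that states the task, its context and its constraints."""
    parts = [f"Task: {task}"]
    if context:
        parts.append(f"Context: {context}")
    for rule in constraints or []:
        parts.append(f"Constraint: {rule}")
    return "\n".join(parts)

# A vague prompt: the model has to guess style, length and audience.
vague = build_prompt("Perform a dance.")

# A specific prompt: the same task with context and constraints spelled out.
specific = build_prompt(
    "Perform a dance.",
    context="The audience is attending a classical music evening.",
    constraints=["Use a classical style.", "Keep it to 2 minutes."],
)

print(vague)
print("---")
print(specific)
```

The second prompt leaves far less room for the model to fill gaps with guesses, which is where many hallucinations come from.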
Another thing to keep in mind is that the quality of the input, or the prompt, highly influences the quality of the output. Hence prompt design is crucial. Based on the input and expected output, there are multiple types of Generative AI models as well. Let’s look into some of these models and their examples.
Text to Text
Generation, classification, summarisation, translation and (re)search are all text-to-text tasks, where both the input and the output are text. ChatGPT and Bard are prime examples of this model type.
Text to Image
These models take text as input and provide an image as output; image generation and image editing are the prime use cases today. There are a number of controversies over copyright around these images that are creating panic in both the art and technology worlds at the moment.
Text to Video / 3D
Video generation and editing have been a pain for most creators. Now, with text-to-video or text-to-3D gen-AI models, their lives will be a little simpler. At the same time, game developers can quickly create game assets and non-player characters using just text prompts. Even 3D modelling and rendering animations have become easier with Gen-AI.
Text to Task
Text to task has been in the industry for a while in the form of virtual assistants, automation and software agents. What is new with Gen-AI is that the tasks need not be custom-made and saved on these assistants; rather, the system should be smart enough to adapt and complete the tasks on its own.
Generative AI is the next big thing in the world. It might just burst like the dot-com bubble or it might change the world like the Industrial Revolution did. While we wait for either to happen, we can learn more about the What, How and Why of what’s coming next.