Generative AI represents a significant advancement in technology, following the rise of the Internet, mobile devices, and cloud computing. Its immediate practical benefits, especially in improving productivity and efficiency, are more apparent than those of other emerging technologies like the metaverse, autonomous driving, blockchain, and Web3. Generative AI models are already used across many domains, with notable applications in writing, art, music, and other creative fields.
Generative AI is a transformative technology that employs neural networks to produce original content, including text, images, videos, and more. Well-known applications such as ChatGPT, Bard, DALL-E 2, Midjourney, and GitHub Copilot demonstrate the early promise and potential of this breakthrough.
Dive into the evolving world of generative AI as we explore its mechanics, real-world examples, market dynamics, and the intricacies of its multiple “layers” including the application, platform, model, and infrastructure layer. Keep reading to unravel the potential of this technology, how it’s shaping industries, and the layers that make it functional and transformative for end users.
What is Generative AI?
Generative AI is a subset of artificial intelligence that employs algorithms to create new content, such as text, images, videos, audio, software code, and design.
How Does Generative AI Work?
Generative AI models work by utilizing neural networks to analyze and identify patterns and structures within the data they have been trained on. Using this understanding, they generate new content that both mimics human-like creations and extends the pattern of their training data. The function of these neural networks varies based on the specific technology or architecture used. This includes, but is not limited to, Transformers, Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models.
- Transformers: Transformers utilize self-attention mechanisms to process and analyze sequences of data with greater efficiency than earlier recurrent architectures. Unlike traditional models that focus only on individual sentences, transformers can identify connections between words across entire pages, chapters, or books. This makes them highly suitable for training on massive, unlabeled datasets
- Generative Adversarial Networks (GANs): GANs are composed of two parts – a generator that creates new data, and a discriminator that distinguishes between real and computer-generated data. Both components are trained simultaneously. The generator is penalized if it produces unrealistic samples, while the discriminator is penalized if it incorrectly identifies computer-generated examples
- Variational Autoencoders (VAEs): VAEs consist of an encoder and a decoder that are connected by a set of latent variables. These unsupervised learning models strive to make the input and output as identical as possible by compressing the dataset into a simplified form. The latent variables allow the generation of new data by feeding random sets into the decoder, facilitating creativity and diversity in output
- Diffusion Models: These models are trained by progressively adding random noise to data and then learning to remove it step by step. Because they learn to recover the original data from noise-corrupted versions, they are particularly useful in image generation applications: starting from pure random noise, a trained diffusion model can denoise its way to a specific, coherent image
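The self-attention mechanism at the heart of the Transformer can be sketched in a few lines. The example below is a minimal illustration in plain Python (no deep learning framework, and the three 2-dimensional "embeddings" are made up for demonstration): it computes scaled dot-product attention, in which every position weighs its relationship to every other position in one step. Real models add learned projection matrices and many parallel attention heads.

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Scaled dot-product self-attention over a short sequence.

    Each position attends to every position (including itself), so
    relationships between distant tokens are captured directly."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]  # similarity to each key
        weights = softmax(scores)                          # attention distribution
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Three toy 2-dimensional token embeddings attending to each other
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(x, x, x)
```

Because the attention weights for each position sum to 1, every output vector is a weighted blend of the value vectors, which is what lets the model mix context from across the whole sequence.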
Comparison of Leading Generative Model Architectures
Transformers have become a cornerstone for natural language processing and are currently the most popular architecture for generative AI models. They are followed by GANs – which are widely used in image synthesis and enhancement, VAEs – which are commonly employed for data generation and reconstruction, and Diffusion Models – which are gaining traction for their ability to effectively generate images and text.
Examples of Generative AI
Generative AI models have the ability to create new content in various forms, such as text, images, videos, audio, software code, and design. Here are some examples for each category:
- Text: Generative AI can produce human-like text based on specific prompts. Prominent examples include OpenAI’s GPT (GPT-3.5 and GPT-4), and Google’s PaLM 2, which are large language models (LLMs) that power popular chatbots like ChatGPT and Bard, respectively
- Images: These models can generate images ranging from realistic human faces and artistic creations to photorealistic scenes, all based on textual descriptions. Notable applications in this field are OpenAI’s DALL-E 2, Adobe Firefly, Midjourney, and Stable Diffusion
- Videos: Although currently limited to short clips, generative AI can create videos from textual descriptions. Early platforms in text-to-video generation include Kaiber, Runway, Genmo, and Pika Labs
- Audio: Generative AI is capable of creating audio content in the form of music and speech. This creation process can be guided by textual descriptions, musical notation, or seed audio files. For music generation, models such as Google’s MusicLM, OpenAI’s MuseNet, and Meta’s AudioCraft are widely utilized. In the field of text-to-speech, models like Google DeepMind’s WaveNet, ElevenLabs, and Tacotron are currently popular
- Design: Generative AI is employed in the creation and optimization of 3D models, facilitating prototyping, streamlining process optimization, and enhancing game design through text-based prompts. NVIDIA’s GET3D, DreamFusion, and RoomGPT are examples of applications that can create a diverse range of designs and related assets
ChatGPT and LLMs – A Key Example of Generative AI
ChatGPT, used by hundreds of millions of people across the globe, stands as a prominent example of generative AI. It can produce human-like text by responding to input prompts, utilizing the Transformer architecture. Built on OpenAI’s GPT (Generative Pre-Trained Transformer) models, ChatGPT is part of the large language model (LLM) family, and it is commonly employed for various natural language processing (NLP) tasks.
LLMs are deep learning algorithms capable of recognizing, summarizing, translating, predicting, and generating text, along with other content. These abilities are based on knowledge gleaned from extensive datasets. GPT-4, for example, is reported to contain more than 1 trillion parameters – the internal weights learned during training. The GPT models are engineered to predict the subsequent word in a text sequence, while the Transformer architecture adds context to each word through its attention mechanism.
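The "predict the next word" objective can be illustrated with a deliberately tiny stand-in: a bigram model that simply counts which word follows which in its training text, then predicts the most frequent follower. Real LLMs predict over subword tokens using billions of learned parameters and attention over long contexts, but the underlying training signal is the same next-token prediction.

```python
from collections import defaultdict

def train_bigrams(text):
    """Count which word follows which: a toy stand-in for the
    next-word-prediction objective used to train GPT models."""
    words = text.split()
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word(counts, word):
    """Return the most frequently observed next word, or None."""
    followers = counts.get(word)
    if not followers:
        return None
    return max(followers, key=followers.get)

corpus = "the model predicts the next word in the sequence"
model = train_bigrams(corpus)
```

Calling `next_word(model, "next")` returns `"word"`, because that is the only continuation the model has seen; an LLM does the same thing probabilistically over a vastly larger context and vocabulary.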
Market Size of Generative AI
The generative AI market size is projected by Boston Consulting Group (BCG) to reach $60 billion by 2025 and then double to $120 billion by 2027. This significant increase represents a 66% compound annual growth rate (CAGR) from 2022 to 2027. By 2025, generative AI is also expected to make up 30% of the total AI market.
These strong growth figures are further supported by McKinsey’s estimates on the broader economic impact of generative AI. According to their analysis, generative AI could contribute between $2.6 trillion to $4.4 trillion to the GDP (Gross Domestic Product) in advanced economies, amounting to 4% to 7% of the overall GDP.
Layers of Generative AI
For a more comprehensive understanding of the generative AI landscape, we analyze the technology’s value chain, dividing it into four interconnected layers that work together to create new content. These layers are the application layer, the platform layer, the model layer, and the infrastructure layer. Each of these plays a distinctive role in the entire process, enhancing the robust capabilities of generative AI.
Application Layer of Generative AI
The application layer in generative AI streamlines human interaction with artificial intelligence by allowing the dynamic creation of content. This is achieved through specialized algorithms that offer tailored and automated business-to-business (B2B) and business-to-consumer (B2C) applications and services, without users needing to directly access the underlying foundation models. The development of these applications can be undertaken by both the owners of the foundation models (such as OpenAI with ChatGPT) and third-party software companies that incorporate generative AI models (for example, Jasper AI).
Generalized, Domain-Specific, and Integrated Applications
The application layer of generative AI consists of three distinct sub-groupings: generalized applications, domain-specific applications, and integrated applications.
- Generalized Applications: This category encompasses software designed to perform a broad array of tasks, generating new content in various forms including text, images, videos, audio, software code, and design. Examples in this category include ChatGPT, DALL-E 2, GitHub Copilot, Character.ai (a chatbot service allowing users to create and converse with AI characters), and Jasper AI (an AI-powered writing tool)
- Domain-Specific Applications: These are software solutions tailored to meet the particular needs and requirements of specific industries, such as finance, healthcare, manufacturing, and education. These applications are more specialized and responsive in their respective domains, especially when companies train them on high-quality, unique, and proprietary data. Examples include BloombergGPT, an LLM developed by Bloomberg for financial data analysis, and Google’s Med-PaLM 2, an LLM trained on medical data to answer medical queries
- Integrated Applications: This sub-group consists of existing software solutions that have incorporated generative AI functionality to enhance their mainstream offerings. Major players include Microsoft 365, Salesforce CRM, and Adobe Creative Cloud. Examples of integrated generative AI tools are Microsoft 365 Copilot (an AI-powered assistant for various Microsoft products), Salesforce’s Einstein GPT (a generative AI CRM technology), and Adobe’s generative AI integrations with Photoshop
Platform Layer of Generative AI
The platform layer of generative AI focuses on providing access to large language models (LLMs) through a managed service. This service simplifies the fine-tuning and customization of general-purpose, pre-trained foundation models like OpenAI’s GPT. Although leading LLMs, such as GPT-4, can answer most questions out of the box using only the fixed dataset on which they were trained, fine-tuning can significantly enhance their capabilities for specific content domains.
Fine-tuning involves unlocking an existing LLM’s neural network for additional layers of training with new data. End users or companies can seamlessly integrate their own proprietary or customer-specific data into these models for targeted applications.
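Conceptually, fine-tuning amounts to resuming gradient descent from already-trained weights rather than starting from scratch. The sketch below uses a toy two-parameter linear model instead of a neural network (the datasets and learning rate are made up for illustration), but the mechanic is the same: "pre-train" on general data, then continue training briefly on new, domain-specific data.

```python
def train(data, w=0.0, b=0.0, lr=0.1, epochs=200):
    """Gradient descent on mean squared error for y = w*x + b.

    Passing in non-zero (w, b) mimics fine-tuning: the model resumes
    learning from pre-trained weights instead of random ones."""
    n = len(data)
    for _ in range(epochs):
        dw = db = 0.0
        for x, y in data:
            err = (w * x + b) - y
            dw += 2 * err * x / n
            db += 2 * err / n
        w -= lr * dw
        b -= lr * db
    return w, b

# "Pre-training" on a general dataset that follows y = 2x
w, b = train([(0, 0), (1, 2), (2, 4)])

# "Fine-tuning": a few domain-specific points shift the model toward y = 2x + 1
w, b = train([(0, 1), (1, 3)], w=w, b=b, epochs=100)
```

After the second call the model has adapted to the new data while retaining the slope it already learned, which is the economic appeal of fine-tuning over training from scratch.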
The ultimate objective of the platform layer is to simplify the use of LLMs for end users or companies and to reduce the associated costs. This approach eliminates the necessity to invest billions of dollars and years of effort in developing these models independently from scratch. Instead, users can pay monthly subscription fees or have it bundled with their Infrastructure as a Service (IaaS) offerings. Alongside this, users also gain access to valuable features such as security, privacy, and various platform tools, all managed in a streamlined manner.
Cloud Platforms for AI Model Fine-Tuning
Cloud service providers (CSPs) have developed platform services to allow companies to access necessary foundation models and to train and customize their own models for specific applications. The platform services include:
- Azure OpenAI Service: This cloud-based service offers access to OpenAI’s foundation models, allowing users to create applications within the Azure portal. Included are the GPT family of LLMs for text generation and Codex for code generation
- Amazon Bedrock: A platform that supports building and scaling generative AI applications, using foundation models like Anthropic’s Claude, Stability AI’s Stable Diffusion, and Amazon Titan
- Google Cloud’s Vertex AI: A managed ML platform, offering tools and services for building, training, and deploying generative AI models, including PaLM for text generation and Imagen for image generation
Open Source Platforms for AI Model Fine-Tuning
Open source solutions are also available to assist in the fine-tuning and customization of general-purpose and pre-trained foundation models. These include:
- Hugging Face: Recognized as a “model hub,” Hugging Face grants access to over 120,000 pre-trained transformer models and provides tools to fine-tune them for NLP tasks like question answering, text classification, and text generation
- TensorFlow: Created by Google, TensorFlow is an open source library for deep learning. It facilitates the building, training, and deployment of AI models, with features for image recognition, machine translation, and various decision-making applications
- PyTorch: Developed by Meta (formerly Facebook) Research, PyTorch is a Python-based ML framework. It is distinguished by strong GPU support and dynamic computation graphs, which allow models to be modified on the fly, with gradients computed via reverse-mode automatic differentiation
Model Layer of Generative AI
The model layer of generative AI starts with what is referred to as a foundation model. This large-scale machine learning model is commonly trained on unlabeled data using a Transformer architecture. Training and fine-tuning enable the foundation model to evolve into a versatile tool that can be adapted for a wide variety of tasks, supporting the capabilities of various generative AI applications.
At present, the market offers hundreds of foundation models capable of understanding various aspects such as language, vision, robotics, reasoning, and search. By the year 2027, Gartner predicts that foundation models will underpin 60% of NLP (Natural Language Processing) use cases. This marks a significant increase from less than 10% in 2022. This growth is expected to stem primarily from domain-specific models, which will be refined using general-purpose foundation models as their basis.
Foundation models can be broadly classified into two main categories: closed source (or proprietary) models and open source models.
- Closed Source Models: These models are owned and controlled by specific organizations like OpenAI, and the underlying source code, algorithms, training data, and parameters are kept private
- Open Source Models: In contrast, these models are accessible to everyone without restrictions. They encourage community collaboration and development, allowing for transparent examination and modification of the code
Closed Source Foundation Models
Closed source (or proprietary) foundation models are available to the public through an application programming interface (API). Third parties can utilize this API for their applications, querying and presenting information from the foundation model without the need to expend additional resources on training, fine-tuning, or running the model.
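A third-party application typically reaches a closed source model with a single HTTP request. The sketch below only assembles the JSON body of such a request; the payload shape follows OpenAI's chat completions endpoint, but the API key and endpoint here are placeholders, not working credentials, and no network call is made.

```python
import json

# Placeholder values: a real key is issued per account, and the endpoint
# is the model provider's published API URL.
API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = "sk-..."  # placeholder, not a real credential

def build_request(prompt, model="gpt-4"):
    """Assemble the JSON body a third-party application would POST,
    with the API key sent as a bearer token in the Authorization header."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
    })

body = build_request("Summarize the benefits of API access.")
```

The provider runs the model on its own infrastructure and returns generated text, so the caller never trains, hosts, or even sees the model's weights, which is precisely the closed source trade-off described above.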
These models often have access to proprietary training data and have priority access to cloud computing resources. Large cloud computing companies typically create closed source foundation models, as training these models requires a significant investment. Closed source models generate revenue by charging customers for API usage or subscription-based access.
Large language models (LLMs) like OpenAI’s GPT-4 and Google’s PaLM 2 are specific closed source foundation models that focus on natural language processing. They have been fine-tuned for applications like chatbots, such as ChatGPT and Bard. A non-language example is OpenAI’s DALL-E 2, a vision model that recognizes and generates images.
Open Source Foundation Models
Open source foundation models are developed collaboratively and are available for redistribution and modification, often with transparency into the training data and model-building process. Many are distributed without charge, although license terms and permitted uses vary from model to model.
Benefits of using open source models include:
- Complete control and privacy over data, which does not need to be shared with a closed source provider such as OpenAI
- Improved customization with specific prompting, fine-tuning, and filtering to optimize for various industries
- Cost-effective training and inferencing of domain-specific models (smaller models require less compute)
Examples of open source models are Meta’s Llama 2, Databricks’ Dolly 2.0, Stability AI’s Stable Diffusion XL, and Cerebras-GPT. For a comprehensive and up-to-date list, refer to Hugging Face’s Open LLM Leaderboard, which tracks, ranks, and evaluates open LLMs and chatbots.
Infrastructure Layer of Generative AI
The infrastructure layer of generative AI encompasses the vital components underlying large-scale foundation models. The key resources involved in this process are semiconductors, networking, storage, databases, and cloud services, all of which play crucial roles both in the initial training and the ongoing fine-tuning, customization, and inferencing of generative AI models. Generative AI models function through two primary phases:
- Training Phase: This is where learning occurs, typically within a cloud data center in an accelerated computing cluster. In this compute-intensive phase, a large language model (LLM) learns from a given dataset. Parameters are the internal variables that the model adjusts to represent the underlying patterns in the training data. Tokens refer to the individual pieces of text that the model processes, such as words or sub-words. For example, GPT-3 was trained on 300 billion tokens – with one word equal to roughly 1.33 tokens – sourced mainly from the Internet’s Common Crawl, Wikipedia, books, and articles
- Inference Phase: This is the process of actually using a trained AI model to generate user responses. Here, new text inputs are tokenized into individual units, and the model uses the parameters learned during training to interpret these tokens and generate corresponding outputs. Trained models require significant computing power, and deploying them close to end users (for example, in an edge data center) helps minimize response delays (latency), since real-time interaction is essential to keep users engaged
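Tokenization, the first step of both phases above, can be sketched with a toy greedy longest-match scheme, loosely in the spirit of the subword tokenizers (BPE, WordPiece) that real LLMs use. The four-entry vocabulary here is hypothetical; production vocabularies contain tens of thousands of entries learned from data. The sketch also shows why token counts exceed word counts, as in the GPT-3 figures above.

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenization (toy version).

    Each word is split into the longest vocabulary entries that prefix
    it; anything unknown falls back to single characters."""
    tokens = []
    for word in text.split():
        while word:
            # Take the longest vocabulary entry that prefixes the word
            for end in range(len(word), 0, -1):
                if word[:end] in vocab or end == 1:
                    tokens.append(word[:end])
                    word = word[end:]
                    break
    return tokens

vocab = {"token", "ization", "in", "practice"}
tokens = tokenize("tokenization in practice", vocab)
```

Here the three input words become four tokens, because "tokenization" is not in the vocabulary and is split into the known pieces "token" and "ization"; this is how models handle words they have never seen whole.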
Overall, the accuracy of generative AI relies on the size of the LLM and the volume of training data used. These factors, in turn, necessitate a robust infrastructure composed of semiconductors, networking, storage, databases, and cloud services.
Semiconductors enable the underlying hardware for computation, facilitating the processing and complex calculations required for generative AI models. They are essential materials used in the fabrication of various types of accelerated computing processing units, namely graphics processing units (GPUs), application-specific integrated circuits (ASICs) – including tensor processing units (TPUs) – and field-programmable gate arrays (FPGAs).
Key examples of these hardware accelerators – used by Microsoft, Amazon, Meta, and Google – are as follows:
- GPU: NVIDIA’s Hopper (H100) GPUs – which Microsoft uses heavily
- ASIC: Amazon’s AWS Trainium (Trn1) and AWS Inferentia (Inf1) ASICs, which are designed for training and inference, respectively. Also, the Meta Training and Inference Accelerator (MTIA) is another custom ASIC
- TPU: Google Cloud’s TPU v4
- FPGA: Amazon’s AWS F1 instance is powered by FPGAs
These hardware accelerators are particularly well-suited for tasks that can be broken down into smaller, parallel tasks, such as those found in generative AI.
Networking plays a crucial role in generative AI, facilitating the efficient exchange of data between AI systems. This is particularly important when dealing with high-bandwidth needs in server-to-server communication, also known as east-west traffic, within accelerated computing clusters.
Prominent networking technologies for AI workloads, such as InfiniBand and Ethernet, are complemented by high-bandwidth interconnects like NVLink (developed by NVIDIA). Together, these technologies provide solutions that enable connections between both internal and external components of AI clusters. Their coordination ensures efficient data transfer across cloud data centers, with high throughput and minimal latency.
Storage plays a vital role in the training and inference phases of generative AI models, enabling the retention of vast amounts of training data, model parameters, and intermediate computations. Parallel storage systems enhance the overall data transfer rate by providing simultaneous access to multiple data paths or storage devices. This functionality allows large quantities of data to be read or written at a rate much faster than that achievable with a single path.
In the context of generative AI training, there’s a need to read source datasets at extremely high speeds and to write out parameter checkpoints as swiftly as possible. During inference, where trained models respond to user requests, a high degree of read performance is essential. This capability enables the quick use of an LLM, utilizing billions of stored parameters, to generate the most appropriate response.
Databases, particularly non-relational (NoSQL) types, are vital for generative AI. They facilitate the efficient storage and retrieval of large, unstructured datasets required to train complex models like Transformers. The use of Azure Cosmos DB – Microsoft’s NoSQL database within Azure – by OpenAI for dynamically scaling the ChatGPT service underscores the need for databases that are both highly performant and scalable in the realm of generative AI.
Cloud service providers (CSPs) such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud offer computing resources (powered by semiconductors), networking, storage, databases, and various other services that enable the training and deployment of complex generative AI models. Within their data centers – divided into cloud regions and availability zones – they house the essential physical hardware, such as servers and IT equipment, making these operations both scalable and readily accessible.
These CSPs specialize in delivering Infrastructure as a Service (IaaS) offerings, tailor-made for the training and deployment of generative AI models. Examples of their specific offerings are as follows:
- AWS: Amazon offers EC2 P5 instances, powered by NVIDIA H100 Tensor Core GPUs; deployed in EC2 UltraClusters, they can deliver up to 20 exaFLOPS of aggregate compute performance
- Microsoft Azure: They offer NVads A10 v5 instances, which are powered by the NVIDIA A10 Tensor Core GPU, providing up to 250 teraFLOPS of compute performance
- Google Cloud: Through G2 virtual machines, they offer NVIDIA’s L4 Tensor Core GPU, capable of up to 242 teraFLOPS of compute performance