Here’s How You Can Build Your Own Generative AI Project: A Step-by-Step Guide

Bhavya Kaushik
9 min read · Dec 25, 2023
Created with Stable Diffusion

Let’s face it — there are already plenty of GenAI models that are just a quick Google search away. While my curious mind kept me experimenting with such models, I realized that I could also use these models to create my own GenAI project — and so can you.

It’s quick, simple, and easy, even for beginners who are just starting with Python. To make things easier for you, I have put together a comprehensive, step-by-step guide on how to build your own text-to-image GenAI project. Needless to say, I will use readily available tools and resources, keeping the project accessible even to those without extensive machine learning experience.

Understanding Generative AI Models

First, let’s start with the basics!

Generative models are a class of AI algorithms that aim to create new data samples that resemble a given dataset. In the case of text-to-image generation, the model learns the patterns and features present in a set of text descriptions and generates corresponding images. The project we’ll build will use a popular type of generative model called a Generative Adversarial Network (GAN).

So — what is a GAN?

A GAN consists of two neural networks, a generator and a discriminator, trained simultaneously through adversarial training. The generator produces data, while the discriminator evaluates the generated data against real data. The goal is for the generator to create data that is indistinguishable from real data, while the discriminator becomes progressively better at telling real data from generated data.
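One concrete way to see the adversarial setup is through the labels each network is trained against for the same batch of generated images; this is exactly the trick the training loop in Step 6 relies on. A minimal illustration:

import numpy as np

# For one batch of generated (fake) images, the two networks chase opposite targets:
batch_size = 4
labels_for_discriminator = np.zeros((batch_size, 1))  # the discriminator is trained to call fakes "fake" (0)
labels_for_generator = np.ones((batch_size, 1))       # the generator is trained so fakes get scored as "real" (1)

print(labels_for_discriminator.ravel())  # [0. 0. 0. 0.]
print(labels_for_generator.ravel())      # [1. 1. 1. 1.]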

Step 1: Setting Up Your Development Environment

Before diving into the project, you need to set up your development environment. We’ll use Python and some popular libraries for machine learning. Ensure you have the following installed:

1. Python: Install Python from its website, python.org

2. TensorFlow: TensorFlow is a powerful machine learning library. Install it using:

pip install tensorflow

3. Keras: Keras is a high-level neural networks API that runs on top of TensorFlow. Recent TensorFlow releases already bundle it as tf.keras, so this step may be optional, but you can install the standalone package using:

pip install keras

4. NumPy: NumPy is a fundamental package for scientific computing with Python. Install it using:

pip install numpy
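
Optionally, you can run a quick sanity check to confirm everything is in place; this short script simply imports the libraries and prints their versions:

import tensorflow as tf
import keras
import numpy as np

print("TensorFlow:", tf.__version__)
print("Keras:", keras.__version__)
print("NumPy:", np.__version__)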

Now that your environment is set up, let’s move on to the actual project.

Step 2: Collecting and Preparing Data

For our text-to-image generation project, we need a dataset containing pairs of text descriptions and corresponding images. To keep things simple, we’ll use a small dataset. You can later experiment with larger datasets for more diverse results.

Choose a Dataset

For this guide, let’s use the Oxford-102 Flower Dataset, which contains 102 flower categories. Download and extract the dataset.

Preprocess the Data

Write a Python script to preprocess the data. Organize it into a format where each image is paired with a text description. Resize the images to a consistent size, and preprocess the text by tokenizing and converting it into a format suitable for the model.

Here is an example code snippet for preprocessing the Oxford-102 Flower Dataset, assuming you have already downloaded and extracted the dataset. This code will organize the data into pairs of text descriptions and corresponding image paths:

import os
import pandas as pd
from sklearn.model_selection import train_test_split

# Path to the downloaded Oxford-102 Flower Dataset
dataset_path = '/path/to/oxford-102-flowers'

# Create a dataframe to store the pairs of text descriptions and image paths
data = {'text_description': [], 'image_path': []}

# Iterate through each flower category
for category in os.listdir(dataset_path):
category_path = os.path.join(dataset_path, category)

# Check if it's a directory
if os.path.isdir(category_path):
# Iterate through images in the category
for image_name in os.listdir(category_path):
image_path = os.path.join(category_path, image_name)

# Append the text description and image path to the dataframe
data['text_description'].append(f"{category} flower")
data['image_path'].append(image_path)

# Create a dataframe from the data
df = pd.DataFrame(data)

# Split the dataset into training and testing sets
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# Display the first few rows of the training dataset
print("Training Dataset:")
print(train_df.head())

# Display the first few rows of the testing dataset
print("\nTesting Dataset:")
print(test_df.head())

# Save the dataframes to CSV files
train_df.to_csv('/path/to/train_dataset.csv', index=False)
test_df.to_csv('/path/to/test_dataset.csv', index=False)

This script assumes that the images are organized into subdirectories named after the flower categories within the main dataset directory. The code creates a dataframe where each row contains a text description (formed by appending “ flower” to the category name) and the corresponding image path. It then splits the dataset into training and testing sets using the train_test_split function from scikit-learn.

Make sure to replace '/path/to/oxford-102-flowers', '/path/to/train_dataset.csv', and '/path/to/test_dataset.csv' with the actual paths on your system. Additionally, you may need to adapt the script based on the specific structure of the Oxford-102 Flower Dataset or the dataset you are working with.
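
The training loop in Step 6 will also need the images themselves as a NumPy array (train_images), not just their paths. Below is a minimal sketch of how you might load them from the CSV created above; the 28x28 grayscale size and the [-1, 1] pixel scaling are assumptions chosen to match the simple GAN we build next, not something dictated by the dataset:

import numpy as np
import pandas as pd
from PIL import Image

IMG_SIZE = 28  # assumed size; must match the generator's output resolution

def load_images(csv_path, img_size=IMG_SIZE):
    df = pd.read_csv(csv_path)
    images = []
    for path in df['image_path']:
        img = Image.open(path).convert('L')      # grayscale, to match a single-channel model
        img = img.resize((img_size, img_size))
        images.append(np.asarray(img, dtype='float32'))
    images = np.stack(images)[..., np.newaxis]   # shape: (num_images, img_size, img_size, 1)
    return images / 127.5 - 1.0                  # scale pixels to [-1, 1] for the tanh output

train_images = load_images('/path/to/train_dataset.csv')
print(train_images.shape)

If you prefer color images or a higher resolution, adjust both this loader and the generator and discriminator architectures accordingly.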

Step 3: Building the Generative Model

Now that the data is ready, let’s move on to building the generative model using a GAN architecture.

Define the Generator

Below is an example code snippet for defining the generator in Keras. This generator will take random noise as input and generate an image. Keep in mind that the architecture and hyperparameters can be adjusted based on your specific project requirements and experimentation:

from keras.models import Sequential
from keras.layers import Dense, Reshape, BatchNormalization, LeakyReLU, UpSampling2D, Conv2D

def build_generator(latent_dim):
    model = Sequential()

    # Input layer: project the noise vector and reshape it into a 7x7 feature map
    model.add(Dense(128 * 7 * 7, input_dim=latent_dim))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Reshape((7, 7, 128)))

    # Upsampling layers: 7x7 -> 14x14 -> 28x28
    model.add(UpSampling2D(size=(2, 2)))
    model.add(Conv2D(128, kernel_size=(3, 3), padding='same'))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2))

    model.add(UpSampling2D(size=(2, 2)))
    model.add(Conv2D(64, kernel_size=(3, 3), padding='same'))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2))

    # Output layer: a single-channel 28x28 image with pixel values in [-1, 1]
    model.add(Conv2D(1, kernel_size=(3, 3), activation='tanh', padding='same'))

    return model

# Set the size of the random noise vector
latent_dim = 100

# Build the generator
generator = build_generator(latent_dim)

# Display the generator architecture
generator.summary()

This script defines a simple generator architecture using a Sequential model in Keras. The generator takes random noise of size latent_dim as input and gradually upsamples it to generate an image; with the layer sizes above, the output is a 28x28 single-channel image. Adjust the architecture and hyperparameters based on your specific requirements and experimentation. The final layer uses the tanh activation function to ensure the generated images are in the range of [-1, 1].

Feel free to experiment with different architectures, layer sizes, and activation functions to achieve the desired results for your text-to-image generation project.

Define the Discriminator

Similarly, you can use the following example code snippet for defining the discriminator in Keras. The discriminator evaluates whether an image is real or generated. Like the generator code, you can experiment with the architecture and hyperparameters to achieve the desired results:

from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense, BatchNormalization, LeakyReLU

def build_discriminator(img_shape):
    model = Sequential()

    # Input layer
    model.add(Conv2D(64, kernel_size=(3, 3), strides=(2, 2), padding='same', input_shape=img_shape))
    model.add(LeakyReLU(alpha=0.2))

    # Hidden layers
    model.add(Conv2D(128, kernel_size=(3, 3), strides=(2, 2), padding='same'))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2))

    model.add(Conv2D(256, kernel_size=(3, 3), strides=(2, 2), padding='same'))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2))

    model.add(Flatten())

    # Output layer
    model.add(Dense(1, activation='sigmoid'))

    return model

# Set the image shape to match the generator's output (28x28 grayscale in this guide)
img_shape = (28, 28, 1)

# Build the discriminator
discriminator = build_discriminator(img_shape)

# Display the discriminator architecture
discriminator.summary()

This script defines a simple discriminator architecture using a Sequential model in Keras. The discriminator takes an image as input and processes it through convolutional layers to determine whether the image is real or generated. The final layer uses the sigmoid activation function to produce a binary output.

Adjust the architecture and hyperparameters based on your specific project requirements and experimentation. You may need to modify the img_shape variable if you change the image size your generator produces or the number of channels in your dataset images.

Step 4: Combine Generator and Discriminator

Create the GAN model by combining the generator and discriminator. During training, the generator’s weights will be updated to generate more realistic images.

The following code combines the generator and discriminator to create the GAN (Generative Adversarial Network) model in Keras. This model is responsible for training the generator while simultaneously updating the discriminator:

from keras.layers import Input
from keras.models import Model

def build_gan(generator, discriminator):
    # Make the discriminator non-trainable during the generator training
    discriminator.trainable = False

    # Connect the generator and discriminator
    gan_input = Input(shape=(latent_dim,))
    generated_img = generator(gan_input)
    gan_output = discriminator(generated_img)

    # Create the GAN model
    gan = Model(gan_input, gan_output)

    return gan

# Build the GAN model by combining the generator and discriminator
gan = build_gan(generator, discriminator)

# Display the GAN architecture
gan.summary()

In this script, we create a GAN model by connecting the generator’s output to the discriminator’s input. During the training of the GAN, the generator’s weights will be updated to generate more realistic images, while the discriminator’s weights remain fixed. This adversarial training process helps the generator improve its ability to produce images that are indistinguishable from real ones.

Make sure you have already defined the generator and discriminator models before running this code. The latent_dim variable should be set to the size of the random noise vector used by the generator.

Step 5: Compile the Model

You are almost there! The following code compiles the GAN model by specifying the loss function and optimizer. We’ll use binary cross-entropy as the loss function and Adam as the optimizer for both the discriminator and the combined GAN. One detail to watch: Keras locks in the trainable flag when a model is compiled, so compile the discriminator while it is trainable, and only then freeze it and compile the GAN:

from keras.optimizers import Adam

# Set the learning rate for the optimizer
learning_rate = 0.0002

# Compile the discriminator while its weights are still trainable
discriminator.trainable = True
discriminator.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate), metrics=['accuracy'])

# Freeze the discriminator inside the combined model, then compile the GAN
discriminator.trainable = False
gan.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate))

In this script, we compile both the discriminator and the GAN using the binary cross-entropy loss function, which is suitable for binary classification problems. The optimizer used is Adam, a popular choice for training deep neural networks.

You can experiment with different loss functions and optimizers based on your project requirements and the characteristics of your dataset. Additionally, adjusting the learning rate is an essential hyperparameter that can impact the training stability and convergence of the model.

Step 6: Train the Model

Training a GAN involves iteratively training the generator and discriminator in a loop. Here’s a basic example of how you can train the model:

import numpy as np

# Set the number of training iterations and batch size
# (train_images is the NumPy array of preprocessed images from Step 2)
epochs = 5000
batch_size = 64

# Set the labels for real and fake images
real_labels = np.ones((batch_size, 1))
fake_labels = np.zeros((batch_size, 1))

for epoch in range(epochs):
    # Select a random batch of real images
    idx = np.random.randint(0, train_images.shape[0], batch_size)
    real_images = train_images[idx]

    # Generate a batch of fake images
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    generated_images = generator.predict(noise)

    # Train the discriminator on real and fake images
    d_loss_real = discriminator.train_on_batch(real_images, real_labels)
    d_loss_fake = discriminator.train_on_batch(generated_images, fake_labels)
    d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

    # Train the generator to fool the discriminator
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    g_loss = gan.train_on_batch(noise, real_labels)

    # Display the progress every 100 iterations
    if epoch % 100 == 0:
        print(f"Epoch {epoch}/{epochs} [D loss: {d_loss[0]} | D accuracy: {100 * d_loss[1]}] [G loss: {g_loss}]")

# Save the trained generator model
generator.save('/path/to/trained_generator.h5')

This script trains the GAN by alternating between training the discriminator on real and fake images and training the generator to fool the discriminator. Note that each loop iteration processes a single batch, so epochs here really counts training steps rather than full passes over the dataset. The train_on_batch method is used for both the discriminator and the GAN.

Make sure to replace '/path/to/trained_generator.h5' with the desired path to save the trained generator model. Additionally, you may need to adapt the code based on the specifics of your dataset and project requirements.

It’s important to note that training GANs can be challenging, and hyperparameter tuning is often necessary for stable and effective training. Experiment with different batch sizes, learning rates, and architectures to achieve the best results for your text-to-image generation project.

Step 7: Generating Images

Once the GAN is trained, you can use the generator to generate new images based on random noise. Here’s a code snippet to generate images using the trained generator:

import matplotlib.pyplot as plt

# Function to generate and display images
def generate_images(generator, rows, cols, latent_dim):
    noise = np.random.normal(0, 1, (rows * cols, latent_dim))
    generated_images = generator.predict(noise)

    # Rescale images from [-1, 1] to [0, 1]
    generated_images = 0.5 * generated_images + 0.5

    # Display generated images
    fig, axs = plt.subplots(rows, cols, figsize=(10, 10))
    cnt = 0
    for i in range(rows):
        for j in range(cols):
            axs[i, j].imshow(generated_images[cnt, :, :, 0], cmap='gray')
            axs[i, j].axis('off')
            cnt += 1
    plt.show()

# Set the number of rows and columns for the grid of generated images
rows = 4
cols = 4

# Generate and display images using the trained generator
generate_images(generator, rows, cols, latent_dim)

This code defines a function generate_images that takes the trained generator, the desired number of rows and columns for the grid of generated images, and the size of the random noise vector. It then generates images using random noise and displays them in a grid using Matplotlib.

Adjust the rows and cols variables to control the number of generated images to display in the grid. After running this code, you should see a grid of newly generated images based on the trained generator.

Step 8: Fine-Tuning and Experimentation

That’s it! You’ve built a basic GAN-based image generator. (Note that this simple GAN generates images from random noise alone; conditioning it on the text descriptions prepared in Step 2, so that a prompt actually steers the output, is a natural next step toward true text-to-image generation.) Now, take your project further by experimenting with different hyperparameters, architectures, or even trying different datasets. Here are some suggestions:

Hyperparameter Tuning

Experiment with learning rates, batch sizes, and the number of training epochs to find the best configuration for your specific project.
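
For instance, you could wrap the build-and-train steps in a loop over candidate settings. In the sketch below, train_gan is a hypothetical helper (not defined in this guide) standing in for the Step 6 training loop refactored into a reusable function:

from itertools import product

# Candidate hyperparameter values to compare
learning_rates = [0.0002, 0.0001]
batch_sizes = [32, 64]

for lr, bs in product(learning_rates, batch_sizes):
    print(f"Training with learning_rate={lr}, batch_size={bs}")
    # results = train_gan(train_images, latent_dim, learning_rate=lr, batch_size=bs, epochs=2000)
    # Compare runs by their final losses and, more importantly, by visually inspecting samples.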

Architecture Modifications

Explore different architectures for the generator and discriminator. You can add more layers, adjust the size of layers, or even try different types of layers.
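
As one example, a common tweak is adding Dropout layers to the discriminator so it does not overpower the generator too quickly; the sketch below is only an illustration of the idea, not a tuned architecture for this dataset:

from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense, LeakyReLU, Dropout

def build_discriminator_with_dropout(img_shape, dropout_rate=0.3):
    model = Sequential()
    model.add(Conv2D(64, kernel_size=(3, 3), strides=(2, 2), padding='same', input_shape=img_shape))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(dropout_rate))
    model.add(Conv2D(128, kernel_size=(3, 3), strides=(2, 2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(dropout_rate))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    return model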

Use a Different Dataset

Try using a different dataset to see how the model performs with diverse input data.

Over to you!

As you can see, building your own generative AI project can be a rewarding journey that combines creativity and technical skills. Remember, this is just the beginning. Continue exploring, tweaking, and refining your project to achieve even more impressive results.

Happy coding!

