Generating alt text using AI

When I started making this site in Jekyll, I quickly amassed hundreds of project photos.

I decided I wanted to add alt text to all the images.

But how could I do that in a way that scales to hundreds of images?

Manual alt text creation is tedious, especially for sites with many images. It’s easy to neglect, but crucial for accessibility.

I created a Python script that uses Google’s Gemini API to generate alt text for images automatically and stores it in a structured file that Jekyll’s templating system can read.

How it works

The script reads image entries from a YAML file and processes only those missing the ‘alt’ key. For each image:

1. Loads the image using PIL
2. Resizes it if needed to meet API requirements
3. Sends it to Google Gemini with a carefully crafted prompt
4. Updates the YAML file with the generated alt text
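
The "only those missing the alt key" part can be sketched roughly as follows. This is a minimal illustration rather than the script itself: it assumes the YAML file maps each image path to a dictionary of fields (the layout of the sample results further down), and the names fill_missing_alt, process_yaml_file, and describe are hypothetical.

```python
def fill_missing_alt(metadata, describe):
    """Add an 'alt' value to every entry lacking one.

    metadata: dict mapping image path -> dict of fields (assumed layout).
    describe: any callable that takes a path and returns alt text.
    Returns the list of paths that were updated.
    """
    updated = []
    for path, fields in metadata.items():
        fields = fields or {}  # a bare "path:" entry loads from YAML as None
        if "alt" in fields:
            continue  # already described; leave hand-edited text alone
        fields["alt"] = describe(path)
        metadata[path] = fields
        updated.append(path)
    return updated


def process_yaml_file(yaml_path, describe):
    """Load the YAML file, fill in missing alt text, and write it back."""
    import yaml  # PyYAML; imported here so the helper above has no dependencies

    with open(yaml_path, encoding="utf-8") as f:
        metadata = yaml.safe_load(f) or {}

    updated = fill_missing_alt(metadata, describe)
    if updated:  # only rewrite the file when something changed
        with open(yaml_path, "w", encoding="utf-8") as f:
            yaml.safe_dump(metadata, f, allow_unicode=True, sort_keys=True)
    return updated
```

Because entries that already have an alt value are skipped, rerunning a script built this way would only spend API calls on new images, and manual edits would survive.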

Here’s the basic setup:

import google.generativeai as genai
import yaml
from PIL import Image

PROMPT = """
Describe this image concisely to serve as an alt text for accessibility purposes.
Focus on important visual elements, context, and the main subject.
Keep it brief but descriptive, and start with the most important element.
Don't mention that it's an image or photo or give multiple options.
Avoid any personally identifying information.
"""

I set the script up to read its configuration from _config.yml alongside the rest of Jekyll’s configuration.

alt_text_generator:
  google_api_key: YOUR_API_KEY_HERE
  model_version: gemini-1.5-flash
  max_retries: 3
  max_size: 1024
  backoff_time: 10

image_metadata:
  yaml_path: _data/images.yml
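
For illustration, here is one way a script could turn that configuration into a ready-to-use model. It is a sketch under assumptions: read_settings and build_model are hypothetical names, the fallback defaults simply mirror the values shown above, and the config is assumed to be already parsed into a dictionary (for example with yaml.safe_load). The genai.configure and genai.GenerativeModel calls come from the google-generativeai package.

```python
def read_settings(config):
    """Pull the generator settings out of an already-parsed _config.yml dict."""
    gen = config["alt_text_generator"]
    return {
        "api_key": gen["google_api_key"],  # required, so no default
        # The defaults below are assumptions that mirror the sample config.
        "model_version": gen.get("model_version", "gemini-1.5-flash"),
        "max_retries": int(gen.get("max_retries", 3)),
        "max_size": int(gen.get("max_size", 1024)),
        "backoff_time": float(gen.get("backoff_time", 10)),
        "yaml_path": config["image_metadata"]["yaml_path"],
    }


def build_model(settings):
    """Configure the Gemini client and return a model object."""
    # Imported lazily so read_settings can be used without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=settings["api_key"])
    return genai.GenerativeModel(settings["model_version"])
```

Keeping the parsing separate from the API client makes the settings easy to check without a network connection or a valid key.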

I chose Google Gemini because, unlike many of its current competitors, it offers a generous free API tier.
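
Free tiers are rate-limited, and the max_retries and backoff_time settings suggest the script retries failed requests. One way that could look, purely as a sketch: the call_with_retries helper, the linearly growing wait, and the choice to retry on any exception are all assumptions, not details taken from the script.

```python
import time


def call_with_retries(func, max_retries=3, backoff_time=10, sleep=time.sleep):
    """Call func() up to max_retries times, pausing between failed attempts.

    The wait grows linearly (backoff_time, 2 * backoff_time, ...), which is
    an assumed policy. `sleep` is injectable so the helper can be tested
    without actually waiting.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")

    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts: surface the last error
            sleep(backoff_time * attempt)
```

A request would then be wrapped as call_with_retries(lambda: model.generate_content([PROMPT, img]), max_retries, backoff_time).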

The script ultimately boils down to this:

def generate_alt_text(image_path, model, max_size):
    # Wrapper name and signature are illustrative; PROMPT and
    # resize_image_if_needed are defined elsewhere in the script.
    with Image.open(image_path) as img:
        # Convert to RGB if needed (e.g., for PNG with transparency)
        if img.mode != 'RGB':
            img = img.convert('RGB')

        # Resize if needed
        img = resize_image_if_needed(img, max_size)

        # Generate the alt text
        response = model.generate_content([PROMPT, img])

        # Extract and return the generated text
        return response.text.strip()
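
The resize_image_if_needed helper is not shown above. A plausible sketch, assuming max_size caps the longer side in pixels and the aspect ratio is preserved (fit_within is a hypothetical helper introduced here to keep the arithmetic separate from PIL):

```python
def fit_within(width, height, max_size):
    """Return (width, height) scaled so the longer side is at most max_size."""
    longest = max(width, height)
    if longest <= max_size:
        return width, height  # already small enough
    scale = max_size / longest
    # max(1, ...) guards against a zero-pixel side on extreme aspect ratios
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_image_if_needed(img, max_size):
    """Return img unchanged if it is small enough, else a downscaled copy.

    img is expected to be a PIL.Image.Image; only .size and .resize() are used.
    """
    new_size = fit_within(*img.size, max_size)
    if new_size == img.size:
        return img
    return img.resize(new_size)
```

With the max_size of 1024 from the config, a 2048x1024 photo would be sent as 1024x512, while an 800x600 photo would pass through untouched.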

I store the results in a file called _data/image_metadata.yml and interpolate them into my pages using an _include template.

The results are perfectly usable!

assets/installation_images/goggles/bottle_view.jpg:
  alt: A transparent orange container holding a circuit board with multicolored wires
    connected to a white component.
assets/installation_images/goggles/bottlecap_tack.jpg:
  alt: A small circuit board with wires attached, secured with modeling clay inside
    a white plastic container.
assets/installation_images/goggles/enabled.jpg:
  alt: White ski goggles with multicolored LED lights embedded in the frame, and a
    small attached battery pack.
assets/installation_images/goggles/finished.jpg:
  alt: White ski goggles with dark mirrored lenses and a small device attached to
    the strap.
assets/installation_images/goggles/heatgun.jpg:
  alt: A heat gun is used to join sections of clear acrylic tubing.
assets/installation_images/goggles/heatshrink1.jpg:
  alt: LED light strip being assembled into a zig-zag shape on a notebook page.
assets/installation_images/goggles/heatshrink2.jpg:
  alt: LED strips, adhered with putty, forming a curved shape on notebook paper with
    a sketched outline.

I can of course edit any entry after the fact. I also set things up so that if I add a title: field to an entry, that value becomes the image’s title attribute.

This makes maintaining 300+ images a lot easier!