Generating alt text using AI
Mar 16, 2025

When I started making this site in Jekyll, I quickly amassed hundreds of project photos.
I decided I wanted to add alt text to all the images.
But how could I do that in a scalable way for hundreds of images?
Manual alt text creation is tedious, especially for sites with many images. It’s easy to neglect, but crucial for accessibility.
I created a Python script that uses Google’s Gemini API to automatically generate alt text for images and store them systematically in a way that’s available to Jekyll’s templating system.
How it works
The script reads image entries from a YAML file and processes only those missing the ‘alt’ key. For each image:
- It loads the image using PIL
- Resizes it if needed to meet API requirements
- Sends it to Google Gemini with a carefully crafted prompt
- Updates the YAML file with the generated alt text
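The selection step can be sketched without touching the API at all. This is a minimal sketch, assuming the YAML file parses to a mapping from image path to a dictionary of fields; `entries_missing_alt`, `fill_missing_alt`, and `describe` are hypothetical names, not taken from the actual script, and the sample paths are made up.

```python
def entries_missing_alt(metadata):
    """Return the image paths whose entry has no 'alt' key yet."""
    return [path for path, fields in metadata.items()
            if "alt" not in (fields or {})]


def fill_missing_alt(metadata, describe):
    """Call describe(path) for each image lacking alt text and store the result.

    `describe` stands in for the Gemini call; it is a hypothetical callable
    that takes an image path and returns a string.
    """
    updated = []
    for path in entries_missing_alt(metadata):
        if metadata[path] is None:  # a bare "path:" entry with no fields
            metadata[path] = {}
        metadata[path]["alt"] = describe(path)
        updated.append(path)
    return updated


# Parsed form of a small metadata file (what yaml.safe_load would return).
metadata = {
    "assets/a.jpg": {"alt": "A heat gun on a workbench."},
    "assets/b.jpg": {"title": "Goggles"},
    "assets/c.jpg": None,
}
updated = fill_missing_alt(metadata, lambda path: f"Description of {path}")
print(updated)
```

Because entries that already have alt text are skipped, the script can be re-run after adding new images without paying for images it has already described.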
Here’s the basic setup:
import google.generativeai as genai
import yaml
from PIL import Image
PROMPT = """
Describe this image concisely to serve as an alt text for accessibility purposes.
Focus on important visual elements, context, and the main subject.
Keep it brief but descriptive, and start with the most important element.
Don't mention that it's an image or photo or give multiple options.
Avoid any personally identifying information.
"""
I set the script up to read its configuration from _config.yml, alongside the rest of Jekyll’s configuration.
alt_text_generator:
  google_api_key: YOUR_API_KEY_HERE
  model_version: gemini-1.5-flash
  max_retries: 3
  max_size: 1024
  backoff_time: 10
image_metadata:
  yaml_path: _data/images.yml
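Reading these settings might look like the following. This is a sketch, assuming the generator's keys are nested under alt_text_generator and that _config.yml has already been parsed into a dictionary (for example with `yaml.safe_load`); `DEFAULTS` and `resolve_settings` are hypothetical names, and the default values simply mirror the example configuration.

```python
# Fallback values, copied from the example configuration (an assumption).
DEFAULTS = {
    "model_version": "gemini-1.5-flash",
    "max_retries": 3,
    "max_size": 1024,
    "backoff_time": 10,
}


def resolve_settings(config):
    """Merge the alt_text_generator section of a parsed config with defaults."""
    section = config.get("alt_text_generator") or {}
    settings = {**DEFAULTS, **section}
    if not settings.get("google_api_key"):
        raise ValueError("alt_text_generator.google_api_key is not set")
    return settings


settings = resolve_settings({
    "title": "My site",
    "alt_text_generator": {"google_api_key": "YOUR_API_KEY_HERE", "max_size": 768},
})
print(settings["max_size"], settings["max_retries"])
```

Keys present in the config override the defaults, so only the API key is strictly required.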
I chose Google Gemini because, unlike many of its current competitors, it offers a generous free API tier.
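The max_retries and backoff_time settings suggest a retry loop around the API call, which is useful when requests are rate limited. A minimal sketch of such a loop, assuming a fixed wait between attempts; `call_with_retries` is a hypothetical helper, and the real script may back off differently or catch narrower exception types.

```python
import time


def call_with_retries(func, max_retries=3, backoff_time=10, sleep=time.sleep):
    """Call func(); on failure wait backoff_time seconds and try again.

    Makes at most max_retries attempts and re-raises the last error.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(backoff_time)


# Demonstration with a function that fails twice before succeeding.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("rate limited")
    return "alt text"


waits = []
result = call_with_retries(flaky, max_retries=3, backoff_time=10, sleep=waits.append)
print(result, waits)
```

In the script, `func` would wrap the call to the model, so one failed request does not abort the whole batch.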
The script ultimately boils down to this:
with Image.open(image_path) as img:
    # Convert to RGB if needed (e.g., for PNG with transparency)
    if img.mode != 'RGB':
        img = img.convert('RGB')
    # Resize if needed
    img = resize_image_if_needed(img, max_size)
    # Generate the alt text
    response = model.generate_content([PROMPT, img])
    # Extract and return the generated text
    return response.text.strip()
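The resize_image_if_needed helper is not shown in the excerpt. One plausible version, assuming max_size caps the longer side in pixels and the aspect ratio is preserved; this is a guess at its behavior, not the script's actual code. It relies only on the `size` attribute and `resize` method that PIL images provide.

```python
def scaled_size(width, height, max_size):
    """Return (width, height) scaled so the longer side is at most max_size."""
    longest = max(width, height)
    if longest <= max_size:
        return width, height
    scale = max_size / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_image_if_needed(img, max_size):
    """Downscale a PIL-style image if either side exceeds max_size."""
    new_size = scaled_size(*img.size, max_size)
    if new_size == tuple(img.size):
        return img  # already small enough, return unchanged
    return img.resize(new_size)


print(scaled_size(4000, 3000, 1024))  # (1024, 768)
print(scaled_size(800, 600, 1024))    # (800, 600)
```

Images that already fit are returned untouched, so small images are never upscaled.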
I store the results in a file called _data/image_metadata.yml, whose values I interpolate into pages using an _include template.
The results are perfectly usable!
assets/installation_images/goggles/bottle_view.jpg:
  alt: A transparent orange container holding a circuit board with multicolored wires
    connected to a white component.
assets/installation_images/goggles/bottlecap_tack.jpg:
  alt: A small circuit board with wires attached, secured with modeling clay inside
    a white plastic container.
assets/installation_images/goggles/enabled.jpg:
  alt: White ski goggles with multicolored LED lights embedded in the frame, and a
    small attached battery pack.
assets/installation_images/goggles/finished.jpg:
  alt: White ski goggles with dark mirrored lenses and a small device attached to
    the strap.
assets/installation_images/goggles/heatgun.jpg:
  alt: A heat gun is used to join sections of clear acrylic tubing.
assets/installation_images/goggles/heatshrink1.jpg:
  alt: LED light strip being assembled into a zig-zag shape on a notebook page.
assets/installation_images/goggles/heatshrink2.jpg:
  alt: LED strips, adhered with putty, forming a curved shape on notebook paper with
    a sketched outline.
I can, of course, edit any of the entries after the fact. I also made it so that if I add a title: field, it is used as the title attribute for the image.
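The lookup itself is simple. The real template is written in Liquid; the Python below only illustrates the logic under that caveat, and `img_tag` is a hypothetical name.

```python
from html import escape


def img_tag(path, metadata):
    """Build an <img> tag, adding a title attribute only when the entry has one."""
    fields = metadata.get(path) or {}
    attrs = {"src": path, "alt": fields.get("alt", "")}
    if "title" in fields:
        attrs["title"] = fields["title"]
    rendered = " ".join(
        f'{name}="{escape(str(value))}"' for name, value in attrs.items()
    )
    return f"<img {rendered}>"


metadata = {
    "assets/goggles.jpg": {"alt": "White ski goggles.", "title": "Finished goggles"},
    "assets/heatgun.jpg": {"alt": "A heat gun joining acrylic tubing."},
}
print(img_tag("assets/goggles.jpg", metadata))
print(img_tag("assets/heatgun.jpg", metadata))
```

An image with no entry at all falls back to an empty alt attribute rather than failing.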
This makes maintaining 300+ images a lot easier!