Inside the HugoBlox showcase: how we built our site gallery
TL;DR
We built a crawler that discovers HugoBlox sites from GitHub search, takes Puppeteer screenshots, stores them in Supabase Storage, and syncs metadata to a Postgres table. The whole thing runs on a cron job.
Key takeaways
- GitHub's code search API is the best signal for finding community-built Hugo sites.
- Puppeteer on a small VPS is cheaper than a screenshot SaaS for this volume.
- Supabase Storage + Postgres handles the full media + metadata layer cleanly.
The HugoBlox showcase at hugoblox.com/templates lists over 500 community-built sites. Every entry has a live screenshot, a tag cloud, and a star count. Keeping this accurate at scale required building a crawler pipeline rather than managing it manually.
Here's how it works.
Discovery
The pipeline starts with GitHub code search. HugoBlox sites contain a distinctive pattern in their config/_default/hugo.yaml:
module:
  imports:
    - path: github.com/HugoBlox/hugo-blox-builder
A nightly cron job queries the GitHub API for repos matching this signature, filters out forks and archived repos, and enqueues new ones.
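Below is a minimal TypeScript sketch of the discovery filter, not the production crawler. It assumes each code-search hit has already been resolved to a repository object that carries the `full_name`, `html_url`, `fork`, and `archived` fields returned by GitHub's REST "get a repository" endpoint. The search query string, the `knownUrls` set, and all helper names are illustrative assumptions.

```typescript
// Subset of the fields in GitHub's REST "get a repository" response.
interface RepoInfo {
  full_name: string;
  html_url: string;
  fork: boolean;
  archived: boolean;
}

// The module path that marks a HugoBlox site in config/_default/hugo.yaml.
const SIGNATURE = "github.com/HugoBlox/hugo-blox-builder";

// Hypothetical code-search query: the signature as an exact phrase,
// restricted to hugo.yaml files under config/_default.
function buildSearchQuery(): string {
  return `"${SIGNATURE}" filename:hugo.yaml path:config/_default`;
}

// Code search can return loose matches, so re-check the fetched config
// for a "- path:" entry that points at the HugoBlox module.
function hasHugoBloxImport(configYaml: string): boolean {
  const importLine = /^\s*-?\s*path:\s*["']?github\.com\/HugoBlox\/hugo-blox-builder/m;
  return importLine.test(configYaml);
}

// Drop forks, archived repos, repos already in the showcase, and duplicate
// hits (one repo can match in several files); the rest get enqueued.
function selectNewRepos(candidates: RepoInfo[], knownUrls: Set<string>): RepoInfo[] {
  const seen = new Set<string>();
  return candidates.filter((repo) => {
    if (repo.fork || repo.archived) return false;
    if (knownUrls.has(repo.html_url) || seen.has(repo.html_url)) return false;
    seen.add(repo.html_url);
    return true;
  });
}
```

The query string would be sent as the `q` parameter of GitHub's `GET /search/code` endpoint. Deduplicating inside the filter keeps the queue free of repeats even when a single repo matches in several config files.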
Screenshot capture
We extract each discovered repo's live URL from its GitHub Pages or Netlify configuration. We then spin up a Puppeteer instance, navigate to the URL, wait for the first contentful paint, and take a screenshot of a 1280×800 viewport.
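The capture step might look roughly like the TypeScript sketch below. To stay self-contained, it declares a small `ScreenshotPage` interface covering only the four Puppeteer `Page` methods it calls instead of importing `puppeteer`, so a page from `browser.newPage()` should be usable in its place. Puppeteer's `goto` has no "first contentful paint" wait condition, so this version assumes the wait is done by polling the browser's Paint Timing entries with `waitForFunction`. The 30-second timeout is also an assumption.

```typescript
// The subset of Puppeteer's Page API this sketch relies on.
interface ScreenshotPage {
  setViewport(viewport: { width: number; height: number }): Promise<void>;
  goto(url: string, options: { waitUntil: "domcontentloaded"; timeout: number }): Promise<unknown>;
  waitForFunction(fn: () => boolean, options: { timeout: number }): Promise<unknown>;
  screenshot(options: { type: "png" }): Promise<Uint8Array>;
}

const VIEWPORT = { width: 1280, height: 800 };

async function captureScreenshot(
  page: ScreenshotPage,
  url: string,
  timeoutMs = 30_000, // assumed per-step timeout
): Promise<Uint8Array> {
  await page.setViewport(VIEWPORT);
  await page.goto(url, { waitUntil: "domcontentloaded", timeout: timeoutMs });
  // Runs inside the browser. Paint entries are buffered, so this resolves
  // immediately if FCP already happened and otherwise polls until it does.
  await page.waitForFunction(
    () => performance.getEntriesByName("first-contentful-paint").length > 0,
    { timeout: timeoutMs },
  );
  return page.screenshot({ type: "png" });
}
```

Typing against a narrow interface also lets the capture logic be exercised with a fake page, without launching a browser.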
Puppeteer runs on a $6/month VPS — far cheaper than a commercial screenshot API at our volume (~50 new sites per day).
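Here is a rough per-capture cost estimate. It assumes a 30-day month, and it treats refreshes as roughly one re-capture per existing showcase entry (about 500) in each 30-day window:

```latex
\begin{aligned}
N_{\text{month}} &\approx \underbrace{50 \times 30}_{\text{new sites}} + \underbrace{500}_{\text{refreshes}} = 2000 \text{ captures} \\
\text{cost per capture} &\approx \frac{\$6}{2000} = \$0.003
\end{aligned}
```

The refresh term grows as new sites join the catalogue. Because the VPS is a flat fee, though, the per-capture cost keeps falling with volume, whereas an API billed per screenshot scales with it.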
Storage and indexing
Screenshots are uploaded to Supabase Storage. Metadata (URL, star count, description, detected tags) is written to a showcase_sites Postgres table. A last_screenshot_at column records when each site was last captured, so we can re-screenshot any site whose screenshot is more than 30 days old.
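The following is a minimal sketch of the re-screenshot rule. It assumes rows shaped like the showcase_sites table, keeping only the columns this check needs, and it treats a null last_screenshot_at as "never captured". The 30-day window comes from the text. The type and function names are hypothetical, and the oldest-first ordering is an illustrative choice.

```typescript
// Only the showcase_sites columns this check needs.
interface SiteRow {
  github_url: string;
  last_screenshot_at: Date | null; // null means the site was never captured
}

const REFRESH_WINDOW_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

// True when the site was never captured or its last capture is
// at least 30 days old.
function needsRescreenshot(row: SiteRow, now: Date): boolean {
  if (row.last_screenshot_at === null) return true;
  return now.getTime() - row.last_screenshot_at.getTime() >= REFRESH_WINDOW_MS;
}

// Stale sites, oldest capture first (never-captured sites lead), so a run
// that hits a time budget refreshes the most out-of-date screenshots.
function staleSites(rows: SiteRow[], now: Date): SiteRow[] {
  return rows
    .filter((row) => needsRescreenshot(row, now))
    .sort(
      (a, b) =>
        (a.last_screenshot_at?.getTime() ?? 0) - (b.last_screenshot_at?.getTime() ?? 0),
    );
}
```

In practice the same filter would more likely run in Postgres, as `where last_screenshot_at is null or last_screenshot_at < now() - interval '30 days'`.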
create table showcase_sites (
  id uuid primary key default gen_random_uuid(),
  github_url text unique not null,
  live_url text,
  screenshot_path text,
  stars int default 0,
  tags text[] default '{}',
  last_screenshot_at timestamptz,
  created_at timestamptz default now()
);
Serving
The showcase page fetches from Supabase at build time (Next.js generateStaticParams) and again at runtime whenever the filters change. Screenshots are served from Supabase Storage's CDN, so no separate image CDN is needed.
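Two small helpers illustrate the serving side, under stated assumptions. Supabase serves objects in a public bucket at `<project URL>/storage/v1/object/public/<bucket>/<path>`. The project URL and the bucket name `screenshots` in the usage below are hypothetical. The tag filter assumes "match every selected tag", mirroring Postgres array containment (`tags @> selected`), which is one plausible filter semantic rather than a confirmed one.

```typescript
// Public URL for an object in a public Supabase Storage bucket.
function screenshotUrl(projectUrl: string, bucket: string, path: string): string {
  const base = projectUrl.replace(/\/+$/, ""); // drop trailing slashes
  const encodedPath = path.split("/").map(encodeURIComponent).join("/");
  return `${base}/storage/v1/object/public/${bucket}/${encodedPath}`;
}

// Client-side equivalent of Postgres `tags @> selected`: every selected tag
// must appear in the site's tags, and an empty selection matches everything.
function matchesTags(siteTags: string[], selected: string[]): boolean {
  const have = new Set(siteTags);
  return selected.every((tag) => have.has(tag));
}
```

With supabase-js, `supabase.storage.from(bucket).getPublicUrl(path)` builds the same public URL, so a hand-rolled helper is only needed where the client library isn't loaded.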
The whole pipeline processes 50 new sites in under 5 minutes. Not bad for a side project.
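Put another way, the 5-minute figure implies an average budget per new site (discovery, capture, and upload combined) of at most:

```latex
\frac{5 \times 60\ \text{s}}{50\ \text{sites}} = 6\ \text{s per site}
```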
George Cushen
Founder, HugoBlox
George is the creator of HugoBlox and has helped over 150,000 researchers and developers ship beautiful Hugo sites.