Inside the HugoBlox showcase: how we built our site gallery

George Cushen · 1 min read

TL;DR

We built a crawler that discovers HugoBlox sites from GitHub search, takes Puppeteer screenshots, stores them in Supabase Storage, and syncs metadata to a Postgres table. The whole thing runs on a cron job.

Key takeaways

  • GitHub's code search API is the best signal for finding community-built Hugo sites.
  • Puppeteer on a small VPS is cheaper than a screenshot SaaS for this volume.
  • Supabase Storage + Postgres handles the full media + metadata layer cleanly.

The HugoBlox showcase at hugoblox.com/templates lists over 500 community-built sites. Every entry has a live screenshot, a tag cloud, and a star count. Keeping those listings accurate at this scale required a crawler pipeline rather than manual curation.

Here's how it works.

Discovery

The pipeline starts with GitHub code search. HugoBlox sites contain a distinctive pattern in their config/_default/hugo.yaml:

module:
  imports:
    - path: github.com/HugoBlox/hugo-blox-builder

A nightly cron job queries the GitHub API for repos matching this signature, filters out forks and archived repos, and enqueues new ones.
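
The discovery step could be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions, not the production crawler: `RepoCandidate`, `buildSearchQuery`, and `selectNewRepos` are hypothetical names, and the `filename:` and `path:` qualifiers assume the search syntax accepted by GitHub's REST code search endpoint. Fetching and pagination are left out; the code only shows the query string and the fork/archived/already-known filtering.

```typescript
// Hypothetical, simplified view of a repository found by the search.
// The `fork` and `archived` flags mirror fields on GitHub repository objects.
interface RepoCandidate {
  fullName: string; // "owner/repo"
  fork: boolean;
  archived: boolean;
}

// Module path that HugoBlox sites import in config/_default/hugo.yaml.
const SIGNATURE = "github.com/HugoBlox/hugo-blox-builder";

// Build the URL-encoded `q` parameter for GET https://api.github.com/search/code.
// The qualifiers are an assumption about how the signature is matched.
function buildSearchQuery(): string {
  const q = `"${SIGNATURE}" filename:hugo.yaml path:config/_default`;
  return encodeURIComponent(q);
}

// Drop forks, archived repos, and repos already in the showcase, and
// de-duplicate repos that matched more than once. Names are compared
// case-insensitively.
function selectNewRepos(
  candidates: RepoCandidate[],
  known: Set<string>,
): RepoCandidate[] {
  const seen = new Set<string>(Array.from(known, (k) => k.toLowerCase()));
  const fresh: RepoCandidate[] = [];
  for (const repo of candidates) {
    const key = repo.fullName.toLowerCase();
    if (repo.fork || repo.archived || seen.has(key)) continue;
    seen.add(key);
    fresh.push(repo);
  }
  return fresh;
}
```

Keeping the filter a pure function makes it easy to test separately from the API call and the queue.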

Screenshot capture

For each discovered repo, we extract the live URL from its GitHub Pages or Netlify configuration. We then launch a Puppeteer instance, navigate to the URL, wait for the first contentful paint, and take a screenshot at a 1280×800 viewport.
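
The capture step might look roughly like the sketch below. Assumptions: `PageLike` and `BrowserLike` are hypothetical stand-in types that list only the Puppeteer methods used here (`newPage`, `setViewport`, `goto`, `screenshot`, `close`), so the function can be exercised without a real browser. `captureScreenshot` and the 30-second timeout are illustrative. Puppeteer's `goto` has no first-contentful-paint wait condition, so this sketch substitutes `waitUntil: "networkidle2"` as an approximation.

```typescript
// Stand-in types covering only the Puppeteer methods this sketch calls.
interface PageLike {
  setViewport(viewport: { width: number; height: number }): Promise<void>;
  goto(
    url: string,
    options: {
      waitUntil: "load" | "domcontentloaded" | "networkidle0" | "networkidle2";
      timeout: number;
    },
  ): Promise<unknown>;
  screenshot(options: { type: "png" | "jpeg" }): Promise<Uint8Array>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

// Capture one 1280x800 viewport screenshot of `liveUrl` as PNG bytes.
// `launch` is injected so the caller decides how the browser is started.
async function captureScreenshot(
  launch: () => Promise<BrowserLike>,
  liveUrl: string,
): Promise<Uint8Array> {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });
    // Approximation of "wait for first contentful paint" (assumption).
    await page.goto(liveUrl, { waitUntil: "networkidle2", timeout: 30000 });
    return await page.screenshot({ type: "png" });
  } finally {
    // Always release the browser, even if navigation fails or times out.
    await browser.close();
  }
}
```

With the real library, `launch` would be something like `() => puppeteer.launch()`. Closing the browser in `finally` matters on a small VPS, where leaked Chromium processes quickly exhaust memory.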

Puppeteer runs on a $6/month VPS — far cheaper than a commercial screenshot API at our volume (~50 new sites per day).

Storage and indexing

Screenshots are uploaded to Supabase Storage. Metadata (URL, star count, description, detected tags) is written to a showcase_sites Postgres table. The table has a last_screenshot_at column so we can re-capture any site whose screenshot is more than 30 days old.

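create table showcase_sites (
  id uuid primary key default gen_random_uuid(),
  github_url text unique not null,   -- source repository, one row per repo
  live_url text,                     -- deployed site URL
  screenshot_path text,              -- object path in Supabase Storage
  description text,
  stars int default 0,
  tags text[] default '{}',          -- detected tags
  last_screenshot_at timestamptz,    -- null until the first capture
  created_at timestamptz default now()
);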
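
The 30-day refresh rule can be sketched as a small predicate over rows of this table. This is an illustrative TypeScript sketch: `SiteRow`, `needsScreenshot`, and `selectStale` are hypothetical names, and treating a null `last_screenshot_at` as "never captured" is an assumption.

```typescript
// Subset of a showcase_sites row needed to decide on a re-capture.
interface SiteRow {
  github_url: string;
  last_screenshot_at: string | null; // ISO 8601 timestamp, null if never captured
}

const MAX_AGE_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the site has no screenshot yet or the last one is older than 30 days.
function needsScreenshot(row: SiteRow, now: Date): boolean {
  if (row.last_screenshot_at === null) return true;
  const ageMs = now.getTime() - new Date(row.last_screenshot_at).getTime();
  return ageMs > MAX_AGE_DAYS * MS_PER_DAY;
}

// Roughly the in-memory equivalent of:
//   select * from showcase_sites
//   where last_screenshot_at is null
//      or last_screenshot_at < now() - interval '30 days';
function selectStale(rows: SiteRow[], now: Date): SiteRow[] {
  return rows.filter((row) => needsScreenshot(row, now));
}
```

Passing `now` in explicitly keeps the rule deterministic in tests. In practice the SQL form in the comment would likely do this work inside Postgres.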

Serving

The showcase page fetches from Supabase at build time (Next.js generateStaticParams) and again at runtime when filters change. Screenshots are served from Supabase Storage's CDN, so no additional image CDN is needed.
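
The runtime filtering could be sketched as follows. This is a minimal in-memory TypeScript sketch under stated assumptions: `ShowcaseSite` and `filterSites` are hypothetical names, and both the "must carry every selected tag" rule and the sort by stars are assumptions. The real page may push this filtering into the Supabase query instead.

```typescript
// Fields of a showcase entry that the filter needs (names follow the table columns).
interface ShowcaseSite {
  live_url: string;
  stars: number;
  tags: string[];
}

// Keep sites that carry every selected tag, most-starred first.
// Tag comparison is case-insensitive; an empty selection keeps all sites.
function filterSites(
  sites: ShowcaseSite[],
  selectedTags: string[],
): ShowcaseSite[] {
  const wanted = selectedTags.map((t) => t.toLowerCase());
  return sites
    .filter((site) => {
      const have = new Set<string>(site.tags.map((t) => t.toLowerCase()));
      return wanted.every((t) => have.has(t));
    })
    .sort((a, b) => b.stars - a.stars); // filter() returned a copy, so the input is not mutated
}
```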

The whole pipeline processes 50 new sites in under 5 minutes. Not bad for a side project.



George Cushen

Founder, HugoBlox

George is the creator of HugoBlox and has helped over 150,000 researchers and developers ship beautiful Hugo sites.
