Built a real-time AI bot tracker on Vercel + Neon in a weekend

Most site owners look at Google Analytics and think they understand
their traffic.

They don’t. The most interesting visitors to your site never show up in GA at all.


I’m William. Retired construction worker. Self-taught programmer. I
built https://aeofix.com — an Answer Engine Optimization platform —
using Vercel, Neon, and Claude as my pair programmer.

A few months in, I noticed something: AI crawlers were hitting my
site constantly and I had zero visibility into which ones, how
often, or what pages they cared about. ChatGPT’s crawler.
Anthropic’s. Perplexity. Grok. Google’s AI training bots. All
invisible.

So I built a tracker. Here’s how it works.


The stack

  • Vercel Edge Middleware — runs on every request, classifies the bot before the page loads
  • Vercel Serverless Functions — /api/bot-pixel logs the visit; cron
    jobs run classification and digests
  • Vercel KV — rate limiting (3 req / 2s; scrapers always trip this)
  • Neon PostgreSQL — stores every visit with UA, ASN, country, page,
    confidence score, tier
  • Gemini 2.5-flash — weekly narrative intelligence digest on crawl
    patterns
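As a rough sketch, the 3-requests-per-2-seconds rule could look like this in TypeScript. To stay self-contained it keeps timestamps in process memory, whereas the tracker itself uses Vercel KV; createLimiter and allowRequest are hypothetical names, and treating "3 req / 2s" as "the fourth request inside any 2-second window trips the limit" is an assumption.

```typescript
// Sliding-window limiter: at most `limit` requests per `windowMs` for each key.
// Assumption: state lives in process memory here; the tracker itself uses Vercel KV.
function createLimiter(limit: number, windowMs: number) {
  const hits = new Map<string, number[]>();

  // Returns true when the request is allowed, false when the key trips the limit.
  return function allow(key: string, nowMs: number): boolean {
    const cutoff = nowMs - windowMs;
    // Keep only the timestamps that are still inside the window.
    const recent = (hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < limit;
    if (allowed) recent.push(nowMs);
    hits.set(key, recent);
    return allowed;
  };
}

// 3 requests per 2 seconds, keyed by client IP (hypothetical wiring).
const allowRequest = createLimiter(3, 2000);
```

Rejected requests are not recorded in this version, so a client that backs off recovers as soon as its older hits age out of the window. Counting rejected hits too would be the stricter choice.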

The pixel

Every page embeds a 1×1 transparent GIF served by /api/bot-pixel.

When it fires, the serverless function fingerprints the UA, looks up
the ASN via ip-api.com, runs a reverse DNS lookup if the UA claims to
be Google, and writes the classified visit to Neon in under 50 ms.
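For illustration, here is a minimal sketch of the response side of such a pixel: the 42 bytes of a 1×1 transparent GIF plus headers that stop caches from swallowing hits. The pixelResponse function and the PixelResponse shape are hypothetical, not the tracker's actual handler, and the logging step is left out.

```typescript
// The 42 bytes of a 1x1 transparent GIF89a image.
const PIXEL_GIF = new Uint8Array([
  0x47, 0x49, 0x46, 0x38, 0x39, 0x61, // "GIF89a" signature
  0x01, 0x00, 0x01, 0x00, 0x80, 0x00, 0x00, // 1x1 canvas, 2-colour palette
  0x00, 0x00, 0x00, 0xff, 0xff, 0xff, // palette: black, white
  0x21, 0xf9, 0x04, 0x01, 0x00, 0x00, 0x00, 0x00, // colour 0 is transparent
  0x2c, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00, 0x00, // image descriptor
  0x02, 0x01, 0x44, 0x00, // one LZW-compressed pixel
  0x3b, // trailer
]);

// Hypothetical, framework-neutral response shape.
interface PixelResponse {
  status: number;
  headers: Record<string, string>;
  body: Uint8Array;
}

function pixelResponse(): PixelResponse {
  return {
    status: 200,
    headers: {
      "Content-Type": "image/gif",
      "Content-Length": String(PIXEL_GIF.length),
      // Never cache, so every page view reaches the logging function.
      "Cache-Control": "no-store, max-age=0",
    },
    body: PIXEL_GIF,
  };
}
```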


What it detects

50+ named bots across categories:

┌────────────────────┬──────────────────────────────────────────┐
│ Category           │ Examples                                 │
├────────────────────┼──────────────────────────────────────────┤
│ AI Training        │ GPTBot, ClaudeBot, Google-Extended,      │
│                    │ Mistral                                  │
├────────────────────┼──────────────────────────────────────────┤
│ AI Search          │ OAI-SearchBot, PerplexityBot,            │
│                    │ ChatGPT-User, Grok                       │
├────────────────────┼──────────────────────────────────────────┤
│ AI Assistant       │ Amazonbot, Applebot-Extended             │
├────────────────────┼──────────────────────────────────────────┤
│ Search Index       │ Googlebot, Bingbot, DuckDuckBot          │
├────────────────────┼──────────────────────────────────────────┤
│ SEO Tool           │ AhrefsBot, SemrushBot, SE Ranking        │
├────────────────────┼──────────────────────────────────────────┤
│ Social Media       │ Slackbot, LinkedInBot, TwitterBot        │
├────────────────────┼──────────────────────────────────────────┤
│ Vulnerability      │ Nikto, SQLMap, ZGrab                     │
│ Scanner            │                                          │
└────────────────────┴──────────────────────────────────────────┘
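A first-pass classifier over a table like this can be a plain substring match on the lower-cased UA. The sketch below covers only a subset of the bots listed, and the signature list, Category type, and classifyUserAgent function are illustrative assumptions rather than the tracker's real code. Telling bots apart from human browsers is out of scope here, so anything unmatched falls through to "Unknown Bot".

```typescript
type Category =
  | "AI Training"
  | "AI Search"
  | "AI Assistant"
  | "Search Index"
  | "SEO Tool"
  | "Social Media"
  | "Vulnerability Scanner"
  | "Unknown Bot";

// Lower-case UA tokens mapped to categories (illustrative subset).
const SIGNATURES: ReadonlyArray<readonly [string, Category]> = [
  ["gptbot", "AI Training"],
  ["claudebot", "AI Training"],
  ["oai-searchbot", "AI Search"],
  ["perplexitybot", "AI Search"],
  ["chatgpt-user", "AI Search"],
  ["amazonbot", "AI Assistant"],
  ["googlebot", "Search Index"],
  ["bingbot", "Search Index"],
  ["duckduckbot", "Search Index"],
  ["ahrefsbot", "SEO Tool"],
  ["semrushbot", "SEO Tool"],
  ["slackbot", "Social Media"],
  ["linkedinbot", "Social Media"],
  ["twitterbot", "Social Media"],
  ["nikto", "Vulnerability Scanner"],
  ["sqlmap", "Vulnerability Scanner"],
  ["zgrab", "Vulnerability Scanner"],
];

// Returns the category of the first matching signature, else "Unknown Bot".
function classifyUserAgent(userAgent: string): Category {
  const ua = userAgent.toLowerCase();
  for (const [token, category] of SIGNATURES) {
    if (ua.includes(token)) return category;
  }
  return "Unknown Bot";
}
```

A UA match alone only says what a client claims to be, which is why spoofable names such as Googlebot need a second check.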

The honeypot

Unrecognized bots get silently redirected to a fake services page
with wrong pricing. They never know they’re in it. HubSpot’s domain
crawler goes straight there. So does anything that trips the rate
limiter.
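The routing decision itself is small. A sketch, assuming the middleware already has a category and a rate-limit verdict for the request; the decoy path and both names below are made up for illustration:

```typescript
// Hypothetical inputs produced earlier in the middleware.
interface RequestVerdict {
  category: string;
  rateLimited: boolean;
}

// Hypothetical decoy path, not the real one.
const HONEYPOT_PATH = "/services-decoy";

// Unrecognized bots and rate-limit offenders get the decoy; everyone else
// gets the page they asked for.
function resolvePath(requestedPath: string, verdict: RequestVerdict): string {
  const unrecognized = verdict.category === "Unknown Bot";
  return unrecognized || verdict.rateLimited ? HONEYPOT_PATH : requestedPath;
}
```

In Next.js middleware, serving the result through NextResponse.rewrite rather than a redirect would keep the URL unchanged, which is one way to make the switch silent.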

Surprising insight #1: When Slackbot-LinkExpanding hits your site,
it means someone just pasted your URL into a Slack channel. That’s a positive signal — someone shared your content. I track it as Social Media, tier 3.

Surprising insight #2: “Googlebot” from a Google ASN ≠ verified
Googlebot. The middleware does a reverse DNS lookup and checks the
hostname ends in .googlebot.com or .google.com. Spoofers fail this.
Real crawlers pass it and get logged as Googlebot (Verified) with
97% confidence.
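The hostname check can be sketched as a pure function over the PTR result. The function name is hypothetical, and the DNS lookup itself is left out so the example stays runnable anywhere.

```typescript
// Suffixes accepted for a verified Google crawler hostname.
const GOOGLE_CRAWLER_SUFFIXES = [".googlebot.com", ".google.com"];

// True when a reverse-DNS hostname belongs to one of the accepted domains.
function isGoogleCrawlerHostname(hostname: string): boolean {
  // Normalize: lower-case and drop the trailing dot PTR records can carry.
  const host = hostname.trim().toLowerCase().replace(/\.$/, "");
  // The leading dot in each suffix rejects look-alikes such as "fakegooglebot.com".
  return GOOGLE_CRAWLER_SUFFIXES.some((suffix) => host.endsWith(suffix));
}
```

Google's own verification guidance also adds a forward lookup on the returned hostname to confirm it resolves back to the original IP. That step is worth adding because whoever controls an IP range also controls its PTR records.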

The classification pipeline

Bots that slip through as Unknown Bot get a second chance. A cron
job runs every 6 hours and re-processes them through 4 layers:

  1. ASN mapping (Google, Meta, Microsoft, Amazon IPs)
  2. UA substring patterns (50+ known signatures)
  3. Accept header analysis (image-only = rendering bot)
  4. Behavioral heuristics (10+ hits in 1 hour = aggressive scraper)
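The four layers above can be sketched as a first-match-wins chain. Everything named here is an assumption for illustration: the UnknownVisit shape, the labels, the short ASN table (public AS numbers for the four networks mentioned), and the idea that the pipeline stops at the first layer that matches.

```typescript
interface UnknownVisit {
  userAgent: string;
  asn: number | null;
  accept: string;
  hitsLastHour: number;
}

interface Reclassification {
  layer: "asn" | "ua" | "accept" | "behavior";
  label: string;
}

// Layer 1 data: illustrative ASN-to-owner table.
const ASN_OWNERS = new Map<number, string>([
  [15169, "Google"],
  [32934, "Meta"],
  [8075, "Microsoft"],
  [16509, "Amazon"],
]);

// Layer 2 data: illustrative subset of UA signatures.
const UA_PATTERNS: ReadonlyArray<readonly [string, string]> = [
  ["gptbot", "GPTBot"],
  ["claudebot", "ClaudeBot"],
  ["sqlmap", "SQLMap"],
];

// Layer 3 helper: true when every media type in Accept is image/*.
function isImageOnlyAccept(accept: string): boolean {
  const types = accept
    .split(",")
    .map((part) => (part.split(";")[0] ?? "").trim().toLowerCase())
    .filter((t) => t.length > 0);
  return types.length > 0 && types.every((t) => t.startsWith("image/"));
}

// Runs the layers in order and returns the first match, or null if none hit.
function reclassify(visit: UnknownVisit): Reclassification | null {
  if (visit.asn !== null) {
    const owner = ASN_OWNERS.get(visit.asn);
    if (owner !== undefined) return { layer: "asn", label: `${owner} network` };
  }
  const ua = visit.userAgent.toLowerCase();
  for (const [token, name] of UA_PATTERNS) {
    if (ua.includes(token)) return { layer: "ua", label: name };
  }
  if (isImageOnlyAccept(visit.accept)) {
    return { layer: "accept", label: "Rendering bot" };
  }
  if (visit.hitsLastHour >= 10) {
    return { layer: "behavior", label: "Aggressive scraper" };
  }
  return null; // still unknown; leave it for the next cron run
}
```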

The dashboard

Six tabs: Overview, Bots, Pages, Countries, AI Funnel, and an
Intelligence tab in progress. The AI Funnel tab cross-references bot crawl frequency against actual ChatGPT-referred sessions.
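One way to picture the AI Funnel cross-reference is a per-page join of two counters. This is a sketch under assumptions: the input shapes, the FunnelRow fields, and the sessions-per-crawl ratio are invented for illustration and may not match what the dashboard computes.

```typescript
interface FunnelRow {
  page: string;
  aiCrawls: number;
  chatgptSessions: number;
  // Referred sessions per AI crawl; null when the page was never crawled.
  sessionsPerCrawl: number | null;
}

// Joins crawl counts and referred-session counts by page path.
function buildFunnel(
  crawlsByPage: Record<string, number>,
  sessionsByPage: Record<string, number>,
): FunnelRow[] {
  // Union of pages seen in either data set.
  const pages = Object.keys(crawlsByPage);
  for (const page of Object.keys(sessionsByPage)) {
    if (pages.indexOf(page) === -1) pages.push(page);
  }
  return pages
    .map((page): FunnelRow => {
      const aiCrawls = crawlsByPage[page] ?? 0;
      const chatgptSessions = sessionsByPage[page] ?? 0;
      return {
        page,
        aiCrawls,
        chatgptSessions,
        sessionsPerCrawl: aiCrawls > 0 ? chatgptSessions / aiCrawls : null,
      };
    })
    .sort((a, b) => b.aiCrawls - a.aiCrawls); // most-crawled pages first
}
```

Pages with many crawls and no referred sessions, or the reverse, are the rows worth a closer look.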

Gemini weekly digest

Every Monday at 9am UTC, Gemini 2.5-flash gets the week’s crawl data and writes a narrative report — which bots increased activity,
which pages AI crawlers are focused on, anomalies worth watching. It lands in my inbox automatically.
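Before the model sees anything, the week's rows have to be boiled down to something compact. A sketch of that step, assuming per-bot hit counts for this week and the previous one; the function names and the prompt wording are hypothetical, and the actual Gemini call is omitted.

```typescript
// Turns per-bot hit counts into week-over-week summary lines, busiest bot first.
function summarizeWeek(
  current: Record<string, number>,
  previous: Record<string, number>,
): string[] {
  return Object.keys(current)
    .sort((a, b) => (current[b] ?? 0) - (current[a] ?? 0))
    .map((bot) => {
      const now = current[bot] ?? 0;
      const before = previous[bot] ?? 0;
      if (before === 0) return `${bot}: ${now} hits (new this week)`;
      const pct = Math.round(((now - before) / before) * 100);
      const sign = pct >= 0 ? "+" : "";
      return `${bot}: ${now} hits (${sign}${pct}% vs. previous week)`;
    });
}

// Hypothetical prompt wrapper; the real prompt structure is not shown here.
function buildDigestPrompt(lines: string[]): string {
  return [
    "Write a short narrative report on this week's crawler activity.",
    "Cover which bots increased activity and any anomalies worth watching.",
    "",
    ...lines,
  ].join("\n");
}
```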


I named my internal analysis bot Archimedes/1.0. My external site
scanner identifies itself as aeofixtbot/1.0 and honors robots.txt, so
site owners can allow or block it. Respecting robots.txt matters.

The whole thing runs on Vercel’s free/hobby tier + Neon’s free tier. Cold start on the pixel endpoint is around 80ms. Warm is under
20ms.

If you’re building anything in the AEO / AI visibility space and
want to know which AI engines are actually crawling your site, a
tracker like this will tell you.

Happy to answer questions on the architecture, the Neon schema, or
the Gemini prompt structure. The live dashboard is the AI Bot Tracker
page on AEOfix if you want to see what it looks like.