facebookexternalhit receiving persistent 403 on custom domain — all other scrapers work

Update for anyone hitting this — GitHub Pages works as a proxy
where Vercel and shared cPanel hosting both fail. Created a simple
HTML page on GitHub Pages with:

  • All OG meta tags pointing to my Promly content
  • og:url set to the real domain (so FB card shows the right hostname label)
  • JS redirect to the real site
  • og:image hosted on GitHub Pages (FB fetches it without issue)
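
For anyone who wants to reproduce the proxy page, here is a minimal sketch of how it could be generated. It is not the exact page from this thread: the function name `render_proxy_page` and the example.com / example.github.io URLs are hypothetical placeholders, and it assumes the four inputs are values you control.

```python
import json
from html import escape


def render_proxy_page(title: str, description: str, real_url: str, image_url: str) -> str:
    """Return a static HTML page carrying OG tags plus a JS redirect.

    og:url points at the real domain, while the page itself is meant to be
    hosted elsewhere (GitHub Pages in the post above).
    """
    # Escape values for use inside HTML attribute values.
    t, d, u, i = (escape(v, quote=True) for v in (title, description, real_url, image_url))
    # JSON-encode the URL for the script and neutralise "<" so the value
    # cannot close the <script> element early.
    js_url = json.dumps(real_url).replace("<", "\\u003c")
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{t}</title>
<meta property="og:title" content="{t}">
<meta property="og:description" content="{d}">
<meta property="og:url" content="{u}">
<meta property="og:image" content="{i}">
<script>window.location.replace({js_url});</script>
</head>
<body><a href="{u}">Continue to the site</a></body>
</html>
"""
```

Writing the returned string to index.html in the GitHub Pages repository would give a page along the lines described above. The plain link in the body is only a fallback for visitors with JavaScript disabled.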

FB Debugger now returns 200 + full preview card. Users see “PROMLY.AI”
on the card thanks to og:url, but the actual scraped URL is a GitHub
Pages address that doesn’t hit Vercel’s edge filter.

Still waiting for a proper Vercel-side fix, but this unblocks the launch.

Hi there,

Thanks for sharing your workaround with the community, and glad to hear this unblocked your launch.

After reviewing this further, this appears to have been a transient issue with the Facebook Sharing Debugger:

  • During our investigation of similar reports, we did not identify any active blocks or deny actions affecting Facebook crawler traffic at the edge or proxy layers.
  • We did receive multiple reports around the same timeframe that exhibited similar behavior, all of which appear to have resolved without any changes on Vercel’s side.

For anyone encountering this in the future, a few suggested troubleshooting steps:

Scrape Again to clear a stale cache:

  • The Facebook Sharing Debugger maintains its own cache at the domain level.
    • This means testing a different path or appending a query parameter (e.g. ?v=2) can return the same stale result even if that specific URL has never been fetched before.
  • Clicking “Scrape Again” once is typically not enough to clear the stale cache.
    • Running the scrape 2-3 times forces the debugger to re-fetch and overwrite the cached result.
    • If the issue is purely a stale cache, the status should update to 200 after one of those attempts.
  • Clearing the cache this way is often required after any recent change to Deployment Protection settings or firewall rules.
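
If clicking through the debugger repeatedly is tedious, the repeated scrape can likely be scripted. The sketch below assumes the Graph API re-scrape call (a POST with `id` and `scrape=true`) is available to your app and that you have a valid access token; the function names and the pause length are hypothetical choices, not something confirmed in this thread.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

GRAPH_ENDPOINT = "https://graph.facebook.com/"


def build_rescrape_request(page_url: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking Facebook to re-scrape page_url."""
    body = urllib.parse.urlencode(
        {"id": page_url, "scrape": "true", "access_token": access_token}
    ).encode("ascii")
    return urllib.request.Request(GRAPH_ENDPOINT, data=body, method="POST")


def rescrape(page_url: str, access_token: str, attempts: int = 3,
             pause_seconds: float = 2.0) -> list[int]:
    """Send the re-scrape request several times.

    Returns the HTTP status of each Graph API call. This is the status of
    the API call itself, not the status the crawler saw on your page.
    """
    statuses = []
    for _ in range(attempts):
        try:
            request = build_rescrape_request(page_url, access_token)
            with urllib.request.urlopen(request, timeout=15) as response:
                statuses.append(response.status)
        except urllib.error.HTTPError as err:
            statuses.append(err.code)
        time.sleep(pause_seconds)
    return statuses
```

After running something like `rescrape("https://example.com/", token)`, check the Sharing Debugger once more to see whether the cached result was replaced.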

Check robots.txt

  • If the debugger shows the message “This response could be from a robots.txt block”, we recommend reviewing the robots.txt.
    • For example, if there is a rule such as Disallow: /*?* under User-Agent: * in robots.txt, Facebook’s parser will apply this wildcard disallow to query string URLs even when a dedicated facebookexternalhit block exists.
    • Adding an explicit Allow: /*?* line to the facebookexternalhit block often resolves this.
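
To spot this pattern quickly, a rough check like the one below may help. It is a simplified sketch, not a full robots.txt parser: it only groups Allow/Disallow lines by user agent and flags the case described above. The function names and both sample files are made up for illustration.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lower-cased user agent to its (field, value) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not last_was_agent:
                current_agents = []  # a new group starts here
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        elif field in ("allow", "disallow"):
            for agent in current_agents:
                groups[agent].append((field, value))
            last_was_agent = False
    return groups


def query_string_risk(robots_txt: str) -> bool:
    """True if '*' disallows query-string paths and the
    facebookexternalhit group has no Allow rule mentioning '?'."""
    groups = parse_groups(robots_txt)
    blocked = [v for f, v in groups.get("*", []) if f == "disallow" and "?" in v]
    allowed = [v for f, v in groups.get("facebookexternalhit", [])
               if f == "allow" and "?" in v]
    return bool(blocked) and not allowed


RISKY = """User-agent: *
Disallow: /*?*

User-agent: facebookexternalhit
Allow: /
"""

# Same file with the explicit Allow line suggested above.
FIXED = RISKY + "Allow: /*?*\n"
```

With these samples, `query_string_risk(RISKY)` is true and `query_string_risk(FIXED)` is false, which matches the fix suggested above.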

For Pro or higher customers, try adding a Custom WAF Rule for the Facebook ASN

  • Meta operates a large pool of crawler IPs under AS32934 and rotates them regularly.
  • Adding a Custom WAF Rule targeting Facebook’s AS32934 can help rule out a proxy-level block.
  • You can add this rule by following the steps below:
    • Navigate to Project → Security → Firewall → Rules → Add New → Custom Rule
    • In the rule prompt, type: “Create a rule named ‘Facebook ASN Bypass’ which bypasses if ASN equals 32934”
    • Click “Generate”, then “Save”
    • This covers all current and future Meta crawler IPs without needing to maintain individual IP ranges as Meta rotates addresses.

As a final note, the 403 error shown in the Facebook debugger does not necessarily indicate that Vercel is actively returning a 403 response.

  • This can occur for a variety of reasons, including stale or cached debugger results.
  • In some cases, the debugger may also surface a 403 as a general catch-all response rather than reflecting the exact origin behavior.
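
One way to see what the origin returns, independent of the debugger, is to request the page yourself with the crawler's User-Agent string. This is only a partial check: the request comes from your own IP rather than Meta's, so it will not reproduce any IP- or ASN-based rule. The function names are hypothetical.

```python
import urllib.error
import urllib.request

# User-Agent string sent by Facebook's link preview crawler.
FB_USER_AGENT = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"


def build_crawler_request(url: str) -> urllib.request.Request:
    """Build a GET request that identifies itself as facebookexternalhit."""
    return urllib.request.Request(url, headers={"User-Agent": FB_USER_AGENT}, method="GET")


def origin_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status the server gives this request.

    Redirects are followed, so this is the status of the final response.
    """
    try:
        with urllib.request.urlopen(build_crawler_request(url), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is what we want to report.
        return err.code
```

If `origin_status("https://example.com/")` comes back as 200 while the debugger still reports 403, that points toward a stale or catch-all debugger result rather than a User-Agent based block.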

Hopefully the issue is now resolved on your end, but if it resurfaces or the workaround stops holding, please let us know.