I noticed that OpenAI has been hitting my website, and hitting it hard. It will hit pages with query params like ?page=1513, which obviously don't exist. I know I can deny AI bots in Vercel, which I have done in the meantime to stop OpenAI from hammering the site. However, I was wondering if there's a better way to at least let OpenAI and other AI bots know which pages are actually available, so they aren't just repeatedly scraping my website with incorrect params and paths.
Here is the user agent according to Vercel:
`Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)`
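For what it's worth, OpenAI documents that GPTBot respects robots.txt, so one option is to disallow just the paginated query-param URLs rather than blocking the bot outright. A minimal sketch, assuming the crawler honors wildcard patterns (the `*` in path rules is a common extension, not part of the original robots.txt standard):

```txt
# Keep GPTBot off paginated query-param URLs but allow everything else
User-agent: GPTBot
Disallow: /*?page=

# Or, to block it entirely:
# User-agent: GPTBot
# Disallow: /
```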
Do AI bots adhere to llms.txt? Reading through "What Is llms.txt, and Should You Care About It?", it looks like none of the major ones support it. Furthermore, what would an llms.txt actually look like? I couldn't find an example in the docs other than that you can define the route.
I am just curious what it would look like to tell the LLM which pages are available, etc.
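For illustration, here's what a minimal llms.txt could look like following the format proposed at llmstxt.org: an H1 title, a blockquote summary, then H2 sections of markdown link lists (all URLs and descriptions here are hypothetical):

```md
# Example Site

> A blog about web development: articles, tags, and an about page.

## Articles
- [Building with Next.js](https://example.com/articles/building-with-nextjs): Notes on app-router patterns
- [Deploying to Vercel](https://example.com/articles/deploying-to-vercel): A deployment walkthrough

## Pages
- [About](https://example.com/about): Who writes this site

## Optional
- [Tag index](https://example.com/tags): Full list of article tags
```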
There are a few tools out there now to convert a sitemap.xml to llms.txt if you don't want to build your own solution. I haven't tried it myself, but llmstxt looks like a convenient option.
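If you do end up rolling your own, the conversion is only a few lines. Here's an untested TypeScript sketch (the sitemap URL, site title, and output path are all placeholders) that pulls the `<loc>` entries out of a standard sitemap and writes them into the llms.txt link-list format:

```ts
import { writeFile } from "node:fs/promises";

// Placeholder sitemap location; swap in your own.
const SITEMAP_URL = "https://example.com/sitemap.xml";

async function sitemapToLlmsTxt(): Promise<void> {
  const xml = await (await fetch(SITEMAP_URL)).text();

  // Pull every <loc> entry out of the sitemap. A real converter might
  // use an XML parser, but a regex is enough for a standard sitemap.
  const urls = [...xml.matchAll(/<loc>\s*(.*?)\s*<\/loc>/g)].map((m) => m[1]);

  const lines = [
    "# Example Site", // the H1 title is the only required element
    "",
    "> Pages on this site, generated from sitemap.xml.",
    "",
    "## Pages",
    // One markdown link per URL, using the pathname as the link text.
    ...urls.map((url) => `- [${new URL(url).pathname}](${url})`),
    "",
  ];

  await writeFile("public/llms.txt", lines.join("\n"));
  console.log(`Wrote ${urls.length} URLs to public/llms.txt`);
}

sitemapToLlmsTxt().catch(console.error);
```

Dropping the output in public/ means Next.js/Vercel will serve it at /llms.txt with no extra routing.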
Thanks! I will take a look. I don't have paginated pages in my sitemap.xml because I couldn't find any solid advice on whether they should be included, and I figured that since I have pagination links it wouldn't be needed. It would be nice if there were a guide or an example for creating an llms.txt.