AI crawlers are scraping your docs. You're paying the bill.
GPTBot, ClaudeBot, and Meta-ExternalAgent send billions of requests per day. If you host public docs, API references, or knowledge bases — you're a target. Fairvisor detects AI crawlers and enforces rate limits at the edge.
What Fairvisor Does for Content Sites
Detect AI Crawlers by User-Agent
Knows GPTBot, ClaudeBot, Meta-ExternalAgent, Bytespider, CCBot, Google-Extended, and 38 other AI crawlers — plus 1,300+ bot patterns across search, SEO, archiver, and monitoring categories. Detection runs at the edge with sub-millisecond overhead.
Enforce, Don't Just Block
Blocking crawlers entirely can cause SEO problems or break legitimate integrations. Fairvisor applies staged rate limits: allow under 80%, warn at 80%, throttle at 95%, reject at 100%. Crawlers get predictable access. Your bandwidth bill stays sane.
Shadow Mode First
Don’t guess how much crawler traffic you have. Deploy in shadow mode and measure which bots are hitting your site, at what rate, on which endpoints, and how much bandwidth they consume — before you enforce anything.
Segment by ASN Type and Bot Category
Set different limits for residential vs datacenter traffic, and separate bot categories by policy. One rule set, split enforcement by ip:type and ua:bot_category — not a single global crawler rule.
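The segmentation described above can be sketched as a small policy lookup. This is a minimal illustration: the selector names mirror ip:type and ua:bot_category from the text, but the table structure, limits, and function are hypothetical, not Fairvisor's actual configuration format or API.

```python
from typing import Optional

# Hypothetical policy table keyed by (ip:type, ua:bot_category).
# The limits below are illustrative values, not Fairvisor defaults.
POLICIES = {
    ("datacenter", "ai"):     {"rpm": 30},    # tight quota for AI training crawlers
    ("datacenter", "search"): {"rpm": 600},   # generous quota for search indexers
    ("residential", None):    {"rpm": 1200},  # real users: effectively untouched
}

def limit_for(ip_type: str, bot_category: Optional[str]) -> int:
    """Return the most specific matching requests-per-minute limit."""
    # Check the most specific key first, then an ip-type-wide fallback.
    for key in ((ip_type, bot_category), (ip_type, None)):
        if key in POLICIES:
            return POLICIES[key]["rpm"]
    return 300  # default for traffic no rule matches
```

The point of the split keys is that a datacenter AI crawler and a residential browser never share one quota, even though both pass through the same rule set.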
Fairvisor vs. robots.txt
| | robots.txt | Fairvisor |
|---|---|---|
| Enforcement | Honor system | Edge enforcement |
| Compliance model | Advisory (crawler decides) | Inline enforcement in your traffic path |
| Granularity | Allow/disallow paths | Per-bot rate limits with staged actions |
| Analytics | None | Full dashboard: volume, bandwidth, trends |
| Response | Block or nothing | Warn → throttle → reject |
| Latency impact | Zero (static file) | Adds an edge decision step |
| Safe rollout | No | Shadow mode |
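The warn → throttle → reject ladder in the table maps directly onto a decision function. A minimal sketch, using the 80/95/100% thresholds from the staged limits described earlier; the function name and return values are illustrative, not Fairvisor's API:

```python
def staged_action(requests_used: int, quota: int) -> str:
    """Map a crawler's quota consumption to a staged enforcement action.

    Thresholds mirror the staged limits described in the text:
    allow under 80%, warn at 80%, throttle at 95%, reject at 100%.
    """
    usage = requests_used / quota
    if usage >= 1.0:
        return "reject"    # e.g. answer 429 instead of serving content
    if usage >= 0.95:
        return "throttle"  # e.g. delay responses to slow the crawler
    if usage >= 0.80:
        return "warn"      # e.g. attach rate-limit headers, log the event
    return "allow"
```

A crawler at 60% of quota passes untouched; at 96% it gets slowed rather than cut off, which is the difference between this ladder and a robots.txt-style all-or-nothing block.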
Bot Traffic by Category
Fairvisor ships with 1,335 bot patterns across 7 categories. What each category means for a content or docs site:
AI bots (44 patterns)
— GPTBot, ClaudeBot, Meta-ExternalAgent, Bytespider, and 40 others. These are the crawlers consuming your bandwidth for training data. → Bot categories docs
Search engine bots (290 patterns)
— Googlebot, Bingbot, and the long tail of search indexers. You probably want to allow these. Fairvisor lets you set different limits per category, not one global crawler rule.
SEO research bots (31 patterns)
— Ahrefs, Semrush, Moz. High-volume, paid tools that crawl aggressively. Worth rate-limiting separately from search engines.
Archiver bots (43 patterns)
— Internet Archive, Common Crawl. Generally benign but can generate significant load.
Monitoring & uptime bots (65 patterns)
— Pingdom, UptimeRobot, StatusCake. You want these to work. Fairvisor can explicitly allow them by name.
Other bots (825 patterns)
— Everything else: feed readers, link checkers, research crawlers. The long tail that IP-based rules miss entirely.
Who This Is For
- DevRel teams managing public documentation sites
- Platform teams hosting API references and developer portals
- Content sites with technical blogs and knowledge bases
- Open source projects with hosted docs (GitBook, Docusaurus, MkDocs)
FAQ
How does Fairvisor identify AI crawlers?
By User-Agent string matched against a database of 44 known AI crawler patterns: GPTBot, ClaudeBot, Meta-ExternalAgent, Bytespider, CCBot, Google-Extended, and others. Detection runs at the edge with sub-millisecond overhead and no false-positive risk for real browser traffic.
Will rate-limiting bots hurt my SEO?
Fairvisor rate-limits; it doesn’t block. Search engine bots (Googlebot, Bingbot, etc.) can be explicitly allowed or given higher limits than AI training crawlers. You set different limits per bot category — not one kill-switch for all crawlers. → Bot categories docs
What does shadow mode tell me before I enforce anything?
Which bots are hitting your site, how often, on which pages, and how much bandwidth they consume — all without changing any behavior. Most teams are surprised how much traffic is non-human once they actually look at the breakdown.
How does Fairvisor handle bots that ignore robots.txt?
Unlike robots.txt, Fairvisor enforces in the request path. Bots receive rate-limit responses based on policy regardless of robots.txt behavior.
Does it work with my existing docs infrastructure?
Yes. Deploy in front of GitBook, Docusaurus, MkDocs, Confluence, or any static host. No changes to your docs infrastructure. Works as Nginx/Traefik/Envoy middleware or standalone reverse proxy. → Deployment options
Can I set stricter limits only for AI bots while leaving search bots mostly untouched?
Yes. Policies can target ua:bot_category, so AI training crawlers get tighter quotas while search engine and monitoring bots keep higher limits. This avoids one global crawler policy that penalizes everything equally.
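To make User-Agent category matching concrete, here is a minimal classifier sketch. The pattern list below is a tiny hand-picked subset of publicly known crawler names used purely for illustration; it is not Fairvisor's 1,335-pattern database, and the function shape is an assumption, not its API.

```python
import re

# Illustrative subset of User-Agent patterns per bot category.
BOT_PATTERNS = {
    "ai":         [r"GPTBot", r"ClaudeBot", r"Meta-ExternalAgent", r"Bytespider", r"CCBot"],
    "search":     [r"Googlebot", r"[Bb]ingbot"],
    "monitoring": [r"Pingdom", r"UptimeRobot"],
}

def bot_category(user_agent: str):
    """Return the matching bot category, or None for browser-like traffic."""
    for category, patterns in BOT_PATTERNS.items():
        if any(re.search(p, user_agent) for p in patterns):
            return category
    return None  # no pattern matched: treat as human traffic
```

A policy engine can then key its quota lookup on the returned category, which is how "tight limits for ai, generous limits for search" falls out of one rule set.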