# Fairvisor for Content Sites & Documentation

URL: https://fairvisor.com/for/content-sites/

---


AI crawlers are scraping your docs. You're paying the bill. GPTBot, ClaudeBot, and Meta-ExternalAgent send billions of requests per day. If you host public docs, API references, or knowledge bases, you're a target. Fairvisor detects AI crawlers and enforces rate limits at the edge.
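The User-Agent detection described above reduces to pattern matching against a crawler database. A minimal sketch: the pattern list below is a small illustrative subset, and the function name `is_ai_crawler` is hypothetical, not Fairvisor's actual API.

```python
import re

# Illustrative subset of AI-crawler User-Agent patterns; the real database
# covers 44 AI patterns plus 1,300+ bot patterns in other categories.
AI_CRAWLER_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in [
        r"GPTBot", r"ClaudeBot", r"Meta-ExternalAgent",
        r"Bytespider", r"CCBot", r"Google-Extended",
    ]
]

def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent matches a known AI-crawler pattern."""
    return any(p.search(user_agent) for p in AI_CRAWLER_PATTERNS)
```

Because matching is a handful of compiled-regex searches per request, this style of check is cheap enough to run inline at the edge, which is consistent with the sub-millisecond overhead claimed above.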
See your crawler traffic (deploy in shadow mode)

## What Fairvisor Does for Content Sites

### Detect AI Crawlers by User-Agent

Knows GPTBot, ClaudeBot, Meta-ExternalAgent, Bytespider, CCBot, Google-Extended, and 38 other AI crawlers, plus 1,300+ bot patterns across search, SEO, archiver, and monitoring categories. Detection runs at the edge with sub-millisecond overhead.

### Enforce, Don't Just Block

Blocking crawlers entirely can cause SEO problems or break legitimate integrations. Fairvisor applies staged rate limits: allow under 80%, warn at 80%, throttle at 95%, reject at 100%. Crawlers get predictable access. Your bandwidth bill stays sane.

### Shadow Mode First

Don't guess how much crawler traffic you have. Deploy in shadow mode and measure which bots are hitting your site, at what rate, on which endpoints, and how much bandwidth they consume, before you enforce anything.

### Segment by ASN Type and Bot Category

Set different limits for residential vs. datacenter traffic, and separate bot categories by policy. One rule set, split enforcement by `ip:type` and `ua:bot_category`, not a single global crawler rule.

## Fairvisor vs. robots.txt

|                  | robots.txt                 | Fairvisor                               |
|------------------|----------------------------|-----------------------------------------|
| Enforcement      | Honor system               | Edge enforcement                        |
| Compliance model | Advisory (crawler decides) | Inline enforcement in your traffic path |
| Granularity      | Allow/disallow paths       | Per-bot rate limits with staged actions |
| Analytics        | None                       | Full dashboard: volume, bandwidth, trends |
| Response         | Block or nothing           | Warn → throttle → reject                |
| Latency impact   | Zero (static file)         | Adds an edge decision step              |
| Safe rollout     | No                         | Shadow mode                             |

## Bot Traffic by Category

Fairvisor ships with 1,335 bot patterns across 7 categories. What each category means for a content or docs site:
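The staged thresholds above (allow under 80%, warn at 80%, throttle at 95%, reject at 100%) amount to a simple utilization check. A minimal sketch, assuming a fixed quota per window; the function name and string actions are illustrative, not Fairvisor's actual edge implementation.

```python
def staged_action(used: int, quota: int) -> str:
    """Map quota utilization to a staged enforcement action.

    Thresholds follow the staged model described above:
    allow under 80%, warn at 80%, throttle at 95%, reject at 100%.
    """
    pct = 100 * used / quota
    if pct >= 100:
        return "reject"    # hard stop: quota exhausted
    if pct >= 95:
        return "throttle"  # still served, but slowed to shed load
    if pct >= 80:
        return "warn"      # served normally; signal via headers/logs
    return "allow"
```

Because throttling and warnings precede rejection, a well-behaved crawler sees slowdowns before it ever gets hard refusals, which is what keeps its access predictable rather than binary.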
- **AI bots (44 patterns):** GPTBot, ClaudeBot, Meta-ExternalAgent, Bytespider, and 40 others. These are the crawlers consuming your bandwidth for training data. → Bot categories docs
- **Search engine bots (290 patterns):** Googlebot, Bingbot, and the long tail of search indexers. You probably want to allow these. Fairvisor lets you set different limits per category, not one global crawler rule.
- **SEO research bots (31 patterns):** Ahrefs, Semrush, Moz. High-volume, paid tools that crawl aggressively. Worth rate-limiting separately from search engines.
- **Archiver bots (43 patterns):** Internet Archive, Common Crawl. Generally benign but can generate significant load.
- **Monitoring & uptime bots (65 patterns):** Pingdom, UptimeRobot, StatusCake. You want these to work. Fairvisor can explicitly allow them by name.
- **Other bots (825 patterns):** Everything else: feed readers, link checkers, research crawlers. The long tail that IP-based rules miss entirely.

## Who This Is For

- DevRel teams managing public documentation sites
- Platform teams hosting API references and developer portals
- Content sites with technical blogs and knowledge bases
- Open source projects with hosted docs (GitBook, Docusaurus, MkDocs)

## FAQ

**How does Fairvisor identify AI crawlers?**

By User-Agent string matched against a database of 44 known AI crawler patterns: GPTBot, ClaudeBot, Meta-ExternalAgent, Bytespider, CCBot, Google-Extended, and others. Detection runs at the edge with sub-millisecond overhead and no false-positive risk for real browser traffic.

**Will rate-limiting bots hurt my SEO?**

Fairvisor rate-limits, it doesn't block. Search engine bots (Googlebot, Bingbot, etc.) can be explicitly allowed or given higher limits than AI training crawlers. You set different limits per bot category, not one kill switch for all crawlers. → Bot categories docs

**What does shadow mode tell me before I enforce anything?**
Which bots are hitting your site, how often, on which pages, and how much bandwidth they consume, all without changing any behavior. Most teams are surprised how much traffic is non-human once they actually look at the breakdown.

**How does Fairvisor handle bots that ignore robots.txt?**

Unlike robots.txt, Fairvisor enforces in the request path. Bots receive rate-limit responses based on policy regardless of whether they honor robots.txt.

**Does it work with my existing docs infrastructure?**

Yes. Deploy in front of GitBook, Docusaurus, MkDocs, Confluence, or any static host, with no changes to your docs infrastructure. Works as Nginx/Traefik/Envoy middleware or a standalone reverse proxy. → Deployment options

**Can I set stricter limits only for AI bots while leaving search bots mostly untouched?**

Yes. Policies can target `ua:bot_category`, so AI training crawlers get tighter quotas while search engine and monitoring bots keep higher limits. This avoids one global crawler policy that penalizes everything equally.

## Why teams choose Fairvisor

**Measurable savings, not estimates.** Shadow mode shows exactly how much crawler traffic you have before you touch a single policy, so savings are real numbers, not guesses.

**Granular control, not a kill switch.** Staged enforcement lets you rate-limit bots without SEO risk or breaking legitimate integrations.

**Works across any stack.** Edge enforcement with no changes to your docs infrastructure. Deploy in front of GitBook, Docusaurus, MkDocs, or any static host.

Stop subsidizing AI training with your bandwidth budget. See your crawler traffic (deploy in shadow mode).

## Also relevant

- **For AdTech & Media:** ASN-aware policies and Tor tagging for high-volume media and affiliate APIs.
- **For API-First SaaS:** Per-tenant limits, noisy neighbor protection, and tiered plan enforcement.
- **For SRE:** Sub-millisecond enforcement, graceful degradation, and SLO alerting.
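The per-category policies and shadow mode described above combine naturally: pick a quota by bot category, and in shadow mode record what enforcement would do without changing any behavior. A sketch under assumed category names and quotas; none of these numbers are Fairvisor defaults, and `decide` is a hypothetical helper, not its API.

```python
# Hypothetical per-category quotas in requests/minute. Real policies would
# target selectors such as ua:bot_category; these keys are illustrative.
CATEGORY_LIMITS = {
    "ai": 60,              # tight quota for AI training crawlers
    "search_engine": 600,  # generous, to protect SEO
    "monitoring": 1200,    # uptime checks should always get through
    "other": 30,           # the long tail
}

def decide(bot_category: str, used: int, shadow: bool = True) -> str:
    """Return 'allow' or 'reject' for a request from the given category."""
    quota = CATEGORY_LIMITS.get(bot_category, CATEGORY_LIMITS["other"])
    over_quota = used >= quota
    if shadow:
        # Shadow mode: log what enforcement *would* do, change nothing.
        print(f"shadow bot_category={bot_category} "
              f"used={used}/{quota} would_reject={over_quota}")
        return "allow"
    return "reject" if over_quota else "allow"
```

Running with `shadow=True` first yields the measurement baseline (which categories would have been rejected, and how often) before flipping any category to real enforcement.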

