AI crawlers are scraping your docs. You're paying the bill.

GPTBot, ClaudeBot, and Meta-ExternalAgent send billions of requests per day. If you host public docs, API references, or knowledge bases — you're a target. Fairvisor detects AI crawlers and enforces rate limits at the edge.

What Fairvisor Does for Content Sites

Detect AI Crawlers by User-Agent

Knows GPTBot, ClaudeBot, Meta-ExternalAgent, Bytespider, CCBot, Google-Extended, and 38 other AI crawlers — plus 1,300+ bot patterns across search, SEO, archiver, and monitoring categories. Detection runs at the edge with sub-millisecond overhead.

Enforce, Don't Just Block

Blocking crawlers entirely can cause SEO problems or break legitimate integrations. Fairvisor applies staged rate limits instead: allow below 80% of quota, warn at 80%, throttle at 95%, reject at 100%. Crawlers get predictable access. Your bandwidth bill stays sane.
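The staged thresholds can be sketched as a single decision function (thresholds from the stages above; the stage names are illustrative, not Fairvisor's API):

```python
def enforcement_stage(used: float, quota: float) -> str:
    """Map quota utilization to a staged action."""
    ratio = used / quota
    if ratio >= 1.0:
        return "reject"    # over quota: e.g. HTTP 429
    if ratio >= 0.95:
        return "throttle"  # slow responses as the limit approaches
    if ratio >= 0.80:
        return "warn"      # signal via headers/logs, no behavior change
    return "allow"
```

The point of the intermediate stages is that a well-behaved crawler sees the warning and backs off before it ever hits a hard rejection.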

Shadow Mode First

Don’t guess how much crawler traffic you have. Deploy in shadow mode and measure which bots are hitting your site, at what rate, on which endpoints, and how much bandwidth they consume — before you enforce anything.
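Conceptually, shadow mode is observation without enforcement: every request is classified and counted, nothing is limited. A rough sketch of the bookkeeping involved (class and method names are hypothetical, not Fairvisor's interface):

```python
from collections import defaultdict

class ShadowModeStats:
    """Record bot traffic per bot and endpoint without enforcing anything."""

    def __init__(self) -> None:
        self.requests = defaultdict(int)      # (bot, path) -> request count
        self.bytes_served = defaultdict(int)  # bot -> total response bytes

    def observe(self, bot: str, path: str, response_bytes: int) -> None:
        self.requests[(bot, path)] += 1
        self.bytes_served[bot] += response_bytes

    def bandwidth_mb(self, bot: str) -> float:
        return self.bytes_served[bot] / 1_048_576
```

Run this long enough to cover a full crawl cycle and the resulting per-bot, per-endpoint numbers become the baseline for setting quotas.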

Segment by ASN Type and Bot Category

Set different limits for residential vs datacenter traffic, and separate bot categories by policy. One rule set, split enforcement by ip:type and ua:bot_category — not a single global crawler rule.

Fairvisor vs. robots.txt

                   robots.txt                    Fairvisor
Enforcement        Honor system                  Edge enforcement
Compliance model   Advisory (crawler decides)    Inline enforcement in your traffic path
Granularity        Allow/disallow paths          Per-bot rate limits with staged actions
Analytics          None                          Full dashboard: volume, bandwidth, trends
Response           Block or nothing              Warn → throttle → reject
Latency impact     Zero (static file)            Adds an edge decision step
Safe rollout       No                            Shadow mode

Bot Traffic by Category

Fairvisor ships with 1,335 bot patterns across 7 categories. What each category means for a content or docs site:

AI bots (44 patterns)

— GPTBot, ClaudeBot, Meta-ExternalAgent, Bytespider, and 40 others. These are the crawlers consuming your bandwidth for training data. → Bot categories docs

Search engine bots (290 patterns)

— Googlebot, Bingbot, and the long tail of search indexers. You probably want to allow these. Fairvisor lets you set different limits per category, not one global crawler rule.

SEO research bots (31 patterns)

— Ahrefs, Semrush, Moz. High-volume, paid tools that crawl aggressively. Worth rate-limiting separately from search engines.

Archiver bots (43 patterns)

— Internet Archive, Common Crawl. Generally benign but can generate significant load.

Monitoring & uptime bots (65 patterns)

— Pingdom, UptimeRobot, StatusCake. You want these to work. Fairvisor can explicitly allow them by name.

Other bots (825 patterns)

— Everything else: feed readers, link checkers, research crawlers. The long tail that IP-based rules miss entirely.

Who This Is For

  • DevRel teams managing public documentation sites
  • Platform teams hosting API references and developer portals
  • Content sites with technical blogs and knowledge bases
  • Open source projects with hosted docs (GitBook, Docusaurus, MkDocs)

FAQ

How does Fairvisor identify AI crawlers?

By User-Agent string matched against a database of 44 known AI crawler patterns: GPTBot, ClaudeBot, Meta-ExternalAgent, Bytespider, CCBot, Google-Extended, and others. Detection runs at the edge with sub-millisecond overhead, and because real browser User-Agents don't contain these tokens, there is no false-positive risk for genuine browser traffic.

Will rate-limiting bots hurt my SEO?

Fairvisor rate-limits, it doesn’t block. Search engine bots (Googlebot, Bingbot, etc.) can be explicitly allowed or given higher limits than AI training crawlers. You set different limits per bot category — not one kill-switch for all crawlers. → Bot categories docs

What does shadow mode tell me before I enforce anything?

Which bots are hitting your site, how often, on which pages, and how much bandwidth they consume — all without changing any behavior. Most teams are surprised how much traffic is non-human once they actually look at the breakdown.

How does Fairvisor handle bots that ignore robots.txt?

Unlike robots.txt, Fairvisor enforces in the request path. Bots receive rate-limit responses based on policy regardless of robots.txt behavior.

Does it work with my existing docs infrastructure?

Yes. Deploy in front of GitBook, Docusaurus, MkDocs, Confluence, or any static host. No changes to your docs infrastructure. Works as Nginx/Traefik/Envoy middleware or standalone reverse proxy. → Deployment options
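"In front of" means the check runs in the request path before your docs host sees the request. As a minimal illustration of where that check sits, here is a generic WSGI middleware sketch (Fairvisor's actual integrations are Nginx/Traefik/Envoy middleware or a standalone proxy; this class and its callback are hypothetical):

```python
class BotLimitMiddleware:
    """Wrap any WSGI app; reject over-quota bots before the app runs."""

    def __init__(self, app, is_over_quota):
        self.app = app                    # the wrapped docs application
        self.is_over_quota = is_over_quota  # callback: User-Agent -> bool

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if self.is_over_quota(ua):
            start_response("429 Too Many Requests",
                           [("Retry-After", "60"),
                            ("Content-Type", "text/plain")])
            return [b"rate limit exceeded\n"]
        return self.app(environ, start_response)
```

The docs app itself is untouched; enforcement is a wrapper, which is what makes the integration work regardless of which static-site generator produced the content.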

Can I set stricter limits only for AI bots while leaving search bots mostly untouched?

Yes. Policies can target ua:bot_category, so AI training crawlers get tighter quotas while search engine and monitoring bots keep higher limits. This avoids one global crawler policy that penalizes everything equally.

Why teams choose Fairvisor

Measurable savings, not estimates

Shadow mode shows exactly how much crawler traffic you have before you touch a single policy — so savings are real numbers, not guesses.

Granular control, not a kill switch

Staged enforcement lets you rate-limit bots without SEO risk or breaking legitimate integrations.

Works across any stack

Edge enforcement with no changes to your docs infrastructure. Deploy in front of GitBook, Docusaurus, MkDocs, or any static host.

Stop subsidizing AI training with your bandwidth budget

See your crawler traffic (deploy in shadow mode)