Make model extraction economically infeasible
Fairvisor acts as an LLM Abuse Firewall at the edge: multi-dimensional quotas, repeated-request detection, and identity-aware enforcement with auditable incident evidence.
What Fairvisor Does for LLM Providers
Multi-Dimensional Quotas
Control more than requests per second:
- Tokens/minute, tokens/day, and cost/minute
- Limits by endpoint type, model, and route
- Burst + sliding-window controls and sustained throughput caps for long-running campaigns
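As a rough illustration of how several quota dimensions might be checked together, here is a minimal Python sketch. The names SlidingWindowQuota and admit_tokens and all limits are hypothetical; this is not Fairvisor's API or configuration format, only one way a sliding window per dimension could be combined.

```python
from collections import deque


class SlidingWindowQuota:
    """Hypothetical sliding-window counter for one dimension (e.g. tokens/minute)."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, amount) pairs still inside the window
        self._used = 0

    def remaining(self, now: float) -> int:
        # Expire usage that has aged out of the window, then report headroom.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            _, amount = self._events.popleft()
            self._used -= amount
        return self.limit - self._used

    def record(self, amount: int, now: float) -> None:
        self._events.append((now, amount))
        self._used += amount


def admit_tokens(quotas, tokens: int, now: float) -> bool:
    """Admit only if the request fits every quota; charge none on rejection."""
    if any(q.remaining(now) < tokens for q in quotas):
        return False
    for q in quotas:
        q.record(tokens, now)
    return True
```

Passing a short-window quota and a long-window quota in the same list gives a burst limit and a sustained cap at once: a request that fits the minute window can still be refused by the daily one.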
Identity-Aware Enforcement
Policy follows identity, not just IP:
- API key + tenant + user + IP/ASN/geo + UA bot signals
- Distinct trust profiles per key or customer segment
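To make "policy follows identity" concrete, the sketch below resolves a trust profile from a composite identity. Every name here (Identity, PROFILES, the tenant and key values, the multipliers) is an assumption made for illustration and does not reflect Fairvisor's actual policy model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    api_key: str
    tenant: str
    user: Optional[str] = None
    asn: Optional[int] = None
    bot_suspected: bool = False  # e.g. derived from user-agent signals


# Hypothetical trust profiles: multipliers applied to a base limit.
PROFILES = {"trusted": 2.0, "standard": 1.0, "restricted": 0.25}
SEGMENT_BY_TENANT = {"acme": "trusted"}          # per customer segment
OVERRIDE_BY_KEY = {"key-trial-1": "restricted"}  # per key, wins over tenant


def effective_limit(identity: Identity, base_limit: int) -> int:
    """Pick the most specific profile, then scale the base limit."""
    profile = OVERRIDE_BY_KEY.get(identity.api_key)
    if profile is None:
        profile = SEGMENT_BY_TENANT.get(identity.tenant, "standard")
    if identity.bot_suspected:
        profile = "restricted"  # bot signals downgrade any profile
    return int(base_limit * PROFILES[profile])
```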
Repeated-Request Detection
Fairvisor detects automated loops: identical requests from the same identity within a configurable time window. If the same request (matched by content fingerprint) repeats above a threshold, Fairvisor can reject, throttle with progressive delay, or warn. The threshold and window are configurable per key or tenant. → Loop Detector
Edge Enforcement Playbooks
Automate responses in real time:
- Throttle (progressive delay)
- Hard block (429/503)
- Shadow mode dry-run before enforcement
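The three responses above could be combined in a single decision function, sketched here in Python. The Decision type, the doubling schedule, and the thresholds are assumptions chosen for illustration and are not Fairvisor's actual escalation rules.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str                # "allow", "throttle", or "block"
    delay_seconds: float = 0.0
    status_code: int = 200
    enforced: bool = True      # False in shadow mode: logged, not applied


def decide(violations: int, shadow: bool = False,
           base_delay: float = 0.5, max_delay: float = 8.0,
           block_after: int = 5) -> Decision:
    """Map a running violation count to a response (hypothetical policy)."""
    if violations <= 0:
        return Decision("allow")
    if violations >= block_after:
        decision = Decision("block", status_code=429)
    else:
        # Progressive delay: double on each further violation, capped.
        delay = min(base_delay * 2 ** (violations - 1), max_delay)
        decision = Decision("throttle", delay_seconds=delay)
    decision.enforced = not shadow
    return decision
```

In shadow mode the same decision is computed and can be written to the audit log, but the caller forwards the request unchanged, which allows thresholds to be tuned before they affect traffic.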
Forensics and Auditability
Produce proof for security and legal workflows:
- Who initiated traffic, when, and through which identity path
- Token/cost impact over time
- Which controls fired and why
- Exportable incident timeline for compliance and IR
Cost-Based Enforcement
Stop overages before they happen:
- Per-key and per-tenant budget counters enforced on every request
- Configurable cost limits per model, endpoint, and customer segment
- Enforcement triggers before overage — not after the bill arrives
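One way to enforce a budget before overage is to price the worst case of a request up front and check it against both the key and tenant counters. The sketch below assumes hypothetical model names, prices, and a BudgetCounter class; it is not Fairvisor's accounting logic.

```python
from decimal import Decimal

# Hypothetical prices per 1,000 tokens; real prices depend on the provider.
PRICE_PER_1K = {"small-model": Decimal("0.50"), "large-model": Decimal("3.00")}


class BudgetCounter:
    """Running spend against a fixed limit for one key or tenant."""

    def __init__(self, limit: Decimal):
        self.limit = limit
        self.spent = Decimal("0")

    def can_afford(self, cost: Decimal) -> bool:
        return self.spent + cost <= self.limit

    def charge(self, cost: Decimal) -> None:
        self.spent += cost


def worst_case_cost(model: str, prompt_tokens: int, max_output_tokens: int) -> Decimal:
    # Assume the completion uses its full output allowance.
    return PRICE_PER_1K[model] * (prompt_tokens + max_output_tokens) / 1000


def admit_cost(key_budget: BudgetCounter, tenant_budget: BudgetCounter,
               model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """Reject before forwarding if either budget would be exceeded."""
    cost = worst_case_cost(model, prompt_tokens, max_output_tokens)
    if not (key_budget.can_afford(cost) and tenant_budget.can_afford(cost)):
        return False
    key_budget.charge(cost)
    tenant_budget.charge(cost)
    return True
```

Charging the worst case is deliberately conservative; a fuller version would refund the difference once the actual token usage is known.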
What Repeated-Request Loops Look Like
Automated abuse doesn’t always look like DDoS. The clearest signal Fairvisor catches is exact repetition.
Key rotation loop: When one key hits its limit, traffic shifts to a new key within seconds. If the request content fingerprint is identical, Fairvisor’s loop detector fires on the new key as well, once that key crosses its own threshold. → LLM Token Limiter | Loop Detector
Automated retry storm: A client retries the same failed request hundreds of times within a short window. Loop detection identifies the pattern and triggers throttling or rejection rather than letting retries exhaust token budgets.
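Both patterns reduce to counting identical fingerprints per identity inside a time window. The following Python sketch shows that idea using a CRC32 fingerprint of the request body; the LoopDetector class and its parameters are hypothetical and simplified, not Fairvisor's implementation.

```python
import zlib
from collections import defaultdict, deque


class LoopDetector:
    """Counts repeats of the same request body per identity in a time window."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # (identity, fingerprint) -> timestamps

    def observe(self, identity: str, body: bytes, now: float) -> bool:
        """Return True when the repeat count exceeds the threshold."""
        fingerprint = zlib.crc32(body)
        stamps = self._seen[(identity, fingerprint)]
        # Forget occurrences that fell out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        stamps.append(now)
        return len(stamps) > self.threshold
```

Because counters are keyed by identity, a rotated key starts from zero and trips the detector only after repeating the same body past its own threshold. CRC32 is fast but not collision-resistant, so distinct bodies can occasionally share a fingerprint.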
Minimal MVP Scope (Practical Rollout)
Start with three controls:
- Token/cost limits with burst controls
- Repeated-request (loop) detection with configurable threshold and window
- Automated responses (throttle, block) with complete audit log
This is enough to support a credible claim: lower risk and worse attacker economics for model extraction attempts, backed by measurable controls.
Who This Is For
- Teams exposing LLM functionality as paid API products
- Vertical copilots with expensive prompt pipelines and proprietary system prompts
- Platforms where behavioral cloning risk is material
FAQ
What types of rate limits can I apply per model?
Tokens/minute, tokens/day, and cost/minute, each configurable per model, endpoint type, or customer segment. Separate burst and sustained-throughput controls limit both traffic spikes and long-running campaigns.
How does repeated-request detection work?
Fairvisor fingerprints each request (CRC32 of the content) and tracks how many times the same fingerprint appears from the same identity within a configurable time window. When the count exceeds the threshold, the configured action fires: reject, throttle (progressive delay), or warn. → Loop Detector
How does Fairvisor track identity across key rotation?
Each key’s requests are fingerprinted independently. If a new key immediately begins sending the exact same requests that triggered loop detection on a previous key, loop detection applies to the new key as well once its own threshold is exceeded. → Loop Detector docs
Does Fairvisor support streaming (SSE) responses?
Yes. Token counting happens during streaming. If a completion exceeds configured limits mid-stream, Fairvisor closes the stream gracefully with finish_reason: length. No corrupted responses, no wasted tokens after the cutoff.
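A mid-stream cutoff of this kind can be pictured as a wrapper around the upstream chunk iterator. The sketch below is an assumption-laden illustration: count_tokens is a whitespace stand-in for a real tokenizer, and the chunk dictionaries are a simplified, hypothetical event shape rather than Fairvisor's or any provider's wire format.

```python
from typing import Dict, Iterable, Iterator, Optional


def count_tokens(text: str) -> int:
    # Stand-in tokenizer: whitespace-separated words. A real gateway would
    # use the model's tokenizer or provider-reported usage.
    return len(text.split())


def limit_stream(chunks: Iterable[str],
                 max_tokens: int) -> Iterator[Dict[str, Optional[str]]]:
    """Relay chunks until the token limit, then end with finish_reason 'length'."""
    used = 0
    for text in chunks:
        tokens = count_tokens(text)
        if used + tokens > max_tokens:
            # Stop reading upstream and close with a well-formed final event.
            yield {"text": "", "finish_reason": "length"}
            return
        used += tokens
        yield {"text": text, "finish_reason": None}
    yield {"text": "", "finish_reason": "stop"}
```

Returning from the generator as soon as the limit is hit means no further upstream chunks are consumed, which is what keeps tokens from being spent after the cutoff.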