AI Gateway
APISIX AI Gateway for rate limiting LLM API access with per-user or per-group API keys
AI Gateway
An optional APISIX API Gateway provides token-based rate limiting and per-consumer API keys
for LLM endpoints. When enabled, each user or group gets a unique API key and the gateway
enforces token quotas using the ai-rate-limiting plugin.
Architecture
┌──────────────────────────────────────────────────────────────────┐
│ Host Machine │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ APISIX Gateway (:9080) │ │
│ │ │ │
│ │ /v1/chat/completions: │ │
│ │ key-auth → ai-rate-limiting → ai-proxy-multi → Lemonade │ │
│ │ │ │
│ │ /v1/embeddings: │ │
│ │ key-auth → proxy-rewrite → upstream proxy → Lemonade │ │
│ └───────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ HTTP │
│ ┌──────────────────────────────┴────────────────────────────┐ │
│ │ Lemonade Server (:13305) │ │
│ │ OpenAI-compatible API │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────┐ ┌──────────────────────┐ │
│ │ etcd (:2379) │ │ Redis (:6379) │ │
│ │ APISIX config store │ │ Rate limit counter │ │
│ └──────────────────────┘ └──────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘Gateway Routes
The gateway creates two APISIX routes:
Chat Completions Route (/v1/chat/completions)
Uses the ai-proxy-multi plugin for intelligent LLM load balancing with
key-auth for authentication and ai-rate-limiting for token quotas.
Embedding Route (/v1/embeddings)
Uses simple upstream proxying (via proxy-rewrite) to forward embedding
requests to Lemonade. This route is created automatically unless --no-embedding
is specified. The embedding route enables Kilo Code's semantic code search
(indexing) to go through the gateway with the same consumer API key.
Consumer Modes
per-user (default)
Each user gets their own API key and rate limit:
alpha/alice ──> consumer "alpha-alice" (500 tokens/60s)
alpha/bob ──> consumer "alpha-bob" (500 tokens/60s)
beta/dave ──> consumer "beta-dave" (500 tokens/60s)Best for: Individual accountability, strict per-user limits.
per-group
Each group shares one API key with a combined rate limit:
alpha (3 users) ──> consumer "group-alpha" (1500 tokens/60s, shared)
beta (2 users) ──> consumer "group-beta" (1000 tokens/60s, shared)The group's rate limit is rate_limit_per_user * num_users_in_group, so
each user effectively gets the same per-user quota but the group can
redistribute unused capacity among members.
Best for: Team-based billing, shared capacity pools, simpler key management.
Setup
Installation
# During initial setup
INSTALL_GATEWAY=true bash ./scripts/setup.sh
# Or standalone
bash ./scripts/setup-apisix.shThis installs etcd-server, redis-server, and apisix apt packages
and starts the services.
CLI Setup
# Per-user mode (includes embedding route)
python scripts/apisix_gateway.py setup \
--groups groups.yaml \
--lemonade-url http://127.0.0.1:13305
# Per-group mode
python scripts/apisix_gateway.py setup \
--groups groups.yaml \
--lemonade-url http://127.0.0.1:13305 \
--per-group
# With Redis-backed rate limiting
python scripts/apisix_gateway.py setup \
--groups groups.yaml \
--lemonade-url http://127.0.0.1:13305 \
--redis-host 127.0.0.1
# Without embedding route
python scripts/apisix_gateway.py setup \
--groups groups.yaml \
--lemonade-url http://127.0.0.1:13305 \
--no-embedding
# Custom embedding model
python scripts/apisix_gateway.py setup \
--groups groups.yaml \
--lemonade-url http://127.0.0.1:13305 \
--embedding-model user.my-embedding-modelRunning with main.py
# Per-user gateway
python ./scripts/main.py --groups groups.yaml --external-ip 1.2.3.4 --gateway
# Per-group gateway
python ./scripts/main.py --groups groups.yaml --external-ip 1.2.3.4 \
--gateway --gateway-per-group
# With Redis
python ./scripts/main.py --groups groups.yaml --external-ip 1.2.3.4 \
--gateway --gateway-redis-host 127.0.0.1
# Custom rate limits
python ./scripts/main.py --groups groups.yaml --external-ip 1.2.3.4 \
--gateway --gateway-rate-limit 1000 --gateway-time-window 120Dashboard Setup
The Streamlit dashboard provides a full AI Gateway page where you can:
- Toggle gateway on/off
- Switch between per-user and per-group mode
- Configure rate limits and time windows
- Set Redis host for shared rate limiting
- Setup gateway with one click (creates route + consumers from DB groups)
- View and manage consumers
- Delete individual consumers
- Cleanup all gateway resources
Rate Limiting Modes
| Mode | Redis Host | Policy | Scope |
|---|---|---|---|
| Local | (not set) | local | Per-gateway-instance counters |
| Redis | 127.0.0.1 | redis | Shared across all gateway instances |
Use Redis when running multiple APISIX instances or when counter persistence across APISIX restarts is needed.
Kilo Code Integration
When the gateway is enabled, main.py generates a gateway-aware kilo.json
that points to the gateway instead of directly to Lemonade:
{
"providers": {
"lemonade-gateway": {
"baseUrl": "http://1.2.3.4:9080",
"apiKey": "<consumer-api-key>"
}
},
"models": {
"gemma-4-31b-it": {
"provider": "lemonade-gateway",
"modelId": "user.gemma-4-31b-it"
}
},
"experimental": {
"batch_tool": false,
"codebase_search": true,
"openTelemetry": false,
"continue_loop_on_deny": true,
"semantic_indexing": true,
"agent_manager_tool": true
},
"indexing": {
"enabled": true,
"provider": "openai-compatible",
"vectorStore": "lancedb",
"openai-compatible": {
"baseUrl": "http://1.2.3.4:9080/v1",
"apiKey": "<consumer-api-key>",
"model": "user.harrier-oss-v1-0.6b"
}
}
}The indexing section configures Kilo Code's semantic code search to use the
gateway's embedding route (/v1/embeddings) with the same consumer API key.
This means embedding requests are also rate-limited by the consumer's token quota.
In per-group mode, all users in the same group receive the same kilo.json
with the shared group API key.
When --no-embedding is used during setup, the indexing section is omitted
from the generated kilo.json.
API Endpoints
| Method | Path | Description |
|---|---|---|
GET | /api/gateway/status | Gateway status (running, consumers, route, redis) |
GET | /api/gateway/consumers | List consumers with API keys and rate limits |
POST | /api/gateway/consumers | Create consumer |
DELETE | /api/gateway/consumers/{username} | Delete consumer |
POST | /api/gateway/setup | Full setup (route + consumers from DB groups) |
POST | /api/gateway/cleanup | Remove all consumers and routes |
POST | /api/gateway/route | Create/update AI proxy route |
DELETE | /api/gateway/route | Delete AI proxy route |
Environment Variables
| Variable | Default | Description |
|---|---|---|
GATEWAY_ENABLED | false | Enable AI Gateway features |
GATEWAY_ADMIN_URL | http://127.0.0.1:9180 | APISIX Admin API URL |
GATEWAY_ADMIN_KEY | edd1c9f034335f136f87ad84b625c8f1 | APISIX Admin API key |
GATEWAY_PROXY_PORT | 9080 | APISIX proxy port |
GATEWAY_REDIS_HOST | (none) | Redis host for rate limiting |
GATEWAY_REDIS_PORT | 6379 | Redis port |
GATEWAY_REDIS_PASSWORD | (none) | Redis password |
GATEWAY_RATE_LIMIT_TOKENS | 500 | Default token limit per consumer per window |
GATEWAY_RATE_LIMIT_WINDOW | 60 | Rate limit time window in seconds |
GATEWAY_MODE | per-user | Consumer mode: per-user or per-group |