APISIX AI Gateway for rate limiting LLM API access with per-user or per-group API keys

AI Gateway

An optional APISIX API Gateway provides token-based rate limiting and per-consumer API keys for LLM endpoints. When enabled, each user or group gets a unique API key and the gateway enforces token quotas using the ai-rate-limiting plugin.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                         Host Machine                              │
│                                                                   │
│  ┌───────────────────────────────────────────────────────────┐   │
│  │                  APISIX Gateway (:9080)                     │   │
│  │                                                            │   │
│  │  /v1/chat/completions:                                     │   │
│  │    key-auth → ai-rate-limiting → ai-proxy-multi → Lemonade │   │
│  │                                                            │   │
│  │  /v1/embeddings:                                           │   │
│  │    key-auth → proxy-rewrite → upstream proxy → Lemonade    │   │
│  └───────────────────────────────────────────────────────────┘   │
│                                 ▲                                 │
│                                 │ HTTP                            │
│  ┌──────────────────────────────┴────────────────────────────┐   │
│  │              Lemonade Server (:13305)                       │   │
│  │              OpenAI-compatible API                          │   │
│  └────────────────────────────────────────────────────────────┘   │
│                                                                   │
│  ┌──────────────────────┐  ┌──────────────────────┐              │
│  │    etcd (:2379)       │  │  Redis (:6379)       │              │
│  │  APISIX config store  │  │  Rate limit counter  │              │
│  └──────────────────────┘  └──────────────────────┘              │
└──────────────────────────────────────────────────────────────────┘

Gateway Routes

The gateway creates two APISIX routes:

Chat Completions Route (`/v1/chat/completions`)

Uses the ai-proxy-multi plugin for intelligent LLM load balancing with key-auth for authentication and ai-rate-limiting for token quotas.

Embedding Route (`/v1/embeddings`)

Uses simple upstream proxying (via proxy-rewrite) to forward embedding requests to Lemonade. This route is created automatically unless --no-embedding is specified. The embedding route enables Kilo Code's semantic code search (indexing) to go through the gateway with the same consumer API key.

Consumer Modes

per-user (default)

Each user gets their own API key and rate limit:

alpha/alice  ──> consumer "alpha-alice"  (500 tokens/60s)
alpha/bob    ──> consumer "alpha-bob"    (500 tokens/60s)
beta/dave    ──> consumer "beta-dave"    (500 tokens/60s)

Best for: Individual accountability, strict per-user limits.

per-group

Each group shares one API key with a combined rate limit:

alpha (3 users) ──> consumer "group-alpha"  (1500 tokens/60s, shared)
beta (2 users)   ──> consumer "group-beta"   (1000 tokens/60s, shared)

The group's rate limit is rate_limit_per_user * num_users_in_group, so each user effectively gets the same per-user quota but the group can redistribute unused capacity among members.

Best for: Team-based billing, shared capacity pools, simpler key management.

Setup

Installation

# During initial setup
INSTALL_GATEWAY=true bash ./scripts/setup.sh

# Or standalone
bash ./scripts/setup-apisix.sh

This installs etcd-server, redis-server, and apisix apt packages and starts the services.

CLI Setup

# Per-user mode (includes embedding route)
python scripts/apisix_gateway.py setup \
    --groups groups.yaml \
    --lemonade-url http://127.0.0.1:13305

# Per-group mode
python scripts/apisix_gateway.py setup \
    --groups groups.yaml \
    --lemonade-url http://127.0.0.1:13305 \
    --per-group

# With Redis-backed rate limiting
python scripts/apisix_gateway.py setup \
    --groups groups.yaml \
    --lemonade-url http://127.0.0.1:13305 \
    --redis-host 127.0.0.1

# Without embedding route
python scripts/apisix_gateway.py setup \
    --groups groups.yaml \
    --lemonade-url http://127.0.0.1:13305 \
    --no-embedding

# Custom embedding model
python scripts/apisix_gateway.py setup \
    --groups groups.yaml \
    --lemonade-url http://127.0.0.1:13305 \
    --embedding-model user.my-embedding-model

Running with main.py

# Per-user gateway
python ./scripts/main.py --groups groups.yaml --external-ip 1.2.3.4 --gateway

# Per-group gateway
python ./scripts/main.py --groups groups.yaml --external-ip 1.2.3.4 \
    --gateway --gateway-per-group

# With Redis
python ./scripts/main.py --groups groups.yaml --external-ip 1.2.3.4 \
    --gateway --gateway-redis-host 127.0.0.1

# Custom rate limits
python ./scripts/main.py --groups groups.yaml --external-ip 1.2.3.4 \
    --gateway --gateway-rate-limit 1000 --gateway-time-window 120

Dashboard Setup

The Streamlit dashboard provides a full AI Gateway page where you can:

Toggle gateway on/off
Switch between per-user and per-group mode
Configure rate limits and time windows
Set Redis host for shared rate limiting
Setup gateway with one click (creates route + consumers from DB groups)
View and manage consumers
Delete individual consumers
Cleanup all gateway resources

Rate Limiting Modes

Mode	Redis Host	Policy	Scope
Local	(not set)	`local`	Per-gateway-instance counters
Redis	`127.0.0.1`	`redis`	Shared across all gateway instances

Use Redis when running multiple APISIX instances or when counter persistence across APISIX restarts is needed.

Kilo Code Integration

When the gateway is enabled, main.py generates a gateway-aware kilo.json that points to the gateway instead of directly to Lemonade:

{
  "providers": {
    "lemonade-gateway": {
      "baseUrl": "http://1.2.3.4:9080",
      "apiKey": "<consumer-api-key>"
    }
  },
  "models": {
    "gemma-4-31b-it": {
      "provider": "lemonade-gateway",
      "modelId": "user.gemma-4-31b-it"
    }
  },
  "experimental": {
    "batch_tool": false,
    "codebase_search": true,
    "openTelemetry": false,
    "continue_loop_on_deny": true,
    "semantic_indexing": true,
    "agent_manager_tool": true
  },
  "indexing": {
    "enabled": true,
    "provider": "openai-compatible",
    "vectorStore": "lancedb",
    "openai-compatible": {
      "baseUrl": "http://1.2.3.4:9080/v1",
      "apiKey": "<consumer-api-key>",
      "model": "user.harrier-oss-v1-0.6b"
    }
  }
}

The indexing section configures Kilo Code's semantic code search to use the gateway's embedding route (/v1/embeddings) with the same consumer API key. This means embedding requests are also rate-limited by the consumer's token quota.

In per-group mode, all users in the same group receive the same kilo.json with the shared group API key.

When --no-embedding is used during setup, the indexing section is omitted from the generated kilo.json.

API Endpoints

Method	Path	Description
`GET`	`/api/gateway/status`	Gateway status (running, consumers, route, redis)
`GET`	`/api/gateway/consumers`	List consumers with API keys and rate limits
`POST`	`/api/gateway/consumers`	Create consumer
`DELETE`	`/api/gateway/consumers/{username}`	Delete consumer
`POST`	`/api/gateway/setup`	Full setup (route + consumers from DB groups)
`POST`	`/api/gateway/cleanup`	Remove all consumers and routes
`POST`	`/api/gateway/route`	Create/update AI proxy route
`DELETE`	`/api/gateway/route`	Delete AI proxy route

Environment Variables

Variable	Default	Description
`GATEWAY_ENABLED`	`false`	Enable AI Gateway features
`GATEWAY_ADMIN_URL`	`http://127.0.0.1:9180`	APISIX Admin API URL
`GATEWAY_ADMIN_KEY`	`edd1c9f034335f136f87ad84b625c8f1`	APISIX Admin API key
`GATEWAY_PROXY_PORT`	`9080`	APISIX proxy port
`GATEWAY_REDIS_HOST`	(none)	Redis host for rate limiting
`GATEWAY_REDIS_PORT`	`6379`	Redis port
`GATEWAY_REDIS_PASSWORD`	(none)	Redis password
`GATEWAY_RATE_LIMIT_TOKENS`	`500`	Default token limit per consumer per window
`GATEWAY_RATE_LIMIT_WINDOW`	`60`	Rate limit time window in seconds
`GATEWAY_MODE`	`per-user`	Consumer mode: `per-user` or `per-group`

AI Gateway

On this page