The Hackathon Organizer Node

AI Gateway

APISIX AI Gateway for rate limiting LLM API access with per-user or per-group API keys

AI Gateway

An optional APISIX API Gateway provides token-based rate limiting and per-consumer API keys for LLM endpoints. When enabled, each user or group gets a unique API key and the gateway enforces token quotas using the ai-rate-limiting plugin.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                         Host Machine                              │
│                                                                   │
│  ┌───────────────────────────────────────────────────────────┐   │
│  │                  APISIX Gateway (:9080)                     │   │
│  │                                                            │   │
│  │  /v1/chat/completions:                                     │   │
│  │    key-auth → ai-rate-limiting → ai-proxy-multi → Lemonade │   │
│  │                                                            │   │
│  │  /v1/embeddings:                                           │   │
│  │    key-auth → proxy-rewrite → upstream proxy → Lemonade    │   │
│  └───────────────────────────────────────────────────────────┘   │
│                                 ▲                                 │
│                                 │ HTTP                            │
│  ┌──────────────────────────────┴────────────────────────────┐   │
│  │              Lemonade Server (:13305)                       │   │
│  │              OpenAI-compatible API                          │   │
│  └────────────────────────────────────────────────────────────┘   │
│                                                                   │
│  ┌──────────────────────┐  ┌──────────────────────┐              │
│  │    etcd (:2379)       │  │  Redis (:6379)       │              │
│  │  APISIX config store  │  │  Rate limit counter  │              │
│  └──────────────────────┘  └──────────────────────┘              │
└──────────────────────────────────────────────────────────────────┘

Gateway Routes

The gateway creates two APISIX routes:

Chat Completions Route (/v1/chat/completions)

Uses the ai-proxy-multi plugin for intelligent LLM load balancing with key-auth for authentication and ai-rate-limiting for token quotas.

Embedding Route (/v1/embeddings)

Uses simple upstream proxying (via proxy-rewrite) to forward embedding requests to Lemonade. This route is created automatically unless --no-embedding is specified. The embedding route enables Kilo Code's semantic code search (indexing) to go through the gateway with the same consumer API key.

Consumer Modes

per-user (default)

Each user gets their own API key and rate limit:

alpha/alice  ──> consumer "alpha-alice"  (500 tokens/60s)
alpha/bob    ──> consumer "alpha-bob"    (500 tokens/60s)
beta/dave    ──> consumer "beta-dave"    (500 tokens/60s)

Best for: Individual accountability, strict per-user limits.

per-group

Each group shares one API key with a combined rate limit:

alpha (3 users) ──> consumer "group-alpha"  (1500 tokens/60s, shared)
beta (2 users)   ──> consumer "group-beta"   (1000 tokens/60s, shared)

The group's rate limit is rate_limit_per_user * num_users_in_group, so each user effectively gets the same per-user quota but the group can redistribute unused capacity among members.

Best for: Team-based billing, shared capacity pools, simpler key management.

Setup

Installation

# During initial setup
INSTALL_GATEWAY=true bash ./scripts/setup.sh

# Or standalone
bash ./scripts/setup-apisix.sh

This installs etcd-server, redis-server, and apisix apt packages and starts the services.

CLI Setup

# Per-user mode (includes embedding route)
python scripts/apisix_gateway.py setup \
    --groups groups.yaml \
    --lemonade-url http://127.0.0.1:13305

# Per-group mode
python scripts/apisix_gateway.py setup \
    --groups groups.yaml \
    --lemonade-url http://127.0.0.1:13305 \
    --per-group

# With Redis-backed rate limiting
python scripts/apisix_gateway.py setup \
    --groups groups.yaml \
    --lemonade-url http://127.0.0.1:13305 \
    --redis-host 127.0.0.1

# Without embedding route
python scripts/apisix_gateway.py setup \
    --groups groups.yaml \
    --lemonade-url http://127.0.0.1:13305 \
    --no-embedding

# Custom embedding model
python scripts/apisix_gateway.py setup \
    --groups groups.yaml \
    --lemonade-url http://127.0.0.1:13305 \
    --embedding-model user.my-embedding-model

Running with main.py

# Per-user gateway
python ./scripts/main.py --groups groups.yaml --external-ip 1.2.3.4 --gateway

# Per-group gateway
python ./scripts/main.py --groups groups.yaml --external-ip 1.2.3.4 \
    --gateway --gateway-per-group

# With Redis
python ./scripts/main.py --groups groups.yaml --external-ip 1.2.3.4 \
    --gateway --gateway-redis-host 127.0.0.1

# Custom rate limits
python ./scripts/main.py --groups groups.yaml --external-ip 1.2.3.4 \
    --gateway --gateway-rate-limit 1000 --gateway-time-window 120

Dashboard Setup

The Streamlit dashboard provides a full AI Gateway page where you can:

  1. Toggle gateway on/off
  2. Switch between per-user and per-group mode
  3. Configure rate limits and time windows
  4. Set Redis host for shared rate limiting
  5. Setup gateway with one click (creates route + consumers from DB groups)
  6. View and manage consumers
  7. Delete individual consumers
  8. Cleanup all gateway resources

Rate Limiting Modes

ModeRedis HostPolicyScope
Local(not set)localPer-gateway-instance counters
Redis127.0.0.1redisShared across all gateway instances

Use Redis when running multiple APISIX instances or when counter persistence across APISIX restarts is needed.

Kilo Code Integration

When the gateway is enabled, main.py generates a gateway-aware kilo.json that points to the gateway instead of directly to Lemonade:

{
  "providers": {
    "lemonade-gateway": {
      "baseUrl": "http://1.2.3.4:9080",
      "apiKey": "<consumer-api-key>"
    }
  },
  "models": {
    "gemma-4-31b-it": {
      "provider": "lemonade-gateway",
      "modelId": "user.gemma-4-31b-it"
    }
  },
  "experimental": {
    "batch_tool": false,
    "codebase_search": true,
    "openTelemetry": false,
    "continue_loop_on_deny": true,
    "semantic_indexing": true,
    "agent_manager_tool": true
  },
  "indexing": {
    "enabled": true,
    "provider": "openai-compatible",
    "vectorStore": "lancedb",
    "openai-compatible": {
      "baseUrl": "http://1.2.3.4:9080/v1",
      "apiKey": "<consumer-api-key>",
      "model": "user.harrier-oss-v1-0.6b"
    }
  }
}

The indexing section configures Kilo Code's semantic code search to use the gateway's embedding route (/v1/embeddings) with the same consumer API key. This means embedding requests are also rate-limited by the consumer's token quota.

In per-group mode, all users in the same group receive the same kilo.json with the shared group API key.

When --no-embedding is used during setup, the indexing section is omitted from the generated kilo.json.

API Endpoints

MethodPathDescription
GET/api/gateway/statusGateway status (running, consumers, route, redis)
GET/api/gateway/consumersList consumers with API keys and rate limits
POST/api/gateway/consumersCreate consumer
DELETE/api/gateway/consumers/{username}Delete consumer
POST/api/gateway/setupFull setup (route + consumers from DB groups)
POST/api/gateway/cleanupRemove all consumers and routes
POST/api/gateway/routeCreate/update AI proxy route
DELETE/api/gateway/routeDelete AI proxy route

Environment Variables

VariableDefaultDescription
GATEWAY_ENABLEDfalseEnable AI Gateway features
GATEWAY_ADMIN_URLhttp://127.0.0.1:9180APISIX Admin API URL
GATEWAY_ADMIN_KEYedd1c9f034335f136f87ad84b625c8f1APISIX Admin API key
GATEWAY_PROXY_PORT9080APISIX proxy port
GATEWAY_REDIS_HOST(none)Redis host for rate limiting
GATEWAY_REDIS_PORT6379Redis port
GATEWAY_REDIS_PASSWORD(none)Redis password
GATEWAY_RATE_LIMIT_TOKENS500Default token limit per consumer per window
GATEWAY_RATE_LIMIT_WINDOW60Rate limit time window in seconds
GATEWAY_MODEper-userConsumer mode: per-user or per-group

On this page