Initial commit: CT 116 inference harness — nginx, LiteLLM, router, dashboard, Redis
- Complexity-based routing (MoE default, Dense heavy, Gemma light)
- Per-agent API keys with metrics tracking
- Time-series usage graphs (24h/7d/30d)
- Streaming support (SSE passthrough)
- Unicode cleanup (ASCII-only output)
- Vision support (gemma-4-E4B)
- Tier enforcement (starter/professional/enterprise)
- GPU health monitoring via sidecar polling
- Unified dashboard with line graph
@@ -0,0 +1,25 @@
model_list:
  - model_name: qwen3.6-35B-A3B
    litellm_params:
      model: openai/qwen3.6-35B-A3B
      api_base: http://192.168.68.15:8080/v1
      api_key: "not-needed"

  - model_name: qwen3.6-27B-code
    litellm_params:
      model: openai/qwen3.6-27B-code-text
      api_base: http://192.168.68.8:8080/v1
      api_key: "not-needed"

  - model_name: gemma-4-E4B
    litellm_params:
      model: openai/gemma-4-E4B
      api_base: http://192.168.68.110:8080/v1
      api_key: "not-needed"

general_settings:
  master_key: sk-syslog-local-master-key

litellm_settings:
  drop_params: true
  request_timeout: 120