inference-harness


RTX 3090 was at 94.9% VRAM at 262K context. Reduced context to 192K
(196608 tokens), freeing ~2.4 GB. VRAM is now at 85%, leaving room for
active inference.
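
The figures in this commit message can be sanity-checked with a short calculation. This is a minimal sketch, not part of the repository: it assumes the RTX 3090's nominal 24 GB of VRAM (the message gives only percentages), and the helper names `vram_gb` and `freed_gb` are hypothetical.

```python
# Assumption: RTX 3090 nominal VRAM capacity in GB (not stated in the commit message).
TOTAL_VRAM_GB = 24.0


def vram_gb(percent, total_gb=TOTAL_VRAM_GB):
    """Convert a VRAM utilisation percentage into GB."""
    return total_gb * percent / 100.0


def freed_gb(before_pct, after_pct, total_gb=TOTAL_VRAM_GB):
    """GB released when utilisation drops from before_pct to after_pct."""
    return vram_gb(before_pct, total_gb) - vram_gb(after_pct, total_gb)


# "192K" in the message is 192 * 1024 tokens.
new_context_tokens = 192 * 1024

# Utilisation before and after the change, as reported in the message.
freed = freed_gb(94.9, 85.0)

print(f"new context length: {new_context_tokens} tokens")
print(f"estimated VRAM freed: {freed:.2f} GB")
```

Under the 24 GB assumption, the drop from 94.9% to 85% comes to a little under 2.4 GB, which is consistent with the "~2.4 GB" reported.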

2026-05-25 00:31:40 +00:00

2026-05-19 15:03:47 +00:00

2026-05-19 15:03:47 +00:00

2026-05-25 00:31:40 +00:00

.gitignore

2026-05-19 15:03:47 +00:00

docker-compose.yml

2026-05-23 05:57:13 +00:00

docker-compose.yml.bak

2026-05-19 15:03:47 +00:00

litellm_config.yaml

2026-05-19 15:03:47 +00:00

maintenance.sh

2026-05-19 15:03:47 +00:00