cfb05fa501
Router now buffers streaming response chunks to extract timings (prompt_n,
predicted_n, predicted_ms, predicted_per_second) from the final SSE data
frame before yielding to the client. Streaming requests now report real
throughput instead of 0 tok/s.

Uses the llama.cpp timings field in the last content chunk:
- completion_tokens = predicted_n
- tokens_per_sec = predicted_per_second
- inference_ms = predicted_ms (generation only)

The client sees an identical stream with no perceptible delay.
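A minimal sketch of the pass-through scan, assuming the upstream body arrives as an iterable of raw byte chunks and each SSE frame is a "data: {json}" line. StreamStats, relay_sse and _scan_line are hypothetical names for illustration, not the router's actual code. Each chunk is scanned for complete lines and then yielded byte-for-byte. A frame that is split across chunks is held back only for parsing, and the last frame carrying a timings object wins.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class StreamStats:
    # Hypothetical container for the metrics the router records.
    prompt_tokens: Optional[int] = None
    completion_tokens: Optional[int] = None
    tokens_per_sec: Optional[float] = None
    inference_ms: Optional[float] = None


def _scan_line(line: bytes, stats: StreamStats) -> None:
    # Hypothetical helper: parse one SSE line and copy llama.cpp timings.
    line = line.strip()
    if not line.startswith(b"data:"):
        return  # blank separators, comments, event: lines, etc.
    payload = line[len(b"data:"):].strip()
    if payload == b"[DONE]":
        return
    try:
        frame = json.loads(payload)
    except json.JSONDecodeError:
        return  # not JSON; nothing to extract
    timings = frame.get("timings") if isinstance(frame, dict) else None
    if isinstance(timings, dict):
        stats.prompt_tokens = timings.get("prompt_n")
        stats.completion_tokens = timings.get("predicted_n")
        stats.tokens_per_sec = timings.get("predicted_per_second")
        stats.inference_ms = timings.get("predicted_ms")  # generation only


def relay_sse(chunks: Iterable[bytes], stats: StreamStats) -> Iterator[bytes]:
    """Yield every upstream chunk unchanged while scanning SSE data frames."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Parse only complete lines; keep a partial line for the next chunk.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            _scan_line(line, stats)
        yield chunk  # the client receives the original bytes untouched
    if pending:
        _scan_line(pending, stats)
```

Because the generator yields each chunk right after scanning it rather than holding the whole response, the only added latency is the JSON parse of that chunk's complete lines.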