There's a cool project on GitHub called
that's a Python script designed to test & monitor an Ollama server. It can check endpoints, run load tests, &, most importantly for us, export Prometheus metrics. You configure it with a
file, telling it which endpoints to check & what to expect.