No Kubernetes? No Problem: llm-d Now Runs Anywhere
llm-d was designed as a Kubernetes-native inference stack, and its guides assume you have a cluster handy. However, a large class of inference workloads runs on infrastructure that isn't managed by Kubernetes, and until recently llm-d was not a fit for them.
With the llm-d router's new file-discovery plugin, that changes. llm-d can now run as a plain process or container in any environment, with no dependency on Kubernetes or any other cluster framework. A YAML file lists your endpoints; the router reads it and reconciles changes live. That's the whole interface.
That opens the door to deployments like:
- HPC clusters running Slurm, where GPU nodes are allocated per-job and there is no cluster API
- Ray-based training loops (VERL, OpenRLHF) where rollout workers are Ray actors, not pods
- Bare-metal inference farms provisioned statically
- Local development on a workstation with one or two GPUs
This post introduces the new endpoint-discovery plugin mechanism in the llm-d router. It then shows how to run llm-d without a Kubernetes cluster by enabling the file-discovery plugin, which reads endpoints from a YAML file on disk, and illustrates this with two concrete examples that generate the endpoints file from a Ray cluster and from a Slurm job.
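As a rough picture of the file-based contract, here is a minimal Python sketch of the generator side. The schema (a top-level `endpoints` list whose entries have `name`, `address`, and `port` fields) is hypothetical and is not the plugin's actual format; check the llm-d router documentation for the real field names. The sketch writes JSON, which is also valid YAML 1.2, so it needs no YAML library. It replaces the file atomically (temp file plus rename) so a reader never sees a half-written list. Whether the router's watcher picks up a rename-based replacement is an assumption worth verifying. The `reconcile` helper is a simplified model of the add/remove diff that "reconciling changes live" implies, not the router's code.

```python
import json
import os
import tempfile

# An endpoint is (name, address, port). HYPOTHETICAL shape, for illustration only.
Endpoint = tuple[str, str, int]


def render_endpoints(endpoints: list[Endpoint]) -> str:
    """Render endpoints as YAML text.

    json.dumps output is valid YAML 1.2, so no YAML library is needed.
    The field names are HYPOTHETICAL, not the plugin's real schema.
    """
    doc = {
        "endpoints": [
            {"name": name, "address": address, "port": port}
            # Sort so the file content is stable across regenerations.
            for name, address, port in sorted(endpoints)
        ]
    }
    return json.dumps(doc, indent=2) + "\n"


def write_atomically(path: str, text: str) -> None:
    """Write text to path so readers see the old file or the new one, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem as the target for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename over the target on POSIX
    except BaseException:
        # Clean up the temp file if anything failed before the rename completed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def reconcile(old: list[Endpoint], new: list[Endpoint]) -> tuple[set[Endpoint], set[Endpoint]]:
    """Simplified model of a live reconcile: endpoints to add and endpoints to drop."""
    old_set, new_set = set(old), set(new)
    return new_set - old_set, old_set - new_set
```

A Ray driver or a Slurm batch script could call `render_endpoints` and `write_atomically` whenever its set of workers changes. Because the whole file is rewritten each time, the generator never needs to track deltas itself; computing what changed is left to the reader of the file.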