Engineering
Notes from building and running a small ML platform: a heterogeneous GPU cluster doing real work, from inference serving and agentic tooling to Kubernetes operations and the occasional post-mortem.
Currently
Now
Losing the hypervisor for Talos
Wiped a 7-node GPU cluster and rebuilt it, moving from k3s-on-Proxmox to bare-metal Talos: a full rebuild on your own metal, with no cloud provider.
Build log · 5 min read
Elsewhere
Code and tools on GitHub
Projects and open-source work live at github.com/imlach.
Open to
Staff platform & ML platform roles
Thirteen years across SaaS, distributed systems, and ML infrastructure. UK-remote.
Writing
Losing the hypervisor for Talos on bare metal
Wiped a 7-node heterogeneous GPU cluster and migrated it from k3s-on-Proxmox to bare-metal Talos. How to do a full rebuild without a cloud provider.