Engineering

Notes from building and running a small ML platform - a heterogeneous GPU cluster doing real work: inference serving, agentic tooling, Kubernetes operations, and the occasional post-mortem. More about me.

Currently

Writing

1 entry
Losing the hypervisor for Talos on bare metal
Wiped a 7-node heterogeneous GPU cluster and rebuilt it, moving from k3s on Proxmox to bare-metal Talos. How to do a full rebuild without a cloud provider.
21 Jun 2026 5 min