After upgrading to Kubernetes v1.28, our jobs began dying from OOM kills in cgroup v2 environments. This talk walks through the problem and how we resolved it by adding the kubelet option singleProcessOOMKill, covering the implementation story and the lessons learned.
Pull request: https://github.com/kubernetes/kubernetes/pull/126096
Event page: https://k8sjp.connpass.com/event/365262/