Being an engine programmer usually means being a bit of a jack of all trades. There’s always something weird going on, and you have to be pretty familiar with a bunch of low-level details that come in handy in unexpected ways. Recently I went down a somewhat unexpected rabbit hole where those skills came in extremely handy. In an effort to blog more, and also because it seems like I was the first to run into this issue, I figured I should sit down and write about it so that future people can benefit from it.
If you are working with Vulkan, chances are that at some point you’ll run into a VK_ERROR_DEVICE_LOST error. It’s the worst kind of error: chances are that your GPU choked on some data about a frame or two ago, and the place where you received the error is nowhere near where the GPU actually decided to throw up its hands and give up. This is of course because GPUs and CPUs are inherently decoupled from each other: when you submit your work from the CPU to the GPU, the GPU starts crunching your numbers while the CPU goes on with its busy life doing other things. Now, maybe while crunching your numbers the GPU encountered a page fault, or did some other computation after which it just couldn’t proceed anymore. Or maybe it took too long to compute results and the kernel watchdog decided that enough was enough and restarted the device. Or something completely different! Either way, at some point this error will have propagated all the way back to your application in userspace, and you’ll have no idea why it happened. You are left only to guess about what went wrong and where. Well, Nvidia has your back with the extremely verbosely named VK_NV_device_diagnostic_checkpoints extension, which is a lot like Nvidia’s Aftermath for DirectX, except it works on Vulkan. And because for some reason nobody on the internet seems to sing its praises, I will do so now in the form of this blog post!
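To make the "record now, fail later" problem concrete, here is a minimal toy model of the checkpoint idea. It is not Vulkan code and not the extension's API: `FakeQueue`, `Command`, `record`, `execute` and `findFaultingPass` are hypothetical names invented for this sketch. In the real extension, the corresponding entry points are `vkCmdSetCheckpointNV` (to drop a marker into a command buffer) and `vkGetQueueCheckpointDataNV` (to read back the markers the GPU last reached once the device is lost).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a recorded GPU command. NOT the Vulkan API, just a model
// of "the CPU records work now, the GPU executes it later".
struct Command {
    std::string checkpoint;      // marker recorded alongside the command
    std::function<bool()> work;  // returns false to simulate a GPU fault
};

// Hypothetical queue that mimics deferred execution.
struct FakeQueue {
    std::vector<Command> pending;  // recorded by the "CPU", not yet executed
    std::string lastCheckpoint;    // last marker the "GPU" reached
    bool deviceLost = false;

    // CPU side: recording returns immediately; nothing runs here.
    void record(std::string checkpoint, std::function<bool()> work) {
        pending.push_back({std::move(checkpoint), std::move(work)});
    }

    // GPU side: runs later and stops at the first failing command.
    void execute() {
        for (const Command& cmd : pending) {
            lastCheckpoint = cmd.checkpoint;  // marker is hit before the work
            if (!cmd.work()) {
                deviceLost = true;  // the moral equivalent of a device loss
                break;
            }
        }
        pending.clear();
    }
};

// Records three passes, lets the second one fault, and reports the last
// checkpoint that was reached before the simulated device loss.
std::string findFaultingPass() {
    FakeQueue queue;
    queue.record("shadow pass", [] { return true; });
    queue.record("lighting pass", [] { return false; });  // simulated fault
    queue.record("post-process", [] { return true; });

    queue.execute();  // the failure only surfaces here, long after recording
    assert(queue.deviceLost);
    return queue.lastCheckpoint;
}
```

Calling `findFaultingPass()` returns `"lighting pass"`: without the marker, all the "CPU" would know is that `execute()` failed, but the last checkpoint narrows the fault down to a specific piece of recorded work.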