AMD is well aware of the complaints about its "driver issues" and is actively working to address them, for example with a new AI tool. Perhaps surprisingly, it is Nvidia that has been dealing with more driver-related and other problems on its current-gen RTX 50 series cards.
So while Nvidia continues to deal with problems on Windows, one of its engineers helped resolve an AMD driver bug on Linux. A recent patch note reveals this:
Fix a performance regression on AMD iGPU and dGPU drivers, related to the unintended activation of DMA bounce buffers that regressed game performance if KASLR disturbed things just enough.
Interestingly, the issue turned out to be the consequence of an earlier "bad commit" by an Nvidia engineer. It was pinned down thanks to Bert Karwatzki, who noticed problems when playing Stellaris on Steam. He wrote:
Using linux next-20250307 to play the game stellaris via steam I noticed that loading the game gets sluggish with the progress bar getting stuck at 100%. In this situation mouse and keyboard inputs don't work properly anymore. Switching to a VT and killing stellaris somewhat fixes the situation though in one instance the touchpad did not work after that. I bisected this between v6.14-rc5 and next-20250307 and got this as the first bad commit
....
Reverting commit 7ffb791423c7 in next-20250307 fixes the issue for me. The OS is debian sid (last updated 20250309) and this is the hardware is an MSI Alpha 15 Laptop
Further investigation showed that the bug surfaced when the kernel address space layout randomization (KASLR) feature was disabled (nokaslr) or when the commit above was applied. In either case the kernel ended up treating the GPU as limited in its DMA (direct memory access) addressing. For those wondering, KASLR is a security feature that loads the kernel at a random location in memory, which makes kernel addresses harder for an attacker to predict.
The patch notes say:
As Bert Karwatzki reported, the following recent commit causes a performance regression on AMD iGPU and dGPU systems:
7ffb791423c7 ("x86/kaslr: Reduce KASLR entropy on most x86 systems")
It exposed a bug with nokaslr and zone device interaction. The root cause of the bug is that, the GPU driver registers a zone device private memory region. When KASLR is disabled or the above commit is applied, the direct_map_physmem_end is set to much higher than 10 TiB typically to the 64TiB address.
When zone device private memory is added to the system via add_pages(), it bumps up the max_pfn to the same value. This causes dma_addressing_limited() to return true, since the device cannot address memory all the way up to max_pfn.
This caused a regression for games played on the iGPU, as it resulted in the DMA32 zone being used for GPU allocations.
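To make the mechanism in the patch notes more concrete, here is a minimal Python sketch of the check. It is not the kernel's actual code; it only loosely models how a required DMA mask is derived from max_pfn and compared with the device's own mask. The 44-bit device mask, the 32 GiB of RAM, and the function names are assumptions chosen purely for illustration.

```python
PAGE_SHIFT = 12  # 4 KiB pages

GIB = 1 << 30
TIB = 1 << 40


def required_mask(max_pfn: int) -> int:
    """Smallest all-ones mask that covers the highest physical address."""
    highest_phys = (max_pfn - 1) << PAGE_SHIFT
    return (1 << highest_phys.bit_length()) - 1


def addressing_limited(device_dma_mask: int, max_pfn: int) -> bool:
    """True when the device cannot reach memory all the way up to max_pfn."""
    return device_dma_mask < required_mask(max_pfn)


# Hypothetical device that can address 44 bits (16 TiB) of memory.
DEVICE_DMA_MASK = (1 << 44) - 1

# Assumed system with 32 GiB of RAM: max_pfn reflects real memory only.
max_pfn_ram_only = (32 * GIB) >> PAGE_SHIFT

# After zone device private memory is registered near the 64 TiB mark,
# max_pfn is bumped up to that address.
max_pfn_with_zone_device = (64 * TIB) >> PAGE_SHIFT

limited_before = addressing_limited(DEVICE_DMA_MASK, max_pfn_ram_only)
limited_after = addressing_limited(DEVICE_DMA_MASK, max_pfn_with_zone_device)

print("limited before:", limited_before)
print("limited after: ", limited_after)
```

With these assumed numbers, the device only counts as limited once max_pfn jumps to the 64 TiB mark, even though the amount of real RAM has not changed. That is the situation in which, according to the patch notes, the DMA32 zone ends up being used for GPU allocations.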
You can read the patch notes in full at the source LKML (Linux Kernel Mailing List) links below.
Source: LKML (link1, link2, link3)