Linux 6.1 Will Make It A Bit Easier To Help Spot Faulty CPUs

Although it mostly benefits server administrators with large fleets of hardware, Linux 6.1 aims to make problematic CPUs/cores easier to spot by reporting the likely socket and core when a segmentation fault occurs. That can reveal a trend if the same CPU/core keeps turning up in the reports.

Queued now in TIP's x86/cpu branch for the Linux 6.1 merge window in October is a patch that prints the likely CPU at segmentation fault time. Logging the likely CPU, core, and socket makes it possible to notice when seg faults keep occurring on the same CPU package or on one particular core.

Rik van Riel, who authored the change, summed it up as:

In a large enough fleet of computers, it is common to have a few bad CPUs. Those can often be identified by seeing that some commonly run kernel code, which runs fine everywhere else, keeps crashing on the same CPU core on one particular bad system.

However, the failure modes in CPUs that have gone bad over the years are often oddly specific, and the only bad behavior seen might be segfaults in programs like bash, python, or various system daemons that run fine everywhere else.

Add a printk() to show_signal_msg() to print the CPU, core, and socket at segfault time.

This is not perfect, since the task might get rescheduled on another CPU between when the fault hit, and when the message is printed, but in practice this has been good enough to help people identify several bad CPU cores.

This little helper for spotting potentially faulty processors will be available starting with Linux 6.1 later this year.


It's a small but useful complement to the likes of the new Intel In-Field Scan, MCEs, EDAC reporting, etc.
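
To illustrate the kind of trend-spotting described above, here is a minimal Python sketch that tallies segfault reports by socket, core, and CPU. It assumes the kernel appends a suffix of the form "likely on CPU N (core N, socket N)" to the segfault line; that exact wording is an assumption and should be checked against real log output on a 6.1+ kernel. The function names, the threshold, and the sample lines are all hypothetical.

```python
import re
from collections import Counter

# Assumed message suffix; verify against actual kernel log output.
SEGFAULT_CPU = re.compile(
    r"segfault at .*likely on CPU (?P<cpu>\d+) "
    r"\(core (?P<core>\d+), socket (?P<socket>\d+)\)"
)


def tally_segfaults(lines):
    """Count segfault reports per (socket, core, cpu) tuple."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_CPU.search(line)
        if match:
            key = (int(match["socket"]), int(match["core"]), int(match["cpu"]))
            counts[key] += 1
    return counts


def suspects(counts, threshold=3):
    """Return (key, count) pairs seen at least `threshold` times, most frequent first."""
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


# Hypothetical log lines, for illustration only.
SAMPLE_LINES = [
    "bash[101]: segfault at 0 ip 0 sp 0 error 4 in bash[0+1000] likely on CPU 5 (core 5, socket 0)",
    "python[102]: segfault at 8 ip 0 sp 0 error 4 in python[0+1000] likely on CPU 5 (core 5, socket 0)",
    "sshd[103]: segfault at 0 ip 0 sp 0 error 6 in sshd[0+1000] likely on CPU 5 (core 5, socket 0)",
    "cron[104]: segfault at 0 ip 0 sp 0 error 4 in cron[0+1000] likely on CPU 12 (core 4, socket 1)",
    "usb 1-1: new high-speed USB device number 2 using xhci_hcd",
]
```

`tally_segfaults` accepts any iterable of log lines, so it can be fed saved kernel log output from many machines. Since the task may have been rescheduled before the message was printed, a single report proves little; only a repeated pattern on the same core is worth a closer look.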

Source

Karlston