Masking off correctable PCIe errors

Is there a way to mask off a correctable PCIe error at run time? Our experiments show that once the bus driver has picked up the mask, we cannot change it at run time.

Does anyone know of a way to supply the AER correctable error mask for a particular device so that PCI.sys picks it up?

xxxxx@hotmail.com wrote:

> Does anyone know of a way to supply the AER correctable error mask for a particular device so that PCI.sys picks it up?

The correctable error mask is in the PCIe extended configuration space
for your device, and is interpreted at the hardware level, not by
pci.sys. As a driver, you can go out and change those bits.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
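A minimal sketch of what "go out and change those bits" involves, assuming the driver already has some way to read and write its function's 4 KB configuration space (in a WDM or KMDF driver that would typically be the GetBusData/SetBusData routines of BUS_INTERFACE_STANDARD; here a plain byte array stands in for it). The code walks the extended capability list starting at offset 0x100, finds the AER capability (ID 0x0001), and sets bit 12 (Replay Timer Timeout) in the Correctable Error Mask register at AER+0x14. The names cfg_read32, cfg_write32, find_aer_cap and mask_correctable are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PCIE_CFG_SPACE_SIZE     0x1000u     /* 4 KB PCIe configuration space    */
#define PCIE_EXT_CAP_START      0x100u      /* extended capabilities start here */
#define PCIE_EXT_CAP_ID_AER     0x0001u     /* Advanced Error Reporting         */
#define AER_COR_MASK_OFFSET     0x14u       /* Correctable Error Mask register  */
#define AER_COR_REPLAY_TIMEOUT  (1u << 12)  /* Replay Timer Timeout             */

/* Hypothetical accessors over a snapshot of config space. Config space is
   little-endian, so bytes are assembled explicitly to stay host-independent. */
static uint32_t cfg_read32(const uint8_t *cfg, uint32_t off)
{
    assert(off + 4 <= PCIE_CFG_SPACE_SIZE);
    return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

static void cfg_write32(uint8_t *cfg, uint32_t off, uint32_t val)
{
    assert(off + 4 <= PCIE_CFG_SPACE_SIZE);
    for (int i = 0; i < 4; i++)
        cfg[off + i] = (uint8_t)(val >> (8 * i));
}

/* Walk the extended capability list; return the AER offset, or 0 if absent. */
static uint32_t find_aer_cap(const uint8_t *cfg)
{
    uint32_t off = PCIE_EXT_CAP_START;
    for (int guard = 0; guard < 1024 && off >= PCIE_EXT_CAP_START; guard++) {
        uint32_t hdr = cfg_read32(cfg, off);
        if (hdr == 0 || hdr == 0xFFFFFFFFu)
            return 0;                       /* empty list or absent function */
        if ((hdr & 0xFFFFu) == PCIE_EXT_CAP_ID_AER)
            return off;
        off = (hdr >> 20) & 0xFFCu;         /* bits 31:20 = next capability */
    }
    return 0;
}

/* Read-modify-write the Correctable Error Mask, then read it back.
   Returns 0 on success, -1 if there is no AER capability,
   -2 if the requested bits did not stick. */
static int mask_correctable(uint8_t *cfg, uint32_t bits)
{
    uint32_t aer = find_aer_cap(cfg);
    if (aer == 0)
        return -1;
    uint32_t reg = aer + AER_COR_MASK_OFFSET;
    cfg_write32(cfg, reg, cfg_read32(cfg, reg) | bits);
    return (cfg_read32(cfg, reg) & bits) == bits ? 0 : -2;
}
```

The read-back only proves that the bit latched in this function's own register. If the error is reported by the other end of the link, the same routine would have to be run against the root port's configuration space instead.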

Ideally it should behave the way you describe. But even after I set the mask bit, the AER is not being masked out. That suggests the PCI bus driver saves something and never reads it back again. Hence my question: is there something we can put in the INF or the registry for the bus driver to read?

> But even after I set the mask bit, the AER is not being masked out.

You need to check whether your hardware actually sets the mask you want.
It is possible that, for some reason, your hardware refuses to set the mask.
Igor Sharovar

Good suggestion, but my hardware is working fine and is not rejecting the request: on Linux, the correctable AER does get masked off. On Windows, after setting the mask for that particular AER, I read back the AER config space and the bit is set there, yet I still see the correctable AER hitting the system, which it is not supposed to do.

“hitting system” meaning, exactly, what? It’s being logged? Or the system is crashing? Or…

Peter
OSR
@OSRDrivers

xxxxx@hotmail.com wrote:

> Good suggestion, but my hardware is working fine and is not rejecting the request: on Linux, the correctable AER does get masked off. On Windows, after setting the mask for that particular AER, I read back the AER config space and the bit is set there, yet I still see the correctable AER hitting the system, which it is not supposed to do.

Well, remember that this masking is entirely a hardware attribute, not a
software attribute. If a correctable error report reaches Windows,
then Windows is going to handle it. There’s no way to tell Windows to
ignore a PCIe error. It doesn’t check the mask bits.

So, if your hardware is still generating a correctable error report even
after the mask bit is set, then your hardware is broken. It’s also
possible, I suppose, that the correctable error is being generated by
the root complex on behalf of your device, and in that case your mask
bits don’t matter – it’s the root complex mask bits.

Exactly which correctable error is occurring?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

The AER we are hitting is “Replay Timer Timeout”. I would accept that the hardware is broken if Linux showed the same issue, but it doesn’t, so it looks to me like the root complex is the one causing this.
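One way to confirm which side is reporting is to read the Correctable Error Status register (AER+0x10, write-1-to-clear) on both the device and its root port and decode the set bits. The bit assignments below follow the PCIe Base Specification's Correctable Error Status register; decode_cor_status and cor_err_bits are hypothetical names, and obtaining the raw 32-bit values from each function's config space is left to the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Correctable Error Status register bits (AER capability offset 0x10). */
struct cor_err_bit {
    uint32_t mask;
    const char *name;
};

static const struct cor_err_bit cor_err_bits[] = {
    { 1u << 0,  "Receiver Error" },
    { 1u << 6,  "Bad TLP" },
    { 1u << 7,  "Bad DLLP" },
    { 1u << 8,  "REPLAY_NUM Rollover" },
    { 1u << 12, "Replay Timer Timeout" },
    { 1u << 13, "Advisory Non-Fatal Error" },
    { 1u << 14, "Corrected Internal Error" },
    { 1u << 15, "Header Log Overflow" },
};

/* Write a comma-separated list of the set bits' names into out (truncated
   if outlen is too small) and return how many known bits are set.
   Reserved bits are ignored. */
static int decode_cor_status(uint32_t status, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    int count = 0;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof cor_err_bits / sizeof cor_err_bits[0]; i++) {
        if ((status & cor_err_bits[i].mask) == 0)
            continue;
        size_t used = strlen(out);  /* snprintf always NUL-terminates */
        snprintf(out + used, outlen - used, "%s%s",
                 count > 0 ? ", " : "", cor_err_bits[i].name);
        count++;
    }
    return count;
}
```

For example, a status of 0x00001000 decodes to Replay Timer Timeout alone. Because the replay timer belongs to the transmitting side of a link, whichever end shows bit 12 set is the one whose mask would need to be set.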

Hi Peter,
“Hitting the system” means it is being logged as an event.