Crash after SRB_FUNCTION_PNP in Storport Driver

During driver installation I got the following crash. It happened after an SRB_FUNCTION_PNP request with SrbPnPFlags = 0x00000000.
This Storport driver is for a 16T device. If I reduce the size to 8T, the driver works fine.
The stack shows nothing related to our code. Is there any way to debug this further?

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 0000000000000028, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, bitfield :
bit 0 : value 0 = read operation, 1 = write operation
bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff8019ef708e2, address which referenced memory

Debugging Details:

DUMP_CLASS: 1

DUMP_QUALIFIER: 0

BUILD_VERSION_STRING: 9600.17041.amd64fre.winblue_gdr.140305-1710

DUMP_TYPE: 0

BUGCHECK_P1: 28

BUGCHECK_P2: 2

BUGCHECK_P3: 0

BUGCHECK_P4: fffff8019ef708e2

READ_ADDRESS: 0000000000000028

CURRENT_IRQL: 2

FAULTING_IP:
nt!MiInsertIoSpaceMap+16a
fffff801`9ef708e2 48395a28 cmp qword ptr [rdx+28h],rbx

CPU_COUNT: 38

CPU_MHZ: 7d0

CPU_VENDOR: GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 3f

CPU_STEPPING: 2

CPU_MICROCODE: 6,3f,2,0 (F,M,S,R) SIG: 38'00000000 (cache) 38'00000000 (init)

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

BUGCHECK_STR: AV

PROCESS_NAME: System

ANALYSIS_SESSION_HOST: VINCE-PC

ANALYSIS_SESSION_TIME: 03-02-2018 17:30:45.0777

ANALYSIS_VERSION: 10.0.15063.400 amd64fre

TRAP_FRAME: ffffd00157429500 -- (.trap 0xffffd00157429500)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8019ef708e2 rsp=ffffd00157429690 rbp=ffffd00157429708
r8=fffff8019f13ab70 r9=00000000000c7840 r10=00000000000c7841
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na po cy
nt!MiInsertIoSpaceMap+0x16a:
fffff8019ef708e2 48395a28 cmp qword ptr [rdx+28h],rbx ds:0000000000000028=???
Resetting default scope

LAST_CONTROL_TRANSFER: from fffff8019f060a46 to fffff8019efddb90

STACK_TEXT:
ffffd00157428c08 fffff8019f060a46 : 0000000000000000 0000000000000000 ffffd00157428d70 fffff8019eecd8cc : nt!DbgBreakPointWithStatus
ffffd00157428c10 fffff8019f060357 : 0000000000000003 ffffd00157428d70 fffff8019efe4f80 000000000000000a : nt!KiBugCheckDebugBreak+0x12
ffffd00157428c70 fffff8019efd70a4 : 0000000000000001 ffffd00157429898 0000000000000000 ffffffff80000300 : nt!KeBugCheck2+0x8ab
ffffd00157429380 fffff8019efe2ae9 : 000000000000000a 0000000000000028 0000000000000002 0000000000000000 : nt!KeBugCheckEx+0x104
ffffd001574293c0 fffff8019efe133a : 0000000000000000 0000000000000001 0000000000000000 ffffd00157429500 : nt!KiBugCheckDispatch+0x69
ffffd00157429500 fffff8019ef708e2 : 000000000000014b ffffd001580af000 0000000000000020 00000000000007ff : nt!KiPageFault+0x23a
ffffd00157429690 fffff8019ef703da : ffffd001580af000 00000000000c7840 0000000000000023 0000000000000000 : nt!MiInsertIoSpaceMap+0x16a
ffffd00157429750 fffff8019ef7014c : 0000000000000000 ffffc0018ca3b7b4 ffffc0018ca3b7a0 0000000000001000 : nt!MiMapIoSpace+0x286
ffffd00157429860 fffff80157f0d7b5 : fffff801000008d0 ffffe00000001003 ffffc0018ca3b818 ffffc0018ca3b7a0 : nt!MmMapIoSpace+0xc
ffffd00157429890 fffff80157f0a311 : ffffe000627c09d0 ffffe00c6dcd2730 00000000003dec10 ffffd00157429960 : pci!PciProcessStartResources+0x2765
ffffd00157429910 fffff80157ee5996 : ffffe000627e8928 fffff80157a249b0 0000000000000000 0000000000000000 : pci!PciDevice_Start+0x101
ffffd00157429a90 fffff80157a82bf0 : ffffe000627e8928 ffffe0006232e028 0000000000000000 0000000000000000 : pci!PciDispatchPnpPower+0x96
ffffd00157429ad0 fffff8019eed6adb : 0000000000000000 0000000000000000 fffff80157a82b00 ffffd00157429bd0 : ACPI!ACPIFilterIrpStartDeviceWorker+0xf0
ffffd00157429b50 fffff8019ef52794 : 0000000000000000 ffffe00063573880 ffffe00063573880 ffffe00061ba6900 : nt!ExpWorkerThread+0x293
ffffd00157429c00 fffff8019efdd5c6 : ffffd001541c7180 ffffe00063573880 ffffd001541d3fc0 0000000000000000 : nt!PspSystemThreadStartup+0x58
ffffd00157429c60 0000000000000000 : ffffd0015742a000 ffffd00157424000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: kb

THREAD_SHA1_HASH_MOD_FUNC: 46fe3e28d5234cc637b3ee8135aeb41d83e8881e

THREAD_SHA1_HASH_MOD_FUNC_OFFSET: 2976d692767c4598a9723aab8f2f7a8dde19c594

THREAD_SHA1_HASH_MOD: 6cfd8d23c7422c9d585af32c61b05905a0dd1e59

FOLLOWUP_IP:
pci!PciProcessStartResources+2765
fffff801`57f0d7b5 49894510 mov qword ptr [r13+10h],rax

FAULT_INSTR_CODE: 10458949

SYMBOL_STACK_INDEX: 9

SYMBOL_NAME: pci!PciProcessStartResources+2765

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: pci

IMAGE_NAME: pci.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 53089439

IMAGE_VERSION: 6.3.9600.17031

BUCKET_ID_FUNC_OFFSET: 2765

FAILURE_BUCKET_ID: AV_pci!PciProcessStartResources

BUCKET_ID: AV_pci!PciProcessStartResources

PRIMARY_PROBLEM_CLASS: AV_pci!PciProcessStartResources

TARGET_TIME: 2018-03-02T09:27:54.000Z

OSBUILD: 9600

OSSERVICEPACK: 0

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK: 272

PRODUCT_TYPE: 3

OSPLATFORM_TYPE: x64

OSNAME: Windows 8.1

OSEDITION: Windows 8.1 Server TerminalServer SingleUserTS

OS_LOCALE:

USER_LCID: 0

OSBUILD_TIMESTAMP: 2014-03-06 13:18:55

BUILDDATESTAMP_STR: 140305-1710

BUILDLAB_STR: winblue_gdr

BUILDOSVER_STR: 6.3.9600.17041.amd64fre.winblue_gdr.140305-1710

ANALYSIS_SESSION_ELAPSED_TIME: 647

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:av_pci!pciprocessstartresources

FAILURE_ID_HASH: {233d600d-cab5-c458-a35a-b1b07a268848}

Followup: MachineOwner

If I’m looking at the same version as you, here’s the assembly leading up to
your faulting instruction:

fffff801`ed710645 mov eax,dword ptr [nt!MmIoHeaderData+0x38 (fffff801`ed955b78)]
fffff801`ed71064b mov rdx,qword ptr [nt!MmIoHeader (fffff801`ed955af0)]
fffff801`ed710652 mov dword ptr [rbp-58h],eax

nt!MiInsertIoSpaceMap+0x159:
fffff801`ed710655 lea rax,[nt!MmIoHeader (fffff801`ed955af0)]

nt!MiInsertIoSpaceMap+0x160:
fffff801`ed71065c cmp rdx,rax
fffff801`ed71065f je nt!MiInsertIoSpaceMap+0x29f (fffff801`ed71079b)

nt!MiInsertIoSpaceMap+0x169:
fffff801`ed710665 cmp qword ptr [rdx+28h],rbx

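Which would mean that nt!MmIoHeader is NULL: rdx is loaded from it, and the
faulting compare then reads [rdx+28h], i.e. address 0x28, which matches Arg1
of the bugcheck.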

I’ve never debugged anything down this path before so I don’t know what that
is, but this looks like the Flink of a global list is corrupt and set to
NULL (and something around I/O space/device memory, which is suspicious).
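
A small model of that reasoning, as a sketch rather than anything taken from the kernel: a circular list with a sentinel head is walked until the link points back at the head, so a NULL Flink passes the "is the list empty?" compare and the first field read lands at NULL plus the field offset. The struct layout below is hypothetical (only the 0x28 offset comes from the disassembly), and `first_read`, `io_entry` and the addresses are names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout. Only the 0x28 offset of `key` is taken from
 * the disassembly (cmp qword ptr [rdx+28h],rbx); everything else is a guess. */
struct io_entry {
    uint64_t link[2];     /* stands in for a 64-bit LIST_ENTRY, 0x00..0x0f */
    uint64_t unknown[3];  /* 0x10..0x27, contents not known */
    uint64_t key;         /* 0x28, the field the faulting compare reads */
};

static_assert(offsetof(struct io_entry, key) == 0x28, "key must sit at 0x28");

/* Models the first step of the list walk using plain addresses, so nothing
 * is actually dereferenced.
 *   head_addr  : address of the list head (stand-in for &nt!MmIoHeader)
 *   head_flink : value stored in the head's Flink
 * Returns 0 if the walk is skipped because the list looks empty, otherwise
 * returns 1 and stores the address the first compare would read. */
static int first_read(uintptr_t head_addr, uintptr_t head_flink,
                      uintptr_t *read_addr)
{
    if (head_flink == head_addr) {
        /* cmp rdx,rax ; je ... : Flink points back at the head, empty list */
        return 0;
    }
    /* A NULL Flink is not equal to the head address, so the walk proceeds. */
    *read_addr = head_flink + offsetof(struct io_entry, key);
    return 1;
}
```

Calling `first_read(0x1000, 0, &addr)` with a zeroed Flink reports a read at 0x28, the same value as Arg1 in the bugcheck, while `first_read(0x1000, 0x1000, &addr)` takes the empty-list exit.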

I’d set a write access breakpoint on nt!MmIoHeader during boot and watch
what happens in the crashing case versus the working case. You can also
break if the value gets set to NULL:

ba w8 nt!MmIoHeader ".echo \"Modified...\" ; dq nt!MmIoHeader L1; .if (poi(nt!MmIoHeader) != 0) {gc}"

Not sure that’s going to be useful, but it’s something to try.

Also, what do you mean by “16T” versus “8T”? Is that the storage size? Does
that affect the size of the PCIe device memory presented to the host?

-scott
OSR
@OSRDrivers

I don’t seem to have the original message describing this, but I know one silly reason for crashes in StorPort SRB_FUNCTION_PNP. If I’m totally out of context, ignore this.

If the lower driver fails the PnP start IRP, StorPort still calls the miniport StartIO function with SRB_FUNCTION_PNP, except StorPort has never allocated/initialized the device context, so any accesses crash. The docs do not spell out the need to ignore any call to StartIO with a null context, without completing the SRB. The docs say every SRB needs to get completed, which in this case is wrong.
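
As a rough illustration of that guard, here is a minimal sketch in plain C. It uses mock stand-in types so it compiles outside the WDK: `mock_srb`, `mock_dev_ext`, `mock_start_io` and the enum values are all hypothetical and are not the real storport.h definitions. The return value on the null-context path is also an assumption, since the description above only says to leave the SRB alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins, NOT the real StorPort definitions. */
enum mock_srb_function { MOCK_SRB_FUNCTION_EXECUTE_SCSI, MOCK_SRB_FUNCTION_PNP };
enum mock_srb_status   { MOCK_SRB_STATUS_PENDING, MOCK_SRB_STATUS_SUCCESS };

struct mock_srb {
    enum mock_srb_function function;
    enum mock_srb_status   status;
    bool completed;  /* set where a real miniport would notify StorPort */
};

struct mock_dev_ext {
    unsigned pnp_requests_seen;
};

/* Shaped like a HwStartIo callback: a context pointer plus the request. */
static bool mock_start_io(struct mock_dev_ext *ext, struct mock_srb *srb)
{
    assert(srb != NULL);

    if (ext == NULL) {
        /* The case described above: PnP start failed, so no context was
         * ever set up. Touch nothing and do not complete the SRB.
         * Returning true here is an assumption of this sketch. */
        return true;
    }

    if (srb->function == MOCK_SRB_FUNCTION_PNP) {
        ext->pnp_requests_seen++;
    }

    /* Normal path: every SRB is completed. */
    srb->status = MOCK_SRB_STATUS_SUCCESS;
    srb->completed = true;
    return true;
}
```

The point of the early return is ordering: the null check has to come before any read or write through the context pointer, including logging or statistics, otherwise the guard does not help.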

The underlying cause is that the PnP start failed, for example because the device vanished from the bus for a bit after PCI enumeration detected it. The WHQL tests can sometimes trigger this kind of behavior. You can also force it to happen by writing a little filter driver that fails PnP start on demand by changing the IRP result code as the IRP goes back up the completion path.

Jan


NTDEV is sponsored by OSR

Visit the list online at: http:

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
Details at http:

To unsubscribe, visit the List Server section of OSR Online at http:

The OS version is 2008R2.

I’ve done more tests. The crash happens on about 50% of driver installs.
If the driver installs successfully, it works fine indefinitely. When it fails, it is always after an SRB_FUNCTION_PNP with the flags set to 0 and the action set to StorQueryCapabilities, before it starts to query the capacity.
I can now conclude that this only happens with our PCIe Gen3 device. The same driver works fine with the Gen2 device. Block device size is not related.

I also suspect that something is wrong with the PnP function in the device.


On the crash I was talking about, HwStartIO was called BEFORE HwStorFindAdapter. I saw it on some hardware during the WHQL tests for PCIe compliance. Since the system would crash due to a null memory access, it was not so apparent exactly where in the WHQL test it failed. Anything that causes a PnP start to fail could stimulate the behavior.

Jan
