Unusual STORPORT behavior

I was debugging a storage device under development that, under the right conditions, I suspect causes the PCIe bus driver to fail the PnP IRP_MN_START_DEVICE IRP. Even though handling a failed IRP_MN_START_DEVICE is not on the Microsoft PnP state diagram, it does happen sometimes. I don’t have detailed traces from the device, but I’m told the sequence is roughly this: the device gets successfully enumerated on the bus and then moments later resets inappropriately, so when the PCI driver tries to start it, it may be gone from the bus for a little while. I’ve worked on dynamic devices in the past, and if you add a device to a PnP bus but then remove it before the OS can start the driver, the result is generally a failed PnP start.

The unusual behavior was what STORPORT did in response. My storage miniport saw DriverEntry called and returned success, and the very next call it saw was StartIO with an SrbFunction of SRB_FUNCTION_PNP and a PnP action of remove. That alone wouldn’t seem so odd, except that the miniport’s FindAdapter was never called, and the device context passed into StartIO was allocated but all zeros. I’m not sure what action the STORPORT wrapper expects the miniport to take, as the miniport has no context to perform an action on.

Have other people seen this bypassing of STORPORT miniport instance initialization? I’m debating whether this conforms to the documented STORPORT behavior or is a STORPORT bug. There doesn’t seem to be a STORPORT state diagram, so one might argue the miniport functions can be called in any order, including functions that use DeviceExtension, even before it’s initialized. I think many developers will just assume there must be an ordering, and that a miniport won’t get DeviceExtension as a callback parameter until after the miniport has initialized it. It also seems odd that StartIO is called with no preceding BuildIO, although looking at the docs, SRB_FUNCTION_PNP is not listed for BuildIO and is listed for StartIO.

I was looking through the WHQL tests and didn’t offhand see any test for drivers handling a failed PnP IRP_MN_START_DEVICE, so I could believe this PnP failure case is not explicitly tested.

One of my coworkers suggested that my environment must be somehow corrupted and that this STORPORT behavior is not possible. The environment was freshly installed, and there are three pieces of evidence supporting this behavior: DeviceExtension was all zeros in StartIO, trace messages went directly from DriverEntry to StartIO, and setting breakpoints in WinDbg on the miniport functions caused a break on StartIO after continuing from inside DriverEntry. If I don’t stimulate the device issue, all three of these sources of evidence show normal miniport behavior, so a corrupted environment is not at the top of my list.

I’m considering writing a small lower filter driver that can fail the PnP IRP_MN_START_DEVICE IRP on demand so I can reproduce this issue at will, unless anybody knows of a tool that can already fail specific PnP IRPs. A filter like this could also confirm that what’s happening is a failed PnP start IRP, and print the failure status code in WinDbg. I suppose I could also try the checked PCI.sys driver with debugging messages enabled, and see if it says something about failing a start IRP.
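To make the filter idea concrete, here is a rough user-mode model of the decision such a lower filter's PnP dispatch routine would make. It is only a sketch of the logic, not a loadable driver: every name in it (FilterState, FilterDispatchPnp, MockBusDriver, the MOCK_ constants) is hypothetical, and the constant values are placeholders rather than the real wdm.h / ntstatus.h values. In a real WDM filter the "fail" branch would complete the IRP with an error status instead of passing it down, and the "observe" branch would need a completion routine to see the status the bus driver set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder codes; a real filter would use the wdm.h / ntstatus.h values. */
enum { MOCK_IRP_MN_START_DEVICE = 100, MOCK_IRP_MN_REMOVE_DEVICE = 101 };
enum { MOCK_STATUS_SUCCESS = 0, MOCK_STATUS_UNSUCCESSFUL = 1 };

/* Hypothetical per-filter state, toggled from a debugger or an IOCTL. */
typedef struct {
    bool     fail_start;         /* when true, fail the next start request */
    unsigned starts_failed;      /* how many starts this filter has failed */
    uint32_t last_start_status;  /* status the lower driver returned for a start */
} FilterState;

/* Stand-in for "pass the request to the next lower driver". */
typedef uint32_t (*LowerDispatch)(int minor);

int g_lower_calls = 0;

/* Mock bus driver: counts requests and always succeeds. */
uint32_t MockBusDriver(int minor) {
    (void)minor;
    g_lower_calls++;
    return MOCK_STATUS_SUCCESS;
}

uint32_t FilterDispatchPnp(FilterState *f, int minor, LowerDispatch lower) {
    assert(f != NULL && lower != NULL);

    if (minor == MOCK_IRP_MN_START_DEVICE) {
        if (f->fail_start) {
            /* Fail on demand: the lower driver never sees this request. */
            f->starts_failed++;
            return MOCK_STATUS_UNSUCCESSFUL;
        }
        /* Otherwise pass it down and record what came back, which is
           where a real filter would print the status to the debugger. */
        f->last_start_status = lower(minor);
        return f->last_start_status;
    }

    /* Everything else passes through untouched. */
    return lower(minor);
}
```

With fail_start clear, the filter is a passive observer that records the bus driver's start status. With it set, the driver above sees a failed start without the hardware having to misbehave.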

The STORPORT behavior was seen on Server 2012 R2 Update 1, with no other patches. I Googled for anything that seems related, and didn’t find anything.

Jan

FindAdapter is called on successful START_DEVICE.

SRB_PNP/RemoveDevice is called on REMOVE_DEVICE, and REMOVE_DEVICE is also sent after a failed START_DEVICE.

There definitely was an error handling path in Storport that could result in StartIo being called without FindAdapter being called.

https://www.osronline.com/showThread.CFM?link=217148

In my case the problem turned out to be a memory leak in Server 2008’s storport.sys (revolving around registry accesses). My PnP testing was triggering this bug, which made Storport consume nonpaged pool to the point where setting up the device inside Storport’s own find-adapter path (before the miniport was called) failed. Storport reacted by calling StartIo with a PnP remove (and an invalid device context, because it couldn’t allocate it).
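Given that StartIo can apparently arrive without FindAdapter ever having run, one defensive option is for the miniport to stamp its device extension in FindAdapter and check that stamp before touching anything in StartIo. The following is a minimal user-mode sketch of that idea, not real miniport code: the types, function names, and MOCK_ constants are all hypothetical stand-ins for the storport.h definitions, and the signature value is arbitrary. It assumes an uninitialized extension is either zero-filled or a null pointer. It cannot detect a dangling, non-null pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder codes; the real ones come from storport.h / srb.h. */
enum { MOCK_SRB_FUNCTION_PNP = 1, MOCK_SRB_FUNCTION_IO = 2 };
enum { MOCK_PNP_REMOVE = 1, MOCK_PNP_OTHER = 2 };
enum { MOCK_SRB_STATUS_SUCCESS = 1, MOCK_SRB_STATUS_ERROR = 2 };

#define EXT_SIGNATURE 0x4D494E49u  /* arbitrary nonzero marker */

typedef struct {
    uint32_t signature;     /* stays zero until FindAdapter has run */
    int      hw_teardowns;  /* counts real hardware teardown attempts */
} MockDeviceExtension;

typedef struct {
    int function;
    int pnp_action;
    int status;
} MockSrb;

/* FindAdapter is the only place that stamps the extension. */
void MockFindAdapter(MockDeviceExtension *ext) {
    assert(ext != NULL);
    ext->hw_teardowns = 0;
    ext->signature = EXT_SIGNATURE;
}

static bool ExtIsInitialized(const MockDeviceExtension *ext) {
    return ext != NULL && ext->signature == EXT_SIGNATURE;
}

bool MockStartIo(MockDeviceExtension *ext, MockSrb *srb) {
    bool is_remove = srb->function == MOCK_SRB_FUNCTION_PNP &&
                     srb->pnp_action == MOCK_PNP_REMOVE;

    if (!ExtIsInitialized(ext)) {
        /* Nothing was set up, so there is nothing to tear down:
           acknowledge a remove, reject everything else. */
        srb->status = is_remove ? MOCK_SRB_STATUS_SUCCESS
                                : MOCK_SRB_STATUS_ERROR;
        return true;
    }

    if (is_remove) {
        ext->hw_teardowns++;  /* a real miniport would release hardware here */
        ext->signature = 0;   /* later calls see an uninitialized extension */
    }
    srb->status = MOCK_SRB_STATUS_SUCCESS;
    return true;
}
```

Whether success is the right completion status for a remove that arrives this early is a judgment call. The point of the sketch is only that the miniport decides based on its own marker rather than assuming the port driver's call ordering.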