  Message 1 of 29  
08 Mar 17 17:56
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93
Which driver failure would cause all disk access to cease without system panic?

Hello all,

This one is a bit of a toughie to debug, so I thought I'd ask and see if anyone has any advice from past experience. I'm running into an issue on a beta build of Windows 10 (1048, hence my earlier email about public symbols) wherein some driver in the disk subsystem fails, causing all attempts to read unpaged data to hang. The system slowly grinds to a halt as whatever was paged to memory requires access to the disk and is unable to proceed. The symptoms vary each time: sometimes I can get far enough to get Task Manager running, sometimes I can't. No BSOD occurs here without full Driver Verifier configured.

What makes this hard to debug is that no crash dumps are written to the disk. I can't remote debug because I'm only able to reproduce this on a laptop, and only with full driver verification enabled (the only way I can get it to BSOD... sometimes). Regular boot mode causes a myriad of other BSODs in core drivers (network, touchpad, graphics), but in safe mode with only the necessary MS drivers loaded I can trigger a BSOD when this happens. No network debugging is available there, though.

The only filters I have loaded are EhStorClass and partmgr for the disk class, and I've tried both an MS-provided and an OEM-provided driver for my (NVMe) SCSIAdapter device, but I still get the disk access failure. I'm using the generic DiskDrive driver for the disk itself.

Any clues? I'm happy to provide whatever additional info I can.
  Message 2 of 29  
08 Mar 17 18:41
Alex Grig
xxxxxx@broadcom.com
Join Date: 14 Apr 2008
Posts To This List: 3200

The paging path is blocked by paging: something in it either called a paged code section or touched paged data. Run !stacks and see what the threads are blocked on.
  Message 3 of 29  
09 Mar 17 11:04
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93

Thanks for the reply, Alex. I'm not exactly sure what I'm looking for, to be honest. As luck would have it, I ended up running into this right in the middle of a !stacks call and it seemed that pretty much everything was stuck on nt!IoRemoveIoCompletion+0x8d (presumably unable to continue the normal stack progression to nt!KeRemoveQueueEx and beyond).
  Message 4 of 29  
09 Mar 17 11:12
Scott Noone
xxxxxx@osr.com
Join Date:
Posts To This List: 1304
List Moderator

The StorageKD extension can be useful in these cases as it will show you the state of the storage IRPs:

https://msdn.microsoft.com/en-US/library/windows/hardware/dn997250(v=vs.85).aspx

I usually dump the system log in these cases also. If there's a hardware failure sometimes you see disk retry errors:

!wmitrace.logdump EventLog-System

-scott
OSR
@OSRDrivers
  Message 5 of 29  
09 Mar 17 11:36
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93

Hey Scott, I appreciate your chiming in. I've been keeping my eye on the system log and there's been nothing there. I'll try the storage extensions next time this happens (if the system doesn't crash before then).
  Message 6 of 29  
09 Mar 17 12:21
Alex Grig
xxxxxx@broadcom.com
Join Date: 14 Apr 2008
Posts To This List: 3200

You may need to run !stacks 2. Without arguments, !stacks filters out "insignificant" wait reasons, which could include a wait for paging.
  Message 7 of 29  
10 Mar 17 09:40
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93

Alex, do you know whether, when debugging locally, the output of !stacks is a snapshot or real-time? On my machine, !stacks takes over 10 minutes to finish. Is the output of the later entries 10 minutes old, or is it consistent with the system state at the time it appears on screen?

I ran into the hang earlier today and discovered that the !storagekd.* commands hang in the debugger while I'm experiencing this issue: just *busy* and no response before the machine completely froze a few seconds later. I haven't been able to run !stacks 2 _during_ the hang (but I did manage to trigger the hang after calling !stacks and before the !stacks call finished, hence my question above).
  Message 8 of 29  
10 Mar 17 18:25
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93

OK, let me just say this: I hate this. Turns out I can't run !stacks either once this issue has kicked in. The debugger just goes *busy* and twiddles its thumbs while my PC begins to thrash in the throes of its upcoming and now inevitable death. Maybe related question: why would a driver verifier violation in BTHPORT.SYS fail to write the dump to disk (remaining stuck at 0%)? Other BSODs dump to disk OK and I don't see what BTHPORT.SYS would have to do with the storport subsystem.
  Message 9 of 29  
10 Mar 17 22:39
Alex Grig
xxxxxx@broadcom.com
Join Date: 14 Apr 2008
Posts To This List: 3200

By definition, when you break in with a remote kernel debugger, the OS state is frozen (except for KDNET activity). The debugger doesn't make any effort to take a snapshot. DO NOT USE LOCAL DEBUGGER FOR THIS.
  Message 10 of 29  
10 Mar 17 22:47
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93

Thanks, Alex. As mentioned above, remote debugging is a bit difficult due to the nature of the situation. I'm going to try via USB 3 as soon as my male-to-male cable comes in.

Progress, however: I got really lucky and was able to execute !storclass xxxx 2 just after this happened without it hanging. I found a host of 0x28 read and 0x2a write failures with SRB status 0x04, which is something. It's not the physical drive, because yanking the disk and sticking it in another PC does not exhibit this problem. I suppose in this case it is the NVMe driver that is translating the hardware failure to a code 0x04 SRB failure, though I'm not sure what the underlying hardware error was or what could be responsible for this condition.

It happens with both the MSFT and the Samsung NVMe drivers, so combined with the SRB failure it's making me suspect the hardware. I think a clean install of Windows on a new partition on the same drive may be in order, to see if it exhibits the same behavior?
  Message 11 of 29  
13 Mar 17 10:41
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93

So I was fully expecting this to be a hardware problem but a clean installation of Windows 10 15011 revealed no instabilities until upgraded to 15048 at which point they resumed. *sigh* My USB debug cable comes in tomorrow.
  Message 12 of 29  
13 Mar 17 10:49
Scott Noone
xxxxxx@osr.com
Join Date:
Posts To This List: 1304
List Moderator

Did the !storclass command show you the sense data as well? SRB status of 4 is just the generic "SRB_STATUS_ERROR". The SCSI status and sense data should have more detail. You can even translate them back into the NVMe failure using the SCSI to NVMe spec (see section 7):

http://www.nvmexpress.org/wp-content/uploads/NVM-Express-SCSI-Translation-Reference-1_1-Gold.pdf

Not to say that's necessarily going to provide much more interesting info, but the more details we can find the better.

-scott
OSR
@OSRDrivers
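For readers following along, here is a minimal sketch of how fixed-format SCSI sense data is usually picked apart into a sense key, ASC and ASCQ, which is the extra detail the reply above is asking about. The byte offsets follow the fixed-format layout in the SCSI Primary Commands standard; the function name, the partial key table and the sample buffer are hypothetical and do not come from this thread.

```python
from typing import NamedTuple, Optional

# Partial sense key table (SCSI Primary Commands); not exhaustive.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}


class Sense(NamedTuple):
    key: int
    key_name: str
    asc: int
    ascq: int


def parse_fixed_sense(buf: bytes) -> Optional[Sense]:
    """Return (key, name, ASC, ASCQ), or None if buf holds no fixed-format sense."""
    if len(buf) < 14:
        return None
    response_code = buf[0] & 0x7F      # bit 7 is the VALID flag
    if response_code not in (0x70, 0x71):
        return None                    # e.g. an all-zero buffer
    key = buf[2] & 0x0F                # low nibble of byte 2
    return Sense(key, SENSE_KEYS.get(key, "UNKNOWN"), buf[12], buf[13])


# Hypothetical sample buffer, made up for illustration only.
sample = bytes.fromhex("70 00 04 00 00 00 00 0A 00 00 00 00 44 00 00 00 00 00")
print(parse_fixed_sense(sample))
print(parse_fixed_sense(bytes(18)))    # all zeros: nothing to decode
```

An all-zero buffer has no valid response code, so the sketch reports it as "no sense data" rather than as a NO SENSE key.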
  Message 13 of 29  
13 Mar 17 13:48
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93

Hi Scott,

Unfortunately I don't think it was anything useful. Basically, the failed requests were all:

Opcode: 2a/28
SRB: 04
SCSI Status: 0
Sense Code: 00000
Sector: random
Timestamp: +/- 0.09 seconds apart
Retried

I know 0x04 is just a generic HBA/driver failure, but at least it meant that it wasn't a regular failing disk timeout error. Beyond that, it's not very useful...
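As a rough illustration of why 0x04 says so little on its own, the sketch below splits a raw SrbStatus byte into its status code and its two flag bits. The constants are a subset of the SRB_STATUS_* values as I recall them from the WDK's srb.h; treat the table as an assumption and check it against the header. The function name is made up for this example.

```python
# Flag bits carried in the top two bits of SrbStatus.
SRB_STATUS_QUEUE_FROZEN = 0x40
SRB_STATUS_AUTOSENSE_VALID = 0x80

# Subset of status codes (assumed from srb.h; not exhaustive).
SRB_STATUS_NAMES = {
    0x00: "SRB_STATUS_PENDING",
    0x01: "SRB_STATUS_SUCCESS",
    0x02: "SRB_STATUS_ABORTED",
    0x04: "SRB_STATUS_ERROR",
    0x05: "SRB_STATUS_BUSY",
    0x06: "SRB_STATUS_INVALID_REQUEST",
    0x08: "SRB_STATUS_NO_DEVICE",
    0x09: "SRB_STATUS_TIMEOUT",
    0x0A: "SRB_STATUS_SELECTION_TIMEOUT",
    0x0E: "SRB_STATUS_BUS_RESET",
}


def decode_srb_status(raw: int):
    """Return (name, queue_frozen, autosense_valid) for a raw SrbStatus byte."""
    code = raw & ~(SRB_STATUS_QUEUE_FROZEN | SRB_STATUS_AUTOSENSE_VALID) & 0xFF
    name = SRB_STATUS_NAMES.get(code, f"unknown (0x{code:02x})")
    return (name,
            bool(raw & SRB_STATUS_QUEUE_FROZEN),
            bool(raw & SRB_STATUS_AUTOSENSE_VALID))


print(decode_srb_status(0x04))  # the status reported for the failed reads/writes
print(decode_srb_status(0x00))  # the status of a request that never completed
```

With the autosense flag clear and zeroed sense data, a plain 0x04 leaves only the generic error code to go on, which matches the observation above.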
  Message 14 of 29  
14 Mar 17 15:18
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93

With my debug USB cable in hand, I was *finally* able to get some real data out of this. Sorry for wasting everyone's time with the pathetic attempts at local debugging earlier in this thread. _Unfortunately_, I don't have the symbols for this (no symbols for 15055 as of yet, I've contacted WinDbgFb and we'll see if we hear back. I may have to debug this on a clean install of a different build that triggers the issue yet has the debug symbols available.. *sigh*).

Anyway, this is what I get:

0: kd> !wmitrace.logdump EventLog-System
------------------------------------------------
|                                              |
|        NT symbols are not available          |
|           reduced functionality              |
|                                              |
------------------------------------------------
(WmiTrace) LogDump for Logger Id 0x0c
Found Buffers: 2 Messages: 2, sorting entries
[0]0000.0000:: 131339913007908456 [Microsoft-Windows-StorPort/Bus reset /OpCodeBusReset]Bus reset occured on storport adapter (Port Number: 1)
[0]0000.0000:: 131339913307864363 [Microsoft-Windows-StorPort/None /Info ]A request timed out for Storport Device (Port = 1, Path = 0, Target = 0, Lun = 0). Corresponding Class Disk Device Guid is {0d61954a-3e28-2ff4-3116-b0b7bdd7c44b}.
Total of 2 Messages from 2 Buffers

0: kd> !storclass
<symbols missing>

0: kd> !storadapter
STORPORT adapters:
==================
Driver             Object            Extension         State
-----------------------------------------------------------------
\Driver\secnvme    ffffc883f4bc2050  ffffc883f4bc21a0  Working
\Driver\storahci   ffffc883f4bb5050  ffffc883f4bb51a0  Working

0: kd> !storadapter ffffc883f4bc2050
ADAPTER
   DeviceObj : ffffc883f4bc2050   AdapterExt: ffffc883f4bc21a0   DriverObj : ffffc883f4ba58d0
   DeviceState : Working
   LowerDO ffffc883f4b59620   PhysicalDO ffffc883f4b59840
   SlowLock Free   RemLock -666
   SystemPowerState: Working   AdapterPowerState D0   Full Duplex
   Bus 4   Slot 0   DMA ffffc883f4bd0dd0   Interrupt 0000000000000000
   Allocated ResourceList ffffc883f4bc6960
   Translated ResourceList ffffc883f4ba78f0
   Gateway: Outstanding 0   Lower 1024   High 1024
   PortConfigInfo ffffc883f4bc22d0
   HwInit ffffc883f4ba83b0   HwDeviceExt ffffc883f2c6b010 (10024 bytes)
   SrbExt 4560 bytes   LUExt 0 bytes

Normal Logical Units:
Product          SCSI ID  Object            Extension         Pnd  Out  Ct  State
----------------------------------------------------------------------------------------
NVMe Samsung SS  0  0  0  ffffc883f4b9e060  ffffc883f4b9e1b0  35   0    0   Working

Zombie Logical Units:
Product          SCSI ID  Object            Extension         Pnd  Out  Ct  State
--------------------------------------------------------------------------------------

0: kd> !storunit ffffc883f4b9e060
   DO ffffc883f4b9e060   Ext ffffc883f4b9e1b0   Adapter ffffc883f4bc21a0   Working
   Vendor: NVMe   Product: Samsung SSD 950   SCSI ID: (0, 0, 0)
   Claimed Enumerated
   SlowLock Free   RemLock 38   PageCount 2
   QueueTagList: ffffc883f4b9e2b0   Outstanding: Head 0000000000000000  Tail 0000000000000000  Timeout 0 (Ticking Down)
   DeviceQueue ffffc883f4b9e340   Depth: 254   Status: Not Frozen   PauseCount: 1   BusyCount: 0
   IO Gateway: Busy Count 0   Pause Count 0
   Requests: Outstanding 0   Device 35   ByPass 0

[Device-Queued Requests]
IRP               SRB Type   SRB               XRB  Command          MDL               SGList  Timeout
-----------------------------------------------------------------------------------------------------------------------------------
ffffc88400fd6bb0  [STORAGE]  ffffc88400fd6db0  n/a  SCSI/WRITE (10)  ffffda80a7131ea0  n/a     65s
ffffc883ff39a010  [STORAGE]  ffffc88401285a10  n/a  SCSI/WRITE (10)  ffffda80a0b27750  n/a     65s
ffffc8840106a4f0  [STORAGE]  ffffc8840106a6f0  n/a  SCSI/READ (10)   ffffc883f46b83a0  n/a     65s
ffffc88400fdaa20  [STORAGE]  ffffc883fecde200  n/a  SCSI/WRITE (10)  ffffc883f492a200  n/a     65s
ffffc884013dfd40  [STORAGE]  ffffc884013dff40  n/a  SCSI/WRITE (10)  ffffc8840172b010  n/a     65s
ffffc883f317ae50  [STORAGE]  ffffc883ff400bb0  n/a  SCSI/WRITE (10)  ffffc88401716100  n/a     65s
ffffc883fbe95460  [STORAGE]  ffffc883fd84b7b0  n/a  SCSI/WRITE (10)  ffffc8840174e930  n/a     65s
ffffc884012637b0  [STORAGE]  ffffc883ff0b4970  n/a  SCSI/WRITE (10)  ffffda80a16be180  n/a     65s
ffffc883fec4e450  [STORAGE]  ffffc883ff725d10  n/a  SCSI/WRITE (10)  ffffc884013daa40  n/a     65s
ffffc883ff28e480  [STORAGE]  ffffc883fe157960  n/a  SCSI/READ (10)   ffffc883f4404550  n/a     65s
ffffc88400ecfd40  [STORAGE]  ffffc88400ecff40  n/a  SCSI/WRITE (10)  ffffda809fd21180  n/a     65s
ffffc883fd363010  [STORAGE]  ffffc883fe02c1c0  n/a  SCSI/WRITE (10)  ffffda80a0f9b180  n/a     65s
ffffc88400f0c100  [STORAGE]  ffffc88401344a00  n/a  SCSI/READ (10)   ffffc883f3189b10  n/a     65s
ffffc88401371ea0  [STORAGE]  ffffc88401343a00  n/a  SCSI/WRITE (10)  ffffc883ff6d4510  n/a     65s
ffffc88400f99790  [STORAGE]  ffffc883ffb6a7e0  n/a  SCSI/READ (10)   ffffc883ff1da8c0  n/a     65s
ffffc884013875e0  [STORAGE]  ffffc884012171e0  n/a  SCSI/WRITE (10)  ffffc883fe2df640  n/a     65s
ffffc883fe5ee730  [STORAGE]  ffffc883ff319c50  n/a  SCSI/READ (10)   ffffc883f4160d20  n/a     65s
ffffc883fb218e50  [STORAGE]  ffffc883f45abd40  n/a  SCSI/WRITE (10)  ffffc883f3d6fb00  n/a     65s
ffffc8840159b6f0  [STORAGE]  ffffc883f3cbdcd0  n/a  SCSI/WRITE (10)  ffffc883f42c6200  n/a     65s
ffffc883fed71a30  [STORAGE]  ffffc883ff3737a0  n/a  SCSI/WRITE (10)  ffffc883f3907620  n/a     65s
ffffc88401553780  [STORAGE]  ffffc883fed97d80  n/a  SCSI/READ (10)   ffffc883ff1355d0  n/a     65s
ffffc88401a11c00  [STORAGE]  ffffc883fd5d5320  n/a  SCSI/READ (10)   ffffc883fdbdcdb0  n/a     65s
ffffc883f424bea0  [STORAGE]  ffffc883ff5d23d0  n/a  SCSI/WRITE (10)  ffffc883f3c538a0  n/a     65s
ffffc883f447e870  [STORAGE]  ffffc883ff458c10  n/a  SCSI/READ (10)   ffffc883ff712110  n/a     65s
ffffc884019d4d80  [STORAGE]  ffffc88401791860  n/a  SCSI/READ (10)   ffffc883ff307d90  n/a     65s
ffffc883fb8d4b70  [STORAGE]  ffffc883f3b501c0  n/a  SCSI/WRITE (10)  ffffc883fd30e7e0  n/a     65s
ffffc883ff196b10  [STORAGE]  ffffc883f45ea010  n/a  SCSI/READ (10)   ffffc883ff00fde0  n/a     65s
ffffc883f436b2f0  [STORAGE]  ffffc883fdcd7b30  n/a  SCSI/READ (10)   ffffc883fefd4f50  n/a     65s
ffffc884019baa70  [STORAGE]  ffffc883f3ad4230  n/a  SCSI/WRITE (10)  ffffc88401233390  n/a     65s
ffffc88401737010  [STORAGE]  ffffc883fe38c1b0  n/a  SCSI/WRITE (10)  ffffda80a027f270  n/a     65s
ffffc883fb8ef770  [STORAGE]  ffffc883f4921430  n/a  SCSI/WRITE (10)  ffffda80a6ecd270  n/a     65s
ffffc88400f86460  [STORAGE]  ffffc883fdd20010  n/a  SCSI/READ (10)   ffffc883fb2257d0  n/a     65s
ffffc883f38cf630  [STORAGE]  ffffc883fde17310  n/a  SCSI/WRITE (10)  ffffc883fae5ab30  n/a     65s
ffffc883f42d1570  [STORAGE]  ffffc883fdee4140  n/a  SCSI/WRITE (10)  ffffc883f43cbf40  n/a     65s
ffffc883fdbc12b0  [STORAGE]  ffffc883fe244ae0  n/a  SCSI/WRITE (10)  ffffc883f3daa900  n/a     65s

[Bypass-Queued Requests]
IRP               SRB Type   SRB               XRB  Command          MDL               SGList  Timeout
-----------------------------------------------------------------------------------------------------------------------------------

[Outstanding Requests]
IRP               SRB Type   SRB               XRB               Command    MDL               SGList            Timeout
-----------------------------------------------------------------------------------------------------------------------------------
ffffc8840137bee0  [STORAGE]  ffffc88401b863b0  ffffc883f57ab010  RESET LUN  0000000000000000  0000000000000000  30s

[Completed Requests]
IRP               SRB Type   SRB               XRB  Command          MDL               SGList  Timeout
-----------------------------------------------------------------------------------------------------------------------------------

ERROR: 1 counted requests > 0 outstanding requests

0: kd> !storsrb ffffc88401b863b0
SRB is a STORAGE request block (SRB_EX)
SRB EX 0xffffc88401b863b0 Function 28 Version 1, Signature 53524258, SrbStatus: 0x00[Pending], SrbFunction 0x20 [RESET LUN]
Address Type is BTL8
No SrbExData

4: kd> !storsrb ffffc88401285a10
SRB is a STORAGE request block (SRB_EX)
SRB EX 0xffffc88401285a10 Function 28 Version 1, Signature 53524258, SrbStatus: 0x00[Pending], SrbFunction 0x00 [EXECUTE SCSI]
Address Type is BTL8
SRB_EX Data Type [SrbExDataTypeScsiCdb16]
[EXECUTE SCSI] SRB_EX: 0xffffc88401285aa0 OriginalRequest: 0xffffc883ff39a010 DataBuffer/Length: 0x0000000000000000 / 0x00001000
PTL: (0, 0, 0) CDB: 2A 00 05 E3 FF E0 00 00 08 00 00 00 00 00 00 00 OpCode: SCSI/WRITE (10)

4: kd> !storsrb ffffc8840106a6f0
SRB is a STORAGE request block (SRB_EX)
SRB EX 0xffffc8840106a6f0 Function 28 Version 1, Signature 53524258, SrbStatus: 0x00[Pending], SrbFunction 0x00 [EXECUTE SCSI]
Address Type is BTL8
SRB_EX Data Type [SrbExDataTypeScsiCdb16]
[EXECUTE SCSI] SRB_EX: 0xffffc8840106a780 OriginalRequest: 0xffffc8840106a4f0 DataBuffer/Length: 0xffffc88401a4cdc0 / 0x00000200
PTL: (0, 0, 0) CDB: 28 00 1E 5F 15 DF 00 00 01 00 00 00 00 00 00 00 OpCode: SCSI/READ (10)

4: kd> !storsrb ffffc883fe157960
SRB is a STORAGE request block (SRB_EX)
SRB EX 0xffffc883fe157960 Function 28 Version 1, Signature 53524258, SrbStatus: 0x00[Pending], SrbFunction 0x00 [EXECUTE SCSI]
Address Type is BTL8
SRB_EX Data Type [SrbExDataTypeScsiCdb16]
[EXECUTE SCSI] SRB_EX: 0xffffc883fe1579f0 OriginalRequest: 0xffffc883ff28e480 DataBuffer/Length: 0x0000000000000000 / 0x00008000
PTL: (0, 0, 0) CDB: 28 00 02 48 1F D8 00 00 40 00 00 00 00 00 00 00 OpCode: SCSI/READ (10)

4: kd> dt storport!_EXTENDED_REQUEST_BLOCK 0xffffc883fe1579f0
   +0x000 Signature : 0x40
   +0x008 Pool : 0x00000000`000a1200 _NPAGED_LOOKASIDE_LIST
   +0x010 OwnedMdl : 0y0
   +0x010 RemoveFromEventQueue : 0y0
   +0x010 State : 0y010
   +0x010 RemappedSenseInfo : 0y1
   +0x010 CompatSrbInUse : 0y0
   +0x010 SrbActivateComponent : 0y1
   +0x011 DoExtraAdapterDereference : 0y0
   +0x011 DoExtraUnitDereference : 0y1
   +0x011 AbortInProgress : 0y0
   +0x011 ByPassPausedGateway : 0y0
   +0x011 Reserved : 0y1110
   +0x012 InitiatingProcessor : _PROCESSOR_NUMBER
   +0x018 InitiatingToken : 0x0000d81f`48020028 _STARTIO_TOKEN
   +0x020 CompletedLink : _SLIST_ENTRY
   +0x030 PendingLink : _STOR_EVENT_QUEUE_ENTRY
   +0x068 Mdl : 0x00000000`00001000 _MDL
   +0x070 SgList : 0x00000000`00132053 _SCATTER_GATHER_LIST
   +0x078 RemappedSgListMdl : (null)
   +0x080 RemappedSgList : 0x0000377c`08fc1648 _SCATTER_GATHER_LIST
   +0x088 DataInMdl : 0x00000001`00000001 _MDL
   +0x090 DoubleBufferedMdl : 0x00000000`00000001 _MDL
   +0x098 DataInSgList : 0x00000000`0000003c _SCATTER_GATHER_LIST
   +0x0a0 Irp : 0x00000001`00000000 _IRP
   +0x0a8 Srb : (null)
   +0x0b0 SrbData : <unnamed-tag>
   +0x0d8 Adapter : 0xffffc883`f703ebe0 _RAID_ADAPTER_EXTENSION
   +0x0e0 Unit : 0x00000001`00000001 _RAID_UNIT_EXTENSION
   +0x0e8 ScatterGatherBuffer : [424] ""
   +0x290 CompletionRoutine : 0xffffc883`f4782c80 void +ffffc883f4782c80
   +0x298 u : <unnamed-tag>
   +0x2b0 RequestWaitDuration : 0xc
   +0x2b8 RequestStartTimeStamp : _LARGE_INTEGER 0x8000000
   +0x2c0 RequestAfterBuildIoTimeStamp : _LARGE_INTEGER 0xffffc883`f3e99540
   +0x2c8 RequestAfterStartIoTimeStamp : _LARGE_INTEGER 0xffffc883`f42a9080
   +0x2d0 RequestMiniportDuration : 0x1c
   +0x2d8 ActivityId : _GUID {00000019-0000-0000-0000-000000000000}
   +0x2e8 CompatSrbBufferSize : 0
   +0x2ec Component : 0
   +0x2f0 OriginalSrb : (null)
   +0x2f8 CompatSrbBuffer : (null)
   +0x300 ParentIrp : (null)
   +0x308 AbortStatus : 0n0
   +0x310 CryptoKeyInfo : (null)

4: kd> dt storport!_EXTENDED_REQUEST_BLOCK 0xffffc88401285aa0
   +0x000 Signature : 0x40
   +0x008 Pool : 0x00000000`000a1200 _NPAGED_LOOKASIDE_LIST
   +0x010 OwnedMdl : 0y0
   +0x010 RemoveFromEventQueue : 0y0
   +0x010 State : 0y010
   +0x010 RemappedSenseInfo : 0y1
   +0x010 CompatSrbInUse : 0y1
   +0x010 SrbActivateComponent : 0y0
   +0x011 DoExtraAdapterDereference : 0y1
   +0x011 DoExtraUnitDereference : 0y0
   +0x011 AbortInProgress : 0y0
   +0x011 ByPassPausedGateway : 0y1
   +0x011 Reserved : 0y1000
   +0x012 InitiatingProcessor : _PROCESSOR_NUMBER
   +0x018 InitiatingToken : 0x0000e0ff`e305002a _STARTIO_TOKEN
   +0x020 CompletedLink : _SLIST_ENTRY
   +0x030 PendingLink : _STOR_EVENT_QUEUE_ENTRY
   +0x068 Mdl : 0x0000003f`0000003f _MDL
   +0x070 SgList : 0x000001c1`00000000 _SCATTER_GATHER_LIST
   +0x078 RemappedSgListMdl : (null)
   +0x080 RemappedSgList : (null)
   +0x088 DataInMdl : 0xffffc884`012adc18 _MDL
   +0x090 DoubleBufferedMdl : 0xffffc884`01285b30 _MDL
   +0x098 DataInSgList : 0xffffc884`01285b30 _SCATTER_GATHER_LIST
   +0x0a0 Irp : (null)
   +0x0a8 Srb : (null)
   +0x0b0 SrbData : <unnamed-tag>
   +0x0d8 Adapter : (null)
   +0x0e0 Unit : 0x00000000`00000001 _RAID_UNIT_EXTENSION
   +0x0e8 ScatterGatherBuffer : [424] ""
   +0x290 CompletionRoutine : (null)
   +0x298 u : <unnamed-tag>
   +0x2b0 RequestWaitDuration : 0
   +0x2b8 RequestStartTimeStamp : _LARGE_INTEGER 0x0
   +0x2c0 RequestAfterBuildIoTimeStamp : _LARGE_INTEGER 0x0
   +0x2c8 RequestAfterStartIoTimeStamp : _LARGE_INTEGER 0x0
   +0x2d0 RequestMiniportDuration : 0
   +0x2d8 ActivityId : _GUID {00000000-0000-0000-0000-000000000000}
   +0x2e8 CompatSrbBufferSize : 0
   +0x2ec Component : 0
   +0x2f0 OriginalSrb : (null)
   +0x2f8 CompatSrbBuffer : (null)
   +0x300 ParentIrp : (null)
   +0x308 AbortStatus : 0n19422656
   +0x310 CryptoKeyInfo : 0x00000000`0badca11 _STOR_CRYPTO_KEY_INFO

4: kd> dx -r1 (*((storport!_MDL *)0x3f0000003f))
(*((storport!_MDL *)0x3f0000003f)) [Type: _MDL]
    [+0x000] Next : Unable to read memory at Address 0x3f0000003f
    [+0x008] Size : Unable to read memory at Address 0x3f00000047
    [+0x00a] MdlFlags : Unable to read memory at Address 0x3f00000049
    [+0x010] Process : Unable to read memory at Address 0x3f0000004f
    [+0x018] MappedSystemVa : Unable to read memory at Address 0x3f00000057
    [+0x020] StartVa : Unable to read memory at Address 0x3f0000005f
    [+0x028] ByteCount : Unable to read memory at Address 0x3f00000067
    [+0x02c] ByteOffset : Unable to read memory at Address 0x3f0000006b

Nothing suspicious about the LUN reset XRB (except for the fact that it never finishes?), whatever went wrong happened before this:

0: kd> dt storport!_EXTENDED_REQUEST_BLOCK 0xffffc883f57ab010
   +0x000 Signature : 0x1f2e3d4c
   +0x008 Pool : (null)
   +0x010 OwnedMdl : 0y0
   +0x010 RemoveFromEventQueue : 0y1
   +0x010 State : 0y011
   +0x010 RemappedSenseInfo : 0y0
   +0x010 CompatSrbInUse : 0y0
   +0x010 SrbActivateComponent : 0y0
   +0x011 DoExtraAdapterDereference : 0y0
   +0x011 DoExtraUnitDereference : 0y0
   +0x011 AbortInProgress : 0y0
   +0x011 ByPassPausedGateway : 0y0
   +0x011 Reserved : 0y0000
   +0x012 InitiatingProcessor : _PROCESSOR_NUMBER
   +0x018 InitiatingToken : (null)
   +0x020 CompletedLink : _SLIST_ENTRY
   +0x030 PendingLink : _STOR_EVENT_QUEUE_ENTRY
   +0x068 Mdl : (null)
   +0x070 SgList : (null)
   +0x078 RemappedSgListMdl : (null)
   +0x080 RemappedSgList : (null)
   +0x088 DataInMdl : (null)
   +0x090 DoubleBufferedMdl : (null)
   +0x098 DataInSgList : (null)
   +0x0a0 Irp : 0xffffc884`0137bee0 _IRP
   +0x0a8 Srb : 0xffffc884`01b863b0 _SCSI_REQUEST_BLOCK
   +0x0b0 SrbData : <unnamed-tag>
   +0x0d8 Adapter : 0xffffc883`f4bc21a0 _RAID_ADAPTER_EXTENSION
   +0x0e0 Unit : 0xffffc883`f4b9e1b0 _RAID_UNIT_EXTENSION
   +0x0e8 ScatterGatherBuffer : [424] ""
   +0x290 CompletionRoutine : 0xfffff80a`cd88ca80 void storport!RaidUnitCompleteResetRequest+0
   +0x298 u : <unnamed-tag>
   +0x2b0 RequestWaitDuration : 0
   +0x2b8 RequestStartTimeStamp : _LARGE_INTEGER 0x00000005`0c00caf5
   +0x2c0 RequestAfterBuildIoTimeStamp : _LARGE_INTEGER 0x0
   +0x2c8 RequestAfterStartIoTimeStamp : _LARGE_INTEGER 0x0
   +0x2d0 RequestMiniportDuration : 0
   +0x2d8 ActivityId : _GUID {00000000-0000-0000-0000-000000000000}
   +0x2e8 CompatSrbBufferSize : 0x90
   +0x2ec Component : 0
   +0x2f0 OriginalSrb : (null)
   +0x2f8 CompatSrbBuffer : 0xffffc883`f57ac600 Void
   +0x300 ParentIrp : (null)
   +0x308 AbortStatus : 0n0
   +0x310 CryptoKeyInfo : (null)

I can't read any of the IRPs due to the lack of symbols.

Is it normal for some of the SCSI/{READ,WRITE} requests to have DataBuffer/Length be 0x00 / 0xSomeValue? The pool for both reads and writes with DataBuffer zero was _NPAGED_LOOKASIDE_LIST. Or is/was that a null dereference?
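As a cross-check on the !storsrb output in this message, the sketch below decodes the three CDBs by hand. READ (10) and WRITE (10) carry a big-endian 32-bit LBA in bytes 2-5 and a 16-bit block count in bytes 7-8. The 512-byte block size is an assumption (it is not stated anywhere in the thread), though it makes every computed length agree with the DataBuffer/Length value reported for the same SRB. The helper name is made up for this example.

```python
import struct

OPCODES = {0x28: "READ (10)", 0x2A: "WRITE (10)"}


def decode_cdb10(cdb: bytes, block_size: int = 512) -> dict:
    """Decode the first 10 bytes of a READ (10) / WRITE (10) CDB."""
    # Big-endian: opcode, flags, 32-bit LBA, group, 16-bit block count, control.
    opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb[:10])
    if opcode not in OPCODES:
        raise ValueError(f"not a 10-byte read/write opcode: 0x{opcode:02x}")
    return {"command": OPCODES[opcode], "lba": lba,
            "blocks": blocks, "bytes": blocks * block_size}


# SRB address -> (CDB bytes, DataBuffer/Length), copied from the output above.
SAMPLES = {
    "ffffc88401285a10": ("2A 00 05 E3 FF E0 00 00 08 00 00 00 00 00 00 00", 0x1000),
    "ffffc8840106a6f0": ("28 00 1E 5F 15 DF 00 00 01 00 00 00 00 00 00 00", 0x0200),
    "ffffc883fe157960": ("28 00 02 48 1F D8 00 00 40 00 00 00 00 00 00 00", 0x8000),
}

for srb, (cdb_hex, reported_length) in SAMPLES.items():
    d = decode_cdb10(bytes.fromhex(cdb_hex))
    match = "matches" if d["bytes"] == reported_length else "differs from"
    print(f"{srb}: {d['command']} LBA 0x{d['lba']:08x}, {d['blocks']} blocks, "
          f"0x{d['bytes']:x} bytes ({match} reported length)")

# The InitiatingToken values printed by dt, read as little-endian bytes,
# reproduce the first eight CDB bytes of the corresponding requests.
token_read = (0x0000D81F48020028).to_bytes(8, "little")
token_write = (0x0000E0FFE305002A).to_bytes(8, "little")
print(token_read.hex(" "), "|", token_write.hex(" "))
```

That last coincidence may mean the "SRB_EX:" address handed to dt points at the SRB's extended CDB data rather than at an actual XRB, in which case the odd-looking fields in the first two dt dumps would simply be misinterpreted bytes. That is an inference from the numbers only; nothing in the thread confirms it.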
  Message 15 of 29  
14 Mar 17 15:39
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93

I should add: I wasn't able to run !storclass because of the missing symbols, and I know I can see a list of failed requests there. Should !storunit have shown me the same? Running the target and then breaking in minutes later shows the same pending requests (despite the timeout having been long since exceeded), and their SRB status is all PENDING (none are FAILED). Digging in reveals (as expected) the SENSE data to be all zeros (as they never failed, as far as the kernel/device/driver/something is concerned).
  Message 16 of 29  
15 Mar 17 13:26
Scott Noone
xxxxxx@osr.com
Join Date:
Posts To This List: 1304
List Moderator

If a request times out, StorPort will try to reset the unit and make it abort all in progress I/O requests. If the reset doesn't complete then that's bad business, usually this means that the hardware has ceased responding (especially if you're seeing this with two different drivers for the same device).

Random guess, but do you have the disk configured to power down when idle (!popolicy will tell you, but that requires symbols)? I'd try shutting that off and see if that helps at all.

-scott
OSR
@OSRDrivers
SrbActivateComponent : 0y1 +0x011 DoExtraAdapterDereference : 0y0 +0x011 DoExtraUnitDereference : 0y1 +0x011 AbortInProgress : 0y0 +0x011 ByPassPausedGateway : 0y0 +0x011 Reserved : 0y1110 +0x012 InitiatingProcessor : _PROCESSOR_NUMBER +0x018 InitiatingToken : 0x0000d81f`48020028 _STARTIO_TOKEN +0x020 CompletedLink : _SLIST_ENTRY +0x030 PendingLink : _STOR_EVENT_QUEUE_ENTRY +0x068 Mdl : 0x00000000`00001000 _MDL +0x070 SgList : 0x00000000`00132053 _SCATTER_GATHER_LIST +0x078 RemappedSgListMdl : (null) +0x080 RemappedSgList : 0x0000377c`08fc1648 _SCATTER_GATHER_LIST +0x088 DataInMdl : 0x00000001`00000001 _MDL +0x090 DoubleBufferedMdl : 0x00000000`00000001 _MDL +0x098 DataInSgList : 0x00000000`0000003c _SCATTER_GATHER_LIST +0x0a0 Irp : 0x00000001`00000000 _IRP +0x0a8 Srb : (null) +0x0b0 SrbData : <unnamed-tag> +0x0d8 Adapter : 0xffffc883`f703ebe0 _RAID_ADAPTER_EXTENSION +0x0e0 Unit : 0x00000001`00000001 _RAID_UNIT_EXTENSION +0x0e8 ScatterGatherBuffer : [424] "" +0x290 CompletionRoutine : 0xffffc883`f4782c80 void +ffffc883f4782c80 +0x298 u : <unnamed-tag> +0x2b0 RequestWaitDuration : 0xc +0x2b8 RequestStartTimeStamp : _LARGE_INTEGER 0x8000000 +0x2c0 RequestAfterBuildIoTimeStamp : _LARGE_INTEGER 0xffffc883`f3e99540 +0x2c8 RequestAfterStartIoTimeStamp : _LARGE_INTEGER 0xffffc883`f42a9080 +0x2d0 RequestMiniportDuration : 0x1c +0x2d8 ActivityId : _GUID {00000019-0000-0000-0000-000000000000} +0x2e8 CompatSrbBufferSize : 0 +0x2ec Component : 0 +0x2f0 OriginalSrb : (null) +0x2f8 CompatSrbBuffer : (null) +0x300 ParentIrp : (null) +0x308 AbortStatus : 0n0 +0x310 CryptoKeyInfo : (null) 4: kd> dt storport!_EXTENDED_REQUEST_BLOCK 0xffffc88401285aa0 +0x000 Signature : 0x40 +0x008 Pool : 0x00000000`000a1200 _NPAGED_LOOKASIDE_LIST +0x010 OwnedMdl : 0y0 +0x010 RemoveFromEventQueue : 0y0 +0x010 State : 0y010 +0x010 RemappedSenseInfo : 0y1 +0x010 CompatSrbInUse : 0y1 +0x010 SrbActivateComponent : 0y0 +0x011 DoExtraAdapterDereference : 0y1 +0x011 DoExtraUnitDereference : 0y0 +0x011 
AbortInProgress : 0y0 +0x011 ByPassPausedGateway : 0y1 +0x011 Reserved : 0y1000 +0x012 InitiatingProcessor : _PROCESSOR_NUMBER +0x018 InitiatingToken : 0x0000e0ff`e305002a _STARTIO_TOKEN +0x020 CompletedLink : _SLIST_ENTRY +0x030 PendingLink : _STOR_EVENT_QUEUE_ENTRY +0x068 Mdl : 0x0000003f`0000003f _MDL +0x070 SgList : 0x000001c1`00000000 _SCATTER_GATHER_LIST +0x078 RemappedSgListMdl : (null) +0x080 RemappedSgList : (null) +0x088 DataInMdl : 0xffffc884`012adc18 _MDL +0x090 DoubleBufferedMdl : 0xffffc884`01285b30 _MDL +0x098 DataInSgList : 0xffffc884`01285b30 _SCATTER_GATHER_LIST +0x0a0 Irp : (null) +0x0a8 Srb : (null) +0x0b0 SrbData : <unnamed-tag> +0x0d8 Adapter : (null) +0x0e0 Unit : 0x00000000`00000001 _RAID_UNIT_EXTENSION +0x0e8 ScatterGatherBuffer : [424] "" +0x290 CompletionRoutine : (null) +0x298 u : <unnamed-tag> +0x2b0 RequestWaitDuration : 0 +0x2b8 RequestStartTimeStamp : _LARGE_INTEGER 0x0 +0x2c0 RequestAfterBuildIoTimeStamp : _LARGE_INTEGER 0x0 +0x2c8 RequestAfterStartIoTimeStamp : _LARGE_INTEGER 0x0 +0x2d0 RequestMiniportDuration : 0 +0x2d8 ActivityId : _GUID {00000000-0000-0000-0000-000000000000} +0x2e8 CompatSrbBufferSize : 0 +0x2ec Component : 0 +0x2f0 OriginalSrb : (null) +0x2f8 CompatSrbBuffer : (null) +0x300 ParentIrp : (null) +0x308 AbortStatus : 0n19422656 +0x310 CryptoKeyInfo : 0x00000000`0badca11 _STOR_CRYPTO_KEY_INFO 4: kd> dx -r1 (*((storport!_MDL *)0x3f0000003f)) (*((storport!_MDL *)0x3f0000003f)) [Type: _MDL] [+0x000] Next : Unable to read memory at Address 0x3f0000003f [+0x008] Size : Unable to read memory at Address 0x3f00000047 [+0x00a] MdlFlags : Unable to read memory at Address 0x3f00000049 [+0x010] Process : Unable to read memory at Address 0x3f0000004f [+0x018] MappedSystemVa : Unable to read memory at Address 0x3f00000057 [+0x020] StartVa : Unable to read memory at Address 0x3f0000005f [+0x028] ByteCount : Unable to read memory at Address 0x3f00000067 [+0x02c] ByteOffset : Unable to read memory at Address 0x3f0000006b Nothing 
suspicious about the LUN reset XRB (except for the fact that it never finishes?), whatever went wrong happened before this: 0: kd> dt storport!_EXTENDED_REQUEST_BLOCK 0xffffc883f57ab010 +0x000 Signature : 0x1f2e3d4c +0x008 Pool : (null) +0x010 OwnedMdl : 0y0 +0x010 RemoveFromEventQueue : 0y1 +0x010 State : 0y011 +0x010 RemappedSenseInfo : 0y0 +0x010 CompatSrbInUse : 0y0 +0x010 SrbActivateComponent : 0y0 +0x011 DoExtraAdapterDereference : 0y0 +0x011 DoExtraUnitDereference : 0y0 +0x011 AbortInProgress : 0y0 +0x011 ByPassPausedGateway : 0y0 +0x011 Reserved : 0y0000 +0x012 InitiatingProcessor : _PROCESSOR_NUMBER +0x018 InitiatingToken : (null) +0x020 CompletedLink : _SLIST_ENTRY +0x030 PendingLink : _STOR_EVENT_QUEUE_ENTRY +0x068 Mdl : (null) +0x070 SgList : (null) +0x078 RemappedSgListMdl : (null) +0x080 RemappedSgList : (null) +0x088 DataInMdl : (null) +0x090 DoubleBufferedMdl : (null) +0x098 DataInSgList : (null) +0x0a0 Irp : 0xffffc884`0137bee0 _IRP +0x0a8 Srb : 0xffffc884`01b863b0 _SCSI_REQUEST_BLOCK +0x0b0 SrbData : <unnamed-tag> +0x0d8 Adapter : 0xffffc883`f4bc21a0 _RAID_ADAPTER_EXTENSION +0x0e0 Unit : 0xffffc883`f4b9e1b0 _RAID_UNIT_EXTENSION +0x0e8 ScatterGatherBuffer : [424] "" +0x290 CompletionRoutine : 0xfffff80a`cd88ca80 void storport!RaidUnitCompleteResetRequest+0 +0x298 u : <unnamed-tag> +0x2b0 RequestWaitDuration : 0 +0x2b8 RequestStartTimeStamp : _LARGE_INTEGER 0x00000005`0c00caf5 +0x2c0 RequestAfterBuildIoTimeStamp : _LARGE_INTEGER 0x0 +0x2c8 RequestAfterStartIoTimeStamp : _LARGE_INTEGER 0x0 +0x2d0 RequestMiniportDuration : 0 +0x2d8 ActivityId : _GUID {00000000-0000-0000-0000-000000000000} +0x2e8 CompatSrbBufferSize : 0x90 +0x2ec Component : 0 +0x2f0 OriginalSrb : (null) +0x2f8 CompatSrbBuffer : 0xffffc883`f57ac600 Void +0x300 ParentIrp : (null) +0x308 AbortStatus : 0n0 +0x310 CryptoKeyInfo : (null) I can't read any of the IRPs due to the lack of symbols. 
Is it normal for some of the SCSI/{READ,WRITE} requests to have DataBuffer/Length be 0x00 / 0xSomeValue? The pool for both reads and writes with DataBuffer zero was _NPAGED_LOOKASIDE_LIST. Or is/was that a null dereference?
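As a cross-check on the Length values in the !storsrb output, the READ (10) and WRITE (10) CDBs can be decoded by hand. The sketch below is not part of the original debugging session. It assumes the standard 10-byte CDB layout (big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8) and a 512-byte logical sector, which is a guess about this drive's format; the names `decode_cdb10`, `Cdb10` and `SECTOR_BYTES` are made up for illustration.

```python
from dataclasses import dataclass

# Opcodes for the two commands that appear in the queue dump.
OPCODES = {0x28: "READ (10)", 0x2A: "WRITE (10)"}

# Assumption: 512-byte logical sectors (not stated anywhere in the dump).
SECTOR_BYTES = 512


@dataclass
class Cdb10:
    opcode: str
    lba: int
    blocks: int

    @property
    def byte_length(self) -> int:
        # Transfer size implied by the CDB under the sector-size assumption.
        return self.blocks * SECTOR_BYTES


def decode_cdb10(hex_bytes: str) -> Cdb10:
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) < 10 or raw[0] not in OPCODES:
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(raw[2:6], "big")      # bytes 2..5: logical block address
    blocks = int.from_bytes(raw[7:9], "big")   # bytes 7..8: transfer length in blocks
    return Cdb10(OPCODES[raw[0]], lba, blocks)


if __name__ == "__main__":
    # CDB strings and Length values copied from the !storsrb output above.
    dumped = [
        ("2A 00 05 E3 FF E0 00 00 08 00 00 00 00 00 00 00", 0x00001000),
        ("28 00 1E 5F 15 DF 00 00 01 00 00 00 00 00 00 00", 0x00000200),
        ("28 00 02 48 1F D8 00 00 40 00 00 00 00 00 00 00", 0x00008000),
    ]
    for cdb_text, srb_length in dumped:
        cdb = decode_cdb10(cdb_text)
        print(cdb.opcode, hex(cdb.lba), cdb.blocks, cdb.byte_length == srb_length)
```

Under the 512-byte assumption, the transfer length in each of the three CDBs matches the Length reported for its SRB, including the two requests whose DataBuffer is zero. That only shows the lengths are self-consistent; it does not by itself say whether a zero DataBuffer is expected for these requests.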
  Message 17 of 29  
15 Mar 17 17:30
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93
Which driver failure would cause all disk access to cease without system panic?

Hi Scott, Thanks for checking in! You can make that 3 different drivers - I tried the Intel NVMe drivers and was this close to calling it a success when it happened again. I reverted to 15048 and have symbols, did a multi-hour WinDbg session yesterday and poked around looking for sense data, to no avail. Automatic shutdown of the disk was enabled; I thought it might be that, but I didn't explore it further since the failure was occurring both at idle and at peak load. Disabling it didn't help. This is a laptop, fwiw. I thought dynamic PCI-E power management may have been a culprit, this being NVMe and all, but alas that too was a no-go. It's definitely not the drive: I yanked it again and stuck it in another machine for 72 hours of normal usage and had no issues. Put it back and this all started again. If it were a SATA device, we could blame the SATA controller, but being NVMe it's really just a bus leading straight to the PCIe lanes. It makes me suspect something like active state power management, but the Dell BIOS doesn't expose any such settings. Mini-rant: I absolutely hate buying "power user hardware" for this reason. Entry-level motherboards/devices get so many more eyeballs and so much more QA that BIOS issues get sorted out ASAP, whereas some of the so-called "enterprise" gear (like this Precision laptop) is much less thoroughly vetted. I've already updated both the BIOS and the drive's firmware; neither helped.
  Message 18 of 29  
20 Mar 17 16:56
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93
Which driver failure would cause all disk access to cease without system panic?

Curiouser and curiouser... except it's getting to be more than I can bear. A replacement device came in today. Different processor and GPU, but same motherboard and everything else. Same behavior. I am so exasperated with this thing... Anyway, the first time it froze up on me with the new device was preceded by a period of intense UI lag (seconds for typed characters to appear on-screen in any application, missed keystrokes, etc.) which prompted me to run a WPA session. I saw 2-second delays in calls to storport.sys - but of course, no symbols, so that was useless. Back when I had symbols, the longest delays in storport were due to TRIM calls, but they were more on the order of 2 ms rather than 2 s. At this point, it's not a hardware issue and it's not a 3rd-party driver issue (a clean install exhibits the same behavior, and the Intel/Samsung/MSFT NVMe controller drivers all behave identically). Maybe I didn't test long enough, but I never ran into this on W10 15011; I have only experienced it since upgrading to 15048-15061. The same physical disk runs fine (running the same install of W10) in another machine for weeks on end. I've ordered another model of NVMe disk (but from the same manufacturer, perhaps a mistake on my end...) to see if that could be the issue. I'm not sure where to turn at this point. The constantly missing symbols are a real PITA, but I just keep upgrading from 15048 (which has published symbols) to the latest fast ring build each time a new build comes out, in hopes of experiencing something different. I guess the storport delays above count as that (running 15061).
  Message 19 of 29  
21 Mar 17 21:48
Alex Grig
xxxxxx@broadcom.com
Join Date: 14 Apr 2008
Posts To This List: 3200
Which driver failure would cause all disk access to cease without system panic?

FWIW, I'm observing some mysterious hangups (slowdowns and timeouts) in an Intel NUC box with Windows 7 booted from an Intel NVMe stick, with the Microsoft NVMe driver.
  Message 20 of 29  
22 Mar 17 11:49
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93
Which driver failure would cause all disk access to cease without system panic?

Alex, your reply is worth more than you could imagine. It's honestly great knowing (or at least, suspecting) I'm not alone in this. Is this a Kaby Lake NUC?
  Message 21 of 29  
22 Mar 17 12:06
Alex Grig
xxxxxx@broadcom.com
Join Date: 14 Apr 2008
Posts To This List: 3200
Which driver failure would cause all disk access to cease without system panic?

@Mahmoud Al-Qudsi NUC SKU: NUC6i3SYH NVMe SSD: Intel SSD 600p Series (128GB, M.2 2280 80mm NVMe PCIe 3.0 x4, 3D1, TLC)
  Message 22 of 29  
22 Mar 17 19:14
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93
Which driver failure would cause all disk access to cease without system panic?

That NUC would be a 6th-generation Skylake CPU, and I think it's running on Intel's HM170 chipset. Presuming that you and I are seeing the same issue, there's not a very clear overlap between our situations.

I've had no problems with these configurations (up to one month+ of uptime):
* Intel 750 SSD running on a fourth-generation (Haswell) Xeon 1650 v3, X99 chipset
* Samsung 950 Pro running on a sixth-generation (Skylake) Xeon 1545M v5, CM236 chipset

But I've been seeing this bug in the following configurations:
* Samsung 950 Pro running on a seventh-generation (Kaby Lake) Xeon 1505M v6, CM238 chipset
* Samsung 950 Pro running on a seventh-generation (Kaby Lake) Core i7 7700HQ, CM238 chipset

Is there some way to enable verbose logging for anything to do with storport?
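The comparison of working and failing configurations in this message can be written down as a small set computation: take what every failing machine has in common, then discard anything that also appears on a machine that works. This is only a sketch of that reasoning, using the configurations exactly as listed above; the variable and function names are invented for illustration, and four machines are far too few to prove anything.

```python
# Each configuration is reduced to (drive, CPU generation, chipset),
# taken from the lists in this message.
working = [
    {"Intel 750", "Haswell", "X99"},
    {"Samsung 950 Pro", "Skylake", "CM236"},
]
failing = [
    {"Samsung 950 Pro", "Kaby Lake", "CM238"},
    {"Samsung 950 Pro", "Kaby Lake", "CM238"},
]


def suspects(failing_cfgs, working_cfgs):
    """Attributes shared by every failing config and absent from every working one."""
    common_to_failing = set.intersection(*failing_cfgs)
    seen_working = set.union(*working_cfgs)
    return common_to_failing - seen_working


if __name__ == "__main__":
    print(sorted(suspects(failing, working)))
```

With the data as posted, the drive model drops out (the same Samsung 950 Pro works on the Skylake machine) and the CPU generation and chipset remain. A later message in the thread corrects the second failing machine's chipset to HM175; with that change the chipset no longer appears in both failing configurations and only the CPU generation is left.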
  Message 23 of 29  
23 Mar 17 14:31
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93
Which driver failure would cause all disk access to cease without system panic?

I used perfmon to set up a data collector for Microsoft-Windows-storport, but it seems to want to stop logging randomly, without rhyme or reason. All limits are disabled, and I'm logging to a USB stick (as obviously local won't do). I even tried to set it up to restart automatically every ten minutes but that just made things worse as after the first, manually initiated run, future runs would create the etl file but leave it blank at 1KB (even though all limits were disabled). I think there may be a bug in W10 where it's applying limits that aren't checked, so I'm trying again with the limits set to 24 hours and the job set to restart more frequently than that. We'll see if that works. Back to the matter at hand, I tried disabling PCI-E link power management in the power options (given that NVMe is just a glorified PCI-E protocol) and of course it seemed to be working for over 24 hours.. but then inevitably, probably less than 2 minutes after the storport data collector died (of course!), it happened again. Such is life. The odd thing is that with the PCI-E link power management turned off, the system was much more stable. With it enabled, stressing out disk access (such as restoring a Chrome session with 60 or so tabs all at once) triggers the bug far more often.. but it might just be a coincidence.
  Message 24 of 29  
23 Mar 17 23:11
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93
Which driver failure would cause all disk access to cease without system panic?

I managed to get the data logger working, but the stupid thing refuses to write directly to a file and insists on using buffered I/O. How do MS driver devs get anything done when their data collection is at the mercy of pure luck? Even with the buffer size set to 1 KB (the minimum), the flush timer set to 1 second (the minimum), and all storport/miniport logging enabled, I was unable to capture the last commands before the system froze. The ETL file had some captured calls towards the end that were taking on the order of 7,000,000 ns to execute (SCSI command 0x35, SCSIOP_SYNCHRONIZE_CACHE), but those were followed by read/write commands back in the tens to hundreds of ns, so I don't know what to think.
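To put the numbers from the ETL file on one scale, here is a minimal sketch that converts per-command durations from nanoseconds to milliseconds and flags the slow ones. The record format (`Sample`) and the 1 ms threshold are assumptions made for illustration, not the actual layout of the storport ETW events. Only the 7,000,000 ns SYNCHRONIZE CACHE figure comes from the trace described above; the read/write durations are placeholder values.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    opcode: int        # SCSI opcode, e.g. 0x35 = SYNCHRONIZE CACHE (10)
    duration_ns: int   # time the command took, in nanoseconds


def ns_to_ms(ns: int) -> float:
    return ns / 1_000_000


def slow_commands(samples, threshold_ms: float = 1.0):
    """Return the samples that took at least threshold_ms milliseconds."""
    return [s for s in samples if ns_to_ms(s.duration_ns) >= threshold_ms]


if __name__ == "__main__":
    samples = [
        Sample(0x35, 7_000_000),  # figure quoted in the post
        Sample(0x28, 300),        # placeholder read duration
        Sample(0x2A, 450),        # placeholder write duration
    ]
    for s in slow_commands(samples):
        print(f"opcode 0x{s.opcode:02X}: {ns_to_ms(s.duration_ns)} ms")
```

Seen this way, 7,000,000 ns is 7 ms: slow next to the reads and writes around it, but several orders of magnitude short of the 30 s and 65 s request timeouts shown in the earlier !storunit output.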
  Message 25 of 29  
25 Mar 17 19:31
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93
Which driver failure would cause all disk access to cease without system panic?

@Alex Grig Sorry, my second failing machine actually has an HM175 chipset, which is an updated HM170 (the chipset I guessed for your NUC).
  Message 26 of 29  
28 Mar 17 19:34
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93
Which driver failure would cause all disk access to cease without system panic?

So I cloned my disk over to a 256GB Toshiba THNSN5256GPUK (aka Toshiba XG4) on the 24th, and have been running without any hitches since then. I still don't think it's any sort of hardware failure in the Samsung 950 Pro drive - it's also been running since then on another machine without failure. It seems to be some sort of incompatibility between the Samsung NVMe drive and the chipset. The Samsung 960 Pro I ordered is coming in on Friday (D.V.); we'll see what happens when I clone everything over to that drive!
  Message 27 of 29  
28 Mar 17 21:50
Alex Grig
xxxxxx@broadcom.com
Join Date: 14 Apr 2008
Posts To This List: 3200
Which driver failure would cause all disk access to cease without system panic?

If your system supports S3, see if the issue appears more often after an S3-S0 cycle.
  Message 28 of 29  
30 Mar 17 14:40
Mahmoud Al-Qudsi
xxxxxx@neosmart.net
Join Date: 09 Feb 2009
Posts To This List: 93
Which driver failure would cause all disk access to cease without system panic?

No, the most common freeze case is after a full restart when my autorun entries load up immediately after login and I re-open a Chrome session that tries to load several dozen tabs at once.
  Message 29 of 29  
30 Mar 17 21:40
Alex Grig
xxxxxx@broadcom.com
Join Date: 14 Apr 2008
Posts To This List: 3200
Which driver failure would cause all disk access to cease without system panic?

Looks like multiple commands issued to it at the same time F it up.

All times are GMT -5.

