Unload routine is called, but the driver is not unloaded?

Hello gurus,

hopefully someone can point me in the right direction:

I have a legacy driver that is loaded/unloaded by a user-mode
application via SCM.

When loaded, the driver creates a Control Device Object to communicate
with the user mode app, as well as a couple of other device objects that
perform the real work. The work is not related to any actual device,
it’s just reading/writing to the disk (think virtual file-based drive,
stuff like that). When the work is done, the application sends a control
code to the CDO asking it to shut down, and the CDO deletes the child
device objects. Before they are deleted, the ReferenceCount values in
their device objects go to 0, indicating that there are no outstanding
references to them.

Finally, the user-mode application tells SCM to stop the driver, which
causes the Unload routine to be called. The routine destroys the CDO, the
DeviceObject member of DriverObject becomes 0, indicating that there are
no device objects remaining, and the routine returns.

At this point it looks like the driver is unloaded, except that it’s
not. Breaking into WinDbg and issuing “lm t n” shows that the driver’s
module is still loaded, and an attempt to start the driver via SCM this
time results in error 2, “file not found”. The only way to clear the
error is to reboot the computer. After that SCM can start the driver
again, but only once; subsequent attempts result in error 2 until
the computer is restarted.

As I mentioned, the state of the device objects and the DriverObject
before unloading shows no outstanding resources that might prevent the
driver from actually being unloaded.

The tracing in the memory allocation routines shows that all memory the
driver allocates gets freed properly, so there goes another possible
reason.

The IRP tracking shows no outstanding IRPs either.

Driver Verifier is enabled for this driver and reports no problems.

It seems like this is happening on Vista (with SP1) only; I was not able
to reproduce this error on XP (there the driver actually gets unloaded
from memory).

Does this ring a bell? What else can I check to see if there is
something that forces Vista to keep the driver in memory after the
Unload routine is called?

Any advice would be greatly appreciated! Thank you in advance.

Andrei Belogortseff

The explanation is trivial: neither unloading the driver nor deleting the device actually takes place as long as the outstanding refcount on the device object (and hence on its driver object) is non-zero. Your device object is just marked for deletion, but it will be removed from memory only after the refcount goes down to zero…

Anton Bassov
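
For readers following along, the "marked for deletion, freed only at refcount zero" behaviour described above can be modelled in a few lines of user-mode C. This is only a toy sketch: ToyObject and the toy_* functions are hypothetical names invented for illustration, not Windows APIs, and the real Object Manager logic is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy object; not a real Windows structure. */
typedef struct ToyObject {
    long refs;            /* outstanding references */
    bool delete_pending;  /* set when the owner asks for deletion */
    bool freed;           /* set when storage would really be released */
} ToyObject;

/* Release storage only if deletion was requested and nothing
   references the object any more. */
static void toy_try_free(ToyObject *o)
{
    if (o->delete_pending && o->refs == 0)
        o->freed = true;
}

static void toy_reference(ToyObject *o)
{
    o->refs++;
}

static void toy_dereference(ToyObject *o)
{
    assert(o->refs > 0);
    o->refs--;
    toy_try_free(o);
}

/* The owner's delete request only marks the object; the object is
   freed here only if no references remain. */
static void toy_delete(ToyObject *o)
{
    o->delete_pending = true;
    toy_try_free(o);
}
```

In this model, calling toy_delete while one reference is outstanding only sets delete_pending; freed becomes true when the last toy_dereference runs.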

xxxxx@hotmail.com wrote:
> The explanation is trivial - neither unloading driver or deleting the device actually takes place as long as outstanding refcount on device object (and hence, its driver object) is non-zero. Your device object is just marked for deletion, but it will be removed from memory only after refcount goes down to zero…

Right, but, as I mentioned, when I examine the values of
DeviceObject->ReferenceCount before deleting them, they are already 0s.
Am I looking at the wrong ReferenceCount?

Besides, it all gets deleted and unloaded properly on XP, only Vista for
some reason does not want to do that.

Thanks,

A.

!object
And
!object

Will give you the Ob reference counts (and handle counts)

d

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Andrei Belogortseff
Sent: Wednesday, October 29, 2008 5:08 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Unload routine is called, but the driver is not unloaded?

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

> when I examine the values of DeviceObject->ReferenceCount before deleting them,
> they are already 0s. Am I looking at a wrong ReferenceCount?

Yes. IIRC, this field refers to the number of file objects that are open on a device, i.e. it is related to the I/O Manager. However, the reference counts by pointer and by handle are managed by the Object Manager, and its header is located BEFORE the object body…

Anton Bassov
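
To make the "header is located BEFORE the object body" point concrete, here is a small user-mode C model. It is a sketch under loud assumptions: MODEL_OBJECT_HEADER and MODEL_DEVICE_OBJECT are simplified, hypothetical stand-ins and do not match the real Windows layouts or field offsets. The only thing it shows is that a count stored in the body and the counts stored in the header are different variables, and that the header is reached by stepping back from the body pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a device object body. */
typedef struct MODEL_DEVICE_OBJECT {
    long ReferenceCount;  /* I/O Manager's count, kept in the body */
    int  Payload;
} MODEL_DEVICE_OBJECT;

/* Hypothetical, simplified stand-in for the Object Manager header.
   The body is the last member, so the header precedes it in memory. */
typedef struct MODEL_OBJECT_HEADER {
    long PointerCount;    /* references by pointer */
    long HandleCount;     /* open handles */
    MODEL_DEVICE_OBJECT Body;
} MODEL_OBJECT_HEADER;

/* Step back from the body pointer to the enclosing header. */
static MODEL_OBJECT_HEADER *model_header_from_body(MODEL_DEVICE_OBJECT *body)
{
    return (MODEL_OBJECT_HEADER *)
        ((char *)body - offsetof(MODEL_OBJECT_HEADER, Body));
}

/* Model rule (an assumption of this sketch): the object can go away
   only when the header counts are zero, whatever the body says. */
static int model_is_removable(MODEL_DEVICE_OBJECT *body)
{
    MODEL_OBJECT_HEADER *h = model_header_from_body(body);
    return h->PointerCount == 0 && h->HandleCount == 0;
}
```

In this model a body with ReferenceCount equal to 0 is still not removable while PointerCount in the header is 1, which mirrors the situation discussed in this thread.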

Doron Holan wrote:
> !object

Thank you, I’ve tried that to see what’s going on with my objects and it
looks like there is an extra pointer to the device object remaining
before it gets deleted. Is there a way to see what’s keeping that extra
pointer? (I’d expect Driver Verifier to catch situations like this.)
Some magic WinDbg command maybe?

Thanks!

A.

Andrei Belogortseff wrote:
> Some magic WinDbg command maybe?

Answering my own question: looks like !obtrace should do that, going to
try it.

A.

!obtrace requires a reboot, but since you have a consistent repro, that should work. Are you calling IoGetAttachedDeviceReference or ObReferenceObject anywhere in your code? Basically, is there any possibility you are leaking the ref yourself?

Another way to track this is to set a break on write (ba w4 ) on the pointer count. To get the right address you will have to dump memory before the device object pointer and find the right dword. One easy way to validate that you found the right offset is to change the value to something strange (say 0x1234) and then run !object again to see if it picks up your new value. I typically do this when tracking the pointer value:

ba w4 "k;g"

which dumps the stack and then continues execution automatically.

d

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Andrei Belogortseff
Sent: Thursday, October 30, 2008 10:14 AM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Unload routine is called, but the driver is not unloaded?

On 10/30/08, Doron Holan wrote:
> !obtrace requires a reboot, but since you have a consistent repro,

undocumented ?

the docs state we can set a kernel flag

either with /ko in cmdline or
by !gflag +otl

lkd> !gflag
Current NtGlobalFlag contents: 0x00004000
otl - Maintain a list of objects for each type

lkd> !object 0 Driver
Scanning 108 objects of type 'Driver'
*** objects of the same type are only linked together if the 4000 flag
is set in NtGlobalFlags

lkd> !object \Windows\WindowStations\Winsta0
Object: 862d0fa0 Type: (865c2ca0) WindowStation
ObjectHeader: 862d0f88 (old version)
HandleCount: 23 PointerCount: 39
Directory Object: e17b6a48 Name: WinSta0
lkd> !obtrace e17b6a48
GetPointerFromAddress: unable to read from 000003e0
Unable to find object in table.
lkd> !obtrace 862d0fa0
GetPointerFromAddress: unable to read from 000001b8
Unable to find object in table.
lkd> !obtrace 862d0f88
GetPointerFromAddress: unable to read from 000001b0
Unable to find object in table.
lkd> !gflag
Current NtGlobalFlag contents: 0x00004000
otl - Maintain a list of objects for each type

I am not that familiar with !obtrace. It used to require a reboot, but perhaps it has been improved since then. I would think that whatever flag you are setting would only affect newly created objects, though.

d

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of raj_r
Sent: Thursday, October 30, 2008 11:03 AM
To: Windows System Software Devs Interest List
Subject: Re: Re:[ntdev] Unload routine is called, but the driver is not unloaded?

On 10/30/08, Doron Holan wrote:
> I am not that familiar with !obtrace. it used to require a reboot, but perhaps it has been improved since then. I would think that whatever flag you are setting would only affect newly created objects though
>
> d

Thanks Doron

i was just asking because this command never worked for me,
either with a reboot or without one

if i used the cmdline as stated in the docs
it used to spit out

C:\Program Files\Debugging Tools for Windows (x86)>gflags.exe /ro
GFLAGS: Object Reference Tracing is not enabled for this version of the OS

C:\Program Files\Debugging Tools for Windows (x86)>

the gui version never had the enabled checkbox ungrayed and tickable

os is xp-sp2

regards

raj_r

You should try Vista. I know a lot of obtrace work occurred between XP SP2 and Vista; perhaps gflags depends on Vista for this type of functionality.

d

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of raj_r
Sent: Thursday, October 30, 2008 11:47 AM
To: Windows System Software Devs Interest List
Subject: Re: Re:[ntdev] Unload routine is called, but the driver is not unloaded?

Doron Holan wrote:
> !obtrace requires a reboot,

No, one can set the “kernel” flag (rather than the “registry” one) and
it would work immediately, without a reboot.

Anyway, it took me a while to find out that in order to trace references
to the device objects I had to specify “Devi” as the tag for the
tracing, and after that I was able to obtain the complete trace for my
device object from the beginning to the end. The problem is, the trace
contains hundreds of entries, from several different parts of the kernel
(mount manager, CLFS, filter manager, etc.) all mixed up together and
after spending an hour trying to make sense of it I gave up.

Then I decided to try something: right before deleting the device
object, I added one more ObDereferenceObject call on it, to artificially
decrease the number of outstanding pointers. This caused the driver to
be fully unloaded from the memory. My thinking was, if something keeps a
pointer to the device object, then sooner or later it’s going to use it,
and that is going to crash the system, and hopefully that would give me
enough information to see what is keeping the extra pointer. I let it
run for some time, doing all sorts of things, but the crash did not
happen. Sigh…

> Are you calling IoGetAttachedDeviceReference or ObReferenceObject anywhere in your code?

Yes, but it all gets released properly. My confidence comes from the
fact that if it was me forgetting to dereference something, I would see
this error under XP, too. But I’ve just double-checked it by trying to
load/unload my driver under XP, and it got loaded/unloaded properly,
several times. The code of the driver has no Vista-specific parts. So my
best guess at this point is that it’s something Vista-specific that
forgets to dereference my object.

Thanks for your help!

A.

The problem with your solution of artificially decrementing the count before the delete is that the memory can be reused, and when the real deref occurs (if ever), it will corrupt memory somewhere else. What I would suggest is that you also set a break on write on the address that contains the pointer ref count after you artificially decrement it. That way you will see who is touching the memory even if they are not causing a crash.

d
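
The hazard described above can be illustrated with a small user-mode C model. This is a sketch, not kernel code: SimObject and the sim_* functions are hypothetical names, and the writes_after_free counter merely stands in for what, in a real kernel, would be a decrement landing in memory that may already belong to something else.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy object; not a real Windows structure. */
typedef struct SimObject {
    long pointer_count;      /* outstanding references by pointer */
    bool freed;              /* storage considered released */
    int  writes_after_free;  /* dereferences that arrived too late */
} SimObject;

static void sim_reference(SimObject *o)
{
    o->pointer_count++;
}

static void sim_dereference(SimObject *o)
{
    if (o->freed) {
        /* A real kernel has no such check: the decrement would be
           applied to whatever now occupies this memory. */
        o->writes_after_free++;
        return;
    }
    if (--o->pointer_count == 0)
        o->freed = true;
}
```

A possible sequence: start with pointer_count at 1 for the creator, let an unknown holder call sim_reference, have the creator call sim_dereference (the object survives), then add one artificial sim_dereference (the object is freed). When the unknown holder finally releases its reference, the model records it in writes_after_free instead of crashing, which is the kind of silent damage a break on write is meant to expose.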

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Andrei Belogortseff
Sent: Thursday, October 30, 2008 1:50 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Unload routine is called, but the driver is not unloaded?

Doron Holan wrote:
> The problem with your solution to artificially decrement the count before delete is that the memory can be reused and when the real deref (if ever) occurs, it will corrupt memory somewhere else. What I would suggest you do is also set a break on write on the address which contains the pointer ref count after you artificially decrement it. That way you will see who is touching the memory even if they are not causing the crash

Good idea, I’ll try that, thanks!

A.