OSR Online Lists > ntdev
  Message 1 of 10  
21 Jul 12 23:32
ravi gupta
xxxxxx@gmail.com
Join Date: 29 Nov 2010
Posts To This List: 27
getting bug check 0x109 on server 2012.

Hi Experts,

I need some suggestions on how to debug a server. I'm asking because it is not a client machine that I can connect to over 1394, etc.

I am debugging a critical structure corruption bug (bugcheck code 0x109). During a long run of my application, which has a driver component, I get a BSOD. I have a kernel memory dump, but it does not contain a relevant stack trace. It points to the nt module, but I am sure that my driver is causing this issue. From the bugcheck code it seems some structure is getting corrupted, and the fourth argument of the bugcheck indicates that the structure type could be the IDT. I tried looking through my driver code but did not find anything relevant.

I need help with:
- How should we debug servers? Are there any tools?
- What are the general causes of these kinds of bugchecks?
- Why does the kernel dump not show a stack trace?
- Will Driver Verifier help detect structure corruption here? I know it helps with buffer overruns.

Thanks,
-ravi
  Message 2 of 10  
22 Jul 12 17:07
Joseph M. Newcomer
xxxxxx@flounder.com
Join Date: 20 Nov 2008
Posts To This List: 1886
Re: getting bug check 0x109 on server 2012.

I can't answer the specific questions, but whenever you get a corruption message, the most probable cause is storing through an incorrect pointer. When declaring a local variable of any pointer type, you should always initialize it to NULL; otherwise you may have a perfectly addressable pointer to an invalid address, such as free storage in the heap or a perfectly valid allocated object that had been used earlier.

The bad news is that when you corrupt some innocent bystander, you can corrupt something that all users of that object know to be perfectly valid (an index or a pointer, just to name a couple), so they use it blindly without checking it. This can in turn damage another structure... Anyway, the worst one I dealt with was (thankfully) in application space and was absolutely reproducible, and it was SEVENTH-level damage before it was detected (via an access fault). Using Driver Verifier and enabling special pool *might* reveal whether the problem is in your driver, but it is not a sure thing.

Some people may try to convince you that writing WHATEVER * w = NULL; is "inefficient". You may dismiss these people as totally clueless and ignore them, since they are wrong. If all code paths lead to w being set before it is used, the compiler will eliminate the unnecessary initialization: net cost, 0. If there exists any path by which w will not be set before it is used, the initialization will remain, and the access fault happens in your driver, making it easy to isolate. If an unnecessary initialization is left in, it costs you, on average, 1/2 of a CPU clock to execute it on a two-unit pipelined/cached superscalar architecture; on a 2.8 GHz Pentium 4, that means each of these initializations costs you roughly 180 PICOseconds. Big whoop! (Note that the Core series of processors can dispatch more than two instructions per clock cycle.)

If you are allocating objects from the heap, it is considered Best Practice to initialize them to 0. This catches the same kind of failure when uninitialized pointer fields in heap objects are used. The cost is again 1/2 a CPU clock cycle per initialization, on average (remember that, since the object is newly allocated, parts of it are already in the L1 cache).

Finally, it is ideal if you have at most one pointer to a heap object, and you set that pointer to NULL after any operation that frees it. Another good policy is to set IRP pointers to NULL immediately after enqueueing them, completing them, or passing them down to a lower-level driver (there are numerous exceptions to this last case, such as when an IRP is sent synchronously).

When confronted with memory damage problems, I do not first turn to Driver Verifier, because it is already being used and would have caught the problem already; instead, I read the code looking for the above problems and fix them first. It is surprising how frequently this works.

Note also that problems like these in the driver world are often hard to reproduce because they are the result of particular concurrency relationships in the code. Your driver almost always damages something completely harmless, but if some other component of the system thought IT owned the storage at that time, you're screwed. The same is true of other drivers: one of your structures may get damaged by some other driver, leading you to use what you think is a perfectly valid pointer to store something, which corrupts yet another structure, and a damage cascade ensues.

Also keep in mind that these detections can happen tens of billions of instructions after the actual damage is done. In human terms, think of it as someone removing a manhole cover, and later someone else walking down the street and falling in. The issue is not that someone fell in, but who removed the cover? Given the potential time lags, think of it as the cover having been removed in the 13th century, with nobody falling into it (or even noticing it was open) until this morning.

The report that it is the IDT may be a gratuitous outcome of a damage cascade. Then again, some piece of malware on that machine might be deliberately messing with the IDT, Server 2012 may be the first version that is able to detect this, and your driver may not even be involved in the problem.

joe
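A minimal user-mode C sketch of the pointer-hygiene rules above: initialize pointer locals to NULL, zero newly allocated objects, and clear the single live pointer when the object is freed. The type WHATEVER, the macro FREE_AND_NULL and the helper functions are hypothetical illustrations, not WDK APIs; calloc stands in for what a driver would do with a pool allocation followed by RtlZeroMemory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object type standing in for Joe's WHATEVER. */
typedef struct whatever {
    struct whatever *next;   /* pointer field: reads as NULL once zeroed */
    const char *name;        /* not owned; points at the caller's string */
    int refs;
} WHATEVER;

/* Frees the object and clears the caller's pointer in one step, so the
   single "live" pointer cannot be reused after the free. */
#define FREE_AND_NULL(pp) do { free(*(pp)); *(pp) = NULL; } while (0)

/* Allocates a zero-filled object, so pointer fields nobody set read as
   NULL instead of whatever stale contents the heap block held before. */
static WHATEVER *whatever_new(void)
{
    return calloc(1, sizeof(WHATEVER));
}

/* The local starts as NULL: if some path forgets to assign it, the
   caller gets NULL (an obvious fault if dereferenced) rather than
   leftover stack garbage that may point at a valid object. */
static WHATEVER *find_named(WHATEVER *head, const char *name)
{
    WHATEVER *w = NULL;
    assert(name != NULL);
    for (WHATEVER *p = head; p != NULL; p = p->next) {
        if (p->name != NULL && strcmp(p->name, name) == 0) {
            w = p;
            break;
        }
    }
    return w;
}
```

Zeroing relies on an all-zero bit pattern being a NULL pointer. That holds on Windows and other mainstream platforms, although the C standard itself does not promise it.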
  Message 3 of 10  
22 Jul 12 20:21
Doron Holan
xxxxxx@microsoft.com
Join Date: 08 Sep 2005
Posts To This List: 8270
RE: getting bug check 0x109 on server 2012.

It's not only pointer mismanagement but also potentially state mismanagement: for instance, reinitializing an event that is being waited on in another thread, or letting an on-stack event go out of scope while it is still in use.

d
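The on-stack case is a lifetime problem: something that runs later still holds the address of a stack frame that has already returned. The user-mode C sketch below models asynchronous completion with a hypothetical deferred-callback queue (standing in for a DPC or completion routine) and shows one fix: heap-allocate the shared context and hand ownership to the completion, which frees it. Every name here is an illustrative assumption, not a WDK API, and the event-reinitialization case is not modeled. In a driver, the other common fix is to keep the event on the stack but wait on it (for example with KeWaitForSingleObject) before the function returns.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for asynchronous completion: callbacks queued
   now run later, after the function that queued them has returned. */
typedef void (*completion_fn)(void *context);

#define MAX_DEFERRED 16
static struct { completion_fn fn; void *ctx; } deferred[MAX_DEFERRED];
static int deferred_count;

static int queue_completion(completion_fn fn, void *ctx)
{
    assert(fn != NULL);
    if (deferred_count == MAX_DEFERRED)
        return -1;
    deferred[deferred_count].fn = fn;
    deferred[deferred_count].ctx = ctx;
    deferred_count++;
    return 0;
}

/* Runs every queued completion, in order, then empties the queue. */
static void run_deferred(void)
{
    for (int i = 0; i < deferred_count; i++)
        deferred[i].fn(deferred[i].ctx);
    deferred_count = 0;
}

/* Context shared between the initiator and the completion. */
typedef struct request_ctx {
    int *result_slot;   /* where the completion reports its result */
    int value;
} REQUEST_CTX;

static void on_complete(void *context)
{
    REQUEST_CTX *ctx = context;
    *ctx->result_slot = ctx->value * 2;
    free(ctx);          /* the completion owns the context and frees it */
}

/* Starts a request and returns without waiting. The context must not
   live in this function's stack frame, because on_complete runs after
   the frame is gone; heap allocation plus ownership transfer keeps it
   valid for exactly as long as it is in use. */
static int start_request(int value, int *result_slot)
{
    REQUEST_CTX *ctx = malloc(sizeof *ctx);
    if (ctx == NULL)
        return -1;
    ctx->result_slot = result_slot;
    ctx->value = value;
    if (queue_completion(on_complete, ctx) != 0) {
        free(ctx);
        return -1;
    }
    return 0;   /* ctx now belongs to the completion; do not touch it */
}
```

A caller that does start_request(21, &result) finds 42 in result once the deferred queue has run. The result slot itself must outlive the completion for the same reason the context must.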
  Message 4 of 10  
22 Jul 12 22:17
Joseph M. Newcomer
xxxxxx@flounder.com
Join Date: 20 Nov 2008
Posts To This List: 1886
RE: getting bug check 0x109 on server 2012.

Thanks. I had not thought of those as part of object damage, but of course it makes perfect sense (although the local going out of scope is a special case of using a stale pointer).

joe
  Message 5 of 10  
23 Jul 12 07:56
James Harper
xxxxxx@bendigoit.com.au
Join Date: 01 Dec 2008
Posts To This List: 1510
RE: getting bug check 0x109 on server 2012.

> I can't answer the specific questions, but whenever you get a corruption
> message, the most probable cause is storing through an incorrect pointer.
> When declaring a local variable for any kind of pointer, you should always
> initialize it to NULL, otherwise you may have a perfectly addressable pointer
> to an invalid address, such as free storage in the heap or a perfectly valid
> allocated object that had been used earlier.

By initialising your local variable to NULL, don't you turn what should be a basic compile-time error (variable used without initialisation) into a runtime error? I would have thought that these are the sorts of bugs that would get resolved before you let your code actually run on anything... What am I missing?

James
  Message 6 of 10  
23 Jul 12 08:18
Don Burn
xxxxxx@windrvr.com
Join Date: 23 Feb 2011
Posts To This List: 650
RE: getting bug check 0x109 on server 2012.

Except that if you use PREfast or some of the other checking tools in the language space, they will then flag that you are potentially using a NULL pointer without testing it. I disagree with Joe's "always": I use the compiler with /W4 to flag the ones I miss, then use PREfast and PC-Lint to check that I have the tests. The "initialize them all to NULL" rule, by the way, means that PC-Lint will flag that you set a value to a variable and did not use or test it.

Don Burn
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
  Message 7 of 10  
23 Jul 12 12:34
ravi gupta
xxxxxx@gmail.com
Join Date: 29 Nov 2010
Posts To This List: 27
Re: getting bug check 0x109 on server 2012.

Thanks for all the suggestions.
  Message 8 of 10  
23 Jul 12 12:52
Scott Noone
xxxxxx@osr.com
Join Date:
Posts To This List: 872
List Moderator
Re: getting bug check 0x109 on server 2012.

So, to summarize, make sure that:

* Your driver passes PREfast with ALL warnings enabled
* You compile with /W4 enabled
* Your driver passes Driver Verifier (all options except Low Resource Simulation)
* Your driver doesn't trigger any ASSERTs while running under the checked kernel/HAL

That should cover you pretty well for basic corruptions. Also, make sure you test your driver on a test system that matches the customer system as closely as possible (e.g., amount of RAM, number of CPUs).

And to answer your other question:

>- Why kernel dump is not showing stack trace?

0x109 is the Kernel Patch Protection bugcheck. In what I presume is an effort to hinder reverse engineering or circumvention of this component, the O/S zeroes the call stack before bugchecking the system.

-scott

--
Scott Noone
Consulting Associate and Chief System Problem Analyst
OSR Open Systems Resources, Inc.
http://www.osronline.com
  Message 9 of 10  
24 Jul 12 14:23
Maxim S. Shatskih
xxxxxx@storagecraft.com
Join Date: 20 Feb 2003
Posts To This List: 8628
Re: getting bug check 0x109 on server 2012.

One of the most disastrous things you can do in the Windows kernel is to free a structure that is still linked into a LIST_ENTRY list. Neither Driver Verifier nor PREfast will warn about this. Check for such a scenario.

--
Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com
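A user-mode C sketch of the safe teardown order for an intrusive doubly linked list: unlink the entry first, then free the structure that contains it. The list helpers are a hand-written re-implementation modeled on the WDK's LIST_ENTRY, InsertTailList, RemoveEntryList and CONTAINING_RECORD, written only so the sketch compiles outside the kernel. In a driver, use the real ones and hold whatever lock protects the list across the unlink (omitted here). WORK_ITEM and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Intrusive list linkage, modeled on the WDK's LIST_ENTRY. */
struct list_entry {
    struct list_entry *Flink;
    struct list_entry *Blink;
};

/* Recovers the containing structure, like CONTAINING_RECORD. */
#define CONTAINER_OF(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

static void list_init(struct list_entry *head)
{
    head->Flink = head->Blink = head;
}

static int list_is_empty(const struct list_entry *head)
{
    return head->Flink == head;
}

static void list_insert_tail(struct list_entry *head, struct list_entry *e)
{
    e->Blink = head->Blink;
    e->Flink = head;
    head->Blink->Flink = e;
    head->Blink = e;
}

/* Unlinks e from its neighbours, then points e at itself so that a
   second unlink of the same entry leaves the list untouched. */
static void list_remove(struct list_entry *e)
{
    assert(e->Flink != NULL && e->Blink != NULL);
    e->Blink->Flink = e->Flink;
    e->Flink->Blink = e->Blink;
    e->Flink = e->Blink = e;
}

typedef struct work_item {
    int id;
    struct list_entry link;   /* embedded list linkage */
} WORK_ITEM;

static WORK_ITEM *work_item_new(struct list_entry *head, int id)
{
    WORK_ITEM *w = calloc(1, sizeof *w);
    if (w != NULL) {
        w->id = id;
        list_insert_tail(head, &w->link);
    }
    return w;
}

/* Unlink first, then free. Freeing while still linked would leave the
   neighbours' Flink/Blink pointing into freed memory. */
static void remove_and_free(WORK_ITEM *item)
{
    list_remove(&item->link);
    free(item);
}

/* Tears down the whole list with the same unlink-then-free order. */
static int drain_and_free(struct list_entry *head)
{
    int freed = 0;
    while (!list_is_empty(head)) {
        struct list_entry *e = head->Flink;
        list_remove(e);
        free(CONTAINER_OF(e, WORK_ITEM, link));
        freed++;
    }
    return freed;
}
```

Re-linking a removed entry to itself is a design choice of this sketch; it makes an accidental double removal harmless but does nothing for code that still uses the entry after it has been freed.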
  Message 10 of 10  
24 Jul 12 16:40
Joseph M. Newcomer
xxxxxx@flounder.com
Join Date: 20 Nov 2008
Posts To This List: 1886
Re:getting bug check 0x109 on server 2012.

This goes back to my comment about having only one "live" pointer at a time. If you work in a model like this, you can't delete an object that is still in a list, because either the list has the only live pointer, or you have removed the element from the list and therefore your code that is processing that item holds the only live pointer.

Another failure pattern along the same lines is having a pointer, calling a function through a chain of calls that ends up deleting the object, then returning to the original call site and using the pointer. In the old driver model this was exemplified by the sequence

    IoStartPacket(irp...); // I have not memorized all the parameters
    IoMarkIrpPending(irp);

I had to take some time to demonstrate to the student that the IRP could be completed before the RET instruction in IoStartPacket! (Hint to new driver writers: work out for yourselves why this can happen! Once you understand this, you will have reached a new plateau of Enlightenment.)

joe
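The IoStartPacket/IoMarkIrpPending trap can be modeled in user-mode C. In this hypothetical sketch, start_packet stands in for IoStartPacket and, when the device is idle, completes and frees the request before it returns, while "pending" is just a flag. The sketch assumes the safe order is the usual WDM one: mark the request pending, hand it off, and return a pending status without touching the request again. With the two calls in the order shown above, the flag write would land on freed memory. None of the names below are WDK APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-mode model of an IRP-like request; not a WDK type. */
typedef struct request {
    int pending;              /* set by dispatch(), like IoMarkIrpPending */
} REQUEST;

enum { MODEL_STATUS_PENDING = 1 };   /* model value, not a real NTSTATUS */

static int completions;                 /* total requests completed */
static int completed_while_not_pending; /* completions that beat the mark */
static REQUEST *queued;                 /* one-slot "device queue" */

/* Completes and frees the request; every pointer to it is stale afterwards. */
static void complete_request(REQUEST *req)
{
    completions++;
    if (!req->pending)
        completed_while_not_pending++;
    free(req);
}

/* Stand-in for IoStartPacket: an idle device starts the request at once,
   and in this model it also completes before start_packet returns. */
static void start_packet(REQUEST *req, int device_idle)
{
    if (device_idle) {
        complete_request(req);
    } else {
        assert(queued == NULL);   /* the model holds only one request */
        queued = req;
    }
}

/* Later completion of a queued request (the device was busy). */
static void finish_queued(void)
{
    if (queued != NULL) {
        REQUEST *req = queued;
        queued = NULL;
        complete_request(req);
    }
}

/* Safe order: mark pending first, hand off, then never touch req again. */
static int dispatch(REQUEST *req, int device_idle)
{
    req->pending = 1;                /* IoMarkIrpPending(Irp) */
    start_packet(req, device_idle);  /* IoStartPacket(...); req may be gone */
    return MODEL_STATUS_PENDING;     /* return STATUS_PENDING */
}
```

The completed_while_not_pending counter only exists to make the ordering observable: with the safe order it stays at zero whether the request completes synchronously or later.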
Copyright ©2012, OSR Open Systems Resources, Inc.