Cache Me if You Can: Using the NT Cache Manager

[NOTE:  This is an updated version of the original article.]

Cache Manager Runtime

In this article, we provide a basic description of the runtime routines used by the cache manager. Additionally, there are samples for using several of the routines as well as cross-references to code that can be found within the Microsoft IFS Kit.

Cache Manager Overview

The Cache Manager is a software-only component that works closely with the Windows NT Memory Manager so that the caching of file system data is part of the Virtual Memory System. Some operating systems implement their file systems so they have a distinct data cache. However, because such caches must be managed from physical memory they are limited in size - and memory used for such a cache is not available for use elsewhere in the system.

Thus one key advantage of using the Windows NT Cache Manager is that it allows for a balancing of the use of physical memory between file caching and programs running on the system. When an application is I/O intensive, the balance can be "tipped" towards caching data. When an application is consuming memory the amount of memory used for caching data can be reduced to practically zero. Thus, the net result is that the system makes better use of physical memory and ultimately provides better performance.

The other key reason for the file system to use the Cache Manager is that a file can be accessed either via the standard file system interface, such as read and write, or it can be accessed via the Memory Manager - a "memory mapped" file. When both access methods are being used on the same file the Cache Manager provides a mechanism for bridging between the two to ensure consistency of the data.

Cache Manager Data Structures

The interface between the file system and the Cache Manager relies upon a procedural interface. Essentially all of the data structures within the Cache Manager are associated with a file, but the actual internal structure of those data structures is transparent to the file system. In this section we describe those key data structures which are shared between the file system and the Cache Manager.

Buffer Control Block (BCB)

A buffer control block is used internally by the Cache Manager to track when a portion of a file is mapped into the system address space. This is exposed to the file system because sometimes it is necessary for the file system to pin the data in memory while it is performing some critical operation.

Most of the buffer control block (BCB) is opaque. The first portion of the BCB is exposed to file systems:

typedef struct _PUBLIC_BCB {
    CSHORT NodeTypeCode;
    CSHORT NodeByteSize;
    ULONG MappedLength;
    LARGE_INTEGER MappedFileOffset;
} PUBLIC_BCB, *PPUBLIC_BCB;

The first two fields of the buffer control block are standard for Windows NT data structures - they uniquely identify both the type and size of the data structure itself. The last two fields are of interest to the file system as they identify the range of the file managed by this particular buffer control block.
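As a rough illustration of how the last two fields might be used, the following user-mode sketch checks whether a byte range falls entirely inside the range a BCB describes. The SKETCH_PUBLIC_BCB type and the SketchBcbCoversRange helper are hypothetical stand-ins written for this example; they are not part of the Cache Manager interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// User-mode stand-in for the public part of a BCB (an assumption for
// this sketch, not the kernel definition).
typedef struct {
    int16_t  NodeTypeCode;
    int16_t  NodeByteSize;
    uint32_t MappedLength;      // bytes covered by this BCB
    int64_t  MappedFileOffset;  // file offset where the mapping starts
} SKETCH_PUBLIC_BCB;

// Returns true when [offset, offset + length) lies entirely inside the
// range described by MappedFileOffset and MappedLength.
static bool SketchBcbCoversRange(const SKETCH_PUBLIC_BCB *bcb,
                                 int64_t offset, uint32_t length)
{
    int64_t mapStart = bcb->MappedFileOffset;
    int64_t mapEnd = mapStart + (int64_t)bcb->MappedLength;

    return offset >= mapStart && offset + (int64_t)length <= mapEnd;
}
```

A file system could apply a check of this kind before touching data through a buffer it obtained earlier, although real code would work with the kernel types directly.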

File Size Information

The file system and Memory Manager each maintain information about the size of the file. Whenever the file system establishes mapping for a file it indicates the current size of the file. Any subsequent changes to the size of the file are similarly indicated to the Cache Manager.

There are three values used by the Cache Manager to indicate the current size of the file:

typedef struct _CC_FILE_SIZES {
    LARGE_INTEGER AllocationSize;
    LARGE_INTEGER FileSize;
    LARGE_INTEGER ValidDataLength;
} CC_FILE_SIZES, *PCC_FILE_SIZES;

The names of these fields can be confusing. For instance, the AllocationSize field is used not to identify the actual physical space allocated for the file, but rather the amount of data which can fit in the presently allocated space. For some file systems, this turns out to be the same value. However, for a file system which supports compression or expansion of the actual data, this value represents the amount of data which could fit.

The AllocationSize of the file is used by the Memory Manager to represent the size of the "section object." Since a section object is then used to determine how a file is mapped into memory it is essential that the AllocationSize always be at least as large as the file. The Cache Manager and Memory Manager do not detect the case when the file system sets the AllocationSize to be smaller than the file size - instead the system crashes due to the inconsistency in the data structures.

The FileSize of the file represents the last valid byte of data in the file - logically it is the "End of File" marker.

The ValidDataLength of the file represents the last valid byte of data in memory. Thus, a file can be extended in memory prior to the data being written to disk.

Note that the CC_FILE_SIZES structure has precisely the same layout of the size fields as the FSRTL_COMMON_FCB_HEADER structure has. Typically, a file system does not maintain a separate CC_FILE_SIZES data structure but instead passes the address of the overlapping fields.
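Because the system does not detect an AllocationSize that is smaller than the file size, a file system may want to check this itself before handing sizes to the Cache Manager. The sketch below shows one way such a check could look in user-mode C. SKETCH_FILE_SIZES and SketchFileSizesAreConsistent are names invented for this example, with plain 64-bit integers standing in for LARGE_INTEGER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// User-mode stand-in for CC_FILE_SIZES (an assumption for this sketch).
typedef struct {
    int64_t AllocationSize;
    int64_t FileSize;
    int64_t ValidDataLength;
} SKETCH_FILE_SIZES;

// Checks the one rule stated in the text: AllocationSize must be at
// least as large as FileSize. Negative sizes are rejected as well.
static bool SketchFileSizesAreConsistent(const SKETCH_FILE_SIZES *sizes)
{
    if (sizes->AllocationSize < 0 || sizes->FileSize < 0 ||
        sizes->ValidDataLength < 0) {
        return false;
    }
    return sizes->AllocationSize >= sizes->FileSize;
}
```

In a driver, the same condition is more commonly expressed as an assertion made just before the sizes are passed to the Cache Manager.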

Cache Manager Callbacks

Interactions between the file system and the Cache Manager are coordinated via a series of callback functions. These callback functions are registered on a per-file basis with the Cache Manager and are then used by the Cache Manager to ensure that the data structures are "locked" prior to performing a file system operation.

Windows NT assumes there is a strict ordering in how resources are acquired between the file system, Cache Manager, and Memory Manager. If followed, this ordering will ensure that deadlocks do not occur. Of course, if it is not followed, deadlocks can (and will) occur. Specifically, file system resources are acquired first. Then Cache Manager resources are acquired. Finally, Memory Manager resources are acquired.

Thus, these callbacks are used by the Cache Manager to honor this hierarchy. The callbacks required by the Cache Manager are:

typedef BOOLEAN (*PACQUIRE_FOR_LAZY_WRITE) (
        IN PVOID Context,
        IN BOOLEAN Wait
        );

typedef VOID (*PRELEASE_FROM_LAZY_WRITE) (
        IN PVOID Context
        );

typedef BOOLEAN (*PACQUIRE_FOR_READ_AHEAD) (
        IN PVOID Context,
        IN BOOLEAN Wait
        );

typedef VOID (*PRELEASE_FROM_READ_AHEAD) (
        IN PVOID Context
        );

typedef struct _CACHE_MANAGER_CALLBACKS {
        PACQUIRE_FOR_LAZY_WRITE AcquireForLazyWrite;
        PRELEASE_FROM_LAZY_WRITE ReleaseFromLazyWrite;
        PACQUIRE_FOR_READ_AHEAD AcquireForReadAhead;
        PRELEASE_FROM_READ_AHEAD ReleaseFromReadAhead;
} CACHE_MANAGER_CALLBACKS, *PCACHE_MANAGER_CALLBACKS;

Note that the callbacks are used for two distinct parts of the Cache Manager. The first, the lazy writer, is responsible for writing dirty cached data back to the file system. The second is for read ahead handling - reading data prior to an actual call from the user to obtain that information.

First, in designing these it is important to note what you are protecting your file system against (and what you aren't.) There is no reason to serialize cached I/O operations from applications with I/O from the Cache Manager's lazy writer. However, you do need to protect against non-cached user I/O operations and user operations that modify the size of the file.

The NT file systems do this by using two ERESOURCE structures. Both of these can also be used (and located) by other components within the operating system by walking through the common header - specifically the Resource and PagingIoResource fields within the common header. The Cache Manager does not directly acquire these resources - instead it calls into the file system to acquire any necessary resources (typically these resources.)

Note that these routines must be provided by your file system - they are not optional and the system will crash if you fail to provide them.
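The resource ordering described earlier (file system first, then Cache Manager, then Memory Manager) can be pictured as a set of numbered levels, where a thread may only move to an equal or higher level while it holds a resource. The following user-mode sketch is only a model of that rule; the SKETCH_LOCK_LEVEL enumeration and SketchMayAcquire are hypothetical names, and nothing like them is exported by Windows NT.

```c
#include <assert.h>
#include <stdbool.h>

// Levels in acquisition order, lowest first (a model for this sketch).
typedef enum {
    SKETCH_LEVEL_NONE = 0,
    SKETCH_LEVEL_FILE_SYSTEM = 1,
    SKETCH_LEVEL_CACHE_MANAGER = 2,
    SKETCH_LEVEL_MEMORY_MANAGER = 3
} SKETCH_LOCK_LEVEL;

// Returns true if a thread whose highest held level is highestHeld may
// acquire a resource at level wanted without violating the ordering.
// Acquiring at the same level is allowed here, on the assumption that
// the resources support recursive acquisition.
static bool SketchMayAcquire(SKETCH_LOCK_LEVEL highestHeld,
                             SKETCH_LOCK_LEVEL wanted)
{
    return wanted >= highestHeld;
}
```

Seen this way, the callbacks exist so that the lazy writer and read ahead threads start at the file system level instead of entering the hierarchy from the middle.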

The following code, taken from an older version of the OSR FSDK, is a sample implementation of one of these callback routines:

static BOOLEAN OwAcquireForLazyWrite(PVOID Context, BOOLEAN Wait)
{
    POW_FCB fcb = (POW_FCB) Context;
    BOOLEAN result;

    // Take out the lock on the file.

    result = OwAcquireResourceExclusiveExp(&fcb->Resource, Wait);
    if (!result) {

        // We did not acquire the resource.

        return (result);
    }

    // We did acquire the resource. We need to:
    // (1) Store away the thread id of this thread (for the release)
    // (2) Set top level irp to a pseudo value
    // In both cases, the previous value should be zero.

    OwAssert(!fcb->ResourceThread);
    fcb->ResourceThread = OwGetCurrentResourceThread();
    return (TRUE);
}

Each of the file systems in the Microsoft IFS kit also contains examples of routines like these. For the Lazy Writer these routines can be found in the following locations:

File System    File               Routine
FAT            resrcsup.c         FatAcquireFcbForLazyWrite
CDFS           resrcsup.c         CdAcquireForCache
RDR2           rxce\resrcsup.c    RxAcquireFcbForLazyWrite

Other similar routines (for read ahead for example) can be located in the same file.

CcCanIWrite

Because an application program can modify data in memory at a rate that exceeds the ability to write the data to disk, the Virtual Memory system can "fill up" with data. This in turn can then cause fatal out-of-memory conditions to occur within the VM system. To avoid this, the file system must cooperate with the VM system to detect these conditions. One of the key operations provided by the Cache Manager for this support is CcCanIWrite. The prototype for this call is:

NTKERNELAPI BOOLEAN CcCanIWrite (
    IN PFILE_OBJECT FileObject,
    IN ULONG BytesToWrite,
    IN BOOLEAN Wait,
    IN BOOLEAN Retrying
    );

If this call returns FALSE then the FSD needs to delay actually writing dirty data into the cache in order to avoid an out-of-memory condition. The typical symptom of such out of memory conditions is a stop code of NO_PAGES_AVAILABLE.

The FSD must handle posting and subsequent retrying of the write operation. The FSD can post the write either via an internal posting mechanism or by using the routine CcDeferWrite.

The routine FsRtlCopyWrite can be used by your FSD instead of accessing the cache directly. In this case, deferring the I/O operation is handled internally within this function.

CcCopyRead

Once a file system has established caching (via the CcInitializeCacheMap call,) it uses either the FsRtl routines (such as FsRtlCopyRead,) or this routine. Typically, FsRtlCopyRead is used to implement the fast I/O path for read and this routine is used to implement IRP_MJ_READ. The prototype for this call is:

NTKERNELAPI BOOLEAN CcCopyRead (
    IN PFILE_OBJECT FileObject,
    IN PLARGE_INTEGER FileOffset,
    IN ULONG Length,
    IN BOOLEAN Wait,
    OUT PVOID Buffer,
    OUT PIO_STATUS_BLOCK IoStatus
    );

The FileObject contains a pointer to the SectionObjectPointer structure that is to be used by the Cache Manager when copying data from the cache into the user buffer (the Buffer argument provided here.) Thus, there is an assumption here that caching has been previously initialized.

The Length indicates the length of the read operation. The Buffer is assumed to be large enough to contain the amount of data being copied from the cache.

The Wait parameter indicates if the caller is willing to block for an indeterminate period of time, such as might be required if a lock must be acquired. This parameter should be viewed as a "hint" however, rather than a guarantee. For example, if disk I/O is necessary to complete this operation the operation might proceed, even if Wait is FALSE.

The Buffer refers to the caller-provided buffer. It need not be valid and in such a case this routine will raise an exception. An FSD should trap that exception and return an error to the user application.

The IoStatus block will be set to indicate the completion status of the operation and the total number of bytes read.

Note that the Cache Manager may be required to page-fault the data into the cache. In that case an FSD will be re-entered in order to process the actual paging I/O operation.
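One common way to use the Wait parameter and the BOOLEAN return value is to try the copy without blocking and, if that fails, hand the request to a worker thread that retries with Wait set to TRUE. The user-mode sketch below models that decision only. SketchCopyRead is a hypothetical stand-in for CcCopyRead that refuses to proceed when the data is not resident and the caller cannot wait, and every SKETCH_ name is invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { SKETCH_COMPLETED, SKETCH_POSTED } SKETCH_DISPOSITION;

// Toy model of a cached file: its bytes and whether they are in memory.
typedef struct {
    bool        Resident;
    const char *Data;
    size_t      Size;
} SKETCH_CACHED_FILE;

// Stand-in for CcCopyRead: returns false if it would have to block and
// wait is false; otherwise copies what is available past offset.
static bool SketchCopyRead(const SKETCH_CACHED_FILE *file, size_t offset,
                           size_t length, bool wait, char *buffer,
                           size_t *bytesRead)
{
    if (!file->Resident && !wait) {
        return false;
    }
    size_t available = (offset < file->Size) ? file->Size - offset : 0;
    size_t count = (length < available) ? length : available;
    if (count > 0) {
        memcpy(buffer, file->Data + offset, count);
    }
    *bytesRead = count;
    return true;
}

// Read path of a hypothetical FSD: complete inline when possible,
// otherwise report that the request must be posted and retried later.
static SKETCH_DISPOSITION SketchRead(const SKETCH_CACHED_FILE *file,
                                     size_t offset, size_t length,
                                     bool canWait, char *buffer,
                                     size_t *bytesRead)
{
    if (SketchCopyRead(file, offset, length, canWait, buffer, bytesRead)) {
        return SKETCH_COMPLETED;
    }
    return SKETCH_POSTED;  // a worker would call again with canWait = true
}
```

The sketch leaves out the exception handling a real FSD needs around the copy when the caller-provided buffer turns out to be invalid.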

CcCopyWrite

Once a file system has established caching (via the CcInitializeCacheMap call,) it uses either the FsRtl routines (such as FsRtlCopyWrite,) or this routine. Typically, FsRtlCopyWrite is used to implement the fast I/O path for write and this routine is used to implement IRP_MJ_WRITE. The prototype for this call is:

NTKERNELAPI BOOLEAN CcCopyWrite (
    IN PFILE_OBJECT FileObject,
    IN PLARGE_INTEGER FileOffset,
    IN ULONG Length,
    IN BOOLEAN Wait,
    IN PVOID Buffer
    );

The FileObject contains a pointer to the SectionObjectPointer structure that is to be used by the Cache Manager when copying data from the user buffer (the Buffer argument provided here) into the cache. Thus, there is an assumption here that caching has been previously initialized.

The Length indicates the length of the write operation. The Buffer is assumed to be large enough to contain the amount of data being copied to the cache.

The Wait parameter indicates if the caller is willing to block for an indeterminate period of time, such as might be required if a lock must be acquired. This parameter should be viewed as a "hint" however, rather than a guarantee. For example, if disk I/O is necessary to complete this operation the operation might proceed, even if Wait is FALSE.

The Buffer refers to the caller-provided buffer. It need not be valid and in such a case this routine will raise an exception. An FSD should trap that exception and return an error to the user application.

Note that this operation may require that a portion of a cached page be written. If that is the case, then the contents of that page will be read from disk first and then the portion of the page that is being written will be modified. Thus, it is possible for this call to cause re-entry into an FSD to process read page faults against the file being modified.

Because data can be written to the VM system at a rate considerably faster than the rate at which it can be written to disk, an FSD must implement a write-throttling mechanism, typically by using CcCanIWrite in combination with CcDeferWrite. A failure to implement write throttling will cause the system to crash with a stop code of NO_PAGES_AVAILABLE.

CcDeferWrite

In order to simplify the process of implementing write throttling within your file system, the Cache Manager provides a simple mechanism for queuing write operations until the VM system can accommodate them. This is done via a Deferred Write callback which your FSD registers with the Cache Manager when the call CcCanIWrite returns FALSE.

The prototype for this callback function is:

typedef VOID (*PCC_POST_DEFERRED_WRITE) (
    IN PVOID Context1,
    IN PVOID Context2
    );

The context pointers are typically specified by your file system as part of establishing the deferred write processing. The prototype for CcDeferWrite is:

NTKERNELAPI VOID CcDeferWrite (
    IN PFILE_OBJECT FileObject,
    IN PCC_POST_DEFERRED_WRITE PostRoutine,
    IN PVOID Context1,
    IN PVOID Context2,
    IN ULONG BytesToWrite,
    IN BOOLEAN Retrying
    );

The FileObject indicates the file to which the caller is attempting to write.

The PostRoutine is the FSD-provided callback function which will be called by the Cache Manager when the VM state has changed so that additional writes can be allowed.

The Context1 and Context2 pointers are FSD-defined and will be passed to the FSD-provided callback function once writing to the file is allowed.

The BytesToWrite argument indicates the number of bytes that are to be written to the file by this operation. The VM system uses this information to determine if it has become "safe" to write (based upon the number of available pages.)

The Retrying argument indicates whether or not this is the first attempt (Retrying is FALSE) or a subsequent attempt (Retrying is TRUE.)
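To make the interplay between CcCanIWrite and CcDeferWrite more concrete, here is a small user-mode model of write throttling. It assumes, purely for illustration, that the cache tracks a count of dirty bytes against a fixed limit; the real Cache Manager bases its decision on the state of the VM system. SketchCanIWrite and SketchDeferWrite are hypothetical stand-ins for the two routines, and the fixed-size queue exists only in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Same shape as PCC_POST_DEFERRED_WRITE, redeclared for user mode.
typedef void (*SKETCH_POST_DEFERRED_WRITE)(void *Context1, void *Context2);

typedef struct {
    SKETCH_POST_DEFERRED_WRITE PostRoutine;
    void    *Context1;
    void    *Context2;
    uint32_t BytesToWrite;
} SKETCH_DEFERRED_WRITE;

#define SKETCH_MAX_DEFERRED 8

// Hypothetical cache model: a dirty-byte counter with a fixed limit.
typedef struct {
    uint64_t DirtyBytes;
    uint64_t DirtyLimit;
    SKETCH_DEFERRED_WRITE Queue[SKETCH_MAX_DEFERRED];
    size_t   Count;
} SKETCH_CACHE;

// Stand-in for CcCanIWrite: would this write exceed the limit?
static bool SketchCanIWrite(const SKETCH_CACHE *cache, uint32_t bytesToWrite)
{
    return cache->DirtyBytes + bytesToWrite <= cache->DirtyLimit;
}

// Stand-in for CcDeferWrite: remember the write and its post routine.
static bool SketchDeferWrite(SKETCH_CACHE *cache,
                             SKETCH_POST_DEFERRED_WRITE postRoutine,
                             void *context1, void *context2,
                             uint32_t bytesToWrite)
{
    if (cache->Count == SKETCH_MAX_DEFERRED) {
        return false;  // queue limit exists only in this sketch
    }
    SKETCH_DEFERRED_WRITE entry = { postRoutine, context1, context2, bytesToWrite };
    cache->Queue[cache->Count++] = entry;
    return true;
}

// Called after dirty data reaches disk: post queued writes, oldest
// first, for as long as each one fits under the limit.
static void SketchDirtyDataFlushed(SKETCH_CACHE *cache, uint64_t flushedBytes)
{
    cache->DirtyBytes -= (flushedBytes < cache->DirtyBytes)
                             ? flushedBytes : cache->DirtyBytes;

    size_t posted = 0;
    while (posted < cache->Count &&
           SketchCanIWrite(cache, cache->Queue[posted].BytesToWrite)) {
        SKETCH_DEFERRED_WRITE entry = cache->Queue[posted++];
        entry.PostRoutine(entry.Context1, entry.Context2);
    }
    // Drop the posted entries and keep the remainder in order.
    for (size_t i = posted; i < cache->Count; i++) {
        cache->Queue[i - posted] = cache->Queue[i];
    }
    cache->Count -= posted;
}

// Example post routine: Context1 is the cache, Context2 the byte count.
static void SketchPostWrite(void *context1, void *context2)
{
    SKETCH_CACHE *cache = context1;
    cache->DirtyBytes += *(const uint32_t *)context2;
}
```

In this model the write is deferred while the limit would be exceeded, and the post routine performs it once enough dirty data has been flushed. A real post routine would typically resubmit the original write request rather than adjust a counter.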

CcGetDirtyPages

This routine is listed here for completeness. It is used within file systems that take advantage of the internal logging mechanism within Windows NT. It is not generally useful for file systems. The prototype for this function is:

NTKERNELAPI LARGE_INTEGER CcGetDirtyPages (
    IN PVOID LogHandle,
    IN PDIRTY_PAGE_ROUTINE DirtyPageRoutine,
    IN PVOID Context1,
    IN PVOID Context2
    );

CcGetFileObjectFromBcb

An individual Buffer Control Block includes within it a pointer to the file object that is being used by the VM system to track the file cache information. The file object can thus be extracted from a given BCB, should that be necessary. The prototype for this routine is:

NTKERNELAPI PFILE_OBJECT CcGetFileObjectFromBcb (
    IN PVOID Bcb
    );

CcGetFileObjectFromSectionPtrs

When caching is first established for the file, the Cache Manager uses the FileObject argument to CcInitializeCacheMap to create the new section object that is used for caching the file data. So long as cached data is maintained by the Cache Manager for that file, that original FileObject is used by the VM System for all the various necessary I/O operations.

Given a SectionObjectPointer structure from an arbitrary FileObject, this routine can thus tell the file system about the actual file object that is used by the VM system for the various necessary I/O operations. The prototype for this call is:

NTKERNELAPI PFILE_OBJECT CcGetFileObjectFromSectionPtrs (
    IN PSECTION_OBJECT_POINTERS SectionObjectPointer
    );

An interesting side-effect of this implementation model (where the SectionObjectPointer field refers to a particular section object that in turn refers to a particular file object) is that a given FileObject may remain valid for a considerable period of time - far beyond the point when the file has been closed by the application program.

CcGetLsnForFileObject

This routine is listed here for completeness. It is used within file systems that take advantage of the internal logging mechanism within Windows NT. It is not generally useful for file systems. The prototype for this function is:

NTKERNELAPI LARGE_INTEGER CcGetLsnForFileObject(
    IN PFILE_OBJECT FileObject,
    OUT PLARGE_INTEGER OldestLsn OPTIONAL
    );

CcFastCopyRead

This routine can be used by file systems that do not support file offsets larger than 4GB. It is a "replacement" call for CcCopyRead and can be used in an essentially identical fashion. The prototype for this function is:

NTKERNELAPI VOID CcFastCopyRead (
    IN PFILE_OBJECT FileObject,
    IN ULONG FileOffset,
    IN ULONG Length,
    IN ULONG PageCount,
    OUT PVOID Buffer,
    OUT PIO_STATUS_BLOCK IoStatus
    );

The FileObject indicates the file being read.

The Length indicates the number of bytes to be copied to the caller-supplied Buffer.

The PageCount indicates the number of physical pages that are spanned by the caller-supplied Buffer.

The IoStatus contains the completion status of the read operation as well as the number of bytes read.
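The PageCount value depends on both the starting address of the buffer and its length, since a short buffer that straddles a page boundary still spans two pages. The sketch below shows the arithmetic in user-mode C. It assumes a 4096-byte page, and SketchPagesSpanned is a hypothetical helper written for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Page size assumed for this sketch; the real value is platform specific.
#define SKETCH_PAGE_SIZE 4096u

// Number of pages touched by the byte range [address, address + length).
static size_t SketchPagesSpanned(uintptr_t address, size_t length)
{
    if (length == 0) {
        return 0;
    }
    uintptr_t firstPage = address / SKETCH_PAGE_SIZE;
    uintptr_t lastPage = (address + length - 1) / SKETCH_PAGE_SIZE;

    return (size_t)(lastPage - firstPage + 1);
}
```

For example, a two-byte buffer that starts on the last byte of a page spans two pages, while a page-aligned buffer of exactly one page spans one.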

CcFastCopyWrite

This routine can be used by file systems that do not support file offsets larger than 4GB. It is a "replacement" call for CcCopyWrite and is used in essentially identical fashion. The prototype for this function is:

NTKERNELAPI VOID CcFastCopyWrite (
    IN PFILE_OBJECT FileObject,
    IN ULONG FileOffset,
    IN ULONG Length,
    IN PVOID Buffer
    );

The FileObject indicates the file being written.

The Length indicates the number of bytes to be copied from the caller-supplied Buffer.

Unlike CcFastCopyRead, this routine takes no PageCount or IoStatus argument.

As with CcCopyWrite, a file system using this routine must also implement write throttling using CcCanIWrite.

CcFlushCache

This routine is used by an FSD to ensure that any dirty data presently being cached for the given file is written back to disk. The prototype for this call is:

NTKERNELAPI VOID CcFlushCache (
    IN PSECTION_OBJECT_POINTERS SectionObjectPointer,
    IN PLARGE_INTEGER FileOffset OPTIONAL,
    IN ULONG Length,
    OUT PIO_STATUS_BLOCK IoStatus OPTIONAL
    );

This routine is used by an FSD to ensure that all dirty data is committed to disk.

If the FileOffset parameter is null, the whole file is flushed.

If the FileOffset parameter is set, the portion of the file from that offset and for Length bytes is flushed.

Note that this call can cause I/O operations and hence reenter the FSD. This call is typically used by an FSD as part of its implementation of IRP_MJ_FLUSH_BUFFERS.

CcInitializeCacheMap

A cache map of a file is maintained by the Cache Manager to track the activities being performed for the file. The first open instance of a file causes the generation of a public cache map (information shared between the various open instances of the file.) In addition, each open instance of the file also has a private cache map which tracks information specific to operations that are ongoing for that particular file object.

Normally, the creation of the cache maps is deferred until the first I/O. This ensures that the underlying file system does not create and delete the cache maps for operations which entail no I/O operations, as such operations are quite common for Win32 applications.

Once I/O is being performed on the given file, however, the underlying file system establishes the cache map for the file in question. This is done via the CcInitializeCacheMap call:

NTKERNELAPI VOID CcInitializeCacheMap (
    IN PFILE_OBJECT FileObject,
    IN PCC_FILE_SIZES FileSizes,
    IN BOOLEAN PinAccess,
    IN PCACHE_MANAGER_CALLBACKS Callbacks,
    IN PVOID LazyWriteContext
    );

Most of these parameters are reasonably self-explanatory - the file sizes normally coming from the common header, the Callbacks being a consistent set of functions defined by your file system. The LazyWriteContext is the argument passed to the callback functions (the Context argument each of them takes) so it allows you to specify what information will be passed back to your callback function for further processing.

The PinAccess argument tells the Cache Manager whether the data represented by this memory region will be locked (or pinned) in memory by the file system. The NT file systems use this ability to pin memory down as a mechanism for memory mapping their own data structures. To ensure that a particular critical data structure is resident in memory and ineligible to be released, the Cache Manager allows a file system to pin the data in memory for the duration of the critical operation. Buffer Control Blocks (BCBs) describe such pinned sections.

Thus, normally user data is not held for pinned access.

Note that the cache map is not normally initialized for files being accessed using unbuffered I/O operations - files that were opened with the FILE_NO_INTERMEDIATE_BUFFERING bit specified.

Once CcInitializeCacheMap has been called by your file system, it is possible for you to receive fast I/O operations for the file. The Cache Manager sets the PrivateCacheMap field in the file object to point to a Cache Manager allocated data structure and the I/O Manager decides if the fast I/O path can be taken based on the value in this field.

The following code sample is from an older version of the OSR FSDK:

NTSTATUS OwInitializeCacheMap(POW_IRP_CONTEXT IrpContext)
{
    // Make sure that this thing can even be cached.

    OwAssert(IrpContext->IrpSp->FileObject->SectionObjectPointer);
    OwAssert(OwIsResourceAcquiredExclusive(&IrpContext->Fcb->Resource));
    OwAssert(!IrpContext->FileObject->PrivateCacheMap);
    OwAssert(IrpContext->Fcb->CommonHeader.AllocationSize.QuadPart >=
             IrpContext->Fcb->CommonHeader.FileSize.QuadPart);

    CcInitializeCacheMap(IrpContext->FileObject,
                         (PCC_FILE_SIZES) &IrpContext->Fcb->CommonHeader.AllocationSize,
                         FALSE, // access for pinning?
                         &OwCallbacks,
                         IrpContext->Fcb);

    OSR_TRACE1(IrpContext);

    return (STATUS_SUCCESS);
}

Each of the file systems in the Microsoft IFS kit also contains examples of cache map initialization:

File System    File                Routine
FAT            read.c              FatCommonRead
CDFS           write.c             CdCommonWrite
RDR2           rdbss\fileinfo.c    RxSetAllocationInfo

Other similar routines (for read ahead for example) can be located in the same file.

CcIsThereDirtyData

This routine is used to determine if there is any dirty data on the given physical media volume, as specified by its VPB structure. The prototype for this routine is:

NTKERNELAPI BOOLEAN CcIsThereDirtyData (
    IN PVPB Vpb
    );

Data cached by the Cache Manager is described using Section Objects. In turn, a Section Object refers to some File Object that backs it. That File Object indicates what volume it is located on (for physical media file systems.) Thus, the Cache Manager can ascertain if a given physical media volume has any dirty data stored on it by calling this routine.

File Systems that do not maintain a VPB structure, such as network file systems, cannot use this call.

CcMapData

This routine is used by a file system to build a mapping for data in such a fashion it can be controlled (via a BCB) by the file system. Typically, this is used by a file system that memory maps its own file system data structures. The prototype for this call is:

NTKERNELAPI BOOLEAN CcMapData (
    IN PFILE_OBJECT FileObject,
    IN PLARGE_INTEGER FileOffset,
    IN ULONG Length,
    IN BOOLEAN Wait,
    OUT PVOID *Bcb,
    OUT PVOID *Buffer
    );

After calling this routine, the file system must then pin the buffer prior to actually accessing the data. Accessing data mapped but not pinned in the cache may lead to unpredictable results.

CcMdlRead

This routine is typically used by a file system to obtain an MDL describing the cache buffer. Since an MDL can only be used by a kernel-mode component, this is typically only used by kernel-mode applications, such as a file server. By using an MDL describing the cache, however, the kernel-resident code can avoid a data copy between a buffer and the cache. The prototype for this function is:

NTKERNELAPI VOID CcMdlRead (
    IN PFILE_OBJECT FileObject,
    IN PLARGE_INTEGER FileOffset,
    IN ULONG Length,
    OUT PMDL *MdlChain,
    OUT PIO_STATUS_BLOCK IoStatus
    );

Typically, an FSD will use this routine to satisfy an IRP_MJ_READ request with a minor function of IRP_MN_MDL. The MDL returned by this routine can then be subsequently released using either CcMdlReadComplete or FsRtlMdlReadCompleteDev. Because the choice between these functions has potential ramifications, developers should carefully consider their requirements before choosing one function over the other.

The MDL is returned to the caller in the IRP (as the MdlAddress field.)

CcMdlReadComplete

This routine is used as the complement of CcMdlRead. The prototype for this function is:

NTKERNELAPI VOID CcMdlReadComplete (
    IN PFILE_OBJECT FileObject,
    IN PMDL MdlChain
    );

An FSD uses this in response to an IRP_MJ_READ request with a minor function code of IRP_MN_MDL_COMPLETE.

Note that in Windows NT 4.0 through Service Pack 3, this call is implemented by calling the Fast I/O entry point MdlRead. If this routine is not implemented, then FsRtlMdlReadCompleteDev is called. If this routine is implemented its return value is ignored. This can cause problems with layered filter drivers, such as the examples included in the Microsoft IFS Kit.

The MDL to be released is normally the one provided to the FSD as the MdlAddress field of the IRP.

CcMdlWriteComplete

This routine is used as the complement of CcMdlWrite. The prototype for this function is:

NTKERNELAPI VOID CcMdlWriteComplete (
    IN PFILE_OBJECT FileObject,
    IN PLARGE_INTEGER FileOffset,
    IN PMDL MdlChain
    );

An FSD uses this in response to an IRP_MJ_WRITE request with a minor function code of IRP_MN_MDL_COMPLETE.

Note that in Windows NT 4.0 through Service Pack 3, this call is implemented by calling the Fast I/O entry point MdlWriteComplete. If this routine is not implemented, then FsRtlMdlWriteCompleteDev is called. If this routine is implemented its return value is ignored. This can cause problems with layered filter drivers, such as the examples included in the Microsoft IFS Kit.

The MDL to be released is normally the one provided to the FSD as the MdlAddress field of the IRP.

CcPinMappedData

This call is used to ensure that data mapped into memory via a call to CcMapData is pinned in memory so that it can be used by the file system. The prototype for this call is:

NTKERNELAPI BOOLEAN CcPinMappedData (
    IN PFILE_OBJECT FileObject,
    IN PLARGE_INTEGER FileOffset,
    IN ULONG Length,
    IN BOOLEAN Wait,
    IN OUT PVOID *Bcb
    );

The FileObject, FileOffset, and Length arguments identify the specifics of what is being pinned in memory. The Wait parameter indicates if the caller is willing to block (for synchronization objects) while making this call. If Wait is FALSE and locks cannot be immediately acquired, then this call will return FALSE to the caller, who may attempt the call at a later time and/or in a different context.

The Bcb argument is the BCB pointer returned to the FSD via the earlier call to CcMapData. The FSD is responsible for releasing this BCB once it has finished accessing the pinned data.

CcPinRead

This call is used to read, map, and pin data into the cache in a single operation. Its prototype is:

NTKERNELAPI BOOLEAN CcPinRead (
    IN PFILE_OBJECT FileObject,
    IN PLARGE_INTEGER FileOffset,
    IN ULONG Length,
    IN BOOLEAN Wait,
    OUT PVOID *Bcb,
    OUT PVOID *Buffer
    );

Functionally, this is equivalent to calling CcMapData and CcPinMappedData via a single operation.

The FSD must unpin the data once the buffer is no longer needed.

CcPrepareMdlWrite

This routine is typically used by a file system to obtain an MDL describing the cache buffer. Since an MDL can only be used by a kernel-mode component, this is typically only used by kernel-mode applications, such as a file server. By using an MDL describing the cache, however, the kernel-resident code can avoid a data copy between a buffer and the cache. The prototype for this function is:

NTKERNELAPI VOID CcPrepareMdlWrite (
    IN PFILE_OBJECT FileObject,
    IN PLARGE_INTEGER FileOffset,
    IN ULONG Length,
    OUT PMDL *MdlChain,
    OUT PIO_STATUS_BLOCK IoStatus
    );

Typically, an FSD will use this routine to satisfy an IRP_MJ_WRITE request with a minor function of IRP_MN_MDL. Note that because the data is to be written, it is only read from disk if necessary. It would be necessary if, for instance, the offset and length indicate that only a portion of a physical memory page will be modified. The data currently in that area of the file is fetched from disk.

Because of this, data need not be present in the buffer when this call returns.
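Whether existing data has to be fetched comes down to whether the write covers whole pages. The user-mode sketch below tests for that, assuming a 4096-byte page; SketchWriteNeedsRead is a hypothetical helper, and it ignores other factors a real implementation would weigh, such as pages already resident in the cache or a write that ends at end of file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Page size assumed for this sketch; the real value is platform specific.
#define SKETCH_PAGE_SIZE 4096

// Returns true when the range [offset, offset + length) starts or ends
// in the middle of a page, so at least one page is only partly
// overwritten and its remaining bytes would have to come from the file.
static bool SketchWriteNeedsRead(int64_t offset, uint32_t length)
{
    if (length == 0) {
        return false;
    }
    int64_t end = offset + (int64_t)length;

    return (offset % SKETCH_PAGE_SIZE) != 0 || (end % SKETCH_PAGE_SIZE) != 0;
}
```

A write of two full pages starting on a page boundary needs no read in this model, while a 100-byte write at offset zero does.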

CcPreparePinWrite

This call is used to map and pin, in a single operation, data in the cache that is subsequently going to be modified. Its prototype is:

NTKERNELAPI BOOLEAN CcPreparePinWrite (
    IN PFILE_OBJECT FileObject,
    IN PLARGE_INTEGER FileOffset,
    IN ULONG Length,
    IN BOOLEAN Zero,
    IN BOOLEAN Wait,
    OUT PVOID *Bcb,
    OUT PVOID *Buffer
    );

Because the pages are going to be modified, they need not be read from disk, except when a portion of a page is being modified.

The caller is responsible for releasing the Bcb once the modifications have been made to the data.

CcPurgeCacheSection

This routine is used by an FSD to attempt to purge any mappings of the pages within the cache. Data being purged is discarded from memory. If the data had been modified prior to the purge operation, the updates to the data are lost. The prototype for this function is:

NTKERNELAPI BOOLEAN CcPurgeCacheSection (
    IN PSECTION_OBJECT_POINTERS SectionObjectPointer,
    IN PLARGE_INTEGER FileOffset OPTIONAL,
    IN ULONG Length,
    IN BOOLEAN UninitializeCacheMaps
    );

The SectionObjectPointer structure identifies the cache data structures to use as part of this operation. The FileOffset is a pointer to a variable containing the file offset where the purge should begin. The Length indicates the number of bytes to purge, beginning with the FileOffset. The UninitializeCacheMaps argument indicates that all file objects that are maintaining private cache map information must be uninitialized prior to the actual purge operation taking place.

If the section objects specified by the SectionObjectPointer structure are in use to map the file in anything other than the cache itself, this call will fail. Typically, this is the case when a file has been memory mapped by an application program and hence cannot be purged so long as those mappings persist.

The FileOffset and Length parameters interact to tell the Cache Manager what should be purged. Note that Length is a ULONG, so the "through end of file" case is requested with a Length of zero rather than NULL.

FileOffset    Length       Effect
NULL          Any value    Length is ignored and the whole file is purged.
Non-NULL      0            The file is purged from the byte indicated by FileOffset through the end of file.
Non-NULL      Non-zero     The file is purged beginning with the byte indicated by FileOffset for Length bytes.

Note that the FSD must be able to handle the case where this routine returns FALSE, as this will be the case under certain circumstances. In such a case, it is not possible to purge the data in the cache.
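The interaction between FileOffset and Length can be sketched as a small user-mode model. This is only an illustration of the rules described above, not the Cache Manager's implementation: `struct purge_range` and `purge_range_for` are hypothetical names, the "through end of file" case is represented by a length of zero, and clamping the range to the file size is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open byte range [start, end) selected for purging. */
struct purge_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Model of the FileOffset/Length rules:
 *   file_offset == NULL              -> whole file, length ignored
 *   file_offset != NULL, length == 0 -> from *file_offset to end of file
 *   file_offset != NULL, length > 0  -> length bytes starting at *file_offset
 */
struct purge_range purge_range_for(const uint64_t *file_offset,
                                   uint32_t length,
                                   uint64_t file_size)
{
    struct purge_range r;

    if (file_offset == NULL) {
        r.start = 0;
        r.end = file_size;
        return r;
    }

    /* Clamp the start to the file size (an assumption of this sketch). */
    r.start = (*file_offset < file_size) ? *file_offset : file_size;

    if (length == 0) {
        r.end = file_size;
    } else {
        uint64_t end = r.start + length;
        r.end = (end < file_size) ? end : file_size;
    }

    assert(r.start <= r.end);
    return r;
}
```

A real FSD would still have to check the BOOLEAN result of the purge itself, since the call can fail when the file is mapped elsewhere.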

CcRepinBcb

This routine is used by an FSD to increment the reference count on a previously created BCB. The prototype for this call is:

NTKERNELAPI VOID CcRepinBcb (
    IN PVOID Bcb
    );

An FSD may find that it is necessary to use a previously created buffer control block. In such circumstances this routine is used to ensure that the Bcb remains valid for the duration of the operation.

The FSD is responsible for releasing that reference count using the CcUnpinRepinnedBcb call.

CcSetAdditionalCacheAttributes

This routine is used by an FSD to enable or disable read ahead and write behind for a given file. The prototype for this call is:

NTKERNELAPI VOID CcSetAdditionalCacheAttributes (
    IN PFILE_OBJECT FileObject,
    IN BOOLEAN DisableReadAhead,
    IN BOOLEAN DisableWriteBehind
    );

The FileObject is the file for which the additional attributes are to be established. The Cache Manager uses this information to determine the behavior of the cache when the file is being accessed. Thus, if DisableReadAhead is TRUE, the Cache Manager will not perform read ahead for I/O operations on this particular file. Similarly, if DisableWriteBehind is TRUE, the Cache Manager will not cache dirty data. Instead, writes will be done through the cache (so the data is available for subsequent reads), but the write does not complete until the data is on the disk.

This impacts the behavior of the Cache Manager calls CcCopyRead and CcCopyWrite.

CcSetBcbOwnerPointer

This routine is used to set the "owner" of the ERESOURCE embedded within a buffer control block, which the Cache Manager uses to track who holds that resource. While not generally useful, there are odd circumstances under which that ERESOURCE might be obtained on behalf of one thread and be released by a different thread. The prototype for this call is:

NTKERNELAPI VOID CcSetBcbOwnerPointer (
    IN PVOID Bcb,
    IN PVOID OwnerPointer
    );

The Bcb argument indicates the BCB containing the ERESOURCE in question. The OwnerPointer is a pointer to the ETHREAD structure of the thread that is the new owner.

CcSetDirtyPageThreshold

An FSD may limit the total amount of dirty data the Cache Manager will maintain for a given file. The prototype for this call is:

NTKERNELAPI VOID CcSetDirtyPageThreshold (
    IN PFILE_OBJECT FileObject,
    IN ULONG DirtyPageThreshold
    );

Once the number of dirty pages being cached for a particular file exceeds the DirtyPageThreshold, subsequent writes will block while data is flushed from the cache to disk. Once the number of dirty pages has dropped below the threshold, new writes are allowed to proceed.

There is no requirement that an FSD set this limit. The default is to allow the Cache Manager and Memory Manager to control the write-behind policy.
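The blocking behavior described above can be modeled in a few lines of user-mode C. This is a sketch only: `struct dirty_tracker` and `note_dirty_count` are hypothetical names, not Cache Manager structures or routines. The model follows the wording literally, so writes become blocked once the dirty count exceeds the threshold and are released once it drops below the threshold. What happens exactly at the threshold is not stated, so the model simply keeps the previous state there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-file bookkeeping for this sketch. */
struct dirty_tracker {
    uint32_t dirty_pages;  /* pages currently dirty for this file */
    uint32_t threshold;    /* limit, as set by the FSD */
    bool     blocked;      /* true while new writes would have to wait */
};

/* Record a new dirty-page count and update the blocked state. */
void note_dirty_count(struct dirty_tracker *t, uint32_t dirty_pages)
{
    assert(t != NULL);

    t->dirty_pages = dirty_pages;
    if (dirty_pages > t->threshold) {
        t->blocked = true;    /* threshold exceeded: writers wait */
    } else if (dirty_pages < t->threshold) {
        t->blocked = false;   /* dropped below: writers proceed */
    }
    /* Exactly at the threshold: keep the previous state (assumption). */
}
```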

CcSetDirtyPinnedData

This call is used to indicate that data in a cache memory region described by a previously pinned BCB should be marked dirty, whether or not any changes were made to that data. The prototype for this call is:

NTKERNELAPI VOID CcSetDirtyPinnedData (
    IN PVOID Bcb,
    IN PLARGE_INTEGER Lsn OPTIONAL
    );

The Lsn parameter should be passed as NULL for file systems not taking advantage of the Windows NT log mechanism.

This routine could be used by an FSD to force a data region to be written to disk even though it had not been modified.

CcSetFileSizes

The Cache Manager relies upon the FSD to advise it whenever the size of a file actually changes. This routine is used by an FSD to indicate to the Cache Manager that a file size is changing. The prototype for this call is:

NTKERNELAPI VOID CcSetFileSizes (
    IN PFILE_OBJECT FileObject,
    IN PCC_FILE_SIZES FileSizes
    );

The FileObject identifies the specific file that is changing size. The FileSizes indicate the new file sizes. The CC_FILE_SIZES data structure is related to the file size information contained within the common header structure used by all Windows NT file systems.

Of these sizes provided by an FSD, the two critical sizes are the AllocationSize and FileSize of the file. The AllocationSize is the maximum amount of data that may be stored in the allocated space and is used by the VM system to indicate the size of the section describing that file. The FileSize indicates the amount of data currently present within the file. This is used by the VM system to indicate the size of the mapped view describing that file.

The AllocationSize must be at least as large as the FileSize.
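As a rough illustration of how the two sizes relate, the following user-mode sketch derives an allocation size by rounding a file size up to a cluster boundary and then checks the ordering of the two values. The names `struct file_sizes_model`, `allocation_for`, and `file_sizes_are_consistent` are hypothetical, rounding to a cluster is an assumption about how an FSD might allocate space, and equality is treated as valid because a file that exactly fills its allocation is a normal case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-mode stand-in for the two sizes discussed above. */
struct file_sizes_model {
    int64_t allocation_size;  /* space reserved for the file */
    int64_t file_size;        /* bytes of data currently in the file */
};

/* Round file_size up to a whole number of clusters (assumed policy). */
int64_t allocation_for(int64_t file_size, int64_t cluster_size)
{
    assert(file_size >= 0 && cluster_size > 0);
    return ((file_size + cluster_size - 1) / cluster_size) * cluster_size;
}

/* The file must fit inside its allocation. */
bool file_sizes_are_consistent(const struct file_sizes_model *s)
{
    assert(s != NULL);
    return s->file_size >= 0 && s->file_size <= s->allocation_size;
}
```

A check of this kind is cheap to run in a debug build before the sizes are handed to the Cache Manager.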

CcSetLogHandleForFile

This routine is listed here for completeness. It is used within file systems that take advantage of the internal logging mechanism within Windows NT. It is not generally useful for file systems. The prototype for this function is:

NTKERNELAPI VOID CcSetLogHandleForFile (
    IN PFILE_OBJECT FileObject,
    IN PVOID LogHandle,
    IN PFLUSH_TO_LSN FlushToLsnRoutine
    );

CcSetReadAheadGranularity

This routine is used by an FSD to control the read-ahead policy of the Cache Manager. The prototype for this function is:

NTKERNELAPI VOID CcSetReadAheadGranularity (
    IN PFILE_OBJECT FileObject,
    IN ULONG Granularity
    );

The default read-ahead size used by the Windows NT Cache Manager is 4KB, although it appears that all the Windows NT file systems set their own default to be 64KB.

Granularity must be 2^N * PAGE_SIZE, for N ≥ 0. Otherwise, your results will be unpredictable.

Note that the Memory Manager has a hard-coded limitation of 64KB when reading from disk drives. Thus, even if your FSD establishes a read-ahead size larger than 64KB it will be satisfied via a series of 64KB read-ahead units.
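The granularity rule and the 64KB limit can be checked with a small user-mode sketch. Everything here is illustrative: `MODEL_PAGE_SIZE`, `MODEL_MAX_IO_UNIT`, `granularity_is_valid`, and `read_ahead_units` are hypothetical names, and a 4KB page is assumed. Because the page size is itself a power of two, a value of the form 2^N * PAGE_SIZE is simply a power of two that is at least one page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE   4096u          /* assumed page size */
#define MODEL_MAX_IO_UNIT (64u * 1024u)  /* the 64KB limit described above */

/* True if granularity is 2^N * PAGE_SIZE for some N >= 0. */
bool granularity_is_valid(uint32_t granularity)
{
    return granularity >= MODEL_PAGE_SIZE &&
           (granularity & (granularity - 1u)) == 0;
}

/* Number of 64KB reads needed to satisfy one read-ahead of this size. */
uint32_t read_ahead_units(uint32_t granularity)
{
    assert(granularity_is_valid(granularity));
    return (granularity + MODEL_MAX_IO_UNIT - 1u) / MODEL_MAX_IO_UNIT;
}
```

Under these assumptions a 256KB granularity is valid but would be serviced as four 64KB reads.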

CcUninitializeCacheMap

NTKERNELAPI BOOLEAN CcUninitializeCacheMap (
    IN PFILE_OBJECT FileObject,
    IN PLARGE_INTEGER TruncateSize OPTIONAL,
    IN PCACHE_UNINITIALIZE_EVENT UninitializeCompleteEvent OPTIONAL
    );

Uninitializing the cache map is normally done when the file object has been closed by the user application, as the result of an IRP_MJ_CLEANUP request arriving in the underlying file system. At that time there are certain operations which must be performed in order to make sure the cache is torn down properly.

For a normal file, the two optional parameters are omitted. The code for this would look like:

CcUninitializeCacheMap(FileObject, NULL, NULL);

Literally, this is a request to cease caching on behalf of this file.

If this function returns TRUE, this is the last open instance of this file and the Cache Manager has deleted the shared cache map (otherwise, the shared cache map is still in use by other open instances of the file.)

There are a few interesting side effects for this function:

If the file is being deleted, the TruncateSize parameter will point to a LARGE_INTEGER containing the truncated size of the file (typically zero). This will tell the Cache Manager that any dirty data associated with this file need not be written back to disk, although there is no guarantee, since the Memory Manager might decide to write it back independently of the Cache Manager.

If the file system wishes to block for the cache map to be torn down, it can optionally provide an event that can be used to wait for the final destruction of the shared cache map.

This routine may be safely called for all file objects, even for those file objects for which the file system did not call CcInitializeCacheMap.

CcUnpinData

This routine is used to release a previously pinned BCB. The prototype for this call is:

NTKERNELAPI VOID CcUnpinData (
    IN PVOID Bcb
    );

For each call made by an FSD to the routines CcPinRead, CcPreparePinWrite, and CcPinMappedData, this call releases the pinning done on the given Bcb. When the reference count on the Bcb drops to zero, it can be freed by the Cache Manager so that the range in the Cache Manager's address space can be reused.
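The pairing of pin and unpin calls amounts to reference counting, which the following user-mode sketch models. It does not reflect the real layout of a BCB, which is opaque to the FSD: `struct bcb_model`, `model_pin`, and `model_unpin` are hypothetical names used only to show why every pin needs exactly one matching unpin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer control block's pin count. */
struct bcb_model {
    unsigned pin_count;
};

/* Models a successful pin: one more reference on the block. */
void model_pin(struct bcb_model *b)
{
    assert(b != NULL);
    b->pin_count++;
}

/*
 * Models an unpin. Returns true when the last reference was released,
 * which is the point at which the cached range could be reused.
 */
bool model_unpin(struct bcb_model *b)
{
    assert(b != NULL && b->pin_count > 0);
    b->pin_count--;
    return b->pin_count == 0;
}
```

In the model, unpinning more often than pinning trips the assertion, and a missing unpin leaves the count above zero so the range is never reused.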

CcUnpinDataForThread

This routine is used to allow a thread, other than the one that initially acquired the BCB, to release the ERESOURCE within the BCB. The prototype for this call is:

NTKERNELAPI VOID CcUnpinDataForThread (
    IN PVOID Bcb,
    IN ERESOURCE_THREAD ResourceThreadId
    );

CcUnpinRepinnedBcb

This call is used by an FSD to release a BCB previously pinned by a call to CcRepinBcb. The prototype for this call is:

NTKERNELAPI VOID CcUnpinRepinnedBcb (
    IN PVOID Bcb,
    IN BOOLEAN WriteThrough,
    OUT PIO_STATUS_BLOCK IoStatus
    );

The WriteThrough option indicates if the FSD wishes to ensure that any dirty data in the region of the file described by the BCB be committed to disk prior to completion of this call. If WriteThrough is TRUE then upon completion of this routine the IoStatus will be set to indicate the results of any write operations. If WriteThrough is FALSE the dirty data will be written at a later time by the Cache Manager's Lazy Writer.

CcZeroData

This routine is used by an FSD to ensure that a range within memory is set to zero. The prototype for this call is:

NTKERNELAPI BOOLEAN CcZeroData (
    IN PFILE_OBJECT FileObject,
    IN PLARGE_INTEGER StartOffset,
    IN PLARGE_INTEGER EndOffset,
    IN BOOLEAN Wait
    );

Typically, an FSD uses this routine to zero new data areas within a file so that any detritus left from previous uses of that memory is obliterated and not available to application programs.

The FileObject indicates which file is to be zeroed, while the StartOffset and EndOffset indicate the range within the file that is to be zeroed. The Wait parameter indicates if the caller is willing to wait while any necessary synchronization objects are acquired.

In general, this call does not result in disk I/O. If the offsets specified are not on even page boundaries, a page fault will be triggered to fetch the page so that the data not being modified will be preserved properly. Additionally, if the FileObject indicates the file was opened with write-through semantics, then as the pages are zeroed they will be written back to disk.

CcZeroEndOfLastPage

This routine is used to zero the portion of the last page of the file between the last valid byte (the end of file) and the end of the physical page. The prototype for this call is:

NTKERNELAPI VOID CcZeroEndOfLastPage(
    IN PFILE_OBJECT FileObject
    );

This ensures that if the file is extended in size, no data from previous usage of the memory becomes accessible. This call is not normally used by an FSD, but is included here for completeness.

FsRtlMdlReadCompleteDev

This routine is useful for file system filter drivers because it does not exhibit the re-entrant behavior of CcMdlReadComplete that can cause data loss if the file system filter driver returns FALSE from its MdlReadComplete fast I/O entry point. The prototype for this call is:

NTKERNELAPI BOOLEAN FsRtlMdlReadCompleteDev (
    IN PFILE_OBJECT FileObject,
    IN PMDL MdlChain,
    IN PDEVICE_OBJECT DeviceObject
    );

This turns out to be a wrapper around the Cache Manager routine CcMdlReadComplete2, which has identical semantics to those of CcMdlReadComplete in NT 3.51.

Note that this call is not present in the NT 4.0 IFS Kit.

FsRtlMdlWriteCompleteDev

This routine is useful for file system filter drivers because it does not exhibit the reentrant behavior of CcMdlWriteComplete that can cause data loss if the file system filter driver returns FALSE from its MdlWriteComplete fast I/O entry point. The prototype for this call is:

NTKERNELAPI BOOLEAN FsRtlMdlWriteCompleteDev (
    IN PFILE_OBJECT FileObject,
    IN PLARGE_INTEGER FileOffset,
    IN PMDL MdlChain,
    IN PDEVICE_OBJECT DeviceObject
    );

This turns out to be a wrapper around the Cache Manager routine CcMdlWriteComplete2, which has identical semantics to those of CcMdlWriteComplete in NT 3.51.

[Note that this call is not present in the NT 4.0 IFS Kit.]

User Comments

"CC mananger"
nice and simple article to understand and implement. But there are lot of cc functions which are undocumented but they fails for some illegal operation and we do not any clue. for example one function "CcGetVacbLargeOffset". If more description is there, it would be really helpful for file system developers.

02-Jul-08, suresh vishnoi
