AVStream: using pre-allocated frame buffers in custom allocator

Bernard_Willaert-2 · May 16, 2014, 3:29pm

We want to connect the single output pin of our kernel-mode capture filter based on AVSHWS to the input pin of a user-mode filter based on “dump”.
Our capture filter does scatter/gather DMA of video frames into the buffers presented at the output pin. In the “dump” filter, the Receive method gets them as IMediaSample objects, through which we can again access the data in the buffer.
We now want to use a number of buffers that the dump filter has pre-allocated in system memory, instead of the default allocated frame buffers.
Do we need to implement a custom allocator at the input pin of the dump filter?
Like the one described here: http://msdn.microsoft.com/en-us/library/dd377477(v=vs.85).aspx
How can we be sure that the data transport between these two filters will use the custom allocator?
Are there any examples around that describe this custom frame buffer allocation?

Thank you in advance for any help.

Bernard Willaert
Software design engineer
Barco - Healthcare division
Belgium

Tim_Roberts · May 18, 2014, 3:50am

On May 16, 2014, at 12:28 PM, xxxxx@barco.com wrote:

> We want to connect the single output pin of our kernel-mode capture filter based on AVSHWS to the input pin of a user-mode filter based on “dump”.
> Our capture filter does scatter/gather DMA of video frames into the buffers presented at the output pin. In the “dump” filter, the Receive method gets them as IMediaSample objects, through which we can again access the data in the buffer.
> We now want to use a number of buffers that the dump filter has pre-allocated in system memory, instead of the default allocated frame buffers.

Why? The graph is perfectly capable of allocating system memory on its own.

> Do we need to implement a custom allocator at the input pin of the dump filter?

Yes.

> Like the one described here: http://msdn.microsoft.com/en-us/library/dd377477(v=vs.85).aspx
> How can we be sure that the data transport between these two filters will use the custom allocator?

You can't. The allocator process is mostly just providing suggestions and recommendations. The graph manager will usually use your allocator if you provide one, but they don't actually guarantee it.

> Are there any examples around that describe this custom frame buffer allocation?

The MSDN page you referred to above is the key document.

Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Bernard_Willaert-2 · May 18, 2014, 1:23pm

Thanks for your input, Tim

> Why? The graph is perfectly capable of allocating system memory on its own.
We have a proprietary library that makes a composition of several incoming streams.
This library/framework allocates its own frame buffers, into which the capture filter is supposed to DMA its frames. So, from a DirectShow point of view, it would be nice if the filter connected to our capture output pin (the compositor filter) provided the S/G lists for its frame buffers directly.
The alternative would be to send down an IOCTL to the capture filter describing the frame buffers to DMA into…
We just want a clean bridging solution between a HW capture filter that works fine with other filters in the DirectShow filter collection (like a renderer) and another library that allocates frame buffers to make a composition of several incoming streams. We want to expose these pre-allocated buffers to our capture filter to DMA into.
Thanks,

Bernard Willaert

Tim_Roberts · May 19, 2014, 2:56am

On May 18, 2014, at 10:23 AM, xxxxx@barco.com wrote:

> > Why? The graph is perfectly capable of allocating system memory on its own.
> We have a proprietary library that makes a composition of several incoming streams.
> This library/framework allocates its own frame buffers, into which the capture filter is supposed to DMA its frames. So, from a DirectShow point of view, it would be nice if the filter connected to our capture output pin (the compositor filter) provided the S/G lists for its frame buffers directly.

OK, wait a minute. You certainly cannot pass scatter/gather lists into the filter through any conceivable DirectShow interface. The allocator works with virtual addresses ONLY.

What you're talking about, I think, is a bit tricky. If you're doing horizontal concatenation, then that means you need your capture filter to be able to handle very large strides. That is, the hardware has to be able to skip over the parts of the scan lines that don't belong to this stream. If there are three streams across, then stream one gets pixels 0 to 639, then it has to skip over 1280 pixels before it can write the next scan line. It turns out there is no way in a BITMAPINFOHEADER or a VIDEOINFOHEADER or a memory allocator to describe that situation.
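To make the stride arithmetic above concrete, here is a minimal, portable sketch of the layout of one stream inside a side-by-side composite frame. `TileLayout` and `ComputeTileLayout` are hypothetical names for illustration, not DirectShow or AVStream APIs, and the 2 bytes per pixel used in the comments (for example a YUY2-style format) is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of where one stream lands in a composite frame
// made of equally wide tiles placed side by side.
struct TileLayout {
    std::size_t startOffsetBytes; // offset of the tile's first pixel in the composite buffer
    std::size_t strideBytes;      // distance between the starts of consecutive scan lines
    std::size_t skipBytes;        // bytes to skip after each scan line written by this stream
};

// Computes the layout for tile `tileIndex` (0-based) out of `tileCount` tiles,
// each `tileWidthPx` pixels wide, with `bytesPerPixel` bytes per pixel.
inline TileLayout ComputeTileLayout(std::size_t tileWidthPx,
                                    std::size_t tileCount,
                                    std::size_t tileIndex,
                                    std::size_t bytesPerPixel) {
    assert(tileCount > 0 && tileIndex < tileCount && bytesPerPixel > 0);
    const std::size_t lineBytes = tileWidthPx * bytesPerPixel; // one scan line of one tile
    TileLayout layout;
    layout.startOffsetBytes = tileIndex * lineBytes;
    layout.strideBytes = tileCount * lineBytes;                // full composite scan line
    layout.skipBytes = layout.strideBytes - lineBytes;         // the other tiles' pixels
    return layout;
}
```

With three 640-pixel streams at 2 bytes per pixel, the first stream starts at offset 0, uses a stride of 3840 bytes, and skips 2560 bytes (1280 pixels) after each line, which matches the numbers in the paragraph above.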

Having said that, there is a precedent for what you are describing. When the Video Mixing Renderers are using an overlay surface or a texture surface, those surfaces often have rather unusual alignment requirements (such as a multiple of 256 bytes). If the VMR is directly connected to a capture filter, what it will do is configure the stream initially for the fully packed format (i.e., a 640x480 DIB). Then, as soon as streaming starts, it will send a change-of-format notice to the capture pin, changing the format to, for example, 768x480. The capture filter knows that the VMR doesn't really want a new format; it's merely saying “OK, I'm switching to the overlay surface now and the stride is 768 pixels”. The capture filter is required to accept this; VMR doesn't even check whether it failed.
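A rough sketch of how a capture filter might turn such a format change into DMA geometry follows. The types here are simplified stand-ins, not the real BITMAPINFOHEADER or VIDEOINFOHEADER, and the field roles are an assumption: the width field is taken to carry the surface stride in pixels after the change, while a non-empty target rectangle carries the visible image size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Simplified stand-ins for the format structures (hypothetical, for illustration).
struct Rect { long left, top, right, bottom; };

struct VideoFormat {
    Rect target;           // visible image rectangle; empty means "whole frame"
    long widthPx;          // frame width, or surface stride in pixels after a format change
    long heightPx;         // frame height (may be negative for top-down images)
    unsigned bitsPerPixel; // assumed to be a multiple of 8
};

struct DmaGeometry {
    std::size_t visibleBytesPerLine; // bytes of real video written per scan line
    std::size_t strideBytes;         // distance between scan line starts in the buffer
    std::size_t lines;               // number of scan lines
};

// Derives the per-line write size and stride from a (possibly padded) format.
inline DmaGeometry GeometryFromFormat(const VideoFormat& f) {
    assert(f.widthPx > 0 && f.bitsPerPixel % 8 == 0);
    const std::size_t bytesPerPixel = f.bitsPerPixel / 8;
    const long targetWidth = f.target.right - f.target.left;
    const long visiblePx = (targetWidth > 0) ? targetWidth : f.widthPx;
    assert(visiblePx <= f.widthPx);
    DmaGeometry g;
    g.visibleBytesPerLine = static_cast<std::size_t>(visiblePx) * bytesPerPixel;
    g.strideBytes = static_cast<std::size_t>(f.widthPx) * bytesPerPixel;
    g.lines = static_cast<std::size_t>(std::labs(f.heightPx));
    return g;
}
```

For the 640x480 to 768x480 example at 16 bits per pixel, the visible line stays at 1280 bytes while the stride grows from 1280 to 1536 bytes.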

So, you could either try to emulate what VMR does, or you could implement your own private communication scheme.

Do you expect to have several capture filters feeding into your compositor?

Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Bernard_Willaert-2 · May 20, 2014, 9:23am

Thanks for the input, Tim!
I did a couple of experiments with the “dump” filter, by adding a custom allocator to its memoryInputPin.
When we connect the capture filter output to it, a single video stream is captured and transferred frame by frame to the dump filter.
The custom allocator is used, all of its callbacks are handled, and I also found that there is no S/G list, just a virtual address, as you mentioned.
This is of course what we would like to have: allocate the required memory once in the “dump” filter and propagate its S/G list to the capture filter, instead of just the media sample buffer pointer. Just like the standard allocator does…
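The "allocate once, hand out repeatedly" part can be sketched independently of DirectShow. The class below is a hypothetical, portable model of a fixed pool over buffers owned by another library; it is not an IMemAllocator implementation, it is not thread-safe, and it deals in virtual addresses only, so it says nothing about S/G lists.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed pool over frame buffers that were allocated elsewhere.
// The pool never allocates or frees memory; it only tracks which buffers are in flight.
class PreallocatedFramePool {
public:
    PreallocatedFramePool(const std::vector<std::uint8_t*>& buffers, std::size_t bufferSize)
        : buffers_(buffers), inUse_(buffers.size(), false), bufferSize_(bufferSize) {
        assert(bufferSize > 0);
    }

    // Returns the first free buffer, or nullptr when every buffer is in flight.
    std::uint8_t* Acquire() {
        for (std::size_t i = 0; i < buffers_.size(); ++i) {
            if (!inUse_[i]) {
                inUse_[i] = true;
                return buffers_[i];
            }
        }
        return nullptr;
    }

    // Marks a buffer as free again. Returns false if the pointer does not
    // belong to the pool or was not in flight.
    bool Release(std::uint8_t* buffer) {
        for (std::size_t i = 0; i < buffers_.size(); ++i) {
            if (buffers_[i] == buffer && inUse_[i]) {
                inUse_[i] = false;
                return true;
            }
        }
        return false;
    }

    std::size_t BufferSize() const { return bufferSize_; }

    std::size_t FreeCount() const {
        std::size_t n = 0;
        for (std::size_t i = 0; i < inUse_.size(); ++i) {
            if (!inUse_[i]) ++n;
        }
        return n;
    }

private:
    std::vector<std::uint8_t*> buffers_;
    std::vector<bool> inUse_;
    std::size_t bufferSize_;
};
```

A real custom allocator would wrap each of these buffers in a media sample and block or fail in its get-buffer path when `Acquire` returns nullptr; that wiring is left out here.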

We will also have a closer look at the VMR allocator.

Thanks again,

Bernard Willaert