NVIDIA Quadro Dual Copy Engines
The copy engine featured in Quadro solutions provides real GPU-asynchronous texture
downloads. Texture data can be downloaded or uploaded in parallel with 3D rendering.
... supported Quadro solutions¹ add a second DMA engine,
making it possible to overlap download, processing, and readback. To take
advantage of this, one thread (channel) is used for rendering, one for download,
and a third for upload (readback), with all transfers done via PBOs. When partitioned
this way, the render thread runs on the graphics engine and the transfer threads run
on the copy engines, in parallel and completely asynchronously. The transfer threads
own fully functional GL contexts, so non-DMA commands can also be issued in them, but
such commands will time-slice with the rendering thread. Copy engines can also handle
format conversions and swizzling for the same data types without CPU intervention,
whereas previous hardware required the input data formats to be GPU native.
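
The three-thread partition described above can be sketched on the CPU alone. The
following is a minimal illustration, not NVIDIA's implementation: it makes no GL
calls, and the names Channel, Frame, and run_pipeline are hypothetical. Each stage
only marks, in comments, where a real application would probably issue its PBO
transfers from that thread's own GL context.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking queue that hands frames from one thread to the next.
template <typename T>
class Channel {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }
    // Signals that no further items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Blocks until an item arrives; returns nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Hypothetical stand-in for a frame; a real one would carry texture/PBO handles.
struct Frame { int id; int value; };

std::vector<int> run_pipeline(int frame_count) {
    assert(frame_count >= 0);
    Channel<Frame> downloaded;  // download thread -> render thread
    Channel<Frame> rendered;    // render thread -> readback thread
    std::vector<int> results;   // written only by the readback thread

    // Download thread: a GL version would typically fill an unpack PBO and
    // call glTexSubImage2D here (assumption; no GL call is made).
    std::thread download([&] {
        for (int i = 0; i < frame_count; ++i) downloaded.push(Frame{i, i});
        downloaded.close();
    });

    // Render thread: the doubling is a placeholder for 3D rendering work.
    std::thread render([&] {
        while (auto frame = downloaded.pop()) {
            frame->value *= 2;
            rendered.push(*frame);
        }
        rendered.close();
    });

    // Readback thread: a GL version would typically call glReadPixels into a
    // pack PBO here (assumption; no GL call is made).
    std::thread readback([&] {
        while (auto frame = rendered.pop()) results.push_back(frame->value);
    });

    download.join();
    render.join();
    readback.join();
    return results;
}
```

Calling run_pipeline(4) pushes four frames through the three stages in order. In an
actual GL application the queues would presumably be paired with sync objects (for
example glFenceSync) so that a thread does not consume a texture before its transfer
has completed; that synchronization is omitted from this sketch.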

