PCIeSpeedTest provides a synthetic benchmark to measure CPU->GPU and
GPU->CPU transfer speed using ATI CAL. This is done by allocating a
local resource on the GPU using calResAllocLocal2D() and allocating a
remote resource on the CPU using calResAllocRemote2D(). The remote
resource allocation is done in uncached memory space to optimize
transfer performance. calMemCopy() is used to issue the data transfer.

Allocation sizes start at 16 bytes and are incremented by 2 for every
data point. Allocation continues until the physical memory size is
reached on the device or the allocation routine returns unsuccessfully.
It is expected that you will not be able to allocate the entire memory
space as various system allocations exist on the device that are not
visible to the CAL user.

To amortize away moment-to-moment system variations, PCIeSpeedTest
queues 100 calMemCopy() calls back to back before checking on completion.
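
The batching described above can be sketched in plain C. This is only an
illustration, not the PCIeSpeedTest source: memcpy() stands in for
calMemCopy(), clock() stands in for whatever timer the sample uses, and
the names batch_bandwidth() and time_copy_batch() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define COPIES_PER_SAMPLE 100  /* batch size quoted in the text above */

/* Bandwidth in bytes per second for `copies` transfers of `bytes` each
   that together took `seconds` of wall time. */
double batch_bandwidth(size_t bytes, unsigned copies, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes * (double)copies / seconds;
}

/* Hypothetical stand-in for the timed region: issue `copies` copies back
   to back and read the timer only once, after the whole batch.
   ASSUMPTION: memcpy() replaces calMemCopy(); no GPU is involved here. */
double time_copy_batch(void *dst, const void *src, size_t bytes,
                       unsigned copies)
{
    clock_t start = clock();
    for (unsigned i = 0; i < copies; ++i)
        memcpy(dst, src, bytes);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A caller would pass the result of time_copy_batch(dst, src, bytes,
COPIES_PER_SAMPLE) to batch_bandwidth(). For very small sizes the elapsed
time may fall below the timer resolution, which is one reason to time a
whole batch rather than a single copy.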

PCIeSpeedTest_random is also provided to help generate a complete
performance view by generating random data size points. It cycles
round-robin through all compatible GPUs in the system and uses
variable binning to spread the data points across the
entire data size range. By default, PCIeSpeedTest_random will generate
100 data points for each GPU.
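
One way to picture the round-robin and binning behaviour is the sketch
below. The text does not specify the binning scheme, so the bins that
double in width here are an assumption, and device_for_point() and
random_size_in_bin() are hypothetical names, not functions from
PCIeSpeedTest_random.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Round-robin: data point i is measured on device i modulo the number
   of compatible devices. */
unsigned device_for_point(unsigned point, unsigned device_count)
{
    assert(device_count > 0);
    return point % device_count;
}

/* ASSUMPTION: bin k covers [min_bytes << k, min_bytes << (k + 1)), so
   each bin is twice as wide as the previous one. Returns a pseudo-random
   size inside the requested bin. */
size_t random_size_in_bin(size_t min_bytes, unsigned bin)
{
    assert(min_bytes > 0 && bin < 20);
    size_t lo = min_bytes << bin;   /* start of the bin */
    size_t width = lo;              /* the bin ends just below 2 * lo */
    /* Combine two rand() calls because RAND_MAX may be as small as 32767. */
    size_t r = ((size_t)rand() << 15) ^ (size_t)rand();
    return lo + r % width;
}
```

With 16 bytes as the smallest size, bin 3 yields sizes from 128 up to
255 bytes. Stepping the bin index along with the data point index would
spread the samples across the whole range instead of clustering them at
the large end.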

