So, this new benchmark I'm making. Let's call it gpuCompute.
The idea is to have a benchmark that exploits the capabilities of GPGPU and has nothing to do with graphics. So: mathematics.
I did some tests designing a program that goes through every number i from 1 to N and determines whether i is prime. In general it looks like this:
for i = 1 to N {
    if isPrime(i) {
        //prime++;
    }
}

isPrime(k) {
    for i = 2 to k-1 {
        if k%i == 0 { return false; }
    }
    return true;
}
However, the inner loop can be shortened to run from 2 to sqrt(k), which makes it a lot faster.
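To give an idea of how this maps onto CUDA, here's a minimal sketch (not the actual benchmark code - the kernel name, block size and the small N are placeholders I've picked): one thread per candidate number, each doing trial division up to sqrt(k), with an atomicAdd standing in for the prime++ output step.

#include <cstdio>
#include <cuda_runtime.h>

__device__ bool isPrime(unsigned long long k)
{
    if (k < 2) return false;
    for (unsigned long long i = 2; i * i <= k; ++i)
        if (k % i == 0) return false;
    return true;
}

// One thread per candidate number; the atomicAdd is the "prime++" output step.
__global__ void countPrimes(unsigned long long start, unsigned long long n,
                            unsigned int *primeCount)
{
    unsigned long long idx = blockIdx.x * (unsigned long long)blockDim.x + threadIdx.x;
    if (idx < n && isPrime(start + idx))
        atomicAdd(primeCount, 1u);
}

int main()
{
    const unsigned long long n = 1 << 20;   // small demo range: 1..2^20
    unsigned int h_count = 0, *d_count;
    cudaMalloc(&d_count, sizeof(unsigned int));
    cudaMemcpy(d_count, &h_count, sizeof(unsigned int), cudaMemcpyHostToDevice);

    const int blockSize = 256;
    const int gridSize  = (int)((n + blockSize - 1) / blockSize);
    countPrimes<<<gridSize, blockSize>>>(1, n, d_count);

    cudaMemcpy(&h_count, d_count, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    printf("Primes found: %u\n", h_count);
    cudaFree(d_count);
    return 0;
}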
Well, this benchmark checked 160 billion numbers in 3.3 seconds on my GTX 280. If I stuck in an output (i.e. uncommented the prime++;), it only managed 1 million in 0.2 seconds due to bad memory coalescing. And when overclocking the graphics card, it only responded to an OC on the core frequency. Hmm, not good.
After talking with the guys at BenchTec, it turned out they wanted a 2-3 minute benchmark. So, utilising some functions I've written at work for 2D finite element simulation, I programmed some of that in.
In a nutshell, a 2D finite element simulation takes a grid of R*Z nodes, then computes each node in the new timestep as the average of the nodes around it from the previous timestep. So a grid of 1000x1000 has one million nodes - or one million threads for a CUDA device. To make the simulation run longer, add more timesteps or increase the grid size.
This kernel function makes liberal use of texture memory and needs 2*4*R*Z bytes of graphics memory (two float grids of R*Z nodes). Since the smaller CUDA cards need to be able to run it too, I've limited the grid to take up at most 64MB of graphics memory and just increased the number of timesteps instead. Thanks to the heavy memory use and a reasonable amount of calculation per thread, this one responds very nicely to both core and memory overclocks. Niice :)
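For a rough idea of what one timestep looks like on the GPU, here's a simplified sketch. It's not the real kernel: it reads the previous grid from plain global memory rather than texture memory, and the names (stepGrid, maxSquareGridSide) plus the 64MB sizing helper are mine.

#include <cstddef>

// One thread per node: update it as the average of its four neighbours from
// the previous timestep. The real benchmark reads 'prev' via texture memory;
// this sketch uses plain global memory to keep things short.
__global__ void stepGrid(const float *prev, float *next, int R, int Z)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    int z = blockIdx.y * blockDim.y + threadIdx.y;
    if (r <= 0 || z <= 0 || r >= R - 1 || z >= Z - 1) return;   // leave boundary nodes alone

    int i = z * R + r;
    next[i] = 0.25f * (prev[i - 1] + prev[i + 1] + prev[i - R] + prev[i + R]);
}

// Host side: the two float grids cost 2*4*R*Z bytes, so pick the biggest square
// grid that fits under the memory cap and spend the rest of the run on extra timesteps.
int maxSquareGridSide(size_t memLimitBytes)   // e.g. 64 << 20 for the 64MB cap
{
    size_t side = 1;
    while (2 * sizeof(float) * (side + 1) * (side + 1) <= memLimitBytes)
        ++side;
    return (int)side;   // roughly 2896 for 64MB
}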
So all that's left to do is:
a) Change the program so multiple GPUs can be used.
b) Implement some sort of checksum to stop cheating (rough idea sketched below the list).
c) Move it to OpenCL so ATi cards can use it too.
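For (b), one possible approach (just my own rough idea, not something that's implemented yet) is to fold the final grid into a single number that gets reported alongside the time, so a submitted score can be checked against a known-good checksum for that grid size and timestep count:

#include <cstdint>
#include <cmath>

// Fold the final grid into one number. Quantising first means tiny float
// rounding differences between cards don't break the comparison.
uint32_t gridChecksum(const float *grid, int R, int Z)
{
    uint32_t sum = 0;
    for (int i = 0; i < R * Z; ++i) {
        int32_t q = (int32_t)std::lrintf(grid[i] * 1000.0f);
        sum = sum * 31u + (uint32_t)q;   // simple polynomial rolling hash
    }
    return sum;
}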