The documentation is a bit sparse but readable. You simply define the function you want to execute and the dimensions of the problem. You can specify one, two, or three dimensions, as suits your problem space. When you execute the associated function it will try to run the kernels on your GPU in parallel. If it can’t, it will still get the right answer, just slowly.
... read the whole story at hackaday.com.