*It is time to start taking advantage of the computing power of GPUs...*

A while ago I wrote an article about how to use the Microsoft Parallel Patterns Library (PPL) to render the Mandelbrot fractal using multiple CPU cores. That article can be found here.

This new article will make the Mandelbrot renderer multiple times faster by using a new Microsoft technology called **C++ AMP: Accelerated Massive Parallelism**, introduced in the Visual Preview.

The code in the previous article showed each line of the fractal immediately after it was calculated. For this article, this is changed. The Mandelbrot image will be rendered completely off-screen, and only shown when the entire image has been calculated. This is to reduce the overhead of displaying the fractal line-by-line, especially with the C++ AMP version, which will be so fast that this overhead could become pretty substantial.

I will also switch to single precision floating point numbers, because the GPU and GPU driver combination on my Windows 7 machine does not support double precision floating point numbers. C++ AMP itself supports both single- and double precision floating point arithmetic. However, whether or not double precision arithmetic works depends on your specific GPU hardware and your specific drivers from your GPU vendor. A side effect of using single precision arithmetic in the Mandelbrot renderer is that the image will get blocky at big zoom levels. At the end of this article, a piece of code is shows how to check if an accelerator in your system supports double precision arithmetic or not. You could use that to decide at runtime whether to use a single precision or double precision implementation. This is left as an exercise for the reader.

### Single-Threaded Implementation

All implementations, the single-threaded, the PPL, and the C++ AMP Mandelbrot version, use the following setup:

const int halfHeight = int(floor(m_nBuffHeight/2.0)); const int halfWidth = int(floor(m_nBuffWidth/2.0)); const int maxiter = 1024; const float escapeValue = 4.0f; float zoomLevel = float(m_zoomLevel); float view_i = float(m_view_i); float view_r = float(m_view_r); float invLogOf2 = 1 / log(2.0f); if (m_buffers[m_nRenderToBufferIndex].empty()) return; unsigned __int32* pBuffer = &(m_buffers[m_nRenderToBufferIndex][0]);

Here is the single-threaded implementation from my previous article, but updated to use single precision floating point arithmetic, and to render one Mandelbrot image to the buffer before displaying it:

for (int y = -halfHeight; y < halfHeight; ++y) { // Formula: zi = z^2 + z0 float Z0_i = view_i + y * zoomLevel; for (int x = -halfWidth; x < halfWidth; ++x) { float Z0_r = view_r + x * zoomLevel; float Z_r = Z0_r; float Z_i = Z0_i; float res = 0.0f; for (int iter = 0; iter < maxiter; ++iter) { float Z_rSquared = Z_r * Z_r; float Z_iSquared = Z_i * Z_i; if (Z_rSquared + Z_iSquared > escapeValue) { // We escaped res = iter + 1 - log(log(sqrt(Z_rSquared + Z_iSquared))) * invLogOf2; break; } Z_i = 2 * Z_r * Z_i + Z0_i; Z_r = Z_rSquared - Z_iSquared + Z0_r; } unsigned __int32 result = RGB(res * 50, res * 50, res * 50); pBuffer[(y + halfHeight) * m_nBuffWidth + (x + halfWidth)] = result; } }

### Multi-Threaded Implementation (PPL)

Parallelizing this implementation using the Microsoft Parallel Patterns Library (PPL) is shown in my previous article. Again, this implementation has been updated to use single precision arithmetic and to render one whole frame to the buffer before displaying it. The updated code is as follows:

parallel_for(-halfHeight, halfHeight, 1, [&](int y) { // Formula: zi = z^2 + z0 float Z0_i = view_i + y * zoomLevel; for (int x = -halfWidth; x < halfWidth; ++x) { float Z0_r = view_r + x * zoomLevel; float Z_r = Z0_r; float Z_i = Z0_i; float res = 0.0f; for (int iter = 0; iter < maxiter; ++iter) { float Z_rSquared = Z_r * Z_r; float Z_iSquared = Z_i * Z_i; if (Z_rSquared + Z_iSquared > escapeValue) { // We escaped res = iter + 1 - log(log(sqrt(Z_rSquared + Z_iSquared))) * invLogOf2; break; } Z_i = 2 * Z_r * Z_i + Z0_i; Z_r = Z_rSquared - Z_iSquared + Z0_r; } unsigned __int32 result = RGB(res * 50, res * 50, res * 50); pBuffer[(y + halfHeight) * m_nBuffWidth + (x + halfWidth)] = result; } });