When you have a formula, check whether some of its operations can be optimized and whether its integer-to-float conversions can be eliminated by rearranging it. I will demonstrate this using the alpha blend formula from MSDN. This may be useful if you are writing your own custom-optimized alpha blend function.

Note that in the formula, S is one of the source primary colors (R, G or B), D is the corresponding destination primary color, and A is the alpha channel. S, D and A are byte integers (that is, their values range from 0 to 255).

First, try to remove the expensive integer-to-float conversions through rearrangement. Below is the original BLENDFUNCTION formula from MSDN.
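For reference, the formula can be written as a small C++ function. This is a sketch based on my reading of the constant-alpha case in the BLENDFUNCTION documentation, using the S, D and A symbols from this article. The function names and the truncating cast back to a byte are my own assumptions, not part of any Windows API.

```cpp
#include <cassert>
#include <cstdint>

// Original floating-point form: alpha is scaled to 0.0..1.0 before use.
// blend_original is an illustrative name (assumption), not an MSDN function.
inline std::uint8_t blend_original(std::uint8_t S, std::uint8_t D, std::uint8_t A)
{
    // Two integer-to-float conversions of A and two floating-point divisions.
    double result = S * (A / 255.0) + D * (1.0 - (A / 255.0));
    return static_cast<std::uint8_t>(result);  // assumption: truncate toward zero
}

// A few spot checks at the extremes of the alpha range.
inline void check_blend_original()
{
    assert(blend_original(200, 100, 255) == 200);  // fully opaque: source wins
    assert(blend_original(200, 100, 0) == 100);    // fully transparent: destination wins
}
```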

As shown below, the subtraction that inverts the alpha is performed first, before the division by 255.0.
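In code, this step might look like the sketch below: 1.0 - (A / 255.0) becomes (255 - A) / 255.0, so the subtraction is now an integer operation. The function names are hypothetical and the truncating cast is my assumption.

```cpp
#include <cassert>
#include <cstdint>

// Alpha is inverted with an integer subtraction first, then divided by 255.0.
// blend_inverted_alpha is an illustrative name (assumption).
inline std::uint8_t blend_inverted_alpha(std::uint8_t S, std::uint8_t D, std::uint8_t A)
{
    double result = S * (A / 255.0) + D * ((255 - A) / 255.0);
    return static_cast<std::uint8_t>(result);  // assumption: truncate toward zero
}

// Spot checks: the extremes, plus one mid-range alpha.
inline void check_blend_inverted_alpha()
{
    assert(blend_inverted_alpha(200, 100, 255) == 200);
    assert(blend_inverted_alpha(200, 100, 0) == 100);
    assert(blend_inverted_alpha(200, 100, 128) == 150);  // 38300 / 255 = 150.19...
}
```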

Next, the multiplications are done before the divisions. If the multiplications are done first, the results of the divisions are no longer in the range 0.0 to 1.0 but 0 to 255, so you can eliminate the float values and the integer-to-float conversions. You may ask: since S, D and A are byte integers, will the multiplications overflow a byte if they execute first? The answer is no, because in C/C++, byte and short integers are promoted to int before any arithmetic is performed.
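A minimal sketch of the all-integer version follows. The function name is hypothetical, and the code assumes that int is at least 32 bits wide, as it is on mainstream desktop compilers. The static_assert shows the promotion at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Multiplying two bytes yields an int, not a byte, because of integer promotion.
static_assert(std::is_same<decltype(std::uint8_t{} * std::uint8_t{}), int>::value,
              "byte operands are promoted to int");

// Multiplications first, then integer divisions. No floating point is involved.
// blend_multiply_first is an illustrative name (assumption).
inline std::uint8_t blend_multiply_first(std::uint8_t S, std::uint8_t D, std::uint8_t A)
{
    // Each product is at most 255 * 255 = 65025, which fits in a 32-bit int.
    int result = (S * A) / 255 + (D * (255 - A)) / 255;
    assert(result >= 0 && result <= 255);  // each term truncates, so the sum stays in byte range
    return static_cast<std::uint8_t>(result);
}
```

Note that each of the two divisions truncates separately, so a mid-range alpha can come out one lower than the floating-point form: for S = 200, D = 100, A = 128 this version gives 100 + 49 = 149.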

Since both divisions have the same divisor (255), the two terms can be grouped under a single division, as shown below. This eliminates one division.
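The grouped form can be sketched as follows. The function name is hypothetical, and int is again assumed to be at least 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Both products are summed first and divided by 255 once.
// blend_grouped is an illustrative name (assumption).
inline std::uint8_t blend_grouped(std::uint8_t S, std::uint8_t D, std::uint8_t A)
{
    // The sum is at most 255 * A + 255 * (255 - A) = 65025, so the quotient is at most 255.
    int result = (S * A + D * (255 - A)) / 255;
    assert(result >= 0 && result <= 255);
    return static_cast<std::uint8_t>(result);
}
```

Because there is now only one truncation, S = 200, D = 100, A = 128 gives 38300 / 255 = 150.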

As a further optimization, since 255 is close to 256, the division by 255 could be replaced with a right shift by 8, as shown below. This optimization sacrifices some accuracy, but for alpha blend operations that is acceptable.
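A sketch of the shift version, with a hypothetical function name and the same assumption about the width of int:

```cpp
#include <cassert>
#include <cstdint>

// The division by 255 is approximated by a division by 256, done as a right shift by 8.
// blend_shift is an illustrative name (assumption).
inline std::uint8_t blend_shift(std::uint8_t S, std::uint8_t D, std::uint8_t A)
{
    // The sum is never negative, so the right shift is well defined.
    int result = (S * A + D * (255 - A)) >> 8;
    assert(result >= 0 && result <= 254);  // 65025 >> 8 = 254, so 255 is never reached
    return static_cast<std::uint8_t>(result);
}
```

The loss of accuracy is easy to see at the extremes: a fully opaque source value of 200 comes out as 199, and blending 255 with 255 gives 254 rather than 255. Results are slightly darker than with the exact division.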

I have written a benchmark application that runs the alpha blending with each formula 1000 times. As shown in the screenshot below, the "Unoptimized" option is the original MSDN formula, the "Optimized" option is the new improved formula, and the "Very Optimized" option is the new improved formula with a right shift in place of the division.
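The benchmark application itself is not reproduced here. As a rough idea of how such a measurement could be made, the sketch below times repeated blending passes over a buffer of byte values using std::chrono. The names time_blend_ms and blend_grouped_sample, the buffer layout and the parameters are all my assumptions and do not describe the actual benchmark application.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Grouped integer formula, repeated here so the sketch is self-contained.
inline std::uint8_t blend_grouped_sample(std::uint8_t S, std::uint8_t D, std::uint8_t A)
{
    return static_cast<std::uint8_t>((S * A + D * (255 - A)) / 255);
}

// Blends src into dst 'passes' times with the given function and
// returns the elapsed wall-clock time in milliseconds.
template <typename BlendFn>
long long time_blend_ms(BlendFn blend,
                        const std::vector<std::uint8_t>& src,
                        std::vector<std::uint8_t>& dst,
                        std::uint8_t alpha,
                        int passes)
{
    assert(passes > 0 && src.size() == dst.size());
    const auto start = std::chrono::steady_clock::now();
    for (int p = 0; p < passes; ++p)
        for (std::size_t i = 0; i < dst.size(); ++i)
            dst[i] = blend(src[i], dst[i], alpha);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

A call such as time_blend_ms(blend_grouped_sample, src, dst, 128, 1000) would time 1000 passes. Each formula should be timed on the same buffers in a release build, since an optimizing compiler can change the relative cost of the three variants considerably.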

The results for a release build on my machine are as follows:

Unoptimized: 10030 milliseconds

Optimized: 3790 milliseconds

Very Optimized: 2821 milliseconds