When making the heart pixel badge, I made a simple expanding circle mode.

Polar ripple
Polar ripple

The idea is that when you tap on the badge, it triggers an animation where a circle of LEDs expands from the center.

The way I calculate this is I precompute the polar coordinates of every LED in a unit circle where 1.0 = 255. To create a circle expanding from the center I simply increment a variable from zero to 255 and for each LED position I use simple arithmatic to tell whether its precomputed radius is close enough to the sweep variable

I began experimenting with rudimentary particle effects, like this raindrop/matrix mode:

Particle raindrops
Particle raindrops

I wanted to make a mode where circles could start and expand from any point, like raindrops falling on water. But in order to do that I would need to calculate the Euclidean distance of each point from the centroid of each circle — sqrt(x1 * x2 + y1 * y2).

This is equivalent to the L2 norm or the magnitude of the 2D vector formed by the difference between the point coordinates. The L2 norm represents the shortest possible distance between two points in a Euclidean space, which is what gives circles their perfect round shape. Unlike the L1 norm (Manhattan distance) or L∞ norm (Chebyshev distance), the L2 norm requires a square root operation.

I would need to repeat this for every LED for every particle. But I’m using an embedded system with an extremely primitive ALU. I don’t have floating point or a division operator. That sqrt() operation is expensive on my system, which barely even has hardware multiplication (so Newton’s method cannot be used).

I didn’t think it would be fast enough. I already had to seriously underclock the CPU on the chip in order to run at the voltage of a small watch battery, and it was already occupied refreshing the LEDs using charlieplexing, which requires an extremely fast refresh rate to work at all.

Then I discovered the very literally named alpha max plus beta min algorithm.

Alpha max plus beta min algorithm

The alpha max plus beta min algorithm provides a computationally efficient approximation of the Euclidean distance (L2 norm) of a vector. Instead of calculating the exact value using the costly square root operation, it uses a weighted sum of the maximum and minimum components of the vector.

For a 2D vector (x, y), the formula is:

magnitude ≈ α * max(|x|, |y|) + β * min(|x|, |y|)

Where α and β are constants chosen to minimize the approximation error across the entire input range.

The beauty of this algorithm lies in its simplicity and minimal computational requirements. It completely eliminates the need for squaring, addition, and square root operations normally required for Euclidean distance calculation. Instead, it uses only comparison, multiplication (which can be further optimized to bit shifts for certain values), and addition.

The algorithm provides a remarkable balance between computational efficiency and accuracy. Depending on the chosen values for α and β, the maximum error typically ranges from 4% to less than 1%.

Uh, ELI5?

Imagine you’re trying to measure the diagonal distance between two points on a piece of graph paper.

Normally, you’d use the Pythagorean theorem, but suppose we want something faster and don’t need enough accuracy to land on the moon.

This algorithm gives you a shortcut:

  1. Look at how far apart the points are horizontally (x) and vertically (y)
  2. Figure out which distance is bigger (the “max”) and which is smaller (the “min”)
  3. Take most of the bigger distance (like 90% of it) and add a bit of the smaller distance (like 40% of it)

For example, if you’re 3 steps right and 4 steps up apart:

  • The bigger distance is 4
  • The smaller distance is 3
  • We take most of 4 (maybe 0.96 × 4 = 3.84)
  • We take some of 3 (maybe 0.4 × 3 = 1.2)
  • Add them: 3.84 + 1.2 = 5.04

The exact distance would be 5. So our quick method is off by just a tiny bit.

Example

uint8_t distance(uint8_t dx, uint8_t dy) {
    if (dx > dy) {
        dy >>= 1;
    } else {
        dx >> 1;
    }

    return dx + dy;
}

I felt like I had discovered some magic. I tested the function, which to my disbelief worked with high accuracy. With 49 leds and 3 particles I had eliminated 147 sqrt() operations per frame update.

This let me accomplish the particle effect I was looking for:

Particle ripples using approximate magnitude calculation
Particle ripples using approximate magnitude calculation

Simple Approximation: |z| = α Max + β Min

α β Code Equivalent Alternative (Shifts Only)
1/1 1/2 max_val + (min_val >> 1) max_val + (min_val >> 1)
1/1 1/4 max_val + (min_val >> 2) max_val + (min_val >> 2)
1/1 3/8 max_val + ((min_val * 3) >> 3) max_val + (min_val >> 2) + (min_val >> 3)
7/8 7/16 ((max_val * 7) >> 3) + ((min_val * 7) >> 4) max_val - (max_val >> 3) + (min_val >> 1) - (min_val >> 4)
15/16 15/32 ((max_val * 15) >> 4) + ((min_val * 15) >> 5) max_val - (max_val >> 4) + (min_val >> 1) - (min_val >> 5)

Notice that all of the values of alpha and beta a fractions of powers of two — so they can be performed with simple bit shifts. I’ve included versions of the formula accomplished without multiplication, for systems where that might be more efficient. This will also prevent some cases of intermediate overflows (you should test with your compiler and input ranges to be sure.)

Return max

The first improvement we can make is, for any of the values of alpha and beta, the magnitude cannot be smaller than the max value. The hypotenuse of a triangle must be longer than either of the right sides.

So if the value of alpha is less than 1, we can return max(max_val, alpha * max_val + beta * min_val) to improve our accuracy.

α β Code Equivalent Alternative (Shifts Only)
7/8 7/16 max(max_val, ((max_val * 7) >> 3) + ((min_val * 7) >> 4)) max(max_val, max_val - (max_val >> 3) + (min_val >> 1) - (min_val >> 4))
15/16 15/32 max(max_val, ((max_val * 15) >> 4) + ((min_val * 15) >> 5)) max(max_val, max_val - (max_val >> 4) + (min_val >> 1) - (min_val >> 5))

Retune

We can retune the values of alpha and beta given the above. Let’s introduce a new term α₀ set to a value of 1.

α₀ α₁ β Code Equivalent Alternative (Shifts Only)
1 7/8 17/32 max(max_val, ((7 * max_val) >> 3) + ((17 * min_val) >> )) max(max_val, (max_val - (max_val >> 3)) + (min_val >> 1) + (min_val >> 5))
1 29/32 61/128 max(max_val, ((29 * max_val) >> 5) + ((61 * min_val) >> 7)) max(max_val, (max_val - (max_val >> 5) - (max_val >> 4)) + (min_val >> 1) + (min_val >> 3) + (min_val >> 5) + (min_val >> 6))

Four parameter version: |z| = max(|z₀|, |z₁|)

Using the max value as above is effectively splitting the line in two, with the max value as our best guess in part of the range. We can improve accuracy further by coming up with a better initial guess for this part of the range by adding yet another parameter.

α₀ β₀ α₁ β₁ Code Implementation Alternative (Shifts Only)
1 1/8 7/8 33/64 max(max_val + (min_val >> 3), ((max_val * 7) >> 3) + ((min_val * 33) >> 6)) max(max_val + (min_val >> 3), max_val - (max_val >> 3) + (min_val >> 1) + (min_val >> 6))
1 5/32 27/32 71/128 max(max_val + ((min_val * 5) >> 5), ((max_val * 27) >> 5) + ((min_val * 71) >> 7)) max(max_val + (min_val >> 5) + (min_val >> 7), max_val - (max_val >> 5) - (max_val >> 6) + (min_val >> 1) + (min_val >> 3) - (min_val >> 7))
127/128 3/16 27/32 71/128 max(((max_val * 127) >> 7) + ((min_val * 3) >> 4), ((max_val * 27) >> 5) + ((min_val * 71) >> 7)) max(max_val - (max_val >> 7) + (min_val >> 2) - (min_val >> 4), max_val - (max_val >> 5) - (max_val >> 6) + (min_val >> 1) + (min_val >> 3) - (min_val >> 7))

Floating point

If you have the luxury of floating point, there are floating point parameters that are more accurate than any of the integer parameters above.

α₀ β₀ α₁ β₁ Code
0 0 0.960433870103 0.397824734759 max_val * 0.960433870103 + min_val * 0.397824734759
1 0 0.898204193266868 0.485968200201465 max(max_val, max_val * 0.898204193266868 + min_val * 0.485968200201465)

3D version

It’s lesser known that this algorithm can be extended to 3D extension of this technique.

For a 3D vector (x, y, z), we sort the absolute values of the components in descending order: a ≥ b ≥ c.

Then, we can apply a formula of the form:

magnitude ≈ α * a + β * b + γ * c

Here are some values I’ve found for the parameters of the 3D version.

α β γ Implementation
1 1/2 1/4 a + (b >> 1) + (c >> 2)
15/16 6/16 5/16 (15 * a + 6 * b + 5 * c) >> 4

I wasn’t able to find a lot of references for the 3D versions.

The improvement of returning a if it’s larger than the rest of the fomula also applies here.

Interactive demo

I’ve made an interactive demo sampling random points in a limited range to give some idea of the accuracy, based on the formulas defined above.

Conclusion

This algorithm reminds of the Fast Inverse Square Root algorithm, and it’s actually solving a similar problem, though the computers of the 1990’s the authors of Quake were designing for were vastly more powerful than the microcontrollers I use today, with the luxury of native floating point.

I’m always on the lookout for ways I can do linear algebra and trigonometry efficiently on 8 bit systems. I’m really glad I found this algorithm and it let me make some cool animations.