A hand holds a glowing skull PCB.

I decided to refactor the Skully badge to use Bit Angle Modulation.

Background

The intent of the Skully badge is to achieve:

  • A realistic-looking fire effect that behaves like real flames
  • Motion control – tilt to change the direction of the flames
  • Interactive – tap to create sparks and intensify the fire
  • Battery efficient – run for hours on a small battery

Technical Challenges

Fire Animation Algorithm

After lots of experimentation, I implemented a cellular automaton-inspired approach based on the Fire2012 algorithm:

// Step 1. Cool down every cell a little
for (uint8_t i = 0; i < NUM_LEDS; i++) {
  s_heat[i] = qsub8(s_heat[i], random8(0, ((COOLING * 10) / NUM_LEDS) + 2));
}
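The cooling step above is only the first stage of Fire2012. As a rough host-side model (not the badge firmware), the first three stages of the published Fire2012 example can be sketched in Python. The `COOLING` and `SPARKING` values are Fire2012's suggested defaults and are assumptions here, as is the straight strip layout; the badge's own values and its tilt-dependent direction handling are not shown.

```python
import random

NUM_LEDS = 18    # matches the badge's LED count
COOLING = 55     # assumption: Fire2012's suggested default
SPARKING = 120   # assumption: Fire2012's suggested default


def qsub8(a, b):
    """Saturating 8-bit subtract, like FastLED's qsub8()."""
    return max(a - b, 0)


def qadd8(a, b):
    """Saturating 8-bit add, like FastLED's qadd8()."""
    return min(a + b, 255)


def fire_step(heat, rng=random):
    """Advance the heat array by one animation frame, in place."""
    n = len(heat)

    # Step 1: cool down every cell a little
    max_cool = (COOLING * 10) // n + 2
    for i in range(n):
        heat[i] = qsub8(heat[i], rng.randrange(max_cool))

    # Step 2: heat drifts upward and diffuses into its neighbours
    for k in range(n - 1, 1, -1):
        heat[k] = (heat[k - 1] + heat[k - 2] + heat[k - 2]) // 3

    # Step 3: randomly ignite a new spark near the bottom
    if rng.randrange(256) < SPARKING:
        y = rng.randrange(min(7, n))
        heat[y] = qadd8(heat[y], rng.randrange(160, 255))

    return heat
```

Raising `COOLING` gives shorter flames, and raising `SPARKING` gives a livelier fire. The final Fire2012 stage, mapping heat to LED colour or brightness, is left out.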

The animation simulates heat diffusion and convection, with parameters for “cooling” and “sparking” that control the fire behavior. The direction of the flame is controlled by the orientation of the device, which required some clever bit manipulation:

uint8_t bottom_led = map(roll, 0, 0xffff, 0, NUM_LEDS);
// Create a direction register for which way the fire should spread
s_direction_register = 0;
for (uint8_t i = 0; i < NUM_LEDS / 2; ++i) {
  bitSet(s_direction_register, addmod8(bottom_led, i, NUM_LEDS));
}
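To see what the loop above produces, here is a small Python model of it. `arduino_map` and `direction_register` are hypothetical helper names, and the model assumes `roll` is an unsigned 16-bit angle. Note that with 18 LEDs the register needs more than 16 bits.

```python
NUM_LEDS = 18


def arduino_map(x, in_min, in_max, out_min, out_max):
    """Integer model of Arduino's map() for non-negative inputs."""
    return (x - in_min) * (out_max - out_min) // (in_max - in_min) + out_min


def direction_register(roll):
    """Set one bit per LED for the half-ring that starts at the bottom LED."""
    bottom_led = arduino_map(roll, 0, 0xFFFF, 0, NUM_LEDS)
    register = 0
    for i in range(NUM_LEDS // 2):
        # Same wrap-around as addmod8(bottom_led, i, NUM_LEDS)
        register |= 1 << ((bottom_led + i) % NUM_LEDS)
    return register


print(f"{direction_register(0):018b}")      # bits 0..8 set
print(f"{direction_register(51000):018b}")  # bottom LED 14: bits 14..17 and 0..4
```

The modulo is what lets the marked half-ring wrap past the last LED back to LED 0. It also handles the edge case where `map()` returns `NUM_LEDS` itself for `roll == 0xffff`.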

From Software PWM to Binary Code Modulation: A Performance Journey

When I first built this project, I used traditional software PWM to control the brightness of each LED. It worked, but performance was poor: each of the 18 LEDs needed its own control cycle, and although the result looked good enough to the human eye, the low frame rate made the badge difficult to film or photograph.

So I decided to refactor the entire LED control system to use Binary Code Modulation (BCM), also known as Bit Angle Modulation (BAM).

The Problem with Traditional Software PWM

With my original software PWM implementation, I had to:

  1. Iterate through each LED individually
  2. Check if it should be on or off at the current time
  3. Set its state accordingly

This process had to be repeated many times per second to create the illusion of variable brightness. With 18 LEDs, that meant a lot of individual pin manipulations, eating up precious CPU cycles and limiting the animation frame rate.

What is Binary Code Modulation?

Binary Code Modulation takes a fundamentally different approach:

  • Instead of equal-width time slices used in PWM, BCM uses time slices with binary-weighted durations (1, 2, 4, 8, 16, 32, 64, 128 units of time)
  • Each bit in the brightness value controls whether the LED is on or off during its corresponding time slice
  • The total perceived brightness is the sum of all the active time slices
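The three points above can be checked with a short host-side Python model. The function names are illustrative only and do not come from the badge firmware.

```python
def bcm_schedule(brightness):
    """Return (bit position, slice length, LED on?) for one BCM cycle."""
    return [(bit, 1 << bit, bool(brightness & (1 << bit))) for bit in range(8)]


def bcm_on_time(brightness):
    """Total on-time in time units: the sum of the active slices."""
    return sum(length for _, length, on in bcm_schedule(brightness) if on)


def pwm_on_time(brightness, steps=255):
    """Software PWM: compare the brightness against a counter at every step."""
    return sum(1 for t in range(steps) if t < brightness)


# Brightness 5 is 0b00000101, so the LED is lit during the 1-unit and 4-unit slices.
for bit, length, on in bcm_schedule(5):
    print(bit, length, "on" if on else "off")
```

Both schemes give the same on-time for every 8-bit brightness value. The difference is that BCM gets there with 8 output updates per cycle, while this PWM model makes 255 comparisons per LED.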

The Game-Changing Port Manipulation Advantage

The biggest performance advantage came from the fact that BCM lets me update entire port registers at once:

// Update all ports with their current timeslice, preserving unused pins
PORTB = (PORTB & ~PORT_B_MASK) | (g_timeslice_b[s_bitpos] & PORT_B_MASK);
PORTC = (PORTC & ~PORT_C_MASK) | (g_timeslice_c[s_bitpos] & PORT_C_MASK);
PORTD = (PORTD & ~PORT_D_MASK) | (g_timeslice_d[s_bitpos] & PORT_D_MASK);

With just three port writes, I can update all 18 LEDs at essentially the same instant, regardless of how many LEDs are on each port. This is drastically more efficient than toggling each pin individually, which would require at least 18 separate operations.

To make this work, I pre-compute the LED states for each bit position and store them in lookup tables:

// For each LED
for (uint8_t led = 0; led < NUM_LEDS; led++) {
  // For each bit position in the brightness value
  for (uint8_t bitpos = 0, bitmask = 1; bitpos < 8; bitpos++, bitmask <<= 1) {
    if (intensity[led] & bitmask) {
      // If this bit is set, turn on the LED for this timeslice
      uint8_t pin_mask = ~bit_mask_cache(LED_MAP[led].pin);  // Inverted because LOW = ON
      g_timeslice[LED_MAP[led].port][bitpos] &= pin_mask;
    }
  }
}
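Because the loop above only ever clears bits, the tables have to start with every LED bit high (all LEDs off, since LOW = ON). The Python model below makes that explicit. The four-entry `LED_MAP` is a made-up wiring used purely for illustration, not the badge's real pin map.

```python
NUM_BITS = 8

# Hypothetical wiring: (port name, pin number) for each LED
LED_MAP = [("B", 0), ("B", 1), ("C", 3), ("D", 7)]


def build_timeslices(intensity):
    """Return {port: [one port byte per bit position]} for active-low LEDs."""
    # Start with every bit high, which means every LED is off
    timeslices = {port: [0xFF] * NUM_BITS for port, _ in LED_MAP}
    for (port, pin), value in zip(LED_MAP, intensity):
        for bitpos in range(NUM_BITS):
            if value & (1 << bitpos):
                # Clear the pin's bit: the LED is lit during this time slice
                timeslices[port][bitpos] &= ~(1 << pin) & 0xFF
    return timeslices


tables = build_timeslices([0, 255, 0b101, 0x80])
print([hex(b) for b in tables["C"]])
```

Rebuilding the tables costs one pass per animation frame, which keeps the interrupt handler down to a table lookup and a masked write per port.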

Performance Results of the Refactoring

The performance improvement from this refactoring was dramatic:

  1. Frame rate: The animation looks much smoother on camera
  2. CPU usage: The MCU now has plenty of cycles left for the fire simulation itself
  3. Visual quality: The fire effect looks more realistic, with smoother gradients and transitions

Bonus: Better on Camera

BCM’s variable-length time slices create a more irregular pattern that’s less likely to interfere with camera frame rates, resulting in more natural-looking fire effects in photos and videos.

BCM Implementation in Hardware

In my implementation, I used Timer2 in the ATmega328P to generate the precisely timed interrupts needed for BCM:

ISR(TIMER2_COMPA_vect) {
  static uint8_t s_bitpos = 0;

  s_bitpos++;
  s_bitpos &= 7;  // Cycle through bit positions 0-7

  // Update all ports with their current timeslice
  PORTB = (PORTB & ~PORT_B_MASK) | (g_timeslice_b[s_bitpos] & PORT_B_MASK);
  PORTC = (PORTC & ~PORT_C_MASK) | (g_timeslice_c[s_bitpos] & PORT_C_MASK);
  PORTD = (PORTD & ~PORT_D_MASK) | (g_timeslice_d[s_bitpos] & PORT_D_MASK);

  // Double OCR2A for the next time slice to implement Binary Code Modulation.
  // Time slice durations will be proportional to 2^0, 2^1, 2^2, ..., 2^7
  OCR2A <<= 1;
  if (s_bitpos == 0) {
    OCR2A = 1;  // Reset OCR2A to 1 at the start of each 8-bit BCM cycle
  }
}

The trick is in the line OCR2A <<= 1, which doubles the duration of each successive time slice. The output compare register (OCR2A) determines how long each time slice lasts, creating the binary-weighted pattern.

So the timer interrupt service routine actually modifies its own period each iteration.
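Replaying the ISR's bookkeeping on a desktop shows that each bit position ends up paired with the matching compare value. This Python model assumes `OCR2A` starts at 1 and that Timer2 runs in CTC mode, where a period lasts `OCR2A + 1` timer ticks. Neither detail is shown in the ISR itself.

```python
def simulate_bcm_cycle():
    """Replay the ISR's register updates for eight consecutive interrupts.

    Returns a list of (bit position being displayed, OCR2A value) pairs.
    """
    ocr2a = 1    # assumed starting value, same as the reset inside the ISR
    bitpos = 0
    slices = []
    for _ in range(8):
        bitpos = (bitpos + 1) & 7
        ocr2a = (ocr2a << 1) & 0xFF  # 8-bit register: 128 << 1 wraps to 0
        if bitpos == 0:
            ocr2a = 1                # start of the next BCM cycle
        slices.append((bitpos, ocr2a))
    return slices


slices = simulate_bcm_cycle()
for bitpos, ocr2a in slices:
    print(f"bit {bitpos}: OCR2A = {ocr2a}")

# Under the CTC assumption, one full cycle takes sum(OCR2A + 1) timer ticks.
print(sum(ocr2a + 1 for _, ocr2a in slices))
```

Bit n is displayed while `OCR2A` holds 2^n, and the explicit reset to 1 covers the wrap after the 128-tick slice.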

Calculating BCM Parameters

One tricky part was calculating the right prescaler and timing values to ensure the BCM refresh rate was fast enough to avoid visible flickering. I wrote helper Python scripts to analyze different combinations offline:

def calculate_prescaler(f_cpu, animation_ms, target_cycles):
    # Timer ticks per full BCM cycle. Assuming CTC mode, each slice lasts
    # OCR2A + 1 ticks, so the total is (1+2+4+8+16+32+64+128) + 8 = 263.
    bcm_ticks = 263
    ideal_prescaler = (f_cpu / 1000 * animation_ms) / (target_cycles * bcm_ticks)
    return ideal_prescaler

For my setup with a 2 MHz clock, I found that a prescaler of 64 gave the best balance of BCM cycles per animation frame (approximately 15) with an effective LED refresh rate of about 118.8 Hz.
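The refresh rate follows directly from the clock, the prescaler, and the 263 ticks per BCM cycle. The sketch below tabulates it for the prescaler options available on the ATmega328P's Timer2. The 125 ms animation frame time is a hypothetical value chosen only for illustration, since the real frame time is not given here.

```python
TIMER2_PRESCALERS = (1, 8, 32, 64, 128, 256, 1024)  # Timer2 options on the ATmega328P
BCM_TICKS = 263  # timer ticks per full BCM cycle


def refresh_rate_hz(f_cpu, prescaler):
    """Full BCM cycles per second."""
    return f_cpu / prescaler / BCM_TICKS


def cycles_per_frame(f_cpu, prescaler, animation_ms):
    """BCM cycles that fit into one animation frame."""
    return refresh_rate_hz(f_cpu, prescaler) * animation_ms / 1000


F_CPU = 2_000_000
ANIMATION_MS = 125  # hypothetical frame time, not taken from the firmware

for prescaler in TIMER2_PRESCALERS:
    rate = refresh_rate_hz(F_CPU, prescaler)
    frames = cycles_per_frame(F_CPU, prescaler, ANIMATION_MS)
    print(f"/{prescaler:<5} {rate:10.1f} Hz  {frames:8.1f} cycles per frame")
```

With a prescaler of 64 the timer ticks at 31,250 Hz, and 31,250 / 263 gives roughly 118.8 Hz. Smaller prescalers refresh faster but shorten the 1-tick slice, leaving the interrupt handler less time to run.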

Accelerometer Integration and Angle Calculation

For this project I used the LIS3DH accelerometer. The computational challenge here was converting the raw accelerometer data into a usable angle. I didn’t want to use floating-point math on the AVR (too slow and memory-hungry), so I used a fixed-point arctangent function:

uint16_t get_angle(bool reset) {
  // ...
  s_angle = fxpt_atan2(get_filter_value(s_x_filter), get_filter_value(s_y_filter)) + ACCEL_ANGLE_OFFSET;
  // ...
  return s_angle;
}

The fixed-point fxpt_atan2() implementation was adapted from an open-source algorithm. It provides good accuracy with much better performance than floating-point math, which frees up processor cycles.
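The badge's fxpt_atan2() is not reproduced here. To illustrate how an arctangent can be computed with integer operations only, the sketch below uses CORDIC, a different and well-known technique that needs just shifts, adds, and a small table. The name `atan2_u16`, the (y, x) argument order, and the 0..65535 output range for a full circle are all assumptions for this example.

```python
import math

# Table of atan(2^-i), in units where 65536 is a full circle.
# On a microcontroller this would be a precomputed constant array.
ANGLE_TABLE = [round(math.atan(2.0 ** -i) / (2 * math.pi) * 65536) for i in range(12)]


def atan2_u16(y, x):
    """Integer-only atan2: returns 0..65535 for angles of 0..360 degrees."""
    if x == 0 and y == 0:
        return 0
    angle = 0
    if x < 0:
        # Rotate by 180 degrees so the vector lies in the right half-plane
        x, y = -x, -y
        angle = 32768
    # Scale up so the shifts below keep some fractional precision
    x <<= 8
    y <<= 8
    for i, step in enumerate(ANGLE_TABLE):
        if y > 0:
            # Rotate clockwise by atan(2^-i) and record the angle removed
            x, y = x + (y >> i), y - (x >> i)
            angle += step
        elif y < 0:
            # Rotate counter-clockwise by atan(2^-i)
            x, y = x - (y >> i), y + (x >> i)
            angle -= step
    return angle & 0xFFFF


print(atan2_u16(0, 1000))   # along +x: 0
print(atan2_u16(1000, 1000))  # 45 degrees: 8192
```

A 16-bit angle wraps naturally at 360 degrees, which fits the `uint16_t` return type of `get_angle()` and allows an offset to be added without range checks.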

Tap Detection

I wanted users to be able to “spark” the fire by tapping the device. After trying several approaches, I settled on using the accelerometer’s built-in interrupt capabilities:

// Configure INT1_CFG: Enable XHIE, YHIE, ZHIE with OR logic
send_to_accel(LIS3DH_INT1_CFG, lis3dh_reg_t{.int1_cfg = {.xlie = 0,
                                                      .xhie = 1,
                                                      .ylie = 0,
                                                      .yhie = 1,
                                                      .zlie = 0,
                                                      .zhie = 1,
                                                      ._6d = 0,
                                                      .aoi = 0}});

This approach triggers an interrupt when acceleration exceeds a threshold in any direction, which I can interpret as a tap. The interrupt line from the accelerometer connects to the ATmega’s INT0 pin, and I set up an interrupt handler to process the taps. This also requires virtually no signal processing on the microcontroller.
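As a cross-check of the bitfield above, the byte that ends up in INT1_CFG can be computed offline. The sketch assumes the struct's fields are packed least significant bit first, in the order shown, which matches the INT1_CFG layout in the LIS3DH datasheet. `pack_int1_cfg` is a hypothetical helper, not part of the firmware.

```python
INT1_CFG_ADDR = 0x30  # INT1_CFG register address in the LIS3DH datasheet

# Bit names within INT1_CFG, least significant bit first
INT1_CFG_BITS = ("xlie", "xhie", "ylie", "yhie", "zlie", "zhie", "_6d", "aoi")


def pack_int1_cfg(**flags):
    """Pack named flag values into the INT1_CFG register byte."""
    value = 0
    for position, name in enumerate(INT1_CFG_BITS):
        if flags.get(name, 0):
            value |= 1 << position
    return value


value = pack_int1_cfg(xhie=1, yhie=1, zhie=1)
print(f"write 0x{value:02X} to register 0x{INT1_CFG_ADDR:02X}")
```

With `aoi` left at 0, the three high-event enables are combined with OR, so a spike on any single axis is enough to raise the interrupt.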

Conclusion

This project combined several interesting challenges: creating a convincing physical simulation, implementing efficient brightness control, working with sensors, and optimizing for constrained hardware.