Red heart-shaped LED light displaying two smaller illuminated hearts inside.

I’ve always wanted to make a wearable that displays the user’s heart rate, but I didn’t want the complexity of requiring the user to wear an on-skin sensor, or of using Bluetooth to communicate with one, even though that’s the only way to really measure heart rate accurately.

I wondered if I could use a simple accelerometer to try to predict heart rate with any degree of accuracy.

I also thought about the cost and complexity of using a microcontroller with an FPU (most don’t have one), which isn’t the kind I’m used to working with.

Instead, I figured I could find ways to make even a severely under-spec’ed microcontroller work without floating-point math, building on some things I had previously learned about smoothing variables.

The Technical Challenge

The constraints of our target system include:

  • ATtiny84a microcontroller: 8KB flash, 512 bytes RAM
  • No floating-point hardware: All calculations must use fixed-point arithmetic
  • Limited math instructions: No native integer division or multiplication, much less DSP instructions
  • Minimal power consumption: Required for wearable applications
  • Real-time processing: Heart rate estimation must occur continuously

These constraints eliminate conventional machine learning approaches like neural networks or complex statistical models. Instead, we must rely on carefully designed features and a lightweight linear model that can function within these limitations.

Data Collection and Analysis

A head-mounted device with a small LCD screen displaying text, connected to various circuit boards and wires.
The data collection system.

I decided to collect training data using my own heart.

I built an nRF52 stack using Adafruit dev boards and stuck it in an arm phone-holder intended for jogging. I pinned a LIS3DH accelerometer directly to my chest and wired it to the nRF52. The system board connected via Bluetooth to a Polar H9 chest strap that provided the ground-truth heart rate data. Surprisingly, the Bluetooth code worked on the first try. The accelerometer and chest-strap data were logged to a microSD card.

I added a small OLED screen and a few small buttons so I could control the device in the field.

I logged data from a variety of activities – running flat out, jogging, running and walking up and down stairs, standing around, sitting on the couch, etc.

The Power of Fixed-Point Arithmetic in Ultra-Constrained Environments

One of the most important aspects of this project is avoiding floating-point calculations by using only fixed-point integer arithmetic. (Remember, our target microcontroller supports neither floating point nor integer division.) We can achieve simple but surprisingly robust signal processing using only a few tricks of integer math.

Fixed-Point Exponentially Weighted Moving Averages

The cornerstone of the system is the fixed-point implementation of exponentially weighted moving averages (EWMA). While these are expressed more straightforwardly using multiplication by a fractional alpha value (which requires floating point), our implementation uses only bit shifts and integer addition:

void smoothInt(uint16_t sample, uint8_t bits, int32_t *filter) {
  int32_t local_sample = ((int32_t) sample) << 16;
  *filter += (local_sample - *filter) >> bits;
}

int16_t getFilterValue(int32_t filter) {
  return (int16_t)((filter + 0x8000) >> 16);
}

This translates the EWMA formula α·x + (1-α)·previous into fixed-point operations. The bits parameter acts as our alpha value (where α = 1/2^bits), and the shift operations handle the fractional arithmetic. This approach is not only computationally efficient but also numerically stable, as it avoids the accumulation of rounding errors.
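
As a quick sanity check, here’s a usage sketch (my own illustration, with invented values) showing the filter converging at a rate set by bits:

int32_t filter = 0;              // EWMA state, holding value << 16

for (uint8_t i = 0; i < 40; ++i) {
  smoothInt(1000, 3, &filter);   // alpha = 1/2^3 = 1/8
}
// After the first sample, getFilterValue(filter) is 125 (i.e. 1000/8);
// after ~40 samples it has converged to roughly 1000.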

Linear Regression with Fixed-Point Math

With our understanding of fixed-point arithmetic, we can now see how the linear regression model is implemented:

uint32_t getPrediction() {
  return (
    + 259
    + scale16(getFilterValue(mov10), 808)
    + scale16(getFilterValue(mov12), 1193)
    + scale16(getFilterValue(mov06), 619)
    // Additional terms...
  ) << 8;
}

This represents a linear combination of features (Y = β₀ + β₁X₁ + β₂X₂ + …) implemented entirely with fixed-point arithmetic:

  1. Each coefficient (like 808, 1193, 619) is a fixed-point representation of the floating-point coefficient from our trained model
  2. The scale16() function performs fixed-point multiplication using bit shifts and addition
  3. The final << 8 adjusts the scale of the result to get our heart rate output (which will be truncated later)
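
As a worked example (the feature value 12 is invented for illustration):

// Suppose getFilterValue(mov10) returns 12.
// scale16(12, 808) = (12 * 808) >> 8 = 9696 >> 8 = 37
// Each term contributes an integer like this to the sum, and the final
// << 8 moves the accumulated result into 16.8 fixed-point before the
// integer heart rate is extracted.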

Features

I decided to use only the absolute magnitude of the acceleration vector rather than any of its components. Using the components would triple the number of features and would introduce variance based on how the device happened to be tilted. That is, if the device were oriented slightly differently, different weights would be learned for X and Y for those examples despite the difference being completely arbitrary. I would need vastly more training data to compensate.

I considered separating the Z component but decided against it for simplicity.

I reviewed an example of another project where the designer tried to isolate the acceleration in the direction of gravity (i.e. up and down). I couldn’t find a convincing reason why this was better than total acceleration.

Some somewhat more advanced accelerometers have built-in signal processing features such as step counting (i.e. for pedometers), and the step rate could clearly be used as a feature, but the accelerometer I used doesn’t have this.

The sampling frequency of the accelerometer, as I understand it, does not matter. Accelerometers have a configurable sample rate of, say, 1 Hz to 400 Hz. All that changes is how often it samples (mainly for power management); the samples do not affect each other (as long as any internal high-pass filters are not activated).

Low-pass filters

The most basic feature is

smoothInt(acceleration, N, &filter);

Where features in this family use different values of N to implement a variety of periods. Short periods are high-variance and reactive; long periods are more stable but react more slowly.
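
A minimal sketch of such a filter bank (my reading of the naming is that mov06, mov10, and mov12 in getPrediction() are EWMAs with N = 6, 10, and 12; the exact set used in the project may differ):

int32_t mov06 = 0, mov10 = 0, mov12 = 0;

void updateFilters(uint16_t acceleration) {
  smoothInt(acceleration, 6, &mov06);    // alpha = 1/64: reactive, noisy
  smoothInt(acceleration, 10, &mov10);   // alpha = 1/1024
  smoothInt(acceleration, 12, &mov12);   // alpha = 1/4096: stable, slow
}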

High-pass filters

I considered implementing features using high-pass filters, but rejected the idea. The definition of a high-pass filter is just current_value - low_pass_filter(current_value). In practice this should remove the gravity component from the acceleration vector.

However:

  • The momentary acceleration value will end up only affecting the heart rate prediction for that particular moment. And momentary acceleration is extremely noisy.
  • To counteract this, we should instead subtract low-pass filters with different periods: a short period vs. a long period.
  • If I just make a bunch of low pass filters with different periods and throw it at linear regression, that’s what it will end up doing anyway.

So I did not implement any high-pass filters directly but instead just created a bunch of low-pass filters and let linear regression figure it out, which should accomplish the same thing.
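
For illustration (a sketch of the idea, not code from the project), the band-pass response the regression can synthesize from two low-pass features looks like this:

// The difference of a fast and a slow low-pass filter acts as a band-pass,
// which is what an explicit high-pass feature would have contributed.
int16_t bandPass(int32_t fast_filter, int32_t slow_filter) {
  return getFilterValue(fast_filter) - getFilterValue(slow_filter);
}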

Time-since features

As described above, our efficient low-pass filtering implementation forms the basis for many features.

One key feature tracks the time between upward crossings of a moving average:

if (acceleration > avg + crossing_bias) {
  if (crossing_bias >= 0) {  // first sample back above the band: an upward crossing
    // EWMA of the interval between upward crossings (alpha = 1/32)
    smoothInt(millis() - timeSinceCross, 5, &avgTimeSinceCross);
    timeSinceCross = millis();
  }
  crossing_bias = -(avg >> 3);  // hysteresis: stay "above" until we drop below avg - avg/8
} else {
  crossing_bias = (avg >> 3);   // re-arm: the next crossing requires exceeding avg + avg/8
}

This creates a feature that captures rhythm in movement, which often correlates with heart rate during activities like walking or running.

Acceleration Buckets

Another elegant application of exponentially weighted moving averages is the implementation of “acceleration buckets” - a lightweight alternative to histograms that tracks how long acceleration stays within specific ranges:

uint8_t found_bucket = 6;  // default to top bucket
for (uint8_t i = 0; i < 6; ++i) {
  if (acceleration < bucket_limits[i]) {
    found_bucket = i;
    break;
  }
}

for (uint8_t i = 0; i < 7; ++i) {
  if (i == found_bucket) {
    bucketSmoothUp(&(buckets[i]));
  } else {
    bucketSmoothDown(&(buckets[i]));
  }
}

Where the smoothing functions are simply:

void bucketSmoothUp(long *filter) {
  smoothInt(1000, 6, filter);  // Apply EWMA with value 1000
}

void bucketSmoothDown(long *filter) {
  smoothInt(0, 6, filter);     // Apply EWMA with value 0
}

This is a reuse of our core EWMA technique - when acceleration falls within a bucket’s range, we feed a high value (1000) to that bucket’s filter and zero to all others. The result gives us a measure of how consistently the acceleration stays within each range, using the exact same computational mechanism as our other features. These buckets effectively capture the distribution of acceleration values over time using minimal memory and consistent mathematical techniques.
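
For completeness, the supporting declarations would look something like this (the limit values here are invented placeholders, not the project’s tuned thresholds):

// Six thresholds define seven buckets; the limits below are placeholders.
uint16_t bucket_limits[6] = { 50, 100, 200, 400, 800, 1600 };
long buckets[7];  // one EWMA state per bucket, in the same value << 16 format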

Peak Detection

I also experimented with a basic peak detection algorithm that identifies local maxima and minima in the filtered acceleration signal:

if (t2_to_t1 == 1 && t1_to_t == -1) {         // we were rising and are now falling
  peak = 1;   // local maximum
} else if (t2_to_t1 == -1 && t1_to_t == 1) {  // we were falling and are now rising
  peak = -1;  // local minimum
} else {
  peak = 0;   // no peak at this sample
}

Combined with time tracking between peaks, this creates features that correlate with rhythmic body movements.
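
The direction variables t2_to_t1 and t1_to_t hold the signs of the last two sample-to-sample changes; here’s a sketch of how they might be maintained (my illustration, not the project’s exact code):

// Track the sign (-1, 0, +1) of the last two changes in the filtered signal.
int8_t t2_to_t1 = 0, t1_to_t = 0;
int16_t prev = 0;

void updateDirections(int16_t filtered) {
  int8_t dir = 0;
  if (filtered > prev) dir = 1;
  else if (filtered < prev) dir = -1;
  t2_to_t1 = t1_to_t;  // age the previous change
  t1_to_t = dir;       // record the newest change
  prev = filtered;
}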

Model Training and Fixed-Point Conversion

From the beginning, linear regression was the clear choice for our constrained hardware. The need for minimal computational requirements and memory usage made more complex models impractical. After experimenting with various feature combinations in Python, we trained a simple linear regression model that could translate our carefully engineered features into heart rate estimations.

Understanding Fixed-Point Arithmetic

Machine learning frameworks generally operate with floating-point numbers, but many microcontrollers lack floating-point hardware. Instead we use fixed-point arithmetic where we essentially scale our values by a constant factor and work with integers.

In our 16.8 format:

  • 16 bits are allocated for the integer part
  • 8 bits are allocated for the fractional part

This means we multiply our floating-point values by 2^8 (256) and store them as integers. For example:

  • The floating-point value 1.5 becomes 1.5 × 256 = 384 (integer)
  • The floating-point value 0.25 becomes 0.25 × 256 = 64 (integer)

When we need to multiply two fixed-point numbers, we have to account for the scaling factor:

  1. Multiply the integers (which gives us a result scaled by 2^16)
  2. Shift right by 8 bits to get back to our 16.8 format (scaled by 2^8)

For example, to multiply 1.5 × 0.25 in fixed-point:

  1. 384 × 64 = 24,576
  2. 24,576 >> 8 = 96
  3. 96 ÷ 256 = 0.375 (which is the correct result of 1.5 × 0.25)
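
As a quick illustration (my sketch, not a helper from the project), the multiply rule can be written directly:

// Multiply two 16.8 fixed-point values: the raw product is scaled by 2^16,
// so shifting right by 8 returns the result to 16.8 format.
// (The 64-bit widening guards against overflow in this sketch; the project
// itself sticks to small coefficients and shift-and-add instead.)
int32_t fixedMul(int32_t a, int32_t b) {
  return (int32_t)(((int64_t)a * b) >> 8);
}

// fixedMul(384, 64) == 96, i.e. 1.5 * 0.25 == 0.375 (96/256)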

This is why our scale functions look like this:

// Multiplies a 16.8 fixed-point number by 808/2^8 (approximately 3.156)
int32_t SCALE_808(uint32_t x) { 
  return ((x << 9) + (x << 8) + (x << 5) + (x << 3)) >> 8; 
}

Instead of directly multiplying by an arbitrary coefficient, we decompose the coefficient into powers of 2 that can be implemented with bit shifts, then add the shifted values together.

The value 808 decomposes into powers of two: 808 = 2^9 + 2^8 + 2^5 + 2^3. The bit shifts effectively compute:

  • (x × 2^9) + (x × 2^8) + (x × 2^5) + (x × 2^3)
  • = x × (2^9 + 2^8 + 2^5 + 2^3)
  • = x × 808
  • Then we shift right by 8 to return to our 16.8 format, for a net multiplication by 808/2^8 ≈ 3.156

This approach avoids expensive multiplication and division operations while maintaining sufficient precision for our model.

Implementing Fixed-Point Multiplication with scale16

I wrote code to “compile” coefficients into shift-and-add formulas like SCALE_808, but the actual implementation uses a more general approach: a function called scale16, borrowed from FastLED, which handles all coefficients through a single interface:

int32_t scale16(int16_t value, uint16_t coefficient) {
  return ((int32_t)value * coefficient) >> 8;
}

This approach:

  • Promotes the 16-bit value to 32 bits to prevent overflow
  • Multiplies by the fixed-point coefficient
  • Shifts right by 8 to adjust back to the proper fixed-point format

The coefficients (like 808, 1193, etc.) are stored as 16-bit integers representing the trained floating-point coefficient multiplied by 2^8. For example, the coefficient 808 corresponds to a floating-point coefficient of approximately 808 ÷ 2^8 ≈ 3.156.

While we could have generated specific functions for each coefficient (like the SCALE_808 approach), using a generic scale16 function resulted in cleaner, more maintainable code. In theory, the compiler might compile constant multiplications down to the same shift-and-add sequences if it chooses to inline the scale16 function.

Feature Engineering for Constrained Systems

Building on our fixed-point approach, all of the features described above were designed specifically for extremely limited computational resources: each one reduces to EWMA updates, integer comparisons, and bit shifts.

Implementation and Visualization

The final implementation drove an LED display representing the heart in different segments, with each segment corresponding to walls in the four chambers of the heart. The brightness and animation speed were controlled by the estimated heart rate. The entire system, including feature extraction, model inference, and display control, fits within the extremely tight constraints of the ATtiny84.

void setBrightnesses(uint32_t incrementor, uint8_t fadeOut) {
  static uint32_t lastMillis = 0;
  // fixed16_16 is a union over a uint32_t (.full) with 16-bit integer and
  // fractional halves (.part.integer / .part.fraction)
  static fixed16_16 timing = { .full = 0 };

  uint16_t diff = millis() - lastMillis;
  if (!diff) return;  // no time has elapsed; nothing to update
  
  lastMillis += diff;
  // Advance the animation clock by incrementor per elapsed millisecond;
  // the incrementor is derived from the estimated heart rate
  for (uint8_t i = 0; i < diff; ++i) {
    timing.full += incrementor;
  }
  
  // Wrap the animation cycle at 800 ticks (one full heartbeat)
  while (timing.part.integer > 800) {
    timing.part.integer -= 800;
  }
  
  uint16_t& tnow = timing.part.integer;
  
  // Control different heart segments with appropriate timing offsets
  if (tnow <= 255) {
    setSectionBrightnessLA(ease(tnow));       // left atrium segment
  }
  if (tnow >= 50 && tnow <= 305) {
    setSectionBrightnessRA(ease(tnow - 50));  // right atrium, offset 50 ticks
  }
  // More sections...
}

Results and Limitations

The system achieves a reasonable heart rate approximation, especially during rhythmic activities like walking or running. The correlation between predicted and actual heart rate was strongest during these activities, with an R² value of approximately 0.7-0.8.

The primary limitations include:

  1. Lower accuracy during irregular movements or when transitioning between activities
  2. Reduced performance at extreme heart rates (very low or very high)
  3. Need for a calibration period to allow the features to stabilize

I found the predictions tend to under-react: the estimated heart rate is resistant to change from the baseline. That is, if you shook the device, creating a lot of acceleration, the heart rate wouldn’t change much; you had to shake it for a sustained period, on the order of a minute, to see a noticeable change. This was disappointing, because you want a wearable to be maximally reactive in order to be fun. This might accurately reflect the way heart rate actually changes over time, it might be that the model is being conservative, or it might be that you can’t really visually tell the difference within a moderate range of heart rates like 70-90 bpm.

Despite these limitations, the system demonstrates that useful machine learning can be implemented on extremely constrained hardware with thoughtful feature engineering and implementation.

Conclusion

  1. Fixed-point arithmetic is surprisingly powerful: Even sophisticated mathematical techniques like exponentially weighted moving averages and linear regression can be implemented efficiently without floating-point operations
  2. Feature engineering matters: Carefully designed features can reduce model complexity
  3. Bit manipulation is powerful: Shifts and adds can replace more expensive operations
  4. Linear models can be sufficient: Complex models aren’t always necessary
  5. Memory optimization is crucial: Every byte counts in ultra-constrained environments

I loved the idea that I could give people a wearable that would emulate what my heart would do if I were them. I love creating things that have some emotional meaning rather than just a list of technical features. I think a wearable should do one thing, do it well, and be instantly easy to understand.

I have a separate project page here with more photos.