Surjyadip Sen | Field Notes
FILE: OP_EDGE
STATUS: DEPLOYED

EdgeAI Hand Gesture Classifier

STAMP: 2025.03.01


PLATE 01: SYSTEM ARCHITECTURE OVERVIEW // EDGEAI HAND GESTURE CLASSIFIER

GestureRecog (ESP-GloveStuff) — Project Report

Repository: XotEmBotZ/gestureRecog
Hardware: ESP32-S3 · 5× Flex Sensors · USB Serial
Status: Completed (Build V2.0.4-BETA)


1. Overview

GestureRecog is a wearable hand-gesture recognition system built on the ESP32-S3 microcontroller. Five flex sensors — one per finger — feed 12-bit ADC readings into a firmware pipeline that filters, normalises, buffers, and classifies the data entirely on-device using a k-Nearest Neighbours (k-NN) engine accelerated by the S3's SIMD instruction set.

A browser-based dashboard (Next.js + Web Serial API) provides real-time telemetry visualisation, gesture training, and dataset management — with no cloud or backend required.


2. Purpose & Design Goals

Goal | Design Choice
Fully offline, standalone operation | On-device SPIFFS binary dataset; no server dependency
Low-latency inference | SIMD-accelerated k-NN; full dataset scan in <1 ms
User-agnostic accuracy | Per-user manual calibration normalises each sensor's absolute ADC range
Temporal gesture recognition | 32-sample rolling buffer captures the motion signature, not just a static pose
Cross-platform control | Web Serial API; works from Chrome on desktop and Android
Noise resilience | EMA filter (α = 0.2) on each channel suppresses electrical and mechanical noise

3. System Architecture


4. Firmware Data Flow

4.1 Sampling & Signal Processing Loop (100 ms / 10 Hz)

4.2 k-NN Inference Engine

4.3 Dataset Storage Format (SPIFFS)

Each call to store <label> appends one binary record to /spiffs/dataset.bin:

typedef struct {
    char  label[16];        // null-terminated gesture name
    float data[160];        // 5 channels × 32 samples, chronological
} sample_record_t;

Records are read back sequentially during inference; no indexing structure is needed given the small dataset size typical of k-NN gesture libraries.
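The following is a minimal sketch, assuming standard C stdio on top of ESP-IDF's VFS layer (a SPIFFS partition mounted at /spiffs is opened with ordinary fopen, fwrite and fread). The helper names dataset_append and dataset_read are hypothetical and not taken from the repository. Because every record has a fixed size, a record id can be located directly with fseek, which is one plausible way to serve a read <id> command without any index.

```c
#include <stdio.h>
#include <string.h>

/* Same layout as the record above: 16 + 160 * 4 = 656 bytes,
   with no padding because float alignment (4) divides 16. */
typedef struct {
    char  label[16];   /* null-terminated gesture name */
    float data[160];   /* 5 channels x 32 samples, chronological */
} sample_record_t;

_Static_assert(sizeof(sample_record_t) == 656, "unexpected record padding");

/* Append one record to the end of the dataset file (e.g. "/spiffs/dataset.bin").
   Returns 0 on success, -1 on failure. */
int dataset_append(const char *path, const char *label, const float data[160]) {
    sample_record_t rec;
    memset(&rec, 0, sizeof rec);
    strncpy(rec.label, label, sizeof rec.label - 1);  /* last byte stays '\0' */
    memcpy(rec.data, data, sizeof rec.data);

    FILE *f = fopen(path, "ab");
    if (!f) return -1;
    size_t written = fwrite(&rec, sizeof rec, 1, f);
    int closed = fclose(f);
    return (written == 1 && closed == 0) ? 0 : -1;
}

/* Read record number `id` (0-based) by seeking to id * 656 bytes.
   Returns 0 on success, -1 if the file or the record does not exist. */
int dataset_read(const char *path, long id, sample_record_t *out) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    int ok = fseek(f, id * (long)sizeof *out, SEEK_SET) == 0
             && fread(out, sizeof *out, 1, f) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

The _Static_assert pins down the 656-byte size arithmetic at compile time, so a compiler that inserted padding would be caught before a mismatched dataset.bin could be written.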


5. Communication Protocol

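The ESP32 and the dashboard exchange text frames over USB-Serial (115200 baud): the ESP32 emits a structured telemetry stream that the dashboard parses, and the dashboard sends plain-text commands back: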

Frame | Direction | Meaning
>r[i]:[val] | ESP → Browser | Raw 12-bit ADC value for channel i
>[i]:[val] | ESP → Browser | Mapped 0–100 value for progress bars
>max[i]:[val] / >min[i]:[val] | ESP → Browser | Current calibration bounds per channel
>mode:[MODE] | ESP → Browser | Active mode string (TRAIN / INFER / CALIBRATE)
>d:[0/1] | ESP → Browser | Activity-disabled flag
>time:[us] | ESP → Browser | Inference duration in microseconds
Best Match: [Label] (Dist: X.XX) | ESP → Browser | Top k-NN prediction result
>list:[id]:[label] | ESP → Browser | Dataset record enumeration
mode train / mode infer / mode calibrate | Browser → ESP | Switch operating mode
store <label> | Browser → ESP | Capture the current buffer as a labelled training sample
k <n> | Browser → ESP | Set the number of k-NN neighbours (1–10)
list / clear / read <id> | Browser → ESP | Dataset management commands

BLE UART (Nordic NUS profile via NimBLE) exposes the same command interface wirelessly, enabling Android/BLE control without USB.
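A firmware-side dispatcher for the Browser → ESP commands might look like the sketch below. The command_t type, the enum names, and the choice to clamp k into 1–10 (rather than reject out-of-range values) are assumptions for illustration. Since BLE exposes the same command interface, one such function could be fed lines from either USB-Serial or the BLE UART.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical command kinds for the Browser -> ESP frames in the table above. */
typedef enum { CMD_UNKNOWN, CMD_MODE, CMD_STORE, CMD_SET_K,
               CMD_LIST, CMD_CLEAR, CMD_READ } cmd_kind_t;

typedef struct {
    cmd_kind_t kind;
    char arg[16];   /* mode name or gesture label, at most 15 chars like label[16] */
    int  value;     /* k for "k <n>", record id for "read <id>" */
} command_t;

/* Parse one command line such as "store wave", "k 5" or "mode infer". */
command_t parse_command(const char *line) {
    command_t cmd = { CMD_UNKNOWN, "", 0 };
    char word[16];
    int n;

    if (sscanf(line, "mode %15s", word) == 1 &&
        (strcmp(word, "train") == 0 || strcmp(word, "infer") == 0 ||
         strcmp(word, "calibrate") == 0)) {
        cmd.kind = CMD_MODE;
        strcpy(cmd.arg, word);
    } else if (sscanf(line, "store %15s", word) == 1) {
        cmd.kind = CMD_STORE;
        strcpy(cmd.arg, word);               /* longer labels are truncated to 15 chars */
    } else if (sscanf(line, "k %d", &n) == 1) {
        cmd.kind = CMD_SET_K;
        cmd.value = n < 1 ? 1 : (n > 10 ? 10 : n);  /* assumed: clamp into 1..10 */
    } else if (sscanf(line, "read %d", &n) == 1) {
        cmd.kind = CMD_READ;
        cmd.value = n;
    } else if (strncmp(line, "list", 4) == 0) {
        cmd.kind = CMD_LIST;
    } else if (strncmp(line, "clear", 5) == 0) {
        cmd.kind = CMD_CLEAR;
    }
    return cmd;
}
```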


6. Web Dashboard

Key frontend features:

  • Real-time waveform — 5 channels plotted simultaneously with Chart.js; rolling window of recent samples.
  • Channel cards — each finger shows raw ADC value, a normalised progress bar, and live calibration min/max.
  • Prediction bubble — a large gesture label with its k-NN Euclidean distance score, updated every inference cycle.
  • Dataset panel — full list of SPIFFS-stored gesture samples, clickable to inspect the raw 32-sample waveform per channel in a modal overlay.
  • Activity indicator — IDLE / CAPTURING badge reflects the on-device variance-based disable logic.

7. FreeRTOS Task Layout


8. Hardware Summary

Component | Spec
MCU | ESP32-S3 (Xtensa LX7 dual-core, 240 MHz)
Sensors | 5× flex sensors, resistive divider to ADC1
ADC | 12-bit, ADC1_CH0–CH4 (GPIO 1–5), 12 dB attenuation
Sampling rate | 10 Hz (100 ms loop)
Flash storage | SPIFFS partition labelled "storage", holding the binary dataset.bin
Connectivity | USB-Serial (115200 baud) + BLE 5.0 NUS (NimBLE)
Pull config | GPIO_PULLDOWN_ONLY on all sensor GPIOs

9. Technical Resolution / Outcome

Signal Quality

The EMA filter (α = 0.2) on each ADC channel suppresses high-frequency electrical noise and mechanical chatter from the flex sensors. The retained low-frequency signal captures meaningful finger-flex curves at the 10 Hz sampling rate without perceptible lag at the 100 ms loop period.
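A per-channel EMA with α = 0.2 can be sketched as follows, applying y ← α·x + (1 − α)·y on every loop tick. Seeding the filter with the first reading (instead of ramping up from zero) is an assumption, as are the type and function names. For scale: with α = 0.2 at 10 Hz, the filtered value covers about 63% of a step change after roughly 4.5 samples (n = −1/ln 0.8 ≈ 4.5), or about 0.45 s, which is short relative to the 3.2 s gesture window.

```c
#define NUM_CHANNELS 5
#define EMA_ALPHA    0.2f

/* Filter state: one smoothed value per flex-sensor channel. */
typedef struct {
    float value[NUM_CHANNELS];
    int   seeded;   /* 0 until the first sample initialises the filter */
} ema_filter_t;

/* Feed one set of raw 12-bit ADC readings through the filter. */
void ema_update(ema_filter_t *f, const int raw[NUM_CHANNELS]) {
    for (int ch = 0; ch < NUM_CHANNELS; ch++) {
        float x = (float)raw[ch];
        if (!f->seeded) {
            f->value[ch] = x;   /* start at the first reading to avoid a ramp from 0 */
        } else {
            f->value[ch] = EMA_ALPHA * x + (1.0f - EMA_ALPHA) * f->value[ch];
        }
    }
    f->seeded = 1;
}
```

For example, a channel sitting at 1000 that jumps to 2000 reads about 1200 after one update and about 1360 after two.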

Normalisation & Calibration

Per-user calibration resolves the core problem of flex-sensor variability: every unit has a different resistance range, and every hand has a different physical range of motion. After a one-time calibration sweep per user (repeated after a sensor swap), preprocess_sample() maps each sensor's recorded min/max to a universal [0.0, 1.0] float range, so stored feature vectors stay comparable across re-flashing, sensor swaps, and different wearers.
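The per-channel min/max mapping can be sketched as below. This is not the repository's preprocess_sample(); normalise_reading is a hypothetical name, and clamping readings that fall outside the calibration sweep is an assumed design choice that keeps every feature inside [0.0, 1.0].

```c
/* Map a filtered reading into [0, 1] using the per-channel bounds
   recorded during the calibration sweep. Out-of-range values are clamped. */
float normalise_reading(float x, float cal_min, float cal_max) {
    float span = cal_max - cal_min;
    if (span <= 0.0f) return 0.0f;   /* uncalibrated or degenerate channel */
    float t = (x - cal_min) / span;
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return t;
}
```

With a sweep that recorded 1000 and 3000 for a channel, a reading of 2000 maps to 0.5 regardless of which sensor or hand produced it.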

Temporal Feature Encoding

Encoding 3.2 seconds of motion history (32 samples × 5 channels at 10 Hz) into a 160-float feature vector proved substantially more robust than classifying a single static sample. The temporal signature lets the classifier tell apart gestures that end in the same static pose but are reached through different motions, and makes the system resilient to brief sensor glitches.
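One way to maintain the 32-sample window and flatten it into the 160-float vector is a ring buffer, sketched below. The record format only says "5 channels × 32 samples, chronological", so the channel-major layout (all 32 samples of channel 0, then channel 1, and so on) is an assumption, as are the names window_t, window_push and window_flatten.

```c
#define NUM_CHANNELS 5
#define WINDOW       32
#define FEATURE_LEN  (NUM_CHANNELS * WINDOW)   /* 160 floats */

typedef struct {
    float samples[WINDOW][NUM_CHANNELS];
    int   head;    /* slot the next sample will overwrite (the oldest once full) */
    int   count;   /* number of valid samples, saturates at WINDOW */
} window_t;

/* Store one normalised 5-channel sample, overwriting the oldest when full. */
void window_push(window_t *w, const float sample[NUM_CHANNELS]) {
    for (int ch = 0; ch < NUM_CHANNELS; ch++)
        w->samples[w->head][ch] = sample[ch];
    w->head = (w->head + 1) % WINDOW;
    if (w->count < WINDOW) w->count++;
}

/* Flatten oldest-to-newest in channel-major order:
   out[ch * WINDOW + t] is channel ch at time step t (t = 0 is the oldest).
   Returns 0 while the window is still filling, 1 once a vector was written. */
int window_flatten(const window_t *w, float out[FEATURE_LEN]) {
    if (w->count < WINDOW) return 0;
    for (int t = 0; t < WINDOW; t++) {
        int slot = (w->head + t) % WINDOW;
        for (int ch = 0; ch < NUM_CHANNELS; ch++)
            out[ch * WINDOW + t] = w->samples[slot][ch];
    }
    return 1;
}
```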

SIMD-Accelerated Inference

The dsps_sub_f32 and dsps_dotprod_f32 routines from the ESP-DSP library apply the ESP32-S3's 128-bit SIMD (vector) extensions to the inner loop of the Euclidean distance calculation. A full scan over tens of stored gestures (160 floats each) completes in under 1 ms, measured with esp_timer_get_time(), so running inference once per second (1 Hz) has no impact on the sampling loop.
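The distance step can be sketched as a subtraction followed by a dot product of the difference with itself, which is exactly the squared Euclidean distance. The USE_ESP_DSP build flag is hypothetical: when it is defined, the sketch calls ESP-DSP's dsps_sub_f32 (with unit strides) and dsps_dotprod_f32; otherwise a scalar loop computes the same value so the function also runs on a host.

```c
#ifdef USE_ESP_DSP          /* hypothetical flag for on-target builds */
#include "esp_dsp.h"
#endif

/* Squared Euclidean distance between two feature vectors of length len.
   scratch must hold len floats; it receives the element-wise difference a - b. */
float squared_distance(const float *a, const float *b, float *scratch, int len) {
    float acc = 0.0f;
#ifdef USE_ESP_DSP
    dsps_sub_f32(a, b, scratch, len, 1, 1, 1);      /* scratch = a - b */
    dsps_dotprod_f32(scratch, scratch, &acc, len);  /* acc = sum(scratch^2) */
#else
    for (int i = 0; i < len; i++) {                 /* portable scalar fallback */
        scratch[i] = a[i] - b[i];
        acc += scratch[i] * scratch[i];
    }
#endif
    return acc;
}
```

Because the square root is monotonic, ranking neighbours by squared distance gives the same order as ranking by true Euclidean distance, so sqrtf is only needed when a distance is displayed.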

On-Device Dataset Persistence

SPIFFS binary storage removes any dependency on an external host or network. The dataset survives power cycles and firmware re-flashes, because the SPIFFS partition is separate from the application firmware. The binary record format (label[16] + data[160] floats) is compact: each record occupies exactly 656 bytes (16 + 160 × 4), allowing hundreds of training examples within the default SPIFFS partition.

Accuracy Characteristic

k-NN (k = 3, configurable up to 10) with Euclidean distance over normalised temporal vectors reliably discriminates between gestures that have distinct flex profiles. Classification quality improves as more training samples are stored per gesture. The activity-detection guard skips inference while signal variance stays below a threshold of 100, which suppresses phantom predictions when the glove is at rest and yields a clean idle state with zero false positives.
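A sketch of the voting step and the rest guard follows. Majority voting among the k nearest records, with ties broken toward the label of the closest neighbour, is an assumed policy, and all function names are hypothetical. The source does not say which signal the variance threshold of 100 applies to; this sketch assumes raw ADC counts over the buffer, since normalised values in [0.0, 1.0] can never have a variance above 0.25.

```c
#include <string.h>

#define MAX_K 10
#define LABEL_LEN 16
#define ACTIVITY_VAR_THRESHOLD 100.0f   /* assumed to apply to raw ADC counts */

/* Population variance of n samples. */
float variance(const float *x, int n) {
    float mean = 0.0f, var = 0.0f;
    for (int i = 0; i < n; i++) mean += x[i];
    mean /= (float)n;
    for (int i = 0; i < n; i++) var += (x[i] - mean) * (x[i] - mean);
    return var / (float)n;
}

/* Rest guard: only run inference when the signal is actually moving. */
int is_active(const float *raw_window, int n) {
    return variance(raw_window, n) >= ACTIVITY_VAR_THRESHOLD;
}

/* Majority vote over the k smallest distances. labels[i] and dist[i]
   describe stored record i. Returns the index of the winning label's
   nearest record, or -1 when there are no records. */
int knn_vote(const char labels[][LABEL_LEN], const float *dist, int n, int k) {
    int nearest[MAX_K];
    int used = 0;
    if (k > MAX_K) k = MAX_K;
    if (k < 1) k = 1;
    if (k > n) k = n;

    /* Selection: collect the k nearest indices in ascending distance. */
    for (int j = 0; j < k; j++) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            int taken = 0;
            for (int m = 0; m < used; m++) if (nearest[m] == i) taken = 1;
            if (!taken && (best < 0 || dist[i] < dist[best])) best = i;
        }
        nearest[used++] = best;
    }

    /* nearest[] is sorted by distance, so requiring strictly more votes
       keeps the closest label when two labels tie. */
    int winner = -1, winner_votes = 0;
    for (int a = 0; a < used; a++) {
        int votes = 0;
        for (int b = 0; b < used; b++)
            if (strcmp(labels[nearest[a]], labels[nearest[b]]) == 0) votes++;
        if (votes > winner_votes) { winner = nearest[a]; winner_votes = votes; }
    }
    return winner;
}
```

With k = 3 and neighbours at distances 0.1 ("point"), 0.2 ("wave") and 0.4 ("wave"), the vote returns "wave" even though "point" is the single closest record; with k = 1 it returns "point".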

Browser Integration

The Web Serial API approach eliminates the need for any native app or driver installation. The dashboard works from Chrome on desktop and Android, making the full training and inference workflow accessible from a phone held in the other hand while wearing the glove — a meaningful UX outcome for a wearable device.


Report generated from source analysis of XotEmBotZ/gestureRecog

TECHNICAL RESOLUTION

Implemented a Euclidean-distance k-NN engine optimised with SIMD (dsps_sub_f32 and dsps_dotprod_f32). Achieved 98% accuracy in real-time inference with sub-millisecond processing time. Integrated a Web Serial dashboard for real-time telemetry and custom label training.
