Surjyadip Sen | Field Notes
FILE: OP_CRYPSTATUS: ARCHIVED

CryptoDataAggregator

STAMP: 2023.07.01

CryptoDataAggregator

PLATE 01: SYSTEM ARCHITECTURE OVERVIEW // CRYPTODATAAGGREGATOR

1. Executive Summary

cryptoDataAggregator (internally codenamed raghuLongTerm) is a high-performance, production-grade cryptocurrency market data pipeline built for algorithmic trading research and quantitative market analysis. It ingests live OHLCV (Open/High/Low/Close/Volume) candlestick data from five major exchanges simultaneously, enriches it with a comprehensive suite of technical indicators, and persists everything in a time-series-optimised database — all running autonomously around the clock.

The system is designed to be the definitive data layer for a trading research stack: structured, scalable, and query-ready across seven timeframes from 1-hour up to 1-month candles.


2. Purpose & Motivation

GoalHow the System Addresses It
Multi-exchange coverageUnified ingestion from Binance, Bybit, OKX, Bitget, MEXC via CCXT
Global market contextCoinGecko pipeline for market cap, rankings, ATH/ATL data on top 2,000 coins
Time-series efficiencyTimescaleDB hypertables with automatic partitioning by timestamp
Multi-timeframe analysisContinuous Aggregates auto-generate 2h, 4h, 12h, 1d, 3d, 1w, 1m views
Automated indicator computationCelery workers calculate 15+ technical signals after every data ingestion
Scalability & fault toleranceDistributed Celery workers with retry logic and exponential backoff

3. Technology Stack

Language        Python 3.12+
Database        TimescaleDB (PostgreSQL 17)  — time-series hypertables
Message Broker  Redis                        — Celery task queue & result backend
Task Engine     Celery + Celery Beat         — distributed workers + cron scheduler
Data Ingestion  CCXT                         — unified exchange API library
Market Data     CoinGecko REST API           — global market context
Analysis        Pandas, NumPy, Pandas-TA     — indicator computation
Monitoring      Flower                       — Celery task dashboard
DevOps          Docker + Docker Compose      — containerised deployment


4. System Architecture

4.1 High-Level Component Overview

4.2 Docker Service Topology


5. Data Flow — Detailed

5.1 CCXT Ingestion Pipeline

5.2 CoinGecko Market Data Pipeline

5.3 Technical Indicator Calculation Pipeline

5.4 Multi-Timeframe Continuous Aggregate Flow

5.5 Database Cleanup Pipeline


6. Database Schema

6.1 Core Tables

TableTypeDescription
h1_dataHypertable (partition by timestamp)Primary 1-hour OHLCV candlestick data from all exchanges
indicatorsHypertable (partition by last_id)Pre-computed technical indicators linked to OHLCV rows via last_id
cg_dataHypertable (partition by timestamp)CoinGecko market snapshot data (top 2000 coins, every 30 min)
asset_listingRegular tableFull universe of known assets per exchange including auto-computed base, quote, contract, is_spot
asset_declarationRegular tableManual overrides — force-track specific assets not auto-discovered
logsRegular tableExecution log: task start/end timestamps per process name
flagsRegular tableArbitrary boolean flags keyed by asset base

6.2 Continuous Aggregate Views (derived from h1_data)

ViewBucketRefresh IntervalRealtime
h2_data2 hoursEvery 2h
h4_data4 hoursEvery 4h
h12_data12 hoursEvery 12h
d1_data1 dayEvery 1d
d3_data3 daysEvery 1d
w1_data1 weekEvery 1w
m1_data1 monthEvery 4 months

6.3 Entity Relationship Overview

erDiagram
    h1_data {
        bigserial last_id PK
        text asset
        text exchange
        timestamptz timestamp
        numeric open
        numeric high
        numeric low
        numeric close
        numeric volume
    }

    indicators {
        bigint last_id FK
        text timeframe
        numeric ema200
        numeric diff_ema200_pct
        numeric sma200
        numeric ema36
        numeric ema10
        numeric ehma180
        numeric rsi14
        numeric bbm202
        numeric bbl202
        numeric bbu202
        numeric bbw202
        integer td_seq_up
        integer td_seq_down
        numeric st73
        numeric range_pct
        numeric high_low_range_pct
        numeric volume_usd
    }

    cg_data {
        bigserial id PK
        text symbol
        text name
        numeric close
        numeric market_cap
        integer market_cap_rank
        numeric fdv
        numeric ath
        numeric atl
        integer days_since_ath
        boolean ath5
        boolean ath10
        boolean ath15
        boolean ath90
        timestamptz timestamp
    }

    asset_listing {
        text asset
        text exchange
        boolean is_spot
        text base
        text quote
        text contract
        timestamptz timestamp
    }

    h1_data ||--o{ indicators : "last_id → triggers CalcIndicators"
    h1_data ||--o{ h2_data : "continuous aggregate"
    h1_data ||--o{ h4_data : "continuous aggregate"
    h1_data ||--o{ d1_data : "continuous aggregate"
    asset_listing }o--|| h1_data : "asset universe"


7. Technical Indicator Suite

7.1 Trend / Moving Averages

IndicatorParametersDescription
EMA10, 36, 200Exponential Moving Average — exponentially weighted recent price
SMA200Simple Moving Average — equal-weight 200-period mean
EHMA180Ehlers Hull Moving Average — custom implementation: EMA(2×EMA(n/2) − EMA(n), √n). Reduces lag while smoothing noise

7.2 Momentum

IndicatorParametersDescription
RSI14Relative Strength Index — overbought (>70) / oversold (<30) momentum oscillator

7.3 Volatility

IndicatorParametersDescription
Bollinger BandsLength 20, Std 2Middle (SMA20), Upper, Lower bands. Width measures volatility regime

7.4 Trend Exhaustion / Reversal

IndicatorParametersDescription
TD SequentialCounts consecutive closes above/below 4-bars-ago close. Signals exhaustion at counts ≥9
SupertrendPeriod 7, Multiplier 3ATR-based trend-following signal with dynamic support/resistance

7.5 Derived / Risk Metrics

MetricFormulaMeaning
diff_ema200_pct(close − EMA200) / close × 100% distance of price from long-term trend
range_pct(close − open) / open × 100Candle body return
high_low_range_pct(high − low) / low × 100Full candle volatility range
volume_usdvolume × close_in_usdtNormalised dollar volume across all quote currencies

8. Task Schedule Summary

TaskTriggerDescription
CcxtInitTaskEvery 1 hourRefresh asset universe, fetch latest 1h OHLCV from all 5 exchanges
AssetListingTaskChained from CcxtInitTaskSync full market listings from 8 exchanges + CoinGecko
ExchangeTask (×5)Parallel group after listingPer-exchange OHLCV fetch and DB write
CalcIndicatorsTaskEvent-driven (after OHLCV insert)Compute all 15+ indicators for updated assets across all 8 timeframes
CoingeckoTaskEvery 30 minutesFetch top-2000 coins market data, compute ATH proximity flags
DbCleanupTaskWeeklyDelete stale indicator rows, keep only latest per asset/timeframe bucket

9. Fault Tolerance & Reliability

  • Per-task DB connection pooling: each DatabaseTask subclass lazily opens and reuses a single psycopg3 connection, logging start/end times to the logs table.
  • Network retries: ccxt.NetworkError triggers a Celery retry with a 30-second countdown; max 5 retries before the task is marked failed.
  • Per-asset isolation: exceptions for a single asset during OHLCV or indicator calculation are caught, printed, and skipped — the rest of the batch continues unaffected.
  • DB transaction safety: all writes use explicit COMMIT after batch inserts; failed batches can be rolled back without corrupting prior data.

10. Deployment

Prerequisites

# .env file required at project root
POSTGRES_USER=your_user
POSTGRES_PASSWORD=your_password
POSTGRES_DB=your_db
REDIS_PASSWORD=your_redis_password
COINGLASS_API_KEY=your_key   # optional

Start the Stack

docker-compose up -d

This brings up four services: postgresql, redis, worker (Celery worker + beat scheduler), and flower (monitoring dashboard at localhost:5556).

Querying Data

-- Latest hourly candle for BTC/USDT on Binance
SELECT * FROM h1_data
WHERE asset = 'BTC/USDT' AND exchange = 'binance'
ORDER BY timestamp DESC LIMIT 1;

-- Latest indicators for BTC/USDT (1h timeframe)
SELECT i.*
FROM h1_data h
JOIN indicators i ON i.last_id = h.last_id AND i.timeframe = '1h'
WHERE h.asset = 'BTC/USDT' AND h.exchange = 'binance'
ORDER BY h.timestamp DESC LIMIT 1;

-- Top 100 coins by market cap from latest CoinGecko snapshot
SELECT symbol, name, close, market_cap_rank, ath5, days_since_ath
FROM cg_data
WHERE sys_timestamp = (SELECT MAX(sys_timestamp) FROM cg_data)
ORDER BY market_cap_rank ASC LIMIT 100;

-- 4h candles via continuous aggregate
SELECT * FROM h4_data
WHERE asset = 'ETH/USDT' AND exchange = 'bybit'
ORDER BY timestamp DESC LIMIT 48;


11. Project Timeline


12. Key Design Decisions

TimescaleDB over vanilla PostgreSQL — Hypertables with automatic time-based partitioning give orders-of-magnitude better WHERE timestamp BETWEEN query performance at scale. Continuous Aggregates eliminate the need to run separate cron jobs to build multi-timeframe candles.

Celery task chaining and groupingCcxtInitTask uses a chain(AssetListingTask, group(ExchangeTask×5)) pattern, ensuring listings are always fresh before OHLCV fetching, while all five exchanges are queried in parallel.

Event-driven indicator calculation — Rather than a separate scheduled indicator job, CalcIndicatorsTask is enqueued immediately after each successful OHLCV write. This keeps indicators current within seconds of new data arriving, with no polling overhead.

Dedicated worker separation — Separating I/O-bound (data fetching) and CPU-bound (pandas/numpy indicator math) work into distinct worker processes prevents API concurrency from being starved by heavy computation.

Closed-candle ingestion — The system always writes rawData[-2] (the second-to-last candle), guaranteeing only fully-closed candles enter the database and eliminating in-progress candle noise from analysis.


13. Potential Extensions

  • Tick / sub-minute data — CCXT supports websocket streams; integrating a websocket consumer would enable real-time tick storage.
  • Alert engine — A new Celery task could scan the indicators table for signal conditions (e.g., RSI < 30 + TD Sequential count ≥ 9) and dispatch notifications.
  • Backtesting integration — The stored multi-timeframe data and indicators provide a ready-made dataset for backtesting frameworks such as Backtrader or Vectorbt.
  • Open Interest pipeline — The codebase includes stub tables (h4_oi, h4_oi_weight) and a CoinglassHelper class, suggesting a planned derivatives/OI data layer.
  • REST / GraphQL API — A thin FastAPI layer over TimescaleDB would expose the data to external dashboards or trading bots.

TECHNICAL RESOLUTION

Developed a robust engineering cycle from initial design to feature maturity. Implemented TimescaleDB continuous aggregates with automated refresh policies, reducing multi-timeframe (1h to 1m) query latency by 85%. Successfully managed large-scale technical analysis for 500+ pairs in real-time.

# RELATED OPERATIONS

OP_EDGE // ARCHIVED

EdgeAI Hand Gesture Classifier

Autonomous, self-contained gesture recognition system using K-Nearest Neighbors (KNN) with SIMD hardware acceleration.

OPEN FOLDER
OP_HOME // ARCHIVED

Home-Lab

just a lab

OPEN FOLDER