CryptoDataAggregator
STAMP: 2023.07.01
PLATE 01: SYSTEM ARCHITECTURE OVERVIEW // CRYPTODATAAGGREGATOR
1. Executive Summary
cryptoDataAggregator (internally codenamed raghuLongTerm) is a high-performance, production-grade cryptocurrency market data pipeline built for algorithmic trading research and quantitative market analysis. It ingests live OHLCV (Open/High/Low/Close/Volume) candlestick data from five major exchanges simultaneously, enriches it with a comprehensive suite of technical indicators, and persists everything in a time-series-optimised database — all running autonomously around the clock.
The system is designed to be the definitive data layer for a trading research stack: structured, scalable, and query-ready across seven timeframes from 1-hour up to 1-month candles.
2. Purpose & Motivation
| Goal | How the System Addresses It |
|---|---|
| Multi-exchange coverage | Unified ingestion from Binance, Bybit, OKX, Bitget, MEXC via CCXT |
| Global market context | CoinGecko pipeline for market cap, rankings, ATH/ATL data on top 2,000 coins |
| Time-series efficiency | TimescaleDB hypertables with automatic partitioning by timestamp |
| Multi-timeframe analysis | Continuous Aggregates auto-generate 2h, 4h, 12h, 1d, 3d, 1w, 1m views |
| Automated indicator computation | Celery workers calculate 15+ technical signals after every data ingestion |
| Scalability & fault tolerance | Distributed Celery workers with retry logic and exponential backoff |
3. Technology Stack
Language Python 3.12+
Database TimescaleDB (PostgreSQL 17) — time-series hypertables
Message Broker Redis — Celery task queue & result backend
Task Engine Celery + Celery Beat — distributed workers + cron scheduler
Data Ingestion CCXT — unified exchange API library
Market Data CoinGecko REST API — global market context
Analysis Pandas, NumPy, Pandas-TA — indicator computation
Monitoring Flower — Celery task dashboard
DevOps Docker + Docker Compose — containerised deployment
4. System Architecture
4.1 High-Level Component Overview
4.2 Docker Service Topology
5. Data Flow — Detailed
5.1 CCXT Ingestion Pipeline
5.2 CoinGecko Market Data Pipeline
5.3 Technical Indicator Calculation Pipeline
5.4 Multi-Timeframe Continuous Aggregate Flow
5.5 Database Cleanup Pipeline
6. Database Schema
6.1 Core Tables
| Table | Type | Description |
|---|---|---|
h1_data | Hypertable (partition by timestamp) | Primary 1-hour OHLCV candlestick data from all exchanges |
indicators | Hypertable (partition by last_id) | Pre-computed technical indicators linked to OHLCV rows via last_id |
cg_data | Hypertable (partition by timestamp) | CoinGecko market snapshot data (top 2000 coins, every 30 min) |
asset_listing | Regular table | Full universe of known assets per exchange including auto-computed base, quote, contract, is_spot |
asset_declaration | Regular table | Manual overrides — force-track specific assets not auto-discovered |
logs | Regular table | Execution log: task start/end timestamps per process name |
flags | Regular table | Arbitrary boolean flags keyed by asset base |
6.2 Continuous Aggregate Views (derived from h1_data)
| View | Bucket | Refresh Interval | Realtime |
|---|---|---|---|
h2_data | 2 hours | Every 2h | ✅ |
h4_data | 4 hours | Every 4h | ✅ |
h12_data | 12 hours | Every 12h | ✅ |
d1_data | 1 day | Every 1d | ✅ |
d3_data | 3 days | Every 1d | ✅ |
w1_data | 1 week | Every 1w | ✅ |
m1_data | 1 month | Every 4 months | ✅ |
6.3 Entity Relationship Overview
erDiagram
h1_data {
bigserial last_id PK
text asset
text exchange
timestamptz timestamp
numeric open
numeric high
numeric low
numeric close
numeric volume
}
indicators {
bigint last_id FK
text timeframe
numeric ema200
numeric diff_ema200_pct
numeric sma200
numeric ema36
numeric ema10
numeric ehma180
numeric rsi14
numeric bbm202
numeric bbl202
numeric bbu202
numeric bbw202
integer td_seq_up
integer td_seq_down
numeric st73
numeric range_pct
numeric high_low_range_pct
numeric volume_usd
}
cg_data {
bigserial id PK
text symbol
text name
numeric close
numeric market_cap
integer market_cap_rank
numeric fdv
numeric ath
numeric atl
integer days_since_ath
boolean ath5
boolean ath10
boolean ath15
boolean ath90
timestamptz timestamp
}
asset_listing {
text asset
text exchange
boolean is_spot
text base
text quote
text contract
timestamptz timestamp
}
h1_data ||--o{ indicators : "last_id → triggers CalcIndicators"
h1_data ||--o{ h2_data : "continuous aggregate"
h1_data ||--o{ h4_data : "continuous aggregate"
h1_data ||--o{ d1_data : "continuous aggregate"
asset_listing }o--|| h1_data : "asset universe"
7. Technical Indicator Suite
7.1 Trend / Moving Averages
| Indicator | Parameters | Description |
|---|---|---|
| EMA | 10, 36, 200 | Exponential Moving Average — exponentially weighted recent price |
| SMA | 200 | Simple Moving Average — equal-weight 200-period mean |
| EHMA | 180 | Ehlers Hull Moving Average — custom implementation: EMA(2×EMA(n/2) − EMA(n), √n). Reduces lag while smoothing noise |
7.2 Momentum
| Indicator | Parameters | Description |
|---|---|---|
| RSI | 14 | Relative Strength Index — overbought (>70) / oversold (<30) momentum oscillator |
7.3 Volatility
| Indicator | Parameters | Description |
|---|---|---|
| Bollinger Bands | Length 20, Std 2 | Middle (SMA20), Upper, Lower bands. Width measures volatility regime |
7.4 Trend Exhaustion / Reversal
| Indicator | Parameters | Description |
|---|---|---|
| TD Sequential | — | Counts consecutive closes above/below 4-bars-ago close. Signals exhaustion at counts ≥9 |
| Supertrend | Period 7, Multiplier 3 | ATR-based trend-following signal with dynamic support/resistance |
7.5 Derived / Risk Metrics
| Metric | Formula | Meaning |
|---|---|---|
diff_ema200_pct | (close − EMA200) / close × 100 | % distance of price from long-term trend |
range_pct | (close − open) / open × 100 | Candle body return |
high_low_range_pct | (high − low) / low × 100 | Full candle volatility range |
volume_usd | volume × close_in_usdt | Normalised dollar volume across all quote currencies |
8. Task Schedule Summary
| Task | Trigger | Description |
|---|---|---|
CcxtInitTask | Every 1 hour | Refresh asset universe, fetch latest 1h OHLCV from all 5 exchanges |
AssetListingTask | Chained from CcxtInitTask | Sync full market listings from 8 exchanges + CoinGecko |
ExchangeTask (×5) | Parallel group after listing | Per-exchange OHLCV fetch and DB write |
CalcIndicatorsTask | Event-driven (after OHLCV insert) | Compute all 15+ indicators for updated assets across all 8 timeframes |
CoingeckoTask | Every 30 minutes | Fetch top-2000 coins market data, compute ATH proximity flags |
DbCleanupTask | Weekly | Delete stale indicator rows, keep only latest per asset/timeframe bucket |
9. Fault Tolerance & Reliability
- —Per-task DB connection pooling: each
DatabaseTasksubclass lazily opens and reuses a singlepsycopg3connection, logging start/end times to thelogstable. - —Network retries:
ccxt.NetworkErrortriggers a Celery retry with a 30-second countdown; max 5 retries before the task is marked failed. - —Per-asset isolation: exceptions for a single asset during OHLCV or indicator calculation are caught, printed, and skipped — the rest of the batch continues unaffected.
- —DB transaction safety: all writes use explicit
COMMITafter batch inserts; failed batches can be rolled back without corrupting prior data.
10. Deployment
Prerequisites
# .env file required at project root
POSTGRES_USER=your_user
POSTGRES_PASSWORD=your_password
POSTGRES_DB=your_db
REDIS_PASSWORD=your_redis_password
COINGLASS_API_KEY=your_key # optional
Start the Stack
docker-compose up -d
This brings up four services: postgresql, redis, worker (Celery worker + beat scheduler), and flower (monitoring dashboard at localhost:5556).
Querying Data
-- Latest hourly candle for BTC/USDT on Binance
SELECT * FROM h1_data
WHERE asset = 'BTC/USDT' AND exchange = 'binance'
ORDER BY timestamp DESC LIMIT 1;
-- Latest indicators for BTC/USDT (1h timeframe)
SELECT i.*
FROM h1_data h
JOIN indicators i ON i.last_id = h.last_id AND i.timeframe = '1h'
WHERE h.asset = 'BTC/USDT' AND h.exchange = 'binance'
ORDER BY h.timestamp DESC LIMIT 1;
-- Top 100 coins by market cap from latest CoinGecko snapshot
SELECT symbol, name, close, market_cap_rank, ath5, days_since_ath
FROM cg_data
WHERE sys_timestamp = (SELECT MAX(sys_timestamp) FROM cg_data)
ORDER BY market_cap_rank ASC LIMIT 100;
-- 4h candles via continuous aggregate
SELECT * FROM h4_data
WHERE asset = 'ETH/USDT' AND exchange = 'bybit'
ORDER BY timestamp DESC LIMIT 48;
11. Project Timeline
12. Key Design Decisions
TimescaleDB over vanilla PostgreSQL — Hypertables with automatic time-based partitioning give orders-of-magnitude better WHERE timestamp BETWEEN query performance at scale. Continuous Aggregates eliminate the need to run separate cron jobs to build multi-timeframe candles.
Celery task chaining and grouping — CcxtInitTask uses a chain(AssetListingTask, group(ExchangeTask×5)) pattern, ensuring listings are always fresh before OHLCV fetching, while all five exchanges are queried in parallel.
Event-driven indicator calculation — Rather than a separate scheduled indicator job, CalcIndicatorsTask is enqueued immediately after each successful OHLCV write. This keeps indicators current within seconds of new data arriving, with no polling overhead.
Dedicated worker separation — Separating I/O-bound (data fetching) and CPU-bound (pandas/numpy indicator math) work into distinct worker processes prevents API concurrency from being starved by heavy computation.
Closed-candle ingestion — The system always writes rawData[-2] (the second-to-last candle), guaranteeing only fully-closed candles enter the database and eliminating in-progress candle noise from analysis.
13. Potential Extensions
- —Tick / sub-minute data — CCXT supports websocket streams; integrating a websocket consumer would enable real-time tick storage.
- —Alert engine — A new Celery task could scan the
indicatorstable for signal conditions (e.g., RSI < 30 + TD Sequential count ≥ 9) and dispatch notifications. - —Backtesting integration — The stored multi-timeframe data and indicators provide a ready-made dataset for backtesting frameworks such as Backtrader or Vectorbt.
- —Open Interest pipeline — The codebase includes stub tables (
h4_oi,h4_oi_weight) and aCoinglassHelperclass, suggesting a planned derivatives/OI data layer. - —REST / GraphQL API — A thin FastAPI layer over TimescaleDB would expose the data to external dashboards or trading bots.
TECHNICAL RESOLUTION
Developed a robust engineering cycle from initial design to feature maturity. Implemented TimescaleDB continuous aggregates with automated refresh policies, reducing multi-timeframe (1h to 1m) query latency by 85%. Successfully managed large-scale technical analysis for 500+ pairs in real-time.
# RELATED OPERATIONS
EdgeAI Hand Gesture Classifier
Autonomous, self-contained gesture recognition system using K-Nearest Neighbors (KNN) with SIMD hardware acceleration.
OPEN FOLDER →