Industrial ETL pipeline for financial data processing — built in Python with TDD, SOLID principles, and hexagonal architecture.

    EXTRACT    ← CSV files | Yahoo Finance REST API (live) | PostgreSQL
        ↓ DataFrame chunk (10,000 rows)
    CLEAN      DataCleaner: remove invalid amounts and unknown currencies
    TRANSFORM  CurrencyNormalizer: USD / GBP / CHF → EUR
               Deduplicator: remove duplicate transaction IDs
               DateStandardizer: normalize dates to ISO 8601
        ↓ clean DataFrame chunk
    VALIDATE   QualityEngine (5 rules)
               ├─ NotNullRule         required fields must not be null
               ├─ PositiveAmountRule  amount must be > 0
               ├─ ValidCurrencyRule   currency must be EUR/USD/GBP/CHF
               ├─ NoDuplicateRule     transaction id must be unique
               └─ DateRangeRule       date within acceptable range
               → quality score ≥ 80% to proceed
        ↓ validated DataFrame chunk
    LOAD       PostgreSQLStorage: bulk insert via psycopg2 execute_values
               ON CONFLICT DO NOTHING: idempotent runs
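
The VALIDATE step can be illustrated with a small, dependency-free sketch. It is not the project's actual `QualityEngine`: rows are plain dictionaries standing in for a DataFrame chunk, and the column names, the date window (`MIN_DATE` to today), and the scoring formula (the mean of the five per-rule pass rates) are all assumptions, since the description above only names the rules and the 80% threshold.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List

Row = Dict[str, object]  # one transaction; stands in for a DataFrame row

# Assumed column names and date window; the real schema may differ.
REQUIRED_FIELDS = ("transaction_id", "amount", "currency", "date")
ALLOWED_CURRENCIES = {"EUR", "USD", "GBP", "CHF"}
MIN_DATE = date(2000, 1, 1)


@dataclass(frozen=True)
class Rule:
    """A named check that returns one pass/fail flag per row of a chunk."""
    name: str
    check: Callable[[List[Row]], List[bool]]


def _per_row(predicate: Callable[[Row], bool]) -> Callable[[List[Row]], List[bool]]:
    # Lift a single-row predicate to a whole-chunk check.
    return lambda rows: [predicate(row) for row in rows]


def _is_positive(row: Row) -> bool:
    amount = row.get("amount")
    return isinstance(amount, (int, float)) and amount > 0


def _in_date_range(row: Row) -> bool:
    try:
        parsed = date.fromisoformat(row.get("date"))  # expects ISO 8601 text
    except (TypeError, ValueError):
        return False
    return MIN_DATE <= parsed <= date.today()


def _first_occurrence(rows: List[Row]) -> List[bool]:
    # The first row with a given transaction_id passes; later repeats fail.
    seen = set()
    flags = []
    for row in rows:
        tid = row.get("transaction_id")
        flags.append(tid not in seen)
        seen.add(tid)
    return flags


RULES = [
    Rule("NotNullRule", _per_row(
        lambda r: all(r.get(f) is not None for f in REQUIRED_FIELDS))),
    Rule("PositiveAmountRule", _per_row(_is_positive)),
    Rule("ValidCurrencyRule", _per_row(
        lambda r: r.get("currency") in ALLOWED_CURRENCIES)),
    Rule("NoDuplicateRule", _first_occurrence),
    Rule("DateRangeRule", _per_row(_in_date_range)),
]


def quality_score(rows: List[Row]) -> float:
    """Assumed formula: mean of each rule's pass rate over the chunk."""
    if not rows:
        return 0.0
    rates = [sum(rule.check(rows)) / len(rows) for rule in RULES]
    return sum(rates) / len(rates)


def may_proceed(rows: List[Row], threshold: float = 0.80) -> bool:
    """Gate the LOAD step on the quality score, as in the diagram."""
    return quality_score(rows) >= threshold
```

With this formula, a chunk of four rows in which one row has a negative amount, one repeats a transaction ID, and one uses an unsupported currency scores 0.85 and is allowed through. A stricter design would score the fraction of rows that pass every rule, which would give 0.25 for the same chunk and block the load.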