172 million rows per second. 5 queries. 167M rows. Under 1 second — on a single machine. I connect directly to your S3, Azure Blob, or GCP bucket and deliver results — no cluster, no vendor lock-in, no $15k/month bill.
48 Apache Parquet files — every NYC Yellow Cab trip from January 2022 through December 2025. Five analytical queries. Cold NVMe. One consumer workstation. No JVM. No executor heap. No cluster. No warmup.
| Query | Description | Time |
|---|---|---|
| Q1 — Row Count | COUNT(*) per year across all 4 years | 29ms |
| Q2 — Fare & Distance YoY | 6 aggregates (avg fare, distance, tip, total, passengers, trips) per year · 143M filtered rows · full column decompression | 401ms |
| Q3 — Monthly Pivot | Trip volume by year×month — 48 cells · native DuckDB PIVOT · 2025 hottest year on record | 312ms |
| Q4 — Payment Type Shift | Cash collapse: 19.6% → 9.6% · Credit card peak 2023 at 77.9% · full 4-year scan | 158ms |
| Q5 — CBD Congestion Fee | 2025-only schema column · NYC congestion pricing live Jan 2025 · 72.8% of trips charged · $25.03M captured | 64ms |
| Total | 167,858,646 rows · 48 Parquet files · ~685M row-scans across all 5 queries | 971ms |
DuckDB here is CPU- and NVMe-bound. The workstation number is measured and verified. Bare-metal and cloud NVMe figures are informed estimates based on published hardware specs — not yet run.
| Configuration | Hardware | DuckDB Throughput | Est. Monthly Cost |
|---|---|---|---|
| Workstation (measured ✓) | Intel Ultra 7 265KF · NVMe · 64GB RAM | 172M rows/sec · 971ms | $0 (owned) |
| Hetzner AX102 bare-metal | AMD Ryzen 9950X · NVMe · 192GB RAM | ~200–250M rows/sec (est.) | ~$250/mo |
| AWS i4i.4xlarge | Intel Xeon · 3.75TB local NVMe SSD · 128GB RAM | ~250–350M rows/sec (est.) | ~$900/mo on-demand |
| Databricks cluster | Spark · JVM heap · DBU licensing · S3 | ~6–40M rows/sec (measured Spark baseline: 6.88M) | $5,000–$25,000/mo |
Workstation result is verified — measured 2026-04-08 on local NVMe, cold reads, no warmup. Hetzner and AWS i4i estimates based on published NVMe throughput specs vs measured workstation baseline. Spark baseline of 6.88M rows/sec measured on identical workstation hardware, 50GB heap pre-warmed.
Queried live from 48 Parquet files by DuckDB. 167,858,646 rows. Four charts. All computed on a single workstation in 971ms.
Data: NYC TLC Trip Record Data (open dataset) · Engine: DuckDB 1.4.4 · Hardware: Intel Ultra 7 265KF · NVMe
You grant scoped read access to your cloud storage. I run DuckDB on a high-performance node, deliver results as Parquet, CSV, or a live dashboard — then recommend the right long-term architecture for your data volume and budget.
Most analytics workloads at mid-market companies fit on a single modern server. DuckDB processes columnar Parquet in-process — no serialization, no cluster coordination, no $15k/month bill.
| Capability | Databricks / Spark | DuckDB (single node) |
|---|---|---|
| 685M row scan + 5 queries | $8–40 cluster cost per run | $0 — local process |
| Read from S3 / Azure / GCP | Yes (cluster required) | Yes — httpfs / azure / gcs extension, no cluster |
| Monthly platform cost | $3,000–$25,000+ | $20–200 (VPS or local workstation) |
| Time to first query | 5–15 min (cluster startup) | < 1 second |
| SQL compatibility | SparkSQL (HiveQL dialect) | Standard SQL + PIVOT, ASOF JOIN, LIST agg, UNPIVOT |
| Python / Streamlit integration | PySpark (heavyweight) | Native Python API — duckdb.query(sql).df() |
| Operational complexity | Cluster mgmt, DBUs, autoscale, networking | Zero — one binary, embed anywhere |
skr8tr is a sovereign, masterless distributed systems framework built on post-quantum cryptography. Every node authenticates via ML-DSA-65 signed tokens — no certificate authorities, no central broker, no single point of failure. Commands propagate across a UDP mesh, each packet carrying a post-quantum signature verified on arrival. Designed from first principles for air-gapped, regulated, and adversarial environments where you cannot afford to trust the network.
Five queries. Real data from the NYC TLC open dataset. No proprietary runtime. Run it yourself.
Full benchmark: scripts/benchmark_nyc_4year.sh
Remote consulting. Fixed-scope engagements. AWS, Azure, and GCP. Results you can verify.
Review your Databricks / Synapse / BigQuery spend. Identify workloads that move to DuckDB on a single node. Written cost-reduction plan within 48 hours.
End-to-end columnar pipeline — ingest from S3, Azure Blob, or GCP Storage, transform with DuckDB SQL, write results back to your bucket. Reproducible scripts you own.
Interactive analytics dashboard backed by DuckDB. Reads live from your cloud storage. Deployed on a $20/month VPS or your existing infra. Zero cluster dependency.
S3 + Glue + Athena + DuckDB hybrid pipelines. Leveraging AWS Solutions Architect experience to build data lakes that scale without surprise bills.
ADLS Gen2, Azure Blob, Synapse, and DuckDB integration. Migrate heavyweight Spark jobs to single-node DuckDB where the data volume allows it.
GCP Cloud Storage + DuckDB pipelines. BigQuery cost reduction — identify queries that run faster and cheaper outside BigQuery on a local DuckDB node.
Port SparkSQL jobs to standard DuckDB SQL. Remove cluster dependency for batch workloads. Works with AWS EMR, Azure HDInsight, and GCP Dataproc migrations.
Async review of your current data pipelines — any cloud. Written recommendations with specific, actionable improvements. Delivered in 48 hours.
Audit your S3 data lake for workloads that belong on NVMe. Design the hot/cold split — NVMe for compute, object storage for archival. Get DuckDB throughput you can actually feel.
Every file I deliver — Parquet, CSV, report — is signed with ML-DSA-65 (NIST FIPS 204), the post-quantum digital signature standard. You get the data file, a detached .sig file, and a one-command verifier. Run it against my public key and know the file is authentic and untampered.
Designed for healthcare, finance, and government workloads that need provable chain-of-custody today and quantum resistance tomorrow.
All engagements start with a free 30-minute scoping call.
I respond to all serious inquiries within 24 hours.
GitHub: NixOSDude/DuckDB_Master