Welcome to PardoX
The Speed of Rust. The Simplicity of Python.
PardoX is a high-performance DataFrame engine designed for modern data engineering. It combines the safety and speed of a Rust Core with the ease of use of Python, PHP, and Node.js SDKs — allowing you to process massive datasets and integrate with any database without learning a new language.
🚀 Why PardoX?
- Zero-Copy Architecture: Data flows directly from disk or database into memory-mapped Rust buffers — no Python objects, no intermediate copies.
- SIMD Acceleration: Mathematical operations use AVX2/NEON CPU instructions for 5x–20x speedups vs. Python loops.
- Universal Compatibility: Runs natively on Windows, Linux, and macOS (Intel & Apple Silicon).
- Native Database Engine: Connect to PostgreSQL, MySQL, SQL Server, and MongoDB entirely through Rust — no Python database drivers required.
- Multi-SDK: A single Rust core powers identical APIs in Python, Node.js, and PHP.
- Native Format: The .prdx binary format enables ~4.6 GB/s read throughput for repeated workloads.
📚 Documentation
🏁 Getting Started
- Installation — Setup guide for Python, Node.js, and PHP.
- Quick Start — Build your first ETL pipeline in 5 minutes.
- Roadmap — What’s coming in v0.4 and beyond.
📘 User Guide
- Input / Output — Multi-threaded CSV, native .prdx format, Apache Arrow bridge.
- Databases — PostgreSQL, MySQL, SQL Server, MongoDB — read, write, and execute.
- Data Mutation — Vectorized arithmetic, type casting, sorting, data cleaning.
- Aggregations & Observer — Metrics, statistics, value counts, and full-DataFrame export.
- GPU Acceleration — GPU Bitonic sort and CPU fallback.
- ML Integration — Zero-copy NumPy bridge and Scikit-Learn compatibility.
⚙️ API Reference
- Full Reference — Detailed documentation of all classes, functions, and methods.
- FFI Exports Reference — All 181 C-ABI functions exported by the Rust core across 5 crates. Use this to build custom bindings or validate SDK integrations.
📂 Base Knowledge
- Base Knowledge — Validation scripts for all 30 feature gaps across Python, Node.js, and PHP SDKs.
  - Python validation scripts — 29 files (validate_gap1_sdk.py → validate_gap30_sdk.py)
  - Node.js validation scripts — 19 files (validate_gap1_sdk.js → validate_gap30_sdk.js)
  - PHP validation scripts — 29 files (validate_gap1_sdk.php → validate_gap30_sdk.php)
📘 SDK Documentation
- SDK Documentation — In-depth guides for each SDK.
  - Python SDK — API reference and v0.3.2 features
  - Node.js SDK — API reference and v0.3.2 features
  - PHP SDK — API reference and v0.3.2 features
  - Database Integration — PostgreSQL, MySQL, SQL Server, MongoDB across all SDKs
  - Universality — Cross-SDK design philosophy
📓 Examples & Notebooks
- Jupyter Notebooks — Interactive examples and real-world ETL scenarios.
- Benchmark Scripts — Processing 640 million rows and transforming data to .prdx format.
📦 Quick Install
pip install pardox
What’s New in v0.3.4
| Pillar | What was added |
|---|---|
| SQL Cursor API (Gap 30) | query_to_results(conn, query, batch_size) — streaming iterator over PostgreSQL results yielding DataFrame batches with O(batch) RAM. sql_to_parquet(conn, query, pattern, chunk_size) — stream SQL → PardoX binary files using {i} pattern. Validated: 3 SDKs × 11/11 tests. Requested by GitHub @Prussian1870 |
| 30 Gaps Total | Gap 30 (SQL Cursor API) added to all 3 SDKs — Python, JavaScript, PHP |
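
The O(batch) memory behaviour described for the SQL Cursor API can be illustrated with a generic batching iterator. The sketch below is plain Python and does not call PardoX: `iter_batches`, `fake_rows`, and the `chunk_{i}.prdx` file name are hypothetical names chosen for illustration, and expanding the `{i}` placeholder with `str.format` is an assumption about how such a pattern might work, not confirmed PardoX behaviour.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]


def iter_batches(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield lists of at most batch_size rows, holding one batch at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        # islice pulls only the next batch_size rows from the source.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def fake_rows(n: int) -> Iterator[Row]:
    """Hypothetical stand-in for a server-side cursor: rows are produced lazily."""
    for i in range(n):
        yield (i, f"row-{i}")


# Assumed naming scheme: the {i} placeholder becomes the batch index.
pattern = "chunk_{i}.prdx"
batch_sizes = []
names = []
for i, batch in enumerate(iter_batches(fake_rows(10), 4)):
    batch_sizes.append(len(batch))
    names.append(pattern.format(i=i))

print(batch_sizes)  # [4, 4, 2]
print(names)        # ['chunk_0.prdx', 'chunk_1.prdx', 'chunk_2.prdx']
```

Because each batch is built from a lazy iterator and released before the next one is read, peak memory in this sketch depends on `batch_size` rather than on the total number of rows.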
What’s New in v0.3.3
| Pillar | What was added |
|---|---|
| SQL Cursor API — Rust Core | SqlCursor struct with server-side PostgreSQL DECLARE ... NO SCROLL CURSOR. 5 new FFI exports: pardox_scan_sql_cursor_open, pardox_scan_sql_cursor_fetch, pardox_scan_sql_cursor_offset, pardox_scan_sql_cursor_close, pardox_scan_sql_to_parquet. Zero warnings, zero errors |
What’s New in v0.3.2
| Pillar | What was added |
|---|---|
| PRDX Streaming to PostgreSQL | write_sql_prdx() — stream any .prdx file directly to PostgreSQL via COPY FROM STDIN with O(block) RAM. Validated: 150M rows / 3.8 GB in ~490s at ~300k rows/s (Python/JS) |
| Gaps 1–5 — All SDKs | GroupBy, String & Date ops, Decimal type, Window functions, Lazy pipeline — validated across Python, JavaScript, and PHP SDKs |
| Gaps 7–14 — Python | GPU compute, Pivot & Melt, Time Series Fill, Nested Data (JSON), Spill to Disk, Universal Loader (PRDX), SQL over DataFrames |
| Gaps 15–29 — Python | Cloud Storage, Live Query, WebAssembly, Encryption, Data Contracts, Time Travel, Arrow Flight, Distributed Cluster, Linear Algebra, REST Connector |
| VAP31 & VAP32 | CSV→PostgreSQL and PRDX→PostgreSQL integrations validated in 3 SDKs |
| 29 Gaps Total | All 29 feature gaps from the original roadmap implemented in the Rust core |
| FFI Reference | Complete documentation of all 181 C-ABI exports across 5 crates |
What’s New in v0.3.1
| Pillar | What was added |
|---|---|
| Relational Conqueror | Native read/write/execute for PostgreSQL, MySQL, SQL Server, MongoDB via Rust drivers |
| The Observer | to_dict(), to_json(), value_counts(), unique() — full-DataFrame export with proper heap memory management |
| Native Math | df.add(), df.sub(), df.std(), df.min_max_scale(), df.sort_values() — pure Rust arithmetic |
| GPU Awakening | sort_values(gpu=True) — WebGPU Bitonic sort with automatic CPU fallback |
| ML Integration | Zero-copy NumPy bridge via __array__ protocol — direct pointer into Rust buffer |
| PHP & Node.js SDKs | Full parity with Python SDK across all new features |
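
For readers unfamiliar with the bitonic sort named in the GPU Awakening row, here is a minimal CPU sketch of the algorithm in plain Python. It is not the PardoX implementation: the function name `bitonic_sort` and the power-of-two length restriction are assumptions made to keep the example short. The algorithm suits GPUs because every compare-and-swap within one pass of the inner `for` loop is independent of the others.

```python
from typing import List, Sequence


def bitonic_sort(values: Sequence[float]) -> List[float]:
    """Return values sorted ascending using an iterative bitonic network.

    This sketch requires len(values) to be a power of two (or zero).
    """
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:         # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks alternate direction until the final merge.
                    ascending = (i & k) == 0
                    out_of_order = a[i] > a[partner] if ascending else a[i] < a[partner]
                    if out_of_order:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


result = bitonic_sort([7, 3, 9, 1, 4, 8, 2, 6])
print(result)  # [1, 2, 3, 4, 6, 7, 8, 9]
```

A production version would typically pad inputs to the next power of two or fall back to another sort, which is one plausible reason to pair the approach with a CPU fallback.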
Open Source Project distributed under the MIT License.
More info: www.pardox.io