Welcome to PardoX

The Speed of Rust. The Simplicity of Python.

PardoX is a high-performance DataFrame engine designed for modern data engineering. It combines the safety and speed of a Rust core with the ease of use of Python, PHP, and Node.js SDKs, so you can process massive datasets and connect to PostgreSQL, MySQL, SQL Server, and MongoDB without learning a new language.

🚀 Why PardoX?

Zero-Copy Architecture: Data flows directly from disk or database into memory-mapped Rust buffers — no Python objects, no intermediate copies.
SIMD Acceleration: Mathematical operations use AVX2/NEON CPU instructions for 5x–20x speedups vs. Python loops.
Universal Compatibility: Runs natively on Windows, Linux, and macOS (Intel & Apple Silicon).
Native Database Engine: Connect to PostgreSQL, MySQL, SQL Server, and MongoDB entirely through Rust — no Python database drivers required.
Multi-SDK: A single Rust core powers identical APIs in Python, Node.js, and PHP.
Native Format: The .prdx binary format enables ~4.6 GB/s read throughput for repeated workloads.
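
To make the "zero-copy" idea above concrete, here is a minimal sketch that uses only the Python standard library. It is not PardoX code and does not reflect how the Rust core is implemented; the helper name `make_window` is hypothetical. It only shows what it means for a view to share memory with a buffer instead of copying it.

```python
from array import array


def make_window(buf: array, start: int, stop: int) -> memoryview:
    """Return a view over buf[start:stop] that shares memory with buf.

    Hypothetical helper for illustration only; not part of PardoX.
    """
    # Slicing a memoryview creates another view, not a copy of the data.
    return memoryview(buf)[start:stop]


# A contiguous buffer of 64-bit floats, standing in for a column of data.
column = array("d", [1.0, 2.0, 3.0, 4.0])
window = make_window(column, 1, 3)

column[1] = 20.0  # write through the original buffer
window[1] = 30.0  # write through the view (index 1 of the view is index 2 of the buffer)

print(list(window))     # [20.0, 30.0]
print(column.tolist())  # [1.0, 20.0, 30.0, 4.0]
```

Both writes are visible from either side because the view and the buffer point at the same bytes. A copy-based design would need to duplicate the data before handing it to the caller.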

📚 Documentation

🏁 Getting Started

Installation — Setup guide for Python, Node.js, and PHP.
Quick Start — Build your first ETL pipeline in 5 minutes.
Roadmap — What’s coming in v0.4 and beyond.

📘 User Guide

Input / Output — Multi-threaded CSV, native .prdx format, Apache Arrow bridge.
Databases — PostgreSQL, MySQL, SQL Server, MongoDB — read, write, and execute.
Data Mutation — Vectorized arithmetic, type casting, sorting, data cleaning.
Aggregations & Observer — Metrics, statistics, value counts, and full-DataFrame export.
GPU Acceleration — GPU Bitonic sort with automatic CPU fallback.
ML Integration — Zero-copy NumPy bridge and Scikit-Learn compatibility.

⚙️ API Reference

Full Reference — Detailed documentation of all classes, functions, and methods.
FFI Exports Reference — All 181 C-ABI functions exported by the Rust core across 5 crates. Use this to build custom bindings or validate SDK integrations.

📂 Base Knowledge

Base Knowledge — Validation scripts for all 30 feature gaps across Python, Node.js, and PHP SDKs.
- Python validation scripts — 29 files (validate_gap1_sdk.py → validate_gap30_sdk.py)
- Node.js validation scripts — 19 files (validate_gap1_sdk.js → validate_gap30_sdk.js)
- PHP validation scripts — 29 files (validate_gap1_sdk.php → validate_gap30_sdk.php)

📘 SDK Documentation

SDK Documentation — In-depth guides for each SDK.
- Python SDK — API reference and v0.3.2 features
- Node.js SDK — API reference and v0.3.2 features
- PHP SDK — API reference and v0.3.2 features
- Database Integration — PostgreSQL, MySQL, SQL Server, MongoDB across all SDKs
- Universality — Cross-SDK design philosophy

📓 Examples & Notebooks

Jupyter Notebooks — Interactive examples and real-world ETL scenarios.
Benchmark Scripts — Processing 640 million rows and transforming data to .prdx format.

📦 Quick Install

pip install pardox

What’s New in v0.3.4

Pillar	What was added
SQL Cursor API (Gap 30)	`query_to_results(conn, query, batch_size)`: a streaming iterator over PostgreSQL results that yields `DataFrame` batches using O(batch) RAM. `sql_to_parquet(conn, query, pattern, chunk_size)`: streams SQL results to PardoX binary files using the `{i}` pattern. Validated: 3 SDKs × 11/11 tests. Requested by GitHub user @Prussian1870.
30 Gaps Total	Gap 30 (SQL Cursor API) added to all 3 SDKs — Python, JavaScript, PHP
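
The batching behaviour described for the SQL Cursor API can be sketched in general terms. The example below does not call PardoX: it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, and `iter_batches` and `chunk_path` are hypothetical helpers. It also assumes that `{i}` in a file pattern is replaced by the chunk index, which the table does not spell out.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_batches(conn: sqlite3.Connection, query: str,
                 batch_size: int) -> Iterator[List[Tuple]]:
    """Yield query results in lists of at most batch_size rows.

    Only one batch is held in memory at a time, so peak memory
    scales with batch_size rather than with the full result set.
    """
    cur = conn.cursor()
    cur.execute(query)
    try:
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            yield rows
    finally:
        cur.close()


def chunk_path(pattern: str, i: int) -> str:
    """Fill the {i} placeholder with a chunk index (assumed semantics)."""
    return pattern.replace("{i}", str(i))


def demo() -> List[int]:
    """Stream 10 rows in batches of 4 and return the size of each batch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO events (id, value) VALUES (?, ?)",
        [(i, i * 0.5) for i in range(10)],
    )
    query = "SELECT id, value FROM events ORDER BY id"
    sizes = [len(batch) for batch in iter_batches(conn, query, 4)]
    conn.close()
    return sizes


if __name__ == "__main__":
    print(demo())                         # [4, 4, 2]
    print(chunk_path("out_{i}.prdx", 3))  # out_3.prdx
```

In a real pipeline each batch would be transformed or written out (for example to the path returned by `chunk_path`) before the next one is fetched, which is what keeps memory bounded.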

What’s New in v0.3.3

Pillar	What was added
SQL Cursor API — Rust Core	`SqlCursor` struct with server-side PostgreSQL `DECLARE ... NO SCROLL CURSOR`. 5 new FFI exports: `pardox_scan_sql_cursor_open`, `pardox_scan_sql_cursor_fetch`, `pardox_scan_sql_cursor_offset`, `pardox_scan_sql_cursor_close`, `pardox_scan_sql_to_parquet`. Zero warnings, zero errors

What’s New in v0.3.2

Pillar	What was added
PRDX Streaming to PostgreSQL	`write_sql_prdx()` — stream any `.prdx` file directly to PostgreSQL via `COPY FROM STDIN` with O(block) RAM. Validated: 150M rows / 3.8 GB in ~490s at ~300k rows/s (Python/JS)
Gaps 1–5 — All SDKs	GroupBy, String & Date ops, Decimal type, Window functions, Lazy pipeline — validated across Python, JavaScript, and PHP SDKs
Gaps 7–14 — Python	GPU compute, Pivot & Melt, Time Series Fill, Nested Data (JSON), Spill to Disk, Universal Loader (PRDX), SQL over DataFrames
Gaps 15–29 — Python	Cloud Storage, Live Query, WebAssembly, Encryption, Data Contracts, Time Travel, Arrow Flight, Distributed Cluster, Linear Algebra, REST Connector
VAP31 & VAP32	CSV→PostgreSQL and PRDX→PostgreSQL integrations validated in 3 SDKs
29 Gaps Total	All 29 feature gaps from the original roadmap implemented in the Rust core
FFI Reference	Complete documentation of all 181 C-ABI exports across 5 crates

What’s New in v0.3.1

Pillar	What was added
Relational Conqueror	Native read/write/execute for PostgreSQL, MySQL, SQL Server, MongoDB via Rust drivers
The Observer	`to_dict()`, `to_json()`, `value_counts()`, `unique()` — full-DataFrame export with proper heap memory management
Native Math	`df.add()`, `df.sub()`, `df.std()`, `df.min_max_scale()`, `df.sort_values()` — pure Rust arithmetic
GPU Awakening	`sort_values(gpu=True)` — WebGPU Bitonic sort with automatic CPU fallback
ML Integration	Zero-copy NumPy bridge via `__array__` protocol — direct pointer into Rust buffer
PHP & Node.js SDKs	Full parity with Python SDK across all new features
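
For readers unfamiliar with the Bitonic sort mentioned under GPU Awakening, the following is a plain-Python sketch of the textbook algorithm. It runs on the CPU and is not PardoX's WebGPU implementation. The function name `bitonic_sort` and the choice to pad with infinity are assumptions made for illustration. On a GPU, the inner loop over `i` is the part that would run in parallel, because every compare-and-swap within one pass touches a disjoint pair of elements.

```python
import math
from typing import List


def bitonic_sort(values: List[float]) -> List[float]:
    """Return values sorted ascending using a bitonic sorting network.

    The network needs a power-of-two length, so the input is padded
    with +inf and the padding is dropped at the end. NaN values are
    not supported.
    """
    n = len(values)
    if n <= 1:
        return list(values)

    size = 1 << (n - 1).bit_length()  # next power of two >= n
    data = list(values) + [math.inf] * (size - n)

    k = 2  # length of the bitonic sequences being merged
    while k <= size:
        j = k // 2  # distance between compared elements
        while j >= 1:
            # Each pass compares disjoint pairs (i, i ^ j).
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2

    return data[:n]


if __name__ == "__main__":
    print(bitonic_sort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
```

The fixed comparison pattern, independent of the data, is what makes this algorithm a common fit for GPUs even though it performs more comparisons than a typical CPU sort.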

Open Source Project distributed under the MIT License.

More info: www.pardox.io