Open source repositories tagged with #data-engineering, ranked by health score.
🏭 The open-source Palantir Foundry alternative. Connect any data source, build ontologies, create pipelines, visualize with dashboards, and make AI-powered decisions. Self-hosted.
Incremental engine for long-horizon agents.
Blazing-fast data-wrangling toolkit.
Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.
C++-accelerated data-quality toolkit for Python: CSV parsing, cleaning, schema validation, profiling, and pandas integration.