Open source repositories tagged with #data-science, ranked by health score.
A Python package for interactive mapping and geospatial analysis with minimal coding in a Jupyter environment
Single-file memory layer for AI agents, with sub-millisecond RAG on Apple Silicon. Metal-optimized, on-device. No server. No API. One file. Pure Swift.
Python client for Supabase. Query Postgres from Flask, Django, FastAPI. Python user authentication, security policies, edge functions, file storage, and realtime data streaming. Good first issue.
A fast, scalable, high-performance gradient boosting on decision trees library, used for ranking, classification, regression and other machine learning tasks, for Python, R, Java and C++. Supports computation on CPU and GPU.
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
C++ accelerated data quality toolkit for Python: CSV parsing, cleaning, schema validation, profiling, and pandas integration.
Free AI/ML course with 950+ Jupyter notebooks — Python, deep learning, LLMs, RAG, agents, prompt engineering, fine-tuning, MLOps
A visual graph node editor for training computer vision models.
Small-scale machine learning projects to understand the core concepts. Give a star 🌟 if it helps you. Bonus: an interview bank is coming up!
The Harmony Python library: a research tool for psychologists to harmonise data and questionnaire items. Open source.
Hierarchical divisive clustering: algorithm execution, static visualization, and interactive visualization.