eugeneyan / testing-ml

🔍 Minimal examples of machine learning tests for implementation, behaviour, and performance.

testing machine-learning model-evaluation

Updated Sep 21, 2022
Python

metriculous-ml / metriculous

Measure and visualize machine learning model performance without the usual boilerplate.

python data-science machine-learning statistics deep-learning regression model-selection classification visual-analysis confusion-matrix roc-curve model-evaluation precision-recall-curve model-comparsion residual-plot

Updated Sep 13, 2024
Python

Lantianzz / Scorecard-Bundle

A High-level Scorecard Modeling API | Everything you need for scorecard modeling

model-evaluation woe feature-discretization chimerge scorecard-bundle scorecared credit-scorecard

Updated Oct 10, 2022
Python

Poyu123 / CNN-application-on-CRFAR10-picture-distinguish

A hands-on TensorFlow image recognition project teaching a computer to identify 10 everyday objects, originally for a linear algebra class, with tools to train a CNN, auto-tune settings, and test accuracy on random internet images.

deep-learning neural-networks image-classification predictive-analysis data-preprocessing model-evaluation cnn-training tensorflow-keras cifar-10-dataset automated-tuning

Updated Mar 20, 2026
Python

Striveworks / valor

Valor is a lightweight, numpy-based library designed for fast and seamless evaluation of machine learning models.

nlp computer-vision evaluation text-generation classification object-detection image-segmentation evaluation-metrics model-evaluation mlops llm-eval

Updated Feb 9, 2026
Python

TrentPierce / PolyCouncil

PolyCouncil is an open-source multi-model deliberation engine for LM Studio. It runs multiple LLMs in parallel, gathers their answers, scores each response using a shared rubric, and produces a final, consensus-driven result. Designed for testing, comparing, and orchestrating local models with ease.

Updated Mar 24, 2026
Python

roboflow / cvevals

Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, models hosted on Roboflow)

computer-vision model-evaluation

Updated Oct 18, 2023
Python

Zyjacya-In-love / Pedestrian-Detection-on-YOLOv3_Research-and-APP

🎓 2020 undergraduate graduation project at Jiangnan University. All code is included: data conversion, Keras training, model evaluation, and a web app.

flask detection data-convert model-evaluation yolov3 model-weights keras-train

Updated Nov 22, 2022
Python

AmirhosseinHonardoust / Fraud-Detection-SQL-Supervised

Detect and classify fraudulent transactions using SQL and Python. Generate behavioral features with SQLite, train a Logistic Regression model, and evaluate performance with AUC, precision, recall, and ROC analysis. A complete supervised fraud detection workflow.

python data-science machine-learning sql sqlite supervised-learning data-analysis logistic-regression roc-curve model-evaluation fraud-detection portfolio-project financial-analytics

Updated Oct 21, 2025
Python

metno / pyaerocom

Python tools for climate and air quality model evaluation

aerosol air-quality climate-science earth-observation model-evaluation aerocom

Updated Jun 3, 2026
Python

npstorey / civic-ai-tools

Open-source platform connecting AI assistants to government open data — MCP server, curated civic MCP directory, and anti-hallucination framework for all 559 Socrata portals

mcp open-data civic-tech ckan government-data socrata model-evaluation data-commons llm model-context-protocol

Updated Jun 3, 2026
Python

rohanmistry231 / ML-Interview-Preparation

A comprehensive resource for machine learning interview preparation, featuring coding challenges, algorithm explanations, and practical Python examples. Covers supervised and unsupervised learning, model evaluation, and data preprocessing for technical interviews.

python data-science machine-learning algorithms data-preprocessing coding-challenges model-evaluation interview-preparation

Updated May 22, 2025
Python

animator / titus2

Titus 2: Portable Format for Analytics (PFA) implementation for Python 3.4+

python analytics inference scoring pmml scoring-engine pfa model-evaluation pfa-standard inference-engine model-deployment model-serving ml-engine titus

Updated Feb 9, 2023
Python

medoidai / skrobot

skrobot is a Python module for designing, running, and tracking machine learning experiments and tasks. It is built on top of the scikit-learn framework.

python open-source data-science machine-learning scikit-learn feature-selection artificial-intelligence model-selection feature-engineering hyperparameter-tuning model-evaluation model-training model-tuning predictive-modelling

Updated Sep 18, 2024
Python

Climate-REF / climate-ref

Rapid Evaluation Framework for climate data

climate model-evaluation cmip model-benchmarking

Updated Jun 3, 2026
Python

AmirhosseinHonardoust / Cognitivelens-AI-Human-Comparison

CognitiveLens is a Streamlit-powered analytics tool for exploring alignment between human and AI decisions. It visualizes fairness, calibration, and interpretability through metrics like Cohen’s κ, AUC, and Brier score. Designed for ethical AI, bias auditing, and decision transparency in machine learning systems.

Updated Nov 5, 2025
Python

AmirhosseinHonardoust / Financial-Fraud-Risk-Engine

A complete end-to-end fraud detection system for financial transactions, featuring data pipelines, cost-sensitive ML modeling, explainability with SHAP, threshold optimization, batch scoring, and an interactive Streamlit dashboard. Designed to simulate real-world fintech fraud-risk workflows.

Updated Dec 4, 2025
Python

AmirhosseinHonardoust / AI-Assistant-Satisfaction-Prediction-Engine

A complete machine-learning system that predicts AI assistant user satisfaction using behavioral signals such as device, usage category, time features, session metrics, and model metadata. Includes full ML pipeline, SHAP explainability, evaluation suite, and an interactive Streamlit analytics dashboard.

Updated Dec 5, 2025
Python

AmirhosseinHonardoust / Underwriting-Decision-Safety-Lab

A decision-safety lab for loan approval: trains a baseline classifier, calibrates probabilities (ECE/Brier), sweeps confidence thresholds to build a coverage-quality frontier, and outputs a defensible abstention policy (auto-decide vs. review). Includes a Streamlit dashboard for report cards, triage UI, and data quality checks.

Updated May 30, 2026
Python

Khanz9664 / TrustLens

Open-source Python library for evaluating ML model reliability beyond accuracy — with calibration, failure, and fairness diagnostics for informed deployment decisions.

python data-science machine-learning opensource calibration fairness python-package model-evaluation ai-safety evaluation-framework explainable-ai bias-detection mlops fairness-ml model-monitoring trustworthy-ai model-reliability

Updated Jun 3, 2026
Python

model-evaluation

Here are 325 public repositories matching this topic...
