Model leaderboard

This site is also a testbed. The same content tasks run through Cloudflare's Edge AI, Claude and Gemini; we score quality and measure latency and cost, then use the strongest outputs to tune the cheapest edge model.
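As a rough illustration of how per-task results could roll up into leaderboard rows, here is a minimal sketch. The `RunResult` record, its field names, the 0-to-1 quality scale and the choice of mean/median aggregates are all assumptions made for this example, not the site's actual schema or scoring method.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass(frozen=True)
class RunResult:
    """One model's result on one task (hypothetical schema)."""
    model: str
    task: str
    quality: float     # assumed scale: 0.0 (worst) to 1.0 (best)
    latency_ms: float  # wall-clock time for the call
    cost_usd: float    # billed cost for the call


def summarize(results):
    """Group results by model and return rows sorted by mean quality, best first."""
    by_model = {}
    for r in results:
        by_model.setdefault(r.model, []).append(r)

    rows = []
    for model, rs in by_model.items():
        rows.append({
            "model": model,
            "quality": mean(r.quality for r in rs),
            "latency_ms": median(r.latency_ms for r in rs),
            "cost_usd": sum(r.cost_usd for r in rs),
        })
    return sorted(rows, key=lambda row: row["quality"], reverse=True)
```

Median latency is used here because a single slow call would otherwise dominate a small run; a real harness might report percentiles instead.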

How to read it

The cheap edge model (Llama 3.1 8B) is the tuning target. The quality gap between it and Claude or Gemini is what we close with better prompts, few-shot exemplars and (where supported) LoRA fine-tuning; we then re-run this eval to confirm the lift.
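One way to make "gap" and "lift" concrete is sketched below. It assumes quality scores are stored as `{task: {model: score}}`, that the gap on a task is the best reference score minus the target's score, and that lift is the reduction in mean gap between two runs. The model labels and both definitions are illustrative assumptions, not the site's published method.

```python
TARGET = "llama-3.1-8b"            # hypothetical label for the edge model
REFERENCES = ("claude", "gemini")  # hypothetical labels for the reference models


def gap_per_task(scores):
    """Return {task: best reference score - target score}."""
    return {
        task: max(by_model[m] for m in REFERENCES) - by_model[TARGET]
        for task, by_model in scores.items()
    }


def mean_gap(scores):
    """Average the per-task gap across all tasks in a run."""
    gaps = gap_per_task(scores)
    return sum(gaps.values()) / len(gaps)


def lift(before, after):
    """Positive when the target moved closer to the references after tuning."""
    return mean_gap(before) - mean_gap(after)
```

Comparing against the best reference per task, rather than a single fixed model, keeps the target honest on tasks where one reference happens to be weak.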
