CLAW SCORE
Percentage of all evaluations resolved in ClawBench; higher is better.

| Rank | Model | CLAW Score |
|---|---|---|
| 01 | GLM-5-Turbo | 93.9 |
| 02 | Doubao-Seed-2.0-lite | 93.1 |
| 03 | GPT-5.4 | 92.2 |
| 04 | MiniMax-M2.5 | 92.1 |
| 05 | MiniMax-M2.7 | 91.7 |
| 06 | GLM-5 | 91.7 |
| 07 | Claude Opus 4.5 | 91.5 |
| 08 | Qwen3.5-35B-A3B | 91.4 |
| 09 | MiMo-V2-Omni | 91.2 |
| 10 | Qwen3.5-397B-A17B | 90.0 |

SPEED
Time (s) to run all evaluations in ClawBench; lower is better.

| Rank | Model | Time |
|---|---|---|
| 01 | Grok 4.20 Beta | 524s |
| 02 | gpt-oss-20b | 530s |
| 03 | GPT-5.4 Mini | 589s |
| 04 | GPT-5.4 Nano | 649s |
| 05 | Gemini 3 Flash Preview | 666s |
| 06 | MiMo-V2-Omni | 848s |
| 07 | gpt-oss-120b | 1218s |
| 08 | GPT-5.4 | 1292s |
| 09 | Nemotron 3 Nano | 1298s |
| 10 | GLM-5-Turbo | 1317s |

COST
Cost (USD) to run all evaluations in ClawBench; lower is better.

| Rank | Model | Cost |
|---|---|---|
| 01 | gpt-oss-20b | $0.08 |
| 02 | GPT-5.4 Nano | $0.17 |
| 03 | gpt-oss-120b | $0.18 |
| 04 | Step 3.5 Flash | $0.28 |
| 05 | DeepSeek-V3.2(Non-thinking) | $0.32 |
| 06 | Doubao-Seed-2.0-lite | $0.33 |
| 07 | Grok 4.1 Fast | $0.33 |
| 08 | MiniMax-M2.5 | $0.38 |
| 09 | MiniMax-M2.7 | $0.44 |
| 10 | Claude Sonnet 4.5 | $0.49 |

Updated 03/23/2026

ClawBench LLM Agent Benchmark
CLAW Score: percentage of all evaluations resolved (higher is better). Speed: time (s) to run all evaluations (lower is better). Cost: USD to run all evaluations (lower is better). Value: CLAW Score divided by Cost (higher is better).

| Model | License | Developer | CLAW Score ↑ | Speed (s) ↓ | Cost (USD) ↓ | Value (Score/Cost) ↑ |
|---|---|---|---|---|---|---|
| GLM-5-Turbo | Proprietary | Z.ai | 93.9 | 1317 | $0.83 | 113.1 |
| Doubao-Seed-2.0-lite | Proprietary | ByteDance | 93.1 | 1793 | $0.33 | 282.1 |
| GPT-5.4 | Proprietary | OpenAI | 92.2 | 1292 | $2.11 | 43.7 |
| MiniMax-M2.5 | Proprietary | MiniMax | 92.1 | 1908 | $0.38 | 242.3 |
| MiniMax-M2.7 | Proprietary | MiniMax | 91.7 | 2003 | $0.44 | 208.5 |
| GLM-5 | Open Weights | Z.ai | 91.7 | 2377 | $1.30 | 70.5 |
| Claude Opus 4.5 | Proprietary | Anthropic | 91.5 | 1556 | $9.85 | 9.3 |
| Qwen3.5-35B-A3B | Open Weights | Alibaba | 91.4 | 1615 | $0.56 | 163.3 |
| MiMo-V2-Omni | Proprietary | Xiaomi | 91.2 | 848 | $0.75 | 121.6 |
| Qwen3.5-397B-A17B | Open Weights | Alibaba | 90.0 | 1661 | $0.85 | 105.8 |
| GPT-5.4 Nano | Proprietary | OpenAI | 89.7 | 649 | $0.17 | 527.4 |
| Claude Haiku 4.5 | Proprietary | Anthropic | 89.4 | 1860 | $2.16 | 41.4 |
| MiMo-V2-Pro | Proprietary | Xiaomi | 89.3 | 1713 | $5.31 | 16.8 |
| Doubao-Seed-2.0-pro | Proprietary | ByteDance | 88.6 | 2293 | $1.00 | 88.6 |
| Grok 4.1 Fast | Proprietary | xAI | 88.6 | 1441 | $0.33 | 268.4 |
| Qwen3.5-Plus-2026-02-15 | Open Weights | Alibaba | 88.4 | 2794 | $1.17 | 75.6 |
| Claude Opus 4.6 | Proprietary | Anthropic | 88.2 | 1524 | $6.49 | 13.6 |
| Claude Sonnet 4.5 | Proprietary | Anthropic | 88.1 | 1676 | $0.49 | 179.8 |
| Gemini 3.1 Pro Preview | Proprietary | Google | 87.7 | 1891 | $2.12 | 41.4 |
| Qwen3.5-122B-A10B | Open Weights | Alibaba | 86.0 | 1431 | $1.00 | 86.0 |
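The Value column is defined as CLAW Score divided by Cost. A minimal sketch of the calculation, using figures taken from the rows above (published Value figures may differ in the last digit where the site divides by an unrounded cost):

```python
# Value = CLAW Score / Cost (USD), as defined in the table header above.
# (model, claw_score, cost_usd) tuples are copied from the leaderboard rows.
leaderboard = [
    ("GLM-5-Turbo", 93.9, 0.83),
    ("Doubao-Seed-2.0-lite", 93.1, 0.33),
    ("GPT-5.4", 92.2, 2.11),
]

for model, score, cost_usd in leaderboard:
    value = round(score / cost_usd, 1)  # higher = more benchmark score per dollar
    print(f"{model}: {value}")
# GLM-5-Turbo: 113.1
# Doubao-Seed-2.0-lite: 282.1
# GPT-5.4: 43.7
```

This is why a mid-scoring but cheap model (Doubao-Seed-2.0-lite) can rank far above a top scorer (GPT-5.4) on Value.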
https://clawbenchlabs.com
Multidimensional Analysis: Office Collaboration
| Model | Office Collaboration Score |
|---|---|
| GLM-5-Turbo | 98 |
| Claude Opus 4.6 | 97 |
| GPT-5.4 | 96 |
| MiniMax-M2.7 | 94 |
| Qwen3.5-35B-A3B | 94 |
| Qwen3.5-397B-A17B | 94 |
| MiMo-V2-Pro | 94 |
| Claude Sonnet 4.5 | 94 |
| Kimi K2 Thinking | 94 |
| Doubao-Seed-2.0-lite | 93 |
| MiniMax-M2.5 | 93 |
| MiMo-V2-Omni | 93 |
| Doubao-Seed-2.0-pro | 93 |
| Qwen3.5-Plus-2026-02-15 | 93 |
| GLM-5 | 91 |
| Step 3.5 Flash | 91 |
| DeepSeek-V3.2(Non-thinking) | 91 |
| ERNIE-5.0-Thinking-Preview | 91 |
| Kimi K2.5 | 90 |
| DeepSeek-V3.2(Thinking) | 89 |
| Claude Opus 4.5 | 87 |
| GPT-5.4 Nano | 87 |
| Gemini 3.1 Pro Preview | 83 |
| Gemini 2.5 Pro | 83 |
| Qwen3.5-27B | 83 |
| Claude Haiku 4.5 | 82 |
| Mistral Large 3 2512 | 82 |
| Claude Sonnet 4.6 | 82 |
| Grok 4.1 Fast | 80 |
| GPT-5.4 Mini | 78 |
| Qwen3.5-122B-A10B | 77 |
| gpt-oss-20b | 77 |
| Gemini 3 Flash Preview | 76 |
| Nemotron 3 Nano | 75 |
| Qwen3-Coder-Next | 73 |
| Nova 2 Lite | 73 |
| Grok 4.20 Beta | 72 |
| MiMo-V2-Flash | 69 |
| gpt-oss-120b | 67 |
| Nemotron 3 Super | 58 |