Leaderboard

Leaderboard (lite)

| Name | Repos Resolved (of 16) | Tests Passed (total tests: 3628) | Test Duration (s) | Date | Analysis | Github |
|---|---|---|---|---|---|---|
| Reference (Gold) | 10 | 100.00% | 21.00 | NA | Analysis | Github |
| OpenHands (subset of all) | 2 | 41.24% | 116.76 | 11/25/2024 | Analysis | Github |
| Claude Sonnet 3.5 - Fill-in + Unit Test Feedback | 0 | 30.59% | 552.79 | 09/25/2024 | Analysis | Github |
| Claude Sonnet 3.5 - Fill-in | 0 | 18.63% | 22.47 | 09/25/2024 | Analysis | Github |
| Claude Sonnet 3.5 - Base | 0 | 18.38% | 16.83 | 09/25/2024 | Analysis | Github |
| Claude Sonnet 3.5 - Fill-in (subset of all) | 0 | 15.79% | 64.49 | 09/25/2024 | Analysis | Github |
| SWE-Agent (subset of all) | 0 | 9.70% | 17.96 | 11/26/2024 | Analysis | Github |

Leaderboard (all)

| Name | Repos Resolved (of 56) | Tests Passed (total tests: 140926) | Test Duration (s) | Date | Analysis | Github |
|---|---|---|---|---|---|---|
| Reference (Gold) | 19 | 98.29% | 5467.81 | NA | Analysis | Github |
| OpenHands | 2 | 15.08% | 408.27 | 11/25/2024 | Analysis | Github |
| Claude Sonnet 3.5 - Fill-in | 0 | 5.98% | 629.19 | 09/25/2024 | Analysis | Github |
| SWE-Agent | 0 | 2.99% | 62.35 | 11/26/2024 | Analysis | Github |