Leaderboard
Leaderboard (lite)
Name | Repos Resolved (/16) | Tests Passed (Total: 3628) | Test Duration (s) | Date | Analysis | Github |
---|---|---|---|---|---|---|
Reference (Gold) | 10 | 100.00% | 21.00 | NA | Analysis | Github |
OpenHands (subset of all) | 2 | 41.24% | 116.76 | 11/25/2024 | Analysis | Github |
Claude Sonnet 3.5 - Fill-in + Unit Test Feedback | 0 | 30.59% | 552.79 | 09/25/2024 | Analysis | Github |
Claude Sonnet 3.5 - Fill-in | 0 | 18.63% | 22.47 | 09/25/2024 | Analysis | Github |
Claude Sonnet 3.5 - Base | 0 | 18.38% | 16.83 | 09/25/2024 | Analysis | Github |
Claude Sonnet 3.5 - Fill-in (subset of all) | 0 | 15.79% | 64.49 | 09/25/2024 | Analysis | Github |
SWE-Agent (subset of all) | 0 | 9.70% | 17.96 | 11/26/2024 | Analysis | Github |
Leaderboard (all)
Name | Repos Resolved (/56) | Tests Passed (Total: 140926) | Test Duration (s) | Date | Analysis | Github |
---|---|---|---|---|---|---|
Reference (Gold) | 19 | 98.29% | 5467.81 | NA | Analysis | Github |
OpenHands | 2 | 15.08% | 408.27 | 11/25/2024 | Analysis | Github |
Claude Sonnet 3.5 - Fill-in | 0 | 5.98% | 629.19 | 09/25/2024 | Analysis | Github |
SWE-Agent | 0 | 2.99% | 62.35 | 11/26/2024 | Analysis | Github |