File Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| file-inclusion | google/gemini-2.0-flash-exp:free | 2950 | 2.95 |
| file-inclusion | openai/gpt-4o-mini | 3954 | 3.95 |
| file-inclusion | openrouter/quasar-alpha | 6527 | 6.53 |
Summary
- Total Tests: 12
- Passed: 2
- Failed: 10
- Success Rate: 16.67%
- Average Duration: 10845ms (10.84s)
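The success rate above follows from the pass and total counts. A minimal sketch of that arithmetic, assuming the rate is the pass fraction as a percentage rounded to two decimals (the function name `success_rate` is hypothetical, not taken from the report generator):

```python
def success_rate(passed: int, total: int) -> float:
    """Return the pass percentage, rounded to two decimals."""
    if total == 0:
        # Avoid division by zero when no tests were run.
        return 0.0
    return round(100 * passed / total, 2)


# Counts taken from the Summary above.
print(success_rate(2, 12))  # 16.67
```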
Failed Tests
file-inclusion - openai/gpt-4o-mini
- Prompt: What animals are shown in these images? Return as JSON array.
- Expected: ["cat","fox"]
- Actual: ["cat", "fox"]
- Duration: 3954ms (3.95s)
- Reason: Expected ["cat","fox"], but got ["cat", "fox"]
- Timestamp: 4/4/2025, 6:42:07 PM
file-inclusion - openrouter/quasar-alpha
- Prompt: What animals are shown in these images? Return as JSON array.
- Expected: ["cat","fox"]
- Actual: [ "cat", "fox" ]
- Duration: 6527ms (6.53s)
- Reason: Expected ["cat","fox"], but got [ "cat", "fox" ]
- Timestamp: 4/4/2025, 6:42:14 PM
file-inclusion - google/gemini-2.0-flash-exp:free
- Prompt: What animals are shown in these images? Return as JSON array.
- Expected: ["cat","fox"]
- Actual: ["cat", "fox"]
- Duration: 2950ms (2.95s)
- Reason: Expected ["cat","fox"], but got ["cat", "fox"]
- Timestamp: 4/4/2025, 6:42:17 PM
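All three failures above differ from the expected value only in whitespace, which suggests the harness compares raw strings. The harness code is not shown here, so the following is only a sketch of one possible alternative: comparing the parsed JSON values instead. The helper name `json_equal` is hypothetical.

```python
import json


def json_equal(expected: str, actual: str) -> bool:
    """Compare two JSON documents by parsed value, ignoring whitespace."""
    try:
        return json.loads(expected) == json.loads(actual)
    except json.JSONDecodeError:
        # If either side is not valid JSON, fall back to exact string comparison.
        return expected == actual


expected = '["cat","fox"]'
# The two "Actual" variants reported above.
for actual in ('["cat", "fox"]', '[ "cat", "fox" ]'):
    print(json_equal(expected, actual))  # True for both
```

Note that this comparison is still order-sensitive for arrays: `["fox","cat"]` would not match `["cat","fox"]`. Whether that is desirable depends on what the test is meant to check.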
Passed Tests
No passed tests