File Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| file-inclusion | openrouter/quasar-alpha | 1876 | 1.88 |
| file-inclusion | google/gemini-2.0-flash-exp:free | 2449 | 2.45 |
| file-inclusion | openai/gpt-4o-mini | 3323 | 3.32 |
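The ranking above orders runs fastest-first and shows each duration in both milliseconds and seconds, rounded to two decimals. Here is a minimal sketch of that ordering and conversion. It assumes each result is a `(test, model, duration_ms)` tuple; the report does not show the harness's real data structures, so `results` and `rank_by_duration` are hypothetical.

```python
# Hypothetical records mirroring the table: (test, model, duration in ms).
results = [
    ("file-inclusion", "openai/gpt-4o-mini", 3323),
    ("file-inclusion", "openrouter/quasar-alpha", 1876),
    ("file-inclusion", "google/gemini-2.0-flash-exp:free", 2449),
]


def rank_by_duration(rows):
    """Sort rows fastest-first and append the duration in seconds (2 decimals)."""
    ranked = sorted(rows, key=lambda row: row[2])
    return [(test, model, ms, round(ms / 1000, 2)) for test, model, ms in ranked]


# Emit markdown table rows in the same column order as the report.
for test, model, ms, s in rank_by_duration(results):
    print(f"| {test} | {model} | {ms} | {s:.2f} |")
```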
Summary
- Total Tests: 12
- Passed: 2
- Failed: 10
- Success Rate: 16.67%
- Average Duration: 1578ms (1.58s)
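The success rate is Passed / Total × 100, rounded to two decimals. With 2 of 12 tests passing, that gives 16.67%. Here is a minimal sketch of the calculation. The helper name `success_rate` is hypothetical and is not taken from the test harness.

```python
def success_rate(passed: int, total: int) -> float:
    """Percentage of passed tests, rounded to two decimals as in the summary."""
    if total == 0:
        # Avoid division by zero when no tests ran.
        return 0.0
    return round(passed / total * 100, 2)


print(f"{success_rate(2, 12):.2f}%")  # 16.67%
```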
Failed Tests
file-inclusion - openai/gpt-4o-mini
- Prompt: What animals are shown in these images? Return as JSON array.
- Expected: ["cat","fox"]
- Actual: ["wildcat", "fox"]
- Duration: 3323ms (3.32s)
- Reason: Expected ["cat","fox"], but got ["wildcat", "fox"]
- Timestamp: 4/4/2025, 6:13:31 PM
file-inclusion - openrouter/quasar-alpha
- Prompt: What animals are shown in these images? Return as JSON array.
- Expected: ["cat","fox"]
- Actual: [ "cat", "fox" ]
- Duration: 1876ms (1.88s)
- Reason: Expected ["cat","fox"], but got [ "cat", "fox" ]
- Timestamp: 4/4/2025, 6:13:33 PM
file-inclusion - google/gemini-2.0-flash-exp:free
- Prompt: What animals are shown in these images? Return as JSON array.
- Expected: ["cat","fox"]
- Actual: ["cat", "fox"]
- Duration: 2449ms (2.45s)
- Reason: Expected ["cat","fox"], but got ["cat", "fox"]
- Timestamp: 4/4/2025, 6:13:35 PM
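For openrouter/quasar-alpha and google/gemini-2.0-flash-exp:free, the actual answer differs from the expected one only in whitespace. This suggests the checker compares raw strings rather than JSON values. Below is a minimal sketch of a value-based check. It assumes expected and actual answers are available as JSON strings; the harness's real comparison code is not shown in the report, and `answers_match` is a hypothetical helper.

```python
import json


def answers_match(expected: str, actual: str) -> bool:
    """Compare two JSON answers by value, ignoring whitespace and formatting."""
    try:
        return json.loads(expected) == json.loads(actual)
    except json.JSONDecodeError:
        # Fall back to exact string comparison when either side is not valid JSON.
        return expected == actual


expected = '["cat","fox"]'
for actual in ['["wildcat", "fox"]', '[ "cat", "fox" ]', '["cat", "fox"]']:
    print(actual, "->", answers_match(expected, actual))
```

Under this check, the two whitespace-only mismatches pass and ["wildcat", "fox"] still fails, because "wildcat" is a different value. Python list equality is order-sensitive. If the order of animals should not matter, compare `sorted(...)` of both parsed lists instead.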
Passed Tests
No passed tests