Basic Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| addition | openai/gpt-4o-mini | 514 | 0.51 |
| addition | openai/gpt-3.5-turbo | 771 | 0.77 |
| multiplication | openai/gpt-3.5-turbo | 624 | 0.62 |
| multiplication | openai/gpt-4o-mini | 721 | 0.72 |
| division | openai/gpt-3.5-turbo | 513 | 0.51 |
| division | openai/gpt-4o-mini | 895 | 0.90 |
| web_content | openai/gpt-3.5-turbo | 220 | 0.22 |
| web_content | openai/gpt-4o-mini | 4358 | 4.36 |
Summary
- Total Tests: 8
- Passed: 7
- Failed: 1
- Success Rate: 87.50%
- Average Duration: 1077ms (1.08s)
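The summary figures can be re-derived from the durations in the rankings table. The sketch below is only a cross-check, not the harness's own code; `summarize` and its parameter names are hypothetical.

```python
# Durations in milliseconds, copied from the Performance Rankings table.
durations_ms = [514, 771, 624, 721, 513, 895, 220, 4358]
failed_count = 1  # web_content on openai/gpt-3.5-turbo


def summarize(durations, failed):
    """Recompute the summary fields from raw durations (hypothetical helper)."""
    total = len(durations)
    passed = total - failed
    return {
        "total": total,
        "passed": passed,
        "failed": failed,
        "success_rate": 100 * passed / total,   # percent
        "average_ms": sum(durations) / total,   # arithmetic mean
    }


summary = summarize(durations_ms, failed_count)
print(f"Success Rate: {summary['success_rate']:.2f}%")
print(f"Average Duration: {summary['average_ms']:.0f}ms")
```

Note that the average is dominated by the single 4358 ms `web_content` run; without it, the remaining seven runs average roughly 608 ms.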
Failed Tests
web_content - openai/gpt-3.5-turbo
- Prompt: `Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.`
- Expected: `yes`
- Actual: `` (empty)
- Duration: 220ms (0.22s)
- Reason: Model returned empty response
- Timestamp: 6/5/2025, 8:46:11 PM
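The failure reason above implies that the harness treats an empty reply as a failure before any comparison with the expected value. The report does not show how that check is implemented, so the following is a minimal sketch under that assumption; `empty_response_reason` is a hypothetical name.

```python
from typing import Optional


def empty_response_reason(actual: Optional[str]) -> Optional[str]:
    """Return a failure reason if the reply is missing or blank, else None.

    Hypothetical helper: whitespace-only replies are assumed to count as empty.
    """
    if actual is None or actual.strip() == "":
        return "Model returned empty response"
    return None


print(empty_response_reason(""))     # reply from the failed run
print(empty_response_reason("yes"))  # a non-empty reply
```

A 220 ms duration together with an empty body may point to a request that returned early rather than a model that answered incorrectly, so separating this case from a wrong answer keeps the report easier to triage.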
Passed Tests
addition - openai/gpt-3.5-turbo
- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 771ms (0.77s)
- Timestamp: 6/5/2025, 8:46:08 PM
addition - openai/gpt-4o-mini
- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 514ms (0.51s)
- Timestamp: 6/5/2025, 8:46:08 PM
multiplication - openai/gpt-3.5-turbo
- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 624ms (0.62s)
- Timestamp: 6/5/2025, 8:46:09 PM
multiplication - openai/gpt-4o-mini
- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 721ms (0.72s)
- Timestamp: 6/5/2025, 8:46:09 PM
division - openai/gpt-3.5-turbo
- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 513ms (0.51s)
- Timestamp: 6/5/2025, 8:46:10 PM
division - openai/gpt-4o-mini
- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 895ms (0.90s)
- Timestamp: 6/5/2025, 8:46:11 PM
web_content - openai/gpt-4o-mini
- Prompt: `Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.`
- Expected: `yes`
- Actual: `Yes`
- Duration: 4358ms (4.36s)
- Timestamp: 6/5/2025, 8:46:15 PM
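This test passed even though the expected value is `yes` and the model replied `Yes`, which suggests the comparison ignores case. The report does not state the actual matching rule, so the sketch below is one plausible reading, assuming surrounding whitespace is also ignored; `matches_expected` is a hypothetical name.

```python
def matches_expected(expected: str, actual: str) -> bool:
    """Compare a reply to the expected value, ignoring case and outer whitespace.

    Hypothetical helper: the real harness may normalize differently.
    """
    return actual.strip().casefold() == expected.strip().casefold()


print(matches_expected("yes", "Yes"))  # the web_content case above
print(matches_expected("8", "8"))      # the addition case
print(matches_expected("yes", "no"))   # a genuine mismatch
```

A stricter exact-match rule would have reported this run as a second failure, so the choice of normalization directly affects the success rate shown in the summary.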