Basic Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| addition | openai/gpt-4o-mini | 657 | 0.66 |
| addition | openai/gpt-3.5-turbo | 783 | 0.78 |
| multiplication | openai/gpt-3.5-turbo | 566 | 0.57 |
| multiplication | openai/gpt-4o-mini | 670 | 0.67 |
| division | openai/gpt-4o-mini | 609 | 0.61 |
| division | openai/gpt-3.5-turbo | 2385 | 2.38 |
| web_content | openai/gpt-3.5-turbo | 290 | 0.29 |
| web_content | openai/gpt-4o-mini | 7277 | 7.28 |
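The table groups rows by test and lists the faster model first within each test. A minimal Python sketch of that ordering, assuming the results are available as `(test, model, duration_ms)` tuples; the `rank_by_duration` helper and the tuple layout are hypothetical and not taken from the test harness:

```python
# Durations in milliseconds, copied from the table above.
# The tuple layout (test, model, duration_ms) is an assumption for illustration.
results = [
    ("addition", "openai/gpt-3.5-turbo", 783),
    ("addition", "openai/gpt-4o-mini", 657),
    ("multiplication", "openai/gpt-3.5-turbo", 566),
    ("multiplication", "openai/gpt-4o-mini", 670),
    ("division", "openai/gpt-3.5-turbo", 2385),
    ("division", "openai/gpt-4o-mini", 609),
    ("web_content", "openai/gpt-3.5-turbo", 290),
    ("web_content", "openai/gpt-4o-mini", 7277),
]


def rank_by_duration(rows):
    """Keep tests in first-seen order; within each test, fastest model first."""
    test_order = []
    for test, _model, _ms in rows:
        if test not in test_order:
            test_order.append(test)
    return sorted(rows, key=lambda r: (test_order.index(r[0]), r[2]))


ranked = rank_by_duration(results)
for test, model, ms in ranked:
    print(f"{test:<16}{model:<24}{ms:>6} ms")
```

Note that the fastest web_content entry (290 ms) is the run that failed with an empty response, so a ranking by duration alone does not reflect correctness.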
Summary
- Total Tests: 8
- Passed: 7
- Failed: 1
- Success Rate: 87.50%
- Average Duration: 1655ms (1.65s)
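The summary figures can be rechecked from the per-test durations and pass/fail outcomes in this report. The sketch below is one way to do that in Python; the `summarize` helper and the tuple layout are assumptions for illustration, not the harness's actual code:

```python
# (test, model, duration_ms, passed), copied from this report.
# The tuple layout is an assumption for illustration.
results = [
    ("addition", "openai/gpt-3.5-turbo", 783, True),
    ("addition", "openai/gpt-4o-mini", 657, True),
    ("multiplication", "openai/gpt-3.5-turbo", 566, True),
    ("multiplication", "openai/gpt-4o-mini", 670, True),
    ("division", "openai/gpt-3.5-turbo", 2385, True),
    ("division", "openai/gpt-4o-mini", 609, True),
    ("web_content", "openai/gpt-3.5-turbo", 290, False),
    ("web_content", "openai/gpt-4o-mini", 7277, True),
]


def summarize(rows):
    """Compute totals, success rate (percent), and mean duration (ms)."""
    total = len(rows)
    passed = sum(1 for row in rows if row[3])
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": 100 * passed / total,
        "avg_ms": sum(row[2] for row in rows) / total,
    }


summary = summarize(results)
print(f"Total Tests: {summary['total']}")
print(f"Passed: {summary['passed']}")
print(f"Failed: {summary['failed']}")
print(f"Success Rate: {summary['success_rate']:.2f}%")
print(f"Average Duration: {round(summary['avg_ms'])}ms")
```

The eight durations sum to 13237 ms, so the mean is 1654.625 ms, which matches the reported 1655ms after rounding. The average includes the failed run.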
Failed Tests
web_content - openai/gpt-3.5-turbo
- Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
- Expected: yes
- Actual: ``
- Duration: 290ms (0.29s)
- Reason: Model returned empty response
- Timestamp: 6/3/2025, 11:33:01 PM
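The failure above was recorded because the model's reply was empty, not because it answered incorrectly. A minimal sketch of a check that separates those two cases, assuming a hypothetical `evaluate` function and a whitespace- and case-insensitive comparison (the harness's real comparison rule is not shown in this report):

```python
from typing import Optional, Tuple


def evaluate(expected: str, actual: str) -> Tuple[bool, Optional[str]]:
    """Return (passed, reason). Hypothetical rule, not the harness's confirmed logic."""
    cleaned = actual.strip()
    if not cleaned:
        # An empty or whitespace-only reply is reported separately from a wrong answer.
        return False, "Model returned empty response"
    if cleaned.lower() != expected.strip().lower():
        return False, f"Expected {expected!r}, got {cleaned!r}"
    return True, None


print(evaluate("yes", ""))
print(evaluate("8", "8"))
```

Reporting the empty case with its own reason makes it easier to tell a transport or model-side problem apart from a genuinely wrong answer.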
Passed Tests
addition - openai/gpt-3.5-turbo
- Prompt: add 5 and 3. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Duration: 783ms (0.78s)
- Timestamp: 6/3/2025, 11:32:56 PM
addition - openai/gpt-4o-mini
- Prompt: add 5 and 3. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Duration: 657ms (0.66s)
- Timestamp: 6/3/2025, 11:32:57 PM
multiplication - openai/gpt-3.5-turbo
- Prompt: multiply 8 and 3. Return only the number, no explanation.
- Expected: 24
- Actual: 24
- Duration: 566ms (0.57s)
- Timestamp: 6/3/2025, 11:32:57 PM
multiplication - openai/gpt-4o-mini
- Prompt: multiply 8 and 3. Return only the number, no explanation.
- Expected: 24
- Actual: 24
- Duration: 670ms (0.67s)
- Timestamp: 6/3/2025, 11:32:58 PM
division - openai/gpt-3.5-turbo
- Prompt: divide 15 by 3. Return only the number, no explanation.
- Expected: 5
- Actual: 5
- Duration: 2385ms (2.38s)
- Timestamp: 6/3/2025, 11:33:00 PM
division - openai/gpt-4o-mini
- Prompt: divide 15 by 3. Return only the number, no explanation.
- Expected: 5
- Actual: 5
- Duration: 609ms (0.61s)
- Timestamp: 6/3/2025, 11:33:01 PM
web_content - openai/gpt-4o-mini
- Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
- Expected: yes
- Actual: yes
- Duration: 7277ms (7.28s)
- Timestamp: 6/3/2025, 11:33:09 PM