Basic Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| addition | openai/gpt-4o-mini | 709 | 0.71 |
| addition | openrouter/quasar-alpha | 803 | 0.80 |
| addition | openai/gpt-3.5-turbo | 1182 | 1.18 |
| addition | deepseek/deepseek-r1-distill-qwen-14b:free | 2473 | 2.47 |
| multiplication | openai/gpt-3.5-turbo | 916 | 0.92 |
| multiplication | openai/gpt-4o-mini | 1232 | 1.23 |
| multiplication | openrouter/quasar-alpha | 5757 | 5.76 |
| multiplication | deepseek/deepseek-r1-distill-qwen-14b:free | 6224 | 6.22 |
| division | openai/gpt-4o-mini | 680 | 0.68 |
| division | openai/gpt-3.5-turbo | 893 | 0.89 |
| division | openrouter/quasar-alpha | 1031 | 1.03 |
| division | deepseek/deepseek-r1-distill-qwen-14b:free | 1612 | 1.61 |
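The ranking above can be reproduced with a short sketch. It assumes the raw results are available as `(test, model, duration_ms)` tuples; the names `RESULTS` and `rank_by_duration` are hypothetical and not part of the original test harness.

```python
from collections import defaultdict

# (test, model, duration_ms) rows copied from the table above.
RESULTS = [
    ("addition", "openai/gpt-4o-mini", 709),
    ("addition", "openrouter/quasar-alpha", 803),
    ("addition", "openai/gpt-3.5-turbo", 1182),
    ("addition", "deepseek/deepseek-r1-distill-qwen-14b:free", 2473),
    ("multiplication", "openai/gpt-3.5-turbo", 916),
    ("multiplication", "openai/gpt-4o-mini", 1232),
    ("multiplication", "openrouter/quasar-alpha", 5757),
    ("multiplication", "deepseek/deepseek-r1-distill-qwen-14b:free", 6224),
    ("division", "openai/gpt-4o-mini", 680),
    ("division", "openai/gpt-3.5-turbo", 893),
    ("division", "openrouter/quasar-alpha", 1031),
    ("division", "deepseek/deepseek-r1-distill-qwen-14b:free", 1612),
]


def rank_by_duration(results):
    """Group rows by test, then sort each group fastest-first."""
    groups = defaultdict(list)
    for test, model, ms in results:
        groups[test].append((model, ms))
    # Dict insertion order keeps the tests in the order they first appear.
    return {test: sorted(rows, key=lambda row: row[1]) for test, rows in groups.items()}


rankings = rank_by_duration(RESULTS)
for test, rows in rankings.items():
    for model, ms in rows:
        # Seconds are the millisecond value divided by 1000, to two decimals.
        print(f"| {test} | {model} | {ms} | {ms / 1000:.2f} |")
```

Sorting within each test rather than across the whole run keeps the comparison fair, since the three prompts are timed separately.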
Summary
- Total Tests: 12
- Passed: 11
- Failed: 1
- Success Rate: 91.67%
- Average Duration: 1959ms (1.96s)
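As a check on the summary figures, the sketch below recomputes them from the twelve durations in the rankings table. The function name `summarize` and the rounding choices (two decimals for the rate, nearest millisecond for the average) are assumptions, not the harness's own code.

```python
# Durations in milliseconds, in the order they appear in the rankings table.
DURATIONS_MS = [709, 803, 1182, 2473, 916, 1232, 5757, 6224, 680, 893, 1031, 1612]
FAILED = 1  # one failed test is reported


def summarize(durations_ms, failed):
    """Recompute the summary statistics from raw durations and a failure count."""
    total = len(durations_ms)
    passed = total - failed
    return {
        "total": total,
        "passed": passed,
        "failed": failed,
        "success_rate_pct": round(100 * passed / total, 2),
        "average_ms": round(sum(durations_ms) / total),
    }


summary = summarize(DURATIONS_MS, FAILED)
print(summary)
```

The durations sum to 23512 ms, so the average is 23512 / 12 ≈ 1959 ms, and 11 / 12 gives the 91.67% success rate.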
Failed Tests
addition - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `The result of adding 5 and 3 is:
  \[
  5 + 3 = \boxed{8}
  \]`
- Duration: 2473ms (2.47s)
- Reason: Expected `8`, but got `the result of adding 5 and 3 is:
  \[
  5 + 3 = \boxed{8}
  \]`
- Timestamp: 4/6/2025, 5:30:36 PM
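The failure above is a formatting failure rather than an arithmetic one: the model produced the right value but wrapped it in prose and LaTeX. The sketch below contrasts a strict comparison, which appears to be what the reported reason reflects, with a lenient extractor. Both `strict_match` and `extract_answer` are hypothetical helpers for illustration, not the harness's actual checker.

```python
import re


def strict_match(actual: str, expected: str) -> bool:
    """Pass only when the whole response, ignoring outer whitespace, equals the expected text."""
    return actual.strip() == expected


def extract_answer(actual: str):
    """Pull a final answer out of a verbose response (lenient; an assumption, not the harness's rule)."""
    # Prefer the contents of a LaTeX \boxed{...} if one is present.
    boxed = re.search(r"\\boxed\{([^{}]*)\}", actual)
    if boxed:
        return boxed.group(1).strip()
    # Otherwise fall back to the last number that appears in the text.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", actual)
    return numbers[-1] if numbers else None


# The failing response, reconstructed from the report above.
ACTUAL = "The result of adding 5 and 3 is:\n\n\\[\n5 + 3 = \\boxed{8}\n\\]"

print(strict_match(ACTUAL, "8"))     # strict comparison rejects the response
print(extract_answer(ACTUAL) == "8")  # lenient extraction recovers the value
```

Whether to adopt lenient extraction is a design choice. The prompt explicitly asks for only the number, so keeping the strict check treats instruction-following as part of the test, while the lenient version measures arithmetic alone.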
Passed Tests
addition - openai/gpt-3.5-turbo
- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 1182ms (1.18s)
- Timestamp: 4/6/2025, 5:30:34 PM
addition - openai/gpt-4o-mini
- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 709ms (0.71s)
- Timestamp: 4/6/2025, 5:30:37 PM
addition - openrouter/quasar-alpha
- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 803ms (0.80s)
- Timestamp: 4/6/2025, 5:30:38 PM
multiplication - openai/gpt-3.5-turbo
- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 916ms (0.92s)
- Timestamp: 4/6/2025, 5:30:39 PM
multiplication - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 6224ms (6.22s)
- Timestamp: 4/6/2025, 5:30:45 PM
multiplication - openai/gpt-4o-mini
- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 1232ms (1.23s)
- Timestamp: 4/6/2025, 5:30:46 PM
multiplication - openrouter/quasar-alpha
- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 5757ms (5.76s)
- Timestamp: 4/6/2025, 5:30:52 PM
division - openai/gpt-3.5-turbo
- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 893ms (0.89s)
- Timestamp: 4/6/2025, 5:30:53 PM
division - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 1612ms (1.61s)
- Timestamp: 4/6/2025, 5:30:55 PM
division - openai/gpt-4o-mini
- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 680ms (0.68s)
- Timestamp: 4/6/2025, 5:30:55 PM
division - openrouter/quasar-alpha
- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 1031ms (1.03s)
- Timestamp: 4/6/2025, 5:30:56 PM