Basic Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| addition | openrouter/quasar-alpha | 729 | 0.73 |
| addition | openai/gpt-4o-mini | 927 | 0.93 |
| addition | openai/gpt-3.5-turbo | 1197 | 1.20 |
| addition | deepseek/deepseek-r1-distill-qwen-14b:free | 11570 | 11.57 |
| multiplication | openai/gpt-3.5-turbo | 863 | 0.86 |
| multiplication | openrouter/quasar-alpha | 960 | 0.96 |
| multiplication | openai/gpt-4o-mini | 1105 | 1.10 |
| multiplication | deepseek/deepseek-r1-distill-qwen-14b:free | 16310 | 16.31 |
| division | openrouter/quasar-alpha | 749 | 0.75 |
| division | openai/gpt-4o-mini | 856 | 0.86 |
| division | openai/gpt-3.5-turbo | 901 | 0.90 |
| division | deepseek/deepseek-r1-distill-qwen-14b:free | 11412 | 11.41 |
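The ranking above can be reproduced by grouping the rows by test and sorting each group by duration. The following is a minimal sketch, not the harness's own code: the `RESULTS` list and the `rank_by_duration` helper are hypothetical names, and the rows are copied from the table.

```python
from collections import defaultdict

# (test, model, duration_ms) rows copied from the table above.
RESULTS = [
    ("addition", "openrouter/quasar-alpha", 729),
    ("addition", "openai/gpt-4o-mini", 927),
    ("addition", "openai/gpt-3.5-turbo", 1197),
    ("addition", "deepseek/deepseek-r1-distill-qwen-14b:free", 11570),
    ("multiplication", "openai/gpt-3.5-turbo", 863),
    ("multiplication", "openrouter/quasar-alpha", 960),
    ("multiplication", "openai/gpt-4o-mini", 1105),
    ("multiplication", "deepseek/deepseek-r1-distill-qwen-14b:free", 16310),
    ("division", "openrouter/quasar-alpha", 749),
    ("division", "openai/gpt-4o-mini", 856),
    ("division", "openai/gpt-3.5-turbo", 901),
    ("division", "deepseek/deepseek-r1-distill-qwen-14b:free", 11412),
]


def rank_by_duration(results):
    """Group rows by test name and sort each group fastest-first."""
    grouped = defaultdict(list)
    for test, model, duration_ms in results:
        grouped[test].append((model, duration_ms))
    return {test: sorted(rows, key=lambda row: row[1]) for test, rows in grouped.items()}


rankings = rank_by_duration(RESULTS)
for test, rows in rankings.items():
    for place, (model, duration_ms) in enumerate(rows, start=1):
        print(f"{test:<15} {place}. {model} {duration_ms} ms ({duration_ms / 1000:.2f} s)")
```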
Summary
- Total Tests: 12
- Passed: 11
- Failed: 1
- Success Rate: 91.67%
- Average Duration: 3965ms (3.96s)
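The summary figures follow directly from the twelve durations in the rankings table. A short sketch of the arithmetic is given below; the variable names are illustrative, and the assumption is that the average is taken over all tests, passed and failed alike.

```python
# Durations in ms, copied from the rankings table (all 12 runs, including the failed one).
durations_ms = [729, 927, 1197, 11570, 863, 960, 1105, 16310, 749, 856, 901, 11412]
failed = 1

total = len(durations_ms)
passed = total - failed
success_rate = passed / total * 100
mean_ms = sum(durations_ms) / total

print(f"Total Tests: {total}")
print(f"Passed: {passed}")
print(f"Failed: {failed}")
print(f"Success Rate: {success_rate:.2f}%")
print(f"Average Duration: {mean_ms:.0f}ms ({mean_ms / 1000:.2f}s)")
```

The sum of the durations is 47579 ms, so the mean is about 3964.9 ms, which rounds to the reported 3965ms (3.96s).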
Failed Tests
division - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: divide 15 by 3. Return only the number, no explanation.
- Expected: 5
- Actual: `15 ÷ 3 equals 5.
5`
- Duration: 11412ms (11.41s)
- Reason: Expected `5`, but got `15 ÷ 3 equals 5.
5`
- Timestamp: 4/6/2025, 4:21:00 PM
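The report does not show how the harness compares outputs. The failure reason is consistent with an exact string match, so the sketch below assumes one (after trimming whitespace). Both function names are hypothetical. The second function is an illustrative lenient alternative, not something the harness is known to do.

```python
def strict_match(expected: str, actual: str) -> bool:
    """Assumed harness behavior: pass only if the trimmed output equals the expected string."""
    return actual.strip() == expected.strip()


def last_line_match(expected: str, actual: str) -> bool:
    """Hypothetical lenient check: compare only the last non-empty line of the output."""
    lines = [line.strip() for line in actual.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == expected.strip()


# The output recorded for the failed division test.
output = "15 ÷ 3 equals 5.\n5"

print(strict_match("5", output))     # False: the explanation line breaks the exact match
print(last_line_match("5", output))  # True: the final line is the bare number
```

Under the strict check the model's answer is arithmetically correct but still fails, because the prompt asked for the number only and the output includes an explanation line.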
Passed Tests
addition - openai/gpt-3.5-turbo
- Prompt: add 5 and 3. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Duration: 1197ms (1.20s)
- Timestamp: 4/6/2025, 4:20:15 PM
addition - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: add 5 and 3. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Duration: 11570ms (11.57s)
- Timestamp: 4/6/2025, 4:20:27 PM
addition - openai/gpt-4o-mini
- Prompt: add 5 and 3. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Duration: 927ms (0.93s)
- Timestamp: 4/6/2025, 4:20:28 PM
addition - openrouter/quasar-alpha
- Prompt: add 5 and 3. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Duration: 729ms (0.73s)
- Timestamp: 4/6/2025, 4:20:28 PM
multiplication - openai/gpt-3.5-turbo
- Prompt: multiply 8 and 3. Return only the number, no explanation.
- Expected: 24
- Actual: 24
- Duration: 863ms (0.86s)
- Timestamp: 4/6/2025, 4:20:29 PM
multiplication - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: multiply 8 and 3. Return only the number, no explanation.
- Expected: 24
- Actual: 24
- Duration: 16310ms (16.31s)
- Timestamp: 4/6/2025, 4:20:45 PM
multiplication - openai/gpt-4o-mini
- Prompt: multiply 8 and 3. Return only the number, no explanation.
- Expected: 24
- Actual: 24
- Duration: 1105ms (1.10s)
- Timestamp: 4/6/2025, 4:20:47 PM
multiplication - openrouter/quasar-alpha
- Prompt: multiply 8 and 3. Return only the number, no explanation.
- Expected: 24
- Actual: 24
- Duration: 960ms (0.96s)
- Timestamp: 4/6/2025, 4:20:48 PM
division - openai/gpt-3.5-turbo
- Prompt: divide 15 by 3. Return only the number, no explanation.
- Expected: 5
- Actual: 5
- Duration: 901ms (0.90s)
- Timestamp: 4/6/2025, 4:20:48 PM
division - openai/gpt-4o-mini
- Prompt: divide 15 by 3. Return only the number, no explanation.
- Expected: 5
- Actual: 5
- Duration: 856ms (0.86s)
- Timestamp: 4/6/2025, 4:21:01 PM
division - openrouter/quasar-alpha
- Prompt: divide 15 by 3. Return only the number, no explanation.
- Expected: 5
- Actual: 5
- Duration: 749ms (0.75s)
- Timestamp: 4/6/2025, 4:21:02 PM