Math Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| quadratic | openai/gpt-4o-mini | 943 | 0.94 |
| quadratic | openrouter/quasar-alpha | 1105 | 1.10 |
| quadratic | openai/gpt-3.5-turbo | 1229 | 1.23 |
| quadratic | deepseek/deepseek-r1-distill-qwen-14b:free | 11633 | 11.63 |
| factorial | openai/gpt-3.5-turbo | 838 | 0.84 |
| factorial | openrouter/quasar-alpha | 840 | 0.84 |
| factorial | openai/gpt-4o-mini | 920 | 0.92 |
| factorial | deepseek/deepseek-r1-distill-qwen-14b:free | 7825 | 7.83 |
| fibonacci | openrouter/quasar-alpha | 701 | 0.70 |
| fibonacci | openai/gpt-4o-mini | 935 | 0.94 |
| fibonacci | openai/gpt-3.5-turbo | 1195 | 1.20 |
| fibonacci | deepseek/deepseek-r1-distill-qwen-14b:free | 11358 | 11.36 |
| square_root | openai/gpt-3.5-turbo | 793 | 0.79 |
| square_root | openai/gpt-4o-mini | 1012 | 1.01 |
| square_root | openrouter/quasar-alpha | 1535 | 1.53 |
| square_root | deepseek/deepseek-r1-distill-qwen-14b:free | 16332 | 16.33 |
| power | openai/gpt-3.5-turbo | 922 | 0.92 |
| power | openai/gpt-4o-mini | 1004 | 1.00 |
| power | openrouter/quasar-alpha | 1567 | 1.57 |
| power | deepseek/deepseek-r1-distill-qwen-14b:free | 7091 | 7.09 |
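
The table ranks models within each test. To compare models across all five tests, the same rows can be aggregated per model. The following is a minimal sketch, assuming the rows are available as `(test, model, duration_ms)` tuples; the names `ROWS` and `mean_duration_by_model` are illustrative and not part of the test harness.

```python
from collections import defaultdict

# (test, model, duration in ms), copied from the rankings table.
ROWS = [
    ("quadratic", "openai/gpt-4o-mini", 943),
    ("quadratic", "openrouter/quasar-alpha", 1105),
    ("quadratic", "openai/gpt-3.5-turbo", 1229),
    ("quadratic", "deepseek/deepseek-r1-distill-qwen-14b:free", 11633),
    ("factorial", "openai/gpt-3.5-turbo", 838),
    ("factorial", "openrouter/quasar-alpha", 840),
    ("factorial", "openai/gpt-4o-mini", 920),
    ("factorial", "deepseek/deepseek-r1-distill-qwen-14b:free", 7825),
    ("fibonacci", "openrouter/quasar-alpha", 701),
    ("fibonacci", "openai/gpt-4o-mini", 935),
    ("fibonacci", "openai/gpt-3.5-turbo", 1195),
    ("fibonacci", "deepseek/deepseek-r1-distill-qwen-14b:free", 11358),
    ("square_root", "openai/gpt-3.5-turbo", 793),
    ("square_root", "openai/gpt-4o-mini", 1012),
    ("square_root", "openrouter/quasar-alpha", 1535),
    ("square_root", "deepseek/deepseek-r1-distill-qwen-14b:free", 16332),
    ("power", "openai/gpt-3.5-turbo", 922),
    ("power", "openai/gpt-4o-mini", 1004),
    ("power", "openrouter/quasar-alpha", 1567),
    ("power", "deepseek/deepseek-r1-distill-qwen-14b:free", 7091),
]


def mean_duration_by_model(rows):
    """Return {model: mean duration in ms} across all tests."""
    durations = defaultdict(list)
    for _test, model, ms in rows:
        durations[model].append(ms)
    return {model: sum(v) / len(v) for model, v in durations.items()}


# Print models from fastest to slowest mean duration.
means = mean_duration_by_model(ROWS)
for model, mean_ms in sorted(means.items(), key=lambda kv: kv[1]):
    print(f"{model}: {mean_ms:.1f} ms")
```

On these rows the means come out at roughly 963 ms (openai/gpt-4o-mini), 995 ms (openai/gpt-3.5-turbo), 1150 ms (openrouter/quasar-alpha), and 10848 ms (deepseek/deepseek-r1-distill-qwen-14b:free). Each model ran each test once, so sub-second gaps should be read with caution.
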
Summary
- Total Tests: 20
- Passed: 16
- Failed: 4
- Success Rate: 80.00%
- Average Duration: 3489ms (3.49s)
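
The summary figures can be rechecked from the 20 durations in the rankings table. This is a small sketch of that arithmetic, not the harness's own code; `DURATIONS_MS` and `PASSED` are illustrative names. The median is included because the four deepseek runs pull the mean well above a typical run.

```python
from statistics import median

# Durations in ms for all 20 runs, copied from the rankings table.
DURATIONS_MS = [
    943, 1105, 1229, 11633,   # quadratic
    838, 840, 920, 7825,      # factorial
    701, 935, 1195, 11358,    # fibonacci
    793, 1012, 1535, 16332,   # square_root
    922, 1004, 1567, 7091,    # power
]
PASSED = 16  # taken from the summary

total = len(DURATIONS_MS)
success_rate = PASSED / total * 100
average_ms = sum(DURATIONS_MS) / total
median_ms = median(DURATIONS_MS)

print(f"Success Rate: {success_rate:.2f}%")
print(f"Average Duration: {round(average_ms)}ms ({average_ms / 1000:.2f}s)")
print(f"Median Duration: {median_ms}ms")
```

The first two lines reproduce the reported 80.00% and 3489ms (3.49s). The median is 1058.5ms, which is closer to what the three faster models actually took.
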
Failed Tests
quadratic - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
- Expected: -2,-3
- Actual: `The solutions to the quadratic equation x² + 5x + 6 = 0 are -3, -2.
  Answer: -3, -2`
- Duration: 11633ms (11.63s)
- Reason: Expected -2,-3, but got the solutions to the quadratic equation x² + 5x + 6 = 0 are -3, -2.
  answer: -3, -2
- Timestamp: 4/4/2025, 2:38:24 PM
quadratic - openai/gpt-4o-mini
- Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
- Expected: -2,-3
- Actual: -2, -3
- Duration: 943ms (0.94s)
- Reason: Expected -2,-3, but got -2, -3
- Timestamp: 4/4/2025, 2:38:25 PM
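
Both quadratic failures contain the correct roots. The openai/gpt-4o-mini answer differs from the expected string only by a space after the comma, while the deepseek answer adds explanatory text and lists the roots in the opposite order. The Reason lines suggest the harness compares strings (apparently after lowercasing), though the report does not show the comparison itself. A more tolerant check could compare the parsed numbers instead. The sketch below is one hypothetical way to do that; `parse_solutions` and `same_solutions` are illustrative names, not functions from the harness.

```python
def parse_solutions(text):
    """Parse 'a,b,...' into a sorted list of floats.

    Returns None if any comma-separated field is not a plain number,
    so answers wrapped in explanatory prose are still rejected.
    """
    try:
        return sorted(float(part.strip()) for part in text.split(","))
    except ValueError:
        return None


def same_solutions(expected, actual):
    """True if both strings list the same numbers, ignoring spaces and order."""
    parsed_expected = parse_solutions(expected)
    parsed_actual = parse_solutions(actual)
    return parsed_expected is not None and parsed_expected == parsed_actual


print(same_solutions("-2,-3", "-2, -3"))  # whitespace difference only
print(same_solutions("-2,-3", "-3,-2"))   # order difference only
print(same_solutions(
    "-2,-3",
    "The solutions to the quadratic equation x² + 5x + 6 = 0 are -3, -2.\nAnswer: -3, -2",
))                                        # prose around the numbers
```

Under this check the openai/gpt-4o-mini answer would pass, while the deepseek answer would still fail because it ignores the "no explanation" instruction. Whether that stricter or looser behaviour is wanted is a design choice for the test suite.
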
fibonacci - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
- Expected: 8
- Actual: 5
- Duration: 11358ms (11.36s)
- Reason: Expected 8, but got 5
- Timestamp: 4/4/2025, 2:38:49 PM
fibonacci - openai/gpt-4o-mini
- Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
- Expected: 8
- Actual: 5
- Duration: 935ms (0.94s)
- Reason: Expected 8, but got 5
- Timestamp: 4/4/2025, 2:38:50 PM
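
Both fibonacci failures returned 5 where 8 was expected. One plausible explanation, which the report does not confirm, is that the prompt leaves the starting convention open: "the 6th number" is 8 if the sequence starts 1, 1 and 5 if it starts 0, 1. The sketch below shows both readings; `fib` and its `first` parameter are illustrative.

```python
def fib(n, first=(1, 1)):
    """Return the n-th term (1-indexed) of the Fibonacci sequence
    whose first two terms are given by `first`."""
    a, b = first
    for _ in range(n - 1):
        a, b = b, a + b
    return a


print(fib(6, first=(1, 1)))  # sequence 1, 1, 2, 3, 5, 8
print(fib(6, first=(0, 1)))  # sequence 0, 1, 1, 2, 3, 5
```

If this is the cause, stating the first terms in the prompt (for example "starting 1, 1") would remove the ambiguity without changing the expected value.
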
Passed Tests
quadratic - openai/gpt-3.5-turbo
- Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
- Expected: -2,-3
- Actual: -2,-3
- Duration: 1229ms (1.23s)
- Timestamp: 4/4/2025, 2:38:12 PM
quadratic - openrouter/quasar-alpha
- Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
- Expected: -2,-3
- Actual: -2,-3
- Duration: 1105ms (1.10s)
- Timestamp: 4/4/2025, 2:38:26 PM
factorial - openai/gpt-3.5-turbo
- Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
- Expected: 120
- Actual: 120
- Duration: 838ms (0.84s)
- Timestamp: 4/4/2025, 2:38:27 PM
factorial - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
- Expected: 120
- Actual: 120
- Duration: 7825ms (7.83s)
- Timestamp: 4/4/2025, 2:38:34 PM
factorial - openai/gpt-4o-mini
- Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
- Expected: 120
- Actual: 120
- Duration: 920ms (0.92s)
- Timestamp: 4/4/2025, 2:38:35 PM
factorial - openrouter/quasar-alpha
- Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
- Expected: 120
- Actual: 120
- Duration: 840ms (0.84s)
- Timestamp: 4/4/2025, 2:38:36 PM
fibonacci - openai/gpt-3.5-turbo
- Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Duration: 1195ms (1.20s)
- Timestamp: 4/4/2025, 2:38:37 PM
fibonacci - openrouter/quasar-alpha
- Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Duration: 701ms (0.70s)
- Timestamp: 4/4/2025, 2:38:50 PM
square_root - openai/gpt-3.5-turbo
- Prompt: Calculate the square root of 16. Return only the number, no explanation.
- Expected: 4
- Actual: 4
- Duration: 793ms (0.79s)
- Timestamp: 4/4/2025, 2:38:51 PM
square_root - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Calculate the square root of 16. Return only the number, no explanation.
- Expected: 4
- Actual: 4
- Duration: 16332ms (16.33s)
- Timestamp: 4/4/2025, 2:39:08 PM
square_root - openai/gpt-4o-mini
- Prompt: Calculate the square root of 16. Return only the number, no explanation.
- Expected: 4
- Actual: 4
- Duration: 1012ms (1.01s)
- Timestamp: 4/4/2025, 2:39:09 PM
square_root - openrouter/quasar-alpha
- Prompt: Calculate the square root of 16. Return only the number, no explanation.
- Expected: 4
- Actual: 4
- Duration: 1535ms (1.53s)
- Timestamp: 4/4/2025, 2:39:10 PM
power - openai/gpt-3.5-turbo
- Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Duration: 922ms (0.92s)
- Timestamp: 4/4/2025, 2:39:11 PM
power - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Duration: 7091ms (7.09s)
- Timestamp: 4/4/2025, 2:39:18 PM
power - openai/gpt-4o-mini
- Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Duration: 1004ms (1.00s)
- Timestamp: 4/4/2025, 2:39:19 PM
power - openrouter/quasar-alpha
- Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Duration: 1567ms (1.57s)
- Timestamp: 4/4/2025, 2:39:21 PM