Math Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| quadratic | openrouter/quasar-alpha | 880 | 0.88 |
| quadratic | openai/gpt-4o-mini | 1127 | 1.13 |
| quadratic | openai/gpt-3.5-turbo | 1193 | 1.19 |
| quadratic | deepseek/deepseek-r1-distill-qwen-14b:free | 10584 | 10.58 |
| factorial | openai/gpt-4o-mini | 738 | 0.74 |
| factorial | openai/gpt-3.5-turbo | 771 | 0.77 |
| factorial | openrouter/quasar-alpha | 843 | 0.84 |
| factorial | deepseek/deepseek-r1-distill-qwen-14b:free | 5106 | 5.11 |
| fibonacci | deepseek/deepseek-r1-distill-qwen-14b:free | 589 | 0.59 |
| fibonacci | openai/gpt-4o-mini | 881 | 0.88 |
| fibonacci | openai/gpt-3.5-turbo | 1018 | 1.02 |
| fibonacci | openrouter/quasar-alpha | 1293 | 1.29 |
| square_root | deepseek/deepseek-r1-distill-qwen-14b:free | 421 | 0.42 |
| square_root | openai/gpt-4o-mini | 882 | 0.88 |
| square_root | openai/gpt-3.5-turbo | 950 | 0.95 |
| square_root | openrouter/quasar-alpha | 1064 | 1.06 |
| power | deepseek/deepseek-r1-distill-qwen-14b:free | 454 | 0.45 |
| power | openai/gpt-4o-mini | 783 | 0.78 |
| power | openrouter/quasar-alpha | 783 | 0.78 |
| power | openai/gpt-3.5-turbo | 793 | 0.79 |
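The rankings above order each test's results by ascending duration and show the time in both milliseconds and seconds. A minimal sketch of that ordering for the `power` rows, assuming each result is a `(test, model, duration_ms)` tuple; `rank_by_duration` and `format_row` are hypothetical helpers, not taken from the test harness:

```python
# Rows copied from the table above: (test, model, duration in ms).
results = [
    ("power", "openai/gpt-3.5-turbo", 793),
    ("power", "deepseek/deepseek-r1-distill-qwen-14b:free", 454),
    ("power", "openai/gpt-4o-mini", 783),
    ("power", "openrouter/quasar-alpha", 783),
]


def rank_by_duration(rows):
    """Return rows ordered fastest-first (hypothetical helper)."""
    # sorted() is stable, so rows with equal durations keep their input order.
    return sorted(rows, key=lambda row: row[2])


def format_row(test, model, ms):
    """Render one table row, converting milliseconds to seconds."""
    return f"| {test} | {model} | {ms} | {ms / 1000:.2f} |"


for row in rank_by_duration(results):
    print(format_row(*row))
```

Because the sort is stable, the two 783 ms results stay in the order they were supplied, which is one plausible explanation for how ties are listed in the table.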
Summary
- Total Tests: 20
- Passed: 12
- Failed: 8
- Success Rate: 60.00%
- Average Duration: 1558ms (1.56s)
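The summary figures can be reproduced from the per-test durations listed in this report. The sketch below is a cross-check only; the two lists are copied by hand from the Failed Tests and Passed Tests sections, and the variable names are illustrative:

```python
# Durations in ms, copied from the Failed Tests and Passed Tests sections.
failed_ms = [1193, 10584, 1127, 880, 589, 881, 421, 454]
passed_ms = [771, 5106, 738, 843, 1018, 1293, 950, 882, 1064, 793, 783, 783]

total = len(failed_ms) + len(passed_ms)
success_rate = 100 * len(passed_ms) / total
# Mean duration over all tests, rounded to the nearest millisecond.
average_ms = round(sum(failed_ms + passed_ms) / total)

print(f"Total Tests: {total}")
print(f"Passed: {len(passed_ms)}")
print(f"Failed: {len(failed_ms)}")
print(f"Success Rate: {success_rate:.2f}%")
print(f"Average Duration: {average_ms}ms ({average_ms / 1000:.2f}s)")
```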
Failed Tests
quadratic - openai/gpt-3.5-turbo
- Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
- Expected: `-2,-3`
- Actual: `-2, -3`
- Duration: 1193ms (1.19s)
- Reason: Expected -2,-3, but got -2, -3
- Timestamp: 4/6/2025, 5:42:04 PM
quadratic - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
- Expected: `-2,-3`
- Actual: `The solutions to the quadratic equation
x^2 + 5x + 6 = 0 are x = -2 and x = -3.
-2,-3`
- Duration: 10584ms (10.58s)
- Reason: Expected -2,-3, but got the solutions to the quadratic equation
x^2 + 5x + 6 = 0 are x = -2 and x = -3.
-2,-3
- Timestamp: 4/6/2025, 5:42:15 PM
quadratic - openai/gpt-4o-mini
- Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
- Expected: `-2,-3`
- Actual: `-2, -3`
- Duration: 1127ms (1.13s)
- Reason: Expected -2,-3, but got -2, -3
- Timestamp: 4/6/2025, 5:42:16 PM
quadratic - openrouter/quasar-alpha
- Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
- Expected: `-2,-3`
- Actual: `-2, -3`
- Duration: 880ms (0.88s)
- Reason: Expected -2,-3, but got -2, -3
- Timestamp: 4/6/2025, 5:42:17 PM
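All four quadratic failures come down to formatting rather than arithmetic: three responses differ from the expected string only by a space after the comma, and one puts an explanatory sentence before the correct final line. The reported reasons are consistent with an exact string comparison, although the harness code is not shown in this report. A minimal sketch of a more tolerant check, assuming the answer sits on the last non-empty line of the response; `normalize_answer` and `answers_match` are hypothetical names:

```python
def normalize_answer(text: str) -> str:
    """Keep the last non-empty line and remove all whitespace inside it."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return ""
    return "".join(lines[-1].split())


def answers_match(expected: str, actual: str) -> bool:
    """Compare two answers after normalization (hypothetical helper)."""
    return normalize_answer(expected) == normalize_answer(actual)


print("-2,-3" == "-2, -3")               # strict comparison
print(answers_match("-2,-3", "-2, -3"))  # tolerant comparison
```

A check like this would also accept the verbose deepseek response, so whether to adopt it depends on how strictly the "no explanation" instruction is meant to be enforced. It does not handle the roots appearing in a different order.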
fibonacci - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
- Expected: `8`
- Actual: ``
- Duration: 589ms (0.59s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:26 PM
fibonacci - openai/gpt-4o-mini
- Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
- Expected: `8`
- Actual: `5`
- Duration: 881ms (0.88s)
- Reason: Expected 8, but got 5
- Timestamp: 4/6/2025, 5:42:27 PM
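The prompt does not say where the sequence starts, and that choice changes the answer: starting from 1, 1 the 6th number is 8, while starting from 0, 1 it is 5. The gpt-4o-mini answer of 5 matches the second convention, so this failure may reflect an ambiguous prompt rather than a calculation error. A short sketch of both conventions (the `fib_nth` helper is illustrative only):

```python
def fib_nth(n: int, first: int, second: int) -> int:
    """Return the n-th term (1-based) of a Fibonacci-style sequence."""
    a, b = first, second
    for _ in range(n - 1):
        a, b = b, a + b
    return a


print(fib_nth(6, 1, 1))  # sequence 1, 1, 2, 3, 5, 8
print(fib_nth(6, 0, 1))  # sequence 0, 1, 1, 2, 3, 5
```

Stating the starting terms in the prompt would remove the ambiguity.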
square_root - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Calculate the square root of 16. Return only the number, no explanation.
- Expected: `4`
- Actual: ``
- Duration: 421ms (0.42s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:29 PM
power - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
- Expected: `8`
- Actual: ``
- Duration: 454ms (0.45s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:33 PM
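The Reason lines in this section take two forms: "Model returned empty response" when nothing came back, and "Expected X, but got Y" on a mismatch. A minimal sketch of how such a reason could be derived, assuming a strict string comparison; `failure_reason` is a hypothetical function, not the harness's actual code:

```python
from typing import Optional


def failure_reason(expected: str, actual: str) -> Optional[str]:
    """Return a reason string for a failed test, or None if it passed."""
    if actual == "":
        return "Model returned empty response"
    if actual != expected:
        return f"Expected {expected}, but got {actual}"
    return None


print(failure_reason("8", ""))
print(failure_reason("8", "5"))
print(failure_reason("4", "4"))
```

All three empty responses came from deepseek/deepseek-r1-distill-qwen-14b:free and returned in under 0.6 s, against 5 to 11 s for that model's non-empty answers. That may point to an upstream error rather than a model answer, but the report does not contain enough detail to confirm it.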
Passed Tests
factorial - openai/gpt-3.5-turbo
- Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
- Expected: `120`
- Actual: `120`
- Duration: 771ms (0.77s)
- Timestamp: 4/6/2025, 5:42:18 PM
factorial - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
- Expected: `120`
- Actual: `120`
- Duration: 5106ms (5.11s)
- Timestamp: 4/6/2025, 5:42:23 PM
factorial - openai/gpt-4o-mini
- Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
- Expected: `120`
- Actual: `120`
- Duration: 738ms (0.74s)
- Timestamp: 4/6/2025, 5:42:23 PM
factorial - openrouter/quasar-alpha
- Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
- Expected: `120`
- Actual: `120`
- Duration: 843ms (0.84s)
- Timestamp: 4/6/2025, 5:42:24 PM
fibonacci - openai/gpt-3.5-turbo
- Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
- Expected: `8`
- Actual: `8`
- Duration: 1018ms (1.02s)
- Timestamp: 4/6/2025, 5:42:25 PM
fibonacci - openrouter/quasar-alpha
- Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
- Expected: `8`
- Actual: `8`
- Duration: 1293ms (1.29s)
- Timestamp: 4/6/2025, 5:42:28 PM
square_root - openai/gpt-3.5-turbo
- Prompt: Calculate the square root of 16. Return only the number, no explanation.
- Expected: `4`
- Actual: `4`
- Duration: 950ms (0.95s)
- Timestamp: 4/6/2025, 5:42:29 PM
square_root - openai/gpt-4o-mini
- Prompt: Calculate the square root of 16. Return only the number, no explanation.
- Expected: `4`
- Actual: `4`
- Duration: 882ms (0.88s)
- Timestamp: 4/6/2025, 5:42:30 PM
square_root - openrouter/quasar-alpha
- Prompt: Calculate the square root of 16. Return only the number, no explanation.
- Expected: `4`
- Actual: `4`
- Duration: 1064ms (1.06s)
- Timestamp: 4/6/2025, 5:42:31 PM
power - openai/gpt-3.5-turbo
- Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
- Expected: `8`
- Actual: `8`
- Duration: 793ms (0.79s)
- Timestamp: 4/6/2025, 5:42:32 PM
power - openai/gpt-4o-mini
- Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
- Expected: `8`
- Actual: `8`
- Duration: 783ms (0.78s)
- Timestamp: 4/6/2025, 5:42:34 PM
power - openrouter/quasar-alpha
- Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
- Expected: `8`
- Actual: `8`
- Duration: 783ms (0.78s)
- Timestamp: 4/6/2025, 5:42:34 PM