Math Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| quadratic | openai/gpt-4o-mini | 776 | 0.78 |
| quadratic | openrouter/quasar-alpha | 965 | 0.96 |
| quadratic | openai/gpt-3.5-turbo | 1286 | 1.29 |
| factorial | openai/gpt-4o-mini | 753 | 0.75 |
| factorial | openrouter/quasar-alpha | 762 | 0.76 |
| factorial | openai/gpt-3.5-turbo | 838 | 0.84 |
| fibonacci | openai/gpt-4o-mini | 769 | 0.77 |
| fibonacci | openai/gpt-3.5-turbo | 866 | 0.87 |
| fibonacci | openrouter/quasar-alpha | 930 | 0.93 |
| square_root | openai/gpt-3.5-turbo | 816 | 0.82 |
| square_root | openai/gpt-4o-mini | 927 | 0.93 |
| square_root | openrouter/quasar-alpha | 5005 | 5.00 |
| power | openai/gpt-3.5-turbo | 955 | 0.95 |
| power | openrouter/quasar-alpha | 1432 | 1.43 |
| power | openai/gpt-4o-mini | 1725 | 1.73 |
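The ranking above can be reproduced from the raw rows. The sketch below is a minimal illustration, not the harness's actual code: the `ROWS` list is copied by hand from the table, and the helper names `rank_by_duration` and `fastest_counts` are hypothetical.

```python
from collections import defaultdict

# (test, model, duration_ms) rows copied from the table above.
ROWS = [
    ("quadratic", "openai/gpt-4o-mini", 776),
    ("quadratic", "openrouter/quasar-alpha", 965),
    ("quadratic", "openai/gpt-3.5-turbo", 1286),
    ("factorial", "openai/gpt-4o-mini", 753),
    ("factorial", "openrouter/quasar-alpha", 762),
    ("factorial", "openai/gpt-3.5-turbo", 838),
    ("fibonacci", "openai/gpt-4o-mini", 769),
    ("fibonacci", "openai/gpt-3.5-turbo", 866),
    ("fibonacci", "openrouter/quasar-alpha", 930),
    ("square_root", "openai/gpt-3.5-turbo", 816),
    ("square_root", "openai/gpt-4o-mini", 927),
    ("square_root", "openrouter/quasar-alpha", 5005),
    ("power", "openai/gpt-3.5-turbo", 955),
    ("power", "openrouter/quasar-alpha", 1432),
    ("power", "openai/gpt-4o-mini", 1725),
]


def rank_by_duration(rows):
    """Group rows by test and sort each group fastest-first."""
    grouped = defaultdict(list)
    for test, model, duration_ms in rows:
        grouped[test].append((model, duration_ms))
    return {
        test: sorted(entries, key=lambda entry: entry[1])
        for test, entries in grouped.items()
    }


def fastest_counts(rows):
    """Count how many tests each model finished first."""
    counts = {}
    for entries in rank_by_duration(rows).values():
        winner = entries[0][0]
        counts[winner] = counts.get(winner, 0) + 1
    return counts


if __name__ == "__main__":
    for test, entries in rank_by_duration(ROWS).items():
        print(test, entries)
    print(fastest_counts(ROWS))
```

On these rows, `openai/gpt-4o-mini` is fastest on three tests and `openai/gpt-3.5-turbo` on two. Each test/model pair appears once in this report, so small gaps may not be meaningful.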
Summary
- Total Tests: 15
- Passed: 13
- Failed: 2
- Success Rate: 86.67%
- Average Duration: 1254ms (1.25s)
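The summary figures follow directly from the 15 durations in the rankings table. As a quick arithmetic check (a sketch with hand-copied values; the variable names are illustrative and not taken from the harness):

```python
import statistics

# Durations in milliseconds, in the order they appear in the rankings table.
durations_ms = [
    776, 965, 1286,   # quadratic
    753, 762, 838,    # factorial
    769, 866, 930,    # fibonacci
    816, 927, 5005,   # square_root
    955, 1432, 1725,  # power
]

total_tests = len(durations_ms)            # 15
failed = 2                                 # from the Failed Tests section
passed = total_tests - failed              # 13

success_rate = round(100 * passed / total_tests, 2)    # 86.67
average_ms = round(sum(durations_ms) / total_tests)    # 18805 / 15 -> 1254
median_ms = statistics.median(durations_ms)            # 927

if __name__ == "__main__":
    print(f"Success Rate: {success_rate}%")
    print(f"Average Duration: {average_ms}ms ({average_ms / 1000:.2f}s)")
    print(f"Median Duration: {median_ms}ms")
```

The median (927 ms) is not part of the original report; it is included here only because the single 5005 ms measurement pulls the mean well above most individual results.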
Failed Tests
quadratic - openai/gpt-3.5-turbo
- Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.
- Expected: -3,-2
- Actual: -2,-3
- Duration: 1286ms (1.29s)
- Reason: Expected -3,-2, but got -2,-3
- Timestamp: 4/7/2025, 7:08:26 PM
quadratic - openrouter/quasar-alpha
- Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.
- Expected: -3,-2
- Actual: -2,-3
- Duration: 965ms (0.96s)
- Reason: Expected -3,-2, but got -2,-3
- Timestamp: 4/7/2025, 7:08:28 PM
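Both failures return the correct roots in a different order from the expected string, and the recorded reasons suggest the check is an exact string match. The report does not show the harness's comparison logic, so the following is only a sketch of one order-insensitive alternative; `parse_solutions` and `same_solutions` are hypothetical helpers, and the tolerance value is an assumption.

```python
def parse_solutions(text):
    """Parse a comma-separated answer such as "-3,-2" into a sorted list of floats.

    Raises ValueError if any non-empty part is not a number.
    """
    return sorted(float(part) for part in text.split(",") if part.strip())


def same_solutions(expected, actual, tol=1e-9):
    """Return True when both answers contain the same numbers, in any order."""
    try:
        expected_values = parse_solutions(expected)
        actual_values = parse_solutions(actual)
    except ValueError:
        # Extra text in the answer still counts as a failure.
        return False
    if len(expected_values) != len(actual_values):
        return False
    return all(
        abs(e - a) <= tol for e, a in zip(expected_values, actual_values)
    )


if __name__ == "__main__":
    print(same_solutions("-3,-2", "-2,-3"))   # True
    print(same_solutions("-3,-2", "x = -3"))  # False
```

Under this comparison the two quadratic answers above would pass, while a response with extra text or a missing root would still fail. Whether order should matter is a decision for the test author, since the prompt's example lists the roots as -3,-2.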
Passed Tests
quadratic - openai/gpt-4o-mini
- Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.
- Expected: -3,-2
- Actual: -3,-2
- Duration: 776ms (0.78s)
- Timestamp: 4/7/2025, 7:08:27 PM
factorial - openai/gpt-3.5-turbo
- Prompt: Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.
- Expected: 120
- Actual: 120
- Duration: 838ms (0.84s)
- Timestamp: 4/7/2025, 7:08:29 PM
factorial - openai/gpt-4o-mini
- Prompt: Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.
- Expected: 120
- Actual: 120
- Duration: 753ms (0.75s)
- Timestamp: 4/7/2025, 7:08:29 PM
factorial - openrouter/quasar-alpha
- Prompt: Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.
- Expected: 120
- Actual: 120
- Duration: 762ms (0.76s)
- Timestamp: 4/7/2025, 7:08:30 PM
fibonacci - openai/gpt-3.5-turbo
- Prompt: Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.
- Expected: 8
- Actual: 8
- Duration: 866ms (0.87s)
- Timestamp: 4/7/2025, 7:08:31 PM
fibonacci - openai/gpt-4o-mini
- Prompt: Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.
- Expected: 8
- Actual: 8
- Duration: 769ms (0.77s)
- Timestamp: 4/7/2025, 7:08:32 PM
fibonacci - openrouter/quasar-alpha
- Prompt: Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.
- Expected: 8
- Actual: 8
- Duration: 930ms (0.93s)
- Timestamp: 4/7/2025, 7:08:33 PM
square_root - openai/gpt-3.5-turbo
- Prompt: Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.
- Expected: 4
- Actual: 4
- Duration: 816ms (0.82s)
- Timestamp: 4/7/2025, 7:08:34 PM
square_root - openai/gpt-4o-mini
- Prompt: Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.
- Expected: 4
- Actual: 4
- Duration: 927ms (0.93s)
- Timestamp: 4/7/2025, 7:08:35 PM
square_root - openrouter/quasar-alpha
- Prompt: Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.
- Expected: 4
- Actual: 4
- Duration: 5005ms (5.00s)
- Timestamp: 4/7/2025, 7:08:40 PM
power - openai/gpt-3.5-turbo
- Prompt: Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.
- Expected: 8
- Actual: 8
- Duration: 955ms (0.95s)
- Timestamp: 4/7/2025, 7:08:41 PM
power - openai/gpt-4o-mini
- Prompt: Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.
- Expected: 8
- Actual: 8
- Duration: 1725ms (1.73s)
- Timestamp: 4/7/2025, 7:08:42 PM
power - openrouter/quasar-alpha
- Prompt: Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.
- Expected: 8
- Actual: 8
- Duration: 1432ms (1.43s)
- Timestamp: 4/7/2025, 7:08:44 PM