# Math Operations Test Results

## Highscores

### Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|------|-------|--------------|--------------|
| quadratic | openai/gpt-4o-mini | 1088 | 1.09 |
| quadratic | openai/gpt-3.5-turbo | 1202 | 1.20 |
| factorial | openai/gpt-4o-mini | 481 | 0.48 |
| factorial | openai/gpt-3.5-turbo | 503 | 0.50 |
| fibonacci | openai/gpt-3.5-turbo | 503 | 0.50 |
| fibonacci | openai/gpt-4o-mini | 601 | 0.60 |
| square_root | openai/gpt-4o-mini | 539 | 0.54 |
| square_root | openai/gpt-3.5-turbo | 738 | 0.74 |
| power | openai/gpt-3.5-turbo | 592 | 0.59 |
| power | openai/gpt-4o-mini | 1103 | 1.10 |

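The ranking lists both models for each test, faster one first. As a rough illustration of how that ordering can be derived, the sketch below picks the faster model per test from the durations in the table. The `rows` list and the `fastest_per_test` helper are hypothetical names introduced here and are not taken from the test harness.

```python
# (test, model, duration in ms), copied from the Performance Rankings table.
rows = [
    ("quadratic", "openai/gpt-4o-mini", 1088),
    ("quadratic", "openai/gpt-3.5-turbo", 1202),
    ("factorial", "openai/gpt-4o-mini", 481),
    ("factorial", "openai/gpt-3.5-turbo", 503),
    ("fibonacci", "openai/gpt-3.5-turbo", 503),
    ("fibonacci", "openai/gpt-4o-mini", 601),
    ("square_root", "openai/gpt-4o-mini", 539),
    ("square_root", "openai/gpt-3.5-turbo", 738),
    ("power", "openai/gpt-3.5-turbo", 592),
    ("power", "openai/gpt-4o-mini", 1103),
]


def fastest_per_test(rows):
    """Return {test: (model, duration_ms)} for the lowest duration in each test."""
    best = {}
    for test, model, ms in rows:
        # Keep the entry with the smallest duration seen so far for this test.
        if test not in best or ms < best[test][1]:
            best[test] = (model, ms)
    return best


for test, (model, ms) in fastest_per_test(rows).items():
    print(f"{test}: {model} ({ms} ms)")
```

On these figures, `openai/gpt-4o-mini` is faster on quadratic, factorial, and square_root, and `openai/gpt-3.5-turbo` is faster on fibonacci and power.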
## Summary

- Total Tests: 10
- Passed: 9
- Failed: 1
- Success Rate: 90.00%
- Average Duration: 735ms (0.73s)

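The summary figures can be rechecked from the individual results. The sketch below recomputes the success rate and the average duration from the ten durations reported in this document. The variable names are illustrative and do not come from the test harness.

```python
# Durations in milliseconds for the ten test runs reported in this document.
durations_ms = [1088, 1202, 481, 503, 503, 601, 539, 738, 592, 1103]
failed = 1  # the single entry under "Failed Tests"

total = len(durations_ms)
passed = total - failed
success_rate = 100 * passed / total      # percentage of passing tests
average_ms = sum(durations_ms) / total   # arithmetic mean duration

print(f"Total Tests: {total}")
print(f"Passed: {passed}")
print(f"Success Rate: {success_rate:.2f}%")    # Success Rate: 90.00%
print(f"Average Duration: {average_ms:.0f}ms")  # Average Duration: 735ms
```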
## Failed Tests

### quadratic - openai/gpt-3.5-turbo

- Prompt: `Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.`
- Expected: `-3,-2`
- Actual: `-2,-3`
- Duration: 1202ms (1.20s)
- Reason: Expected -3,-2, but got -2,-3
- Timestamp: 6/5/2025, 8:46:07 PM

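The reply in this failure contains both correct roots in the opposite order, which suggests the harness compares the reply to the expected string exactly. If the order of the roots should not matter for this test, one option is to parse both strings and compare the sorted numbers. This is a minimal sketch under that assumption; `parse_roots` and `roots_match` are hypothetical helpers, not part of the harness.

```python
def parse_roots(answer: str) -> list:
    """Parse a comma-separated answer such as "-3,-2" into a sorted list of floats."""
    # float() tolerates surrounding whitespace, so "-3, -2" parses as well.
    return sorted(float(part) for part in answer.split(","))


def roots_match(expected: str, actual: str) -> bool:
    """Compare two comma-separated answers while ignoring the order of the numbers."""
    try:
        return parse_roots(expected) == parse_roots(actual)
    except ValueError:
        # Any non-numeric text in either string counts as a mismatch.
        return False


print(roots_match("-3,-2", "-2,-3"))  # True
print(roots_match("-3,-2", "-3,-1"))  # False
```

Sorting rather than converting to a set keeps repeated roots distinct, so a double root such as `2,2` would not match a lone `2`.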
## Passed Tests

### quadratic - openai/gpt-4o-mini

- Prompt: `Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.`
- Expected: `-3,-2`
- Actual: `-3,-2`
- Duration: 1088ms (1.09s)
- Timestamp: 6/5/2025, 8:46:09 PM

### factorial - openai/gpt-3.5-turbo

- Prompt: `Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.`
- Expected: `120`
- Actual: `120`
- Duration: 503ms (0.50s)
- Timestamp: 6/5/2025, 8:46:09 PM

### factorial - openai/gpt-4o-mini

- Prompt: `Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.`
- Expected: `120`
- Actual: `120`
- Duration: 481ms (0.48s)
- Timestamp: 6/5/2025, 8:46:10 PM

### fibonacci - openai/gpt-3.5-turbo

- Prompt: `Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.`
- Expected: `8`
- Actual: `8`
- Duration: 503ms (0.50s)
- Timestamp: 6/5/2025, 8:46:10 PM

### fibonacci - openai/gpt-4o-mini

- Prompt: `Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.`
- Expected: `8`
- Actual: `8`
- Duration: 601ms (0.60s)
- Timestamp: 6/5/2025, 8:46:11 PM

### square_root - openai/gpt-3.5-turbo

- Prompt: `Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.`
- Expected: `4`
- Actual: `4`
- Duration: 738ms (0.74s)
- Timestamp: 6/5/2025, 8:46:11 PM

### square_root - openai/gpt-4o-mini

- Prompt: `Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.`
- Expected: `4`
- Actual: `4`
- Duration: 539ms (0.54s)
- Timestamp: 6/5/2025, 8:46:12 PM

### power - openai/gpt-3.5-turbo

- Prompt: `Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.`
- Expected: `8`
- Actual: `8`
- Duration: 592ms (0.59s)
- Timestamp: 6/5/2025, 8:46:12 PM

### power - openai/gpt-4o-mini

- Prompt: `Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.`
- Expected: `8`
- Actual: `8`
- Duration: 1103ms (1.10s)
- Timestamp: 6/5/2025, 8:46:14 PM