# Basic Operations Test Results

## Highscores

### Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|------|-------|--------------|--------------|
| addition | openai/gpt-4o-mini | 910 | 0.91 |
| addition | openai/gpt-3.5-turbo | 1484 | 1.48 |
| addition | deepseek/deepseek-r1-distill-qwen-14b:free | 8460 | 8.46 |
| multiplication | openai/gpt-3.5-turbo | 955 | 0.95 |
| multiplication | openai/gpt-4o-mini | 1095 | 1.09 |
| multiplication | deepseek/deepseek-r1-distill-qwen-14b:free | 7653 | 7.65 |
| division | openai/gpt-3.5-turbo | 816 | 0.82 |
| division | openai/gpt-4o-mini | 954 | 0.95 |
| division | deepseek/deepseek-r1-distill-qwen-14b:free | 16655 | 16.66 |
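The rankings appear to be grouped by test in the order the tests ran (addition, multiplication, division) and sorted by ascending duration within each group, with seconds derived from milliseconds and rounded to two decimals. The following is a minimal Python sketch of that ordering. It assumes the raw results are available as `(test, model, duration_ms)` tuples; `RESULTS` and `rank_by_duration` are hypothetical names, not part of the original test harness.

```python
# Hypothetical raw results, in run order (taken from the timestamps in this report).
RESULTS = [
    ("addition", "openai/gpt-3.5-turbo", 1484),
    ("addition", "deepseek/deepseek-r1-distill-qwen-14b:free", 8460),
    ("addition", "openai/gpt-4o-mini", 910),
    ("multiplication", "openai/gpt-3.5-turbo", 955),
    ("multiplication", "deepseek/deepseek-r1-distill-qwen-14b:free", 7653),
    ("multiplication", "openai/gpt-4o-mini", 1095),
    ("division", "openai/gpt-3.5-turbo", 816),
    ("division", "deepseek/deepseek-r1-distill-qwen-14b:free", 16655),
    ("division", "openai/gpt-4o-mini", 954),
]


def rank_by_duration(results):
    """Group rows by test (first-seen order), then sort each group fastest first."""
    groups = {}  # dicts preserve insertion order, so tests keep their run order
    for test, model, ms in results:
        groups.setdefault(test, []).append((model, ms))
    rows = []
    for test, entries in groups.items():
        for model, ms in sorted(entries, key=lambda entry: entry[1]):
            # Seconds column: milliseconds / 1000, formatted to two decimals.
            rows.append((test, model, ms, f"{ms / 1000:.2f}"))
    return rows


# Print the rows in the same Markdown table-row format used above.
for row in rank_by_duration(RESULTS):
    print("| " + " | ".join(str(cell) for cell in row) + " |")
```

Because `.2f` rounds the exact binary value of each quotient, 1095 ms comes out as `1.09` rather than `1.10`, which matches the table.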
## Summary

- Total Tests: 9
- Passed: 8
- Failed: 1
- Success Rate: 88.89%
- Average Duration: 4331ms (4.33s)
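The summary figures can be derived from the nine rows. The success rate is 8 / 9 ≈ 88.89%. The average duration is the mean over all nine runs, including the failed one: 38982 ms / 9 ≈ 4331 ms. The sketch below recomputes both values. It assumes the average is rounded to the nearest millisecond, although truncation gives the same 4331 ms here. `DURATIONS_MS` and `summarize` are illustrative names only.

```python
# Durations of all nine runs (ms), copied from the per-test results below.
DURATIONS_MS = [1484, 8460, 910, 955, 7653, 1095, 816, 16655, 954]
PASSED = 8


def summarize(durations_ms, passed):
    total = len(durations_ms)
    failed = total - passed
    success_rate = f"{passed / total * 100:.2f}%"  # 8 / 9 -> 88.89%
    avg_ms = round(sum(durations_ms) / total)      # 38982 / 9 = 4331.33 -> 4331
    avg_s = f"{avg_ms / 1000:.2f}"                 # 4.331 -> "4.33"
    return total, passed, failed, success_rate, avg_ms, avg_s


print(summarize(DURATIONS_MS, PASSED))
# → (9, 8, 1, '88.89%', 4331, '4.33')
```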
## Failed Tests

### division - deepseek/deepseek-r1-distill-qwen-14b:free

- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `15 divided by 3 is 5.\n\nAnswer: 5`
- Duration: 16655ms (16.66s)
- Reason: Expected `5`, but got `15 divided by 3 is 5.\n\nanswer: 5`
- Timestamp: 4/3/2025, 7:14:40 PM
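This failure comes from formatting, not arithmetic. The response contains the correct value, but the check compares the whole response with the expected string. The lowercase `answer: 5` in the Reason line suggests the response is lowercased before that comparison. The harness code itself is not part of this report, so that is an inference. The sketch below contrasts a strict check of that assumed kind with a more lenient one that takes the last number in the response. `strict_match` and `lenient_match` are hypothetical helpers, not the harness's actual functions.

```python
import re

# Matches integers and simple decimals, optionally negative.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def strict_match(expected: str, actual: str) -> bool:
    # Assumed harness behaviour: trim and lowercase, then require an exact match.
    return actual.strip().lower() == expected.strip().lower()


def lenient_match(expected: str, actual: str) -> bool:
    # Alternative: accept the response if the last number it contains equals the expected value.
    numbers = NUMBER.findall(actual)
    return bool(numbers) and float(numbers[-1]) == float(expected)


response = "15 divided by 3 is 5.\n\nAnswer: 5"
print(strict_match("5", response))   # False
print(lenient_match("5", response))  # True
```

The strict check also enforces the prompt's "Return only the number, no explanation" instruction. The lenient check does not, and it would accept a wrong chain of reasoning that happens to end on the right number. Which check is appropriate depends on whether instruction-following is part of what the test measures.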
## Passed Tests

### addition - openai/gpt-3.5-turbo

- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 1484ms (1.48s)
- Timestamp: 4/3/2025, 7:14:04 PM

### addition - deepseek/deepseek-r1-distill-qwen-14b:free

- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 8460ms (8.46s)
- Timestamp: 4/3/2025, 7:14:12 PM

### addition - openai/gpt-4o-mini

- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 910ms (0.91s)
- Timestamp: 4/3/2025, 7:14:13 PM

### multiplication - openai/gpt-3.5-turbo

- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 955ms (0.95s)
- Timestamp: 4/3/2025, 7:14:14 PM

### multiplication - deepseek/deepseek-r1-distill-qwen-14b:free

- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 7653ms (7.65s)
- Timestamp: 4/3/2025, 7:14:22 PM

### multiplication - openai/gpt-4o-mini

- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 1095ms (1.09s)
- Timestamp: 4/3/2025, 7:14:23 PM

### division - openai/gpt-3.5-turbo

- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 816ms (0.82s)
- Timestamp: 4/3/2025, 7:14:24 PM

### division - openai/gpt-4o-mini

- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 954ms (0.95s)
- Timestamp: 4/3/2025, 7:14:41 PM