# Basic Operations Test Results

## Highscores

### Performance Rankings (Duration)

Within each test, runs are listed from fastest to slowest. The `web_content` rows are included even though all four of those runs failed (see Failed Tests).

| Test | Model | Duration (ms) | Duration (s) |
|------|-------|--------------:|-------------:|
| addition | openai/gpt-4o-mini | 726 | 0.73 |
| addition | openrouter/quasar-alpha | 814 | 0.81 |
| addition | openai/gpt-3.5-turbo | 1157 | 1.16 |
| addition | deepseek/deepseek-r1-distill-qwen-14b:free | 4214 | 4.21 |
| multiplication | openrouter/quasar-alpha | 684 | 0.68 |
| multiplication | openai/gpt-3.5-turbo | 826 | 0.83 |
| multiplication | openai/gpt-4o-mini | 856 | 0.86 |
| multiplication | deepseek/deepseek-r1-distill-qwen-14b:free | 6184 | 6.18 |
| division | openai/gpt-3.5-turbo | 790 | 0.79 |
| division | openai/gpt-4o-mini | 823 | 0.82 |
| division | openrouter/quasar-alpha | 855 | 0.85 |
| division | deepseek/deepseek-r1-distill-qwen-14b:free | 1502 | 1.50 |
| web_content | deepseek/deepseek-r1-distill-qwen-14b:free | 263 | 0.26 |
| web_content | openai/gpt-3.5-turbo | 3311 | 3.31 |
| web_content | openrouter/quasar-alpha | 8305 | 8.30 |
| web_content | openai/gpt-4o-mini | 10048 | 10.05 |
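
The ordering in the rankings table (grouped by test, fastest run first, milliseconds also shown as seconds to two decimals) can be reproduced with a simple per-group sort. This is a minimal sketch, not the harness's actual code: it assumes each run is available as a record holding the test name, model, and duration, and the `Run` type and helper names are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical record type; the report does not show the harness's data model.
@dataclass
class Run:
    test: str
    model: str
    duration_ms: int


def rank_by_duration(runs: list[Run]) -> list[Run]:
    """Group runs by test (in first-seen order) and sort each group fastest first."""
    test_order = list(dict.fromkeys(run.test for run in runs))
    ranked: list[Run] = []
    for test in test_order:
        group = [run for run in runs if run.test == test]
        ranked.extend(sorted(group, key=lambda run: run.duration_ms))
    return ranked


def to_table_row(run: Run) -> str:
    """Render one Markdown table row with the duration in ms and in seconds (2 decimals)."""
    return f"| {run.test} | {run.model} | {run.duration_ms} | {run.duration_ms / 1000:.2f} |"


# The four web_content runs from this report, deliberately listed out of order.
runs = [
    Run("web_content", "openai/gpt-4o-mini", 10048),
    Run("web_content", "openai/gpt-3.5-turbo", 3311),
    Run("web_content", "deepseek/deepseek-r1-distill-qwen-14b:free", 263),
    Run("web_content", "openrouter/quasar-alpha", 8305),
]

ranked = rank_by_duration(runs)
for run in ranked:
    print(to_table_row(run))
```

Running it prints the four `web_content` rows in the same order as the table. Sorting within each group, rather than across all runs, yields one ranking per test, which matches how the table is laid out.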

## Summary

- Total Tests: 16
- Passed: 12
- Failed: 4
- Success Rate: 75.00%
- Average Duration: 2585ms (2.58s)
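
The summary figures follow directly from the per-test results recorded in this report. The sketch below recomputes them, assuming the average is a plain mean over all 16 runs with the four failed runs included; under that assumption it reproduces the reported 2585ms. The `Result` type and the `summarize` function are hypothetical names.

```python
from typing import NamedTuple


class Result(NamedTuple):
    test: str
    model: str
    duration_ms: int
    passed: bool


# Durations and outcomes transcribed from the Failed Tests and Passed Tests sections.
RESULTS = [
    Result("addition", "openai/gpt-3.5-turbo", 1157, True),
    Result("addition", "deepseek/deepseek-r1-distill-qwen-14b:free", 4214, True),
    Result("addition", "openai/gpt-4o-mini", 726, True),
    Result("addition", "openrouter/quasar-alpha", 814, True),
    Result("multiplication", "openai/gpt-3.5-turbo", 826, True),
    Result("multiplication", "deepseek/deepseek-r1-distill-qwen-14b:free", 6184, True),
    Result("multiplication", "openai/gpt-4o-mini", 856, True),
    Result("multiplication", "openrouter/quasar-alpha", 684, True),
    Result("division", "openai/gpt-3.5-turbo", 790, True),
    Result("division", "deepseek/deepseek-r1-distill-qwen-14b:free", 1502, True),
    Result("division", "openai/gpt-4o-mini", 823, True),
    Result("division", "openrouter/quasar-alpha", 855, True),
    Result("web_content", "openai/gpt-3.5-turbo", 3311, False),
    Result("web_content", "deepseek/deepseek-r1-distill-qwen-14b:free", 263, False),
    Result("web_content", "openai/gpt-4o-mini", 10048, False),
    Result("web_content", "openrouter/quasar-alpha", 8305, False),
]


def summarize(results: list[Result]) -> list[str]:
    """Return the summary bullet lines; the average includes failed runs (assumption)."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    failed = total - passed
    avg_ms = sum(r.duration_ms for r in results) / total
    return [
        f"- Total Tests: {total}",
        f"- Passed: {passed}",
        f"- Failed: {failed}",
        f"- Success Rate: {100 * passed / total:.2f}%",
        f"- Average Duration: {avg_ms:.0f}ms ({avg_ms / 1000:.2f}s)",
    ]


summary = summarize(RESULTS)
print("\n".join(summary))
```

The mean works out to 41358 ms / 16 = 2584.875 ms. That rounds to 2585 ms but to 2.58 s, which is why the seconds value is 2.58 rather than the 2.59 that rounding 2585 ms half-up would give.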

## Failed Tests

### web_content - openai/gpt-3.5-turbo

- Prompt: `Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.`
- Expected: `yes`
- Actual: (empty)
- Duration: 3311ms (3.31s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:26 PM

### web_content - deepseek/deepseek-r1-distill-qwen-14b:free

- Prompt: `Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.`
- Expected: `yes`
- Actual: (empty)
- Duration: 263ms (0.26s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:26 PM

### web_content - openai/gpt-4o-mini

- Prompt: `Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.`
- Expected: `yes`
- Actual: (empty)
- Duration: 10048ms (10.05s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:36 PM

### web_content - openrouter/quasar-alpha

- Prompt: `Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.`
- Expected: `yes`
- Actual: (empty)
- Duration: 8305ms (8.30s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:44 PM
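
All four failures have the same cause, an empty response, rather than a wrong answer. A checker that reports that case separately from a mismatch might look like the sketch below. It is hypothetical: the report does not describe how answers are matched, so trimming whitespace and ignoring case are assumptions (under them, `Yes` would pass for an expected `yes`); only the reason string is taken from the report.

```python
from typing import Optional


def evaluate(expected: str, actual: Optional[str]) -> tuple[bool, Optional[str]]:
    """Hypothetical pass/fail check returning (passed, failure_reason).

    Assumption: answers match after trimming whitespace and ignoring case.
    """
    text = (actual or "").strip()
    if not text:
        # Reason string copied from this report's Failed Tests section.
        return False, "Model returned empty response"
    if text.lower() == expected.strip().lower():
        return True, None
    return False, f"Expected {expected!r}, got {text!r}"


print(evaluate("yes", ""))
print(evaluate("8", "8"))
```

Keeping the empty case distinct helps separate a request that produced no output from a model that answered incorrectly, though the report itself does not say why these responses were empty.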

## Passed Tests

### addition - openai/gpt-3.5-turbo

- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 1157ms (1.16s)
- Timestamp: 4/6/2025, 5:42:04 PM

### addition - deepseek/deepseek-r1-distill-qwen-14b:free

- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 4214ms (4.21s)
- Timestamp: 4/6/2025, 5:42:08 PM

### addition - openai/gpt-4o-mini

- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 726ms (0.73s)
- Timestamp: 4/6/2025, 5:42:09 PM

### addition - openrouter/quasar-alpha

- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 814ms (0.81s)
- Timestamp: 4/6/2025, 5:42:10 PM

### multiplication - openai/gpt-3.5-turbo

- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 826ms (0.83s)
- Timestamp: 4/6/2025, 5:42:11 PM

### multiplication - deepseek/deepseek-r1-distill-qwen-14b:free

- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 6184ms (6.18s)
- Timestamp: 4/6/2025, 5:42:17 PM

### multiplication - openai/gpt-4o-mini

- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 856ms (0.86s)
- Timestamp: 4/6/2025, 5:42:18 PM

### multiplication - openrouter/quasar-alpha

- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 684ms (0.68s)
- Timestamp: 4/6/2025, 5:42:18 PM

### division - openai/gpt-3.5-turbo

- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 790ms (0.79s)
- Timestamp: 4/6/2025, 5:42:19 PM

### division - deepseek/deepseek-r1-distill-qwen-14b:free

- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 1502ms (1.50s)
- Timestamp: 4/6/2025, 5:42:21 PM

### division - openai/gpt-4o-mini

- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 823ms (0.82s)
- Timestamp: 4/6/2025, 5:42:22 PM

### division - openrouter/quasar-alpha

- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 855ms (0.85s)
- Timestamp: 4/6/2025, 5:42:22 PM
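
Each passing prompt pairs an operation with two operands and the same suffix, `Return only the number, no explanation.`, and the expected value is the arithmetic result written as a whole number. The sketch below rebuilds the three arithmetic cases under that reading. `ArithmeticCase`, `make_case`, and `SUFFIX` are hypothetical names; the report does not show how its harness actually generates prompts or expected values.

```python
import operator
from dataclasses import dataclass
from typing import Callable

# Suffix shared by every arithmetic prompt in this report.
SUFFIX = "Return only the number, no explanation."


@dataclass(frozen=True)
class ArithmeticCase:
    name: str
    prompt: str
    expected: str


def make_case(name: str, template: str, a: int, b: int,
              op: Callable[[int, int], float]) -> ArithmeticCase:
    """Hypothetical helper: fill the prompt template and compute the expected answer."""
    result = op(a, b)
    # truediv returns a float (15 / 3 == 5.0); render whole numbers without ".0".
    expected = str(int(result)) if float(result).is_integer() else str(result)
    return ArithmeticCase(name, f"{template.format(a=a, b=b)}. {SUFFIX}", expected)


CASES = [
    make_case("addition", "add {a} and {b}", 5, 3, operator.add),
    make_case("multiplication", "multiply {a} and {b}", 8, 3, operator.mul),
    make_case("division", "divide {a} by {b}", 15, 3, operator.truediv),
]

for case in CASES:
    print(f"{case.name}: {case.prompt!r} -> {case.expected}")
```

Formatting the integral quotient as `5` rather than Python's default `5.0` keeps the expected value identical to the one recorded in the division tests above.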