mono/packages/kbot/tests/unit/reports/basic.md
2025-04-18 10:03:03 +02:00

# Basic Operations Test Results
## Highscores
### Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|------|-------|--------------|--------------|
| addition | openai/gpt-4o-mini | 1162 | 1.16 |
| addition | openai/gpt-3.5-turbo | 2646 | 2.65 |
| multiplication | openai/gpt-4o-mini | 666 | 0.67 |
| multiplication | openai/gpt-3.5-turbo | 958 | 0.96 |
| division | openai/gpt-4o-mini | 905 | 0.91 |
| division | openai/gpt-3.5-turbo | 1096 | 1.10 |
| web_content | openai/gpt-3.5-turbo (failed: empty response) | 3306 | 3.31 |
| web_content | openai/gpt-4o-mini | 7600 | 7.60 |
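
The ranking above can be reproduced with a short sketch. It assumes the rows are grouped by test in first-seen order and sorted fastest first within each group, and that seconds are rounded half up to two decimals. The helper names `ms_to_seconds` and `rank_by_duration` are hypothetical and are not taken from the kbot code base. The sort uses duration only and does not check whether a run passed.

```python
from collections import defaultdict

# (test, model, duration in ms), copied from the report in execution order.
results = [
    ("addition", "openai/gpt-3.5-turbo", 2646),
    ("addition", "openai/gpt-4o-mini", 1162),
    ("multiplication", "openai/gpt-3.5-turbo", 958),
    ("multiplication", "openai/gpt-4o-mini", 666),
    ("division", "openai/gpt-3.5-turbo", 1096),
    ("division", "openai/gpt-4o-mini", 905),
    ("web_content", "openai/gpt-3.5-turbo", 3306),
    ("web_content", "openai/gpt-4o-mini", 7600),
]


def ms_to_seconds(ms: int) -> str:
    """Format milliseconds as seconds with two decimals, rounding half up.

    Integer arithmetic avoids float ties such as 0.905.
    """
    centiseconds = (ms + 5) // 10
    return f"{centiseconds // 100}.{centiseconds % 100:02d}"


def rank_by_duration(rows):
    """Group rows by test (first-seen order) and sort each group fastest first."""
    groups = defaultdict(list)
    for test, model, ms in rows:
        groups[test].append((model, ms))
    return {test: sorted(entries, key=lambda e: e[1]) for test, entries in groups.items()}


rankings = rank_by_duration(results)
for test, entries in rankings.items():
    for model, ms in entries:
        print(f"| {test} | {model} | {ms} | {ms_to_seconds(ms)} |")
```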
## Summary
- Total Tests: 8
- Passed: 7
- Failed: 1
- Success Rate: 87.50%
- Average Duration: 2292ms (2.29s)
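
The summary figures follow directly from the eight durations listed in this report. The sketch below recomputes them; the function name `summarize` is hypothetical, and rounding the average to the nearest millisecond is an assumption about how the report was generated.

```python
# Durations (ms) of all eight runs, in the order they appear in this report.
durations_ms = [2646, 1162, 958, 666, 1096, 905, 3306, 7600]


def summarize(durations, passed):
    """Return the summary fields for a list of durations and a pass count."""
    total = len(durations)
    average_ms = round(sum(durations) / total)  # 18339 / 8 = 2292.375
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": f"{passed / total * 100:.2f}%",
        "average": f"{average_ms}ms ({average_ms / 1000:.2f}s)",
    }


summary = summarize(durations_ms, passed=7)
for key, value in summary.items():
    print(f"{key}: {value}")
```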
## Failed Tests
### web_content - openai/gpt-3.5-turbo
- Prompt: `Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.`
- Expected: `yes`
- Actual: *(empty)*
- Duration: 3306ms (3.31s)
- Reason: Model returned empty response
- Timestamp: 4/18/2025, 8:48:00 AM
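
A failure like this one can be detected with a check along the following lines. This is a minimal sketch, not the kbot implementation: the function name `check_response`, the whitespace stripping, the case-insensitive comparison, and the wording of the mismatch reason are all assumptions. Only the "Model returned empty response" string comes from the report.

```python
from typing import Optional, Tuple


def check_response(expected: str, actual: Optional[str]) -> Tuple[bool, Optional[str]]:
    """Return (passed, reason); reason is None when the test passes."""
    text = (actual or "").strip()
    if not text:
        # Reason string as it appears in the report.
        return False, "Model returned empty response"
    if text.lower() != expected.strip().lower():
        # Hypothetical wording for a non-empty but wrong answer.
        return False, f"Expected {expected!r}, got {text!r}"
    return True, None


print(check_response("yes", ""))
print(check_response("yes", "yes"))
```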
## Passed Tests
### addition - openai/gpt-3.5-turbo
- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 2646ms (2.65s)
- Timestamp: 4/18/2025, 8:47:52 AM
### addition - openai/gpt-4o-mini
- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 1162ms (1.16s)
- Timestamp: 4/18/2025, 8:47:53 AM
### multiplication - openai/gpt-3.5-turbo
- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 958ms (0.96s)
- Timestamp: 4/18/2025, 8:47:54 AM
### multiplication - openai/gpt-4o-mini
- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 666ms (0.67s)
- Timestamp: 4/18/2025, 8:47:55 AM
### division - openai/gpt-3.5-turbo
- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 1096ms (1.10s)
- Timestamp: 4/18/2025, 8:47:56 AM
### division - openai/gpt-4o-mini
- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 905ms (0.91s)
- Timestamp: 4/18/2025, 8:47:57 AM
### web_content - openai/gpt-4o-mini
- Prompt: `Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.`
- Expected: `yes`
- Actual: `yes`
- Duration: 7600ms (7.60s)
- Timestamp: 4/18/2025, 8:48:08 AM