mono/packages/kbot/tests/unit/reports/basic.md
2025-04-06 17:35:53 +02:00

Basic Operations Test Results

Highscores

Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|------|-------|---------------|--------------|
| addition | openai/gpt-4o-mini | 709 | 0.71 |
| addition | openrouter/quasar-alpha | 803 | 0.80 |
| addition | openai/gpt-3.5-turbo | 1182 | 1.18 |
| addition | deepseek/deepseek-r1-distill-qwen-14b:free | 2473 | 2.47 |
| multiplication | openai/gpt-3.5-turbo | 916 | 0.92 |
| multiplication | openai/gpt-4o-mini | 1232 | 1.23 |
| multiplication | openrouter/quasar-alpha | 5757 | 5.76 |
| multiplication | deepseek/deepseek-r1-distill-qwen-14b:free | 6224 | 6.22 |
| division | openai/gpt-4o-mini | 680 | 0.68 |
| division | openai/gpt-3.5-turbo | 893 | 0.89 |
| division | openrouter/quasar-alpha | 1031 | 1.03 |
| division | deepseek/deepseek-r1-distill-qwen-14b:free | 1612 | 1.61 |
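The rankings above group the rows by test and order each group from fastest to slowest. A minimal sketch of that ordering, assuming the results are available as `(test, model, duration_ms)` tuples; the `RESULTS` list and the `rank_by_duration` helper are illustrative names, not part of the test suite:

```python
from collections import defaultdict

# Rows copied from the rankings table: (test, model, duration in ms).
RESULTS = [
    ("addition", "openai/gpt-3.5-turbo", 1182),
    ("addition", "deepseek/deepseek-r1-distill-qwen-14b:free", 2473),
    ("addition", "openai/gpt-4o-mini", 709),
    ("addition", "openrouter/quasar-alpha", 803),
    ("multiplication", "openai/gpt-3.5-turbo", 916),
    ("multiplication", "deepseek/deepseek-r1-distill-qwen-14b:free", 6224),
    ("multiplication", "openai/gpt-4o-mini", 1232),
    ("multiplication", "openrouter/quasar-alpha", 5757),
    ("division", "openai/gpt-3.5-turbo", 893),
    ("division", "deepseek/deepseek-r1-distill-qwen-14b:free", 1612),
    ("division", "openai/gpt-4o-mini", 680),
    ("division", "openrouter/quasar-alpha", 1031),
]


def rank_by_duration(results):
    """Group rows by test name and sort each group fastest-first."""
    grouped = defaultdict(list)
    for test, model, duration_ms in results:
        grouped[test].append((model, duration_ms))
    return {test: sorted(rows, key=lambda row: row[1]) for test, rows in grouped.items()}


rankings = rank_by_duration(RESULTS)
for test, rows in rankings.items():
    for model, duration_ms in rows:
        # Seconds are the millisecond value divided by 1000, shown to two decimals.
        print(f"{test} {model} {duration_ms} {duration_ms / 1000:.2f}")
```

The sort key is the raw millisecond value, so the seconds column is only a display conversion and does not affect the order.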

Summary

  • Total Tests: 12
  • Passed: 11
  • Failed: 1
  • Success Rate: 91.67%
  • Average Duration: 1959ms (1.96s)
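The summary figures can be re-derived from the twelve durations in the rankings table. The following is a small sketch of that arithmetic, not the reporter's actual code; the variable names are illustrative:

```python
# Durations in ms for all 12 runs, taken from the rankings table.
DURATIONS_MS = [709, 803, 1182, 2473, 916, 1232, 5757, 6224, 680, 893, 1031, 1612]
FAILED = 1  # the single failed run reported in this summary

total = len(DURATIONS_MS)
passed = total - FAILED
success_rate = passed / total * 100          # 11 / 12 as a percentage
average_ms = sum(DURATIONS_MS) / total       # 23512 ms / 12 runs

print(f"Total Tests: {total}")
print(f"Passed: {passed}")
print(f"Failed: {FAILED}")
print(f"Success Rate: {success_rate:.2f}%")
print(f"Average Duration: {average_ms:.0f}ms ({average_ms / 1000:.2f}s)")
```

Note that the average includes the failed run, and that the two slow multiplication runs (5757 ms and 6224 ms) account for about half of the total time.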

Failed Tests

addition - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: `The result of adding 5 and 3 is: \[ 5 + 3 = \boxed{8} \]`
  • Duration: 2473ms (2.47s)
  • Reason: Expected 8, but got `the result of adding 5 and 3 is: \[ 5 + 3 = \boxed{8} \]`
  • Timestamp: 4/6/2025, 5:30:36 PM
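The failure is a formatting failure rather than an arithmetic one: the model computed 8 but wrapped it in prose and LaTeX. The lowercased text in the Reason suggests the reply is trimmed and lowercased before an exact comparison, though that is an assumption about the test harness. The sketch below contrasts such a strict check with a hypothetical lenient one; `strict_match` and `lenient_match` are illustrative names, not functions from kbot:

```python
import re


def strict_match(actual: str, expected: str) -> bool:
    """Assumed harness behaviour: trim, lowercase, then compare exactly."""
    return actual.strip().lower() == expected.strip().lower()


def lenient_match(actual: str, expected: str) -> bool:
    """Hypothetical alternative: accept if the last number in the reply equals the expected value."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", actual)
    return bool(numbers) and numbers[-1] == expected.strip()


# The reply recorded for the failed addition test, with the LaTeX delimiters restored.
reply = "The result of adding 5 and 3 is:\n\n\\[\n5 + 3 = \\boxed{8}\n\\]"

print(strict_match(reply, "8"))   # strict check rejects the wrapped answer
print(lenient_match(reply, "8"))  # lenient check finds the trailing 8
```

The prompt asks for "only the number, no explanation", so the strict result is arguably the correct verdict here. A lenient check would hide the fact that the model ignored the output-format instruction.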

Passed Tests

addition - openai/gpt-3.5-turbo

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1182ms (1.18s)
  • Timestamp: 4/6/2025, 5:30:34 PM

addition - openai/gpt-4o-mini

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 709ms (0.71s)
  • Timestamp: 4/6/2025, 5:30:37 PM

addition - openrouter/quasar-alpha

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 803ms (0.80s)
  • Timestamp: 4/6/2025, 5:30:38 PM

multiplication - openai/gpt-3.5-turbo

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 916ms (0.92s)
  • Timestamp: 4/6/2025, 5:30:39 PM

multiplication - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 6224ms (6.22s)
  • Timestamp: 4/6/2025, 5:30:45 PM

multiplication - openai/gpt-4o-mini

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 1232ms (1.23s)
  • Timestamp: 4/6/2025, 5:30:46 PM

multiplication - openrouter/quasar-alpha

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 5757ms (5.76s)
  • Timestamp: 4/6/2025, 5:30:52 PM

division - openai/gpt-3.5-turbo

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 893ms (0.89s)
  • Timestamp: 4/6/2025, 5:30:53 PM

division - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 1612ms (1.61s)
  • Timestamp: 4/6/2025, 5:30:55 PM

division - openai/gpt-4o-mini

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 680ms (0.68s)
  • Timestamp: 4/6/2025, 5:30:55 PM

division - openrouter/quasar-alpha

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 1031ms (1.03s)
  • Timestamp: 4/6/2025, 5:30:56 PM