mono/packages/kbot/tests/unit/reports/basic.md
2025-04-06 16:22:39 +02:00

Basic Operations Test Results

Highscores

Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|------|-------|---------------|--------------|
| addition | openrouter/quasar-alpha | 729 | 0.73 |
| addition | openai/gpt-4o-mini | 927 | 0.93 |
| addition | openai/gpt-3.5-turbo | 1197 | 1.20 |
| addition | deepseek/deepseek-r1-distill-qwen-14b:free | 11570 | 11.57 |
| multiplication | openai/gpt-3.5-turbo | 863 | 0.86 |
| multiplication | openrouter/quasar-alpha | 960 | 0.96 |
| multiplication | openai/gpt-4o-mini | 1105 | 1.10 |
| multiplication | deepseek/deepseek-r1-distill-qwen-14b:free | 16310 | 16.31 |
| division | openrouter/quasar-alpha | 749 | 0.75 |
| division | openai/gpt-4o-mini | 856 | 0.86 |
| division | openai/gpt-3.5-turbo | 901 | 0.90 |
| division | deepseek/deepseek-r1-distill-qwen-14b:free | 11412 | 11.41 |
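
The rankings above are ordered by test and then by ascending duration. A minimal sketch of that ordering, assuming the rows are available as plain tuples (the `rows` list and the `by_test` grouping are illustrative names, not part of the test harness):

```python
from collections import defaultdict

# (test, model, duration_ms) rows copied from the rankings table.
rows = [
    ("addition", "openrouter/quasar-alpha", 729),
    ("addition", "openai/gpt-4o-mini", 927),
    ("addition", "openai/gpt-3.5-turbo", 1197),
    ("addition", "deepseek/deepseek-r1-distill-qwen-14b:free", 11570),
    ("multiplication", "openai/gpt-3.5-turbo", 863),
    ("multiplication", "openrouter/quasar-alpha", 960),
    ("multiplication", "openai/gpt-4o-mini", 1105),
    ("multiplication", "deepseek/deepseek-r1-distill-qwen-14b:free", 16310),
    ("division", "openrouter/quasar-alpha", 749),
    ("division", "openai/gpt-4o-mini", 856),
    ("division", "openai/gpt-3.5-turbo", 901),
    ("division", "deepseek/deepseek-r1-distill-qwen-14b:free", 11412),
]

# Group durations by test so each test gets its own ranking.
by_test = defaultdict(list)
for test, model, ms in rows:
    by_test[test].append((ms, model))

# Sort each group by duration and report the fastest model per test.
for test, entries in by_test.items():
    entries.sort()
    fastest_ms, fastest_model = entries[0]
    print(f"{test}: {fastest_model} at {fastest_ms} ms ({fastest_ms / 1000:.2f} s)")
```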

Summary

  • Total Tests: 12
  • Passed: 11
  • Failed: 1
  • Success Rate: 91.67%
  • Average Duration: 3965ms (3.96s)
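
The summary figures can be reproduced from the twelve durations in the rankings table. The following is a sketch of that arithmetic only; the variable names are illustrative and the harness may compute these values differently:

```python
# Durations in milliseconds, copied from the rankings table.
durations_ms = [
    729, 927, 1197, 11570,    # addition
    863, 960, 1105, 16310,    # multiplication
    749, 856, 901, 11412,     # division
]
failed = 1  # the single failed division test

total = len(durations_ms)
passed = total - failed
success_rate = passed / total * 100          # percentage of passing tests
average_ms = sum(durations_ms) / total       # mean duration in milliseconds

print(f"Total Tests: {total}")
print(f"Passed: {passed}")
print(f"Failed: {failed}")
print(f"Success Rate: {success_rate:.2f}%")
print(f"Average Duration: {round(average_ms)}ms ({average_ms / 1000:.2f}s)")
```

Note that the average is dominated by the three deepseek runs, which each take more than 11 seconds while every other run finishes in roughly a second or less.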

Failed Tests

division - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: `15 ÷ 3 equals 5.\n\n5`
  • Duration: 11412ms (11.41s)
  • Reason: Expected `5`, but got `15 ÷ 3 equals 5.\n\n5`
  • Timestamp: 4/6/2025, 4:21:00 PM
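
The failure message suggests the response is compared against the expected value as a whole string, so any explanation around the number fails the test even when the final answer is correct. The sketch below assumes such an exact-match check; `strict_match` and `last_number` are hypothetical helpers written for illustration, not functions from the test harness:

```python
import re
from typing import Optional


def strict_match(actual: str, expected: str) -> bool:
    """Assumed check: pass only if the trimmed response equals the expected string."""
    return actual.strip() == expected


def last_number(actual: str) -> Optional[str]:
    """Hypothetical lenient alternative: return the last number found in the response."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", actual)
    return matches[-1] if matches else None


# The response recorded for the failed division test.
response = "15 ÷ 3 equals 5.\n\n5"

print(strict_match(response, "5"))   # exact match rejects the extra explanation
print(last_number(response) == "5")  # lenient extraction would accept the answer
```

Whether a lenient check is desirable depends on what the test is meant to measure: the prompt asks for the number only, so the strict comparison also verifies that the model follows the output-format instruction.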

Passed Tests

addition - openai/gpt-3.5-turbo

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1197ms (1.20s)
  • Timestamp: 4/6/2025, 4:20:15 PM

addition - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 11570ms (11.57s)
  • Timestamp: 4/6/2025, 4:20:27 PM

addition - openai/gpt-4o-mini

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 927ms (0.93s)
  • Timestamp: 4/6/2025, 4:20:28 PM

addition - openrouter/quasar-alpha

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 729ms (0.73s)
  • Timestamp: 4/6/2025, 4:20:28 PM

multiplication - openai/gpt-3.5-turbo

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 863ms (0.86s)
  • Timestamp: 4/6/2025, 4:20:29 PM

multiplication - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 16310ms (16.31s)
  • Timestamp: 4/6/2025, 4:20:45 PM

multiplication - openai/gpt-4o-mini

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 1105ms (1.10s)
  • Timestamp: 4/6/2025, 4:20:47 PM

multiplication - openrouter/quasar-alpha

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 960ms (0.96s)
  • Timestamp: 4/6/2025, 4:20:48 PM

division - openai/gpt-3.5-turbo

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 901ms (0.90s)
  • Timestamp: 4/6/2025, 4:20:48 PM

division - openai/gpt-4o-mini

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 856ms (0.86s)
  • Timestamp: 4/6/2025, 4:21:01 PM

division - openrouter/quasar-alpha

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 749ms (0.75s)
  • Timestamp: 4/6/2025, 4:21:02 PM