mono/packages/kbot/tests/unit/reports/basic.md
2025-04-04 14:44:04 +02:00

Basic Operations Test Results

Highscores

Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|------|-------|---------------|--------------|
| addition | openrouter/quasar-alpha | 811 | 0.81 |
| addition | openai/gpt-4o-mini | 842 | 0.84 |
| addition | openai/gpt-3.5-turbo | 1505 | 1.50 |
| addition | deepseek/deepseek-r1-distill-qwen-14b:free | 3470 | 3.47 |
| multiplication | openrouter/quasar-alpha | 780 | 0.78 |
| multiplication | openai/gpt-3.5-turbo | 881 | 0.88 |
| multiplication | openai/gpt-4o-mini | 1096 | 1.10 |
| multiplication | deepseek/deepseek-r1-distill-qwen-14b:free | 1327 | 1.33 |
| division | openrouter/quasar-alpha | 731 | 0.73 |
| division | openai/gpt-3.5-turbo | 784 | 0.78 |
| division | openai/gpt-4o-mini | 975 | 0.97 |
| division | deepseek/deepseek-r1-distill-qwen-14b:free | 4467 | 4.47 |
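
The ranking above groups results by test and orders each group from fastest to slowest. A minimal sketch of that ordering follows, using the durations reported in the table. The names `RESULTS` and `rank_by_duration` are illustrative assumptions, not part of the test suite that produced this report.

```python
from collections import defaultdict

# (test, model, duration_ms) rows copied from the rankings table.
RESULTS = [
    ("addition", "openai/gpt-3.5-turbo", 1505),
    ("addition", "deepseek/deepseek-r1-distill-qwen-14b:free", 3470),
    ("addition", "openai/gpt-4o-mini", 842),
    ("addition", "openrouter/quasar-alpha", 811),
    ("multiplication", "openai/gpt-3.5-turbo", 881),
    ("multiplication", "deepseek/deepseek-r1-distill-qwen-14b:free", 1327),
    ("multiplication", "openai/gpt-4o-mini", 1096),
    ("multiplication", "openrouter/quasar-alpha", 780),
    ("division", "openai/gpt-3.5-turbo", 784),
    ("division", "deepseek/deepseek-r1-distill-qwen-14b:free", 4467),
    ("division", "openai/gpt-4o-mini", 975),
    ("division", "openrouter/quasar-alpha", 731),
]


def rank_by_duration(results):
    """Group rows by test name and sort each group from fastest to slowest."""
    grouped = defaultdict(list)
    for test, model, duration_ms in results:
        grouped[test].append((model, duration_ms))
    # Hypothetical helper: sorts on the millisecond duration only.
    return {test: sorted(rows, key=lambda row: row[1]) for test, rows in grouped.items()}


rankings = rank_by_duration(RESULTS)
for test, rows in rankings.items():
    for model, duration_ms in rows:
        print(f"{test:<15} {model:<45} {duration_ms:>5} ms")
```

Note that the ranking uses duration only, so a fast but failed run (such as the empty multiplication response) still appears in the list.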

Summary

  • Total Tests: 12
  • Passed: 11
  • Failed: 1
  • Success Rate: 91.67%
  • Average Duration: 1472ms (1.47s)
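
The summary figures can be checked against the individual durations. The sketch below recomputes the success rate and the average duration from the twelve reported values. The variable names are illustrative, and the rounding shown is an assumption about how the report formats its numbers.

```python
# Durations in milliseconds, copied from the rankings table (all 12 runs).
durations_ms = [811, 842, 1505, 3470, 780, 881, 1096, 1327, 731, 784, 975, 4467]
failed = 1  # the single empty multiplication response

total = len(durations_ms)
passed = total - failed
success_rate = passed / total * 100       # percentage of passing runs
average_ms = sum(durations_ms) / total    # mean over passed and failed runs

print(f"Total Tests: {total}")
print(f"Passed: {passed}")
print(f"Failed: {failed}")
print(f"Success Rate: {success_rate:.2f}%")
print(f"Average Duration: {average_ms:.0f}ms ({average_ms / 1000:.2f}s)")
```

The reported average of 1472ms matches only when the failed run is included in the mean.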

Failed Tests

multiplication - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: `` (empty string)
  • Duration: 1327ms (1.33s)
  • Reason: Model returned empty response
  • Timestamp: 4/4/2025, 2:37:03 PM
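
A check that yields this kind of failure might look like the sketch below. It is an assumption about how such a comparison could work, not the suite's actual code: `evaluate` is a hypothetical helper, and only the "Model returned empty response" reason string comes from this report.

```python
def evaluate(expected: str, actual: str) -> dict:
    """Compare a model response with the expected answer (hypothetical helper)."""
    answer = actual.strip()  # ignore surrounding whitespace and newlines
    if not answer:
        # Reason text as shown in the failed-test entry of this report.
        return {"passed": False, "reason": "Model returned empty response"}
    if answer != expected:
        # Assumed wording for a mismatch; the report contains no such case.
        return {"passed": False, "reason": f"Expected {expected}, got {answer}"}
    return {"passed": True, "reason": None}


print(evaluate("24", ""))    # the failed multiplication case
print(evaluate("24", "24"))  # a passing case
```

Checking for an empty response before comparing values keeps "the model said nothing" distinct from "the model answered incorrectly", which is useful when diagnosing free-tier or rate-limited endpoints.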

Passed Tests

addition - openai/gpt-3.5-turbo

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1505ms (1.50s)
  • Timestamp: 4/4/2025, 2:36:55 PM

addition - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 3470ms (3.47s)
  • Timestamp: 4/4/2025, 2:36:59 PM

addition - openai/gpt-4o-mini

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 842ms (0.84s)
  • Timestamp: 4/4/2025, 2:37:00 PM

addition - openrouter/quasar-alpha

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 811ms (0.81s)
  • Timestamp: 4/4/2025, 2:37:00 PM

multiplication - openai/gpt-3.5-turbo

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 881ms (0.88s)
  • Timestamp: 4/4/2025, 2:37:01 PM

multiplication - openai/gpt-4o-mini

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 1096ms (1.10s)
  • Timestamp: 4/4/2025, 2:37:04 PM

multiplication - openrouter/quasar-alpha

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 780ms (0.78s)
  • Timestamp: 4/4/2025, 2:37:05 PM

division - openai/gpt-3.5-turbo

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 784ms (0.78s)
  • Timestamp: 4/4/2025, 2:37:05 PM

division - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 4467ms (4.47s)
  • Timestamp: 4/4/2025, 2:37:10 PM

division - openai/gpt-4o-mini

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 975ms (0.97s)
  • Timestamp: 4/4/2025, 2:37:11 PM

division - openrouter/quasar-alpha

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 731ms (0.73s)
  • Timestamp: 4/4/2025, 2:37:11 PM