mono/packages/kbot/tests/unit/reports/basic.md
2025-04-06 17:49:29 +02:00

Basic Operations Test Results

High Scores

Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| addition | openai/gpt-4o-mini | 726 | 0.73 |
| addition | openrouter/quasar-alpha | 814 | 0.81 |
| addition | openai/gpt-3.5-turbo | 1157 | 1.16 |
| addition | deepseek/deepseek-r1-distill-qwen-14b:free | 4214 | 4.21 |
| multiplication | openrouter/quasar-alpha | 684 | 0.68 |
| multiplication | openai/gpt-3.5-turbo | 826 | 0.83 |
| multiplication | openai/gpt-4o-mini | 856 | 0.86 |
| multiplication | deepseek/deepseek-r1-distill-qwen-14b:free | 6184 | 6.18 |
| division | openai/gpt-3.5-turbo | 790 | 0.79 |
| division | openai/gpt-4o-mini | 823 | 0.82 |
| division | openrouter/quasar-alpha | 855 | 0.85 |
| division | deepseek/deepseek-r1-distill-qwen-14b:free | 1502 | 1.50 |
| web_content | deepseek/deepseek-r1-distill-qwen-14b:free | 263 | 0.26 |
| web_content | openai/gpt-3.5-turbo | 3311 | 3.31 |
| web_content | openrouter/quasar-alpha | 8305 | 8.30 |
| web_content | openai/gpt-4o-mini | 10048 | 10.05 |
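
The ranking above groups runs by test and orders each group from fastest to slowest. A minimal sketch of that ordering, assuming each result is a `(test, model, duration_ms)` tuple; the function name `rank_by_duration` is hypothetical and not taken from the kbot test harness:

```python
from collections import defaultdict


def rank_by_duration(results):
    """Group (test, model, duration_ms) rows by test and sort each group, fastest first."""
    grouped = defaultdict(list)
    for test, model, duration_ms in results:
        grouped[test].append((model, duration_ms))
    return {test: sorted(rows, key=lambda row: row[1]) for test, rows in grouped.items()}


# Addition runs in execution order, as listed under Passed Tests.
runs = [
    ("addition", "openai/gpt-3.5-turbo", 1157),
    ("addition", "deepseek/deepseek-r1-distill-qwen-14b:free", 4214),
    ("addition", "openai/gpt-4o-mini", 726),
    ("addition", "openrouter/quasar-alpha", 814),
]

ranking = rank_by_duration(runs)
for model, duration_ms in ranking["addition"]:
    # Seconds are the millisecond value divided by 1000, shown to two decimals.
    print(f"addition  {model}  {duration_ms}  {duration_ms / 1000:.2f}")
```

The ranking sorts on duration alone, so a run that fails quickly (such as the 263 ms web_content run) still ranks first in its group.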

Summary

  • Total Tests: 16
  • Passed: 12
  • Failed: 4
  • Success Rate: 75.00%
  • Average Duration: 2585ms (2.58s)
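
The summary figures can be re-derived from the per-run durations. The sketch below is only a cross-check of the arithmetic, not the harness's own code; the variable names are illustrative, and it assumes the average covers all 16 runs, including the failed ones:

```python
# Durations in milliseconds, copied from the rankings table.
durations_ms = {
    "addition": [726, 814, 1157, 4214],
    "multiplication": [684, 826, 856, 6184],
    "division": [790, 823, 855, 1502],
    "web_content": [263, 3311, 8305, 10048],
}
# Per the Failed Tests section, all four web_content runs failed.
failed_tests = {"web_content"}

all_durations = [d for runs in durations_ms.values() for d in runs]
total = len(all_durations)
failed = sum(len(runs) for test, runs in durations_ms.items() if test in failed_tests)
passed = total - failed
success_rate = 100 * passed / total
average_ms = sum(all_durations) / total  # 41358 / 16

print(f"Total Tests: {total}")
print(f"Passed: {passed}")
print(f"Failed: {failed}")
print(f"Success Rate: {success_rate:.2f}%")
print(f"Average Duration: {average_ms:.0f}ms ({average_ms / 1000:.2f}s)")
```

Because the four web_content runs account for 21927 ms of the 41358 ms total, they dominate the average; the twelve arithmetic runs alone average about 1619 ms.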

Failed Tests

web_content - openai/gpt-3.5-turbo

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: (empty)
  • Duration: 3311ms (3.31s)
  • Reason: Model returned empty response
  • Timestamp: 4/6/2025, 5:42:26 PM

web_content - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: (empty)
  • Duration: 263ms (0.26s)
  • Reason: Model returned empty response
  • Timestamp: 4/6/2025, 5:42:26 PM

web_content - openai/gpt-4o-mini

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: (empty)
  • Duration: 10048ms (10.05s)
  • Reason: Model returned empty response
  • Timestamp: 4/6/2025, 5:42:36 PM

web_content - openrouter/quasar-alpha

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: (empty)
  • Duration: 8305ms (8.30s)
  • Reason: Model returned empty response
  • Timestamp: 4/6/2025, 5:42:44 PM

Passed Tests

addition - openai/gpt-3.5-turbo

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1157ms (1.16s)
  • Timestamp: 4/6/2025, 5:42:04 PM

addition - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 4214ms (4.21s)
  • Timestamp: 4/6/2025, 5:42:08 PM

addition - openai/gpt-4o-mini

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 726ms (0.73s)
  • Timestamp: 4/6/2025, 5:42:09 PM

addition - openrouter/quasar-alpha

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 814ms (0.81s)
  • Timestamp: 4/6/2025, 5:42:10 PM

multiplication - openai/gpt-3.5-turbo

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 826ms (0.83s)
  • Timestamp: 4/6/2025, 5:42:11 PM

multiplication - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 6184ms (6.18s)
  • Timestamp: 4/6/2025, 5:42:17 PM

multiplication - openai/gpt-4o-mini

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 856ms (0.86s)
  • Timestamp: 4/6/2025, 5:42:18 PM

multiplication - openrouter/quasar-alpha

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 684ms (0.68s)
  • Timestamp: 4/6/2025, 5:42:18 PM

division - openai/gpt-3.5-turbo

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 790ms (0.79s)
  • Timestamp: 4/6/2025, 5:42:19 PM

division - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 1502ms (1.50s)
  • Timestamp: 4/6/2025, 5:42:21 PM

division - openai/gpt-4o-mini

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 823ms (0.82s)
  • Timestamp: 4/6/2025, 5:42:22 PM

division - openrouter/quasar-alpha

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 855ms (0.85s)
  • Timestamp: 4/6/2025, 5:42:22 PM