Basic Operations Test Results

Highscores

Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
| --- | --- | --- | --- |
| addition | openai/gpt-4o-mini | 893 | 0.89 |
| addition | deepseek/deepseek-r1-distill-qwen-14b:free | 1657 | 1.66 |
| addition | openai/gpt-3.5-turbo | 1930 | 1.93 |
| addition | openrouter/quasar-alpha | 2215 | 2.21 |
| multiplication | openai/gpt-3.5-turbo | 868 | 0.87 |
| multiplication | openrouter/quasar-alpha | 930 | 0.93 |
| multiplication | openai/gpt-4o-mini | 967 | 0.97 |
| multiplication | deepseek/deepseek-r1-distill-qwen-14b:free | 1139 | 1.14 |
| division | openai/gpt-4o-mini | 752 | 0.75 |
| division | openai/gpt-3.5-turbo | 913 | 0.91 |
| division | openrouter/quasar-alpha | 1182 | 1.18 |
| division | deepseek/deepseek-r1-distill-qwen-14b:free | 1626 | 1.63 |
| web_content | deepseek/deepseek-r1-distill-qwen-14b:free | 261 | 0.26 |
| web_content | openai/gpt-3.5-turbo | 3352 | 3.35 |
| web_content | openai/gpt-4o-mini | 8906 | 8.91 |
| web_content | openrouter/quasar-alpha | 11391 | 11.39 |
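
The rankings above list each test's runs from fastest to slowest. A minimal sketch of how such an ordering could be produced is shown below; the function name `rank_by_duration` and the tuple layout are illustrative assumptions, not the report generator's actual code, and only a few rows from the table are used as input.

```python
# Hypothetical helper: group result rows by test and sort each group by duration.
# Rows are (test, model, duration_ms); values are copied from the rankings table.
results = [
    ("addition", "openai/gpt-3.5-turbo", 1930),
    ("addition", "openai/gpt-4o-mini", 893),
    ("division", "openrouter/quasar-alpha", 1182),
    ("division", "openai/gpt-4o-mini", 752),
]


def rank_by_duration(rows):
    """Return {test: [(model, duration_ms), ...]} with the fastest run first."""
    ranked = {}
    for test, model, duration_ms in rows:
        ranked.setdefault(test, []).append((model, duration_ms))
    for runs in ranked.values():
        runs.sort(key=lambda run: run[1])  # ascending duration
    return ranked


rankings = rank_by_duration(results)
```

Note that a ranking built this way orders runs by duration only, so a fast run that failed (such as the 261 ms `web_content` run) still sorts first.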

Summary

  • Total Tests: 16
  • Passed: 12
  • Failed: 4
  • Success Rate: 75.00%
  • Average Duration: 2436ms (2.44s)
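
The summary figures can be re-derived from the durations in the rankings table. The sketch below is one way to check them; the `summarize` helper and the data layout are assumptions made for illustration, and the set of failed tests is taken from the Failed Tests section of this report.

```python
# Durations (ms) copied from the Performance Rankings table, grouped by test.
durations_ms = {
    "addition": [893, 1657, 1930, 2215],
    "multiplication": [868, 930, 967, 1139],
    "division": [752, 913, 1182, 1626],
    "web_content": [261, 3352, 8906, 11391],
}
# In this report every web_content run failed with an empty response.
failed_tests = {"web_content"}


def summarize(durations, failed):
    """Hypothetical helper: recompute the report's summary numbers."""
    total = sum(len(runs) for runs in durations.values())
    n_failed = sum(len(runs) for test, runs in durations.items() if test in failed)
    passed = total - n_failed
    average_ms = sum(sum(runs) for runs in durations.values()) / total
    return {
        "total": total,
        "passed": passed,
        "failed": n_failed,
        "success_rate": 100 * passed / total,
        "average_ms": round(average_ms),
    }


summary = summarize(durations_ms, failed_tests)
```

With these inputs the durations sum to 38982 ms over 16 runs, which gives the 2436 ms average and the 75% success rate reported above.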

Failed Tests

web_content - openai/gpt-3.5-turbo

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: ``
  • Duration: 3352ms (3.35s)
  • Reason: Model returned empty response
  • Timestamp: 4/7/2025, 7:03:50 PM

web_content - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: ``
  • Duration: 261ms (0.26s)
  • Reason: Model returned empty response
  • Timestamp: 4/7/2025, 7:03:50 PM

web_content - openai/gpt-4o-mini

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: ``
  • Duration: 8906ms (8.91s)
  • Reason: Model returned empty response
  • Timestamp: 4/7/2025, 7:03:59 PM

web_content - openrouter/quasar-alpha

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: ``
  • Duration: 11391ms (11.39s)
  • Reason: Model returned empty response
  • Timestamp: 4/7/2025, 7:04:11 PM
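
All four failures share the reason "Model returned empty response". A check along the following lines would produce that classification; this is a sketch only, the `evaluate` function is hypothetical, and the mismatch message in the second branch is an assumption that does not appear anywhere in this report.

```python
def evaluate(expected, actual):
    """Hypothetical pass/fail check; returns (passed, reason)."""
    answer = actual.strip()
    if not answer:
        # Matches the reason string recorded for the failed web_content runs.
        return False, "Model returned empty response"
    if answer.lower() != expected.strip().lower():
        # Assumed wording; no mismatch failures occur in this report.
        return False, f"Expected {expected!r}, got {answer!r}"
    return True, None


empty_case = evaluate("yes", "")
passing_case = evaluate("8", "8")
```

Because the empty-response branch runs first, a blank reply is reported as such rather than as a mismatch against the expected "yes".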

Passed Tests

addition - openai/gpt-3.5-turbo

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1930ms (1.93s)
  • Timestamp: 4/7/2025, 7:03:33 PM

addition - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1657ms (1.66s)
  • Timestamp: 4/7/2025, 7:03:35 PM

addition - openai/gpt-4o-mini

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 893ms (0.89s)
  • Timestamp: 4/7/2025, 7:03:36 PM

addition - openrouter/quasar-alpha

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 2215ms (2.21s)
  • Timestamp: 4/7/2025, 7:03:38 PM

multiplication - openai/gpt-3.5-turbo

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 868ms (0.87s)
  • Timestamp: 4/7/2025, 7:03:39 PM

multiplication - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 1139ms (1.14s)
  • Timestamp: 4/7/2025, 7:03:40 PM

multiplication - openai/gpt-4o-mini

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 967ms (0.97s)
  • Timestamp: 4/7/2025, 7:03:41 PM

multiplication - openrouter/quasar-alpha

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 930ms (0.93s)
  • Timestamp: 4/7/2025, 7:03:42 PM

division - openai/gpt-3.5-turbo

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 913ms (0.91s)
  • Timestamp: 4/7/2025, 7:03:43 PM

division - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 1626ms (1.63s)
  • Timestamp: 4/7/2025, 7:03:45 PM

division - openai/gpt-4o-mini

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 752ms (0.75s)
  • Timestamp: 4/7/2025, 7:03:45 PM

division - openrouter/quasar-alpha

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 1182ms (1.18s)
  • Timestamp: 4/7/2025, 7:03:47 PM