Basic Operations Test Results

Highscores

Performance Rankings (Duration)

| Test           | Model                | Duration (ms) | Duration (s) |
|----------------|----------------------|---------------|--------------|
| addition       | openai/gpt-4o-mini   | 514           | 0.51         |
| addition       | openai/gpt-3.5-turbo | 771           | 0.77         |
| multiplication | openai/gpt-3.5-turbo | 624           | 0.62         |
| multiplication | openai/gpt-4o-mini   | 721           | 0.72         |
| division       | openai/gpt-3.5-turbo | 513           | 0.51         |
| division       | openai/gpt-4o-mini   | 895           | 0.90         |
| web_content    | openai/gpt-3.5-turbo | 220           | 0.22         |
| web_content    | openai/gpt-4o-mini   | 4358          | 4.36         |

Summary

  • Total Tests: 8
  • Passed: 7
  • Failed: 1
  • Success Rate: 87.50%
  • Average Duration: 1077ms (1.08s)
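
The summary figures can be reproduced from the rankings table. The following is a minimal sketch, not the report generator itself: the `results` tuple layout and the `summarize` helper are hypothetical names introduced for illustration, and the durations and outcomes are copied from this report.

```python
from statistics import mean

# (test, model, duration_ms, passed) copied from this report.
# The tuple layout is an assumption made for this sketch.
results = [
    ("addition", "openai/gpt-4o-mini", 514, True),
    ("addition", "openai/gpt-3.5-turbo", 771, True),
    ("multiplication", "openai/gpt-3.5-turbo", 624, True),
    ("multiplication", "openai/gpt-4o-mini", 721, True),
    ("division", "openai/gpt-3.5-turbo", 513, True),
    ("division", "openai/gpt-4o-mini", 895, True),
    ("web_content", "openai/gpt-3.5-turbo", 220, False),
    ("web_content", "openai/gpt-4o-mini", 4358, True),
]


def summarize(rows):
    """Aggregate totals, success rate (percent), and mean duration (ms)."""
    total = len(rows)
    passed = sum(1 for row in rows if row[3])
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": round(100 * passed / total, 2),
        "average_ms": round(mean(row[2] for row in rows)),
    }


summary = summarize(results)
print(summary)
```

Note that the average is dominated by the single 4358 ms web_content run; the failed 220 ms run is also included in the mean, since it returned quickly with an empty response.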

Failed Tests

web_content - openai/gpt-3.5-turbo

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: (empty)
  • Duration: 220ms (0.22s)
  • Reason: Model returned empty response
  • Timestamp: 6/5/2025, 8:46:11 PM

Passed Tests

addition - openai/gpt-3.5-turbo

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 771ms (0.77s)
  • Timestamp: 6/5/2025, 8:46:08 PM

addition - openai/gpt-4o-mini

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 514ms (0.51s)
  • Timestamp: 6/5/2025, 8:46:08 PM

multiplication - openai/gpt-3.5-turbo

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 624ms (0.62s)
  • Timestamp: 6/5/2025, 8:46:09 PM

multiplication - openai/gpt-4o-mini

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 721ms (0.72s)
  • Timestamp: 6/5/2025, 8:46:09 PM

division - openai/gpt-3.5-turbo

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 513ms (0.51s)
  • Timestamp: 6/5/2025, 8:46:10 PM

division - openai/gpt-4o-mini

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 895ms (0.90s)
  • Timestamp: 6/5/2025, 8:46:11 PM

web_content - openai/gpt-4o-mini

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: Yes
  • Duration: 4358ms (4.36s)
  • Timestamp: 6/5/2025, 8:46:15 PM
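
The outcomes above suggest how answers are judged: an actual value of "Yes" passes against an expected "yes", while an empty response fails with a dedicated reason. The report does not show the comparison logic, so the sketch below is only one plausible reading, assuming a trimmed, case-insensitive match; the `evaluate` function is a hypothetical name, and the mismatch message format is invented for illustration.

```python
from typing import Optional, Tuple


def evaluate(expected: str, actual: str) -> Tuple[bool, Optional[str]]:
    """Return (passed, reason). Assumed rule: trim, then compare case-insensitively."""
    normalized = actual.strip()
    if not normalized:
        # Matches the reason string shown for the failed web_content test.
        return False, "Model returned empty response"
    if normalized.lower() == expected.strip().lower():
        return True, None
    # Hypothetical message format; the report contains no mismatch example.
    return False, f"Expected {expected!r}, got {actual!r}"


print(evaluate("yes", "Yes"))
print(evaluate("yes", ""))
print(evaluate("8", "8"))
```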