Basic Operations Test Results

Highscores

Performance Rankings (Duration)

| Test           | Model                | Duration (ms) | Duration (s) |
|----------------|----------------------|--------------:|-------------:|
| addition       | openai/gpt-4o-mini   |           657 |         0.66 |
| addition       | openai/gpt-3.5-turbo |           783 |         0.78 |
| multiplication | openai/gpt-3.5-turbo |           566 |         0.57 |
| multiplication | openai/gpt-4o-mini   |           670 |         0.67 |
| division       | openai/gpt-4o-mini   |           609 |         0.61 |
| division       | openai/gpt-3.5-turbo |          2385 |         2.38 |
| web_content    | openai/gpt-3.5-turbo |           290 |         0.29 |
| web_content    | openai/gpt-4o-mini   |          7277 |         7.28 |
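
The ranking above groups results by test and orders each group from fastest to slowest. A minimal sketch of how such a ranking could be derived, assuming the raw results are available as `(test, model, duration_ms)` tuples. The `rank_by_duration` helper and the data layout are illustrative assumptions, not taken from the kbot test suite:

```python
from collections import defaultdict

# Durations copied from the ranking table: (test, model, duration in ms).
RESULTS = [
    ("addition", "openai/gpt-3.5-turbo", 783),
    ("addition", "openai/gpt-4o-mini", 657),
    ("multiplication", "openai/gpt-3.5-turbo", 566),
    ("multiplication", "openai/gpt-4o-mini", 670),
    ("division", "openai/gpt-3.5-turbo", 2385),
    ("division", "openai/gpt-4o-mini", 609),
    ("web_content", "openai/gpt-3.5-turbo", 290),
    ("web_content", "openai/gpt-4o-mini", 7277),
]


def rank_by_duration(results):
    """Group results by test name and sort each group fastest-first.

    Hypothetical helper: the real report generator may rank differently.
    """
    groups = defaultdict(list)
    for test, model, duration_ms in results:
        groups[test].append((model, duration_ms))
    # Dicts preserve insertion order, so tests keep the order they first appear in.
    return {test: sorted(rows, key=lambda row: row[1]) for test, rows in groups.items()}


rankings = rank_by_duration(RESULTS)
for test, rows in rankings.items():
    for model, ms in rows:
        print(f"{test:<15} {model:<21} {ms:>5} ms")
```

Ranking on duration alone places the failed `web_content` run (290 ms, empty response) ahead of the passing one, so a stricter ranking might filter to passing runs first.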

Summary

  • Total Tests: 8
  • Passed: 7
  • Failed: 1
  • Success Rate: 87.50%
  • Average Duration: 1655ms (1.65s)
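
The summary figures can be reproduced from the individual results in this report. The sketch below assumes each result is a `(test, model, passed, duration_ms)` tuple; the `summarize` function and its output keys are illustrative names, not the report generator's actual code:

```python
# Outcome and duration (ms) for each run, as listed in this report.
RESULTS = [
    ("addition", "openai/gpt-3.5-turbo", True, 783),
    ("addition", "openai/gpt-4o-mini", True, 657),
    ("multiplication", "openai/gpt-3.5-turbo", True, 566),
    ("multiplication", "openai/gpt-4o-mini", True, 670),
    ("division", "openai/gpt-3.5-turbo", True, 2385),
    ("division", "openai/gpt-4o-mini", True, 609),
    ("web_content", "openai/gpt-3.5-turbo", False, 290),
    ("web_content", "openai/gpt-4o-mini", True, 7277),
]


def summarize(results):
    """Compute totals, success rate, and mean duration (hypothetical helper)."""
    total = len(results)
    passed = sum(1 for _, _, ok, _ in results if ok)
    # Mean over all runs, including the failed one: 13237 ms / 8 = 1654.625 ms.
    avg_ms = sum(ms for *_, ms in results) / total
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": f"{100 * passed / total:.2f}%",
        "average": f"{round(avg_ms)}ms ({avg_ms / 1000:.2f}s)",
    }


summary = summarize(RESULTS)
print(summary)
```

Because the average includes the 7277 ms `web_content` run, it sits well above the typical duration: six of the eight runs finished in under 800 ms.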

Failed Tests

web_content - openai/gpt-3.5-turbo

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: (empty)
  • Duration: 290ms (0.29s)
  • Reason: Model returned empty response
  • Timestamp: 6/3/2025, 11:33:01 PM
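
This failure was recorded because the model returned nothing, not because it gave a wrong answer. A minimal sketch of a check that separates the two cases, assuming responses are compared after trimming whitespace and ignoring case. The `evaluate` function and the mismatch message are hypothetical; only the "Model returned empty response" reason appears in this report:

```python
from typing import Optional, Tuple


def evaluate(expected: str, actual: Optional[str]) -> Tuple[bool, Optional[str]]:
    """Return (passed, reason); reason is None when the test passes."""
    text = (actual or "").strip()
    if not text:
        # Same reason string as the failed web_content entry above.
        return False, "Model returned empty response"
    if text.lower() != expected.strip().lower():
        # Hypothetical reason string; the report shows no mismatch failures.
        return False, f"Expected {expected!r}, got {text!r}"
    return True, None


print(evaluate("yes", ""))     # empty response, as in the failed run
print(evaluate("yes", "yes"))  # matching response
```

Reporting an empty response separately from a mismatch makes it easier to tell transport or API problems apart from genuine model errors.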

Passed Tests

addition - openai/gpt-3.5-turbo

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 783ms (0.78s)
  • Timestamp: 6/3/2025, 11:32:56 PM

addition - openai/gpt-4o-mini

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 657ms (0.66s)
  • Timestamp: 6/3/2025, 11:32:57 PM

multiplication - openai/gpt-3.5-turbo

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 566ms (0.57s)
  • Timestamp: 6/3/2025, 11:32:57 PM

multiplication - openai/gpt-4o-mini

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 670ms (0.67s)
  • Timestamp: 6/3/2025, 11:32:58 PM

division - openai/gpt-3.5-turbo

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 2385ms (2.38s)
  • Timestamp: 6/3/2025, 11:33:00 PM

division - openai/gpt-4o-mini

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 609ms (0.61s)
  • Timestamp: 6/3/2025, 11:33:01 PM

web_content - openai/gpt-4o-mini

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: yes
  • Duration: 7277ms (7.28s)
  • Timestamp: 6/3/2025, 11:33:09 PM