mono/packages/kbot/tests/unit/reports/basic.md
2025-04-18 10:03:03 +02:00

Basic Operations Test Results

Highscores

Performance Rankings (Duration)

Test            Model                   Duration (ms)   Duration (s)
addition        openai/gpt-4o-mini               1162           1.16
addition        openai/gpt-3.5-turbo             2646           2.65
multiplication  openai/gpt-4o-mini                666           0.67
multiplication  openai/gpt-3.5-turbo              958           0.96
division        openai/gpt-4o-mini                905           0.91
division        openai/gpt-3.5-turbo             1096           1.10
web_content     openai/gpt-3.5-turbo             3306           3.31
web_content     openai/gpt-4o-mini               7600           7.60
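
The ranking above orders the models within each test by duration, fastest first. A minimal sketch of how such a ranking could be derived is shown here; the function name `rank_by_duration` and the tuple layout are assumptions for illustration, not the code that produced this report.

```python
from collections import defaultdict

# (test, model, duration in ms), copied from the ranking table above.
rows = [
    ("addition", "openai/gpt-4o-mini", 1162),
    ("addition", "openai/gpt-3.5-turbo", 2646),
    ("multiplication", "openai/gpt-4o-mini", 666),
    ("multiplication", "openai/gpt-3.5-turbo", 958),
    ("division", "openai/gpt-4o-mini", 905),
    ("division", "openai/gpt-3.5-turbo", 1096),
    ("web_content", "openai/gpt-3.5-turbo", 3306),
    ("web_content", "openai/gpt-4o-mini", 7600),
]


def rank_by_duration(rows):
    """Group rows by test name and sort each group by duration, fastest first."""
    grouped = defaultdict(list)
    for test, model, duration_ms in rows:
        grouped[test].append((model, duration_ms))
    # dicts keep insertion order, so tests appear in first-seen order.
    return {
        test: sorted(entries, key=lambda entry: entry[1])
        for test, entries in grouped.items()
    }


rankings = rank_by_duration(rows)
for test, entries in rankings.items():
    for position, (model, duration_ms) in enumerate(entries, start=1):
        print(f"{test} #{position}: {model} ({duration_ms}ms)")
```

A ranking of this kind looks only at duration and ignores correctness: the fastest web_content run (openai/gpt-3.5-turbo, 3306ms) is the one listed under Failed Tests.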

Summary

  • Total Tests: 8
  • Passed: 7
  • Failed: 1
  • Success Rate: 87.50%
  • Average Duration: 2292ms (2.29s)
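
The summary figures can be checked against the individual results. The sketch below recomputes them from the durations and outcomes listed in this report; the `summarize` helper and the dictionary keys are hypothetical names chosen for the example.

```python
# (test, model, duration in ms, passed), copied from this report.
results = [
    ("addition", "openai/gpt-3.5-turbo", 2646, True),
    ("addition", "openai/gpt-4o-mini", 1162, True),
    ("multiplication", "openai/gpt-3.5-turbo", 958, True),
    ("multiplication", "openai/gpt-4o-mini", 666, True),
    ("division", "openai/gpt-3.5-turbo", 1096, True),
    ("division", "openai/gpt-4o-mini", 905, True),
    ("web_content", "openai/gpt-3.5-turbo", 3306, False),
    ("web_content", "openai/gpt-4o-mini", 7600, True),
]


def summarize(results):
    """Return counts, success rate (percent) and mean duration (ms)."""
    if not results:
        raise ValueError("no results to summarize")
    total = len(results)
    passed = sum(1 for _, _, _, ok in results if ok)
    total_ms = sum(ms for _, _, ms, _ in results)
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": 100.0 * passed / total,
        "average_ms": total_ms / total,
    }


summary = summarize(results)
print(f"Total Tests: {summary['total']}")
print(f"Passed: {summary['passed']}")
print(f"Failed: {summary['failed']}")
print(f"Success Rate: {summary['success_rate']:.2f}%")
print(
    f"Average Duration: {summary['average_ms']:.0f}ms "
    f"({summary['average_ms'] / 1000:.2f}s)"
)
```

The eight durations sum to 18339ms, so the mean is 2292.375ms, which matches the reported 2292ms (2.29s), and 7 of 8 passing gives 87.50%.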

Failed Tests

web_content - openai/gpt-3.5-turbo

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: (empty string)
  • Duration: 3306ms (3.31s)
  • Reason: Model returned empty response
  • Timestamp: 4/18/2025, 8:48:00 AM
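
The failure above is an empty response, not a wrong answer. A minimal sketch of a check that separates the two cases might look like the following. The `evaluate` function, the whitespace trimming, the case-insensitive comparison, and the mismatch message are all assumptions made for illustration; the report does not say how the test suite actually compares answers. Only the empty-response reason string is taken from the report.

```python
from typing import Optional, Tuple


def evaluate(expected: str, actual: str) -> Tuple[bool, Optional[str]]:
    """Return (passed, reason); reason is None when the test passes."""
    normalized = actual.strip()
    if not normalized:
        # Reason text as it appears in the report's failed-test entry.
        return False, "Model returned empty response"
    # Assumption: ignore surrounding whitespace and letter case.
    if normalized.lower() == expected.strip().lower():
        return True, None
    # Hypothetical wording for a non-empty but wrong answer.
    return False, f"Expected {expected!r}, got {actual!r}"


print(evaluate("yes", ""))     # empty response from the failed run
print(evaluate("yes", "yes"))  # matching response
print(evaluate("8", "9"))      # illustrative mismatch, not from the report
```

Reporting the empty case separately makes it easier to tell a transport or model-output problem apart from a wrong answer when reading the results.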

Passed Tests

addition - openai/gpt-3.5-turbo

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 2646ms (2.65s)
  • Timestamp: 4/18/2025, 8:47:52 AM

addition - openai/gpt-4o-mini

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1162ms (1.16s)
  • Timestamp: 4/18/2025, 8:47:53 AM

multiplication - openai/gpt-3.5-turbo

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 958ms (0.96s)
  • Timestamp: 4/18/2025, 8:47:54 AM

multiplication - openai/gpt-4o-mini

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 666ms (0.67s)
  • Timestamp: 4/18/2025, 8:47:55 AM

division - openai/gpt-3.5-turbo

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 1096ms (1.10s)
  • Timestamp: 4/18/2025, 8:47:56 AM

division - openai/gpt-4o-mini

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 905ms (0.91s)
  • Timestamp: 4/18/2025, 8:47:57 AM

web_content - openai/gpt-4o-mini

  • Prompt: Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.
  • Expected: yes
  • Actual: yes
  • Duration: 7600ms (7.60s)
  • Timestamp: 4/18/2025, 8:48:08 AM