Basic Operations Test Results

Highscores

Performance Rankings (Duration)

| Test           | Model                                      | Duration (ms) | Duration (s) |
|----------------|--------------------------------------------|--------------:|-------------:|
| addition       | openai/gpt-4o-mini                         |           910 |         0.91 |
| addition       | openai/gpt-3.5-turbo                       |          1484 |         1.48 |
| addition       | deepseek/deepseek-r1-distill-qwen-14b:free |          8460 |         8.46 |
| multiplication | openai/gpt-3.5-turbo                       |           955 |         0.95 |
| multiplication | openai/gpt-4o-mini                         |          1095 |         1.09 |
| multiplication | deepseek/deepseek-r1-distill-qwen-14b:free |          7653 |         7.65 |
| division       | openai/gpt-3.5-turbo                       |           816 |         0.82 |
| division       | openai/gpt-4o-mini                         |           954 |         0.95 |
| division       | deepseek/deepseek-r1-distill-qwen-14b:free |         16655 |        16.66 |
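
The ranking above groups runs by test and orders each group fastest-first. A minimal sketch of that ordering, assuming the results are available as `(test, model, duration_ms)` tuples; the names `RESULTS` and `rank_by_duration` are hypothetical and not taken from the kbot test harness:

```python
from collections import defaultdict

# (test, model, duration in ms), copied from the ranking table in this report
RESULTS = [
    ("addition", "openai/gpt-3.5-turbo", 1484),
    ("addition", "deepseek/deepseek-r1-distill-qwen-14b:free", 8460),
    ("addition", "openai/gpt-4o-mini", 910),
    ("multiplication", "openai/gpt-3.5-turbo", 955),
    ("multiplication", "deepseek/deepseek-r1-distill-qwen-14b:free", 7653),
    ("multiplication", "openai/gpt-4o-mini", 1095),
    ("division", "openai/gpt-3.5-turbo", 816),
    ("division", "deepseek/deepseek-r1-distill-qwen-14b:free", 16655),
    ("division", "openai/gpt-4o-mini", 954),
]


def rank_by_duration(results):
    """Group results by test name and sort each group by ascending duration."""
    groups = defaultdict(list)
    for test, model, ms in results:
        groups[test].append((model, ms))
    return {test: sorted(rows, key=lambda row: row[1]) for test, rows in groups.items()}


rankings = rank_by_duration(RESULTS)
for test, rows in rankings.items():
    for model, ms in rows:
        print(f"{test:<15} {model:<44} {ms:>6} ms")
```

Sorting within each test rather than across all runs keeps the comparison between models like-for-like, since the prompts differ per test.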

Summary

  • Total Tests: 9
  • Passed: 8
  • Failed: 1
  • Success Rate: 88.89%
  • Average Duration: 4331ms (4.33s)
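
The summary figures can be re-derived from the nine durations and pass/fail outcomes listed in this report. The sketch below is only a cross-check under that assumption; the variable names are illustrative and the harness may compute its summary differently:

```python
# (duration in ms, passed?) for the nine runs in this report
RUNS = [
    (910, True), (1484, True), (8460, True),    # addition
    (955, True), (1095, True), (7653, True),    # multiplication
    (816, True), (954, True), (16655, False),   # division
]

total = len(RUNS)
passed = sum(1 for _, ok in RUNS if ok)
failed = total - passed
success_rate = 100 * passed / total
average_ms = sum(ms for ms, _ in RUNS) / total

print(f"Total Tests: {total}")
print(f"Passed: {passed}")
print(f"Failed: {failed}")
print(f"Success Rate: {success_rate:.2f}%")
print(f"Average Duration: {round(average_ms)}ms ({average_ms / 1000:.2f}s)")
```

The three deepseek runs account for 32768 of the 38982 ms total, so the average sits well above the roughly one-second latency of the OpenAI runs.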

Failed Tests

division - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: `15 divided by 3 is 5.

Answer: 5`

  • Duration: 16655ms (16.66s)
  • Reason: Expected `5`, but got `15 divided by 3 is 5.

answer: 5`

  • Timestamp: 4/3/2025, 7:14:40 PM
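
The failure above is consistent with a strict comparison of the whole response against the expected value: the response contains the right number but wraps it in an explanation. The sketch below assumes the check trims and lowercases the response before comparing (the lowercase `answer` in the Reason line hints at this, but the harness code is not shown here). `normalize`, `strict_match`, and `last_number` are hypothetical helpers, not kbot functions:

```python
import re


def normalize(text: str) -> str:
    """Trim surrounding whitespace and lowercase (an assumed normalization)."""
    return text.strip().lower()


def strict_match(actual: str, expected: str) -> bool:
    """Pass only when the entire normalized response equals the expected string."""
    return normalize(actual) == normalize(expected)


def last_number(text: str):
    """Hypothetical lenient alternative: the last number in the response, or None."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None


response = "15 divided by 3 is 5.\n\nAnswer: 5"

print(strict_match(response, "5"))  # strict check on the verbose response
print(strict_match("5", "5"))       # strict check on a bare number
print(last_number(response))        # lenient extraction
```

A lenient extractor like `last_number` would accept this response, but it would also hide the fact that the model ignored "Return only the number, no explanation." If the test is meant to measure instruction following as well as arithmetic, the strict check is arguably the right one.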

Passed Tests

addition - openai/gpt-3.5-turbo

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1484ms (1.48s)
  • Timestamp: 4/3/2025, 7:14:04 PM

addition - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 8460ms (8.46s)
  • Timestamp: 4/3/2025, 7:14:12 PM

addition - openai/gpt-4o-mini

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 910ms (0.91s)
  • Timestamp: 4/3/2025, 7:14:13 PM

multiplication - openai/gpt-3.5-turbo

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 955ms (0.95s)
  • Timestamp: 4/3/2025, 7:14:14 PM

multiplication - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 7653ms (7.65s)
  • Timestamp: 4/3/2025, 7:14:22 PM

multiplication - openai/gpt-4o-mini

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 1095ms (1.09s)
  • Timestamp: 4/3/2025, 7:14:23 PM

division - openai/gpt-3.5-turbo

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 816ms (0.82s)
  • Timestamp: 4/3/2025, 7:14:24 PM

division - openai/gpt-4o-mini

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 954ms (0.95s)
  • Timestamp: 4/3/2025, 7:14:41 PM