mono/packages/kbot/tests/unit/reports/math.md
2026-03-19 17:40:06 +01:00

Math Operations Test Results

High Scores

Performance Rankings (Duration)

| Test        | Model                     | Duration (ms) | Duration (s) |
|-------------|---------------------------|---------------|--------------|
| quadratic   | openai/gpt-4o-mini        | 514           | 0.51         |
| quadratic   | anthropic/claude-sonnet-4 | 1120          | 1.12         |
| factorial   | openai/gpt-4o-mini        | 512           | 0.51         |
| factorial   | anthropic/claude-sonnet-4 | 877           | 0.88         |
| fibonacci   | openai/gpt-4o-mini        | 494           | 0.49         |
| fibonacci   | anthropic/claude-sonnet-4 | 4093          | 4.09         |
| square_root | openai/gpt-4o-mini        | 483           | 0.48         |
| square_root | anthropic/claude-sonnet-4 | 969           | 0.97         |
| power       | anthropic/claude-sonnet-4 | 1129          | 1.13         |
| power       | openai/gpt-4o-mini        | 1308          | 1.31         |
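
The rankings above list each test's runs from fastest to slowest. A minimal sketch of how such a ranking could be produced is shown here; the `RESULTS` list and the `rank_by_duration` helper are hypothetical names used for illustration and are not taken from the kbot test harness. The durations are copied from the table.

```python
from collections import defaultdict

# (test, model, duration in ms), copied from the ranking table above.
RESULTS = [
    ("quadratic", "openai/gpt-4o-mini", 514),
    ("quadratic", "anthropic/claude-sonnet-4", 1120),
    ("factorial", "openai/gpt-4o-mini", 512),
    ("factorial", "anthropic/claude-sonnet-4", 877),
    ("fibonacci", "openai/gpt-4o-mini", 494),
    ("fibonacci", "anthropic/claude-sonnet-4", 4093),
    ("square_root", "openai/gpt-4o-mini", 483),
    ("square_root", "anthropic/claude-sonnet-4", 969),
    ("power", "anthropic/claude-sonnet-4", 1129),
    ("power", "openai/gpt-4o-mini", 1308),
]


def rank_by_duration(results):
    """Group rows by test name and sort each group fastest first."""
    grouped = defaultdict(list)
    for test, model, duration_ms in results:
        grouped[test].append((model, duration_ms))
    return {
        test: sorted(rows, key=lambda row: row[1])
        for test, rows in grouped.items()
    }


rankings = rank_by_duration(RESULTS)
for test, rows in rankings.items():
    fastest_model, fastest_ms = rows[0]
    print(f"{test}: {fastest_model} ({fastest_ms} ms)")
```

In this data, `power` is the only test where anthropic/claude-sonnet-4 finishes ahead of openai/gpt-4o-mini.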

Summary

  • Total Tests: 15
  • Passed: 12
  • Failed: 3
  • Success Rate: 80.00%
  • Average Duration: 1189ms (1.19s)
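
The summary figures follow from simple arithmetic: the success rate is passed divided by total, and each duration is shown in milliseconds with the equivalent in seconds. The sketch below reproduces that formatting under the assumption that values are rounded to two decimal places; `success_rate` and `format_duration` are hypothetical helper names, not functions from the harness.

```python
def success_rate(passed: int, total: int) -> str:
    """Return the pass percentage with two decimal places."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{passed / total * 100:.2f}%"


def format_duration(duration_ms: float) -> str:
    """Render a duration as whole milliseconds plus seconds to two decimals."""
    return f"{duration_ms:.0f}ms ({duration_ms / 1000:.2f}s)"


print(success_rate(12, 15))    # 80.00%
print(format_duration(1189))   # 1189ms (1.19s)
```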

Failed Tests

quadratic - anthropic/claude-sonnet-4

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.
  • Expected: -3,-2
  • Actual: -2,-3
  • Duration: 1120ms (1.12s)
  • Reason: Expected -3,-2, but got -2,-3
  • Timestamp: 3/19/2026, 4:39:00 PM

quadratic - openai/gpt-4o-mini

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.
  • Expected: -3,-2
  • Actual: -2,-3
  • Duration: 514ms (0.51s)
  • Reason: Expected -3,-2, but got -2,-3
  • Timestamp: 3/19/2026, 4:38:59 PM
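
Both quadratic failures returned the correct roots in a different order from the expected string, which suggests the check is an exact string match. If the order of the solutions is not meant to matter, an order-insensitive comparison is one option. The following is a sketch under that assumption; `parse_roots` and `same_roots` are hypothetical helpers, not the harness's actual comparison logic.

```python
from typing import List


def parse_roots(answer: str) -> List[float]:
    """Parse a comma-separated answer such as "-3,-2" into a sorted list of floats."""
    return sorted(float(part) for part in answer.split(",") if part.strip())


def same_roots(expected: str, actual: str, tol: float = 1e-9) -> bool:
    """Compare two answers as sets of roots, ignoring order and surrounding whitespace."""
    expected_roots = parse_roots(expected)
    actual_roots = parse_roots(actual)
    if len(expected_roots) != len(actual_roots):
        return False
    return all(abs(e - a) <= tol for e, a in zip(expected_roots, actual_roots))


print(same_roots("-3,-2", "-2,-3"))  # True
print(same_roots("-3,-2", "-2,-4"))  # False
```

With a comparison like this, both quadratic responses above would count as correct. Keeping the exact match instead would be reasonable if the test is also meant to check that the model follows the ordering of the example in the prompt.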

Passed Tests

factorial - anthropic/claude-sonnet-4

  • Prompt: Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.
  • Expected: 120
  • Actual: 120
  • Duration: 877ms (0.88s)
  • Timestamp: 3/19/2026, 4:39:03 PM

factorial - openai/gpt-4o-mini

  • Prompt: Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.
  • Expected: 120
  • Actual: 120
  • Duration: 512ms (0.51s)
  • Timestamp: 3/19/2026, 4:39:02 PM

fibonacci - anthropic/claude-sonnet-4

  • Prompt: Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 4093ms (4.09s)
  • Timestamp: 3/19/2026, 4:39:08 PM

fibonacci - openai/gpt-4o-mini

  • Prompt: Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 494ms (0.49s)
  • Timestamp: 3/19/2026, 4:39:04 PM

square_root - anthropic/claude-sonnet-4

  • Prompt: Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.
  • Expected: 4
  • Actual: 4
  • Duration: 969ms (0.97s)
  • Timestamp: 3/19/2026, 4:39:11 PM

square_root - openai/gpt-4o-mini

  • Prompt: Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.
  • Expected: 4
  • Actual: 4
  • Duration: 483ms (0.48s)
  • Timestamp: 3/19/2026, 4:39:10 PM

power - anthropic/claude-sonnet-4

  • Prompt: Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 1129ms (1.13s)
  • Timestamp: 3/19/2026, 4:39:15 PM

power - openai/gpt-4o-mini

  • Prompt: Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 1308ms (1.31s)
  • Timestamp: 3/19/2026, 4:39:14 PM