Math Operations Test Results

Highscores

Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| quadratic | openai/gpt-4o-mini | 1088 | 1.09 |
| quadratic | openai/gpt-3.5-turbo | 1202 | 1.20 |
| factorial | openai/gpt-4o-mini | 481 | 0.48 |
| factorial | openai/gpt-3.5-turbo | 503 | 0.50 |
| fibonacci | openai/gpt-3.5-turbo | 503 | 0.50 |
| fibonacci | openai/gpt-4o-mini | 601 | 0.60 |
| square_root | openai/gpt-4o-mini | 539 | 0.54 |
| square_root | openai/gpt-3.5-turbo | 738 | 0.74 |
| power | openai/gpt-3.5-turbo | 592 | 0.59 |
| power | openai/gpt-4o-mini | 1103 | 1.10 |
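
The ranking above lists, for each test, the faster model first. As a rough sketch of how that ordering could be derived from raw results, the snippet below picks the quickest model per test and counts wins per model. The `RESULTS` rows are copied from the table; the function names `fastest_per_test` and `wins_per_model` are hypothetical and not taken from the kbot test suite. Each duration is a single sample, so the ranking should be read as indicative only.

```python
from collections import defaultdict

# (test, model, duration_ms) rows copied from the ranking table.
RESULTS = [
    ("quadratic", "openai/gpt-4o-mini", 1088),
    ("quadratic", "openai/gpt-3.5-turbo", 1202),
    ("factorial", "openai/gpt-4o-mini", 481),
    ("factorial", "openai/gpt-3.5-turbo", 503),
    ("fibonacci", "openai/gpt-3.5-turbo", 503),
    ("fibonacci", "openai/gpt-4o-mini", 601),
    ("square_root", "openai/gpt-4o-mini", 539),
    ("square_root", "openai/gpt-3.5-turbo", 738),
    ("power", "openai/gpt-3.5-turbo", 592),
    ("power", "openai/gpt-4o-mini", 1103),
]


def fastest_per_test(rows):
    """Return {test: (model, duration_ms)} for the quickest model on each test."""
    best = {}
    for test, model, ms in rows:
        if test not in best or ms < best[test][1]:
            best[test] = (model, ms)
    return best


def wins_per_model(rows):
    """Count how many tests each model finished fastest (hypothetical helper)."""
    wins = defaultdict(int)
    for model, _ms in fastest_per_test(rows).values():
        wins[model] += 1
    return dict(wins)


if __name__ == "__main__":
    for test, (model, ms) in fastest_per_test(RESULTS).items():
        print(f"{test}: {model} ({ms}ms)")
    print(wins_per_model(RESULTS))
```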

Summary

  • Total Tests: 10
  • Passed: 9
  • Failed: 1
  • Success Rate: 90.00%
  • Average Duration: 735ms (0.73s)
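
The summary figures can be recomputed from the per-test results. The sketch below assumes each result is a dict with hypothetical `passed` and `duration_ms` keys; the actual result structure used by the test runner is not shown in this report.

```python
def summarize(results):
    """Aggregate pass/fail counts, success rate (%), and mean duration (ms)."""
    total = len(results)
    passed = sum(1 for r in results if r["passed"])
    failed = total - passed
    # Guard against an empty run to avoid dividing by zero.
    success_rate = 100.0 * passed / total if total else 0.0
    average_ms = sum(r["duration_ms"] for r in results) / total if total else 0.0
    return {
        "total": total,
        "passed": passed,
        "failed": failed,
        "success_rate": success_rate,
        "average_ms": average_ms,
    }


if __name__ == "__main__":
    # Durations from this report; only the second entry (quadratic, gpt-3.5-turbo) failed.
    durations = [1088, 1202, 481, 503, 503, 601, 539, 738, 592, 1103]
    results = [
        {"passed": i != 1, "duration_ms": ms} for i, ms in enumerate(durations)
    ]
    stats = summarize(results)
    print(f"Total Tests: {stats['total']}")
    print(f"Passed: {stats['passed']}")
    print(f"Failed: {stats['failed']}")
    print(f"Success Rate: {stats['success_rate']:.2f}%")
    print(f"Average Duration: {stats['average_ms']:.0f}ms")
```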

Failed Tests

quadratic - openai/gpt-3.5-turbo

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.
  • Expected: -3,-2
  • Actual: -2,-3
  • Duration: 1202ms (1.20s)
  • Reason: Expected -3,-2, but got -2,-3 (both roots are correct but listed in reverse order; the check appears to be order-sensitive)
  • Timestamp: 6/5/2025, 8:46:07 PM
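
Since -2,-3 and -3,-2 name the same pair of roots, one option is to compare the answers as unordered sets of numbers rather than as strings. The following is a minimal sketch under that assumption; `parse_numbers` and `same_solution_set` are hypothetical helpers and do not reflect how the current test performs its comparison.

```python
def parse_numbers(answer):
    """Split a comma-separated answer into a sorted list of floats."""
    return sorted(float(part) for part in answer.split(",") if part.strip())


def same_solution_set(expected, actual, tol=1e-9):
    """Return True when both answers contain the same numbers in any order."""
    try:
        exp_values = parse_numbers(expected)
        act_values = parse_numbers(actual)
    except ValueError:
        # Anything that is not a plain number list counts as a mismatch.
        return False
    if len(exp_values) != len(act_values):
        return False
    return all(abs(e - a) <= tol for e, a in zip(exp_values, act_values))


if __name__ == "__main__":
    print(same_solution_set("-3,-2", "-2,-3"))  # order differs, same roots
    print(same_solution_set("-3,-2", "-3,2"))   # different roots
```

With a comparison like this, the gpt-3.5-turbo response would count as a pass, while answers with extra text or a wrong root would still fail.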

Passed Tests

quadratic - openai/gpt-4o-mini

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.
  • Expected: -3,-2
  • Actual: -3,-2
  • Duration: 1088ms (1.09s)
  • Timestamp: 6/5/2025, 8:46:09 PM

factorial - openai/gpt-3.5-turbo

  • Prompt: Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.
  • Expected: 120
  • Actual: 120
  • Duration: 503ms (0.50s)
  • Timestamp: 6/5/2025, 8:46:09 PM

factorial - openai/gpt-4o-mini

  • Prompt: Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.
  • Expected: 120
  • Actual: 120
  • Duration: 481ms (0.48s)
  • Timestamp: 6/5/2025, 8:46:10 PM

fibonacci - openai/gpt-3.5-turbo

  • Prompt: Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 503ms (0.50s)
  • Timestamp: 6/5/2025, 8:46:10 PM

fibonacci - openai/gpt-4o-mini

  • Prompt: Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 601ms (0.60s)
  • Timestamp: 6/5/2025, 8:46:11 PM

square_root - openai/gpt-3.5-turbo

  • Prompt: Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.
  • Expected: 4
  • Actual: 4
  • Duration: 738ms (0.74s)
  • Timestamp: 6/5/2025, 8:46:11 PM

square_root - openai/gpt-4o-mini

  • Prompt: Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.
  • Expected: 4
  • Actual: 4
  • Duration: 539ms (0.54s)
  • Timestamp: 6/5/2025, 8:46:12 PM

power - openai/gpt-3.5-turbo

  • Prompt: Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 592ms (0.59s)
  • Timestamp: 6/5/2025, 8:46:12 PM

power - openai/gpt-4o-mini

  • Prompt: Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 1103ms (1.10s)
  • Timestamp: 6/5/2025, 8:46:14 PM