Math Operations Test Results

Highscores

Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|------|-------|---------------|--------------|
| quadratic | openai/gpt-4o-mini | 776 | 0.78 |
| quadratic | openrouter/quasar-alpha | 965 | 0.96 |
| quadratic | openai/gpt-3.5-turbo | 1286 | 1.29 |
| factorial | openai/gpt-4o-mini | 753 | 0.75 |
| factorial | openrouter/quasar-alpha | 762 | 0.76 |
| factorial | openai/gpt-3.5-turbo | 838 | 0.84 |
| fibonacci | openai/gpt-4o-mini | 769 | 0.77 |
| fibonacci | openai/gpt-3.5-turbo | 866 | 0.87 |
| fibonacci | openrouter/quasar-alpha | 930 | 0.93 |
| square_root | openai/gpt-3.5-turbo | 816 | 0.82 |
| square_root | openai/gpt-4o-mini | 927 | 0.93 |
| square_root | openrouter/quasar-alpha | 5005 | 5.00 |
| power | openai/gpt-3.5-turbo | 955 | 0.95 |
| power | openrouter/quasar-alpha | 1432 | 1.43 |
| power | openai/gpt-4o-mini | 1725 | 1.73 |
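
The rankings list each test's runs from fastest to slowest. A minimal sketch of how such a ranking could be derived from raw results follows. The tuple layout and the function name `rank_by_duration` are hypothetical, chosen for illustration; the report does not show how the rankings were actually produced. The durations are copied from the table.

```python
# Hypothetical result records: (test, model, duration_ms).
# Values are copied from the rankings table; the record shape is an assumption.
results = [
    ("quadratic", "openai/gpt-3.5-turbo", 1286),
    ("quadratic", "openrouter/quasar-alpha", 965),
    ("quadratic", "openai/gpt-4o-mini", 776),
    ("square_root", "openrouter/quasar-alpha", 5005),
    ("square_root", "openai/gpt-3.5-turbo", 816),
    ("square_root", "openai/gpt-4o-mini", 927),
    ("power", "openai/gpt-4o-mini", 1725),
    ("power", "openai/gpt-3.5-turbo", 955),
    ("power", "openrouter/quasar-alpha", 1432),
]


def rank_by_duration(rows):
    """Group rows by test name and sort each group fastest-first."""
    ranked = {}
    for test, model, duration_ms in rows:
        ranked.setdefault(test, []).append((model, duration_ms))
    for runs in ranked.values():
        runs.sort(key=lambda run: run[1])  # ascending duration
    return ranked


ranked = rank_by_duration(results)
for test, runs in ranked.items():
    for model, duration_ms in runs:
        print(f"{test:12} {model:26} {duration_ms:5d} ms")
```

Ranking within each test rather than across all tests keeps the comparison between models on the same prompt.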

Summary

  • Total Tests: 15
  • Passed: 13
  • Failed: 2
  • Success Rate: 86.67%
  • Average Duration: 1254ms (1.25s)
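
The summary figures can be rechecked from the per-test durations. The sketch below is illustrative only: the function name `summarize` and the dictionary keys are hypothetical, and the rounding (two decimals for the rate, nearest millisecond for the average) is an assumption that happens to match the reported values.

```python
import statistics

# Durations in ms for all 15 runs, copied from the rankings table.
durations_ms = [
    776, 965, 1286,   # quadratic
    753, 762, 838,    # factorial
    769, 866, 930,    # fibonacci
    816, 927, 5005,   # square_root
    955, 1432, 1725,  # power
]
passed_count = 13  # from the summary


def summarize(durations, passed):
    """Recompute the summary fields from raw durations and a pass count."""
    total = len(durations)
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": round(100 * passed / total, 2),  # percent
        "average_ms": round(sum(durations) / total),
        "median_ms": statistics.median(durations),
    }


summary = summarize(durations_ms, passed_count)
print(summary)
```

The durations sum to 18805 ms, so the mean is 18805 / 15 ≈ 1254 ms, and 13 / 15 ≈ 86.67%. The median, 927 ms, is included because the single 5005 ms `square_root` run pulls the mean upward.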

Failed Tests

quadratic - openai/gpt-3.5-turbo

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.
  • Expected: -3,-2
  • Actual: -2,-3
  • Duration: 1286ms (1.29s)
  • Reason: Expected -3,-2, but got -2,-3
  • Timestamp: 4/7/2025, 7:08:26 PM

quadratic - openrouter/quasar-alpha

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.
  • Expected: -3,-2
  • Actual: -2,-3
  • Duration: 965ms (0.96s)
  • Reason: Expected -3,-2, but got -2,-3
  • Timestamp: 4/7/2025, 7:08:28 PM
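
Both failures return the correct roots, -2 and -3, in a different order from the expected string. The Reason lines suggest the check compares the raw strings, although the report does not show the comparison code, so this is an inference. A minimal sketch of an order-insensitive check follows; the names `parse_roots` and `roots_match` are hypothetical and not part of the test suite.

```python
def parse_roots(text):
    """Parse a comma-separated answer such as '-3,-2' into a sorted list of floats."""
    return sorted(float(part) for part in text.split(","))


def roots_match(expected, actual):
    """Compare two answers as sets of roots, ignoring order and surrounding spaces."""
    try:
        return parse_roots(expected) == parse_roots(actual)
    except ValueError:
        # The answer contained something other than numbers; treat it as a mismatch.
        return False


print(roots_match("-3,-2", "-2,-3"))
print(roots_match("-3,-2", "-3,-1"))
```

Under this check the two runs above would pass, while an answer with wrong values or extra text would still fail. Alternatively, the prompt could state the required order explicitly (for example, ascending) and the strict string comparison could stay.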

Passed Tests

quadratic - openai/gpt-4o-mini

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.
  • Expected: -3,-2
  • Actual: -3,-2
  • Duration: 776ms (0.78s)
  • Timestamp: 4/7/2025, 7:08:27 PM

factorial - openai/gpt-3.5-turbo

  • Prompt: Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.
  • Expected: 120
  • Actual: 120
  • Duration: 838ms (0.84s)
  • Timestamp: 4/7/2025, 7:08:29 PM

factorial - openai/gpt-4o-mini

  • Prompt: Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.
  • Expected: 120
  • Actual: 120
  • Duration: 753ms (0.75s)
  • Timestamp: 4/7/2025, 7:08:29 PM

factorial - openrouter/quasar-alpha

  • Prompt: Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.
  • Expected: 120
  • Actual: 120
  • Duration: 762ms (0.76s)
  • Timestamp: 4/7/2025, 7:08:30 PM

fibonacci - openai/gpt-3.5-turbo

  • Prompt: Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 866ms (0.87s)
  • Timestamp: 4/7/2025, 7:08:31 PM

fibonacci - openai/gpt-4o-mini

  • Prompt: Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 769ms (0.77s)
  • Timestamp: 4/7/2025, 7:08:32 PM

fibonacci - openrouter/quasar-alpha

  • Prompt: Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 930ms (0.93s)
  • Timestamp: 4/7/2025, 7:08:33 PM

square_root - openai/gpt-3.5-turbo

  • Prompt: Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.
  • Expected: 4
  • Actual: 4
  • Duration: 816ms (0.82s)
  • Timestamp: 4/7/2025, 7:08:34 PM

square_root - openai/gpt-4o-mini

  • Prompt: Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.
  • Expected: 4
  • Actual: 4
  • Duration: 927ms (0.93s)
  • Timestamp: 4/7/2025, 7:08:35 PM

square_root - openrouter/quasar-alpha

  • Prompt: Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.
  • Expected: 4
  • Actual: 4
  • Duration: 5005ms (5.00s)
  • Timestamp: 4/7/2025, 7:08:40 PM

power - openai/gpt-3.5-turbo

  • Prompt: Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 955ms (0.95s)
  • Timestamp: 4/7/2025, 7:08:41 PM

power - openai/gpt-4o-mini

  • Prompt: Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 1725ms (1.73s)
  • Timestamp: 4/7/2025, 7:08:42 PM

power - openrouter/quasar-alpha

  • Prompt: Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.
  • Expected: 8
  • Actual: 8
  • Duration: 1432ms (1.43s)
  • Timestamp: 4/7/2025, 7:08:44 PM