2025-04-06 17:49:29 +02:00

Math Operations Test Results

High Scores

Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|------|-------|---------------|--------------|
| quadratic | openrouter/quasar-alpha | 880 | 0.88 |
| quadratic | openai/gpt-4o-mini | 1127 | 1.13 |
| quadratic | openai/gpt-3.5-turbo | 1193 | 1.19 |
| quadratic | deepseek/deepseek-r1-distill-qwen-14b:free | 10584 | 10.58 |
| factorial | openai/gpt-4o-mini | 738 | 0.74 |
| factorial | openai/gpt-3.5-turbo | 771 | 0.77 |
| factorial | openrouter/quasar-alpha | 843 | 0.84 |
| factorial | deepseek/deepseek-r1-distill-qwen-14b:free | 5106 | 5.11 |
| fibonacci | deepseek/deepseek-r1-distill-qwen-14b:free | 589 | 0.59 |
| fibonacci | openai/gpt-4o-mini | 881 | 0.88 |
| fibonacci | openai/gpt-3.5-turbo | 1018 | 1.02 |
| fibonacci | openrouter/quasar-alpha | 1293 | 1.29 |
| square_root | deepseek/deepseek-r1-distill-qwen-14b:free | 421 | 0.42 |
| square_root | openai/gpt-4o-mini | 882 | 0.88 |
| square_root | openai/gpt-3.5-turbo | 950 | 0.95 |
| square_root | openrouter/quasar-alpha | 1064 | 1.06 |
| power | deepseek/deepseek-r1-distill-qwen-14b:free | 454 | 0.45 |
| power | openai/gpt-4o-mini | 783 | 0.78 |
| power | openrouter/quasar-alpha | 783 | 0.78 |
| power | openai/gpt-3.5-turbo | 793 | 0.79 |
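
The ranking above groups results by test and lists each group from fastest to slowest. A minimal sketch of that ordering follows, assuming the results are available as `(test, model, duration_ms)` tuples. The names `rank_by_duration` and `format_seconds` and the tuple layout are hypothetical and are not taken from the kbot test harness.

```python
from collections import defaultdict


def rank_by_duration(results):
    """Group (test, model, duration_ms) tuples by test, fastest first."""
    groups = defaultdict(list)
    for test, model, duration_ms in results:
        groups[test].append((model, duration_ms))
    # sorted() is stable, so models with equal durations keep their input order.
    return {test: sorted(rows, key=lambda row: row[1]) for test, rows in groups.items()}


def format_seconds(duration_ms):
    """Render milliseconds as seconds with two decimals, as in the table."""
    return f"{duration_ms / 1000:.2f}"


# Rows copied from the "power" test in the table above.
power = [
    ("power", "openai/gpt-3.5-turbo", 793),
    ("power", "openai/gpt-4o-mini", 783),
    ("power", "openrouter/quasar-alpha", 783),
    ("power", "deepseek/deepseek-r1-distill-qwen-14b:free", 454),
]
ranked = rank_by_duration(power)["power"]
```

Note that a fast response is not necessarily a correct one: the shortest durations for fibonacci, square_root, and power belong to runs that returned an empty response.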

Summary

  • Total Tests: 20
  • Passed: 12
  • Failed: 8
  • Success Rate: 60.00%
  • Average Duration: 1558ms (1.56s)
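
The summary figures can be re-derived from the per-test entries in this report. The sketch below assumes each run is reduced to a `(passed, duration_ms)` pair; the `summarize` helper is hypothetical and only illustrates the arithmetic.

```python
def summarize(runs):
    """Compute totals, success rate, and mean duration for (passed, duration_ms) pairs."""
    total = len(runs)
    passed = sum(1 for ok, _ in runs if ok)
    average_ms = round(sum(ms for _, ms in runs) / total)
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": f"{100 * passed / total:.2f}%",
        "average_ms": average_ms,
    }


# Pass/fail flags and durations copied from the entries in this report.
runs = [
    (False, 880), (False, 1127), (False, 1193), (False, 10584),  # quadratic
    (True, 738), (True, 771), (True, 843), (True, 5106),         # factorial
    (False, 589), (False, 881), (True, 1018), (True, 1293),      # fibonacci
    (False, 421), (True, 882), (True, 950), (True, 1064),        # square_root
    (False, 454), (True, 783), (True, 783), (True, 793),         # power
]
summary = summarize(runs)
```

The durations sum to 31153 ms over 20 runs, which gives a mean of 1557.65 ms and rounds to the 1558 ms shown above.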

Failed Tests

quadratic - openai/gpt-3.5-turbo

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
  • Expected: -2,-3
  • Actual: -2, -3
  • Duration: 1193ms (1.19s)
  • Reason: Expected -2,-3, but got -2, -3
  • Timestamp: 4/6/2025, 5:42:04 PM

quadratic - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
  • Expected: -2,-3
  • Actual: `The solutions to the quadratic equation x^2 + 5x + 6 = 0 are x = -2 and x = -3.

-2,-3`

  • Duration: 10584ms (10.58s)
  • Reason: Expected -2,-3, but got the solutions to the quadratic equation x^2 + 5x + 6 = 0 are x = -2 and x = -3.

-2,-3

  • Timestamp: 4/6/2025, 5:42:15 PM

quadratic - openai/gpt-4o-mini

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
  • Expected: -2,-3
  • Actual: -2, -3
  • Duration: 1127ms (1.13s)
  • Reason: Expected -2,-3, but got -2, -3
  • Timestamp: 4/6/2025, 5:42:16 PM

quadratic - openrouter/quasar-alpha

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
  • Expected: -2,-3
  • Actual: -2, -3
  • Duration: 880ms (0.88s)
  • Reason: Expected -2,-3, but got -2, -3
  • Timestamp: 4/6/2025, 5:42:17 PM
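
All four quadratic failures contain the correct roots; three differ from the expected string only by a space after the comma, and one adds an explanatory sentence before the answer. If the intent is to check the mathematics rather than the exact formatting, a more lenient comparison could be used. The sketch below is one possible approach, not the harness's actual logic: `parse_solutions` and `solutions_match` are hypothetical names, and reading only the last non-empty line is an assumption about where the answer appears.

```python
def parse_solutions(text):
    """Parse the last non-empty line as comma-separated numbers, or return None."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    try:
        # float() ignores surrounding whitespace, so "-2, -3" parses cleanly.
        return sorted(float(part) for part in lines[-1].split(","))
    except ValueError:
        return None


def solutions_match(expected, actual):
    """Compare two answers as unordered sets of numbers."""
    want = parse_solutions(expected)
    got = parse_solutions(actual)
    return want is not None and want == got


verbose = (
    "The solutions to the quadratic equation x^2 + 5x + 6 = 0 "
    "are x = -2 and x = -3.\n\n-2,-3"
)
```

With this comparison, `-2, -3` and `-3,-2` would both match `-2,-3`. Accepting the verbose response is a design choice: the prompt asks for no explanation, so a strict harness may still want to count that run as a failure.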

fibonacci - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
  • Expected: 8
  • Actual: ``
  • Duration: 589ms (0.59s)
  • Reason: Model returned empty response
  • Timestamp: 4/6/2025, 5:42:26 PM

fibonacci - openai/gpt-4o-mini

  • Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
  • Expected: 8
  • Actual: 5
  • Duration: 881ms (0.88s)
  • Reason: Expected 8, but got 5
  • Timestamp: 4/6/2025, 5:42:27 PM
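
The report does not say why the model answered 5. One possible explanation, offered here only as an assumption, is an indexing ambiguity in the prompt: the "6th number" is 8 when the sequence starts 1, 1, 2, ... but 5 when it starts 0, 1, 1, .... The hypothetical helper below shows both conventions.

```python
def fib(n, first=(1, 1)):
    """Return the n-th term (1-based) of a Fibonacci-style sequence.

    `first` holds the two starting terms; (1, 1) and (0, 1) are both common.
    """
    a, b = first
    for _ in range(n - 1):
        a, b = b, a + b
    return a


starting_at_one = fib(6, first=(1, 1))   # terms: 1, 1, 2, 3, 5, 8
starting_at_zero = fib(6, first=(0, 1))  # terms: 0, 1, 1, 2, 3, 5
```

If this is the cause, stating the starting terms in the prompt would remove the ambiguity.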

square_root - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Calculate the square root of 16. Return only the number, no explanation.
  • Expected: 4
  • Actual: ``
  • Duration: 421ms (0.42s)
  • Reason: Model returned empty response
  • Timestamp: 4/6/2025, 5:42:29 PM

power - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: ``
  • Duration: 454ms (0.45s)
  • Reason: Model returned empty response
  • Timestamp: 4/6/2025, 5:42:33 PM
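
The failed entries above carry two kinds of reason: an empty response, or a mismatch between the expected and actual strings. A minimal sketch of how such a reason might be derived is shown below. The function `failure_reason` is hypothetical, and trimming whitespace before the comparison is an assumption; the harness's real comparison rules are not shown in this report.

```python
def failure_reason(expected, actual):
    """Return None when the answer matches, otherwise a short reason string."""
    answer = actual.strip()
    if not answer:
        # Checked first so a blank reply is not reported as a plain mismatch.
        return "Model returned empty response"
    if answer != expected:
        return f"Expected {expected}, but got {answer}"
    return None
```

Separating the empty-response case makes it easier to tell transport or provider problems apart from wrong answers when reading the report.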

Passed Tests

factorial - openai/gpt-3.5-turbo

  • Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
  • Expected: 120
  • Actual: 120
  • Duration: 771ms (0.77s)
  • Timestamp: 4/6/2025, 5:42:18 PM

factorial - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
  • Expected: 120
  • Actual: 120
  • Duration: 5106ms (5.11s)
  • Timestamp: 4/6/2025, 5:42:23 PM

factorial - openai/gpt-4o-mini

  • Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
  • Expected: 120
  • Actual: 120
  • Duration: 738ms (0.74s)
  • Timestamp: 4/6/2025, 5:42:23 PM

factorial - openrouter/quasar-alpha

  • Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
  • Expected: 120
  • Actual: 120
  • Duration: 843ms (0.84s)
  • Timestamp: 4/6/2025, 5:42:24 PM

fibonacci - openai/gpt-3.5-turbo

  • Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1018ms (1.02s)
  • Timestamp: 4/6/2025, 5:42:25 PM

fibonacci - openrouter/quasar-alpha

  • Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1293ms (1.29s)
  • Timestamp: 4/6/2025, 5:42:28 PM

square_root - openai/gpt-3.5-turbo

  • Prompt: Calculate the square root of 16. Return only the number, no explanation.
  • Expected: 4
  • Actual: 4
  • Duration: 950ms (0.95s)
  • Timestamp: 4/6/2025, 5:42:29 PM

square_root - openai/gpt-4o-mini

  • Prompt: Calculate the square root of 16. Return only the number, no explanation.
  • Expected: 4
  • Actual: 4
  • Duration: 882ms (0.88s)
  • Timestamp: 4/6/2025, 5:42:30 PM

square_root - openrouter/quasar-alpha

  • Prompt: Calculate the square root of 16. Return only the number, no explanation.
  • Expected: 4
  • Actual: 4
  • Duration: 1064ms (1.06s)
  • Timestamp: 4/6/2025, 5:42:31 PM

power - openai/gpt-3.5-turbo

  • Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 793ms (0.79s)
  • Timestamp: 4/6/2025, 5:42:32 PM

power - openai/gpt-4o-mini

  • Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 783ms (0.78s)
  • Timestamp: 4/6/2025, 5:42:34 PM

power - openrouter/quasar-alpha

  • Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 783ms (0.78s)
  • Timestamp: 4/6/2025, 5:42:34 PM