2025-04-04 14:44:04 +02:00

Math Operations Test Results

Highscores

Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| quadratic | openai/gpt-4o-mini | 943 | 0.94 |
| quadratic | openrouter/quasar-alpha | 1105 | 1.10 |
| quadratic | openai/gpt-3.5-turbo | 1229 | 1.23 |
| quadratic | deepseek/deepseek-r1-distill-qwen-14b:free | 11633 | 11.63 |
| factorial | openai/gpt-3.5-turbo | 838 | 0.84 |
| factorial | openrouter/quasar-alpha | 840 | 0.84 |
| factorial | openai/gpt-4o-mini | 920 | 0.92 |
| factorial | deepseek/deepseek-r1-distill-qwen-14b:free | 7825 | 7.83 |
| fibonacci | openrouter/quasar-alpha | 701 | 0.70 |
| fibonacci | openai/gpt-4o-mini | 935 | 0.94 |
| fibonacci | openai/gpt-3.5-turbo | 1195 | 1.20 |
| fibonacci | deepseek/deepseek-r1-distill-qwen-14b:free | 11358 | 11.36 |
| square_root | openai/gpt-3.5-turbo | 793 | 0.79 |
| square_root | openai/gpt-4o-mini | 1012 | 1.01 |
| square_root | openrouter/quasar-alpha | 1535 | 1.53 |
| square_root | deepseek/deepseek-r1-distill-qwen-14b:free | 16332 | 16.33 |
| power | openai/gpt-3.5-turbo | 922 | 0.92 |
| power | openai/gpt-4o-mini | 1004 | 1.00 |
| power | openrouter/quasar-alpha | 1567 | 1.57 |
| power | deepseek/deepseek-r1-distill-qwen-14b:free | 7091 | 7.09 |

Summary

  • Total Tests: 20
  • Passed: 16
  • Failed: 4
  • Success Rate: 80.00%
  • Average Duration: 3489ms (3.49s)
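
The summary figures can be rechecked from the per-test durations and pass/fail outcomes listed in this report. The following is a minimal sketch, not the harness's actual code: the `TestResult` shape, the `summarize` function, and the rounding choices (`toFixed(2)`, `Math.round`) are assumptions made for illustration.

```typescript
// Hypothetical result shape; the real harness's types are not shown in the report.
interface TestResult {
  test: string;
  model: string;
  durationMs: number;
  passed: boolean;
}

// Column order used for every row of `durations` below.
const models: string[] = [
  "openai/gpt-3.5-turbo",
  "openai/gpt-4o-mini",
  "openrouter/quasar-alpha",
  "deepseek/deepseek-r1-distill-qwen-14b:free",
];

// Durations in milliseconds, copied from the rankings table.
const durations: Record<string, number[]> = {
  quadratic: [1229, 943, 1105, 11633],
  factorial: [838, 920, 840, 7825],
  fibonacci: [1195, 935, 701, 11358],
  square_root: [793, 1012, 1535, 16332],
  power: [922, 1004, 1567, 7091],
};

// The four test/model pairs listed under "Failed Tests".
const failures: string[] = [
  "quadratic|openai/gpt-4o-mini",
  "quadratic|deepseek/deepseek-r1-distill-qwen-14b:free",
  "fibonacci|openai/gpt-4o-mini",
  "fibonacci|deepseek/deepseek-r1-distill-qwen-14b:free",
];

const results: TestResult[] = [];
for (const test in durations) {
  durations[test].forEach((durationMs, i) => {
    const model = models[i];
    const passed = failures.indexOf(test + "|" + model) === -1;
    results.push({ test, model, durationMs, passed });
  });
}

// Aggregate counts, success rate (percent, 2 decimals), and mean duration.
function summarize(all: TestResult[]) {
  const total = all.length;
  const passed = all.filter((r) => r.passed).length;
  const meanMs =
    total === 0 ? 0 : all.reduce((sum, r) => sum + r.durationMs, 0) / total;
  return {
    total,
    passed,
    failed: total - passed,
    successRate: total === 0 ? "0.00" : ((passed / total) * 100).toFixed(2),
    averageMs: Math.round(meanMs),
    averageSeconds: (meanMs / 1000).toFixed(2),
  };
}

const summary = summarize(results);
console.log(summary);
// → { total: 20, passed: 16, failed: 4, successRate: "80.00",
//     averageMs: 3489, averageSeconds: "3.49" }
```

The four deepseek runs above 7 s dominate the mean; the remaining sixteen runs all finish in under 1.6 s.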

Failed Tests

quadratic - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
  • Expected: -2,-3
  • Actual: `The solutions to the quadratic equation x² + 5x + 6 = 0 are -3, -2.\n\nAnswer: -3, -2`
  • Duration: 11633ms (11.63s)
  • Reason: Expected -2,-3, but got `the solutions to the quadratic equation x² + 5x + 6 = 0 are -3, -2.\n\nanswer: -3, -2`

  • Timestamp: 4/4/2025, 2:38:24 PM

quadratic - openai/gpt-4o-mini

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
  • Expected: -2,-3
  • Actual: -2, -3
  • Duration: 943ms (0.94s)
  • Reason: Expected -2,-3, but got -2, -3
  • Timestamp: 4/4/2025, 2:38:25 PM
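
The `openai/gpt-4o-mini` answer above differs from the expected value only by a space after the comma, which suggests the harness compares strings more or less literally (an inference from the Reason lines, not something the report states). A tolerant comparison could look like the sketch below. `parseNumberList` and `sameNumberList` are hypothetical helpers, and treating the roots as an unordered set is an assumption about what the test intends.

```typescript
// Hypothetical helper: parse "a, b, c" into a numerically sorted list.
// Empty or non-numeric parts become NaN so that they never match.
function parseNumberList(text: string): number[] {
  return text
    .split(",")
    .map((part) => (part.trim() === "" ? NaN : Number(part.trim())))
    .sort((a, b) => a - b);
}

// True when both strings contain the same numbers, ignoring
// whitespace around the commas and the order of the values.
function sameNumberList(expected: string, actual: string): boolean {
  const want = parseNumberList(expected);
  const got = parseNumberList(actual);
  return (
    want.length === got.length &&
    got.every((value, i) => !Number.isNaN(value) && value === want[i])
  );
}

console.log(sameNumberList("-2,-3", "-2, -3")); // → true
console.log(sameNumberList("-2,-3", "-3,-2")); // → true
console.log(sameNumberList("-2,-3", "-2")); // → false
```

Under this comparison the `-2, -3` answer would pass. The verbose deepseek answer would still fail, because its first comma-separated part is a sentence rather than a number, which matches the prompt's "no explanation" requirement.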

fibonacci - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
  • Expected: 8
  • Actual: 5
  • Duration: 11358ms (11.36s)
  • Reason: Expected 8, but got 5
  • Timestamp: 4/4/2025, 2:38:49 PM

fibonacci - openai/gpt-4o-mini

  • Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
  • Expected: 8
  • Actual: 5
  • Duration: 935ms (0.94s)
  • Reason: Expected 8, but got 5
  • Timestamp: 4/4/2025, 2:38:50 PM
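
Both failing answers are `5`, which is the 6th term when the sequence is taken to start at 0 (0, 1, 1, 2, 3, 5), while the expected `8` is the 6th term when it starts at 1 (1, 1, 2, 3, 5, 8). One plausible reading, not confirmed by the report, is that the prompt is ambiguous about the starting convention. The sketch below makes the two conventions explicit; `nthFibonacci` is a hypothetical helper written for illustration.

```typescript
// Returns the n-th term (1-based) of a Fibonacci-style sequence
// whose first two terms are `first` and `second`.
function nthFibonacci(n: number, first: number, second: number): number {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  let current = first;
  let next = second;
  for (let i = 1; i < n; i++) {
    // Advance one step: (current, next) -> (next, current + next).
    [current, next] = [next, current + next];
  }
  return current;
}

console.log(nthFibonacci(6, 0, 1)); // → 5 (sequence 0, 1, 1, 2, 3, 5)
console.log(nthFibonacci(6, 1, 1)); // → 8 (sequence 1, 1, 2, 3, 5, 8)
```

Stating the first two terms in the prompt would remove the ambiguity.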

Passed Tests

quadratic - openai/gpt-3.5-turbo

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
  • Expected: -2,-3
  • Actual: -2,-3
  • Duration: 1229ms (1.23s)
  • Timestamp: 4/4/2025, 2:38:12 PM

quadratic - openrouter/quasar-alpha

  • Prompt: Solve the quadratic equation x² + 5x + 6 = 0. Return only the solutions as comma-separated numbers, no explanation.
  • Expected: -2,-3
  • Actual: -2,-3
  • Duration: 1105ms (1.10s)
  • Timestamp: 4/4/2025, 2:38:26 PM

factorial - openai/gpt-3.5-turbo

  • Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
  • Expected: 120
  • Actual: 120
  • Duration: 838ms (0.84s)
  • Timestamp: 4/4/2025, 2:38:27 PM

factorial - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
  • Expected: 120
  • Actual: 120
  • Duration: 7825ms (7.83s)
  • Timestamp: 4/4/2025, 2:38:34 PM

factorial - openai/gpt-4o-mini

  • Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
  • Expected: 120
  • Actual: 120
  • Duration: 920ms (0.92s)
  • Timestamp: 4/4/2025, 2:38:35 PM

factorial - openrouter/quasar-alpha

  • Prompt: Calculate 5! (factorial of 5). Return only the number, no explanation.
  • Expected: 120
  • Actual: 120
  • Duration: 840ms (0.84s)
  • Timestamp: 4/4/2025, 2:38:36 PM

fibonacci - openai/gpt-3.5-turbo

  • Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1195ms (1.20s)
  • Timestamp: 4/4/2025, 2:38:37 PM

fibonacci - openrouter/quasar-alpha

  • Prompt: Calculate the 6th number in the Fibonacci sequence. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 701ms (0.70s)
  • Timestamp: 4/4/2025, 2:38:50 PM

square_root - openai/gpt-3.5-turbo

  • Prompt: Calculate the square root of 16. Return only the number, no explanation.
  • Expected: 4
  • Actual: 4
  • Duration: 793ms (0.79s)
  • Timestamp: 4/4/2025, 2:38:51 PM

square_root - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Calculate the square root of 16. Return only the number, no explanation.
  • Expected: 4
  • Actual: 4
  • Duration: 16332ms (16.33s)
  • Timestamp: 4/4/2025, 2:39:08 PM

square_root - openai/gpt-4o-mini

  • Prompt: Calculate the square root of 16. Return only the number, no explanation.
  • Expected: 4
  • Actual: 4
  • Duration: 1012ms (1.01s)
  • Timestamp: 4/4/2025, 2:39:09 PM

square_root - openrouter/quasar-alpha

  • Prompt: Calculate the square root of 16. Return only the number, no explanation.
  • Expected: 4
  • Actual: 4
  • Duration: 1535ms (1.53s)
  • Timestamp: 4/4/2025, 2:39:10 PM

power - openai/gpt-3.5-turbo

  • Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 922ms (0.92s)
  • Timestamp: 4/4/2025, 2:39:11 PM

power - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 7091ms (7.09s)
  • Timestamp: 4/4/2025, 2:39:18 PM

power - openai/gpt-4o-mini

  • Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1004ms (1.00s)
  • Timestamp: 4/4/2025, 2:39:19 PM

power - openrouter/quasar-alpha

  • Prompt: Calculate 2 raised to the power of 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1567ms (1.57s)
  • Timestamp: 4/4/2025, 2:39:21 PM