2025-04-02 16:26:00 +02:00


Basic Operations Test Results

Highscores

| Test | Model | Duration (ms) | Duration (s) |
|---|---|---:|---:|
| addition | openai/gpt-4o-mini | 885 | 0.89 |
| division | openai/gpt-3.5-turbo | 889 | 0.89 |
| division | qwen/qwq-32b | 917 | 0.92 |
| multiplication | openai/gpt-3.5-turbo | 984 | 0.98 |
| division | openai/gpt-4o-mini | 1104 | 1.10 |
| multiplication | openai/gpt-4o-mini | 1111 | 1.11 |
| multiplication | anthropic/claude-3.5-sonnet | 1190 | 1.19 |
| division | anthropic/claude-3.5-sonnet | 1405 | 1.41 |
| multiplication | deepseek/deepseek-r1-distill-qwen-14b:free | 1558 | 1.56 |
| addition | anthropic/claude-3.5-sonnet | 1689 | 1.69 |
| division | deepseek/deepseek-r1-distill-qwen-14b:free | 3646 | 3.65 |
| addition | qwen/qwq-32b | 3807 | 3.81 |
| multiplication | qwen/qwq-32b | 5008 | 5.01 |
| division | deepseek/deepseek-r1 | 7130 | 7.13 |
| addition | openai/gpt-3.5-turbo | 10455 | 10.46 |
| addition | deepseek/deepseek-r1 | 12064 | 12.06 |
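The Highscores table lists only the 16 passed tests, ordered from fastest to slowest, with the seconds column equal to the millisecond value divided by 1000 and rounded to two decimals. A minimal Python sketch of that ranking follows; the `(test, model, duration_ms, passed)` tuple layout and the `highscores` function are assumptions for illustration, not kbot's actual code.

```python
from typing import List, Tuple

# Hypothetical result shape: (test, model, duration_ms, passed)
Result = Tuple[str, str, int, bool]


def highscores(results: List[Result]) -> List[Tuple[str, str, int, str]]:
    """Keep passed tests only and sort them fastest first."""
    passed = [r for r in results if r[3]]
    ranked = sorted(passed, key=lambda r: r[2])
    # Seconds column: milliseconds / 1000, formatted with two decimals
    return [(test, model, ms, f"{ms / 1000:.2f}") for test, model, ms, _ in ranked]


results: List[Result] = [
    ("addition", "anthropic/claude-3.5-sonnet", 1689, True),
    ("addition", "openai/gpt-4o-mini", 885, True),
    ("division", "openai/gpt-3.5-turbo", 889, True),
    ("multiplication", "deepseek/deepseek-r1", 5258, False),  # failed, so excluded
]

for row in highscores(results):
    print(*row)
```

Failed runs are dropped before sorting, which is why the two failures in this report do not appear in the table even though one of them (5258 ms) would rank mid-table.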

Summary

  • Total Tests: 18
  • Passed: 16
  • Failed: 2
  • Success Rate: 88.89%
  • Average Duration: 3639ms (3.64s)
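The summary figures follow from the 18 per-test durations reported below: 16 of 18 passed (16 / 18 ≈ 88.89%), and the average includes the two failed runs (65505 ms / 18 ≈ 3639 ms). The 16 passed runs alone would average about 3365 ms. A sketch reproducing these numbers, using a hypothetical `summarize` helper that takes `(duration_ms, passed)` pairs:

```python
from typing import Dict, List, Tuple


def summarize(runs: List[Tuple[int, bool]]) -> Dict[str, object]:
    """runs: one (duration_ms, passed) pair per test."""
    total = len(runs)
    passed = sum(1 for _, ok in runs if ok)
    avg_ms = sum(ms for ms, _ in runs) / total  # failed runs count toward the mean
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": f"{passed / total * 100:.2f}%",
        "average": f"{round(avg_ms)}ms ({avg_ms / 1000:.2f}s)",
    }


# Durations copied from the per-test entries in this report
passed_ms = [1689, 3807, 885, 10455, 12064, 1190, 5008, 1111, 984, 1558,
             1405, 917, 1104, 889, 7130, 3646]
failed_ms = [6405, 5258]
runs = [(ms, True) for ms in passed_ms] + [(ms, False) for ms in failed_ms]

print(summarize(runs))
```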

Failed Tests

addition - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: The sum of 5 and 3 is 8. Therefore, the result is \boxed{8}.
  • Duration: 6405ms (6.41s)
  • Reason: Expected 8, but got "the sum of 5 and 3 is 8. therefore, the result is \boxed{8}."
  • Timestamp: 4/2/2025, 3:44:40 PM
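The Reason line echoes the response in lowercase, which suggests the harness normalizes the text before an exact comparison, so a correct but verbose answer still fails. A minimal sketch of such a check, assuming trim-plus-lowercase normalization; `normalize` and `is_pass` are hypothetical names, not the harness's actual functions.

```python
def normalize(text: str) -> str:
    # Assumed normalization: trim surrounding whitespace and lowercase
    return text.strip().lower()


def is_pass(expected: str, actual: str) -> bool:
    # Exact match after normalization; any extra words cause a failure
    return normalize(actual) == normalize(expected)


verbose = r"The sum of 5 and 3 is 8. Therefore, the result is \boxed{8}."

print(is_pass("8", "8"))      # True
print(is_pass("8", " 8\n"))   # True
print(is_pass("8", verbose))  # False
```

Under this kind of check the answer's correctness is irrelevant; only compliance with "Return only the number" decides the outcome.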

multiplication - deepseek/deepseek-r1

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: `24

24

The result is 24.

24

Here's the answer: 24

The answer will be 24.

24

24

The product of 8 and 3 is 24.

24

The answer is 24.

24

24

24

The result is 24.

24

Here's the numerical result: 24

The answer is 24.

24

24

The answer is 24.`

  • Duration: 5258ms (5.26s)
  • Reason: Expected 24, but got the entire multi-line response above (echoed in lowercase) instead of the bare number.

  • Timestamp: 4/2/2025, 3:44:53 PM

Passed Tests

addition - anthropic/claude-3.5-sonnet

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 1689ms (1.69s)
  • Timestamp: 4/2/2025, 3:44:06 PM
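Passed entries in this report share the same five fields (Prompt, Expected, Actual, Duration, Timestamp). The sketch below models one entry and renders it in this report's bullet layout; `ResultEntry` and `render` are hypothetical names, not kbot's schema. The seconds figure is milliseconds divided by 1000, so 1689 ms renders as 1.69 s.

```python
from dataclasses import dataclass


@dataclass
class ResultEntry:  # hypothetical record type for one test result
    test: str
    model: str
    prompt: str
    expected: str
    actual: str
    duration_ms: int
    timestamp: str


def render(r: ResultEntry) -> str:
    seconds = r.duration_ms / 1000  # convert ms to s before formatting
    return "\n".join([
        f"{r.test} - {r.model}",
        "",
        f"  • Prompt: {r.prompt}",
        f"  • Expected: {r.expected}",
        f"  • Actual: {r.actual}",
        f"  • Duration: {r.duration_ms}ms ({seconds:.2f}s)",
        f"  • Timestamp: {r.timestamp}",
    ])


entry = ResultEntry(
    test="addition",
    model="anthropic/claude-3.5-sonnet",
    prompt="add 5 and 3. Return only the number, no explanation.",
    expected="8",
    actual="8",
    duration_ms=1689,
    timestamp="4/2/2025, 3:44:06 PM",
)
print(render(entry))
```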

addition - qwen/qwq-32b

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 3807ms (3.81s)
  • Timestamp: 4/2/2025, 3:44:10 PM

addition - openai/gpt-4o-mini

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 885ms (0.89s)
  • Timestamp: 4/2/2025, 3:44:11 PM

addition - openai/gpt-3.5-turbo

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 10455ms (10.46s)
  • Timestamp: 4/2/2025, 3:44:21 PM

addition - deepseek/deepseek-r1

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Duration: 12064ms (12.06s)
  • Timestamp: 4/2/2025, 3:44:33 PM

multiplication - anthropic/claude-3.5-sonnet

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 1190ms (1.19s)
  • Timestamp: 4/2/2025, 3:44:41 PM

multiplication - qwen/qwq-32b

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 5008ms (5.01s)
  • Timestamp: 4/2/2025, 3:44:46 PM

multiplication - openai/gpt-4o-mini

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 1111ms (1.11s)
  • Timestamp: 4/2/2025, 3:44:47 PM

multiplication - openai/gpt-3.5-turbo

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 984ms (0.98s)
  • Timestamp: 4/2/2025, 3:44:48 PM

multiplication - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Duration: 1558ms (1.56s)
  • Timestamp: 4/2/2025, 3:44:55 PM

division - anthropic/claude-3.5-sonnet

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 1405ms (1.41s)
  • Timestamp: 4/2/2025, 3:44:56 PM

division - qwen/qwq-32b

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 917ms (0.92s)
  • Timestamp: 4/2/2025, 3:44:57 PM

division - openai/gpt-4o-mini

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 1104ms (1.10s)
  • Timestamp: 4/2/2025, 3:44:58 PM

division - openai/gpt-3.5-turbo

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 889ms (0.89s)
  • Timestamp: 4/2/2025, 3:44:59 PM

division - deepseek/deepseek-r1

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 7130ms (7.13s)
  • Timestamp: 4/2/2025, 3:45:06 PM

division - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Duration: 3646ms (3.65s)
  • Timestamp: 4/2/2025, 3:45:10 PM
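All tests in this run share one prompt template: an operation phrase followed by "Return only the number, no explanation.", with division phrased as "divide 15 by 3" rather than using "and". The sketch below regenerates the three prompts and their expected answers; the `CASES` table and the `build_case` function are hypothetical, not how kbot defines its tests.

```python
import operator
from typing import Callable, Dict, Tuple

# Hypothetical table: operation -> (phrase template, function, a, b)
CASES: Dict[str, Tuple[str, Callable[[int, int], int], int, int]] = {
    "addition": ("add {a} and {b}", operator.add, 5, 3),
    "multiplication": ("multiply {a} and {b}", operator.mul, 8, 3),
    # Integer division is safe here because 15 is divisible by 3
    "division": ("divide {a} by {b}", operator.floordiv, 15, 3),
}
SUFFIX = "Return only the number, no explanation."


def build_case(operation: str) -> Tuple[str, str]:
    template, fn, a, b = CASES[operation]
    prompt = f"{template.format(a=a, b=b)}. {SUFFIX}"
    expected = str(fn(a, b))  # expected answers are compared as text
    return prompt, expected


for op in CASES:
    print(op, build_case(op))
```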