Basic Operations Test Results

Failed Tests

basic_arithmetic - deepseek/deepseek-chat:free

  • Prompt: return the result of 2+2, dont comment
  • Expected: undefined
  • Actual: 4
  • Reason: undefined
  • Timestamp: 4/1/2025, 12:26:30 PM

basic_arithmetic - google/gemini-2.0-flash-exp:free

  • Prompt: return the result of 2+2, dont comment
  • Expected: undefined
  • Actual: 4
  • Reason: undefined
  • Timestamp: 4/1/2025, 12:26:31 PM

basic_arithmetic - gpt-4

  • Prompt: return the result of 2+2, dont comment
  • Expected: undefined
  • Actual: 4
  • Reason: undefined
  • Timestamp: 4/1/2025, 12:26:32 PM
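
In the three basic_arithmetic entries above, the actual value is the correct answer (4), yet both Expected and Reason are reported as undefined. One plausible reading, which is an assumption and not something this report confirms, is that the test case never carried an expected value into the comparison. A minimal sketch of a guard that flags such cases before a run; the `TestCase` shape and `findCasesWithoutExpected` are hypothetical names, not part of kbot:

```typescript
// Hypothetical shape of a test case; the real kbot types are not shown in this report.
interface TestCase {
  name: string;
  prompt: string;
  expected?: string;
}

// Returns the names of cases whose expected value is missing or blank.
function findCasesWithoutExpected(cases: TestCase[]): string[] {
  return cases
    .filter((c) => c.expected === undefined || c.expected.trim() === "")
    .map((c) => c.name);
}

// Two cases modelled on the prompts in this report.
const cases: TestCase[] = [
  { name: "basic_arithmetic", prompt: "return the result of 2+2, dont comment" },
  {
    name: "addition",
    prompt: "add 5 and 3. Return only the number, no explanation.",
    expected: "8",
  },
];

const missing = findCasesWithoutExpected(cases);
```

Running a check like this up front would surface a missing expectation as a test-definition problem and keep it from being counted as a model failure.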

json_structure - deepseek/deepseek-chat:free

  • Prompt: return a JSON object with two fields: "name" as "test" and "value" as 42. Return only the JSON, no other text.
  • Expected: undefined
  • Actual: {"name":"test","value":42}
  • Reason: undefined
  • Timestamp: 4/1/2025, 12:26:33 PM

json_structure - gpt-4

  • Prompt: return a JSON object with two fields: "name" as "test" and "value" as 42. Return only the JSON, no other text.
  • Expected: undefined
  • Actual: {"name": "test", "value": 42}
  • Reason: undefined
  • Timestamp: 4/1/2025, 12:26:36 PM

json_structure - google/gemini-2.0-flash-exp:free

  • Prompt: return a JSON object with two fields: "name" as "test" and "value" as 42. Return only the JSON, no other text.
  • Expected: undefined
  • Actual: { "name": "test", "value": 42 }
  • Reason: undefined
  • Timestamp: 4/1/2025, 12:26:35 PM
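
The three json_structure replies above carry the same data but differ in whitespace, so a raw string comparison would treat them as different even with a correct expected value. A minimal sketch of comparing by parsed value follows; `sameJson` and `canonical` are hypothetical helpers, and the expected string used in the usage lines is an assumption, since the report lists Expected as undefined:

```typescript
// Serializes a parsed JSON value with object keys sorted, so that
// whitespace and key order in the original text do not matter.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonical).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Compares two JSON texts by value; returns false if either fails to parse.
function sameJson(actual: string, expected: string): boolean {
  try {
    return canonical(JSON.parse(actual)) === canonical(JSON.parse(expected));
  } catch {
    return false;
  }
}

// Assumed expected value, taken from the wording of the prompt.
const expectedJson = '{"name":"test","value":42}';

// The three replies recorded in this report.
const replies = [
  '{"name":"test","value":42}',
  '{"name": "test", "value": 42}',
  '{ "name": "test", "value": 42 }',
];
const allMatch = replies.every((r) => sameJson(r, expectedJson));
```

Comparing parsed values keeps the test focused on the structure the prompt asks for and ignores each model's formatting habits.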

hello - deepseek/deepseek-chat:free

  • Prompt: say "hello"
  • Expected: hello
  • Actual: ``
  • Reason: Model returned empty response
  • Timestamp: 4/1/2025, 1:36:37 PM

hello - google/gemini-2.0-flash-exp:free

  • Prompt: say "hello"
  • Expected: hello
  • Actual: ``
  • Reason: Model returned empty response
  • Timestamp: 4/1/2025, 1:36:37 PM

hello - gpt-4

  • Prompt: say "hello"
  • Expected: hello
  • Actual: ``
  • Reason: Unknown error occurred
  • Timestamp: 4/1/2025, 1:36:42 PM

goodbye - deepseek/deepseek-chat:free

  • Prompt: say "goodbye"
  • Expected: goodbye
  • Actual: ``
  • Reason: Model returned empty response
  • Timestamp: 4/1/2025, 1:36:42 PM

goodbye - google/gemini-2.0-flash-exp:free

  • Prompt: say "goodbye"
  • Expected: goodbye
  • Actual: ``
  • Reason: Model returned empty response
  • Timestamp: 4/1/2025, 1:36:43 PM

goodbye - gpt-4

  • Prompt: say "goodbye"
  • Expected: goodbye
  • Actual: ``
  • Reason: expected 'goodbye.' to deeply equal 'goodbye'
  • Timestamp: 4/1/2025, 1:36:44 PM
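
The reason recorded for this entry, expected 'goodbye.' to deeply equal 'goodbye', indicates that the reply differed from the expected word only by a trailing period. If lenient matching is acceptable for these single-word prompts, the reply could be normalized before comparison. A minimal sketch, where `normalizeReply` is a hypothetical helper and not part of the test suite shown here:

```typescript
// Trims, lowercases, strips surrounding quotes, then strips
// trailing sentence punctuation (".", "!", "?").
function normalizeReply(text: string): string {
  return text
    .trim()
    .toLowerCase()
    .replace(/^["'`]+|["'`]+$/g, "")
    .replace(/[.!?]+$/, "")
    .trim();
}

// The reply quoted in the failure reason, compared against the expected word.
const matches = normalizeReply("goodbye.") === "goodbye";
```

Whether to normalize is a design choice: it makes the suite tolerant of punctuation and capitalization, at the cost of no longer checking that the model returns the word exactly as written.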

yes - deepseek/deepseek-chat:free

  • Prompt: say "yes"
  • Expected: yes
  • Actual: ``
  • Reason: Model returned empty response
  • Timestamp: 4/1/2025, 1:36:45 PM

yes - google/gemini-2.0-flash-exp:free

  • Prompt: say "yes"
  • Expected: yes
  • Actual: ``
  • Reason: Model returned empty response
  • Timestamp: 4/1/2025, 1:36:45 PM

yes - gpt-4

  • Prompt: say "yes"
  • Expected: yes
  • Actual: ``
  • Reason: Unknown error occurred
  • Timestamp: 4/1/2025, 1:36:46 PM

Passed Tests

addition - deepseek/deepseek-chat:free

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Timestamp: 4/1/2025, 12:59:06 PM

addition - google/gemini-2.0-flash-exp:free

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Timestamp: 4/1/2025, 12:59:08 PM

addition - gpt-4

  • Prompt: add 5 and 3. Return only the number, no explanation.
  • Expected: 8
  • Actual: 8
  • Timestamp: 4/1/2025, 1:39:04 PM

multiplication - deepseek/deepseek-chat:free

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Timestamp: 4/1/2025, 12:59:13 PM

multiplication - google/gemini-2.0-flash-exp:free

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Timestamp: 4/1/2025, 12:59:15 PM

multiplication - gpt-4

  • Prompt: multiply 8 and 3. Return only the number, no explanation.
  • Expected: 24
  • Actual: 24
  • Timestamp: 4/1/2025, 1:39:06 PM

division - deepseek/deepseek-chat:free

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Timestamp: 4/1/2025, 12:59:18 PM

division - google/gemini-2.0-flash-exp:free

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Timestamp: 4/1/2025, 12:56:09 PM

division - gpt-4

  • Prompt: divide 15 by 3. Return only the number, no explanation.
  • Expected: 5
  • Actual: 5
  • Timestamp: 4/1/2025, 1:39:08 PM