Basic Operations Test Results
Failed Tests
basic_arithmetic - deepseek/deepseek-chat:free
- Prompt: return the result of 2+2, dont comment
- Expected: undefined
- Actual: 4
- Reason: undefined
- Timestamp: 4/1/2025, 12:26:30 PM
basic_arithmetic - google/gemini-2.0-flash-exp:free
- Prompt: return the result of 2+2, dont comment
- Expected: undefined
- Actual: 4
- Reason: undefined
- Timestamp: 4/1/2025, 12:26:31 PM
basic_arithmetic - gpt-4
- Prompt: return the result of 2+2, dont comment
- Expected: undefined
- Actual: 4
- Reason: undefined
- Timestamp: 4/1/2025, 12:26:32 PM
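Every basic_arithmetic failure reports `Expected: undefined` and `Reason: undefined` even though all three models answered 4. This suggests the test case was saved without an expected value, not that the models answered incorrectly. The TypeScript sketch below shows one way a runner could report such a case as misconfigured rather than failed. The `TestCase` shape, the `Verdict` type and the `check` function are hypothetical and are not taken from the harness that produced this report.

```typescript
// Hypothetical test-case shape; the real harness's fields are not shown in the report.
interface TestCase {
  name: string;
  prompt: string;
  expected?: string;
}

type Verdict =
  | { status: "pass" }
  | { status: "fail"; reason: string }
  | { status: "misconfigured"; reason: string };

// Classify a model response against a test case.
function check(tc: TestCase, actual: string): Verdict {
  if (tc.expected === undefined) {
    // No expectation defined: flag the test itself instead of blaming the model.
    return { status: "misconfigured", reason: `test "${tc.name}" has no expected value` };
  }
  return actual.trim() === tc.expected
    ? { status: "pass" }
    : { status: "fail", reason: `expected '${tc.expected}', got '${actual}'` };
}

const arithmetic: TestCase = { name: "basic_arithmetic", prompt: "return the result of 2+2, dont comment" };
console.log(check(arithmetic, "4").status); // misconfigured
console.log(check({ ...arithmetic, expected: "4" }, "4").status); // pass
```

With an expected value of `"4"` filled in, all three recorded answers would pass this exact-match check.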
json_structure - deepseek/deepseek-chat:free
- Prompt: return a JSON object with two fields: "name" as "test" and "value" as 42. Return only the JSON, no other text.
- Expected: undefined
- Actual: {"name":"test","value":42}
- Reason: undefined
- Timestamp: 4/1/2025, 12:26:33 PM
json_structure - gpt-4
- Prompt: return a JSON object with two fields: "name" as "test" and "value" as 42. Return only the JSON, no other text.
- Expected: undefined
- Actual: {"name": "test", "value": 42}
- Reason: undefined
- Timestamp: 4/1/2025, 12:26:36 PM
json_structure - google/gemini-2.0-flash-exp:free
- Prompt: return a JSON object with two fields: "name" as "test" and "value" as 42. Return only the JSON, no other text.
- Expected: undefined
- Actual: { "name": "test", "value": 42 }
- Reason: undefined
- Timestamp: 4/1/2025, 12:26:35 PM
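The three json_structure responses differ only in whitespace, and these failures also carry `Expected: undefined`. Even with an expected value added, a raw string comparison would treat the three outputs as different. The following minimal TypeScript sketch assumes the expected value is stored as an object, parses the response, and compares the two structurally. `Json`, `jsonEqual` and `matchesJson` are illustrative names and are not part of any harness shown here.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Structural equality for parsed JSON: key order and whitespace do not matter.
function jsonEqual(a: Json, b: Json): boolean {
  if (a === null || b === null || typeof a !== "object" || typeof b !== "object") {
    return a === b; // primitives and null compare by value
  }
  if (Array.isArray(a) || Array.isArray(b)) {
    if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return false;
    for (let i = 0; i < a.length; i++) {
      if (!jsonEqual(a[i], b[i])) return false;
    }
    return true;
  }
  const keysA = Object.keys(a);
  if (keysA.length !== Object.keys(b).length) return false;
  for (const k of keysA) {
    if (!Object.prototype.hasOwnProperty.call(b, k) || !jsonEqual(a[k], b[k])) return false;
  }
  return true;
}

// Parse the model output and compare it with the expected object.
function matchesJson(actual: string, expected: Json): boolean {
  try {
    return jsonEqual(JSON.parse(actual) as Json, expected);
  } catch {
    return false; // the response is not valid JSON
  }
}

const expected: Json = { name: "test", value: 42 };
for (const raw of ['{"name":"test","value":42}', '{"name": "test", "value": 42}', '{ "name": "test", "value": 42 }']) {
  console.log(matchesJson(raw, expected)); // true for each recorded response
}
```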
hello - deepseek/deepseek-chat:free
- Prompt: say "hello"
- Expected: hello
- Actual: ``
- Reason: Model returned empty response
- Timestamp: 4/1/2025, 1:36:37 PM
hello - google/gemini-2.0-flash-exp:free
- Prompt: say "hello"
- Expected: hello
- Actual: ``
- Reason: Model returned empty response
- Timestamp: 4/1/2025, 1:36:37 PM
hello - gpt-4
- Prompt: say "hello"
- Expected: hello
- Actual: ``
- Reason: Unknown error occurred
- Timestamp: 4/1/2025, 1:36:42 PM
goodbye - deepseek/deepseek-chat:free
- Prompt: say "goodbye"
- Expected: goodbye
- Actual: ``
- Reason: Model returned empty response
- Timestamp: 4/1/2025, 1:36:42 PM
goodbye - google/gemini-2.0-flash-exp:free
- Prompt: say "goodbye"
- Expected: goodbye
- Actual: ``
- Reason: Model returned empty response
- Timestamp: 4/1/2025, 1:36:43 PM
goodbye - gpt-4
- Prompt: say "goodbye"
- Expected: goodbye
- Actual: ``
- Reason: expected 'goodbye.' to deeply equal 'goodbye'
- Timestamp: 4/1/2025, 1:36:44 PM
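The gpt-4 goodbye failure is an exact-match miss. According to its Reason line, the response `goodbye.` was rejected only because of the trailing period. The message format resembles the output of Chai's `deep.equal` assertion. The TypeScript sketch below shows a more lenient comparison. It ignores case, strips leading whitespace and quotes, and strips trailing whitespace, quotes and sentence punctuation before comparing. `normalize` and `matchesWord` are hypothetical helpers. Whether this leniency is appropriate depends on what the test is meant to verify.

```typescript
// Hypothetical normalizer: lowercase, then strip leading whitespace/quotes and
// trailing whitespace/quotes/sentence punctuation.
function normalize(text: string): string {
  return text.toLowerCase().replace(/^[\s"'`]+|[\s"'`.!?]+$/g, "");
}

// Compare a model response with the expected word after normalization.
function matchesWord(actual: string, expected: string): boolean {
  return normalize(actual) === normalize(expected);
}

console.log(matchesWord("goodbye.", "goodbye")); // true
console.log(matchesWord('"Hello!"', "hello")); // true
console.log(matchesWord("", "yes")); // false: an empty response still fails
```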
yes - deepseek/deepseek-chat:free
- Prompt: say "yes"
- Expected: yes
- Actual: ``
- Reason: Model returned empty response
- Timestamp: 4/1/2025, 1:36:45 PM
yes - google/gemini-2.0-flash-exp:free
- Prompt: say "yes"
- Expected: yes
- Actual: ``
- Reason: Model returned empty response
- Timestamp: 4/1/2025, 1:36:45 PM
yes - gpt-4
- Prompt: say "yes"
- Expected: yes
- Actual: ``
- Reason: Unknown error occurred
- Timestamp: 4/1/2025, 1:36:46 PM
Passed Tests
addition - deepseek/deepseek-chat:free
- Prompt: add 5 and 3. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Timestamp: 4/1/2025, 12:59:06 PM
addition - google/gemini-2.0-flash-exp:free
- Prompt: add 5 and 3. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Timestamp: 4/1/2025, 12:59:08 PM
addition - gpt-4
- Prompt: add 5 and 3. Return only the number, no explanation.
- Expected: 8
- Actual: 8
- Timestamp: 4/1/2025, 1:39:04 PM
multiplication - deepseek/deepseek-chat:free
- Prompt: multiply 8 and 3. Return only the number, no explanation.
- Expected: 24
- Actual: 24
- Timestamp: 4/1/2025, 12:59:13 PM
multiplication - google/gemini-2.0-flash-exp:free
- Prompt: multiply 8 and 3. Return only the number, no explanation.
- Expected: 24
- Actual: 24
- Timestamp: 4/1/2025, 12:59:15 PM
multiplication - gpt-4
- Prompt: multiply 8 and 3. Return only the number, no explanation.
- Expected: 24
- Actual: 24
- Timestamp: 4/1/2025, 1:39:06 PM
division - deepseek/deepseek-chat:free
- Prompt: divide 15 by 3. Return only the number, no explanation.
- Expected: 5
- Actual: 5
- Timestamp: 4/1/2025, 12:59:18 PM
division - google/gemini-2.0-flash-exp:free
- Prompt: divide 15 by 3. Return only the number, no explanation.
- Expected: 5
- Actual: 5
- Timestamp: 4/1/2025, 12:56:09 PM
division - gpt-4
- Prompt: divide 15 by 3. Return only the number, no explanation.
- Expected: 5
- Actual: 5
- Timestamp: 4/1/2025, 1:39:08 PM
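All nine arithmetic checks passed on exact string matches against hand-typed expected values. The TypeScript sketch below is a hedged alternative. It computes the expected value from the operands and accepts a response only if it is a bare number equal to that value. Under this check, replies such as `24.0` or `8` followed by a newline pass, and prose such as `The answer is 5` fails. `Op`, `ops` and `matchesNumber` are hypothetical names and are not part of the harness.

```typescript
type Op = "add" | "multiply" | "divide";

// Compute the expected result from the operands instead of hand-typing it.
const ops: Record<Op, (a: number, b: number) => number> = {
  add: (a, b) => a + b,
  multiply: (a, b) => a * b,
  divide: (a, b) => a / b,
};

// Accept only a bare (optionally signed, optionally decimal) number equal to the expected value.
function matchesNumber(actual: string, expected: number): boolean {
  const trimmed = actual.trim();
  if (!/^-?\d+(\.\d+)?$/.test(trimmed)) return false; // rejects prose such as "The answer is 8"
  return Math.abs(Number(trimmed) - expected) < 1e-9; // small tolerance for non-integer quotients
}

console.log(matchesNumber("8", ops.add(5, 3))); // true
console.log(matchesNumber("24.0", ops.multiply(8, 3))); // true
console.log(matchesNumber("The answer is 5", ops.divide(15, 3))); // false
```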