Format Test Results
Failed Tests
basic_structure - deepseek/deepseek-chat:free
- Prompt: return a greeting "hello" with count 42
- Expected: {"greeting":"hello","count":42}
- Actual: ""
- Duration: 885ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:36 PM
basic_structure - google/gemini-2.0-flash-exp:free
- Prompt: return a greeting "hello" with count 42
- Expected: {"greeting":"hello","count":42}
- Actual: ""
- Duration: 757ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:36 PM
basic_structure - gpt-4
- Prompt: return a greeting "hello" with count 42
- Expected: {"greeting":"hello","count":42}
- Actual: ""
- Duration: 1043ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:37 PM
basic_structure - anthropic/claude-3.7-sonnet
- Prompt: return a greeting "hello" with count 42
- Expected: {"greeting":"hello","count":42}
- Actual: ""
- Duration: 1790ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: Unexpected token 'h', "hello 42" is not valid JSON
- Reason: Failed to parse or validate response: Unexpected token 'h', "hello 42" is not valid JSON
- Timestamp: 4/1/2025, 1:23:05 PM
basic_structure - openai/gpt-4
- Prompt: Return a JSON object with a greeting "hello" and count 42. The response must be valid JSON with exactly these fields: { "greeting": string, "count": number }
- Expected: {"greeting":"hello","count":42}
- Actual: ""
- Duration: 1258ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Invalid response from API
- Reason: Invalid response from API
- Timestamp: 4/1/2025, 1:32:43 PM
nested_structure - deepseek/deepseek-chat:free
- Prompt: return user John age 30 with dark theme and notifications enabled
- Expected: {"user":{"name":"John","age":30},"settings":{"theme":"dark","notifications":true}}
- Actual: ""
- Duration: 655ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:38 PM
nested_structure - google/gemini-2.0-flash-exp:free
- Prompt: return user John age 30 with dark theme and notifications enabled
- Expected: {"user":{"name":"John","age":30},"settings":{"theme":"dark","notifications":true}}
- Actual: ""
- Duration: 790ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:39 PM
nested_structure - gpt-4
- Prompt: return user John age 30 with dark theme and notifications enabled
- Expected: {"user":{"name":"John","age":30},"settings":{"theme":"dark","notifications":true}}
- Actual: ""
- Duration: 717ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:40 PM
nested_structure - anthropic/claude-3.7-sonnet
- Prompt: return user John age 30 with dark theme and notifications enabled
- Expected: {"user":{"name":"John","age":30},"settings":{"theme":"dark","notifications":true}}
- Actual: ""
- Duration: 1189ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: Unexpected token '#', "# John's U"... is not valid JSON
- Reason: Failed to parse or validate response: Unexpected token '#', "# John's U"... is not valid JSON
- Timestamp: 4/1/2025, 1:23:06 PM
nested_structure - openai/gpt-4
- Prompt: Return a JSON object with user John age 30, dark theme and notifications enabled. The response must be valid JSON with this structure: { "user": { "name": string, "age": number }, "settings": { "theme": string, "notifications": boolean } }
- Expected: {"user":{"name":"John","age":30},"settings":{"theme":"dark","notifications":true}}
- Actual: ""
- Duration: 716ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Invalid response from API
- Reason: Invalid response from API
- Timestamp: 4/1/2025, 1:32:44 PM
array_structure - deepseek/deepseek-chat:free
- Prompt: return a list of 2 items with ids 1 and 2, names "first" and "second"
- Expected: {"items":[{"id":1,"name":"first"},{"id":2,"name":"second"}]}
- Actual: ""
- Duration: 617ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:40 PM
array_structure - google/gemini-2.0-flash-exp:free
- Prompt: return a list of 2 items with ids 1 and 2, names "first" and "second"
- Expected: {"items":[{"id":1,"name":"first"},{"id":2,"name":"second"}]}
- Actual: ""
- Duration: 756ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:41 PM
array_structure - gpt-4
- Prompt: return a list of 2 items with ids 1 and 2, names "first" and "second"
- Expected: {"items":[{"id":1,"name":"first"},{"id":2,"name":"second"}]}
- Actual: ""
- Duration: 1026ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:42 PM
array_structure - anthropic/claude-3.7-sonnet
- Prompt: return a list of 2 items with ids 1 and 2, names "first" and "second"
- Expected: {"items":[{"id":1,"name":"first"},{"id":2,"name":"second"}]}
- Actual: ""
- Duration: 1190ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:23:08 PM
array_structure - openai/gpt-4
- Prompt: Return a JSON object with a list of 2 items. The response must be valid JSON with this structure: { "items": [{ "id": number, "name": string }] }. The first item should have id 1 and name "first", the second item should have id 2 and name "second".
- Expected: {"items":[{"id":1,"name":"first"},{"id":2,"name":"second"}]}
- Actual: ""
- Duration: 703ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Invalid response from API
- Reason: Invalid response from API
- Timestamp: 4/1/2025, 1:32:44 PM
enum_structure - deepseek/deepseek-chat:free
- Prompt: return status success with message "Operation completed"
- Expected: {"status":"success","message":"Operation completed"}
- Actual: ""
- Duration: 647ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:43 PM
enum_structure - google/gemini-2.0-flash-exp:free
- Prompt: return status success with message "Operation completed"
- Expected: {"status":"success","message":"Operation completed"}
- Actual: ""
- Duration: 813ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:43 PM
enum_structure - gpt-4
- Prompt: return status success with message "Operation completed"
- Expected: {"status":"success","message":"Operation completed"}
- Actual: ""
- Duration: 1138ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:45 PM
enum_structure - anthropic/claude-3.7-sonnet
- Prompt: return status success with message "Operation completed"
- Expected: {"status":"success","message":"Operation completed"}
- Actual: ""
- Duration: 1728ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: Unexpected token '`', "```json { "... is not valid JSON
- Reason: Failed to parse or validate response: Unexpected token '`', "```json { "... is not valid JSON
- Timestamp: 4/1/2025, 1:23:09 PM
enum_structure - openai/gpt-4
- Prompt: Return a JSON object with status "success" and message "Operation completed". The response must be valid JSON with this structure: { "status": "success" | "error" | "pending", "message": string }
- Expected: {"status":"success","message":"Operation completed"}
- Actual: ""
- Duration: 688ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Invalid response from API
- Reason: Invalid response from API
- Timestamp: 4/1/2025, 1:32:45 PM
optional_fields - deepseek/deepseek-chat:free
- Prompt: return name "John" with age 30 and email "john@example.com"
- Expected: {"name":"John","age":30,"email":"john@example.com"}
- Actual: ""
- Duration: 676ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:45 PM
optional_fields - google/gemini-2.0-flash-exp:free
- Prompt: return name "John" with age 30 and email "john@example.com"
- Expected: {"name":"John","age":30,"email":"john@example.com"}
- Actual: ""
- Duration: 884ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:46 PM
optional_fields - gpt-4
- Prompt: return name "John" with age 30 and email "john@example.com"
- Expected: {"name":"John","age":30,"email":"john@example.com"}
- Actual: ""
- Duration: 669ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Reason: Failed to parse or validate response: [ { "code": "invalid_type", "expected": "object", "received": "null", "path": [], "message": "Expected object, received null" } ]
- Timestamp: 4/1/2025, 1:21:47 PM
optional_fields - anthropic/claude-3.7-sonnet
- Prompt: return name "John" with age 30 and email "john@example.com"
- Expected: {"name":"John","age":30,"email":"john@example.com"}
- Actual: ""
- Duration: 1576ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Failed to parse or validate response: Unexpected token '`', "```json { "... is not valid JSON
- Reason: Failed to parse or validate response: Unexpected token '`', "```json { "... is not valid JSON
- Timestamp: 4/1/2025, 1:23:11 PM
optional_fields - openai/gpt-4
- Prompt: Return a JSON object with name "John", age 30, and email "john@example.com". The response must be valid JSON with this structure: { "name": string, "age"?: number, "email"?: string }
- Expected: {"name":"John","age":30,"email":"john@example.com"}
- Actual: ""
- Duration: 682ms
- Error Type: Error
- Error Code: UNKNOWN
- Error Message: Invalid response from API
- Reason: Invalid response from API
- Timestamp: 4/1/2025, 1:32:46 PM
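Note on the fenced-JSON failures: two of the anthropic/claude-3.7-sonnet entries above (enum_structure and optional_fields) report `Unexpected token` on a backtick, with a payload that begins with a json code fence. That suggests the model returned JSON wrapped in a Markdown fence rather than bare JSON. The following is a minimal sketch of stripping such a fence before parsing. It assumes a hypothetical helper named `extractJson`; nothing here is taken from the harness that produced this report.

```typescript
// Hypothetical helper (an assumption for illustration, not the harness's code).
// Strips one optional Markdown code fence around a model reply,
// then parses the remainder as JSON. Throws if the payload is not valid JSON.
function extractJson(raw: string): unknown {
  const trimmed = raw.trim();
  // Matches a reply that consists entirely of one fenced block,
  // with or without a "json" language tag after the opening fence.
  const fenced = trimmed.match(/^```(?:json)?\s*([\s\S]*?)\s*```$/i);
  const payload = fenced?.[1] ?? trimmed;
  return JSON.parse(payload);
}

// Example with a fenced reply shaped like the enum_structure case.
const reply = '```json\n{"status":"success","message":"Operation completed"}\n```';
console.log(extractJson(reply));
```

A helper like this would only address the fenced replies. Plain-text replies such as "hello 42" would still throw in `JSON.parse`, and the entries that received null would still fail validation.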
Passed Tests
No passed tests