Format Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| json_formatting | openai/gpt-4o-mini | 840 | 0.84 |
| json_formatting | openai/gpt-3.5-turbo | 1815 | 1.81 |
| markdown_formatting | openai/gpt-3.5-turbo | 699 | 0.70 |
| markdown_formatting | openai/gpt-4o-mini | 862 | 0.86 |
| code_formatting | openai/gpt-3.5-turbo | 637 | 0.64 |
| code_formatting | openai/gpt-4o-mini | 860 | 0.86 |
| date_formatting | openai/gpt-3.5-turbo | 552 | 0.55 |
| date_formatting | openai/gpt-4o-mini | 3548 | 3.55 |
| currency_formatting | openai/gpt-4o-mini | 634 | 0.63 |
| currency_formatting | openai/gpt-3.5-turbo | 870 | 0.87 |
Summary
- Total Tests: 10
- Passed: 4
- Failed: 6
- Success Rate: 40.00%
- Average Duration: 1132ms (1.13s)
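The summary figures can be re-derived from the rankings table as a quick consistency check. The sketch below is not part of the test harness; the variable names are illustrative, and the pass/fail split is taken from the Failed Tests and Passed Tests sections of this report.

```python
# Durations in milliseconds, copied from the Performance Rankings table.
durations_ms = {
    ("json_formatting", "openai/gpt-4o-mini"): 840,
    ("json_formatting", "openai/gpt-3.5-turbo"): 1815,
    ("markdown_formatting", "openai/gpt-3.5-turbo"): 699,
    ("markdown_formatting", "openai/gpt-4o-mini"): 862,
    ("code_formatting", "openai/gpt-3.5-turbo"): 637,
    ("code_formatting", "openai/gpt-4o-mini"): 860,
    ("date_formatting", "openai/gpt-3.5-turbo"): 552,
    ("date_formatting", "openai/gpt-4o-mini"): 3548,
    ("currency_formatting", "openai/gpt-4o-mini"): 634,
    ("currency_formatting", "openai/gpt-3.5-turbo"): 870,
}

# Test names that appear under "Passed Tests" in this report.
passed_tests = {"date_formatting", "currency_formatting"}

total = len(durations_ms)
passed = sum(1 for (test, _model) in durations_ms if test in passed_tests)
failed = total - passed
success_rate = 100 * passed / total
average_ms = sum(durations_ms.values()) / total

print(f"Total Tests: {total}")
print(f"Passed: {passed}")
print(f"Failed: {failed}")
print(f"Success Rate: {success_rate:.2f}%")
print(f"Average Duration: {average_ms:.0f}ms ({average_ms / 1000:.2f}s)")
```

The durations sum to 11317 ms over 10 runs, which rounds to the 1132 ms average reported above.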
Failed Tests
json_formatting - openai/gpt-3.5-turbo
- Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
- Expected: `{ "name": "John", "age": 30 }`
- Actual: `{ "name": "John", "age": 30 }`
- Duration: 1815ms (1.81s)
- Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
- Timestamp: 6/5/2025, 8:46:08 PM
json_formatting - openai/gpt-4o-mini
- Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
- Expected: `{ "name": "John", "age": 30 }`
- Actual: `{ "name": "John", "age": 30 }`
- Duration: 840ms (0.84s)
- Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
- Timestamp: 6/5/2025, 8:46:09 PM
markdown_formatting - openai/gpt-3.5-turbo
- Prompt: Create markdown: H1=The Title, H2=The Subtitle, P=This is the body text. Respond ONLY with markdown. Do not mention 'user preferences' or 'undefined'.
- Expected: `# The Title
The Subtitle
This is the body text.`
- Actual: `# The Title
The Subtitle
This is the body text.`
- Duration: 699ms (0.70s)
- Reason: Expected # The Title
The Subtitle
This is the body text., but got # the title
the subtitle
this is the body text.
- Timestamp: 6/5/2025, 8:46:10 PM
markdown_formatting - openai/gpt-4o-mini
- Prompt: Create markdown: H1=The Title, H2=The Subtitle, P=This is the body text. Respond ONLY with markdown. Do not mention 'user preferences' or 'undefined'.
- Expected: `# The Title
The Subtitle
This is the body text.`
- Actual: `# The Title
The Subtitle
This is the body text.`
- Duration: 862ms (0.86s)
- Reason: Expected # The Title
The Subtitle
This is the body text., but got # the title
the subtitle
this is the body text.
- Timestamp: 6/5/2025, 8:46:10 PM
code_formatting - openai/gpt-3.5-turbo
- Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
- Expected: `function add(a, b) { return a + b; }`
- Actual: `function add(a, b) { return a + b; }`
- Duration: 637ms (0.64s)
- Reason: Expected function add(a, b) { return a + b; }, but got function add(a, b) { return a + b; }
- Timestamp: 6/5/2025, 8:46:11 PM
code_formatting - openai/gpt-4o-mini
- Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
- Expected: `function add(a, b) { return a + b; }`
- Actual: `function add(a, b) { return a + b; }`
- Duration: 860ms (0.86s)
- Reason: Expected function add(a, b) { return a + b; }, but got function add(a, b) { return a + b; }
- Timestamp: 6/5/2025, 8:46:12 PM
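In the JSON and markdown failures, the Reason lines show the received text in lowercase while the expected text keeps its capitalization. That pattern would be consistent with a comparison that normalizes only one side, although this report does not show how the harness actually compares outputs. The sketch below is an assumption for illustration only: `one_sided_match` is a hypothetical reconstruction of such a check, and `symmetric_match` and `json_match` are hypothetical alternatives that treat both sides the same way. The multi-line `actual` string is also an assumed model layout, not captured output.

```python
import json


def normalize_text(text: str) -> str:
    """Collapse every run of whitespace to one space and trim the ends.

    Letter case is left untouched, so "John" and "john" still differ.
    """
    return " ".join(text.split())


def one_sided_match(expected: str, actual: str) -> bool:
    """Hypothetical faulty check: lowercases only the actual text."""
    return expected.strip() == actual.strip().lower()


def symmetric_match(expected: str, actual: str) -> bool:
    """Apply the same whitespace normalization to both sides."""
    return normalize_text(expected) == normalize_text(actual)


def json_match(expected: str, actual: str) -> bool:
    """Compare parsed JSON values, so indentation and line breaks are ignored."""
    return json.loads(expected) == json.loads(actual)


expected = '{ "name": "John", "age": 30 }'
# Assumed pretty-printed layout a model might return.
actual = '{\n  "name": "John",\n  "age": 30\n}'

print(one_sided_match(expected, actual))  # False
print(symmetric_match(expected, actual))  # True
print(json_match(expected, actual))       # True
```

Comparing parsed values suits the JSON test, while whitespace normalization is the looser option for the code and markdown tests, where line breaks may legitimately vary between models.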
Passed Tests
date_formatting - openai/gpt-3.5-turbo
- Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
- Expected: `03/15/2024`
- Actual: `03/15/2024`
- Duration: 552ms (0.55s)
- Timestamp: 6/5/2025, 8:46:13 PM
date_formatting - openai/gpt-4o-mini
- Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
- Expected: `03/15/2024`
- Actual: `03/15/2024`
- Duration: 3548ms (3.55s)
- Timestamp: 6/5/2025, 8:46:16 PM
currency_formatting - openai/gpt-3.5-turbo
- Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
- Expected: `$1,234.56`
- Actual: `$1,234.56`
- Duration: 870ms (0.87s)
- Timestamp: 6/5/2025, 8:46:17 PM
currency_formatting - openai/gpt-4o-mini
- Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
- Expected: `$1,234.56`
- Actual: `$1,234.56`
- Duration: 634ms (0.63s)
- Timestamp: 6/5/2025, 8:46:18 PM
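The expected values for the date and currency tests are deterministic, so they can be derived from the inputs rather than hard-coded. The following is a minimal sketch using only the Python standard library; the function names `format_date_us` and `format_usd` are illustrative and do not come from the harness.

```python
from datetime import datetime


def format_date_us(iso_date: str) -> str:
    """Convert a YYYY-MM-DD date string to MM/DD/YYYY."""
    return datetime.strptime(iso_date, "%Y-%m-%d").strftime("%m/%d/%Y")


def format_usd(amount: float) -> str:
    """Format a non-negative amount as USD with thousands separators and two decimals."""
    return f"${amount:,.2f}"


print(format_date_us("2024-03-15"))  # 03/15/2024
print(format_usd(1234.56))           # $1,234.56
```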