Format Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| json_formatting | openai/gpt-3.5-turbo | 1283 | 1.28 |
| json_formatting | openrouter/quasar-alpha | 1474 | 1.47 |
| json_formatting | openai/gpt-4o-mini | 1846 | 1.85 |
| markdown_formatting | openrouter/quasar-alpha | 1194 | 1.19 |
| markdown_formatting | openai/gpt-3.5-turbo | 1301 | 1.30 |
| markdown_formatting | openai/gpt-4o-mini | 1734 | 1.73 |
| code_formatting | openai/gpt-4o-mini | 1227 | 1.23 |
| code_formatting | openrouter/quasar-alpha | 1252 | 1.25 |
| code_formatting | openai/gpt-3.5-turbo | 1297 | 1.30 |
| date_formatting | openai/gpt-3.5-turbo | 878 | 0.88 |
| date_formatting | openai/gpt-4o-mini | 878 | 0.88 |
| date_formatting | openrouter/quasar-alpha | 1435 | 1.44 |
| currency_formatting | openai/gpt-3.5-turbo | 795 | 0.80 |
| currency_formatting | openai/gpt-4o-mini | 813 | 0.81 |
| currency_formatting | openrouter/quasar-alpha | 1044 | 1.04 |
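
The per-test ordering in the table above can be reproduced by grouping the rows on test name and sorting each group by duration. The following is a minimal sketch, assuming the raw results are available as `(test, model, duration_ms)` tuples; the `rank_by_duration` helper and the tuple layout are hypothetical and not taken from the test harness.

```python
from collections import defaultdict

# A subset of the rows from the table above: (test, model, duration in ms).
RESULTS = [
    ("date_formatting", "openrouter/quasar-alpha", 1435),
    ("date_formatting", "openai/gpt-3.5-turbo", 878),
    ("date_formatting", "openai/gpt-4o-mini", 878),
    ("currency_formatting", "openai/gpt-4o-mini", 813),
    ("currency_formatting", "openrouter/quasar-alpha", 1044),
    ("currency_formatting", "openai/gpt-3.5-turbo", 795),
]


def rank_by_duration(results):
    """Group rows by test and sort each group fastest-first (hypothetical helper)."""
    groups = defaultdict(list)
    for test, model, duration_ms in results:
        groups[test].append((model, duration_ms))
    # Sort by duration; equal durations are ordered by model name so the
    # result is deterministic.
    return {
        test: sorted(rows, key=lambda row: (row[1], row[0]))
        for test, rows in groups.items()
    }


rankings = rank_by_duration(RESULTS)
for test, rows in rankings.items():
    for model, duration_ms in rows:
        print(f"| {test} | {model} | {duration_ms} |")
```

Breaking ties on model name is a design choice of this sketch; the report itself does not say how the two 878 ms date_formatting runs were ordered.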
Summary
- Total Tests: 15
- Passed: 7
- Failed: 8
- Success Rate: 46.67%
- Average Duration: 1230ms (1.23s)
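
The summary figures follow directly from the 15 durations in the rankings table and the pass count. As a quick check, this sketch recomputes them; the variable names are illustrative only and the pass count is copied from the summary rather than derived.

```python
# Durations in ms, copied from the Performance Rankings table (15 runs).
DURATIONS_MS = [
    1283, 1474, 1846,  # json_formatting
    1194, 1301, 1734,  # markdown_formatting
    1227, 1252, 1297,  # code_formatting
    878, 878, 1435,    # date_formatting
    795, 813, 1044,    # currency_formatting
]
PASSED = 7  # taken from the summary above

total = len(DURATIONS_MS)
failed = total - PASSED
success_rate = PASSED / total * 100
average_ms = sum(DURATIONS_MS) / total

print(f"Total Tests: {total}")
print(f"Passed: {PASSED}")
print(f"Failed: {failed}")
print(f"Success Rate: {success_rate:.2f}%")
print(f"Average Duration: {average_ms:.0f}ms ({average_ms / 1000:.2f}s)")
```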
Failed Tests
json_formatting - openai/gpt-3.5-turbo
- Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
- Expected: { "name": "John", "age": 30 }
- Actual: {"name": "John", "age": 30}
- Duration: 1283ms (1.28s)
- Reason: Expected { "name": "John", "age": 30 }, but got {"name": "john", "age": 30}
- Timestamp: 4/7/2025, 7:18:26 PM
json_formatting - openai/gpt-4o-mini
- Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
- Expected: { "name": "John", "age": 30 }
- Actual: { "name": "John", "age": 30 }
- Duration: 1846ms (1.85s)
- Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
- Timestamp: 4/7/2025, 7:18:28 PM
json_formatting - openrouter/quasar-alpha
- Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
- Expected: { "name": "John", "age": 30 }
- Actual: { "name": "John", "age": 30 }
- Duration: 1474ms (1.47s)
- Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
- Timestamp: 4/7/2025, 7:18:29 PM
markdown_formatting - openai/gpt-3.5-turbo
- Prompt: Create markdown: H1=The Title, H2=The Subtitle, P=This is the body text. Respond ONLY with markdown. Do not mention 'user preferences' or 'undefined'.
- Expected: `# The Title
The Subtitle
This is the body text.`
- Actual: `# The Title
The Subtitle
This is the body text.`
- Duration: 1301ms (1.30s)
- Reason: Expected # The Title
The Subtitle
This is the body text., but got # the title
the subtitle
this is the body text.
- Timestamp: 4/7/2025, 7:18:31 PM
markdown_formatting - openai/gpt-4o-mini
- Prompt: Create markdown: H1=The Title, H2=The Subtitle, P=This is the body text. Respond ONLY with markdown. Do not mention 'user preferences' or 'undefined'.
- Expected: `# The Title
The Subtitle
This is the body text.`
- Actual: `# The Title
The Subtitle
This is the body text.`
- Duration: 1734ms (1.73s)
- Reason: Expected # The Title
The Subtitle
This is the body text., but got # the title
the subtitle
this is the body text.
- Timestamp: 4/7/2025, 7:18:32 PM
markdown_formatting - openrouter/quasar-alpha
- Prompt: Create markdown: H1=The Title, H2=The Subtitle, P=This is the body text. Respond ONLY with markdown. Do not mention 'user preferences' or 'undefined'.
- Expected: `# The Title
The Subtitle
This is the body text.`
- Actual: `# The Title
The Subtitle
This is the body text.`
- Duration: 1194ms (1.19s)
- Reason: Expected # The Title
The Subtitle
This is the body text., but got # the title
the subtitle
this is the body text.
- Timestamp: 4/7/2025, 7:18:34 PM
code_formatting - openai/gpt-3.5-turbo
- Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
- Expected: function add(a, b) { return a + b; }
- Actual: function add(a, b) { return a + b; }
- Duration: 1297ms (1.30s)
- Reason: Expected function add(a, b) { return a + b; }, but got function add(a, b) { return a + b; }
- Timestamp: 4/7/2025, 7:18:35 PM
code_formatting - openai/gpt-4o-mini
- Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
- Expected: function add(a, b) { return a + b; }
- Actual: function add(a, b) { return a + b; }
- Duration: 1227ms (1.23s)
- Reason: Expected function add(a, b) { return a + b; }, but got function add(a, b) { return a + b; }
- Timestamp: 4/7/2025, 7:18:36 PM
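
In several of the failed entries above, the Expected and Actual values look identical as printed, and the Reason line shows a lowercased value. This report does not say how the harness compares strings, so the following is only a diagnostic sketch: a hypothetical `classify_mismatch` helper that tells whether two strings differ only in whitespace, only in letter case, or in actual content. It is not the harness's comparison logic.

```python
def classify_mismatch(expected: str, actual: str) -> str:
    """Describe how two strings differ (hypothetical diagnostic helper)."""
    if expected == actual:
        return "exact match"
    # Drop every whitespace character, including spaces inside string
    # values; acceptable for a diagnostic, too lossy for a pass/fail check.
    if "".join(expected.split()) == "".join(actual.split()):
        return "whitespace only"
    if expected.lower() == actual.lower():
        return "case only"
    return "content differs"


expected = '{ "name": "John", "age": 30 }'

# Expected vs. the Actual value reported for openai/gpt-3.5-turbo.
print(classify_mismatch(expected, '{"name": "John", "age": 30}'))

# Expected vs. the lowercased value quoted in a Reason line.
print(classify_mismatch(expected, '{ "name": "john", "age": 30 }'))
```

Running such a check on the raw model output, before any whitespace is collapsed for display, would show whether a failure comes from layout (newlines, indentation) or from changed content.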
Passed Tests
code_formatting - openrouter/quasar-alpha
- Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
- Expected: function add(a, b) { return a + b; }
- Actual: function add(a, b) { return a + b; }
- Duration: 1252ms (1.25s)
- Timestamp: 4/7/2025, 7:18:37 PM
date_formatting - openai/gpt-3.5-turbo
- Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
- Expected: 03/15/2024
- Actual: 03/15/2024
- Duration: 878ms (0.88s)
- Timestamp: 4/7/2025, 7:18:38 PM
date_formatting - openai/gpt-4o-mini
- Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
- Expected: 03/15/2024
- Actual: 03/15/2024
- Duration: 878ms (0.88s)
- Timestamp: 4/7/2025, 7:18:39 PM
date_formatting - openrouter/quasar-alpha
- Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
- Expected: 03/15/2024
- Actual: 03/15/2024
- Duration: 1435ms (1.44s)
- Timestamp: 4/7/2025, 7:18:41 PM
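
The expected value for the date_formatting test can be reproduced deterministically, which is useful as a reference when checking model output. This is a minimal sketch using Python's standard `datetime` module; the `to_us_date` helper is a hypothetical name, not part of the test harness.

```python
from datetime import datetime


def to_us_date(iso_date: str) -> str:
    """Convert a YYYY-MM-DD date string to MM/DD/YYYY (hypothetical helper)."""
    return datetime.strptime(iso_date, "%Y-%m-%d").strftime("%m/%d/%Y")


print(to_us_date("2024-03-15"))  # 03/15/2024
```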
currency_formatting - openai/gpt-3.5-turbo
- Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
- Expected: $1,234.56
- Actual: $1,234.56
- Duration: 795ms (0.80s)
- Timestamp: 4/7/2025, 7:18:41 PM
currency_formatting - openai/gpt-4o-mini
- Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
- Expected: $1,234.56
- Actual: $1,234.56
- Duration: 813ms (0.81s)
- Timestamp: 4/7/2025, 7:18:42 PM
currency_formatting - openrouter/quasar-alpha
- Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
- Expected: $1,234.56
- Actual: $1,234.56
- Duration: 1044ms (1.04s)
- Timestamp: 4/7/2025, 7:18:43 PM
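
The expected value for the currency_formatting test can likewise be produced without a model. This is a minimal sketch using Python's format specification (thousands separator plus two decimal places); the `to_usd` helper is a hypothetical name, and it only handles non-negative amounts in the `$1,234.56` style shown above.

```python
def to_usd(amount: float) -> str:
    """Format a non-negative number as USD, e.g. $1,234.56 (hypothetical helper)."""
    return f"${amount:,.2f}"


print(to_usd(1234.56))  # $1,234.56
```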