Format Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| json_formatting | openrouter/quasar-alpha | 806 | 0.81 |
| json_formatting | openai/gpt-4o-mini | 1169 | 1.17 |
| json_formatting | openai/gpt-3.5-turbo | 1295 | 1.29 |
| json_formatting | deepseek/deepseek-r1-distill-qwen-14b:free | 6959 | 6.96 |
| markdown_formatting | openai/gpt-3.5-turbo | 1010 | 1.01 |
| markdown_formatting | openrouter/quasar-alpha | 1107 | 1.11 |
| markdown_formatting | openai/gpt-4o-mini | 1123 | 1.12 |
| markdown_formatting | deepseek/deepseek-r1-distill-qwen-14b:free | 3242 | 3.24 |
| code_formatting | deepseek/deepseek-r1-distill-qwen-14b:free | 448 | 0.45 |
| code_formatting | openai/gpt-3.5-turbo | 855 | 0.85 |
| code_formatting | openrouter/quasar-alpha | 1174 | 1.17 |
| code_formatting | openai/gpt-4o-mini | 1361 | 1.36 |
| date_formatting | deepseek/deepseek-r1-distill-qwen-14b:free | 421 | 0.42 |
| date_formatting | openai/gpt-3.5-turbo | 787 | 0.79 |
| date_formatting | openai/gpt-4o-mini | 952 | 0.95 |
| date_formatting | openrouter/quasar-alpha | 1164 | 1.16 |
| currency_formatting | deepseek/deepseek-r1-distill-qwen-14b:free | 463 | 0.46 |
| currency_formatting | openai/gpt-4o-mini | 903 | 0.90 |
| currency_formatting | openrouter/quasar-alpha | 1121 | 1.12 |
| currency_formatting | openai/gpt-3.5-turbo | 1952 | 1.95 |
Summary
- Total Tests: 20
- Passed: 7
- Failed: 13
- Success Rate: 35.00%
- Average Duration: 1416ms (1.42s)
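The summary figures can be recomputed from the rankings table. The sketch below is only an illustration: the tuple layout and the `summarize` name are assumptions, not the harness's own code, and the pass/fail flags are copied from the Failed Tests and Passed Tests sections of this report.

```python
# Rows are (test, model, duration_ms, passed), transcribed from this report.
DEEPSEEK = "deepseek/deepseek-r1-distill-qwen-14b:free"
RESULTS = [
    ("json_formatting", "openrouter/quasar-alpha", 806, False),
    ("json_formatting", "openai/gpt-4o-mini", 1169, False),
    ("json_formatting", "openai/gpt-3.5-turbo", 1295, False),
    ("json_formatting", DEEPSEEK, 6959, False),
    ("markdown_formatting", "openai/gpt-3.5-turbo", 1010, False),
    ("markdown_formatting", "openrouter/quasar-alpha", 1107, False),
    ("markdown_formatting", "openai/gpt-4o-mini", 1123, False),
    ("markdown_formatting", DEEPSEEK, 3242, False),
    ("code_formatting", DEEPSEEK, 448, False),
    ("code_formatting", "openai/gpt-3.5-turbo", 855, False),
    ("code_formatting", "openrouter/quasar-alpha", 1174, True),
    ("code_formatting", "openai/gpt-4o-mini", 1361, False),
    ("date_formatting", DEEPSEEK, 421, False),
    ("date_formatting", "openai/gpt-3.5-turbo", 787, True),
    ("date_formatting", "openai/gpt-4o-mini", 952, True),
    ("date_formatting", "openrouter/quasar-alpha", 1164, True),
    ("currency_formatting", DEEPSEEK, 463, False),
    ("currency_formatting", "openai/gpt-4o-mini", 903, True),
    ("currency_formatting", "openrouter/quasar-alpha", 1121, True),
    ("currency_formatting", "openai/gpt-3.5-turbo", 1952, True),
]


def summarize(results):
    """Aggregate pass/fail counts and the mean duration of all runs."""
    total = len(results)
    passed = sum(1 for _, _, _, ok in results if ok)
    average_ms = sum(ms for _, _, ms, _ in results) / total
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": f"{100 * passed / total:.2f}%",
        "average_ms": round(average_ms),
        "average_s": f"{average_ms / 1000:.2f}",
    }


print(summarize(RESULTS))
```

Running it on the rows above reproduces the totals listed in the summary.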
Failed Tests
json_formatting - openai/gpt-3.5-turbo
- Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
- Expected: `{ "name": "John", "age": 30 }`
- Actual: `{ "name": "John", "age": 30 }`
- Duration: 1295ms (1.29s)
- Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
- Timestamp: 4/6/2025, 5:42:04 PM
json_formatting - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
- Expected: `{ "name": "John", "age": 30 }`
- Actual: `{ "name": "John", "age": 30 }`
- Duration: 6959ms (6.96s)
- Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
- Timestamp: 4/6/2025, 5:42:11 PM
json_formatting - openai/gpt-4o-mini
- Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
- Expected: `{ "name": "John", "age": 30 }`
- Actual: `{ "name": "John", "age": 30 }`
- Duration: 1169ms (1.17s)
- Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
- Timestamp: 4/6/2025, 5:42:12 PM
json_formatting - openrouter/quasar-alpha
- Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
- Expected: `{ "name": "John", "age": 30 }`
- Actual: `{ "name": "John", "age": 30 }`
- Duration: 806ms (0.81s)
- Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
- Timestamp: 4/6/2025, 5:42:13 PM
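In all four json_formatting failures the Reason line shows the received value with a lowercase `"john"` while the expected value keeps `"John"`. That pattern would be consistent with the comparison lowercasing only one side, although the report does not confirm this. A minimal sketch of a comparison that avoids both case and whitespace sensitivity by parsing each side first (the `json_equivalent` helper is hypothetical, not part of the harness):

```python
import json


def json_equivalent(expected: str, actual: str) -> bool:
    """Return True when both strings parse to the same JSON value.

    Whitespace and key order are ignored; string case is preserved,
    so "John" and "john" still count as different values.
    """
    try:
        return json.loads(expected) == json.loads(actual)
    except json.JSONDecodeError:
        # A response that is not valid JSON can never match.
        return False


expected = '{ "name": "John", "age": 30 }'
pretty = '{\n  "name": "John",\n  "age": 30\n}'
print(json_equivalent(expected, pretty))
print(json_equivalent(expected, '{ "name": "john", "age": 30 }'))
```

Comparing parsed values rather than raw strings means a model is not penalized for choosing a different indentation style.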
markdown_formatting - openai/gpt-3.5-turbo
- Prompt: Format this markdown: #title ##subtitle text. Return only the formatted markdown, no explanation.
- Expected: `# Title
Subtitle
Text`
- Actual: `# USER Preferences
Preferences`
- Duration: 1010ms (1.01s)
- Reason: Expected # Title
Subtitle
Text, but got # user preferences
preferences
- Timestamp: 4/6/2025, 5:42:14 PM
markdown_formatting - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Format this markdown: #title ##subtitle text. Return only the formatted markdown, no explanation.
- Expected: `# Title
Subtitle
Text`
- Actual: `#title ##subtitle text`
- Duration: 3242ms (3.24s)
- Reason: Expected # Title
Subtitle
Text, but got #title ##subtitle text
- Timestamp: 4/6/2025, 5:42:17 PM
markdown_formatting - openai/gpt-4o-mini
- Prompt: Format this markdown: #title ##subtitle text. Return only the formatted markdown, no explanation.
- Expected: `# Title
Subtitle
Text`
- Actual: `# Preferences
USER Preferences`
- Duration: 1123ms (1.12s)
- Reason: Expected # Title
Subtitle
Text, but got # preferences
user preferences
- Timestamp: 4/6/2025, 5:42:19 PM
markdown_formatting - openrouter/quasar-alpha
- Prompt: Format this markdown: #title ##subtitle text. Return only the formatted markdown, no explanation.
- Expected: `# Title
Subtitle
Text`
- Actual: `# Preferences
You are a helpful AI assistant. When asked to perform calculations, you should return only the numerical result without any explanation or comments.`
- Duration: 1107ms (1.11s)
- Reason: Expected # Title
Subtitle
Text, but got # preferences
you are a helpful ai assistant. when asked to perform calculations, you should return only the numerical result without any explanation or comments.
- Timestamp: 4/6/2025, 5:42:20 PM
code_formatting - openai/gpt-3.5-turbo
- Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
- Expected: `function add(a, b) { return a + b; }`
- Actual: `function add(a, b) { return a + b; }`
- Duration: 855ms (0.85s)
- Reason: Expected function add(a, b) { return a + b; }, but got function add(a, b) { return a + b; }
- Timestamp: 4/6/2025, 5:42:21 PM
code_formatting - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
- Expected: `function add(a, b) { return a + b; }`
- Actual: ``
- Duration: 448ms (0.45s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:21 PM
code_formatting - openai/gpt-4o-mini
- Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
- Expected: `function add(a, b) { return a + b; }`
- Actual: `function add(a, b) { return a + b; }`
- Duration: 1361ms (1.36s)
- Reason: Expected function add(a, b) { return a + b; }, but got function add(a, b) { return a + b; }
- Timestamp: 4/6/2025, 5:42:22 PM
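For the two code_formatting mismatches above, the expected and received strings print identically in this report. One possible explanation, which the report cannot confirm, is that they differ only in line breaks or indentation that were lost when the report was rendered. Under that assumption, a whitespace-insensitive comparison could be sketched as follows (both helper names are hypothetical):

```python
def normalize_ws(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space."""
    return " ".join(text.split())


def same_modulo_whitespace(expected: str, actual: str) -> bool:
    """Compare two snippets while ignoring layout differences."""
    return normalize_ws(expected) == normalize_ws(actual)


one_line = "function add(a, b) { return a + b; }"
multi_line = "function add(a, b) {\n  return a + b;\n}"
print(same_modulo_whitespace(one_line, multi_line))
```

This check is deliberately lenient: it still distinguishes `a+b` from `a + b`, because it only collapses whitespace that is already present and never inserts or removes it entirely.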
date_formatting - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
- Expected: `03/15/2024`
- Actual: ``
- Duration: 421ms (0.42s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:25 PM
currency_formatting - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
- Expected: `$1,234.56`
- Actual: ``
- Duration: 463ms (0.46s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:29 PM
Passed Tests
code_formatting - openrouter/quasar-alpha
- Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
- Expected: `function add(a, b) { return a + b; }`
- Actual: `function add(a, b) { return a + b; }`
- Duration: 1174ms (1.17s)
- Timestamp: 4/6/2025, 5:42:24 PM
date_formatting - openai/gpt-3.5-turbo
- Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
- Expected: `03/15/2024`
- Actual: `03/15/2024`
- Duration: 787ms (0.79s)
- Timestamp: 4/6/2025, 5:42:24 PM
date_formatting - openai/gpt-4o-mini
- Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
- Expected: `03/15/2024`
- Actual: `03/15/2024`
- Duration: 952ms (0.95s)
- Timestamp: 4/6/2025, 5:42:26 PM
date_formatting - openrouter/quasar-alpha
- Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
- Expected: `03/15/2024`
- Actual: `03/15/2024`
- Duration: 1164ms (1.16s)
- Timestamp: 4/6/2025, 5:42:27 PM
currency_formatting - openai/gpt-3.5-turbo
- Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
- Expected: `$1,234.56`
- Actual: `$1,234.56`
- Duration: 1952ms (1.95s)
- Timestamp: 4/6/2025, 5:42:29 PM
currency_formatting - openai/gpt-4o-mini
- Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
- Expected: `$1,234.56`
- Actual: `$1,234.56`
- Duration: 903ms (0.90s)
- Timestamp: 4/6/2025, 5:42:30 PM
currency_formatting - openrouter/quasar-alpha
- Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
- Expected: `$1,234.56`
- Actual: `$1,234.56`
- Duration: 1121ms (1.12s)
- Timestamp: 4/6/2025, 5:42:31 PM
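The expected values for the date and currency tests are deterministic, so they can be derived rather than hard-coded. The sketch below shows one way to do that with the Python standard library; the function names are illustrative assumptions, and `to_usd` is only meant for non-negative amounts like the one in this test.

```python
from datetime import date


def to_us_date(iso: str) -> str:
    """Convert an ISO 8601 date (YYYY-MM-DD) to MM/DD/YYYY."""
    return date.fromisoformat(iso).strftime("%m/%d/%Y")


def to_usd(amount: float) -> str:
    """Format a non-negative amount with a dollar sign, thousands
    separators, and two decimal places."""
    return f"${amount:,.2f}"


print(to_us_date("2024-03-15"))  # 03/15/2024
print(to_usd(1234.56))           # $1,234.56
```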