babayaga 35ee87ef66 kbot:support web urls

2025-04-06 17:49:35 +02:00

Format Operations Test Results

Highscores

Performance Rankings (Duration)

Test	Model	Duration (ms)	Duration (s)
json_formatting	openrouter/quasar-alpha	806	0.81
json_formatting	openai/gpt-4o-mini	1169	1.17
json_formatting	openai/gpt-3.5-turbo	1295	1.29
json_formatting	deepseek/deepseek-r1-distill-qwen-14b:free	6959	6.96
markdown_formatting	openai/gpt-3.5-turbo	1010	1.01
markdown_formatting	openrouter/quasar-alpha	1107	1.11
markdown_formatting	openai/gpt-4o-mini	1123	1.12
markdown_formatting	deepseek/deepseek-r1-distill-qwen-14b:free	3242	3.24
code_formatting	deepseek/deepseek-r1-distill-qwen-14b:free	448	0.45
code_formatting	openai/gpt-3.5-turbo	855	0.85
code_formatting	openrouter/quasar-alpha	1174	1.17
code_formatting	openai/gpt-4o-mini	1361	1.36
date_formatting	deepseek/deepseek-r1-distill-qwen-14b:free	421	0.42
date_formatting	openai/gpt-3.5-turbo	787	0.79
date_formatting	openai/gpt-4o-mini	952	0.95
date_formatting	openrouter/quasar-alpha	1164	1.16
currency_formatting	deepseek/deepseek-r1-distill-qwen-14b:free	463	0.46
currency_formatting	openai/gpt-4o-mini	903	0.90
currency_formatting	openrouter/quasar-alpha	1121	1.12
currency_formatting	openai/gpt-3.5-turbo	1952	1.95
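A ranking like the one above can be produced by grouping results per test and sorting each group by duration. The sketch below is illustrative only: the `FormatTestResult` shape and the helper names `rankByDuration` and `toRow` are assumptions, not taken from the harness that generated this report.

```typescript
// Hypothetical record shape; the real harness may store results differently.
interface FormatTestResult {
  test: string;
  model: string;
  durationMs: number;
}

// Group results by test name, then sort each group fastest-first.
function rankByDuration(results: FormatTestResult[]): Map<string, FormatTestResult[]> {
  const groups = new Map<string, FormatTestResult[]>();
  for (const r of results) {
    const group = groups.get(r.test) ?? [];
    group.push(r);
    groups.set(r.test, group);
  }
  groups.forEach((group) => group.sort((a, b) => a.durationMs - b.durationMs));
  return groups;
}

// Render one tab-separated row: test, model, milliseconds, seconds (2 decimals).
function toRow(r: FormatTestResult): string {
  return [r.test, r.model, String(r.durationMs), (r.durationMs / 1000).toFixed(2)].join("\t");
}

// Three rows transcribed from the table above, deliberately out of order.
const sample: FormatTestResult[] = [
  { test: "json_formatting", model: "openai/gpt-4o-mini", durationMs: 1169 },
  { test: "json_formatting", model: "openrouter/quasar-alpha", durationMs: 806 },
  { test: "json_formatting", model: "openai/gpt-3.5-turbo", durationMs: 1295 },
];

const ranked = rankByDuration(sample);
for (const row of ranked.get("json_formatting") ?? []) {
  console.log(toRow(row));
}
```

Note that the ranking covers every run regardless of pass or fail, so a fast entry is not necessarily a successful one.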

Summary

Total Tests: 20
Passed: 7
Failed: 13
Success Rate: 35.00%
Average Duration: 1416ms (1.42s)
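The summary figures can be re-derived from the 20 durations in the ranking table and the pass/fail status listed further down. The following is a minimal sketch; the `RunOutcome` and `RunSummary` types and the `summarize` function are hypothetical names introduced for illustration.

```typescript
// Hypothetical outcome record; only the fields needed for the summary.
interface RunOutcome {
  passed: boolean;
  durationMs: number;
}

interface RunSummary {
  total: number;
  passed: number;
  failed: number;
  successRate: string;
  averageMs: number;
  averageSeconds: string;
}

function summarize(outcomes: RunOutcome[]): RunSummary {
  const total = outcomes.length;
  const passed = outcomes.filter((o) => o.passed).length;
  const totalMs = outcomes.reduce((sum, o) => sum + o.durationMs, 0);
  const meanMs = total === 0 ? 0 : totalMs / total;
  return {
    total,
    passed,
    failed: total - passed,
    successRate: (total === 0 ? 0 : (passed / total) * 100).toFixed(2) + "%",
    averageMs: Math.round(meanMs),
    averageSeconds: (meanMs / 1000).toFixed(2),
  };
}

// [durationMs, passed] pairs transcribed from this report, in table order.
const rows: Array<[number, boolean]> = [
  [806, false], [1169, false], [1295, false], [6959, false], // json_formatting
  [1010, false], [1107, false], [1123, false], [3242, false], // markdown_formatting
  [448, false], [855, false], [1174, true], [1361, false], // code_formatting
  [421, false], [787, true], [952, true], [1164, true], // date_formatting
  [463, false], [903, true], [1121, true], [1952, true], // currency_formatting
];

const summary = summarize(rows.map(([durationMs, passed]) => ({ durationMs, passed })));
console.log(summary);
```

The durations sum to 28312 ms, so the mean is 1415.6 ms, which rounds to the 1416ms (1.42s) reported above.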

Failed Tests

json_formatting - openai/gpt-3.5-turbo

Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
Expected: { "name": "John", "age": 30 }
Actual: { "name": "John", "age": 30 }
Duration: 1295ms (1.29s)
Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
Timestamp: 4/6/2025, 5:42:04 PM

json_formatting - deepseek/deepseek-r1-distill-qwen-14b:free

Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
Expected: { "name": "John", "age": 30 }
Actual: { "name": "John", "age": 30 }
Duration: 6959ms (6.96s)
Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
Timestamp: 4/6/2025, 5:42:11 PM

json_formatting - openai/gpt-4o-mini

Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
Expected: { "name": "John", "age": 30 }
Actual: { "name": "John", "age": 30 }
Duration: 1169ms (1.17s)
Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
Timestamp: 4/6/2025, 5:42:12 PM

json_formatting - openrouter/quasar-alpha

Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
Expected: { "name": "John", "age": 30 }
Actual: { "name": "John", "age": 30 }
Duration: 806ms (0.81s)
Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
Timestamp: 4/6/2025, 5:42:13 PM
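In all four json_formatting failures the Expected and Actual lines read the same, while the Reason line shows the actual value in lowercase ("john"). One possible explanation, inferred from this report and not confirmed against the harness source, is that the comparison normalizes case on one side only. A comparison that parses both sides avoids this class of mismatch; the sketch below uses hypothetical helper names `jsonEquivalent` and `canonical`.

```typescript
// Compare two JSON texts by structure rather than by raw string,
// so whitespace and key order are ignored while letter case is preserved.
function jsonEquivalent(expected: string, actual: string): boolean {
  try {
    return canonical(JSON.parse(expected)) === canonical(JSON.parse(actual));
  } catch {
    return false; // at least one side is not valid JSON
  }
}

// Serialize with sorted object keys so key order does not affect equality.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonical).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonical(obj[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

const expectedJson = '{ "name": "John", "age": 30 }';
console.log(jsonEquivalent(expectedJson, '{\n  "name": "John",\n  "age": 30\n}')); // true
console.log(jsonEquivalent(expectedJson, '{ "name": "john", "age": 30 }')); // false
```

A structural check tests whether the model preserved the data, not whether it produced a particular layout. If the exact layout is what the test is meant to verify, a strict string comparison applied identically to both sides would be the better fit.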

markdown_formatting - openai/gpt-3.5-turbo

Prompt: Format this markdown: #title ##subtitle text. Return only the formatted markdown, no explanation.
Expected: `# Title

Subtitle

Text`

Actual: `# USER Preferences

Preferences`

Duration: 1010ms (1.01s)
Reason: Expected # Title

Subtitle

Text, but got # user preferences

preferences

Timestamp: 4/6/2025, 5:42:14 PM

markdown_formatting - deepseek/deepseek-r1-distill-qwen-14b:free

Prompt: Format this markdown: #title ##subtitle text. Return only the formatted markdown, no explanation.
Expected: `# Title

Subtitle

Text`

Actual: #title ##subtitle text
Duration: 3242ms (3.24s)
Reason: Expected # Title

Subtitle

Text, but got #title ##subtitle text

Timestamp: 4/6/2025, 5:42:17 PM

markdown_formatting - openai/gpt-4o-mini

Prompt: Format this markdown: #title ##subtitle text. Return only the formatted markdown, no explanation.
Expected: `# Title

Subtitle

Text`

Actual: `# Preferences

USER Preferences`

Duration: 1123ms (1.12s)
Reason: Expected # Title

Subtitle

Text, but got # preferences

user preferences

Timestamp: 4/6/2025, 5:42:19 PM

markdown_formatting - openrouter/quasar-alpha

Prompt: Format this markdown: #title ##subtitle text. Return only the formatted markdown, no explanation.
Expected: `# Title

Subtitle

Text`

Actual: `# Preferences

You are a helpful AI assistant. When asked to perform calculations, you should return only the numerical result without any explanation or comments.`

Duration: 1107ms (1.11s)
Reason: Expected # Title

Subtitle

Text, but got # preferences

you are a helpful ai assistant. when asked to perform calculations, you should return only the numerical result without any explanation or comments.

Timestamp: 4/6/2025, 5:42:20 PM

code_formatting - openai/gpt-3.5-turbo

Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
Expected: function add(a, b) { return a + b; }
Actual: function add(a, b) { return a + b; }
Duration: 855ms (0.85s)
Reason: Expected function add(a, b) { return a + b; }, but got function add(a, b) { return a + b; }
Timestamp: 4/6/2025, 5:42:21 PM

code_formatting - deepseek/deepseek-r1-distill-qwen-14b:free

Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
Expected: function add(a, b) { return a + b; }
Actual: ``
Duration: 448ms (0.45s)
Reason: Model returned empty response
Timestamp: 4/6/2025, 5:42:21 PM

code_formatting - openai/gpt-4o-mini

Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
Expected: function add(a, b) { return a + b; }
Actual: function add(a, b) { return a + b; }
Duration: 1361ms (1.36s)
Reason: Expected function add(a, b) { return a + b; }, but got function add(a, b) { return a + b; }
Timestamp: 4/6/2025, 5:42:22 PM
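The two code_formatting mismatches above print identical Expected and Actual text. Because this rendering collapses whitespace, the difference may lie in line breaks or indentation; that is an assumption, since the raw responses are not shown. A layout-insensitive comparison could look like the sketch below, where `collapseWhitespace` and `sameIgnoringLayout` are hypothetical names.

```typescript
// Trim the text and replace every run of whitespace (spaces, tabs, newlines)
// with a single space.
function collapseWhitespace(text: string): string {
  return text.trim().replace(/\s+/g, " ");
}

// True when the two texts differ only in whitespace layout.
function sameIgnoringLayout(expected: string, actual: string): boolean {
  return collapseWhitespace(expected) === collapseWhitespace(actual);
}

const expectedCode = "function add(a, b) { return a + b; }";
const multiLineCode = "function add(a, b) {\n  return a + b;\n}";

console.log(sameIgnoringLayout(expectedCode, multiLineCode)); // true
console.log(sameIgnoringLayout(expectedCode, "function add(a,b){return a+b}")); // false
```

The trade-off is that this treats one-line and multi-line layouts as equal, which may be too lenient for a test whose purpose is to check formatting.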

date_formatting - deepseek/deepseek-r1-distill-qwen-14b:free

Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
Expected: 03/15/2024
Actual: ``
Duration: 421ms (0.42s)
Reason: Model returned empty response
Timestamp: 4/6/2025, 5:42:25 PM

currency_formatting - deepseek/deepseek-r1-distill-qwen-14b:free

Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
Expected: $1,234.56
Actual: ``
Duration: 463ms (0.46s)
Reason: Model returned empty response
Timestamp: 4/6/2025, 5:42:29 PM
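The three empty responses above are also the fastest entries in their ranking groups (421 to 463 ms), so it can help to classify them separately from genuine formatting mismatches. The sketch below shows one way to produce the two kinds of Reason text seen in this report; the `Verdict` type and `judge` function are hypothetical and do not come from the harness.

```typescript
type Verdict =
  | { status: "pass" }
  | { status: "fail"; reason: string };

// Check for an empty response first, then fall back to a trimmed string comparison.
function judge(expected: string, actual: string | null | undefined): Verdict {
  const text = (actual ?? "").trim();
  if (text.length === 0) {
    return { status: "fail", reason: "Model returned empty response" };
  }
  if (text !== expected.trim()) {
    return { status: "fail", reason: `Expected ${expected}, but got ${text}` };
  }
  return { status: "pass" };
}

console.log(judge("03/15/2024", ""));
console.log(judge("03/15/2024", "03/15/2024\n"));
console.log(judge("$1,234.56", "1234.56 USD"));
```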

Passed Tests

code_formatting - openrouter/quasar-alpha

Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
Expected: function add(a, b) { return a + b; }
Actual: function add(a, b) { return a + b; }
Duration: 1174ms (1.17s)
Timestamp: 4/6/2025, 5:42:24 PM

date_formatting - openai/gpt-3.5-turbo

Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
Expected: 03/15/2024
Actual: 03/15/2024
Duration: 787ms (0.79s)
Timestamp: 4/6/2025, 5:42:24 PM

date_formatting - openai/gpt-4o-mini

Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
Expected: 03/15/2024
Actual: 03/15/2024
Duration: 952ms (0.95s)
Timestamp: 4/6/2025, 5:42:26 PM

date_formatting - openrouter/quasar-alpha

Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
Expected: 03/15/2024
Actual: 03/15/2024
Duration: 1164ms (1.16s)
Timestamp: 4/6/2025, 5:42:27 PM

currency_formatting - openai/gpt-3.5-turbo

Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
Expected: $1,234.56
Actual: $1,234.56
Duration: 1952ms (1.95s)
Timestamp: 4/6/2025, 5:42:29 PM

currency_formatting - openai/gpt-4o-mini

Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
Expected: $1,234.56
Actual: $1,234.56
Duration: 903ms (0.90s)
Timestamp: 4/6/2025, 5:42:30 PM

currency_formatting - openrouter/quasar-alpha

Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
Expected: $1,234.56
Actual: $1,234.56
Duration: 1121ms (1.12s)
Timestamp: 4/6/2025, 5:42:31 PM
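The date and currency cases have deterministic answers, so their Expected values can be generated in code instead of being typed by hand. This is a minimal sketch using the standard `Intl.NumberFormat` API and a plain regular expression; `isoToUsDate` is a hypothetical helper name.

```typescript
// Reformat an ISO date (YYYY-MM-DD) as MM/DD/YYYY by rearranging its parts.
// Avoiding the Date object sidesteps time-zone shifts.
function isoToUsDate(iso: string): string {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(iso);
  if (!match) {
    throw new Error(`Not an ISO date: ${iso}`);
  }
  const [, year, month, day] = match;
  return `${month}/${day}/${year}`;
}

// USD formatter for the en-US locale: symbol, thousands separator, two decimals.
const usd = new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" });

console.log(isoToUsDate("2024-03-15")); // 03/15/2024
console.log(usd.format(1234.56)); // $1,234.56
```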
