mono/packages/kbot/tests/unit/reports/format.md
2025-04-06 17:49:35 +02:00

Format Operations Test Results

Highscores

Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|------|-------|---------------|--------------|
| json_formatting | openrouter/quasar-alpha | 806 | 0.81 |
| json_formatting | openai/gpt-4o-mini | 1169 | 1.17 |
| json_formatting | openai/gpt-3.5-turbo | 1295 | 1.29 |
| json_formatting | deepseek/deepseek-r1-distill-qwen-14b:free | 6959 | 6.96 |
| markdown_formatting | openai/gpt-3.5-turbo | 1010 | 1.01 |
| markdown_formatting | openrouter/quasar-alpha | 1107 | 1.11 |
| markdown_formatting | openai/gpt-4o-mini | 1123 | 1.12 |
| markdown_formatting | deepseek/deepseek-r1-distill-qwen-14b:free | 3242 | 3.24 |
| code_formatting | deepseek/deepseek-r1-distill-qwen-14b:free | 448 | 0.45 |
| code_formatting | openai/gpt-3.5-turbo | 855 | 0.85 |
| code_formatting | openrouter/quasar-alpha | 1174 | 1.17 |
| code_formatting | openai/gpt-4o-mini | 1361 | 1.36 |
| date_formatting | deepseek/deepseek-r1-distill-qwen-14b:free | 421 | 0.42 |
| date_formatting | openai/gpt-3.5-turbo | 787 | 0.79 |
| date_formatting | openai/gpt-4o-mini | 952 | 0.95 |
| date_formatting | openrouter/quasar-alpha | 1164 | 1.16 |
| currency_formatting | deepseek/deepseek-r1-distill-qwen-14b:free | 463 | 0.46 |
| currency_formatting | openai/gpt-4o-mini | 903 | 0.90 |
| currency_formatting | openrouter/quasar-alpha | 1121 | 1.12 |
| currency_formatting | openai/gpt-3.5-turbo | 1952 | 1.95 |
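
The ranking groups rows by test and lists each group fastest first. A minimal sketch of that ordering, assuming results are available as `(test, model, duration_ms)` tuples; the `rank_by_duration` helper and the tuple layout are hypothetical and not taken from the test suite. Only the json_formatting rows are included to keep it short.

```python
from collections import defaultdict

# (test, model, duration_ms) rows copied from the json_formatting group.
RESULTS = [
    ("json_formatting", "openai/gpt-3.5-turbo", 1295),
    ("json_formatting", "deepseek/deepseek-r1-distill-qwen-14b:free", 6959),
    ("json_formatting", "openai/gpt-4o-mini", 1169),
    ("json_formatting", "openrouter/quasar-alpha", 806),
]


def rank_by_duration(results):
    """Group rows by test name and sort each group by ascending duration."""
    groups = defaultdict(list)
    for test, model, duration_ms in results:
        groups[test].append((model, duration_ms))
    return {
        test: sorted(rows, key=lambda row: row[1])
        for test, rows in groups.items()
    }


ranking = rank_by_duration(RESULTS)
for model, duration_ms in ranking["json_formatting"]:
    # Seconds are printed with two decimals, as in the ranking.
    print(f"{model}: {duration_ms} ms ({duration_ms / 1000:.2f} s)")
```

Note that the ranking reflects speed only: the fastest entries for code, date, and currency formatting are runs that returned an empty response.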

Summary

  • Total Tests: 20
  • Passed: 7
  • Failed: 13
  • Success Rate: 35.00%
  • Average Duration: 1416ms (1.42s)
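
The summary figures follow directly from the 20 recorded durations and the pass count. A small sketch that recomputes them; the `summarize` helper is hypothetical, and the durations are copied from the rankings in this report.

```python
# Durations in milliseconds for all 20 runs, grouped by test.
DURATIONS_MS = [
    806, 1169, 1295, 6959,   # json_formatting
    1010, 1107, 1123, 3242,  # markdown_formatting
    448, 855, 1174, 1361,    # code_formatting
    421, 787, 952, 1164,     # date_formatting
    463, 903, 1121, 1952,    # currency_formatting
]
PASSED = 7  # number of passing runs listed in this report


def summarize(durations_ms, passed):
    """Return (total, failed, success_rate_percent, average_ms)."""
    total = len(durations_ms)
    failed = total - passed
    success_rate = 100 * passed / total
    average_ms = sum(durations_ms) / total
    return total, failed, success_rate, average_ms


total, failed, success_rate, average_ms = summarize(DURATIONS_MS, PASSED)
print(f"Total Tests: {total}")
print(f"Passed: {PASSED}")
print(f"Failed: {failed}")
print(f"Success Rate: {success_rate:.2f}%")
print(f"Average Duration: {average_ms:.0f}ms ({average_ms / 1000:.2f}s)")
```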

Failed Tests

json_formatting - openai/gpt-3.5-turbo

  • Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
  • Expected: { "name": "John", "age": 30 }
  • Actual: { "name": "John", "age": 30 }
  • Duration: 1295ms (1.29s)
  • Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
  • Timestamp: 4/6/2025, 5:42:04 PM

json_formatting - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
  • Expected: { "name": "John", "age": 30 }
  • Actual: { "name": "John", "age": 30 }
  • Duration: 6959ms (6.96s)
  • Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
  • Timestamp: 4/6/2025, 5:42:11 PM

json_formatting - openai/gpt-4o-mini

  • Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
  • Expected: { "name": "John", "age": 30 }
  • Actual: { "name": "John", "age": 30 }
  • Duration: 1169ms (1.17s)
  • Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
  • Timestamp: 4/6/2025, 5:42:12 PM

json_formatting - openrouter/quasar-alpha

  • Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
  • Expected: { "name": "John", "age": 30 }
  • Actual: { "name": "John", "age": 30 }
  • Duration: 806ms (0.81s)
  • Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
  • Timestamp: 4/6/2025, 5:42:13 PM
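
In all four json_formatting failures the Actual value reads the same as Expected, while the Reason shows a lowercased value ("john"). One possible explanation, which this report does not confirm, is that the check lowercases only the model output before comparing. A sketch of a comparison that sidesteps both case handling and whitespace by parsing each side and comparing the resulting values; `json_equal` is a hypothetical helper, not part of the test suite.

```python
import json


def json_equal(expected: str, actual: str) -> bool:
    """Compare two JSON documents by value, ignoring whitespace and key order."""
    try:
        return json.loads(expected) == json.loads(actual)
    except json.JSONDecodeError:
        # Output that does not parse as JSON counts as a mismatch.
        return False


expected = '{ "name": "John", "age": 30 }'
pretty = '{\n  "name": "John",\n  "age": 30\n}'
lowercased = '{ "name": "john", "age": 30 }'

print(json_equal(expected, pretty))      # same values, different layout
print(json_equal(expected, lowercased))  # value differs in case
```

This treats layout as irrelevant, so it only suits a test that checks content rather than the exact indentation the model produced.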

markdown_formatting - openai/gpt-3.5-turbo

  • Prompt: Format this markdown: #title ##subtitle text. Return only the formatted markdown, no explanation.
  • Expected: `# Title\n\nSubtitle\n\nText`
  • Actual: `# USER Preferences\n\nPreferences`
  • Duration: 1010ms (1.01s)
  • Reason: Expected `# Title\n\nSubtitle\n\nText`, but got `# user preferences\n\npreferences`
  • Timestamp: 4/6/2025, 5:42:14 PM

markdown_formatting - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Format this markdown: #title ##subtitle text. Return only the formatted markdown, no explanation.
  • Expected: `# Title\n\nSubtitle\n\nText`
  • Actual: #title ##subtitle text
  • Duration: 3242ms (3.24s)
  • Reason: Expected `# Title\n\nSubtitle\n\nText`, but got #title ##subtitle text
  • Timestamp: 4/6/2025, 5:42:17 PM

markdown_formatting - openai/gpt-4o-mini

  • Prompt: Format this markdown: #title ##subtitle text. Return only the formatted markdown, no explanation.
  • Expected: `# Title\n\nSubtitle\n\nText`
  • Actual: `# Preferences\n\nUSER Preferences`
  • Duration: 1123ms (1.12s)
  • Reason: Expected `# Title\n\nSubtitle\n\nText`, but got `# preferences\n\nuser preferences`
  • Timestamp: 4/6/2025, 5:42:19 PM

markdown_formatting - openrouter/quasar-alpha

  • Prompt: Format this markdown: #title ##subtitle text. Return only the formatted markdown, no explanation.
  • Expected: `# Title\n\nSubtitle\n\nText`
  • Actual: `# Preferences\n\nYou are a helpful AI assistant. When asked to perform calculations, you should return only the numerical result without any explanation or comments.`
  • Duration: 1107ms (1.11s)
  • Reason: Expected `# Title\n\nSubtitle\n\nText`, but got `# preferences\n\nyou are a helpful ai assistant. when asked to perform calculations, you should return only the numerical result without any explanation or comments.`
  • Timestamp: 4/6/2025, 5:42:20 PM

code_formatting - openai/gpt-3.5-turbo

  • Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
  • Expected: function add(a, b) { return a + b; }
  • Actual: function add(a, b) { return a + b; }
  • Duration: 855ms (0.85s)
  • Reason: Expected function add(a, b) { return a + b; }, but got function add(a, b) { return a + b; }
  • Timestamp: 4/6/2025, 5:42:21 PM

code_formatting - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
  • Expected: function add(a, b) { return a + b; }
  • Actual: ``
  • Duration: 448ms (0.45s)
  • Reason: Model returned empty response
  • Timestamp: 4/6/2025, 5:42:21 PM

code_formatting - openai/gpt-4o-mini

  • Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
  • Expected: function add(a, b) { return a + b; }
  • Actual: function add(a, b) { return a + b; }
  • Duration: 1361ms (1.36s)
  • Reason: Expected function add(a, b) { return a + b; }, but got function add(a, b) { return a + b; }
  • Timestamp: 4/6/2025, 5:42:22 PM
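
For the gpt-3.5-turbo and gpt-4o-mini failures, Expected and Actual read identically in this report, so the mismatch presumably lies in whitespace (line breaks or indentation) that the report does not preserve. That is an assumption. Under it, a whitespace-insensitive comparison could look like the sketch below; `normalize_ws` and `same_ignoring_whitespace` are hypothetical names.

```python
import re


def normalize_ws(text: str) -> str:
    """Collapse every run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def same_ignoring_whitespace(expected: str, actual: str) -> bool:
    """True when the two strings differ only in the amount of whitespace."""
    return normalize_ws(expected) == normalize_ws(actual)


expected = "function add(a, b) { return a + b; }"
multiline = "function add(a, b) {\n  return a + b;\n}"
unformatted = "function add(a,b){return a+b}"

print(same_ignoring_whitespace(expected, multiline))
print(same_ignoring_whitespace(expected, unformatted))
```

The normalization only shrinks existing whitespace. It does not insert any, so the unformatted input still fails the check.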

date_formatting - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
  • Expected: 03/15/2024
  • Actual: ``
  • Duration: 421ms (0.42s)
  • Reason: Model returned empty response
  • Timestamp: 4/6/2025, 5:42:25 PM

currency_formatting - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
  • Expected: $1,234.56
  • Actual: ``
  • Duration: 463ms (0.46s)
  • Reason: Model returned empty response
  • Timestamp: 4/6/2025, 5:42:29 PM
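
The report uses two kinds of Reason text: "Model returned empty response" and "Expected ..., but got ...". A minimal sketch of how a result could be classified into those two cases; `failure_reason` is a hypothetical helper, and the strict string comparison is an assumption about how the check works.

```python
from typing import Optional


def failure_reason(expected: str, actual: Optional[str]) -> Optional[str]:
    """Return None on a pass, otherwise a reason string in the report's wording."""
    # Treat a missing or whitespace-only response as empty.
    if actual is None or actual.strip() == "":
        return "Model returned empty response"
    if actual.strip() != expected.strip():
        return f"Expected {expected}, but got {actual}"
    return None


print(failure_reason("$1,234.56", ""))
print(failure_reason("$1,234.56", "$1234.56"))
print(failure_reason("$1,234.56", "$1,234.56"))
```

Separating the empty case from a content mismatch makes it easier to tell a provider or availability problem from a formatting error.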

Passed Tests

code_formatting - openrouter/quasar-alpha

  • Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
  • Expected: function add(a, b) { return a + b; }
  • Actual: function add(a, b) { return a + b; }
  • Duration: 1174ms (1.17s)
  • Timestamp: 4/6/2025, 5:42:24 PM

date_formatting - openai/gpt-3.5-turbo

  • Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
  • Expected: 03/15/2024
  • Actual: 03/15/2024
  • Duration: 787ms (0.79s)
  • Timestamp: 4/6/2025, 5:42:24 PM

date_formatting - openai/gpt-4o-mini

  • Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
  • Expected: 03/15/2024
  • Actual: 03/15/2024
  • Duration: 952ms (0.95s)
  • Timestamp: 4/6/2025, 5:42:26 PM

date_formatting - openrouter/quasar-alpha

  • Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
  • Expected: 03/15/2024
  • Actual: 03/15/2024
  • Duration: 1164ms (1.16s)
  • Timestamp: 4/6/2025, 5:42:27 PM
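
The expected value for this test can also be produced deterministically, which is useful as a reference when checking model output. A short sketch using the standard library; `to_us_date` is a hypothetical helper name.

```python
from datetime import datetime


def to_us_date(iso_date: str) -> str:
    """Convert a YYYY-MM-DD date string to MM/DD/YYYY."""
    return datetime.strptime(iso_date, "%Y-%m-%d").strftime("%m/%d/%Y")


print(to_us_date("2024-03-15"))  # 03/15/2024
```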

currency_formatting - openai/gpt-3.5-turbo

  • Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
  • Expected: $1,234.56
  • Actual: $1,234.56
  • Duration: 1952ms (1.95s)
  • Timestamp: 4/6/2025, 5:42:29 PM

currency_formatting - openai/gpt-4o-mini

  • Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
  • Expected: $1,234.56
  • Actual: $1,234.56
  • Duration: 903ms (0.90s)
  • Timestamp: 4/6/2025, 5:42:30 PM

currency_formatting - openrouter/quasar-alpha

  • Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
  • Expected: $1,234.56
  • Actual: $1,234.56
  • Duration: 1121ms (1.12s)
  • Timestamp: 4/6/2025, 5:42:31 PM
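
As with the date test, the expected currency string can be generated deterministically for reference. A short sketch using Python's format specification; `to_usd` is a hypothetical helper and assumes a non-negative amount.

```python
def to_usd(amount: float) -> str:
    """Format a non-negative number as USD with thousands separators and two decimals."""
    return f"${amount:,.2f}"


print(to_usd(1234.56))  # $1,234.56
```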