mono/packages/kbot/tests/unit/reports/format.md

Format Operations Test Results

Highscores

Performance Rankings (Duration)

Test                 Model                    Duration (ms)  Duration (s)
json_formatting      openai/gpt-3.5-turbo     1283           1.28
json_formatting      openrouter/quasar-alpha  1474           1.47
json_formatting      openai/gpt-4o-mini       1846           1.85
markdown_formatting  openrouter/quasar-alpha  1194           1.19
markdown_formatting  openai/gpt-3.5-turbo     1301           1.30
markdown_formatting  openai/gpt-4o-mini       1734           1.73
code_formatting      openai/gpt-4o-mini       1227           1.23
code_formatting      openrouter/quasar-alpha  1252           1.25
code_formatting      openai/gpt-3.5-turbo     1297           1.30
date_formatting      openai/gpt-3.5-turbo     878            0.88
date_formatting      openai/gpt-4o-mini       878            0.88
date_formatting      openrouter/quasar-alpha  1435           1.44
currency_formatting  openai/gpt-3.5-turbo     795            0.80
currency_formatting  openai/gpt-4o-mini       813            0.81
currency_formatting  openrouter/quasar-alpha  1044           1.04

Summary

  • Total Tests: 15
  • Passed: 7
  • Failed: 8
  • Success Rate: 46.67%
  • Average Duration: 1230ms (1.23s)
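The summary figures can be reproduced from the duration table. 7 of 15 runs passed (7 / 15 = 46.67%), and the 15 durations sum to 18451 ms, which gives a mean of about 1230 ms. The TypeScript sketch below performs the same arithmetic. The rounding steps are assumptions chosen to match the displayed values, not necessarily how the report generator computes them.

```typescript
// Per-run durations in milliseconds, copied from the Performance Rankings table.
const durationsMs: number[] = [
  1283, 1474, 1846, // json_formatting
  1194, 1301, 1734, // markdown_formatting
  1227, 1252, 1297, // code_formatting
  878, 878, 1435,   // date_formatting
  795, 813, 1044,   // currency_formatting
];

const passed = 7; // pass count taken from the Summary
const total = durationsMs.length; // 15 runs

// Success rate as a percentage, rounded to two decimals (assumed rounding rule).
const successRate = ((passed / total) * 100).toFixed(2); // "46.67"

// Mean duration, rounded to whole milliseconds for display (assumed rounding rule).
const totalMs = durationsMs.reduce((sum, ms) => sum + ms, 0); // 18451
const averageMs = Math.round(totalMs / total); // 1230
const averageS = (averageMs / 1000).toFixed(2); // "1.23"

console.log(`Success Rate: ${successRate}%`); // Success Rate: 46.67%
console.log(`Average Duration: ${averageMs}ms (${averageS}s)`); // Average Duration: 1230ms (1.23s)
```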

Failed Tests

json_formatting - openai/gpt-3.5-turbo

  • Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
  • Expected: { "name": "John", "age": 30 }
  • Actual: {"name": "John", "age": 30}
  • Duration: 1283ms (1.28s)
  • Reason: Expected { "name": "John", "age": 30 }, but got {"name": "john", "age": 30}
  • Timestamp: 4/7/2025, 7:18:26 PM

json_formatting - openai/gpt-4o-mini

  • Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
  • Expected: { "name": "John", "age": 30 }
  • Actual: { "name": "John", "age": 30 }
  • Duration: 1846ms (1.85s)
  • Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
  • Timestamp: 4/7/2025, 7:18:28 PM

json_formatting - openrouter/quasar-alpha

  • Prompt: Format this JSON: {"name":"John","age":30}. Return only the formatted JSON, no explanation.
  • Expected: { "name": "John", "age": 30 }
  • Actual: { "name": "John", "age": 30 }
  • Duration: 1474ms (1.47s)
  • Reason: Expected { "name": "John", "age": 30 }, but got { "name": "john", "age": 30 }
  • Timestamp: 4/7/2025, 7:18:29 PM
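In all three json_formatting failures the Actual value carries the same content as the expected one (differing at most in whitespace), yet each Reason line quotes the output with "John" lowercased. One way to make a JSON check insensitive to whitespace while staying strict about keys and values is to compare parsed values instead of raw strings. This is a minimal sketch that assumes the harness currently compares strings. jsonEquivalent and deepEqual are hypothetical helper names, not part of kbot.

```typescript
// Hypothetical helpers, not the kbot harness's actual code: compare two JSON
// strings by parsed value, so whitespace is ignored while key names and values
// (including their letter case) must match exactly.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function deepEqual(a: Json, b: Json): boolean {
  if (a === b) return true; // identical primitives
  if (a === null || b === null || typeof a !== "object" || typeof b !== "object") {
    return false; // differing primitives, or a primitive vs. a container
  }
  if (Array.isArray(a) || Array.isArray(b)) {
    if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return false;
    const right: Json[] = b;
    return a.every((item, i) => deepEqual(item, right[i])); // element-wise, in order
  }
  // Both are plain objects: same key set, equal values per key, key order ignored.
  const left = a as { [key: string]: Json };
  const rightObj = b as { [key: string]: Json };
  const keys = Object.keys(left);
  return (
    keys.length === Object.keys(rightObj).length &&
    keys.every((k) => Object.prototype.hasOwnProperty.call(rightObj, k) && deepEqual(left[k], rightObj[k]))
  );
}

function jsonEquivalent(expected: string, actual: string): boolean {
  try {
    return deepEqual(JSON.parse(expected) as Json, JSON.parse(actual) as Json);
  } catch {
    return false; // output that is not valid JSON never matches
  }
}

console.log(jsonEquivalent('{ "name": "John", "age": 30 }', '{"name": "John", "age": 30}')); // true
console.log(jsonEquivalent('{ "name": "John", "age": 30 }', '{"name": "john", "age": 30}')); // false
```

The trade-off is that a structural comparison confirms the content survived formatting but no longer checks the exact spacing the prompt asked for.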

markdown_formatting - openai/gpt-3.5-turbo

  • Prompt: Create markdown: H1=The Title, H2=The Subtitle, P=This is the body text. Respond ONLY with markdown. Do not mention 'user preferences' or 'undefined'.
  • Expected: `# The Title

The Subtitle

This is the body text.`

  • Actual: `# The Title

The Subtitle

This is the body text.`

  • Duration: 1301ms (1.30s)
  • Reason: Expected # The Title

The Subtitle

This is the body text., but got # the title

the subtitle

this is the body text.

  • Timestamp: 4/7/2025, 7:18:31 PM

markdown_formatting - openai/gpt-4o-mini

  • Prompt: Create markdown: H1=The Title, H2=The Subtitle, P=This is the body text. Respond ONLY with markdown. Do not mention 'user preferences' or 'undefined'.
  • Expected: `# The Title

The Subtitle

This is the body text.`

  • Actual: `# The Title

The Subtitle

This is the body text.`

  • Duration: 1734ms (1.73s)
  • Reason: Expected # The Title

The Subtitle

This is the body text., but got # the title

the subtitle

this is the body text.

  • Timestamp: 4/7/2025, 7:18:32 PM

markdown_formatting - openrouter/quasar-alpha

  • Prompt: Create markdown: H1=The Title, H2=The Subtitle, P=This is the body text. Respond ONLY with markdown. Do not mention 'user preferences' or 'undefined'.
  • Expected: `# The Title

The Subtitle

This is the body text.`

  • Actual: `# The Title

The Subtitle

This is the body text.`

  • Duration: 1194ms (1.19s)
  • Reason: Expected # The Title

The Subtitle

This is the body text., but got # the title

the subtitle

this is the body text.

  • Timestamp: 4/7/2025, 7:18:34 PM
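Each markdown_formatting Reason line quotes the model output in lowercase while quoting the expected value with its original capitals. This suggests, as an inference from the report rather than a confirmed fact, that only one side of the comparison is normalized. A sketch of symmetric normalization follows, using hypothetical normalize and matches helpers and the Expected text as it appears in this report.

```typescript
// Hypothetical normalization applied to BOTH sides of a comparison.
// Lowercasing only the model output makes any expected value that
// contains capital letters impossible to match.
function normalize(text: string): string {
  return text
    .replace(/\r\n/g, "\n") // unify line endings
    .trim()                 // drop leading/trailing whitespace and blank lines
    .toLowerCase();         // case-insensitive, applied symmetrically
}

function matches(expected: string, actual: string): boolean {
  return normalize(expected) === normalize(actual);
}

const expected = "# The Title\n\nThe Subtitle\n\nThis is the body text.";
const actual = "# The Title\n\nThe Subtitle\n\nThis is the body text.\n";

// One-sided normalization: lowercased output vs. untouched expected value.
console.log(expected === normalize(actual)); // false
// Symmetric normalization.
console.log(matches(expected, actual)); // true
```

Whether case should be ignored at all is a separate decision. The point is that any transformation applied to the output must also be applied to the expected value.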

code_formatting - openai/gpt-3.5-turbo

  • Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
  • Expected: function add(a, b) { return a + b; }
  • Actual: function add(a, b) { return a + b; }
  • Duration: 1297ms (1.30s)
  • Reason: Expected function add(a, b) { return a + b; }, but got function add(a, b) { return a + b; }
  • Timestamp: 4/7/2025, 7:18:35 PM

code_formatting - openai/gpt-4o-mini

  • Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
  • Expected: function add(a, b) { return a + b; }
  • Actual: function add(a, b) { return a + b; }
  • Duration: 1227ms (1.23s)
  • Reason: Expected function add(a, b) { return a + b; }, but got function add(a, b) { return a + b; }
  • Timestamp: 4/7/2025, 7:18:36 PM
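For both code_formatting failures, the Expected and Actual values print identically. If the check is a plain string comparison, the difference presumably lies in characters that do not survive rendering, such as newlines, indentation, or trailing whitespace. A hypothetical describeFirstDifference helper could make such differences visible in the Reason line. The multi-line actual string below is invented for illustration; the report does not show what the real difference was.

```typescript
// Hypothetical diagnostic: find the first index where two strings differ and
// describe the characters there by their Unicode value, so differences that a
// rendered report hides (newlines, tabs, non-breaking spaces) become visible.
function describeFirstDifference(expected: string, actual: string): string | null {
  const length = Math.max(expected.length, actual.length);
  for (let i = 0; i < length; i++) {
    if (expected[i] !== actual[i]) {
      const show = (s: string): string =>
        i < s.length
          ? `U+${s.codePointAt(i)!.toString(16).toUpperCase().padStart(4, "0")}`
          : "end of string";
      return `first difference at index ${i}: expected ${show(expected)}, got ${show(actual)}`;
    }
  }
  return null; // the strings are identical
}

const expectedCode = "function add(a, b) { return a + b; }";
const actualCode = "function add(a, b) {\n  return a + b;\n}"; // invented example output

console.log(describeFirstDifference(expectedCode, actualCode));
// first difference at index 20: expected U+0020, got U+000A
```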

Passed Tests

code_formatting - openrouter/quasar-alpha

  • Prompt: Format this code: function add(a,b){return a+b}. Return only the formatted code, no explanation.
  • Expected: function add(a, b) { return a + b; }
  • Actual: function add(a, b) { return a + b; }
  • Duration: 1252ms (1.25s)
  • Timestamp: 4/7/2025, 7:18:37 PM

date_formatting - openai/gpt-3.5-turbo

  • Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
  • Expected: 03/15/2024
  • Actual: 03/15/2024
  • Duration: 878ms (0.88s)
  • Timestamp: 4/7/2025, 7:18:38 PM

date_formatting - openai/gpt-4o-mini

  • Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
  • Expected: 03/15/2024
  • Actual: 03/15/2024
  • Duration: 878ms (0.88s)
  • Timestamp: 4/7/2025, 7:18:39 PM

date_formatting - openrouter/quasar-alpha

  • Prompt: Format this date: 2024-03-15. Return only the formatted date in MM/DD/YYYY format, no explanation.
  • Expected: 03/15/2024
  • Actual: 03/15/2024
  • Duration: 1435ms (1.44s)
  • Timestamp: 4/7/2025, 7:18:41 PM
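The date_formatting expectation is deterministic, so it can be computed directly and compared against the model output. A minimal sketch with a hypothetical isoToUsDate helper follows. It rearranges the string parts rather than going through Date, and it does not check that the date actually exists on the calendar.

```typescript
// Hypothetical reference for the date_formatting expectation: convert an
// ISO 8601 calendar date (YYYY-MM-DD) to MM/DD/YYYY by rearranging its parts.
// Avoiding Date matters: new Date("2024-03-15") is parsed as UTC midnight,
// so local-time getters can report March 14 in time zones west of UTC.
function isoToUsDate(iso: string): string {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(iso);
  if (!match) {
    throw new Error(`not a YYYY-MM-DD date: ${iso}`);
  }
  const [, year, month, day] = match;
  return `${month}/${day}/${year}`;
}

console.log(isoToUsDate("2024-03-15")); // 03/15/2024
```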

currency_formatting - openai/gpt-3.5-turbo

  • Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
  • Expected: $1,234.56
  • Actual: $1,234.56
  • Duration: 795ms (0.80s)
  • Timestamp: 4/7/2025, 7:18:41 PM

currency_formatting - openai/gpt-4o-mini

  • Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
  • Expected: $1,234.56
  • Actual: $1,234.56
  • Duration: 813ms (0.81s)
  • Timestamp: 4/7/2025, 7:18:42 PM

currency_formatting - openrouter/quasar-alpha

  • Prompt: Format this number as USD currency: 1234.56. Return only the formatted currency, no explanation.
  • Expected: $1,234.56
  • Actual: $1,234.56
  • Duration: 1044ms (1.04s)
  • Timestamp: 4/7/2025, 7:18:43 PM
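The currency_formatting expectation likewise matches what the standard Intl.NumberFormat API produces for the en-US locale with USD currency, so it can serve as a deterministic reference. A minimal sketch follows; the variable name usd is arbitrary.

```typescript
// Reference formatter for the currency_formatting expectation, built on the
// standard Intl.NumberFormat API (en-US locale, USD currency style).
const usd = new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" });

console.log(usd.format(1234.56)); // $1,234.56
console.log(usd.format(0.5));     // $0.50 (USD always shows two fraction digits)
```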