Language Operations Test Results

Highscores

Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|------|-------|---------------|--------------|
| translation | openai/gpt-4o-mini | 769 | 0.77 |
| translation | openrouter/quasar-alpha | 832 | 0.83 |
| translation | openai/gpt-3.5-turbo | 1322 | 1.32 |
| translation | deepseek/deepseek-r1-distill-qwen-14b:free | 3972 | 3.97 |
| grammar | openai/gpt-3.5-turbo | 819 | 0.82 |
| grammar | openrouter/quasar-alpha | 1011 | 1.01 |
| grammar | openai/gpt-4o-mini | 1113 | 1.11 |
| grammar | deepseek/deepseek-r1-distill-qwen-14b:free | 7029 | 7.03 |
| summarization | openai/gpt-3.5-turbo | 832 | 0.83 |
| summarization | openrouter/quasar-alpha | 832 | 0.83 |
| summarization | openai/gpt-4o-mini | 1048 | 1.05 |
| summarization | deepseek/deepseek-r1-distill-qwen-14b:free | 4994 | 4.99 |
| language_detection | deepseek/deepseek-r1-distill-qwen-14b:free | 419 | 0.42 |
| language_detection | openai/gpt-3.5-turbo | 741 | 0.74 |
| language_detection | openrouter/quasar-alpha | 886 | 0.89 |
| language_detection | openai/gpt-4o-mini | 887 | 0.89 |
| synonyms | deepseek/deepseek-r1-distill-qwen-14b:free | 411 | 0.41 |
| synonyms | openai/gpt-4o-mini | 644 | 0.64 |
| synonyms | openai/gpt-3.5-turbo | 715 | 0.71 |
| synonyms | openrouter/quasar-alpha | 716 | 0.72 |

Summary

  • Total Tests: 20
  • Passed: 2
  • Failed: 18
  • Success Rate: 10.00%
  • Average Duration: 1500ms (1.50s)
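
The summary figures above can be recomputed from the per-test durations and pass/fail outcomes in this report. The following is a minimal sketch, not the harness's actual code: the `TestResult` shape and the `row` and `summarize` helpers are hypothetical names introduced for illustration.

```typescript
// Hypothetical result shape; the real kbot test harness may store results differently.
interface TestResult {
  test: string;
  model: string;
  durationMs: number;
  passed: boolean;
}

// Small helper to keep the data rows compact (illustrative only).
const row = (test: string, model: string, durationMs: number, passed = false): TestResult =>
  ({ test, model, durationMs, passed });

const GPT35 = "openai/gpt-3.5-turbo";
const GPT4O = "openai/gpt-4o-mini";
const QUASAR = "openrouter/quasar-alpha";
const DEEPSEEK = "deepseek/deepseek-r1-distill-qwen-14b:free";

// Durations and outcomes copied from this report.
const results: TestResult[] = [
  row("translation", GPT4O, 769), row("translation", QUASAR, 832),
  row("translation", GPT35, 1322), row("translation", DEEPSEEK, 3972),
  row("grammar", GPT35, 819), row("grammar", QUASAR, 1011),
  row("grammar", GPT4O, 1113), row("grammar", DEEPSEEK, 7029),
  row("summarization", GPT35, 832), row("summarization", QUASAR, 832),
  row("summarization", GPT4O, 1048), row("summarization", DEEPSEEK, 4994),
  row("language_detection", DEEPSEEK, 419), row("language_detection", GPT35, 741),
  row("language_detection", QUASAR, 886), row("language_detection", GPT4O, 887),
  row("synonyms", DEEPSEEK, 411), row("synonyms", GPT4O, 644, true),
  row("synonyms", GPT35, 715), row("synonyms", QUASAR, 716, true),
];

function summarize(all: TestResult[]) {
  const total = all.length;
  const passed = all.filter((r) => r.passed).length;
  const totalMs = all.reduce((sum, r) => sum + r.durationMs, 0);
  return {
    total,
    passed,
    failed: total - passed,
    // Percentage formatted with two decimals, as in the summary list.
    successRate: total === 0 ? "0.00" : ((passed / total) * 100).toFixed(2),
    // Mean duration rounded to the nearest millisecond.
    averageMs: total === 0 ? 0 : Math.round(totalMs / total),
  };
}

console.log(summarize(results));
```

The durations sum to 29992 ms across 20 tests, so the mean is 1499.6 ms, which rounds to the reported 1500 ms; 2 passes out of 20 gives the reported 10.00%.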

Failed Tests

translation - openai/gpt-3.5-turbo

  • Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
  • Expected: ¡Hola, mundo!
  • Actual: ¡Hola, mundo!
  • Duration: 1322ms (1.32s)
  • Reason: Expected ¡Hola, mundo!, but got ¡hola, mundo!
  • Timestamp: 4/7/2025, 12:28:08 AM

translation - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
  • Expected: ¡Hola, mundo!
  • Actual: ¡Hola, mundo!
  • Duration: 3972ms (3.97s)
  • Reason: Expected ¡Hola, mundo!, but got ¡hola, mundo!
  • Timestamp: 4/7/2025, 12:28:12 AM

translation - openai/gpt-4o-mini

  • Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
  • Expected: ¡Hola, mundo!
  • Actual: ¡Hola, mundo!
  • Duration: 769ms (0.77s)
  • Reason: Expected ¡Hola, mundo!, but got ¡hola, mundo!
  • Timestamp: 4/7/2025, 12:28:12 AM

translation - openrouter/quasar-alpha

  • Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
  • Expected: ¡Hola, mundo!
  • Actual: ¡Hola, mundo!
  • Duration: 832ms (0.83s)
  • Reason: Expected ¡Hola, mundo!, but got ¡hola, mundo!
  • Timestamp: 4/7/2025, 12:28:13 AM

grammar - openai/gpt-3.5-turbo

  • Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
  • Expected: I went to the store yesterday
  • Actual: **Corrected Sentence:** I went to the store yesterday.
  • Duration: 819ms (0.82s)
  • Reason: Expected I went to the store yesterday, but got corrected sentence: i went to the store yesterday.
  • Timestamp: 4/7/2025, 12:28:14 AM

grammar - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
  • Expected: I went to the store yesterday
  • Actual: I went to the store yesterday.
  • Duration: 7029ms (7.03s)
  • Reason: Expected I went to the store yesterday, but got i went to the store yesterday.
  • Timestamp: 4/7/2025, 12:28:21 AM

grammar - openai/gpt-4o-mini

  • Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
  • Expected: I went to the store yesterday
  • Actual: I went to the store yesterday.
  • Duration: 1113ms (1.11s)
  • Reason: Expected I went to the store yesterday, but got i went to the store yesterday.
  • Timestamp: 4/7/2025, 12:28:22 AM

grammar - openrouter/quasar-alpha

  • Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
  • Expected: I went to the store yesterday
  • Actual: I went to the store yesterday.
  • Duration: 1011ms (1.01s)
  • Reason: Expected I went to the store yesterday, but got i went to the store yesterday.
  • Timestamp: 4/7/2025, 12:28:23 AM

summarization - openai/gpt-3.5-turbo

  • Prompt: Summarize: "The quick brown fox jumps over the lazy dog". Return only the summary, no explanation.
  • Expected: A fox jumps over a dog
  • Actual: Summary: The quick brown fox jumps over the lazy dog.
  • Duration: 832ms (0.83s)
  • Reason: Expected A fox jumps over a dog, but got summary: the quick brown fox jumps over the lazy dog.
  • Timestamp: 4/7/2025, 12:28:24 AM

summarization - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Summarize: "The quick brown fox jumps over the lazy dog". Return only the summary, no explanation.
  • Expected: A fox jumps over a dog
  • Actual: A quick brown fox jumps over a lazy dog.
  • Duration: 4994ms (4.99s)
  • Reason: Expected A fox jumps over a dog, but got a quick brown fox jumps over a lazy dog.
  • Timestamp: 4/7/2025, 12:28:29 AM

summarization - openai/gpt-4o-mini

  • Prompt: Summarize: "The quick brown fox jumps over the lazy dog". Return only the summary, no explanation.
  • Expected: A fox jumps over a dog
  • Actual: A swift fox leaps over a sluggish dog.
  • Duration: 1048ms (1.05s)
  • Reason: Expected A fox jumps over a dog, but got a swift fox leaps over a sluggish dog.
  • Timestamp: 4/7/2025, 12:28:30 AM

summarization - openrouter/quasar-alpha

  • Prompt: Summarize: "The quick brown fox jumps over the lazy dog". Return only the summary, no explanation.
  • Expected: A fox jumps over a dog
  • Actual: A fox jumps over a dog.
  • Duration: 832ms (0.83s)
  • Reason: Expected A fox jumps over a dog, but got a fox jumps over a dog.
  • Timestamp: 4/7/2025, 12:28:31 AM

language_detection - openai/gpt-3.5-turbo

  • Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
  • Expected: French
  • Actual: French
  • Duration: 741ms (0.74s)
  • Reason: Expected French, but got french
  • Timestamp: 4/7/2025, 12:28:32 AM

language_detection - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
  • Expected: French
  • Actual: ``
  • Duration: 419ms (0.42s)
  • Reason: Model returned empty response
  • Timestamp: 4/7/2025, 12:28:32 AM

language_detection - openai/gpt-4o-mini

  • Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
  • Expected: French
  • Actual: French
  • Duration: 887ms (0.89s)
  • Reason: Expected French, but got french
  • Timestamp: 4/7/2025, 12:28:33 AM

language_detection - openrouter/quasar-alpha

  • Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
  • Expected: French
  • Actual: French
  • Duration: 886ms (0.89s)
  • Reason: Expected French, but got french
  • Timestamp: 4/7/2025, 12:28:34 AM

synonyms - openai/gpt-3.5-turbo

  • Prompt: Provide a synonym for "happy". Return only the synonym, no explanation.
  • Expected: joyful
  • Actual: Delighted
  • Duration: 715ms (0.71s)
  • Reason: Expected joyful, but got delighted
  • Timestamp: 4/7/2025, 12:28:35 AM

synonyms - deepseek/deepseek-r1-distill-qwen-14b:free

  • Prompt: Provide a synonym for "happy". Return only the synonym, no explanation.
  • Expected: joyful
  • Actual: ``
  • Duration: 411ms (0.41s)
  • Reason: Model returned empty response
  • Timestamp: 4/7/2025, 12:28:35 AM
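
Several Reason lines above quote the actual output in lowercase while the expected value keeps its capitalization (for example, `French` against `french`). This suggests that only one side is normalized before comparison, though the harness code is not shown here, so that is an assumption. A minimal sketch of a symmetric comparison, using hypothetical helpers `normalize` and `matchesExpected`, could look like this:

```typescript
// Hypothetical helpers; not taken from the kbot test harness.

// Normalize a model response or an expected value for comparison:
// trim whitespace, lowercase, and drop trailing periods.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\.+$/, "");
}

// Compare both sides after applying the same normalization.
// An empty response never matches.
function matchesExpected(expected: string, actual: string): boolean {
  const normalizedActual = normalize(actual);
  return normalizedActual.length > 0 && normalizedActual === normalize(expected);
}

console.log(matchesExpected("¡Hola, mundo!", "¡Hola, mundo!"));
console.log(matchesExpected("I went to the store yesterday", "I went to the store yesterday."));
console.log(matchesExpected("joyful", "Delighted"));
```

Under this sketch, responses that differ from the expected value only in case or a trailing period would match, while paraphrases, different synonyms, label prefixes such as `Corrected Sentence:`, and empty responses would still fail. Whether exact-match scoring is appropriate for open-ended tasks such as summarization and synonyms is a separate design question.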

Passed Tests

synonyms - openai/gpt-4o-mini

  • Prompt: Provide a synonym for "happy". Return only the synonym, no explanation.
  • Expected: joyful
  • Actual: Joyful
  • Duration: 644ms (0.64s)
  • Timestamp: 4/7/2025, 12:28:36 AM

synonyms - openrouter/quasar-alpha

  • Prompt: Provide a synonym for "happy". Return only the synonym, no explanation.
  • Expected: joyful
  • Actual: Joyful
  • Duration: 716ms (0.72s)
  • Timestamp: 4/7/2025, 12:28:37 AM