mono/packages/kbot/tests/unit/reports/language.md
2025-06-28 10:37:04 +02:00

Language Operations Test Results

High Scores

Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|------|-------|---------------|--------------|
| summarization | openai/gpt-4o-mini | 1508 | 1.51 |
| summarization | anthropic/claude-sonnet-4 | 1621 | 1.62 |
| summarization | deepseek/deepseek-r1:free | 6000 | 6.00 |
| translation | openai/gpt-4o-mini | 521 | 0.52 |
| translation | anthropic/claude-sonnet-4 | 1250 | 1.25 |
| translation | deepseek/deepseek-r1:free | 2969 | 2.97 |
| grammar | openai/gpt-4o-mini | 623 | 0.62 |
| grammar | anthropic/claude-sonnet-4 | 1246 | 1.25 |
| grammar | deepseek/deepseek-r1:free | 4594 | 4.59 |
| language_detection | openai/gpt-4o-mini | 689 | 0.69 |
| language_detection | anthropic/claude-sonnet-4 | 1107 | 1.11 |
| language_detection | deepseek/deepseek-r1:free | 4218 | 4.22 |
| synonyms | openai/gpt-4o-mini | 661 | 0.66 |
| synonyms | anthropic/claude-sonnet-4 | 1639 | 1.64 |
| synonyms | deepseek/deepseek-r1:free | 3471 | 3.47 |

Summary

  • Total Tests: 15
  • Passed: 12
  • Failed: 3
  • Success Rate: 80.00%
  • Average Duration: 2141ms (2.14s)
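
The summary figures can be re-derived from the per-test durations in the rankings table. The following is a minimal sketch, assuming the average is the rounded arithmetic mean over all 15 runs; the variable names are illustrative and not taken from the kbot test suite.

```python
from statistics import mean

# Durations in ms, copied from the rankings table; each row lists
# gpt-4o-mini, claude-sonnet-4, deepseek-r1:free in that order.
DURATIONS_MS = {
    "summarization": [1508, 1621, 6000],
    "translation": [521, 1250, 2969],
    "grammar": [623, 1246, 4594],
    "language_detection": [689, 1107, 4218],
    "synonyms": [661, 1639, 3471],
}
# Per the Failed Tests section, every model failed the grammar test.
FAILED_TESTS = {"grammar"}

all_durations = [ms for row in DURATIONS_MS.values() for ms in row]
total = len(all_durations)
failed = sum(len(row) for test, row in DURATIONS_MS.items() if test in FAILED_TESTS)
passed = total - failed
success_rate = 100 * passed / total
average_ms = round(mean(all_durations))

print(f"Total: {total}, Passed: {passed}, Failed: {failed}")
print(f"Success Rate: {success_rate:.2f}%")
print(f"Average Duration: {average_ms}ms ({average_ms / 1000:.2f}s)")
```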

Failed Tests

grammar - anthropic/claude-sonnet-4

  • Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
  • Expected: I went to the store yesterday
  • Actual: I went to the store yesterday.
  • Duration: 1246ms (1.25s)
  • Reason: Expected I went to the store yesterday, but got I went to the store yesterday. (the output ends with a period that the expected value lacks)
  • Timestamp: 6/6/2025, 12:56:12 AM

grammar - openai/gpt-4o-mini

  • Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
  • Expected: I went to the store yesterday
  • Actual: "I went to the store yesterday."
  • Duration: 623ms (0.62s)
  • Reason: Expected I went to the store yesterday, but got "I went to the store yesterday." (the output is wrapped in double quotes and ends with a period)
  • Timestamp: 6/6/2025, 12:56:12 AM

grammar - deepseek/deepseek-r1:free

  • Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
  • Expected: I went to the store yesterday
  • Actual: I went to the store yesterday.
  • Duration: 4594ms (4.59s)
  • Reason: Expected I went to the store yesterday, but got I went to the store yesterday. (the output ends with a period that the expected value lacks)
  • Timestamp: 6/6/2025, 12:56:17 AM
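
All three grammar outputs differ from the expected string only by a trailing period and, in one case, wrapping double quotes. The report does not show how the suite compares outputs, so the following is only a hypothetical sketch of a normalization step that could be applied to both strings before comparing them. `normalize` is an illustrative name, not a kbot function.

```python
def normalize(answer: str) -> str:
    """Hypothetical helper: trim whitespace, one pair of wrapping quotes, and trailing periods."""
    text = answer.strip()
    # Remove a single pair of matching quotes around the whole answer.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1].strip()
    # Drop trailing periods so "…yesterday." matches "…yesterday".
    return text.rstrip(".")


expected = "I went to the store yesterday"
actual_outputs = [
    "I went to the store yesterday.",    # claude-sonnet-4 and deepseek-r1
    '"I went to the store yesterday."',  # gpt-4o-mini
]

for actual in actual_outputs:
    print(normalize(actual) == normalize(expected))
```

Whether this leniency is appropriate is a test-design choice. Adding the period to the expected value, or asking for no trailing punctuation in the prompt, are alternatives.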

Passed Tests

summarization - anthropic/claude-sonnet-4

  • Prompt: Summarize: "The quick brown fox jumps over the dog". Return only the summary, compact, no explanation.
  • Expected: A fox jumps over a dog
  • Actual: Fox jumps over dog.
  • Duration: 1621ms (1.62s)
  • Timestamp: 6/6/2025, 12:55:58 AM

summarization - openai/gpt-4o-mini

  • Prompt: Summarize: "The quick brown fox jumps over the dog". Return only the summary, compact, no explanation.
  • Expected: A fox jumps over a dog
  • Actual: A fox jumps over a dog.
  • Duration: 1508ms (1.51s)
  • Timestamp: 6/6/2025, 12:56:00 AM

summarization - deepseek/deepseek-r1:free

  • Prompt: Summarize: "The quick brown fox jumps over the dog". Return only the summary, compact, no explanation.
  • Expected: A fox jumps over a dog
  • Actual: A quick brown fox leaps over a dog.
  • Duration: 6000ms (6.00s)
  • Timestamp: 6/6/2025, 12:56:06 AM

translation - anthropic/claude-sonnet-4

  • Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
  • Expected: ¡Hola, mundo!
  • Actual: ¡Hola, mundo!
  • Duration: 1250ms (1.25s)
  • Timestamp: 6/6/2025, 12:56:07 AM

translation - openai/gpt-4o-mini

  • Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
  • Expected: ¡Hola, mundo!
  • Actual: ¡Hola, mundo!
  • Duration: 521ms (0.52s)
  • Timestamp: 6/6/2025, 12:56:07 AM

translation - deepseek/deepseek-r1:free

  • Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
  • Expected: ¡Hola, mundo!
  • Actual: ¡Hola, mundo!
  • Duration: 2969ms (2.97s)
  • Timestamp: 6/6/2025, 12:56:10 AM

language_detection - anthropic/claude-sonnet-4

  • Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
  • Expected: French
  • Actual: French
  • Duration: 1107ms (1.11s)
  • Timestamp: 6/6/2025, 12:56:18 AM

language_detection - openai/gpt-4o-mini

  • Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
  • Expected: French
  • Actual: French
  • Duration: 689ms (0.69s)
  • Timestamp: 6/6/2025, 12:56:19 AM

language_detection - deepseek/deepseek-r1:free

  • Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
  • Expected: French
  • Actual: French
  • Duration: 4218ms (4.22s)
  • Timestamp: 6/6/2025, 12:56:23 AM

synonyms - anthropic/claude-sonnet-4

  • Prompt: Provide a synonym for "happy". Return only the synonym, no explanation.
  • Expected: joyful
  • Actual: Joyful
  • Duration: 1639ms (1.64s)
  • Timestamp: 6/6/2025, 12:56:25 AM

synonyms - openai/gpt-4o-mini

  • Prompt: Provide a synonym for "happy". Return only the synonym, no explanation.
  • Expected: joyful
  • Actual: Joyful
  • Duration: 661ms (0.66s)
  • Timestamp: 6/6/2025, 12:56:25 AM

synonyms - deepseek/deepseek-r1:free

  • Prompt: Provide a synonym for "happy". Return only the synonym, no explanation.
  • Expected: joyful
  • Actual: Joyful
  • Duration: 3471ms (3.47s)
  • Timestamp: 6/6/2025, 12:56:29 AM