Language Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| translation | openai/gpt-4o-mini | 769 | 0.77 |
| translation | openrouter/quasar-alpha | 832 | 0.83 |
| translation | openai/gpt-3.5-turbo | 1322 | 1.32 |
| translation | deepseek/deepseek-r1-distill-qwen-14b:free | 3972 | 3.97 |
| grammar | openai/gpt-3.5-turbo | 819 | 0.82 |
| grammar | openrouter/quasar-alpha | 1011 | 1.01 |
| grammar | openai/gpt-4o-mini | 1113 | 1.11 |
| grammar | deepseek/deepseek-r1-distill-qwen-14b:free | 7029 | 7.03 |
| summarization | openai/gpt-3.5-turbo | 832 | 0.83 |
| summarization | openrouter/quasar-alpha | 832 | 0.83 |
| summarization | openai/gpt-4o-mini | 1048 | 1.05 |
| summarization | deepseek/deepseek-r1-distill-qwen-14b:free | 4994 | 4.99 |
| language_detection | deepseek/deepseek-r1-distill-qwen-14b:free | 419 | 0.42 |
| language_detection | openai/gpt-3.5-turbo | 741 | 0.74 |
| language_detection | openrouter/quasar-alpha | 886 | 0.89 |
| language_detection | openai/gpt-4o-mini | 887 | 0.89 |
| synonyms | deepseek/deepseek-r1-distill-qwen-14b:free | 411 | 0.41 |
| synonyms | openai/gpt-4o-mini | 644 | 0.64 |
| synonyms | openai/gpt-3.5-turbo | 715 | 0.71 |
| synonyms | openrouter/quasar-alpha | 716 | 0.72 |
Summary
- Total Tests: 20
- Passed: 2
- Failed: 18
- Success Rate: 10.00%
- Average Duration: 1500ms (1.50s)
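
The summary figures can be re-derived from the Performance Rankings table. The following is a minimal sketch, not the harness's own code: the `durations_ms` dictionary and the `summarize` helper are hypothetical names introduced here, and the pass count of 2 is taken from the Passed Tests section.

```python
from statistics import mean

# Durations in milliseconds, copied from the Performance Rankings table.
durations_ms = {
    "translation": [769, 832, 1322, 3972],
    "grammar": [819, 1011, 1113, 7029],
    "summarization": [832, 832, 1048, 4994],
    "language_detection": [419, 741, 886, 887],
    "synonyms": [411, 644, 715, 716],
}


def summarize(durations_by_test, passed):
    """Return (total, failed, success_rate_percent, average_ms).

    Hypothetical helper: it only reproduces the arithmetic behind the summary.
    """
    all_durations = [d for values in durations_by_test.values() for d in values]
    total = len(all_durations)
    failed = total - passed
    success_rate = 100.0 * passed / total
    average_ms = mean(all_durations)
    return total, failed, success_rate, average_ms


total, failed, success_rate, average_ms = summarize(durations_ms, passed=2)
print(f"Total Tests: {total}")
print(f"Failed: {failed}")
print(f"Success Rate: {success_rate:.2f}%")
print(f"Average Duration: {average_ms:.0f}ms ({average_ms / 1000:.2f}s)")
```

The unrounded mean is 1499.6 ms, which rounds to the 1500ms (1.50s) reported in the summary.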
Failed Tests
translation - openai/gpt-3.5-turbo
- Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
- Expected: ¡Hola, mundo!
- Actual: ¡Hola, mundo!
- Duration: 1322ms (1.32s)
- Reason: Expected ¡Hola, mundo!, but got ¡hola, mundo!
- Timestamp: 4/7/2025, 12:28:08 AM
translation - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
- Expected: ¡Hola, mundo!
- Actual: ¡Hola, mundo!
- Duration: 3972ms (3.97s)
- Reason: Expected ¡Hola, mundo!, but got ¡hola, mundo!
- Timestamp: 4/7/2025, 12:28:12 AM
translation - openai/gpt-4o-mini
- Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
- Expected: ¡Hola, mundo!
- Actual: ¡Hola, mundo!
- Duration: 769ms (0.77s)
- Reason: Expected ¡Hola, mundo!, but got ¡hola, mundo!
- Timestamp: 4/7/2025, 12:28:12 AM
translation - openrouter/quasar-alpha
- Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
- Expected: ¡Hola, mundo!
- Actual: ¡Hola, mundo!
- Duration: 832ms (0.83s)
- Reason: Expected ¡Hola, mundo!, but got ¡hola, mundo!
- Timestamp: 4/7/2025, 12:28:13 AM
grammar - openai/gpt-3.5-turbo
- Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
- Expected: I went to the store yesterday
- Actual: **Corrected Sentence:** I went to the store yesterday.
- Duration: 819ms (0.82s)
- Reason: Expected I went to the store yesterday, but got corrected sentence: i went to the store yesterday.
- Timestamp: 4/7/2025, 12:28:14 AM
grammar - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
- Expected: I went to the store yesterday
- Actual: I went to the store yesterday.
- Duration: 7029ms (7.03s)
- Reason: Expected I went to the store yesterday, but got i went to the store yesterday.
- Timestamp: 4/7/2025, 12:28:21 AM
grammar - openai/gpt-4o-mini
- Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
- Expected: I went to the store yesterday
- Actual: I went to the store yesterday.
- Duration: 1113ms (1.11s)
- Reason: Expected I went to the store yesterday, but got i went to the store yesterday.
- Timestamp: 4/7/2025, 12:28:22 AM
grammar - openrouter/quasar-alpha
- Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
- Expected: I went to the store yesterday
- Actual: I went to the store yesterday.
- Duration: 1011ms (1.01s)
- Reason: Expected I went to the store yesterday, but got i went to the store yesterday.
- Timestamp: 4/7/2025, 12:28:23 AM
summarization - openai/gpt-3.5-turbo
- Prompt: Summarize: "The quick brown fox jumps over the lazy dog". Return only the summary, no explanation.
- Expected: A fox jumps over a dog
- Actual: Summary: The quick brown fox jumps over the lazy dog.
- Duration: 832ms (0.83s)
- Reason: Expected A fox jumps over a dog, but got summary: the quick brown fox jumps over the lazy dog.
- Timestamp: 4/7/2025, 12:28:24 AM
summarization - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Summarize: "The quick brown fox jumps over the lazy dog". Return only the summary, no explanation.
- Expected: A fox jumps over a dog
- Actual: A quick brown fox jumps over a lazy dog.
- Duration: 4994ms (4.99s)
- Reason: Expected A fox jumps over a dog, but got a quick brown fox jumps over a lazy dog.
- Timestamp: 4/7/2025, 12:28:29 AM
summarization - openai/gpt-4o-mini
- Prompt: Summarize: "The quick brown fox jumps over the lazy dog". Return only the summary, no explanation.
- Expected: A fox jumps over a dog
- Actual: A swift fox leaps over a sluggish dog.
- Duration: 1048ms (1.05s)
- Reason: Expected A fox jumps over a dog, but got a swift fox leaps over a sluggish dog.
- Timestamp: 4/7/2025, 12:28:30 AM
summarization - openrouter/quasar-alpha
- Prompt: Summarize: "The quick brown fox jumps over the lazy dog". Return only the summary, no explanation.
- Expected: A fox jumps over a dog
- Actual: A fox jumps over a dog.
- Duration: 832ms (0.83s)
- Reason: Expected A fox jumps over a dog, but got a fox jumps over a dog.
- Timestamp: 4/7/2025, 12:28:31 AM
language_detection - openai/gpt-3.5-turbo
- Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
- Expected: French
- Actual: French
- Duration: 741ms (0.74s)
- Reason: Expected French, but got french
- Timestamp: 4/7/2025, 12:28:32 AM
language_detection - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
- Expected: French
- Actual: ``
- Duration: 419ms (0.42s)
- Reason: Model returned empty response
- Timestamp: 4/7/2025, 12:28:32 AM
language_detection - openai/gpt-4o-mini
- Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
- Expected: French
- Actual: French
- Duration: 887ms (0.89s)
- Reason: Expected French, but got french
- Timestamp: 4/7/2025, 12:28:33 AM
language_detection - openrouter/quasar-alpha
- Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
- Expected: French
- Actual: French
- Duration: 886ms (0.89s)
- Reason: Expected French, but got french
- Timestamp: 4/7/2025, 12:28:34 AM
synonyms - openai/gpt-3.5-turbo
- Prompt: Provide a synonym for "happy". Return only the synonym, no explanation.
- Expected: joyful
- Actual: Delighted
- Duration: 715ms (0.71s)
- Reason: Expected joyful, but got delighted
- Timestamp: 4/7/2025, 12:28:35 AM
synonyms - deepseek/deepseek-r1-distill-qwen-14b:free
- Prompt: Provide a synonym for "happy". Return only the synonym, no explanation.
- Expected: joyful
- Actual: ``
- Duration: 411ms (0.41s)
- Reason: Model returned empty response
- Timestamp: 4/7/2025, 12:28:35 AM
Passed Tests
synonyms - openai/gpt-4o-mini
- Prompt: Provide a synonym for "happy". Return only the synonym, no explanation.
- Expected: joyful
- Actual: Joyful
- Duration: 644ms (0.64s)
- Timestamp: 4/7/2025, 12:28:36 AM
synonyms - openrouter/quasar-alpha
- Prompt:
Provide a synonym for "happy". Return only the synonym, no explanation. - Expected:
joyful - Actual:
Joyful - Duration: 716ms (0.72s)
- Timestamp: 4/7/2025, 12:28:37 AM
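
The Reason lines in the failed entries print the actual output in lowercase while the expected string keeps its original case. That would be consistent with a comparison that normalizes only one side, and it may explain why `¡Hola, mundo!` and `French` fail against identical-looking responses while `Joyful` passes against the lowercase `joyful`. This is an inference from the report, not confirmed harness behavior. The sketch below shows one way to normalize both sides before comparing; `normalize` and `matches` are hypothetical names, and stripping `*` markers and a trailing period is an assumption about what should count as equivalent.

```python
def normalize(text: str) -> str:
    """Remove '*' emphasis markers, trim whitespace, casefold, drop trailing periods."""
    text = text.replace("*", "")
    text = text.strip().casefold()
    return text.rstrip(".").strip()


def matches(expected: str, actual: str) -> bool:
    # Normalize BOTH sides so a case difference alone cannot cause a failure.
    return normalize(expected) == normalize(actual)


# Pairs taken from the report above.
print(matches("¡Hola, mundo!", "¡Hola, mundo!"))   # True
print(matches("French", "French"))                 # True
print(matches("I went to the store yesterday",
              "I went to the store yesterday."))   # True
print(matches("joyful", "Delighted"))              # False
```

Under this sketch, responses that differ in content rather than form would still fail: the `Delighted` synonym, the paraphrased summaries, and the answers prefixed with a label such as `Corrected Sentence:`.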