Language Operations Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| summarization | openai/gpt-4o-mini | 1508 | 1.51 |
| summarization | anthropic/claude-sonnet-4 | 1621 | 1.62 |
| summarization | deepseek/deepseek-r1:free | 6000 | 6.00 |
| translation | openai/gpt-4o-mini | 521 | 0.52 |
| translation | anthropic/claude-sonnet-4 | 1250 | 1.25 |
| translation | deepseek/deepseek-r1:free | 2969 | 2.97 |
| grammar | openai/gpt-4o-mini | 623 | 0.62 |
| grammar | anthropic/claude-sonnet-4 | 1246 | 1.25 |
| grammar | deepseek/deepseek-r1:free | 4594 | 4.59 |
| language_detection | openai/gpt-4o-mini | 689 | 0.69 |
| language_detection | anthropic/claude-sonnet-4 | 1107 | 1.11 |
| language_detection | deepseek/deepseek-r1:free | 4218 | 4.22 |
| synonyms | openai/gpt-4o-mini | 661 | 0.66 |
| synonyms | anthropic/claude-sonnet-4 | 1639 | 1.64 |
| synonyms | deepseek/deepseek-r1:free | 3471 | 3.47 |
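
The ranking above can be reproduced from raw timings. The following is a minimal sketch, assuming each run is stored as a `(test, model, duration_ms)` tuple; that record shape and the `rank_by_duration` helper are hypothetical, since the harness's own data structures are not shown in this report. It groups rows by test in first-seen order, sorts each group fastest-first, and derives the seconds column.

```python
from collections import defaultdict

# Hypothetical record shape: (test, model, duration_ms).
# Values are copied from two of the tests in the table above.
RESULTS = [
    ("summarization", "anthropic/claude-sonnet-4", 1621),
    ("summarization", "openai/gpt-4o-mini", 1508),
    ("summarization", "deepseek/deepseek-r1:free", 6000),
    ("translation", "deepseek/deepseek-r1:free", 2969),
    ("translation", "openai/gpt-4o-mini", 521),
    ("translation", "anthropic/claude-sonnet-4", 1250),
]


def rank_by_duration(results):
    """Group rows by test (first-seen order) and sort each group fastest-first."""
    groups = defaultdict(list)
    for test, model, ms in results:
        groups[test].append((model, ms))

    ranked = []
    for test, rows in groups.items():  # dicts preserve insertion order
        for model, ms in sorted(rows, key=lambda row: row[1]):
            # Fourth field is the duration in seconds, rounded to two decimals.
            ranked.append((test, model, ms, round(ms / 1000, 2)))
    return ranked


for test, model, ms, seconds in rank_by_duration(RESULTS):
    print(f"| {test} | {model} | {ms} | {seconds:.2f} |")
```
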
Summary
- Total Tests: 15
- Passed: 12
- Failed: 3
- Success Rate: 80.00%
- Average Duration: 2141ms (2.14s)
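
The summary figures follow directly from the per-run data. As a sketch, assuming each run is reduced to a `(duration_ms, passed)` pair (a hypothetical shape, as is the `summarize` helper), the totals, success rate, and average duration can be recomputed like this:

```python
# (duration_ms, passed) per run, copied from this report.
# The three grammar runs are the failures.
RUNS = [
    (1508, True), (1621, True), (6000, True),     # summarization
    (521, True), (1250, True), (2969, True),      # translation
    (623, False), (1246, False), (4594, False),   # grammar
    (689, True), (1107, True), (4218, True),      # language_detection
    (661, True), (1639, True), (3471, True),      # synonyms
]


def summarize(runs):
    """Return totals, success rate, and mean duration for a list of runs."""
    total = len(runs)
    passed = sum(1 for _, ok in runs if ok)
    avg_ms = sum(ms for ms, _ in runs) / total
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": f"{100 * passed / total:.2f}%",
        "avg_ms": round(avg_ms),
    }


print(summarize(RUNS))
```
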
Failed Tests
grammar - anthropic/claude-sonnet-4
- Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
- Expected: I went to the store yesterday
- Actual: I went to the store yesterday.
- Duration: 1246ms (1.25s)
- Reason: Expected `I went to the store yesterday`, but got `I went to the store yesterday.`
- Timestamp: 6/6/2025, 12:56:12 AM
grammar - openai/gpt-4o-mini
- Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
- Expected: I went to the store yesterday
- Actual: "I went to the store yesterday."
- Duration: 623ms (0.62s)
- Reason: Expected `I went to the store yesterday`, but got `"I went to the store yesterday."`
- Timestamp: 6/6/2025, 12:56:12 AM
grammar - deepseek/deepseek-r1:free
- Prompt: Correct the grammar in: "I goes to the store yesterday". Return only the corrected sentence, no explanation.
- Expected: I went to the store yesterday
- Actual: I went to the store yesterday.
- Duration: 4594ms (4.59s)
- Reason: Expected `I went to the store yesterday`, but got `I went to the store yesterday.`
- Timestamp: 6/6/2025, 12:56:17 AM
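
All three failures differ from the expected string only by a trailing period or surrounding quotation marks, which is consistent with a strict string comparison for this test. The harness's actual comparison logic is not shown in this report, so the following is only a sketch of a more lenient check. The `normalize` and `matches` helpers are hypothetical names introduced for illustration.

```python
def normalize(text: str) -> str:
    """Trim whitespace, wrapping quotes, and trailing sentence punctuation."""
    # Remove whitespace and quote characters from both ends.
    text = text.strip(" \t\r\n\"'")
    # Remove trailing sentence punctuation, plus any quote or space left behind it.
    return text.rstrip(".!?\"' ")


def matches(expected: str, actual: str) -> bool:
    """Compare two strings after normalization (hypothetical lenient check)."""
    return normalize(expected) == normalize(actual)


expected = "I went to the store yesterday"
for actual in (
    "I went to the store yesterday.",
    '"I went to the store yesterday."',
    "I goes to the store yesterday",
):
    print(repr(actual), matches(expected, actual))
```

Under this check the three grammar outputs recorded above would compare equal to the expected sentence, while an uncorrected sentence would still fail. The trade-off is that differences in final punctuation are ignored, which may not be acceptable if the test is meant to verify punctuation as well.
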
Passed Tests
summarization - anthropic/claude-sonnet-4
- Prompt: Summarize: "The quick brown fox jumps over the dog". Return only the summary, compact, no explanation.
- Expected: A fox jumps over a dog
- Actual: Fox jumps over dog.
- Duration: 1621ms (1.62s)
- Timestamp: 6/6/2025, 12:55:58 AM
summarization - openai/gpt-4o-mini
- Prompt: Summarize: "The quick brown fox jumps over the dog". Return only the summary, compact, no explanation.
- Expected: A fox jumps over a dog
- Actual: A fox jumps over a dog.
- Duration: 1508ms (1.51s)
- Timestamp: 6/6/2025, 12:56:00 AM
summarization - deepseek/deepseek-r1:free
- Prompt: Summarize: "The quick brown fox jumps over the dog". Return only the summary, compact, no explanation.
- Expected: A fox jumps over a dog
- Actual: A quick brown fox leaps over a dog.
- Duration: 6000ms (6.00s)
- Timestamp: 6/6/2025, 12:56:06 AM
translation - anthropic/claude-sonnet-4
- Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
- Expected: ¡Hola, mundo!
- Actual: ¡Hola, mundo!
- Duration: 1250ms (1.25s)
- Timestamp: 6/6/2025, 12:56:07 AM
translation - openai/gpt-4o-mini
- Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
- Expected: ¡Hola, mundo!
- Actual: ¡Hola, mundo!
- Duration: 521ms (0.52s)
- Timestamp: 6/6/2025, 12:56:07 AM
translation - deepseek/deepseek-r1:free
- Prompt: Translate "Hello, world!" to Spanish. Return only the translation, no explanation.
- Expected: ¡Hola, mundo!
- Actual: ¡Hola, mundo!
- Duration: 2969ms (2.97s)
- Timestamp: 6/6/2025, 12:56:10 AM
language_detection - anthropic/claude-sonnet-4
- Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
- Expected: French
- Actual: French
- Duration: 1107ms (1.11s)
- Timestamp: 6/6/2025, 12:56:18 AM
language_detection - openai/gpt-4o-mini
- Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
- Expected: French
- Actual: French
- Duration: 689ms (0.69s)
- Timestamp: 6/6/2025, 12:56:19 AM
language_detection - deepseek/deepseek-r1:free
- Prompt: Identify the language of: "Bonjour, comment allez-vous?". Return only the language name, no explanation.
- Expected: French
- Actual: French
- Duration: 4218ms (4.22s)
- Timestamp: 6/6/2025, 12:56:23 AM
synonyms - anthropic/claude-sonnet-4
- Prompt: Provide a synonym for "happy". Return only the synonym, no explanation.
- Expected: joyful
- Actual: Joyful
- Duration: 1639ms (1.64s)
- Timestamp: 6/6/2025, 12:56:25 AM
synonyms - openai/gpt-4o-mini
- Prompt: Provide a synonym for "happy". Return only the synonym, no explanation.
- Expected: joyful
- Actual: Joyful
- Duration: 661ms (0.66s)
- Timestamp: 6/6/2025, 12:56:25 AM
synonyms - deepseek/deepseek-r1:free
- Prompt: Provide a synonym for "happy". Return only the synonym, no explanation.
- Expected: joyful
- Actual: Joyful
- Duration: 3471ms (3.47s)
- Timestamp: 6/6/2025, 12:56:29 AM