# Basic Operations Test Results

## Highscores

### Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|------|-------|--------------|--------------|
| addition | openai/gpt-4o-mini | 726 | 0.73 |
| addition | openrouter/quasar-alpha | 814 | 0.81 |
| addition | openai/gpt-3.5-turbo | 1157 | 1.16 |
| addition | deepseek/deepseek-r1-distill-qwen-14b:free | 4214 | 4.21 |
| multiplication | openrouter/quasar-alpha | 684 | 0.68 |
| multiplication | openai/gpt-3.5-turbo | 826 | 0.83 |
| multiplication | openai/gpt-4o-mini | 856 | 0.86 |
| multiplication | deepseek/deepseek-r1-distill-qwen-14b:free | 6184 | 6.18 |
| division | openai/gpt-3.5-turbo | 790 | 0.79 |
| division | openai/gpt-4o-mini | 823 | 0.82 |
| division | openrouter/quasar-alpha | 855 | 0.85 |
| division | deepseek/deepseek-r1-distill-qwen-14b:free | 1502 | 1.50 |
| web_content | deepseek/deepseek-r1-distill-qwen-14b:free | 263 | 0.26 |
| web_content | openai/gpt-3.5-turbo | 3311 | 3.31 |
| web_content | openrouter/quasar-alpha | 8305 | 8.30 |
| web_content | openai/gpt-4o-mini | 10048 | 10.05 |

## Summary

- Total Tests: 16
- Passed: 12
- Failed: 4
- Success Rate: 75.00%
- Average Duration: 2585ms (2.58s)

## Failed Tests

### web_content - openai/gpt-3.5-turbo

- Prompt: `Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.`
- Expected: `yes`
- Actual: ``
- Duration: 3311ms (3.31s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:26 PM

### web_content - deepseek/deepseek-r1-distill-qwen-14b:free

- Prompt: `Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.`
- Expected: `yes`
- Actual: ``
- Duration: 263ms (0.26s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:26 PM

### web_content - openai/gpt-4o-mini

- Prompt: `Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.`
- Expected: `yes`
- Actual: ``
- Duration: 10048ms (10.05s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:36 PM

### web_content - openrouter/quasar-alpha

- Prompt: `Check if the content contains a section about Human prehistory. Reply with "yes" if it does, "no" if it does not.`
- Expected: `yes`
- Actual: ``
- Duration: 8305ms (8.30s)
- Reason: Model returned empty response
- Timestamp: 4/6/2025, 5:42:44 PM

## Passed Tests

### addition - openai/gpt-3.5-turbo

- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 1157ms (1.16s)
- Timestamp: 4/6/2025, 5:42:04 PM

### addition - deepseek/deepseek-r1-distill-qwen-14b:free

- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 4214ms (4.21s)
- Timestamp: 4/6/2025, 5:42:08 PM

### addition - openai/gpt-4o-mini

- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 726ms (0.73s)
- Timestamp: 4/6/2025, 5:42:09 PM

### addition - openrouter/quasar-alpha

- Prompt: `add 5 and 3. Return only the number, no explanation.`
- Expected: `8`
- Actual: `8`
- Duration: 814ms (0.81s)
- Timestamp: 4/6/2025, 5:42:10 PM

### multiplication - openai/gpt-3.5-turbo

- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 826ms (0.83s)
- Timestamp: 4/6/2025, 5:42:11 PM

### multiplication - deepseek/deepseek-r1-distill-qwen-14b:free

- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 6184ms (6.18s)
- Timestamp: 4/6/2025, 5:42:17 PM

### multiplication - openai/gpt-4o-mini

- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 856ms (0.86s)
- Timestamp: 4/6/2025, 5:42:18 PM

### multiplication - openrouter/quasar-alpha

- Prompt: `multiply 8 and 3. Return only the number, no explanation.`
- Expected: `24`
- Actual: `24`
- Duration: 684ms (0.68s)
- Timestamp: 4/6/2025, 5:42:18 PM

### division - openai/gpt-3.5-turbo

- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 790ms (0.79s)
- Timestamp: 4/6/2025, 5:42:19 PM

### division - deepseek/deepseek-r1-distill-qwen-14b:free

- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 1502ms (1.50s)
- Timestamp: 4/6/2025, 5:42:21 PM

### division - openai/gpt-4o-mini

- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 823ms (0.82s)
- Timestamp: 4/6/2025, 5:42:22 PM

### division - openrouter/quasar-alpha

- Prompt: `divide 15 by 3. Return only the number, no explanation.`
- Expected: `5`
- Actual: `5`
- Duration: 855ms (0.85s)
- Timestamp: 4/6/2025, 5:42:22 PM
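
## Reproducing the Summary Figures

The Summary figures and the ordering in the Performance Rankings table can be rechecked from the per-test durations and pass/fail outcomes listed in this report. The following is a minimal sketch of that arithmetic, not the code of the tool that produced the report: the `RESULTS` tuple layout and the helper names `summarize` and `rank_by_duration` are assumptions introduced here for illustration.

```python
from statistics import mean

# (test, model, duration_ms, passed) copied from this report.
# The tuple layout is an assumption made for this sketch.
RESULTS = [
    ("addition", "openai/gpt-4o-mini", 726, True),
    ("addition", "openrouter/quasar-alpha", 814, True),
    ("addition", "openai/gpt-3.5-turbo", 1157, True),
    ("addition", "deepseek/deepseek-r1-distill-qwen-14b:free", 4214, True),
    ("multiplication", "openrouter/quasar-alpha", 684, True),
    ("multiplication", "openai/gpt-3.5-turbo", 826, True),
    ("multiplication", "openai/gpt-4o-mini", 856, True),
    ("multiplication", "deepseek/deepseek-r1-distill-qwen-14b:free", 6184, True),
    ("division", "openai/gpt-3.5-turbo", 790, True),
    ("division", "openai/gpt-4o-mini", 823, True),
    ("division", "openrouter/quasar-alpha", 855, True),
    ("division", "deepseek/deepseek-r1-distill-qwen-14b:free", 1502, True),
    ("web_content", "deepseek/deepseek-r1-distill-qwen-14b:free", 263, False),
    ("web_content", "openai/gpt-3.5-turbo", 3311, False),
    ("web_content", "openrouter/quasar-alpha", 8305, False),
    ("web_content", "openai/gpt-4o-mini", 10048, False),
]


def summarize(results):
    """Return the counts, success rate (%), and mean duration (ms)."""
    total = len(results)
    passed = sum(1 for _, _, _, ok in results if ok)
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": 100.0 * passed / total,
        "average_ms": mean(ms for _, _, ms, _ in results),
    }


def rank_by_duration(results, test):
    """Return the rows for one test, fastest first (pass/fail is ignored)."""
    rows = [row for row in results if row[0] == test]
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    s = summarize(RESULTS)
    print(f"Total Tests: {s['total']}")
    print(f"Passed: {s['passed']}")
    print(f"Failed: {s['failed']}")
    print(f"Success Rate: {s['success_rate']:.2f}%")
    print(f"Average Duration: {round(s['average_ms'])}ms")
    for _, model, ms, ok in rank_by_duration(RESULTS, "web_content"):
        print(f"web_content | {model} | {ms} | {'pass' if ok else 'fail'}")
```

Like the rankings table, `rank_by_duration` does not filter on outcome, so the `web_content` ordering ranks four failed runs. The fastest entry there (263ms) is an empty response, so it should not be read as the best result for that test.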