# Math Operations Test Results

## Highscores

### Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|------|-------|--------------|--------------|
| quadratic | openai/gpt-4o-mini | 514 | 0.51 |
| quadratic | anthropic/claude-sonnet-4 | 1120 | 1.12 |
| factorial | openai/gpt-4o-mini | 512 | 0.51 |
| factorial | anthropic/claude-sonnet-4 | 877 | 0.88 |
| fibonacci | openai/gpt-4o-mini | 494 | 0.49 |
| fibonacci | anthropic/claude-sonnet-4 | 4093 | 4.09 |
| square_root | openai/gpt-4o-mini | 483 | 0.48 |
| square_root | anthropic/claude-sonnet-4 | 969 | 0.97 |
| power | anthropic/claude-sonnet-4 | 1129 | 1.13 |
| power | openai/gpt-4o-mini | 1308 | 1.31 |

## Summary

- Total Tests: 15
- Passed: 12
- Failed: 3
- Success Rate: 80.00%
- Average Duration: 1189ms (1.19s)

## Failed Tests

### quadratic - anthropic/claude-sonnet-4

- Prompt: `Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.`
- Expected: `-3,-2`
- Actual: `-2,-3`
- Duration: 1120ms (1.12s)
- Reason: Expected -3,-2, but got -2,-3
- Timestamp: 3/19/2026, 4:39:00 PM

### quadratic - openai/gpt-4o-mini

- Prompt: `Solve the quadratic equation x² + 5x + 6 = 0. Respond ONLY with the solutions as comma-separated numbers (e.g., -3,-2). No other text.`
- Expected: `-3,-2`
- Actual: `-2,-3`
- Duration: 514ms (0.51s)
- Reason: Expected -3,-2, but got -2,-3
- Timestamp: 3/19/2026, 4:38:59 PM

## Passed Tests

### factorial - anthropic/claude-sonnet-4

- Prompt: `Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.`
- Expected: `120`
- Actual: `120`
- Duration: 877ms (0.88s)
- Timestamp: 3/19/2026, 4:39:03 PM

### factorial - openai/gpt-4o-mini

- Prompt: `Calculate 5! (factorial of 5). Respond ONLY with the final numerical answer. No explanation, no other text.`
- Expected: `120`
- Actual: `120`
- Duration: 512ms (0.51s)
- Timestamp: 3/19/2026, 4:39:02 PM

### fibonacci - anthropic/claude-sonnet-4

- Prompt: `Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.`
- Expected: `8`
- Actual: `8`
- Duration: 4093ms (4.09s)
- Timestamp: 3/19/2026, 4:39:08 PM

### fibonacci - openai/gpt-4o-mini

- Prompt: `Calculate the 6th number in the Fibonacci sequence (assuming F(1)=1, F(2)=1). Respond ONLY with the final numerical answer. No other text.`
- Expected: `8`
- Actual: `8`
- Duration: 494ms (0.49s)
- Timestamp: 3/19/2026, 4:39:04 PM

### square_root - anthropic/claude-sonnet-4

- Prompt: `Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.`
- Expected: `4`
- Actual: `4`
- Duration: 969ms (0.97s)
- Timestamp: 3/19/2026, 4:39:11 PM

### square_root - openai/gpt-4o-mini

- Prompt: `Calculate the square root of 16. Respond ONLY with the final numerical answer. No other text.`
- Expected: `4`
- Actual: `4`
- Duration: 483ms (0.48s)
- Timestamp: 3/19/2026, 4:39:10 PM

### power - anthropic/claude-sonnet-4

- Prompt: `Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.`
- Expected: `8`
- Actual: `8`
- Duration: 1129ms (1.13s)
- Timestamp: 3/19/2026, 4:39:15 PM

### power - openai/gpt-4o-mini

- Prompt: `Calculate 2 raised to the power of 3. Respond ONLY with the final numerical answer. No other text.`
- Expected: `8`
- Actual: `8`
- Duration: 1308ms (1.31s)
- Timestamp: 3/19/2026, 4:39:14 PM
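
## Note on the quadratic failures

Both failed quadratic runs returned `-2,-3` against an expected `-3,-2`, which is the same pair of roots in a different order. The report does not show how the harness compares answers; the `Reason` lines suggest an exact string match, but that is an assumption. Under that assumption, a minimal sketch of an order-insensitive check might look like the following. The function names `parse_roots` and `same_solution_set` are hypothetical and not part of the test harness.

```python
def parse_roots(answer: str) -> list[float]:
    """Parse a comma-separated answer such as '-3,-2' into a sorted list of floats.

    Raises ValueError if any part is not a plain number.
    """
    return sorted(float(part) for part in answer.split(","))


def same_solution_set(expected: str, actual: str, tol: float = 1e-9) -> bool:
    """Return True when both answers contain the same numbers, in any order."""
    try:
        exp = parse_roots(expected)
        act = parse_roots(actual)
    except ValueError:
        # Extra text in the answer (e.g. "x = -2") counts as a mismatch.
        return False
    if len(exp) != len(act):
        return False
    # Both lists are sorted, so a pairwise comparison is enough.
    return all(abs(e - a) <= tol for e, a in zip(exp, act))


if __name__ == "__main__":
    print("-3,-2" == "-2,-3")                   # exact string match: False
    print(same_solution_set("-3,-2", "-2,-3"))  # order-insensitive: True
```

With a check like this, `-2,-3` would count as matching `-3,-2`, while answers that carry extra text or a different number of roots would still be rejected. Whether order should matter is a design choice for the test: the prompt's example (`-3,-2`) shows one ordering but does not state that ordering is required.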