LLM Tools Test Results
Highscores
Performance Rankings (Duration)
| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| equation_solving | openai/gpt-4o | 4181 | 4.18 |
| file_operations | openai/gpt-4o | 7243 | 7.24 |
| directory_listing | openai/gpt-4o | 2274 | 2.27 |
Summary
- Total Tests: 3
- Passed: 0
- Failed: 3
- Success Rate: 0.00%
- Average Duration: 4566ms (4.57s)
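
The summary figures can be reproduced from the per-test durations in the rankings table. The following is a minimal Python sketch of that arithmetic; the `results` list and the `summarize` helper are hypothetical names introduced for illustration and are not taken from the test harness.

```python
# Test name, duration in ms, and pass/fail flag, copied from this report.
results = [
    ("equation_solving", 4181, False),
    ("file_operations", 7243, False),
    ("directory_listing", 2274, False),
]


def summarize(rows):
    """Compute totals, success rate (percent), and mean duration (ms)."""
    total = len(rows)
    passed = sum(1 for _, _, ok in rows if ok)
    average_ms = round(sum(ms for _, ms, _ in rows) / total) if total else 0
    success_rate = 100.0 * passed / total if total else 0.0
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": success_rate,
        "average_ms": average_ms,
    }


summary = summarize(results)
print(f"Total Tests: {summary['total']}")
print(f"Passed: {summary['passed']}")
print(f"Failed: {summary['failed']}")
print(f"Success Rate: {summary['success_rate']:.2f}%")
print(f"Average Duration: {summary['average_ms']}ms ({summary['average_ms'] / 1000:.2f}s)")
```

The three durations sum to 13698 ms, so the mean is 4566 ms, which agrees with the reported average.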
Failed Tests
equation_solving - openai/gpt-4o
- Prompt: Read the file at C:\Users\zx\Desktop\polymech\polymech-mono\packages\kbot\tests\units\tools.test.md and solve all equations. Return the results in the specified JSON format.
- Expected: [{"equation":"2x + 5 = 13","result":"4"},{"equation":"3y - 7 = 20","result":"9"},{"equation":"4z + 8 = 32","result":"6"}]
- Actual: I cannot directly access the file as it's on a local system. You can provide its contents, and I'll assist you in solving the equations.
- Duration: 4181ms (4.18s)
- Reason: Expected [{"equation":"2x + 5 = 13","result":"4"},{"equation":"3y - 7 = 20","result":"9"},{"equation":"4z + 8 = 32","result":"6"}], but got i cannot directly access the file as it's on a local system. you can provide its contents, and i'll assist you in solving the equations.
- Timestamp: 4/2/2025, 9:29:20 PM
file_operations - openai/gpt-4o
- Prompt: Write the following data to C:\Users\zx\Desktop\polymech\polymech-mono\packages\kbot\tests\unit\test-data\test-data.json and then read it back: {"test":"data","timestamp":"2025-04-02T19:29:20.998Z"}. Return the read data in JSON format.
- Expected: {"test":"data","timestamp":"2025-04-02T19:29:20.998Z"}
- Actual: {"test":"data","timestamp":"2025-04-02T19:29:20.998Z"}
- Duration: 7243ms (7.24s)
- Reason: Expected {"test":"data","timestamp":"2025-04-02T19:29:20.998Z"}, but got {"test":"data","timestamp":"2025-04-02t19:29:20.998z"}
- Timestamp: 4/2/2025, 9:29:28 PM
directory_listing - openai/gpt-4o
- Prompt: List all files in the directory C:\Users\zx\Desktop\polymech\polymech-mono\packages\kbot\tests\unit\test-data. Return the list as a JSON array of filenames.
- Expected: []
- Actual: {"files":[]}
- Duration: 2274ms (2.27s)
- Reason: Expected [], but got {"files":[]}
- Timestamp: 4/2/2025, 9:29:30 PM
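
In the file_operations case the Expected and Actual values are identical, yet the Reason line shows the actual value in lowercase (`t` and `z` in the timestamp). This suggests that only one side is lowercased before the comparison; that is an inference from this report, not something the report states. Under that assumption, a comparison that parses both sides as JSON, and otherwise normalizes both sides the same way, might look like the sketch below. The `matches` helper is a hypothetical name, not part of the harness.

```python
import json


def matches(expected: str, actual: str) -> bool:
    """Compare structurally when both sides are JSON, else as normalized text."""
    try:
        # Structural comparison ignores whitespace and key order,
        # and keeps string values case-sensitive on both sides.
        return json.loads(expected) == json.loads(actual)
    except json.JSONDecodeError:
        # Fallback for non-JSON responses: apply the same normalization
        # to both sides so that case alone cannot cause a mismatch.
        return expected.strip().lower() == actual.strip().lower()


file_ops = '{"test":"data","timestamp":"2025-04-02T19:29:20.998Z"}'
equations = (
    '[{"equation":"2x + 5 = 13","result":"4"},'
    '{"equation":"3y - 7 = 20","result":"9"},'
    '{"equation":"4z + 8 = 32","result":"6"}]'
)
refusal = "I cannot directly access the file as it's on a local system."

file_ops_ok = matches(file_ops, file_ops)
listing_ok = matches("[]", '{"files":[]}')
equations_ok = matches(equations, refusal)
print(file_ops_ok, listing_ok, equations_ok)
```

With this comparison, file_operations would count as a match, while directory_listing would still fail because an object wrapping an array is not the bare array the prompt asked for, and equation_solving would still fail because the model returned a refusal instead of JSON.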
Passed Tests
No passed tests