mono/packages/kbot/tests/unit/reports/tools.md
2025-04-02 21:29:46 +02:00


LLM Tools Test Results

Highscores

Performance Rankings (Duration)

| Test | Model | Duration (ms) | Duration (s) |
|---|---|---|---|
| equation_solving | openai/gpt-4o | 4181 | 4.18 |
| file_operations | openai/gpt-4o | 7243 | 7.24 |
| directory_listing | openai/gpt-4o | 2274 | 2.27 |
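
The rows above appear in execution order (compare the timestamps under Failed Tests), not sorted by duration. A minimal sketch of ranking them fastest first, assuming "ranking" means ascending duration; `rank_by_duration` is a hypothetical helper, not part of kbot:

```python
# (test, model, duration in ms), copied from the rows above.
RESULTS = [
    ("equation_solving", "openai/gpt-4o", 4181),
    ("file_operations", "openai/gpt-4o", 7243),
    ("directory_listing", "openai/gpt-4o", 2274),
]


def rank_by_duration(rows):
    """Return (test, model, ms, seconds) tuples, fastest first."""
    ranked = sorted(rows, key=lambda row: row[2])
    # Seconds are rounded to two decimals, like the "Duration (s)" column.
    return [(test, model, ms, round(ms / 1000, 2)) for test, model, ms in ranked]


for test, model, ms, seconds in rank_by_duration(RESULTS):
    print(f"{test} {model} {ms} {seconds:.2f}")
```

With these three rows, directory_listing (2274 ms) would rank first and file_operations (7243 ms) last.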

Summary

  • Total Tests: 3
  • Passed: 0
  • Failed: 3
  • Success Rate: 0.00%
  • Average Duration: 4566ms (4.57s)
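
The Summary figures follow directly from the three rows: 0 of 3 passed, and the mean duration is (4181 + 7243 + 2274) / 3 = 13698 / 3 = 4566 ms. A sketch that recomputes them, assuming the average is rounded to whole milliseconds before conversion to seconds (the actual kbot report code may round differently); `summarize` is a hypothetical name:

```python
# (test, duration in ms, passed) for each test in this run.
RESULTS = [
    ("equation_solving", 4181, False),
    ("file_operations", 7243, False),
    ("directory_listing", 2274, False),
]


def summarize(rows):
    """Recompute the Summary bullets from per-test results."""
    if not rows:
        raise ValueError("no test results to summarize")
    total = len(rows)
    passed = sum(1 for _, _, ok in rows if ok)
    # Mean duration rounded to whole milliseconds: 13698 / 3 = 4566.
    avg_ms = round(sum(ms for _, ms, _ in rows) / total)
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "success_rate": f"{passed / total * 100:.2f}%",
        "average": f"{avg_ms}ms ({avg_ms / 1000:.2f}s)",
    }


print(summarize(RESULTS))
```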

Failed Tests

equation_solving - openai/gpt-4o

  • Prompt: Read the file at C:\Users\zx\Desktop\polymech\polymech-mono\packages\kbot\tests\units\tools.test.md and solve all equations. Return the results in the specified JSON format.
  • Expected: [{"equation":"2x + 5 = 13","result":"4"},{"equation":"3y - 7 = 20","result":"9"},{"equation":"4z + 8 = 32","result":"6"}]
  • Actual: I cannot directly access the file as it's on a local system. You can provide its contents, and I'll assist you in solving the equations.
  • Duration: 4181ms (4.18s)
  • Reason: Expected [{"equation":"2x + 5 = 13","result":"4"},{"equation":"3y - 7 = 20","result":"9"},{"equation":"4z + 8 = 32","result":"6"}], but got i cannot directly access the file as it's on a local system. you can provide its contents, and i'll assist you in solving the equations.
  • Timestamp: 4/2/2025, 9:29:20 PM
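
The expected answers are single-variable linear equations of the form ax ± b = c, so each result is (c ∓ b) / a: (13 − 5) / 2 = 4, (20 + 7) / 3 = 9, (32 − 8) / 4 = 6. A sketch that derives the expected JSON string from the equation texts, assuming every equation matches that form; `solve_linear` and `solve_all` are hypothetical helpers, and the real test reads the equations from tools.test.md rather than a hard-coded list:

```python
import json
import re

# Matches equations such as "2x + 5 = 13" or "3y - 7 = 20".
LINEAR = re.compile(r"^\s*(\d+)([a-z])\s*([+-])\s*(\d+)\s*=\s*(-?\d+)\s*$")


def solve_linear(equation: str) -> str:
    """Solve a*v + b = c (or a*v - b = c) for v and return it as a string."""
    m = LINEAR.match(equation)
    if m is None:
        raise ValueError(f"unsupported equation: {equation!r}")
    a, _var, sign, b, c = m.groups()
    if int(a) == 0:
        raise ValueError(f"zero coefficient in {equation!r}")
    offset = int(b) if sign == "+" else -int(b)
    value = (int(c) - offset) / int(a)
    # Whole numbers are rendered without ".0" to match the expected strings.
    return str(int(value)) if value.is_integer() else str(value)


def solve_all(equations: list[str]) -> str:
    """Build the compact JSON array used in the Expected line."""
    results = [{"equation": eq, "result": solve_linear(eq)} for eq in equations]
    return json.dumps(results, separators=(",", ":"))


print(solve_all(["2x + 5 = 13", "3y - 7 = 20", "4z + 8 = 32"]))
```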

file_operations - openai/gpt-4o

  • Prompt: Write the following data to C:\Users\zx\Desktop\polymech\polymech-mono\packages\kbot\tests\unit\test-data\test-data.json and then read it back: {"test":"data","timestamp":"2025-04-02T19:29:20.998Z"}. Return the read data in JSON format.
  • Expected: {"test":"data","timestamp":"2025-04-02T19:29:20.998Z"}
  • Actual: {"test":"data","timestamp":"2025-04-02T19:29:20.998Z"}
  • Duration: 7243ms (7.24s)
  • Reason: Expected {"test":"data","timestamp":"2025-04-02T19:29:20.998Z"}, but got {"test":"data","timestamp":"2025-04-02t19:29:20.998z"}
  • Timestamp: 4/2/2025, 9:29:28 PM
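
Here the Expected and Actual lines are identical, yet the Reason line shows the actual value lowercased (`2025-04-02t19:29:20.998z`) while the expected value keeps its uppercase `T` and `Z`. That suggests the harness lowercases only the actual output before a string comparison, although the kbot source is not shown in this report. The following sketch contrasts that behaviour with parsing both sides as JSON and comparing values; both function names are hypothetical:

```python
import json


def lowercased_actual_matches(expected: str, actual: str) -> bool:
    # Mimics what the Reason line suggests: only the actual text is lowercased.
    return expected == actual.lower()


def json_equal(expected: str, actual: str) -> bool:
    """Compare two JSON documents by value rather than by raw text."""
    try:
        return json.loads(expected) == json.loads(actual)
    except json.JSONDecodeError:
        return False


data = '{"test":"data","timestamp":"2025-04-02T19:29:20.998Z"}'
print(lowercased_actual_matches(data, data), json_equal(data, data))
```

The one-sided lowercase comparison rejects an exact match, while the JSON comparison accepts it and stays case-sensitive for string values.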

directory_listing - openai/gpt-4o

  • Prompt: List all files in the directory C:\Users\zx\Desktop\polymech\polymech-mono\packages\kbot\tests\unit\test-data. Return the list as a JSON array of filenames.
  • Expected: []
  • Actual: {"files":[]}
  • Duration: 2274ms (2.27s)
  • Reason: Expected [], but got {"files":[]}
  • Timestamp: 4/2/2025, 9:29:30 PM
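
The model returned an object wrapping the array (`{"files":[]}`) instead of the bare JSON array of filenames that the prompt requests. A minimal sketch of the requested shape, assuming "files" means regular files directly inside the directory (subdirectories skipped) and that names are sorted for a stable result; `list_files_json` is a hypothetical name, not a kbot function:

```python
import json
import os
import tempfile


def list_files_json(directory: str) -> str:
    """Return the regular files directly inside `directory` as a JSON array."""
    with os.scandir(directory) as entries:
        names = sorted(entry.name for entry in entries if entry.is_file())
    # A bare array such as [] for an empty directory, not {"files": [...]}.
    return json.dumps(names)


# Usage: an empty directory yields "[]", the value this test expected.
with tempfile.TemporaryDirectory() as tmp:
    print(list_files_json(tmp))
```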

Passed Tests

No passed tests