A framework to enable multimodal models to operate a computer.
COMMITS / evaluate.py
December 19, 2024
remove `max_tokens`
Josh Bickett committed
June 11, 2024
swap out for `gpt-4o`
Josh Bickett committed
February 15, 2024
Pass model to `operate`
Michael Hogue committed
Add `-m` argument to evaluate.py
Michael Hogue committed
January 16, 2024
Update test result message format
Michael Hogue committed
Update error message
Michael Hogue committed
Check for last screenshot instead of summary screenshot
Michael Hogue committed
December 9, 2023
Add comment to TEST_CASES
Michael Hogue committed
Change default test cases
Michael Hogue committed
Add evaluation justification
Michael Hogue committed
Add summary message
Michael Hogue committed
Change test cases
Michael Hogue committed
Use gpt-4v to evalue summary screenshot
Michael Hogue committed
Rename to `evaluate`
Michael Hogue committed