So we have multiple modules, like Selenium, that automate the computer or web pages. Now the Self-Operating Computer script that was just released uses GPT-4 Vision to explain to GPT-4 what it is looking at, and lets it operate the computer according to a prompt.
It seems to me that there should be a module that does the same: one that explains to an LLM, through prompts, what it can do with the mouse and keyboard, customized for different OSes.
Then all we would have to do is load the module to do something similar for our own special needs.
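To make the idea concrete, here is a rough sketch of what such a module might look like. Everything in it is hypothetical: the names `ActionSpec`, `describe_actions`, and `run_step`, the JSON reply format, and the OS hints are assumptions made up for illustration, not an existing library. The backend is left pluggable, so a real one could wrap something like pyautogui, while the example below only records calls.

```python
import json
import platform
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ActionSpec:
    """One mouse/keyboard action the LLM is allowed to request (hypothetical schema)."""
    name: str
    params: List[str]
    description: str


# Hypothetical action set; a real module would likely offer more.
ACTIONS = [
    ActionSpec("click", ["x", "y"], "Move the mouse to (x, y) and left-click."),
    ActionSpec("type_text", ["text"], "Type the given text on the keyboard."),
    ActionSpec("hotkey", ["keys"], "Press a key combination given as a list of key names."),
]

# Assumed per-OS hints; values are keyed by what platform.system() returns.
OS_HINTS = {
    "Windows": "Shortcuts usually use the 'ctrl' key.",
    "Darwin": "Shortcuts usually use the 'command' key.",
    "Linux": "Shortcuts usually use the 'ctrl' key.",
}


def describe_actions(os_name: Optional[str] = None) -> str:
    """Build the prompt text that tells the LLM what it can do on this OS."""
    os_name = os_name or platform.system()
    lines = [
        f"You control a {os_name} computer.",
        "Reply with one JSON object per step, e.g. "
        '{"action": "click", "x": 10, "y": 20}.',
        OS_HINTS.get(os_name, ""),
        "Available actions:",
    ]
    for spec in ACTIONS:
        lines.append(f"- {spec.name}({', '.join(spec.params)}): {spec.description}")
    return "\n".join(line for line in lines if line)


def run_step(reply: str, backend: Dict[str, Callable]) -> None:
    """Parse one JSON reply from the LLM and dispatch it to the backend."""
    step = json.loads(reply)
    spec = next((s for s in ACTIONS if s.name == step.get("action")), None)
    if spec is None:
        raise ValueError(f"unknown action: {step.get('action')!r}")
    missing = [p for p in spec.params if p not in step]
    if missing:
        raise ValueError(f"missing parameters for {spec.name}: {missing}")
    # Only pass the declared parameters, never arbitrary keys from the model.
    backend[spec.name](**{p: step[p] for p in spec.params})


if __name__ == "__main__":
    # Recording backend for a dry run; a real backend would move the mouse.
    log = []
    fake_backend = {
        "click": lambda x, y: log.append(("click", x, y)),
        "type_text": lambda text: log.append(("type_text", text)),
        "hotkey": lambda keys: log.append(("hotkey", keys)),
    }
    print(describe_actions())
    run_step('{"action": "click", "x": 10, "y": 20}', fake_backend)
    print(log)
```

The point of the split is that the prompt text and the dispatcher are generated from the same action list, so the LLM is only ever told about actions the module can actually execute, and unknown actions are rejected instead of guessed at.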
Is anyone working on something like this?
This post is an automated archive of a submission made on /r/python, powered by Fediverser software running on fediverser.communick.dev. Responses to this submission will not be seen by the original author until they claim ownership of their fediverser.communick.dev account. Please consider reaching out to them to let them know about this post and to help them migrate to Lemmy.