It’s easy to doom and gloom about big businesses building their big data scraping AI to consolidate even more control and wealth, given that the odds are unfortunately in their favor.
However, AI presents yet another technological turning point for the public in a similar way to mass mechanical automation, and so, how might one imagine communities building their own to help themselves? How might communities use these tools to retain and make more use of their own accumulated data rather than continue to give it up to big businesses?
By using a completely different tool. LLMs are fine for sentence structure, but they aren’t intelligent. There is no capacity to distinguish fact from fiction, or to form an underlying model of reality to draw answers from.
I tend to agree; however, LLMs aren't the entirety of the AI field at the moment, despite receiving the bulk of the attention. This question is open to all forms of AI under development.
Excellent point. Everyone say it with me:
LLMs are not AI.
I’m curious what makes it not AI? It definitely seems like AI; it’s able to learn and create new sentences.
It is absolutely AI. The experience of talking with ChatGPT is so human-like that it just blows my mind. What I’ve learned so far is that human brains aren’t nearly as magical as they seem.
I don’t think it is possible, yet.
AI is still at the big money, big technical investment stage. It will be a decade or more before what you are talking about will be possible.
Aren’t there already a few free and open source tools available though? That’s a part of what inspired this question tbh.
The codebases are free, but the training sets are not. To get intelligence like you see in GPT-4, you need a lot of training data that is expensive to put together.
Honestly, if I, the underdog, want to utilize AI for my goals, my best bet is to pay $20/mo for the AI from OpenAI.
I thought the main obstacle was the computing power to update 175 billion parameters with large datasets. You probably could generate a good LLM just using Wikipedia, but I think it requires a room full of expensive video cards to do.
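A rough back-of-envelope supports this. All the figures below are assumptions for illustration, not from the thread: fp16 weights (2 bytes per parameter), the common ~6 × params × tokens estimate of training FLOPs, roughly 3 billion tokens for English Wikipedia, and ~300 TFLOP/s of sustained fp16 throughput on one datacenter GPU.

```python
# Back-of-envelope: why training a GPT-3-class model is expensive.
# All constants below are assumptions for illustration only.

PARAMS = 175e9          # GPT-3-scale parameter count
BYTES_PER_PARAM = 2     # fp16 weights (assumption)

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"Weights alone: {weights_gb:.0f} GB")   # far beyond any single consumer GPU

TOKENS = 3e9            # rough token count of English Wikipedia (assumption)
train_flops = 6 * PARAMS * TOKENS              # common ~6*N*D rule of thumb

GPU_FLOPS = 300e12      # ~sustained fp16 throughput of one high-end GPU (assumption)
days_one_gpu = train_flops / GPU_FLOPS / 86400
print(f"~{days_one_gpu:.0f} days of compute on a single GPU for one pass")
```

Even with very generous assumptions, the weights alone can't fit on one card, and a single GPU would grind for months on a Wikipedia-sized corpus, which is why training runs are spread across rooms full of them.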
> a lot of training data that is expensive to put together.
Isn’t training data simply data? If a community were to agree to pool their data together to enable the AI, wouldn’t that bypass the cost issue? Or is this one of those situations where the amount of data required thoroughly demonstrates how much businesses have arguably stolen from the public, and in turn no community could produce enough data to enable their AI tools to the same degree?