Why do AI image generators have a stroke when they try to generate text?

raptir@lemdro.id · 11 months ago

Why do AI image generators have a stroke when they try to generate text?

hoshikarakitaridia@sh.itjust.works · 11 months ago

A full answer requires a fairly technical understanding, but I’ll give it a try and keep it abstract.

The AI is trained on images paired with descriptions. If you type in something like “a tree”, it has a vague idea of what one looks like.

The thing is, writing letters is a hard concept. How should the AI know that text is made up of letters? Connected lines make a letter and unconnected ones don’t. Sentences are separated by periods.

That’s easy enough for us, but you have to imagine that an AI is best with what it can directly observe. Knowing when to literally write out letters is hard, so it has a stroke. It has a vague notion of “this is where text is supposed to go”, but making the letters look right in a matching font, remembering where letters end, and spacing the words correctly is all far too complex for it.
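To make the “connected lines make a letter” point a bit more concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the 5x5 glyphs, `pixel_distance`, and `closest_letter` are hypothetical, and real image generators don’t store letters as bitmaps like this. It only shows how few pixels separate one letter from another.

```python
# Hypothetical 5x5 bitmap glyphs for illustration; '#' is ink, '.' is background.
GLYPHS = {
    "E": ["#####", "#....", "####.", "#....", "#####"],
    "F": ["#####", "#....", "####.", "#....", "#...."],
    "L": ["#....", "#....", "#....", "#....", "#####"],
}


def pixel_distance(a, b):
    """Count the pixels where two same-sized bitmaps differ."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def closest_letter(bitmap):
    """Return the letter whose glyph differs from bitmap in the fewest pixels."""
    return min(GLYPHS, key=lambda letter: pixel_distance(bitmap, GLYPHS[letter]))


# Take an "E" and lose most of its bottom stroke (4 of 25 pixels).
smudged_e = ["#####", "#....", "####.", "#....", "#...."]

print(pixel_distance(GLYPHS["E"], smudged_e))
print(closest_letter(smudged_e))
```

A tree with four pixels off is still a tree. An “E” with four pixels off can read as a different letter entirely, which is roughly why a “vague idea” isn’t good enough for text.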

Now, I haven’t looked into how the AIs that CAN generate text better do it, but I assume the only way they do this is by deciding “there’s gonna be text” and then using another process to insert the text after the fact. Or maybe there’s some special change in the training or inference process going on? Idk, for this one I need an expert.
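Just to show what I mean by “insert the text after the fact”, here is a minimal sketch in Python. This illustrates my guess only, not how any actual generator works: the tiny 3x5 font, the `overlay_text` helper, and the blank grid standing in for a generated image are all hypothetical.

```python
# Hypothetical 3x5 bitmap font; '#' is ink, '.' is transparent.
FONT = {
    "H": ["#.#", "#.#", "###", "#.#", "#.#"],
    "I": ["###", ".#.", ".#.", ".#.", "###"],
}


def overlay_text(image, text, top, left):
    """Stamp text onto a copy of image (a list of strings), glyph by glyph."""
    canvas = [list(row) for row in image]
    for index, char in enumerate(text):
        glyph = FONT[char]
        x0 = left + index * 4  # 3 pixels of glyph plus 1 pixel of spacing
        for dy, glyph_row in enumerate(glyph):
            for dx, pixel in enumerate(glyph_row):
                if pixel == "#":
                    canvas[top + dy][x0 + dx] = "#"
    return ["".join(row) for row in canvas]


# Stand-in for a generated image: a blank 9x7 "sign" with a spot reserved for text.
generated = ["." * 9 for _ in range(7)]
result = overlay_text(generated, "HI", top=1, left=1)
print("\n".join(result))
```

The point is that the letters come from a separate, exact process (a font) and get stamped onto the spot the image step reserved for text, so the image step never has to draw a letter itself.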