Large language models (LLMs) are beginning to be superseded by large
multimodal models (LMMs). ChatGPT Plus is currently rolling out access to
GPT-4V, a version of GPT-4 with vision capabilities. Users with access can
upload images for the model to interpret. I have seen a few examples, and I
am very impressed. GPT-4V can
read images of signs in various languages and translate them into English.
Some users have uploaded photos of the inside of their fridge and asked the
AI to generate recipes from the available ingredients. The sky's the limit.

I personally plan to develop a program that streams images of my computer
desktop to ChatGPT so it can autonomously control my entire computer while
I sit back, drink tea, and daydream of a post-labor world. I am also
designing glasses that send whatever the wearer sees and says to ChatGPT
hands-free, which I imagine will be tremendously useful. Meta is releasing
something similar, but I prefer a DIY approach because I don’t want to be
trapped in Meta’s ecosystem.

Also, DALL·E 3, OpenAI’s latest image generator, has been released and is
available for free via Bing Chat. It renders text in images reasonably well
and follows complex prompts much better than any other model I have tried.
The results are very impressive; it is by far the best image generator I’ve
used. I have attached an image I generated, and I am blown away by its
quality.

