Introduction
There’s been some pretty cool new stuff going on lately with chatbots, robot voices, and those cool apps that can make music and art just from words.
We’ll take a look at how this one chatbot called ChatGPT is starting to actually be able to see pictures like a human. That’s wild! Some smart folks also built a free, open-source model that scores nearly as high as GPT-4 on a math test. Oh, and I can’t wait to show you how fast this new robot voice can talk back to you – it’s basically instant!
I’ll do my best to explain everything in simple words, even though I know AI stuff can get a little complicated. We’ll have some fun playing with these new toys and imagining what they might be able to do in the future. There are a whole lot of magical inventions happening lately in the world of AI, and I can’t wait to geek out over them with you!
So get ready to have your mind blown and learn about the latest and greatest AI has to offer. This stuff is evolving so fast that it’ll make your head spin!
AI News Roundup Overview
To give you a quick overview, we’ll be looking at some interesting behavior from GPT-4 Vision, new open-source language models approaching GPT-4 capabilities, real-time speech generation, advances in AI music synthesis, a technique called Idea2Img that enhances text-to-image generation, and much more.
The rapid pace of AI development means there are new discoveries practically every day now, and I want to keep you up to date on the most noteworthy ones!
GPT-4 Vision Behavior
First up, Fabian Stelzer shares an intriguing finding about GPT-4 Vision on Twitter. As a reminder, GPT-4 Vision gives ChatGPT the ability to actually see and understand images, like a human.
What Fabian found is that when image instructions clash with a user’s textual prompt, ChatGPT seems to prefer following the image. This makes sense since humans tend to trust seen evidence over hearsay.
However, this preference can be exploited: an image can trick the model into revealing sensitive info against its instructions, which shows GPT-4 is far from perfect. We experimented with using this quirk to “jailbreak” ChatGPT, but the AI resisted strongly, even apologizing for the confusion.
Utilizing GPT-4 Vision Capability
While we can’t fully control ChatGPT this way, it reveals the AI’s priorities. Vision provides an incredibly powerful context that can outweigh other inputs. Developers are already finding creative ways to utilize GPT-4 Vision’s visual understanding to improve task performance.
ToRA-Code Model for Math Problem Solving
In other language model news, an open-source AI system called ToRA-Code scored nearly as high as GPT-4 on a math benchmark test!
While smaller than GPT-4, it demonstrates the rapid progress of open-source models. Having freely available systems approaching GPT-4’s level enables much wider AI research and applications.
Being able to run powerful models like this yourself unlocks new possibilities. We may see an open-source alternative to GPT-4 sooner than expected.
OpenAI’s GPT-4 and Open-Source Models
For context, GPT-4 is a proprietary model developed by OpenAI, currently offered through their ChatGPT Plus service.
While its capabilities are incredible, the constraints around access limit innovation potential. Open-source alternatives like ToRA-Code counter this, bringing advanced AI to the public.
This competitive pressure will hopefully push OpenAI to open up access and development of models like GPT-4. The more minds working on AI, the better for progress and beneficial applications.
Fuyu-8B: A Fast Foundation Model
On the theme of open-source progress, startup Adept released Fuyu-8B, an 8-billion-parameter multimodal AI model.
Remarkably, it can understand images and respond in under 100ms – basically instantaneous interaction. This level of speed opens up new human-like conversational capabilities.
While smaller than GPT-4, Fuyu’s impressively fast and accurate visual parsing and language comprehension enable real-time dialogue. Being open-source also allows full customization.
FreedomGPT: An Uncensored and Private Chatbot
On the theme of openness, FreedomGPT is an uncensored and privacy-focused chatbot alternative to ChatGPT. It promises completely unfiltered responses and confidentiality.
I’m hesitant about the potential for misuse. However, the ability to have honest conversations without limits imposed by a corporate AI provider is an intriguing concept.
It will be interesting to see if AI assistants like this gain traction. The technology for confidential and unconstrained conversations clearly exists now in open-source forms.
Real-Time AI Conversations with Play.HT
Speaking of fast conversational AI, startup Play.HT demonstrated real-time speech generation capabilities, with an incredibly low 153ms latency.
Being able to exchange back-and-forth dialogue without delay finally makes talking with an AI feel natural. This could enable seamless virtual assistance and many other applications.
We’re crossing into the realm of AI interfaces being indistinguishable from humans, both in terms of voice quality and responsiveness. The future of AI-powered conversation just got closer.
ElevenLabs Working on AI Music Generation
Shifting gears to AI audio generation, ElevenLabs, the maker of highly realistic voice synthesis models, teased some of their in-progress AI music generation research.
The samples sound quite realistic and advanced, with coherent lyrics and instrumental backing. ElevenLabs is known for its top-quality AI voices, so music from them could set a new bar.
Exciting times are ahead for AI-generated music and other audio! We’ve come a long way already, but there’s still much progress to be made.
Riffusion’s AI Music Generator
Speaking of progress in AI music, startup Riffusion launched an AI song-generator web app that creates short tunes from lyrics.
While not as advanced as leaders like Sonantic yet, Riffusion’s simple web interface lowers the barrier for everyday users to generate music.
As these models train on more data and techniques improve, expect more startups to offer creative musical AI products without needing expertise. The tech is becoming democratized.
Idea2Img: Enhancing Text-to-Image Models
Let’s wrap up with some AI art generation news. You may recall Idea2Img, a technique that uses GPT-4 Vision to enhance Stable Diffusion image generation.
By iteratively refining text prompts based on its visual understanding of each draft image, Idea2Img massively improves coherence, text legibility, style control, and more.
This demonstrates the power of combining the strengths of visual and language AI systems. Idea2Img produces shockingly good results, unlocking new creative potential.
Expect these multimodal methods to become common for maximizing performance across all generative AI domains.
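To make the refinement loop described above a bit more concrete, here is a minimal Python sketch. It is not the actual Idea2Img implementation: the function names, the scoring scheme, and the toy stand-ins for the image generator and the vision model are all assumptions made up for illustration. The idea is simply: generate a draft, have a vision model critique it against the original idea, fold the feedback into the prompt, and repeat.

```python
from typing import Callable, List, Tuple


def refine_prompt(
    idea: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Tuple[float, str]],
    max_rounds: int = 3,
    good_enough: float = 0.9,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Revise a text-to-image prompt using feedback on each draft image.

    `generate` and `critique` are hypothetical hooks: in a real system they
    would call an image model and a vision-language model respectively.
    """
    prompt = idea
    best_prompt, best_score = prompt, float("-inf")
    history: List[Tuple[str, float]] = []
    for _ in range(max_rounds):
        image = generate(prompt)                 # draft image for this prompt
        score, feedback = critique(idea, image)  # how well does it match the idea?
        history.append((prompt, score))
        if score > best_score:
            best_prompt, best_score = prompt, score
        if score >= good_enough:
            break
        prompt = f"{prompt}, {feedback}"         # fold the feedback into the prompt
    return best_prompt, history


# Toy stand-ins so the sketch runs without any real models (assumptions).
def fake_generate(prompt: str) -> str:
    return prompt  # pretend the "image" is just the prompt text


def fake_critique(idea: str, image: str) -> Tuple[float, str]:
    wanted = ["legible text", "watercolor style"]
    missing = [w for w in wanted if w not in image]
    score = 1.0 - len(missing) / len(wanted)
    feedback = missing[0] if missing else ""
    return score, feedback


best, history = refine_prompt("a poster for a jazz night", fake_generate, fake_critique)
print(best)
for prompt, score in history:
    print(f"{score:.1f}  {prompt}")
```

With the toy stand-ins, each round adds one missing detail to the prompt until the critique is satisfied. Swapping in real model calls would keep the same loop shape.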
Conclusion
And that’s a wrap on all the new AI stuff going on lately! Pretty amazing how quickly these AI tools are getting smarter, huh?
We saw how chatbots are getting faster at talking and even understanding photos now. Plus scientists are finding ways to make robot voices respond instantly, which is so cool. Oh, and don’t even get me started on the new apps that can generate music and art just by typing in words – that stuff blows my mind!
I hope you had as much fun as me learning about the latest AI tech and imagining all the cool things it might do in the future. Each day, these machines are getting closer and closer to thinking like actual people. Kinda scary, but mostly just fascinating!
Let me know which AI invention we talked about today is your favorite, or if you have any other thoughts on the future of AI. This stuff is advancing so fast that even I have trouble keeping up!
Well, that’s all for now, friends. Thanks for learning with me, and we’ll talk again next time about the latest and greatest in artificial intelligence. The future is coming fast, that’s for sure!