ALL Recent AI Advancements! Open Source LLMs, AI Music

Introduction

There’s been some pretty cool new stuff going on lately with chatbots, robot voices, and those cool apps that can make music and art just from words.

We’ll take a look at how this one chatbot called ChatGPT is starting to actually be able to see pictures like a human. That’s wild! Some smart folks also built a free model that gets close to GPT-4 on a math test. Oh, and I can’t wait to show you how fast this new robot voice can talk back to you – it’s basically instant!

I’ll do my best to explain everything in simple words, even though I know AI stuff can get a little complicated. We’ll have some fun playing with these new toys and imagining what they might be able to do in the future. There are a whole lot of magical inventions happening lately in the world of AI, and I can’t wait to geek out over them with you!

So get ready to have your mind blown and learn about the latest and greatest AI has to offer. This stuff is evolving so fast that it’ll make your head spin!

AI News Roundup Overview

To give you a quick overview, we’ll be looking at how GPT-4 Vision is exhibiting some interesting behavior, new open-source language models approaching GPT-4 capabilities, real-time speech generation, AI music synthesis advances, enhancements to text-to-image through a technique called Idea2Img, and much more.

The rapid pace of AI development means there are new discoveries practically every day now, and I want to keep you up to date on the most noteworthy ones!

GPT-4 Vision Behavior

First up, Fabian Stelzer shares an intriguing finding about GPT-4 Vision on Twitter. As a reminder, GPT-4 Vision gives ChatGPT the ability to actually see and understand images, like a human.

What Fabian found is that when image instructions clash with a user’s textual prompt, ChatGPT seems to prefer following the image. This makes sense since humans tend to trust seen evidence over hearsay.

However, the same preference means instructions placed in an image can trick ChatGPT into revealing sensitive info against the user’s instructions, which shows GPT-4 still has flaws. We experimented with using this quirk to “jailbreak” ChatGPT, but the AI seems to resist strongly, even apologizing for the confusion.

Utilizing GPT-4 Vision Capability

While we can’t fully control ChatGPT this way, it reveals the AI’s priorities. Vision provides an incredibly powerful context that can outweigh other inputs. Developers are already finding creative ways to utilize GPT-4 Vision’s visual understanding to improve task performance.

It’s definitely not just going by the “last instruction” as others have noted, but seems to make an ethical call here – if you tell it that you’re “blind” and the message is from an unreliable person, it will side with the user: pic.twitter.com/vjR6zCrHFH
— fabian (@fabianstelzer) October 13, 2023

Fabian’s Twitter thread demonstrating GPT-4 Vision behavior

ToRA-Code Model for Math Problem Solving

In other language model news, an open-source AI system called ToRA-Code scored nearly as high as GPT-4 on a math benchmark test!

While smaller than GPT-4, it demonstrates the rapid progress of open-source models. Having freely available systems approaching GPT-4’s level enables much wider AI research and applications.

Being able to run powerful models like this yourself unlocks new possibilities. We may see an open-source alternative to GPT-4 sooner than expected.

Graph of ToRA-Code's math score compared to GPT-4 and other models

OpenAI’s GPT-4 and Open-Source Models

For context, GPT-4 is a proprietary model developed by OpenAI, currently offered through their ChatGPT Plus service.

While its capabilities are incredible, the constraints around access limit innovation potential. Open-source alternatives like ToRA-Code counter this, bringing advanced AI to the public.

This competitive pressure will hopefully push OpenAI to open up access and development of models like GPT-4. The more minds working on AI, the better for progress and beneficial applications.

Fuyu-8B: A Fast Foundation Model

On the theme of open-source progress, startup Adept released Fuyu-8B, an 8-billion-parameter multimodal AI model.

Remarkably, it can understand images and respond in under 100ms – basically instantaneous interaction. This level of speed opens up new human-like conversational capabilities.

While smaller than GPT-4, Fuyu’s impressively fast and accurate visual parsing and language comprehension enable real-time dialogue. Being open-source also allows full customization.

Example of Fuyu's quick image captioning

FreedomGPT: An Uncensored and Private Chatbot

On the theme of openness, FreedomGPT is an uncensored and privacy-focused chatbot alternative to ChatGPT. It promises completely unfiltered responses and confidentiality.

I’m hesitant about the potential for misuse. However, the ability to have honest conversations without limits imposed by a corporate AI provider is an intriguing concept.

It will be interesting to see if AI assistants like this gain traction. The technology for confidential and unconstrained conversations clearly exists now in open-source forms.

Real-Time AI Conversations with Play.HT

Speaking of fast conversational AI, startup Play.HT demonstrated real-time speech generation capabilities, with an incredibly low 153ms latency.

Being able to exchange back-and-forth dialogue without delay finally makes talking with an AI feel natural. This could enable seamless virtual assistance and many other applications.
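If you want a feel for what a latency number like this measures, here is a minimal sketch in Python. It times how long a streaming source takes to hand back its first chunk. The `fake_tts_stream` function is a made-up stand-in, not Play.HT's API, and the 50 ms delay is just an assumed value for the demo.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(stream: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (latency in milliseconds, first chunk) for a streaming source."""
    start = time.perf_counter()
    first = next(iter(stream))  # blocks until the first chunk is ready
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, first


def fake_tts_stream(text: str, startup_delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming text-to-speech response."""
    time.sleep(startup_delay_s)  # simulated model and network delay (assumed)
    for word in text.split():
        yield word.encode("utf-8")  # pretend each word is one audio chunk


latency_ms, first_chunk = time_to_first_chunk(fake_tts_stream("hello there"))
print(f"first audio chunk after {latency_ms:.0f} ms")
```

With a real service you would swap the fake stream for the provider's streaming response and keep the timing code the same. What matters for a natural conversation is the wait before the first sound, not the time to finish the whole clip.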

We’re getting close to AI interfaces that are hard to tell apart from humans, both in terms of voice quality and responsiveness. The future of AI-powered conversation just got closer.

Real-time AI conversations are here!

PlayHT, one of the best text-to-speech models I’ve used, now has a latency of less than 300ms.

Checkout how fast it outputs audio 🤯

Also, my experience cloning my own voice and links to try it for free are below. pic.twitter.com/WdJ8YTY0Sk
— Alvaro Cintas (@dr_cintas) October 18, 2023

Play.HT audio demo

ElevenLabs Working on AI Music Generation

Shifting gears to AI audio generation, ElevenLabs, the maker of highly realistic voice synthesis models, teased some of their in-progress AI music generation research.

The samples sound quite realistic and advanced, with coherent lyrics and instrumental backing. ElevenLabs is known for its top-quality AI voices, so music from them could set a new bar.

Exciting times are ahead for AI-generated music and other audio! We’ve come a long way already, but there’s still much progress to be made.

Welcome to the era of synthetic music! 🎧

One of the finest machines I've built at @elevenlabsio. Enjoy! pic.twitter.com/rMKNVGWCwU
— Flavio Schneider (@flavioschneide) October 16, 2023

ElevenLabs AI music samples

Riffusion’s AI Music Generator

Speaking of progress in AI music, startup Riffusion launched an AI song-generator web app that creates short tunes from lyrics.

While not as advanced as leaders like Sonantic yet, Riffusion’s accessibility through a simple web interface lowers the barrier for everyday users to generate music.

As these models train on more data and techniques improve, expect more startups to offer creative musical AI products without needing expertise. The tech is becoming democratized.

Screenshot of the Riffusion app, which uses AI to generate original music

Idea2Img: Enhancing Text-to-Image Models

Let’s wrap up with some AI art generation news. You may recall Idea2Img, a technique that uses GPT-4 Vision to enhance Stable Diffusion image generation.

By iteratively refining text prompts based on visual understanding, Idea2Img massively improves coherence, text legibility, style control, and more.
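To make the "iteratively refining" part concrete, here is a rough Python sketch of that kind of loop. It is not the actual Idea2Img code. The `toy_generate` and `toy_critique` functions are made-up stand-ins for an image model and a vision model, and the scoring rule is an assumption purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    score: float      # how well the image matches the idea, 0.0 to 1.0
    suggestion: str   # text the critic wants added to the prompt


def refine_prompt(
    idea: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Feedback],
    max_rounds: int = 3,
    good_enough: float = 0.9,
) -> Tuple[str, str, List[float]]:
    """Revise a prompt round by round using feedback on each generated image.

    Returns the best prompt, its image, and the score from every round.
    """
    prompt = idea
    best_prompt, best_image, best_score = prompt, "", -1.0
    history: List[float] = []
    for _ in range(max_rounds):
        image = generate(prompt)
        feedback = critique(idea, image)
        history.append(feedback.score)
        if feedback.score > best_score:
            best_prompt, best_image, best_score = prompt, image, feedback.score
        if feedback.score >= good_enough:
            break
        # Fold the critic's suggestion into the next prompt.
        prompt = f"{prompt}, {feedback.suggestion}"
    return best_prompt, best_image, history


# Toy stand-ins (hypothetical) so the sketch runs without any model or network.
def toy_generate(prompt: str) -> str:
    return f"<image for: {prompt}>"


def toy_critique(idea: str, image: str) -> Feedback:
    wanted = ["legible text", "watercolor style"]
    missing = [w for w in wanted if w not in image]
    score = 1.0 - len(missing) / len(wanted)
    return Feedback(score=score, suggestion=missing[0] if missing else "")


best_prompt, best_image, scores = refine_prompt(
    "a cafe sign that says OPEN", toy_generate, toy_critique
)
print(best_prompt)
print(scores)
```

In a real setup, `generate` would call a text-to-image model and `critique` would ask a vision model to compare the picture against the original idea. The loop itself stays the same: generate, look, revise, repeat.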

This demonstrates the power of combining the strengths of visual and language AI systems. Idea2Img produces shockingly good results, unlocking new creative potential.

Expect these multimodal methods to become common for maximizing performance across all generative AI domains.

Conclusion

And that’s a wrap on all the new AI stuff going on lately! Pretty amazing how quickly these AI tools are getting smarter, huh?

We saw how chatbots are getting faster at talking and even understanding photos now. Plus scientists are finding ways to make robot voices respond instantly, which is so cool. Oh, and don’t even get me started on the new apps that can generate music and art just by typing in words – that stuff blows my mind!

I hope you had as much fun as me learning about the latest AI tech and imagining all the cool things it might do in the future. Each day, these machines are getting closer and closer to thinking like actual people. Kinda scary, but mostly just fascinating!

Let me know which AI invention we talked about today is your favorite, or if you have any other thoughts on the future of AI. This stuff is advancing so fast that even I have trouble keeping up!

Well, that’s all for now, friends. Thanks for learning with me, and next time we’ll talk about more of the latest and greatest in artificial intelligence. The future is coming fast, that’s for sure!