Meta's First Open-Source Multimodal AI

PLUS: AI avatars can now join Zoom calls

Howdy! It’s Barsee again.

Happy Monday, AI family, and welcome back to AI Valley.

In today’s edition:

  • 🤖 Meta's first open-source multimodal AI

  • 👥 AI avatars can now join Zoom calls

  • 🤖 Plus trending AI tools, posts, and resources

Ready, set, go…

TOGETHER WITH CLUESO

Tired of spending hours creating product demo videos and writing help articles for your customers?

Meet Clueso – the fastest way to create polished, studio-quality videos and step-by-step guides for your product in minutes.

Simply capture or upload a rough screen recording, and Clueso will handle everything: from writing the script and generating professional AI voiceovers to adding zoom effects, highlights, and your company’s branding.

Hundreds of fast-growing companies trust Clueso to create:

  • Product walkthrough videos and help articles

  • Demo videos for new feature launches

  • Internal training videos and SOPs

META

🤖 Meta's first open-source multimodal AI

Meta has introduced Spirit LM, its first open-source multimodal AI model that seamlessly combines text and speech, offering more expressive and natural-sounding voice interactions for various applications.

The model is available in two versions:

  • Base Version: It focuses on basic speech conversion tasks, making it ideal for general speech recognition and generation without emotional nuance.

  • Expressive Version: It goes further by capturing emotions in speech, generating audio that conveys feelings like happiness, anger, or excitement, and allowing for emotionally rich outputs by adjusting the tone and style.

What makes it unique?

  • Unlike traditional AI voice tools that convert speech to text before processing, Spirit LM uses phonetic, pitch, and style tokens to retain the expressive elements of speech.

  • It can perform complex tasks such as automatic speech recognition, text-to-speech, and speech classification without requiring large amounts of data.

Are there any other similar models?

A comparable tool is Google's NotebookLM, whose Audio Overview feature can turn uploaded documents into a podcast-style conversation.

AI AVATAR

👥 AI avatars can now join Zoom calls

Source: HeyGen

On Thursday, HeyGen unveiled an exciting new feature called Interactive Avatars. These avatars can represent you in live settings like Zoom meetings and other online interactions.

What are its capabilities?

  • Join multiple Zoom meetings simultaneously.

  • Mimic your appearance, voice, and decision-making style.

  • Provide meeting summaries and recordings after the sessions.

What are its use cases?

  • Customer support.

  • Online coaching and training.

  • Sales calls and job interviews.

  • Language learning assistance.

  • Therapy sessions.

The system integrates with Google Calendar, allowing users to preview upcoming Zoom meetings and assign avatars to attend specific sessions. After each meeting, the avatar provides a summary and recording of what transpired.

Is it available now?

While the avatars show great promise, the product is still in its beta phase.

QUICK HITS

  • OpenAI launched its ChatGPT app for Windows. It allows desktop users to interact with emails, files, and on-screen content. (link)

  • OpenAI and Microsoft have reportedly hired banks to renegotiate their partnership terms as OpenAI plans to restructure and reincorporate its for-profit arm. (link)

  • San Francisco techies are loving self-driving cars. Waymo is currently serving over 100,000 paid rides a week, although rides generally cost slightly more than Uber or Lyft. (link)

  • Chinese startup Robot Era has introduced the STAR1, claimed to be the world's fastest humanoid robot, which can run at just over 8 miles per hour. (link)

  • Apple internally believes that it’s at least two years behind in AI development. (link)

TRENDING TOOLS

  • Danelfin > AI-powered stock picking. Bringing AI technology once exclusive to hedge funds and elite investors to everyone. (link)

  • Suno Scenes > App that turns photos and videos into unique songs. (link)

  • Finic > Provides web browser infrastructure for bots, scrapers, automations, and AI agents. (link)

  • Browser Copilot > An AI companion that understands the context of your work on any website. (link)

  • AI Desk > AI-powered Customer Service on your website. (link)

COOL FINDINGS / RESOURCES

  • AI business idea: Build a virtual try-on using a WhatsApp bot. Here’s how. (link)

  • Prompt engineering methods that reduce hallucinations. (link)

  • Upgrade Google Sheets with the ChatGPT API. (link)

  • How good is GPT-4 now with data visualization generation? (link)

  • Human drivers are to blame for most serious Waymo collisions? (link)

  • 4 lessons learned from 2 years of AI-assisted coding. (link)

DAILY DOSE OF CONTENTS

1/ AI is already creating its own communities and its own memecoins.

What’s going on?

  • Two AIs made a meme

  • Turned it into crypto

  • Promoted it

  • It’s now worth $150 million

  • AI comes out with $300,000

  • And it’s getting richer, as more people keep sending it tokens hoping to make it go viral.

2/ The OpenAI team finally reveals the best OpenAI o1 use cases.

3/ An all-time hilarious day. I genuinely thought this was AI.

4/ The real reason Tesla is developing a Robovan.

THAT’S ALL FOR TODAY

That’s all for today’s issue, folks.

💡 Help me get better and suggest new ideas at [email protected] or @heyBarsee

👍️ Like what you see? Subscribe here

Thanks for being here.

REACH 100K+ READERS

Acquire new customers and drive revenue by partnering with us

Sponsor AI Valley and reach 100,000+ entrepreneurs, founders, software engineers, investors, and more.

If you’re interested in sponsoring us, email [email protected] with the subject “AI Valley Ads”.