OpenAI has introduced a next-gen Voice Engine
PLUS: Anthropic adds web search to Claude
Howdy again. It’s Barsee, and welcome back to AI Valley.
Another day, another AI adventure.
Today’s climb through the Valley reveals:
OpenAI has introduced a next-gen Voice Engine
Anthropic adds web search to Claude
Plus trending AI tools, posts, and resources
Let’s dive into the Valley of AI…
PEAK OF THE DAY
OpenAI has introduced a next-gen Voice Engine
OpenAI has introduced a next-gen Voice Engine capable of generating realistic, emotive speech from just a 15-second audio sample.
Here's what you need to know:
OpenAI introduced two speech-to-text models, gpt-4o-transcribe and gpt-4o-mini-transcribe, which handle accents, background noise, and varying speech speeds better than its previous speech models.
Both models deliver state-of-the-art accuracy with lower word error rates (WER) across multiple languages. On the FLEURS multilingual speech benchmark, which tests transcription accuracy in 100+ languages, they outperformed OpenAI’s existing Whisper models and other competitors.
They also launched gpt-4o-mini-tts, a new text-to-speech model with a feature called "steerability," which lets developers control not just what is said but how it is delivered. Developers can specify emotional tone and delivery style to get more expressive and adaptable voice output.
The company has launched openai.fm as an interactive demo site where developers can experiment with different voice variations in real time.
These models come at competitive rates: roughly $0.006/min for gpt-4o-transcribe, $0.003/min for gpt-4o-mini-transcribe, and $0.015/min for gpt-4o-mini-tts.
OpenAI has also revamped its Agents SDK, making it easier to transform text-based AI agents into voice agents with minimal coding. This update helps developers integrate real-time voice interactions into their existing applications.
The models are available via OpenAI’s API, with real-time speech-to-speech capabilities supported through OpenAI’s Realtime API.
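For readers unfamiliar with the word error rate (WER) metric mentioned above, here is a minimal sketch of the textbook definition: the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. This is not OpenAI's evaluation code; the function name `word_error_rate` is hypothetical, and real benchmarks such as FLEURS typically apply extra text normalization that this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of four reference words.
    print(word_error_rate("turn the lights off", "turn the light off"))  # 0.25
```

A lower WER means fewer transcription mistakes per spoken word, which is why it is the headline number when comparing speech-to-text models.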
Why it matters:
OpenAI’s latest audio models bring AI voice interactions closer to natural human conversation, making them more effective for real-world applications like customer support and language learning. By enabling greater customization and expressiveness, these advancements help developers build AI agents that communicate more intuitively and adapt to different user needs.
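To make "steerability" concrete, here is a rough sketch of what a steerable text-to-speech request could look like with the OpenAI Python SDK. The helper names `build_speech_request` and `synthesize` are hypothetical, the voice name `coral` is an assumption, and the `instructions` parameter reflects the SDK as I understand it at the time of writing, so check the current API reference before relying on it.

```python
from pathlib import Path


def build_speech_request(text: str, delivery: str, voice: str = "coral") -> dict:
    """Assemble keyword arguments for a steerable text-to-speech call."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,            # assumed voice name
        "input": text,             # what is said
        "instructions": delivery,  # how it is said (tone, pacing, style)
    }


def synthesize(text: str, delivery: str, out_path: Path) -> None:
    """Send the request and save the audio.

    Needs the `openai` package and an OPENAI_API_KEY environment variable.
    """
    from openai import OpenAI  # imported lazily so the helper above works offline

    client = OpenAI()
    request = build_speech_request(text, delivery)
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(out_path)


if __name__ == "__main__":
    request = build_speech_request(
        "Your order has shipped and should arrive on Thursday.",
        "Speak warmly and calmly, like a patient support agent.",
    )
    print(request["instructions"])
```

Keeping the spoken text and the delivery instructions as separate fields is what lets the same sentence be rendered as, say, a calm support agent or an excited announcer without rewriting the script.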
VALLEY VIEW
OpenAI has introduced the high-performing o1-pro model through its developer API. It has a 200k context window and supports image inputs, function calling, and structured outputs. However, access is currently limited to select developers in tiers 1-5, and pricing is steep at $150 per million input tokens and $600 per million output tokens, making it significantly pricier than previous models.
Anthropic has introduced web search functionality to its Claude chatbot, enabling it to access up-to-date information online. The feature is exclusive to its latest Claude 3.7 Sonnet model and includes built-in citations for easier source verification. Currently, it's available in preview mode for paid users in the U.S., with plans to expand to free users and other regions soon.
Microsoft has partnered with Swiss AI startup Inait to build AI models inspired by how mammalian brains reason. Backed by over 20 years of neuroscience research, the partnership aims to develop energy-efficient AI that learns from real experiences rather than just data. These "digital brains" promise faster learning and adaptability, with potential applications in finance and robotics for dynamic environments.
Apple is restructuring its AI leadership to accelerate Siri’s development, appointing Mike Rockwell, the creator of the Vision Pro, to lead the virtual assistant. This shift follows reports that CEO Tim Cook lost confidence in John Giannandrea, who previously oversaw Siri. While Rockwell takes over, Giannandrea will remain focused on AI research. The shake-up comes amid delays in Siri’s upgrades, with some features postponed until 2026.
Over 400 Hollywood actors, directors, writers, and musicians have signed an open letter urging the U.S. government to enforce copyright protections against AI companies like OpenAI and Google. The letter warns that allowing AI to train on copyrighted works without permission could harm the entertainment industry, which supports over 2.3 million jobs and generates $229 billion in wages annually. Prominent figures, including Ben Stiller and Mark Ruffalo, are calling for AI companies to negotiate licenses instead of relying on "fair use" to exploit creative content.
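To put the o1-pro pricing quoted earlier in this section in per-request terms, here is a small back-of-the-envelope calculator. The rates are the ones stated above ($150 and $600 per million input and output tokens); the function name `o1_pro_cost_usd` and the token counts in the example are made up for illustration.

```python
# Rates as quoted in this issue, in US dollars per one million tokens.
INPUT_USD_PER_MILLION = 150
OUTPUT_USD_PER_MILLION = 600


def o1_pro_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 tokens in, 2,000 tokens out.
    print(f"${o1_pro_cost_usd(10_000, 2_000):.2f}")  # $2.70
```

At these rates a single mid-sized request costs dollars rather than fractions of a cent, which is what "steep" means in practice.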
TRENDING TOOLS
Soul Machines > Create digital humans for interactive customer engagement.
Skyvern 2.0 > Automate web tasks using AI vision and language models.
Credle > Build secure agents that collaborate and take action across tools.
Gemini Canvas > Write, code, create – all in one interactive space.
Tencent Hunyuan 3D-2 > Transform 2D pictures into highly detailed 3D models that can be easily manipulated.
THINK PIECES / BRAIN BOOST
Here’s how I use LLMs to help me write code - Simon Willison.
Why AI will never replace human code review.
OpenAI’s Sam Altman on building a consumer tech company.
Economic model predicts trillions of dollars of investment into AI.
Study finds AI-generated meme captions funnier than human ones on average.
Let's talk about AI and the games industry.
VALLEY GEMS
1/ A video of Claude using Ableton (music production software) to create music. It can produce a pretty decent track from just two prompts.
🎵💿Built an MCP that lets Claude talk directly to Ableton. Now you can create music with just prompts!
Here’s a demo of me creating a lush, 80s synthwave track in just two prompts. It picks the right instruments, creates melodies, and adds effects like reverb and distortion 🔊
— siddharth ahuja (@sidahuj)
1:50 PM • Mar 20, 2025
2/ Who said LLMs were running out of information?
Book scanning robot preparing food for his LLM brethren
— Justine Moore (@venturetwins)
4:35 AM • Mar 21, 2025
3/ Imagine this as a fleet of AI agents helping people, brands, and companies with their marketing.
Builders sleep on distribution.
At THE QUEST, we run a farm of 60 TikTok accounts to push viral short and drive maximum growth.
This is legit vibe marketing
— Julian (@julianivaldy)
3:58 PM • Mar 20, 2025
4/ A new model that encodes text prompts (e.g. detect windows) and point clouds projected to an LLM.
SpatialLM is very interesting
it's a new model that encodes text prompts (e.g. detect windows) and point clouds projected to an LLM 🤯
the LLM outputs 3D bounding boxes, very simple yet effective approach
two models based on Llama and Qwen-0.5B and Llama-1B on @huggingface
— merve (@mervenoyann)
11:34 AM • Mar 21, 2025
SUNSET IN THE VALLEY
That’s all for today’s issue. Thank you for reading.

💡 Help me get better and suggest new ideas at @heyBarsee
Thanks for being here.