Alibaba Unveils Wan 2.1

PLUS: DeepSeek is rushing to get its next-gen R2 model out

Together with

Howdy. It’s Barsee again.

Happy Wednesday, AI family, and welcome back to AI Valley.

In today’s edition:

  • Alibaba’s Wan2.1 beats OpenAI Sora

  • DeepSeek is rushing to get its next-gen R2 model out

  • Google launches a free AI coding assistant with very high usage caps

  • Plus trending AI tools, posts, and resources

Ready, set, go…

WAN 2.1

Alibaba’s Wan2.1 beats OpenAI Sora

Image Source: Alibaba

Alibaba has just released Wan2.1, a family of open-source AI video models designed to generate high-quality images and videos from text and image inputs.

Here's what you need to know:

  • The model is available in two versions: T2V-1.3B, a lightweight model with 1.3 billion parameter, that can run easily on Laptops with just 8.19GB of video memory, and a more powerful T2V-14B version with 14 billion parameters.

  • For image-to-video generation, it offers two resolution options: I2V-14B-720P for 720p videos and I2V-14B-480P for 480p videos.

  • The model excels at simulating complex body movements and tracking spatial-temporal details, which means it creates smooth, natural motion sequences while enhancing image quality and adhering to physical laws.

  • Wan2.1 ranks first on the VBench leaderboard with a score of 86.22%, outperforming OpenAI's Sora model, which scored 84.28%, showcasing its superior video generation capabilities.

  • It is also the first video model capable of generating multilingual text effects in both Chinese and English, making it versatile for a global audience.

  • The model is available across multiple platforms, including Alibaba Cloud’s AI Model Community, ModelScope, and Hugging Face.

Why it matters: 

By making Wan2.1 open-source, Alibaba lets developers, researchers, and creators tap into advanced video generation technology without the usual cost or proprietary hurdles, fostering innovation and collaboration that can drive groundbreaking applications.

TOGETHER WITH FLOW AI

3x faster than typing using AI

Image Source: Flow AI

Meet Wispr Flow: the dictation tool that sounds just like a more polished version of yourself, minus the "ums" and "uhs." Whether it's an email, doc, or a prompt, Flow transcribes your words instantly. It's 3x faster than typing and automatically cleans up mistakes. That's time back in your day and fewer embarrassing typos.

SIDE UPDATES

Image Source: Inc

DeepSeek is accelerating the release of its R2 model, aiming to build on the momentum of its R1 reasoning model. Originally slated for an early May launch, the company now wants to roll it out even sooner. The R2 model is expected to feature stronger coding capabilities and support for multiple languages beyond English. With GPT-4.5 set to launch in a few weeks, DeepSeek is racing to stay ahead in the AI arms race.

Chegg has filed a lawsuit against Google, alleging the tech giant's AI-generated search summaries are harming its web traffic and revenue. Chegg claims that Google is using its market power to force companies to let its content be used in these summaries, effectively profiting from their material without proper compensation. Google, however, says its AI Overviews actually enhance traffic to many websites and plans to defend itself in court.

Microsoft has announced that all Copilot users now have free, unlimited access to the Voice and Think Deeper features. Voice allows users to interact with the AI using verbal commands, while Think Deeper handles more complex queries with advanced reasoning powered by OpenAI's o1 model. However, Microsoft warns that free users may experience delays during peak times or if security concerns arise.

Researchers at Columbia Engineering have developed a robot that learns new motions and adapts to damage by observing itself, a process they call “kinematic self-awareness." It is designed to mimic how humans learn by watching themselves in a mirror and building a 'self-model' without relying on pre-programmed simulations. This approach enables faster adaptation to real-world challenges, making robotic learning more efficient and flexible.

ElevenLabs is now allowing authors to publish AI-generated audiobooks on its Reader app, following a recent partnership with Spotify for AI-narrated audiobooks. The company aims to compete with Audible by offering better royalty rates and a dedicated marketplace for indie creators. The platform offers authors a payout of roughly $1.10 per listeners engagement for 11 minutes or more. Currently, it is available for U.S. authors with English-only titles, with plans to expand to 32 languages.

Convergence AI has introduced Proxy Lite, a 3B parameter Vision-Language Model built on the Qwen2.5-VL-3B-Instruct foundation. The model excels in UI navigation and completing automated tasks in a web browser without heavy computational demands. It scored an impressive 72.4% on the WebVoyager benchmark, outperforming many general-purpose open-source models not optimized for web automation.

GOOGLE GEMINI

Google launches a free AI coding assistant with very high usage caps

Image Source: Google

Google has launched a free version of Gemini Code Assist for individual developers, intensifying competition with Microsoft and its subsidiary GitHub in the developer tools arena.

Here's what you need to know:

  • Gemini Code Assist is built on a fine-tuned version of Google's Gemini 2.0 model, optimized for coding tasks. It allows developers to fix bugs, complete code, and get logical explanations for complex sections through natural language interaction.

  • The tool integrates with popular development environments like VS Code and JetBrains and supports multiple programming languages.

  • Gemini Code Assist for Individuals offers significantly more functionality and higher usage limits than GitHub Copilot’s free version. It provides up to 180,000 code completions per month—90 times more than Copilot’s 2,000.

  • With a 128K token context window, the assistant can handle much larger codebases than competitors. This allows it to process more code in a single prompt and better analyze complex projects.

  • The company also introduced Gemini Code Assist for GitHub, a code review “agent” designed to automatically look for bugs in code and offer suggestions directly within GitHub.

Why it matters: 

By offering a powerful free tool, Google is directly challenging GitHub Copilot and expanding AI-assisted coding options for developers. As AI-powered code assistants become more widespread, competition in the developer tools market will intensify, and it remains to be seen whether Gemini Code Assist can set a new industry standard.

TRENDING TOOLS

  • DeepTutor > Paper reading assistant with deeper understanding.

  • Fabi Analyst Agent > Build, deploy & share specialized AI data analyst agents.

  • Phind > AI search engine with visual answers & agentic capabilities.

THINK PIECES / RESOURCES

CONTENT CORNER

1/ Claude 3.7 is more significant than its name implies (ft DeepSeek R2 + GPT 4.5 coming soon).

2/ Marketing with AI is 100x efficient/cheaper if done well in 2025.

3/ I hope this works.

4/ So much power at our fingertips.

THAT’S ALL FOR TODAY

That’s all for today’s issue, folks.

💡 Help me get better and suggest new ideas at [email protected] or @heyBarsee

👍️ Like what you see? Subscribe here

Thanks for being here.

REACH 100K+ READERS

Acquire new customers and drive revenue by partnering with us

Sponsor AI Valley and reach over 100,000+ entrepreneurs, founders, software engineers, investors, etc.

If you’re interested in sponsoring us, email [email protected] with the subject “AI Valley Ads”.