Yiwen Song

About Me

I am currently a Senior Research Scientist at Google Cloud AI Research. My research focuses on Large Language Models, particularly at the intersection of multimodality, generative AI, and agentic systems. I am passionate about advancing how models reason, plan, and interact with complex, real-world environments.

Prior to joining Google, I spent four years at Meta conducting research and building training frameworks in the computer vision domain. I was a core contributor to Llama 3 and served as a maintainer for the TorchVision library.

News

[2026-03] Our paper "Watch and Learn: Learning to Use Computers from Online Videos" was accepted to CVPR 2026!
[2026-02] Our paper "Synapse: Adaptive Arbitration of Complementary Expertise in Time Series Foundational Models" was accepted to TMLR.
[2026-01] Released a new preprint: "ScholarPeer: A Context-Aware Multi-Agent Framework for Automated Peer Review."
[2025-07] Released the technical report for Gemini 2.5.
[2025-04] Released the technical report for ShieldGemma 2.

Selected Publications

Watch and Learn: Learning to Use Computers from Online Videos
Chan Hee Song, Yiwen Song, Palash Goyal, Yu Su, Oriana Riva, Hamid Palangi, Tomas Pfister
CVPR (2026)
Computer Use Agents, Inverse Dynamics Model
Synapse: Adaptive Arbitration of Complementary Expertise in Time Series Foundational Models
Sarkar Snigdha Sarathi Das, Palash Goyal, Mihir Parmar, Yiwen Song, Long T Le, Lesly Miculicich, Jinsung Yoon, Rui Zhang, Hamid Palangi, Tomas Pfister
TMLR (2026)
Time Series Forecasting, Foundation Models
Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities
Gemini Team (G Comanici, E Bieber... Yiwen Song, et al.)
arXiv (2025)
Frontier Models, Multimodality
Plan-tuning: Post-training language models to learn step-by-step planning for complex problem solving
Mihir Parmar, Palash Goyal, Xin Liu, Yiwen Song, Mingyang Ling, Chitta Baral, Hamid Palangi, Tomas Pfister
EMNLP Main (2025)
NLP, LLMs
HEART: Emotionally-driven test-time scaling of Language Models
Gabriela Pinto, Palash Goyal, Mihir Parmar, Yiwen Song, Souradip Chakraborty, Zifeng Wang, Jinsung Yoon, Hamid Palangi, Tomas Pfister
arXiv (2025)
LLM Reasoning, Test-Time Scaling
LLM-based multi-agent blackboard system for information discovery in data science
Alireza Salemi, Mihir Parmar, Palash Goyal, Yiwen Song, Jinsung Yoon, Hamed Zamani, Tomas Pfister, Hamid Palangi
arXiv (2025)
Multi-Agent Systems, Data Discovery
The Llama 3 Herd of Models
Llama Team (A Dubey, A Jauhri... Yiwen Song, et al.)
arXiv (2024)
Foundation Models