Yiwen Song | Research Scientist

About Me

I am currently a Senior Research Scientist at Google Cloud AI Research. My research focuses on Large Language Models, particularly at the intersection of multimodality, generative AI, and agentic systems. I am passionate about advancing how models reason, plan, and interact with complex, real-world environments.

Prior to joining Google, I spent four years at Meta conducting research and building training frameworks in the computer vision domain. I was a core contributor to Llama 3 and served as a maintainer for the TorchVision library.

News

[2026-07] Excited to give a demo of Co-Director at the Google booth & kiosk at ICML 2026.

[2026-07] Two papers accepted to COLM 2026: "Co-Director: Agentic Generative Video Storytelling" and "PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing".

[2026-05] Two papers accepted to ICML 2026: "TFRBench: A Reasoning Benchmark for Evaluating Forecasting Systems" and "The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust".

[2026-04] Gave a talk about PaperOrchestra at the BAAI online seminar.

[2026-04] Released two new preprints: "CANVAS: Continuity-Aware Narratives via Visual Agentic Storyboarding" and "Co-Director: Agentic Generative Video Storytelling".

[2026-04] Released a new preprint: "PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing".

[2026-03] Released a new preprint: "VQQA: An Agentic Approach for Video Evaluation and Quality Improvement".

[2026-03] Our paper "Watch and Learn: Learning to Use Computers from Online Videos" was accepted to CVPR 2026.

[2026-02] Our paper "Synapse: Adaptive Arbitration of Complementary Expertise in Time Series Foundational Models" was accepted to TMLR.

[2026-01] Released a new preprint: "ScholarPeer: A Context-Aware Multi-Agent Framework for Automated Peer Review".

[2025-07] Released the technical report for Gemini 2.5.

[2025-04] Released the technical report for ShieldGemma2.

Selected Publications

The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust

Nishant Subramani, Palash Goyal, Yiwen Song, Mani Malek, Yuan Xue, Tomas Pfister, Hamid Palangi

ICML (2026)

Paper

Language Models Calibration Trust

TFRBench: A Reasoning Benchmark for Evaluating Forecasting Systems

Md Atik Ahamed, Mihir Parmar, Palash Goyal, Yiwen Song, Long T. Le, Qiang Cheng, Chun-Liang Li, Hamid Palangi, Jinsung Yoon, Tomas Pfister

ICML (2026)

Paper Project Page

Time Series Forecasting Benchmarking LLM Reasoning

Co-Director: Agentic Generative Video Storytelling

Yale Song, Yiwen Song, Nick Losier, Nathan Hodson, Ye Jin, Rhyard Zhu, Yan Xu, Daniel Vlasic, Carina Claassen, Jasmine Leon, Khanh G. LeViet, Zack Chomyn, Joe Timmons, Brett Slatkin, Scott Penberthy, Tomas Pfister

COLM (2026)

Paper Project Page Code

Video Generation Agentic Storytelling

CANVAS: Continuity-Aware Narratives via Visual Agentic Storyboarding

Ishani Mondal, Yiwen Song, Mihir Parmar, Palash Goyal, Jordan Boyd-Graber, Tomas Pfister, Yale Song

arXiv (2026)

Paper

Multimodal Generation Continuity-Aware Narratives

PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing

Yiwen Song, Yale Song, Tomas Pfister, Jinsung Yoon

COLM (2026)

Paper Project Page Code

AI Paper Writing Multi-agent System

VQQA: An Agentic Approach for Video Evaluation and Quality Improvement

Yiwen Song, Tomas Pfister, Yale Song

arXiv (2026)

Paper Project Page

Video Evaluation Quality Improvement Prompt Optimization

Watch and Learn: Learning to Use Computers from Online Videos

Chan Hee Song, Yiwen Song, Palash Goyal, Yu Su, Oriana Riva, Hamid Palangi, Tomas Pfister

CVPR (2026)

Paper Project Page Media Coverage

Computer Use Agents Inverse Dynamics Model

Synapse: Adaptive Arbitration of Complementary Expertise in Time Series Foundational Models

Sarkar Snigdha Sarathi Das, Palash Goyal, Mihir Parmar, Yiwen Song, Long T. Le, Lesly Miculicich, Jinsung Yoon, Rui Zhang, Hamid Palangi, Tomas Pfister

TMLR (2026)

Paper

Time Series Forecasting Foundation Models

Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities

Gemini Team (G Comanici, E Bieber... Yiwen Song, et al.)

arXiv (2025)

Paper Blog

Frontier Models Multimodality

Plan-tuning: Post-training language models to learn step-by-step planning for complex problem solving

Mihir Parmar, Palash Goyal, Xin Liu, Yiwen Song, Mingyang Ling, Chitta Baral, Hamid Palangi, Tomas Pfister

EMNLP Main (2025)

Paper Code