TALK [MERL Seminar Series 2026] Jialong Wu presents a talk titled "World Models and Human-like Reasoning"
Date released: March 25, 2026
-
Date & Time:
Wednesday, March 25, 2026; 11:00 AM
-
Abstract:
This talk introduces the background and key findings of our recent work, "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models," which answers the question of when and how visual generation enabled by unified multimodal models (UMMs) benefits reasoning. We take a world model perspective inspired by human cognition: humans construct mental models of the world, representing information and knowledge through two complementary channels, verbal and visual, to support reasoning, planning, and decision-making. In contrast, recent advances in large language models (LLMs) and vision–language models (VLMs) rely largely on verbal chain-of-thought (CoT) reasoning, drawing primarily on symbolic and linguistic world knowledge. UMMs open a new paradigm by using visual generation for visual world modeling, enabling more human-like reasoning on tasks grounded in the physical world. In this work, we formalize the atomic capabilities of world models and of world model-based CoT reasoning. We highlight the richer informativeness and complementary prior knowledge afforded by visual world modeling, leading to our visual-superiority hypothesis for tasks grounded in the physical world. We identify and design tasks that necessitate interleaved visual-verbal CoT reasoning, constructing a new evaluation suite, VisWorld-Eval. Through controlled experiments on BAGEL, we show that interleaved CoT significantly outperforms purely verbal CoT on tasks that favor visual world modeling, strongly supporting our insights.
-
Speaker:
Jialong Wu
Tsinghua University
Jialong Wu is a fourth-year Ph.D. student at Tsinghua University, advised by Prof. Mingsheng Long, and is currently a research intern at ByteDance Seed. He obtained his bachelor's degrees in Software Engineering and Pure and Applied Mathematics (second major) from Tsinghua University. His research aims to develop fundamental techniques toward general autonomous intelligence, with a primary focus on world models and broader interests in scalable reasoning and planning with learned priors.